I know I rarely talk about work on this blog, but today’s issue was a doozy and I want to be sure the information is out there someplace that can be found with a Google search.
Today was February 3rd, and one of our services started blowing up with date-parsing errors. The obvious answers were immediately discarded: there had been no change over the weekend, no change to this service in weeks in fact, and yes, it HAD been working previously. It wasn’t Jan 1, or Feb 29, or any of the usual suspects for date mixups. Both Feb 3 and March 2 exist, so it wasn’t a day/month swap caused by some international format. What the heck was going on?
We were able to trace the error back as far as the ColdFusion built-in function ParseDateTime, but no further. It’s part of the language, so it’s not like we could pop it open and see under the hood. The line was simple:
dateEffective = ParseDateTime(replace(xmlArticle.startDate.xmlText,"T",""));
ParseDateTime is supposed to be a handy utility function to make a ColdFusion date object out of any conceivable formatted date string. Unfortunately, it can’t handle the XML date-time format (2014-02-03T09:00:00) natively, so we helped it out by removing the T to make the format something it could parse.
The more astute among you will have seen our mistake already, but I’ll explain it anyway.
I don’t know what ParseDateTime looks like under the covers, but after today’s debugging session, I feel qualified to give a fairly good rough guess. I’m going to say step one of their algorithm involves a regex. Something simple, similar to (but not exactly like*) the following:
(19|20)?\d\d[- /.]0?([1-9]|1[012])[- /.]0?([1-9]|[12][0-9]|3[01])[- /.]?((0?[0-9]|1[0-2]):?){0,3}
You know, your basic date parsing. A pattern like this only rules out absurd strings such as 2013-13-40; it still lets through date-shaped strings that aren’t real dates, like 2013-02-31. That’s where phase two comes in: date validation. A ladder of ifs, something like
if month == 2 && day > 29 return invalid date
if month == 2 && day == 29 && !isLeapYear(year) return invalid date
and so on and so forth. Standard stuff, really. All textbook, nothing to worry about here, no rigorous testing needed.
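The two phases guessed at above can be sketched in a few lines of Python. This is a minimal illustration of the idea, not ColdFusion’s actual implementation: the pattern `DATE_SHAPE`, the function `parse_date`, and the use of `calendar.monthrange` in place of a hand-written if-ladder are all assumptions made for the example.

```python
import calendar
import re

# Phase one: a hypothetical "shape" check for YYYY-MM-DD.
# It rejects absurd values (month 13, day 40) but knows nothing
# about how many days each month really has.
DATE_SHAPE = re.compile(r"(\d{4})-(0?[1-9]|1[0-2])-(0?[1-9]|[12]\d|3[01])")


def parse_date(text):
    """Return (year, month, day), or None if the string is not a real date."""
    match = DATE_SHAPE.fullmatch(text)
    if match is None:
        return None  # phase one: not even date-shaped
    year, month, day = (int(group) for group in match.groups())
    # Phase two: calendar validation, standing in for the if-ladder.
    # monthrange returns (weekday of the 1st, number of days in the month).
    days_in_month = calendar.monthrange(year, month)[1]
    if day > days_in_month:
        return None  # e.g. Feb 31, or Feb 29 in a non-leap year
    return (year, month, day)


print(parse_date("2013-13-40"))  # rejected by the regex
print(parse_date("2013-02-31"))  # passes the regex, fails validation
print(parse_date("2012-02-29"))  # leap day in a leap year
```

The split matters for what follows: phase two can only validate whatever numbers phase one hands it, so a wrong capture in the regex turns into a confident "invalid date" later on.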
Until we passed it this date: 2014-02-0308:50:46.
Now the clever among you have definitely figured out what’s wrong, but it took four of us over an hour to figure out what had happened, and I only eventually guessed because I suspected they were using a regex and so went through common regex debugging questions. You see, regexes like to be greedy. They like to gobble up as many characters for a single piece as possible. So instead of breaking that date into February 03, 2014, at 08:50:46 in the morning, it broke it into Feb 030, 2014, at 8:50:46 in the morning. That is to say, when it read the string “2014-02-0308:50:46”, instead of treating the 0 right after “03” as the leading 0 of the hour 08, it read it as the final 0 of the day 30, which also had a leading 0 for no good reason. The regex matched on Feb 30, which the subsequent if-ladder determined was an invalid date.
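The greedy capture is easy to reproduce with a small Python sketch. The pattern here is a hypothetical stand-in, since the real one inside ParseDateTime is unknown; it assumes an optional leading 0 on the day, two-digit days tried before one-digit days, and an optional space before the time.

```python
import re

# Hypothetical timestamp pattern, NOT ColdFusion's real one.
# Assumptions: optional leading 0 on the day, two-digit days tried
# before one-digit days, optional space before the time.
TIMESTAMP = re.compile(
    r"(?P<year>\d{4})-(?P<month>\d{1,2})-0?(?P<day>3[01]|[12]\d|[1-9])"
    r" ?(?P<hour>\d{1,2}):(?P<minute>\d{2}):(?P<second>\d{2})"
)


def split_timestamp(text):
    """Return the captured fields as a dict of ints, or None if no match."""
    match = TIMESTAMP.fullmatch(text)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


glued = split_timestamp("2014-02-0308:50:46")
spaced = split_timestamp("2014-02-03 08:50:46")

print(glued["day"], glued["hour"])    # 30 8
print(spaced["day"], spaced["hour"])  # 3 8
```

With the space in place, the day group has nowhere to go after the 3, so it settles for the one-digit alternative and the hour keeps its leading 0.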
I was able to verify that the timestamp “2014-04-0310:50:46” also came back with “invalid date”, so the problem isn’t limited to Feb 3. On Feb 3, though, it hits every single hour, as opposed to the two narrow windows (between 10am and 1pm and between 10pm and midnight) that are affected on the 3rd of any month with only 30 days in it. That’s why it was noticed today and only today.
The moral of this story is twofold: firstly, never trust your regexes to parse things without rigorous testing of edge cases, and secondly, always put a space between your date and time before passing a string to ColdFusion’s ParseDateTime.
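One way to act on the first moral is a round-trip test that sweeps every hour of every day in a year instead of spot-checking a few dates. The Python sketch below runs that sweep against a deliberately fragile, hypothetical parser (`fragile_parse`, built on an assumed regex, not ColdFusion’s real one) and counts how many timestamps come back wrong with and without a space between date and time.

```python
import re
from datetime import datetime, timedelta

# Deliberately fragile stand-in parser; the pattern is an assumption,
# not ColdFusion's real one.
PATTERN = re.compile(
    r"(\d{4})-(\d{1,2})-0?(3[01]|[12]\d|[1-9]) ?(\d{1,2}):(\d{2}):(\d{2})"
)


def fragile_parse(text):
    """Return a datetime, or None if the text is rejected."""
    match = PATTERN.fullmatch(text)
    if match is None:
        return None
    try:
        return datetime(*(int(group) for group in match.groups()))
    except ValueError:
        return None  # e.g. Feb 30: date-shaped but not a real date


def count_failures(fmt, year=2014):
    """Format every hour of `year` with `fmt`, parse it back, count mismatches."""
    moment = datetime(year, 1, 1)
    failures = 0
    while moment.year == year:
        if fragile_parse(moment.strftime(fmt)) != moment:
            failures += 1
        moment += timedelta(hours=1)
    return failures


print(count_failures("%Y-%m-%d%H:%M:%S"))   # date and time glued together
print(count_failures("%Y-%m-%d %H:%M:%S"))  # space between date and time
```

The exact failure count depends entirely on the assumed pattern. The useful part is that the sweep flags the glued format and passes the spaced one, and that it counts silently wrong results (a different valid date) as well as outright rejections.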
That second rule is being added to our Standards and Guidelines.
*I know for a fact my quick regex is not identical because the real one can handle things like MM-DD-YYYY instead of YYYY-MM-DD, but at this point I’m not certain if it’s one monster regex or a series of “valid” date patterns that it iterates over until it gets a hit, so I threw something together for the sake of example and moved on.