Copyright © 2004, 2006, 2008 Gene Michael Stover. All rights reserved. Permission to copy, store, & view this document unmodified & in its entirety is granted.
fixme: Does not parse yyyymmddTHHMMSS Z god damn it all. Needs an overhaul. - gene 2006 April 8
fixme: Should also recognize partial dates, such as just the year or the year & the month without the day. - gene 2006 April
I needed a function to parse strings containing dates & return a Common Lisp UNIVERSAL TIME. While I was at it, I wrote a function to convert universal times to strings.
If you have a date-&-time string, such as ``Saturday, July 10, 2004, 6:45 PM PDT'', you can call my PARSE-TIME function to obtain the equivalent universal time. Here's an example:
lisp> (parse-time "July 10, 2004, 6:45 PM")
3298499100
lisp> (decode-universal-time *)
0 ; second
45 ; minute
18 ; hour
10 ; day
7 ; month
2004 ; year
5 ; day of week (Saturday)
T ; Daylight Stupid Time ???
8 ; time zone
lisp>
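The numbers in that transcript can be checked without loading the library. Here is a small sketch that uses only the standard ENCODE-UNIVERSAL-TIME & DECODE-UNIVERSAL-TIME. It assumes a fixed time zone of 7 (seven hours west of Greenwich, which is what Pacific daylight time amounts to). The names TRANSCRIPT-UT & TRANSCRIPT-FIELDS are made up for this illustration.

```lisp
;; Hypothetical helpers; they are not part of time.lisp.
(defun transcript-ut ()
  "Universal time for 18:45:00 on 10 July 2004 in zone 7."
  (encode-universal-time 0 45 18 10 7 2004 7))

(defun transcript-fields ()
  "Decode that universal time again in zone 7, as a list of nine values."
  (multiple-value-list (decode-universal-time (transcript-ut) 7)))

;; (transcript-ut)     => 3298499100
;; (transcript-fields) => (0 45 18 10 7 2004 5 NIL 7)
```

Because the zone is passed explicitly, the daylight flag comes back as NIL & the zone as 7, whereas the transcript above shows T & 8. Both describe the same instant.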
There is also a FORMAT-TIME function which converts universal times to text. It is the reverse of PARSE-TIME, & it's analogous to the Common Lisp function FORMAT. Here is an example of FORMAT-TIME in use:
lisp> (format-time nil "%H:%M on %A, %d %B" (get-universal-time))
"20:55 on Tuesday, 08 June"
lisp>
%z (lower-case z) field.
The source code I describe in this article is released in accordance with the GNU Lesser General Public License.
This article is not covered by the Gnu Lesser General Public License. It is covered by its own copyright notice which is at the beginning of the article.
The source code I describe here is in these files1
Each of these source files is licensed according to the terms of the GNU Lesser General Public License.
To quickly test everything on your Lisp, download all the files, ``cd'' to that directory, run Lisp, & do this:
lisp> (load "loadall.lisp")
;; see commands about loading some files
lisp> (check)
;; see list of test program names
T
If CHECK returns NIL or dumps you into the debugger, my time formatter or parser doesn't work on your Lisp. Or maybe I broke it when making improvements. Let me know, & I'll see what I can do.
To see a table of some time strings in various formats & their parsed results (converted back to strings so you can read them), do this:
lisp> (demo)
Table 5 shows the output from DEMO.
The time.lisp file defines a package called CYBERTIGGYR-TIME & exports two functions & a bunch of data from it. The two functions are FORMAT-TIME & PARSE-TIME.
Use the FORMAT-TIME function to convert universal times to strings. There are also some global variables which might be useful when formatting times; they are described in Section 7.2.
To parse times, you just need the PARSE-TIME function. You might IMPORT it or call it by its full name: CYBERTIGGYR-TIME:PARSE-TIME.
fixme this entire section.
what it says now:
lisp> (asdf:operate 'asdf:load-op "cybertiggyr-time")
what it used to say:
When loading time.lisp, I recommend the symbolic pathname string ``CL-LIBRARY:COM;CYBERTIGGYR;TIME;TIME.LISP''. So you'd load it like this:
lisp> (load "CL-LIBRARY:COM;CYBERTIGGYR;TIME;TIME.LISP")
Or if you, like me, prefer explicitly translated logical pathnames, you'd do this:
lisp> (load
       (translate-logical-pathname
        "CL-LIBRARY:COM;CYBERTIGGYR;TIME;TIME.LISP"))
FORMAT-TIME is analogous to the Common Lisp function FORMAT. Its declaration looks like this:
;; in-package "CYBERTIGGYR-TIME"
(defun format-time (strm fmt
                    &optional
                    (ut (get-universal-time))
                    (zone *format-time-default-zone*)
                    (language *format-time-default-language*))
  ...)
As with the FORMAT function, STRM identifies the destination of the function's output. It may be:
The FMT argument of FORMAT-TIME will most often be a string, but it may also be a list. It contains format fields & literals. It is described in detail in Section 7.1.
The UT argument should be a Lisp Universal Time or absent. If it is absent, it defaults to the current time.
The ZONE argument defaults to the local time zone. In the future, it will allow you to specify the time zone for the output string, but for now, it's ignored.
The LANGUAGE argument specifies the natural language in which some output fields are converted. For example, if the language is English, the day of week is Monday, & the FMT argument contains the string ``%A'', the output will contain ``Monday'', but if the language is French, the output will contain ``Lundi''. You can add your own languages, & I'll give some examples later.
For convenience & standardization, I've created some format string constants which can be used for the FMT argument of FORMAT-TIME.
FORMAT-TIME uses a format string argument to tell the form of the output string. A format string contains format fields & literals.
A format field consists of a percent character (%) followed by another character. The second character identifies the type of the format field. It is case-sensitive, so ``%A'' is distinct from ``%a''.
Literals are anything other than format fields. They are case-sensitive.
Table 2 gives brief descriptions of the format fields for both FORMAT-TIME & PARSE-TIME. The format fields resemble those from the Standard C function strftime.2
The format strings for FORMAT-TIME resemble those for the Standard C function strftime. I haven't implemented all the format fields from strftime. Some of the missing ones are %j (Julian day) & %c. A Julian format field might be useful at times, but I think a %c is unnecessary. Instead of %c, use one of the !!!.
FORMAT-TIME has a strict interpretation of the format fields. If the second column of Table 2 says that a field is two digits, then FORMAT-TIME will produce exactly two digits, with a leading zero if necessary. Also, literals (the parts of the format string which are not format fields) are copied to the output unchanged.
For convenience & standardization3, I put some useful format strings in global variables. Table 3 shows those standardized format symbols & their values. All those symbols are within package CYBERTIGGYR-TIME & exported by it.
My personal favorite is *FORMAT-TIME-ISO8601-LONG*. It is an international standard & unambiguous. If you sort the resulting strings, you get a chronological list. The format can be read by humans without pain, though it's not very pleasing to the eye. My main complaint with it is that it numbers the month; I'd prefer an abbreviated month name.
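The claim that sorting ISO 8601 strings gives a chronological list is easy to try with standard Common Lisp alone. The sketch below is not *FORMAT-TIME-ISO8601-LONG* itself (its exact layout is in Table 3); it assumes a fixed-width yyyy-mm-ddTHH:MM:SSZ layout in GMT. ISO8601-GMT & SORTED-ISO8601 are names invented for the example.

```lisp
;; Hypothetical helpers; they only use standard Common Lisp.
(defun iso8601-gmt (ut)
  "Format the universal time UT as yyyy-mm-ddTHH:MM:SSZ in GMT."
  (multiple-value-bind (sec min hour day month year)
      (decode-universal-time ut 0)
    ;; ~N,'0D pads each number with leading zeros to a fixed width.
    (format nil "~4,'0D-~2,'0D-~2,'0DT~2,'0D:~2,'0D:~2,'0DZ"
            year month day hour min sec)))

(defun sorted-iso8601 (uts)
  "Format each universal time in UTS, then sort the strings as plain text."
  (sort (mapcar #'iso8601-gmt uts) #'string<))

;; (iso8601-gmt 3298499100) => "2004-07-11T01:45:00Z"
```

Because every field has a fixed width & the most significant field comes first, STRING< on the text agrees with < on the universal times.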
If you need a format that is more pleasing to the human eye, or if you just don't like the *format-time-iso8601-long*, I recommend the *format-time-full* format.
The *FORMAT-TIME-CEE* format is similar to the %c format of Standard C's strftime function. It is not identical, though. The main difference is that time zones are numbers, not symbolic as with strftime's %c. Also, *FORMAT-TIME-CEE* uses strictly two-digit numbers for the day of month, whereas %c might use single-digit numbers where possible or might pad the day-of-month with a space on the left when appropriate. *FORMAT-TIME-CEE* is pretty close to strftime's %c, though.
The Lisp Universal Time for Thursday, 2036 May 8, 23:28:16 daylight savings time, US Pacific zone is 4302916096. When daylight savings time is in effect, US Pacific time is seven hours behind Greenwich Mean time. Table 4 shows examples of the output of evaluating (FORMAT-TIME NIL FMT 4302916096) for various values of FMT when English is the default language.
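The seven-hour offset can be seen with the standard functions. This sketch assumes that zone 7 stands in for Pacific daylight time; *EXAMPLE-UT* & EXAMPLE-IN-GMT are names made up for the illustration.

```lisp
;; 23:28:16 on 8 May 2036, seven hours west of Greenwich (zone 7).
;; Supplying the zone explicitly turns off any daylight adjustment.
(defparameter *example-ut*
  (encode-universal-time 16 28 23 8 5 2036 7))

(defun example-in-gmt ()
  "Second, minute, hour, day, month, & year of *EXAMPLE-UT* in GMT."
  (subseq (multiple-value-list (decode-universal-time *example-ut* 0))
          0 6))

;; *example-ut*     => 4302916096
;; (example-in-gmt) => (16 28 6 9 5 2036)
```

In GMT the same instant is 06:28:16 on May 9, which is seven hours later on the clock.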
A special example from Table 4 is the penultimate one, which uses a list of strings as the FMT argument. When the FMT argument is a single string, FORMAT-TIME parses it into a list of tokens like the one in that example, then it processes each token. The tokens may be strings (as shown), symbols, or whatever. So if you can stomach the horrible syntax, you get more flexibility & run-time efficiency by using lists instead of strings.
As of Sunday, 26 September 2004, this section might be inaccurate. I have recently updated the documentation, & I have not double-checked this section. So beware.
FORMAT-TIME converts the format string to a list of format fields & literal fields, though when it splits the format string into those fields, FORMAT-TIME doesn't make note of which fields are literal & which are format fields. Then it iterates over the fields. It tries to find each field in a table called *FORMAT-TIME-FNS*, which is a hash table whose keys are format fields such as ``%Y''.
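The splitting step described above might look something like the following. This is only a sketch of the idea, not the code from time.lisp; SPLIT-FORMAT-STRING is a hypothetical name, & the sketch assumes every format field is exactly a percent character plus one more character.

```lisp
;; Hypothetical helper; it is not part of time.lisp.
(defun split-format-string (fmt)
  "Split FMT into a list of two-character % fields & literal strings."
  (let ((tokens '())
        (literal (make-string-output-stream))
        (i 0)
        (n (length fmt)))
    (flet ((flush-literal ()
             ;; Emit the literal text collected so far, if there is any.
             (let ((s (get-output-stream-string literal)))
               (when (plusp (length s))
                 (push s tokens)))))
      (loop while (< i n)
            do (let ((ch (char fmt i)))
                 (cond ((and (char= ch #\%) (< (1+ i) n))
                        ;; A percent plus the next character is one field.
                        (flush-literal)
                        (push (subseq fmt i (+ i 2)) tokens)
                        (incf i 2))
                       (t
                        (write-char ch literal)
                        (incf i)))))
      (flush-literal))
    (nreverse tokens)))

;; (split-format-string "%H:%M on %A")
;; => ("%H" ":" "%M" " on " "%A")
```

Like the description above, the result does not record which tokens are fields & which are literals. A later lookup in a table keyed by strings such as ``%H'' is what tells them apart.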
To add new format fields, create a function to do the work for that field. Choose an unused format field string. Then insert the new field string & the new function into the *FORMAT-TIME-FNS* table. (Optional: If your new format field is probably of use to others, send it to me & I'll include it in the next release of this library.)
Let's do an example.
Let's say you want to implement the two-digit year format field for FORMAT-TIME. The format field string will be %y. Now we need to write the function which actually does the formatting for %y.
When FORMAT-TIME calls a function for a field, it provides three arguments: a BROKEN-TIME structure, the language, & the output stream, in that order.
Because the BROKEN-TIME contains the time's components ``broken'' out of the universal time, the job of the new function is simple. We just need to create a string from the last two digits of the year. Here's how we could do that:
(defun format-time-evil-year (bro language strm)
  (declare (ignore language))
  (format strm "~2,'0D" (mod (broken-time-yy bro) 100))
  ;; The return value doesn't matter. In such cases,
  ;; I like to return the function's name. It's okay
  ;; if you want to return something else.
  'format-time-evil-year)
The final step is to insert the new function into the *FORMAT-TIME-FNS* table. Here's how:
(setf (gethash "%y" *format-time-fns*)
      #'format-time-evil-year)
You will also want to write one or more test programs for your new format field. Add them as functions to test.lisp.
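To experiment with the table-of-functions idea without loading the library, a stand-alone toy version can be useful. The sketch below is an assumption-laden miniature, not the real implementation: *DEMO-FIELD-FNS* & DEMO-FORMAT-YEAR are invented names, & the field functions take a plain year instead of a BROKEN-TIME.

```lisp
;; Hypothetical miniature of a field table; not the library's table.
;; The keys are strings, so the hash table must compare with EQUAL.
(defvar *demo-field-fns* (make-hash-table :test #'equal))

(setf (gethash "%Y" *demo-field-fns*)
      (lambda (year strm) (format strm "~4,'0D" year)))
(setf (gethash "%y" *demo-field-fns*)
      (lambda (year strm) (format strm "~2,'0D" (mod year 100))))

(defun demo-format-year (tokens year)
  "Write each token: call its function if it has one, else copy it."
  (with-output-to-string (strm)
    (dolist (token tokens)
      (let ((fn (gethash token *demo-field-fns*)))
        (if fn
            (funcall fn year strm)
            (write-string token strm))))))

;; (demo-format-year '("%Y" " is year " "%y") 2004)
;; => "2004 is year 04"
```

Adding a field is one SETF of GETHASH, & tokens that are not in the table fall through as literals.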
You can add the names of months & weekdays from other languages, too. To do that, add them to the *FORMAT-TIME-MONTHS* or *FORMAT-TIME-WEEKDAYS* tables, whichever table is appropriate.
Each key in those tables is a list of two elements. The first element is the number of the item, such as 1 for January or 6 for Sunday. The second element of a key is the natural language. I recommend symbols from the keywords package, such as :ENGLISH, :JIVE, or :WELSH.
Each value is a list of two strings. The first item in a value list is the full name of the month or weekday. The second is the abbreviated name.
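Here is a sketch of what such a table could look like. It is a stand-alone illustration with made-up names (*DEMO-MONTHS* & DEMO-MONTH-NAME), not the library's own *FORMAT-TIME-MONTHS*. Because the keys are lists, this sketch assumes a hash table that compares keys with EQUAL.

```lisp
;; Hypothetical stand-in for a month-name table.
(defvar *demo-months* (make-hash-table :test #'equal))

;; Key: (number language).  Value: (full-name abbreviated-name).
(setf (gethash '(1 :english) *demo-months*) '("January" "Jan"))
(setf (gethash '(1 :french) *demo-months*) '("janvier" "janv."))

(defun demo-month-name (number language &optional abbreviated)
  "Look up the full or abbreviated name; NIL if the entry is missing."
  (let ((names (gethash (list number language) *demo-months*)))
    (if abbreviated
        (second names)
        (first names))))

;; (demo-month-name 1 :french)    => "janvier"
;; (demo-month-name 1 :english t) => "Jan"
```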
If you add any other languages & want to send them to me so I can include them in the next release, that'd be peachy.
Function PARSE-TIME scans a date-&-time encoded in a human-readable string to determine the equivalent UNIVERSAL TIME. PARSE-TIME returns the universal time when it can determine it; otherwise, PARSE-TIME returns NIL.
PARSE-TIME understands a variety of formats, & it can fill-in items that are missing. It also understands some convenience words (today & now). Here are some examples:
lisp> (parse-time "2004-apr-4 19:11")
3291329460
lisp> (parse-time "4 april 2004")
3291260400
lisp> (parse-time "today")
3295684800
lisp> (parse-time "now")
3295742590
PARSE-TIME is declared like this:
(defun parse-time (str &optional
                   (recognizers *default-recognizers*))
  ...)
The str argument is a string. It contains the date-&-time to parse.
The optional recognizers argument is a list of functions that know how to parse specific formats. If you don't supply a list of recognizers, you get the default list. I suspect that the default list of recognizers will be the most common choice under normal use.
If PARSE-TIME can recognize the date-&-time in the string, it converts them into a UNIVERSAL TIME & returns it. If PARSE-TIME can't recognize the string, it returns NIL.
Table 5 shows examples of some strings that were fed to PARSE-TIME. The first column shows the strings that were given to PARSE-TIME. The second column shows the output of using FORMAT-TIME to convert the resulting universal time back to a readable string.
I produced Table 5 by running the DEMO function from demo.lisp. The time zone on my computer when I ran it was U.S. Pacific Daylight Stupid time, which is 7 hours west of Greenwich Mean Time (GMT). It's also 7 hours behind GMT.
Notice the second & third rows in Table 5. PARSE-TIME interprets a string of ``now'' as the current time.
PARSE-TIME interprets a string of ``today'' as noon in the GMT time zone on the current day. So if you feed ``today'' to PARSE-TIME on 2004 July 4, 13:00 in London, you'll get a universal time corresponding to 2004 July 4, 12:00. If someone in Seattle feeds ``today'' to PARSE-TIME at exactly the same time, they'll get the same universal time, which will be 2004 July 4, 5:00 PDT.
So why does ``today'' inject noon in GMT instead of using the current time as ``now'' does? I wanted ``today'' to evaluate to the same universal time regardless of the time zone in which it was parsed. So parsing ``today'' on a particular day gets you the same universal time anywhere in the world. That's why it uses a particular time of day instead of the current time of day. For that particular time of day, I chose noon in London because it is unlikely (impossible?) to find a time zone that is more than 11 hours distant from London. So when it is noon in London, all other time zones will show different times of the day, but they will all show the same day.
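One way to read that rule is: take the calendar date as the local clock shows it, then encode noon on that date in GMT. The sketch below follows that reading; it is an assumption about the behaviour, not the code from time.lisp. NOON-GMT-ON-LOCAL-DAY is an invented name, & the optional ZONE argument exists only so the example does not depend on the machine's own time zone.

```lisp
;; Hypothetical helper; it only uses standard Common Lisp.
(defun noon-gmt-on-local-day (ut &optional zone)
  "Universal time of 12:00:00 GMT on the date that UT falls on locally.
With ZONE, use that fixed zone instead of the machine's local zone."
  (multiple-value-bind (sec min hour day month year)
      (if zone
          (decode-universal-time ut zone)
          (decode-universal-time ut))
    (declare (ignore sec min hour))
    ;; Zone 0 is GMT; only the date is kept from the decoded time.
    (encode-universal-time 0 0 12 day month year 0)))

;; (noon-gmt-on-local-day 3295742590 7) => 3295684800
```

With zone 7, the ``now'' value from the earlier transcript maps onto the ``today'' value from the same transcript, which is at least consistent with this reading.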
I wrote a function similar to PARSE-TIME for C years ago. I used yacc for part of the work. If I remember correctly, I never did work out a single grammar that could describe all these formats. Maybe I didn't try hard enough; nevertheless, I don't want to do it that way for Lisp.
My PARSE-TIME function uses a list of recognizers.4 Each recognizer5 is a function that can extract the date & time from one or more formats, but it never incorrectly extracts information from another format. It returns a UNIVERSAL TIME or a BROKEN-TIME structure.
It's important that a recognizer does not accidentally or incorrectly extract & return information from a format that it doesn't understand. As long as all recognizers know when they don't understand a format, the order in which PARSE-TIME tries the recognizers doesn't matter & maintenance will be easier.
For example, a recognizer for the yyyy-mm-dd format, which contains the year, the month, the day, but no hours, minutes, seconds, or time zone, should extract & return information from ``2004-02-01'', but it must be sure to return NIL for ``2004-02-01 9:21''.
PARSE-TIME tries each recognizer in its list of recognizers until one of them understands the parse string.
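The recognizer idea can be sketched in a few lines of standard Common Lisp. This is a toy under stated assumptions, not the code in time.lisp: RECOGNIZE-YYYY-MM-DD & DEMO-PARSE-TIME are hypothetical names, the recognizer treats the date as midnight GMT, & it returns only a universal time, never a BROKEN-TIME.

```lisp
;; Hypothetical recognizer & driver; not the library's own functions.
(defun recognize-yyyy-mm-dd (str)
  "Return a universal time if STR is exactly yyyy-mm-dd; otherwise NIL."
  (when (and (= (length str) 10)
             (char= (char str 4) #\-)
             (char= (char str 7) #\-)
             (every #'digit-char-p (subseq str 0 4))
             (every #'digit-char-p (subseq str 5 7))
             (every #'digit-char-p (subseq str 8 10)))
    (let ((year (parse-integer str :start 0 :end 4))
          (month (parse-integer str :start 5 :end 7))
          (day (parse-integer str :start 8 :end 10)))
      (when (and (<= 1 month 12) (<= 1 day 31))
        ;; IGNORE-ERRORS turns an unencodable date into NIL.
        (ignore-errors
         (encode-universal-time 0 0 0 day month year 0))))))

(defun demo-parse-time (str recognizers)
  "Return the first non-NIL result from RECOGNIZERS, or NIL."
  (some (lambda (recognizer) (funcall recognizer str)) recognizers))

;; (demo-parse-time "2004-02-01" (list #'recognize-yyyy-mm-dd))
;; => 3284582400
;; (demo-parse-time "2004-02-01 9:21" (list #'recognize-yyyy-mm-dd))
;; => NIL
```

The length check is what makes the recognizer refuse ``2004-02-01 9:21'' instead of quietly parsing its first ten characters.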
On the one hand, it's pretty grotty.6 It's brute-force code, & I did a lot of it by copy-&-paste programming.7
On the other hand, the implementation is extensible. If you want PARSE-TIME to understand a new format, write a recognizer for it & push it onto *DEFAULT-RECOGNIZERS*. If you can write a more elegant recognizer, you can push it onto *DEFAULT-RECOGNIZERS*, too.
I originally wanted to use a more intelligent system, maybe one that analyzed each term in the parse string & figured out what each term was. For example, if the parse string were ``May 5 2004'', the ``May'' is certainly a month, & the 2004 is probably a year, so the ``5'' is probably a day. In ``5/5/4'', the slashes suggest that the string is in Stupid American Date Format, so the first 5 is May, the second 5 is the day, & the 4 suggests a year that is closest to the current year & .
If this reasoning recognizer produced more than one parse for the string, it could attach a confidence value to each estimate, & PARSE-TIME could return the result with the highest confidence value.
That would be cool, but it would take a lot of programming. It would take research. It also sounds like it would be an all-or-nothing type of algorithm. Until I got the entire thing working, none of it would work. The approach I've chosen allows recognizers to be added one at a time, so it was easy to get something working, & new recognizers can be created, debugged, & added in isolation.
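One small piece of that imagined reasoning, picking the year closest to the current year for a short year such as the 4 in ``5/5/4'', is easy to sketch in isolation. This is one possible interpretation only, & CLOSEST-YEAR is a made-up name that does not appear in time.lisp.

```lisp
;; Hypothetical helper illustrating one way to expand a short year.
(defun closest-year (yy current-year)
  "Return the year ending in YY (0 to 99) that is nearest CURRENT-YEAR."
  (let* ((base (+ (- current-year (mod current-year 100)) yy))
         (best base))
    ;; Compare the same two digits in the previous & the next century.
    (dolist (candidate (list (- base 100) (+ base 100)) best)
      (when (< (abs (- candidate current-year))
               (abs (- best current-year)))
        (setf best candidate)))))

;; (closest-year 4 2004)  => 2004
;; (closest-year 99 2004) => 1999
```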
Within the architecture I've chosen, it might be possible to use a pattern-matching library to write the recognizers. That might remove the need to create a named function for each recognizer. If you look at all the recognizers in time.lisp, you'll see that I have some long, hard-to-type names for some of those recognizers, & I suspect the situation will get worse as I add more recognizers. If I had a macro that compiled patterns to functions, I might be able to create anonymous recognizer functions like this:
;; Pattern-based recognizer that matches strings
;; of the form "Friday, 13 July 2029".
(push
 (defrecognizer
     ((?isWeekday ?dow) "," (?isDay ?day) (?isMonth ?mon) (?isYear ?year)))
 *default-recognizers*)
I don't suppose such pattern-based recognizers would be more efficient than the hand-crafted ones I've used, & PARSE-TIME would probably need as many of them as it does the hand-crafted ones, but these pattern-based ones might be easier to maintain because their definitions would be their documentation. They might be easier to debug, too, if the mythical DEFRECOGNIZER macro required only the one parameter I have shown here.
A suitable pattern-matching library might be the one described in the section ``A Pattern Matching Tool'' from Peter Norvig's Paradigms of Artificial Intelligence Programming: Case Studies in Common Lisp.
This is old stuff I wrote for this article. I might remove it entirely some day. 1 June 2004
ISO 8601:1988 is a standard for specifying dates, times, & periods of time.
is an unofficial
but useful description of that standard.
PARSE-TIME should understand a restricted version of the ISO 8601 format. The full format is:
where
The hyphens & colons must all be present or must all be missing. The seconds & their preceding colon are optional, but if the seconds are missing, their preceding colon must be missing, too.
If the
is missing, the hyphens & colons must be missing & the seconds
must be present.
If
is present, then
.
Otherwise,
.
If the time zone is missing, PARSE-TIME assumes GMT.
I think those rules boil down to these cases:
Table 6 shows examples of this modified ISO 8601 format.
If the AMs, PMs, & daylight savings times in some of those examples are confusing, consider it a reminder of how stupid a 12-hour clock & daylight savings time are. I'm not sure time zones haven't outlived their utility, for that matter.
Verbose natural formats are those written in long-hand. Here are the rules:
Table 7 shows examples.
For the example marked ``see note A'', PARSE-TIME should assume the current year. It ignores the ``Friday'' because ``18 apr'' has enough information. So ``10 o'clock Friday 18 apr'', if parsed during the year 2004 in the timezone of the United States's Pacific Coast, is equivalent to 2004-Apr-18T17:00.
The terse formats often use numbers for months & only two digits for years. The month may be numbered, named in full, or abbreviated, but there will be two other numbers. One of them is the day; it must be one or two digits. The other is the year; it must be two or four digits.
I think these formats stink because people so often number the months & also use two-digit years. This creates ambiguity. When this occurs, the separators determine the order. There are always two separators. Both must be hyphens (-) or both must be slashes (/). Hyphens indicate the European order of day, month, year. Slashes indicate the stupid American format of month, day, year.
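The separator rule can be written down directly. The sketch below handles only the all-numeric case & is an illustration with invented names (ALL-DIGITS-P & TERSE-DATE-FIELDS), not a recognizer from time.lisp. It does not check the digit counts described above, & it leaves two-digit years as they are.

```lisp
;; Hypothetical helpers; numeric fields only.
(defun all-digits-p (s)
  "True if S is a non-empty string of decimal digits."
  (and (plusp (length s)) (every #'digit-char-p s)))

(defun terse-date-fields (str)
  "Return day, month, & year as three values, or NIL if STR doesn't fit."
  (let* ((sep (find-if (lambda (ch) (member ch '(#\- #\/))) str))
         (p1 (and sep (position sep str)))
         (p2 (and p1 (position sep str :start (1+ p1)))))
    ;; P2 is NIL unless the first separator appears a second time,
    ;; so mixed separators such as "4/18-2004" are rejected.
    (when p2
      (let ((a (subseq str 0 p1))
            (b (subseq str (1+ p1) p2))
            (c (subseq str (1+ p2))))
        (when (and (all-digits-p a) (all-digits-p b) (all-digits-p c))
          (let ((x (parse-integer a))
                (y (parse-integer b))
                (year (parse-integer c)))
            (if (char= sep #\-)
                ;; Hyphens: European order of day, month, year.
                (values x y year)
                ;; Slashes: American order of month, day, year.
                (values y x year))))))))

;; (terse-date-fields "18-4-2004") => 18, 4, 2004
;; (terse-date-fields "4/18/2004") => 18, 4, 2004
```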
Times may be present in the same way as for the verbose natural formats.
Table 8 shows examples.
I dislike numbered months & two-digit years because of the potential for misunderstanding. In addition, I dislike the American convention of month/day/year. That's like saying 123 is three hundred twelve.
Gene Michael Stover 2008-04-20