Copyright © 2005 Gene Michael Stover. All rights reserved. Permission to copy, store, & view this document unmodified & in its entirety is granted.
An INI file is a text file containing a sequence of sections. Each section (with the possible exception of the first section) begins with a label, which names the section. The label is a string of characters enclosed in [brackets]. The first section need not have a label; in this case, the parser will assume that section's label is the empty string. All sections after the first must have a label.
After the label, a section contains attribute-value pairs. An attribute is a string, a value is a string, & they are separated (or joined) by an equal sign (=).
Labels, attributes, & values can have almost any length except zero. They may contain any text character except those that form the end-of-line sequence in the local computer's convention. In other words, on, say, Windows, labels may not contain a carriage return or a line feed. On Unix, the end-of-line sequence is just one character, the line feed. On classic Macintosh, the end-of-line sequence is just one character, the carriage return. To be portable back to Windows, both Unix & Macintosh should exclude both carriage return & line feed from labels, attributes, & values.
INI files may have comments. A comment begins with a semicolon & extends to the end-of-line. The end-of-line is not part of the comment. Comments may start at any column in the line. Labels, attributes, & values may not include semicolons.
Here is a formal grammar for INI files. Because I did not use a parser generator for this project, I cannot be sure that this grammar is correct.
[Grammar productions (1) through (9) are missing from this copy of the document.]
Let's look at some example INI files & the Lisp data I would like the parser to produce for each of them.
Figure 1 shows an example INI file. It contains just one section, & that section does not have a label.
The INI file in Figure 1 contains one section without a label. That section defines three attribute-value pairs. The first attribute is ``attr0'', & its value is ``val0''. The second attribute is ``attr1'', & its value is ``val1''. The third attribute is ``attr2'', & its value is ``val2 containing multiple words''.
If I fed Figure 1 to the parser, I'd like it to return a Lisp datum like the one I've shown in Figure 2.
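As a rough sketch of what that input & output could look like: the INI text below is rebuilt from the prose description of Figure 1, & the datum assumes the list-of-lists layout that READ-INI builds later in this article (one list per section, whose first element is the label & whose remaining elements are dotted attribute-value pairs). The names *FIGURE-1-TEXT* & *FIGURE-1-DATUM* are hypothetical, chosen only for this illustration.

```lisp
;; Hypothetical names. The INI text is rebuilt from the prose
;; description of Figure 1, not copied from the figure itself.
(defparameter *figure-1-text*
  (format nil "attr0=val0~%attr1=val1~%attr2=val2 containing multiple words~%"))

;; Assumed shape: a list of sections. Each section is a list whose
;; first element is the label ("" for the unlabeled first section)
;; & whose remaining elements are (attribute . value) dotted pairs.
(defparameter *figure-1-datum*
  '((""
     ("attr0" . "val0")
     ("attr1" . "val1")
     ("attr2" . "val2 containing multiple words"))))
```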
Right off the bat, I notice that, while the list-of-lists structure I've specified here might be great for generic manipulation, more basic use of it, in which a program just wants to see what value is specified for a particular program parameter, might be clumsy. We could write some functions that would make that use of INI files more convenient. So I guess we'll stick with this list-of-lists format for now.
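One such convenience function might look like the sketch below. INI-VALUE is a hypothetical name, not part of the parser in this article, & it assumes the list-of-lists layout described above: each section is a list whose first element is the label & whose remaining elements are dotted attribute-value pairs.

```lisp
;; Hypothetical helper: look up one attribute in one section.
;; INI is assumed to be a list of (label pair pair ...) sections.
(defun ini-value (ini label attribute &optional default)
  "Return the value of ATTRIBUTE in the section named LABEL, or DEFAULT."
  (let* ((section (assoc label ini :test #'string=))
         ;; (rest nil) is nil, so a missing section just yields no pair.
         (pair (assoc attribute (rest section) :test #'string=)))
    (if pair
        (cdr pair)
        default)))
```

With that in place, (ini-value '(("" ("attr0" . "val0"))) "" "attr0") returns "val0", & asking for a section or attribute that is not there returns the default.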
Figure 3 is a more complex example. It contains two named sections & also some comments.
In Figure 3, the first three lines are comments; they will be ignored entirely. The blank lines will be ignored, too. Notice that the blank line between ``attr0.1'' & ``attr0.2'' does not change to a new section. We don't change to a new section until we find a section label.
Figure 4 shows the Lisp expression that the INI file parser should return for Figure 3.
Notice that the Lisp expression in Figure 4 does not contain the blank lines or the comments from Figure 3.
From the examples in Section 3, I'd say the basic parsing algorithm should be:
You know, if the lexer recognized the kind of line we have, the algorithm would be even simpler. Maybe the lexer could get the next line. It could automagically strip comments & skip over empty lines. So it would never return an empty line; it would silently gobble them up. It could return a single string for label lines, a dotted pair for attribute-value lines, & a special value for end-of-input. The lexer might need to return something else to indicate error.
With such a lexer function, the parser algorithm simplifies to this:
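To see the shape of that loop without any file I/O, here is a minimal sketch that works on a list of tokens which have already been lexed. PARSE-TOKENS is a hypothetical name used only for this illustration; it assumes a string token is a section label & a cons token is an attribute-value pair, as described above.

```lisp
;; Hypothetical illustration: group already-lexed tokens into sections.
(defun parse-tokens (tokens)
  "A string token starts a new section; a cons token joins the current one."
  (let ((sections (list (list ""))))   ; begin in the unlabeled section
    (dolist (token tokens)
      (etypecase token
        (string
         ;; Label: open a new section at the front of the list.
         (push (list token) sections))
        (cons
         ;; Pair: append it to the section opened most recently.
         (setf (first sections)
               (append (first sections) (list token))))))
    ;; Sections were pushed onto the front, so restore input order.
    (reverse sections)))
```

For example, (parse-tokens '(("a" . "1") "sec" ("b" . "2"))) returns (("" ("a" . "1")) ("sec" ("b" . "2"))).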
The top-level function will be LOAD-INI. Given the pathname for an INI file, it returns the parsed contents of that file. Here is the Lisp source code for LOAD-INI:
(defun load-ini (pathname)
  (with-open-file (strm pathname)
    (read-ini strm)))
LOAD-INI is super-simple because it passes the buck to READ-INI.
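(defun read-ini (strm)
  ;; Start with one unlabeled section. Build the list at run time
  ;; because the loop below modifies it.
  (let ((lst (list (list ""))))
    (do ((x (lex strm) (lex strm)))
        ((eq x strm))
      (typecase x
        (string
         ;; New section
         (push (list x) lst))
        (cons
         ;; Append this pair to the current
         ;; section.
         (setf (first lst)
               (append (first lst) (list x))))
        (otherwise
         ;; Error
         (cerror "Skip the offending line." "INI syntax error: ~A" x))))
    ;; Sections were pushed onto the front, so
    ;; reverse them to restore file order.
    (nreverse lst)))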
READ-INI needs one other function, LEX. LEX is the lexical analyzer. It returns a single string for labels, a dotted pair for attribute-value pairs, or the stream for end-of-input. It never returns comments or empty lines, & it has to recognize syntax errors.
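Using the stream itself as the end-of-input value works because READ-LINE accepts an arbitrary object to return at end-of-file, & the stream can never be mistaken for a string or a dotted pair. Here is a small sketch of that convention on its own; READ-ALL-LINES is a hypothetical name, & it reads from a string stream so it can be tried at the REPL without a file.

```lisp
;; Hypothetical illustration of the "stream as end-of-input marker" idiom.
(defun read-all-lines (text)
  "Return the lines of TEXT as a list of strings."
  (with-input-from-string (strm text)
    ;; READ-LINE's third argument is the value to return at end-of-file
    ;; when the second argument (EOF-ERROR-P) is NIL.
    (loop for line = (read-line strm nil strm)
          until (eq line strm)
          collect line)))
```

For example, (read-all-lines (format nil "[a]~%x=1~%")) returns ("[a]" "x=1").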
(labels
    ((lex-end-of-input (line strm)
       (and (eq line strm) strm))
     (lex-label (line)
       (and (>= (length line) 3)
            (equal (char line 0) #\[)
            (equal (char line (1- (length line))) #\])
            (subseq line 1 (1- (length line)))))
     (lex-attribute-value (line)
       ;; POSITION returns the index of the =, or NIL.
       (let ((pos (position #\= line)))
         (and pos                ; contains =
              (plusp pos)        ; the = is after first column
              (cons
               (subseq line 0 pos)
               (subseq line (1+ pos)))))))
  (defun lex (strm)
    ;; The NEXT-LINE function removes comments &
    ;; skips blank lines. So it returns a line
    ;; with something in it, or STRM on end-of-input.
    (let ((x (next-line strm)))
      (or (lex-end-of-input x strm)
          (lex-label x)
          (lex-attribute-value x)
          'ini-lex-error))))
Function LEX uses a bunch of helpers, but I've defined most of them in a LABELS which wraps LEX. I hope they are self-explanatory. Each uses AND instead of IF to check conditions &, if they are all true, finally to return the desired value. Otherwise, they return NIL.
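Splitting a line at the equal sign needs the index of the =, not the character itself. In Common Lisp, FIND returns the matching element while POSITION returns where it is, so (find #\= "attr0=val0") is the character #\= & (position #\= "attr0=val0") is 5. The sketch below shows the AND idiom with POSITION in isolation; SPLIT-AT-EQUALS is a hypothetical name used only for this illustration.

```lisp
;; Hypothetical illustration of the AND idiom with POSITION.
(defun split-at-equals (line)
  "Return (attribute . value) if LINE has an = after the first column, else NIL."
  (let ((pos (position #\= line)))
    (and pos                        ; there is an =
         (plusp pos)                ; & it is not in the first column
         (cons (subseq line 0 pos)          ; text before the =
               (subseq line (1+ pos))))))   ; text after the =
```

For example, (split-at-equals "attr0=val0") returns ("attr0" . "val0"), while both (split-at-equals "=oops") & (split-at-equals "no equal sign") return NIL.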
(defun next-line (strm)
  ;; Get a line from the stream...
  (do ((x (strip (read-line strm nil strm))
          (strip (read-line strm nil strm))))
      ;; ...until we find a non-empty string, or
      ;; a non-string.
      ((not (and (stringp x) (equal x "")))
       x)))
NEXT-LINE uses a function called STRIP to remove comments & excess spaces. If STRIP's argument is not a string, it doesn't puke; it just returns that argument unmodified.
(defun strip (x)
  (if (stringp x)
      (string-trim
       " "
       (if (find #\; x)
           ;; There is a comment character, so
           ;; remove it & everything after it.
           (subseq x 0 (position #\; x))
           ;; There's no comment, so don't
           ;; truncate the string.
           x))
      x))
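STRIP trims only spaces. If an INI file might contain tabs, or might come from Windows & be read where the carriage return is not treated as part of the end-of-line, a variant could trim those characters too. The following is a sketch under that assumption, not part of the parser above; STRIP-LINE & *INI-WHITESPACE* are hypothetical names.

```lisp
;; Hypothetical variant of STRIP that also trims tabs & carriage returns.
(defparameter *ini-whitespace* (list #\Space #\Tab #\Return)
  "Characters to trim from both ends of a line (an assumption).")

(defun strip-line (x)
  "Drop a ; comment, then trim whitespace. Non-strings pass through."
  (if (stringp x)
      (string-trim *ini-whitespace*
                   ;; POSITION returns NIL when there is no semicolon,
                   ;; & an end of NIL makes SUBSEQ keep the whole string.
                   (subseq x 0 (position #\; x)))
      x))
```

Like STRIP, it returns a non-string argument unmodified, so it could be called directly on the result of READ-LINE.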
I decided to hand-code the INI file parser, as you've seen, because it's a simple job, & a full-blown compiler-compiler would be overkill. Curiosity might force me to implement an INI file parser with a compiler-compiler some other time.
Gene Michael Stover 2008-04-20