You want to work with comma-separated value (CSV) records, such as those exported from popular spreadsheet and database programs. (N.B. This recipe works for any similarly delimited format.)
The naive implementation might be something like this:
(string-tokenize csv-string (char-set-complement (char-set #\,)))
The shortcomings show up pretty quickly. For starters, CSV lets a field be enclosed in double quotes so that it can contain commas. On top of that, a quoted field can itself contain quotes; in the dialect assumed here they are escaped with a backslash (#\\), although many CSV writers (and RFC 4180) double the quote character instead. Let's imagine a CSV format for books, where the format is author,title,ISBN,publisher:
> (define csv "David Halberstam, \"War in a Time of Peace: Bush, Clinton, and the Generals\", B0000C37EA, Scribner")
> (string-tokenize csv (char-set-complement (char-set #\,)))
("David Halberstam" " \"War in a Time of Peace: Bush" " Clinton" " and the Generals\"" " B0000C37EA" " Scribner")
Clearly, the easy solution won't work for the general case: the quoted title was split at each of its internal commas. Essentially, we need a state machine. As we traverse the string, we'll encounter the following states:
in-field (when the current position is inside a field),
in-quote (when we're inside a quoted string),
delim (when we encounter a delimiter), and
escape-char (when we encounter a backslash).
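The states listed above can be sketched as a small character-by-character loop. This is only a minimal illustration, not one of the proposed solutions linked below: csv-split-record is a hypothetical name, the delimiter is hard-coded to a comma, and the escape character is assumed to be a backslash inside quoted fields only. The delim state is folded into the transition that closes a field rather than kept as a separate state.

```scheme
;; Hypothetical helper: split one CSV record (a string) into a list of
;; field strings.  Assumes comma delimiters, double-quoted fields, and
;; backslash escapes inside quotes.  Quote characters are dropped from
;; the output; whitespace around fields is kept as-is.
(define (csv-split-record str)
  (define len (string-length str))
  ;; Close the current field (its characters are accumulated in reverse)
  ;; and push it onto the reversed list of completed fields.
  (define (emit chars fields)
    (cons (list->string (reverse chars)) fields))
  (let loop ((i 0) (state 'in-field) (chars '()) (fields '()))
    (if (= i len)
        ;; End of input: close the last field and restore field order.
        (reverse (emit chars fields))
        (let ((c (string-ref str i)))
          (case state
            ((in-field)
             (cond ((char=? c #\,)          ; delimiter: field is complete
                    (loop (+ i 1) 'in-field '() (emit chars fields)))
                   ((char=? c #\")          ; opening quote
                    (loop (+ i 1) 'in-quote chars fields))
                   (else
                    (loop (+ i 1) 'in-field (cons c chars) fields))))
            ((in-quote)
             (cond ((char=? c #\\)          ; escape: take next char literally
                    (loop (+ i 1) 'escape-char chars fields))
                   ((char=? c #\")          ; closing quote
                    (loop (+ i 1) 'in-field chars fields))
                   (else                    ; commas land here and are kept
                    (loop (+ i 1) 'in-quote (cons c chars) fields))))
            (else                           ; escape-char: keep c, resume quote
             (loop (+ i 1) 'in-quote (cons c chars) fields)))))))
```

With this sketch, (csv-split-record "a,\"b,c\",d") evaluates to ("a" "b,c" "d"). Leading and trailing spaces around a field are preserved, so a caller that wants them removed would need to trim each field afterwards. An unterminated quote is not reported as an error; the rest of the string simply becomes the last field.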
A couple of proposed solutions:
http://mail.gnu.org/archive/html/guile-user/2003-10/msg00001.html
--
GordonWeakliem - 20 Apr 2004
I just released a portable library of CSV utilities. Will write up a cookbook recipe or two (maybe a separate recipe for "Converting CSV to XML") soon. In the meantime, please give it a try with your CSV files and let me know how well it works.
http://www.neilvandyke.org/csv-scm/
--
NeilVanDyke - 31 May 2004