Subject: Re: CSV delimiters
Newsgroups: lugnet.admin.database, lugnet.off-topic.geek, lugnet.publish
Date: Sat, 23 Sep 2000 08:59:22 GMT
  
In lugnet.admin.database, Matthew Miller writes:
> There exists (in CPAN) a Text-CSV perl module. From its documentation:
> [...]

Aha!  Excellent.  Those docs sound well thought out.

Hmm, that reminds me...  Duh, this is prolly a common Perl thing.  I bet
Friedl[1] has some stuff on CSV?  Lessee...  <dig dig dig>  Ahh, here we go:
pp. 204-208, 227, 231, 290.  And (haha!) it even handles the special case of
adding a final empty field if the line ends in a trailing comma.  :)
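
For reference, here is what that trailing-comma rule looks like in practice. This is a small Python illustration using the standard csv module, not the Perl code from the book; the sample line is made up. (In Perl, a bare split /,/ silently drops trailing empty fields, which is one way this case gets lost.)

```python
import csv

# A hypothetical input line that ends in a trailing comma.
line = '"foo",98.6,'

# csv.reader takes an iterable of lines; next() pulls the first parsed row.
fields = next(csv.reader([line]))

# The trailing comma yields a third, empty field rather than being ignored.
print(fields)
print(len(fields))
```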

OK, so now I have a good definition, thanks to Matt, and some trustable code,
thanks to Friedl[1].  Now I wonder how (or if!) you can reliably and exactly
detect whether a given CSV input stream uses \" or "" escapement of " -- I saw
some Perl examples earlier which output \" instead of "" .

Are there any ambiguous input lines?  This evil case comes close, but not
quite...

   "foo",98.6,"bletch \"",3.14,"",

...as it's well-formed for \" but not for "", and this one...

   "foo",98.6,"bletch \"",3.14,"","

...is well-formed for "" but not for \" .

(Not that either of those is actually very likely to occur, but feh anyway.)
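
To pin down what "well-formed" means for the two lines above, here is a minimal sketch of a strict one-line parser in Python. It is not the Text-CSV module or Friedl's code. The names parse_csv_line and well_formed are hypothetical, and the strictness rules are assumptions: no quotes inside unquoted fields, a closing quote must be followed by a comma or end of line, and backslash is special only directly before a quote.

```python
def parse_csv_line(line, escapement):
    # escapement is "doubled" ("" stands for ") or "backslash" (\" stands for ").
    # Returns the list of fields, or raises ValueError if the line is malformed.
    fields, i, n = [], 0, len(line)
    while True:
        if i < n and line[i] == '"':
            # Quoted field: collect characters up to the closing quote.
            i += 1
            buf = []
            while True:
                if i >= n:
                    raise ValueError("unterminated quoted field")
                c = line[i]
                nxt = line[i + 1:i + 2]  # next character, or '' at end of line
                if escapement == "backslash" and c == "\\" and nxt == '"':
                    buf.append('"')
                    i += 2
                elif escapement == "doubled" and c == '"' and nxt == '"':
                    buf.append('"')
                    i += 2
                elif c == '"':
                    i += 1  # closing quote
                    break
                else:
                    buf.append(c)
                    i += 1
            field = "".join(buf)
        else:
            # Unquoted field: runs to the next comma and may not contain quotes.
            j = line.find(",", i)
            if j == -1:
                j = n
            field = line[i:j]
            if '"' in field:
                raise ValueError("quote inside unquoted field")
            i = j
        fields.append(field)
        if i >= n:
            return fields
        if line[i] != ",":
            raise ValueError("expected comma after closing quote")
        i += 1  # skip the comma; a trailing comma leaves one empty field to read


def well_formed(line, escapement):
    try:
        parse_csv_line(line, escapement)
        return True
    except ValueError:
        return False


# The two evil cases from above (raw strings keep the backslash literal).
LINE_A = r'"foo",98.6,"bletch \"",3.14,"",'
LINE_B = r'"foo",98.6,"bletch \"",3.14,"","'

for name, line in (("A", LINE_A), ("B", LINE_B)):
    for style in ("backslash", "doubled"):
        print(name, style, well_formed(line, style))
```

Under these assumed rules the first line parses only as \" (six fields, the last two empty) and the second only as "" (three fields, the third swallowing everything after "bletch").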

If there aren't any ambiguous cases (meaning well-formed for both \" and ""
but yielding different parsings), then a smart parser can simply try one
convention and, if that fails (the line isn't well-formed), try the other.
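
A rough sketch of that try-one-then-the-other idea, written in Python with the standard csv module rather than Perl. The function names try_parse and detect_escapement are made up for illustration, and "well-formed" here means whatever csv's strict mode accepts, which is an assumption and may not match Text-CSV's definition exactly (for instance, with an escape character set, csv treats backslash as escaping any following character, not just a quote). The sketch tries both conventions rather than stopping at the first success, so it can also flag a line that parses differently under each.

```python
import csv


def try_parse(line, escapement):
    # Return the list of fields, or None if csv's strict mode rejects the line.
    if escapement == "doubled":
        options = dict(doublequote=True)
    else:
        options = dict(doublequote=False, escapechar="\\")
    try:
        return next(csv.reader([line], strict=True, **options))
    except csv.Error:
        return None


def detect_escapement(line):
    doubled = try_parse(line, "doubled")
    backslash = try_parse(line, "backslash")
    if doubled is not None and backslash is not None:
        # Both parse: harmless if the fields agree, ambiguous if they differ.
        return "either" if doubled == backslash else "ambiguous"
    if doubled is not None:
        return "doubled"
    if backslash is not None:
        return "backslash"
    return "neither"


# The two evil cases from earlier in this post, plus a plain line.
samples = [
    r'"foo",98.6,"bletch \"",3.14,"",',
    r'"foo",98.6,"bletch \"",3.14,"","',
    '"foo",98.6,3.14',
]
for line in samples:
    print(detect_escapement(line), line)
```

With these settings the first evil line is reported as backslash, the second as doubled, and the plain line as either. For a whole input stream one would presumably run this per line and insist that every line that is not "either" agrees on one convention.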

--Todd

[1] O'Reilly: http://www.oreilly.com/catalog/regex/ [2]
    Amazon: http://www.amazon.com/exec/obidos/ASIN/1565922573

[2] I love that URL :-)


