Subject: Re: how does a line ends?
Newsgroups: lugnet.cad
Date: Sat, 7 Apr 2007 18:00:03 GMT

In lugnet.cad, Chris Phillips wrote:
|
I've been writing text parsing programs for over 20 years, and have found
that the approach I've suggested works very well at detecting line ends in a
consistent manner. Counting lines in a file by this method agrees with every
compiler and editor that I've used, where other methods do not. I have seen
files that are terminated with CR, with LF, with CR+LF, and with LF+CR.
|
You'll note that I didn't really say that there was anything wrong with your
parsing routine (other than some personal negative feelings about fgetc and
ungetc). After fixing the bugs, it will do exactly what you say it will do.
It's just that since I haven't ever run into CR or LF+CR line endings in any
LDraw file in close to seven years, I don't personally feel that it is
necessary.
|
It is very easy for a programmer (regardless of the OS they are using) to
create programs that do not use proper line breaks. If you doubt it,
answer this quiz without using any reference materials:
1. Which of the following is "correct"?
a. printf("Hello, world!\r");
b. printf("Hello, world!\n");
c. printf("Hello, world!\r\n");
d. printf("Hello, world!\n\r");
2. In each of the above cases, what byte sequence will appear in the output stream?
|
For number 1, the correct way in C is always to use (b) above, and let the
standard C libraries take care of the rest for you. If you use fprintf instead
of printf, you do need to be aware of the concept of binary mode when opening
the file, and deal with it accordingly. As long as you don't have the file in
binary mode, libc will do the right thing. If the file is opened in binary mode
without an understanding of the consequences, then it's the fault of the
developer.
The answer to number 2 is that it depends on libc, and that is how it should be.
The correct byte sequence for the environment will show up in the file, as
long as the file isn't in binary mode. (I realize that this doesn't apply to
the question at hand; more below.)
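
The text-mode versus binary-mode behavior described here can be checked directly. The following is only a minimal sketch and is not code from this thread: the helper name write_and_dump and the scratch file name used with it are hypothetical. It writes the quiz's answer (b) through fprintf with a caller-chosen fopen mode, then reads the file back in binary mode so the raw bytes can be inspected.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes "Hello, world!\n" to `path` using the given
 * fopen mode ("w" for text, "wb" for binary), then reopens the file in
 * binary mode and copies the raw bytes into `out`. Returns the number of
 * bytes read back, or -1 on failure. */
long write_and_dump(const char *path, const char *mode,
                    unsigned char *out, size_t out_size)
{
    FILE *f;
    size_t n;

    assert(path != NULL && mode != NULL && out != NULL);

    f = fopen(path, mode);
    if (f == NULL)
        return -1;
    fprintf(f, "Hello, world!\n");
    if (fclose(f) != 0)
        return -1;

    /* "rb" disables any newline translation on the way back in. */
    f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    n = fread(out, 1, out_size, f);
    fclose(f);
    return (long)n;
}
```

With mode "wb" the file should hold exactly 14 bytes ending in 0x0A on any platform. With mode "w" the result depends on the C library: Unix-like systems typically write the same 14 bytes, while Windows typically writes 15, ending in 0x0D 0x0A.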
|
In truth, there is no absolute answer for either question. Depending on the
library implementation and the OS, you may get different results.
|
This is true, which is where this whole issue comes from in the first place.
I'm not saying there's anything wrong with your line parser. I'm just saying
that I don't personally feel that CR and LF+CR line endings need to be supported
in LDraw files. This is my opinion, which you don't share. There's nothing
wrong with that.
|
You can pick apart my code until the cows come home, but the underlying
heuristic works. Either CR or LF indicates end of line, and if the other
character immediately follows, clump them together as a single line break.
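
The heuristic just described can be sketched in a few lines of C. This is an illustration only, not the code posted in this thread: the name read_line_any, its signature, and its choice to silently truncate over-long lines are all assumptions. It does use fgetc and ungetc, the two calls under discussion, and is meant for a file opened in binary mode ("rb") so the C library does not translate line endings first.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not taken from the thread: reads one line from `in`
 * into `buf` (at most size - 1 characters, always NUL-terminated), treating
 * CR, LF, CR+LF, or LF+CR as a single line break. The terminator is not
 * stored. Returns buf, or NULL if end of file is reached before any
 * character or line break is read. */
char *read_line_any(char *buf, size_t size, FILE *in)
{
    size_t len = 0;
    int c;

    assert(buf != NULL && size > 0 && in != NULL);

    while ((c = fgetc(in)) != EOF) {
        if (c == '\r' || c == '\n') {
            /* Peek at the next character and swallow it only if it is the
             * *other* terminator, so CR+LF and LF+CR count as one break
             * while LF+LF or CR+CR still count as two. */
            int next = fgetc(in);
            int other = (c == '\r') ? '\n' : '\r';
            if (next != other && next != EOF)
                ungetc(next, in);
            buf[len] = '\0';
            return buf;
        }
        if (len + 1 < size)
            buf[len++] = (char)c;   /* over-long lines are truncated */
    }

    buf[len] = '\0';
    /* A final line with no terminator is still returned. */
    return len > 0 ? buf : NULL;
}
```

Under these assumptions, the byte sequence "a\rb\nc\r\nd\n\re\n\nf" reads back as the lines a, b, c, d, e, an empty line, and f, after which the function returns NULL.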
|
I apologize. I wasn't trying to pick apart your algorithm. There's nothing
wrong with it; my main argument was that I felt that fgets was acceptable
instead for LDraw files.
|
How big is a typical CAD file? How often do you need to load one from disk?
Is the microscopic performance difference even noticeable? Maybe back in the
|
Actually, quite a bit of file I/O goes into reading an LDraw file, due to the
way the parts are formatted. This can be observed by loading a medium-sized file
after a fresh reboot (timing the load), and then repeating the process after the
files end up in the cache. The second load will be a little faster. On my
computer, LDView takes 4-5 seconds to do the file reading for the 8464.mpd that
comes with LDView the first time you load it, and 1-2 seconds the second time.
(LDView says "Loading..." in the status bar during the file reading, then
switches to "Parsing..." after that stage is complete.)
|
days of 5 meg hard drives this made a difference, but on today's hardware
this is not even an issue. It takes the user 100 times longer to browse to
the file than it takes the computer to read it into memory and parse the
contents. (I've surely wasted more time typing this sentence than I have
spent waiting for fgetc() calls to return over the past 20 years.)
|
I can agree that may be true, but only if by "spent waiting" you mean the extra
time spent waiting vs. fgets (which I think is what you mean).
|
Splitting hairs over a few CPU cycles in some infrequently-used routines does
little or nothing for the overall performance of the program. OTOH, if the
program has a nervous breakdown because of an entirely predictable situation,
the user can waste a lot of time trying to work around the problem.
|
I'll tell you what. I'll drop your algorithm into LDView and do some empirical
tests on the timing, and get back to you. I'll post the final version of my
fgets replacement along with the timing results.
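
One way such a comparison could be set up is sketched below. It is not LDView code and not the test that was actually run: the function names are hypothetical, and both counters only recognize LF so that the per-character and per-buffer reads do the same work. Note that clock() reports CPU time, so this measures call overhead rather than disk latency; the cold-cache effect mentioned earlier would need wall-clock timing instead.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Counts LF-terminated lines one character at a time. */
long count_lines_fgetc(FILE *in)
{
    long lines = 0;
    int c;

    while ((c = fgetc(in)) != EOF)
        if (c == '\n')
            lines++;
    return lines;
}

/* Counts LF-terminated lines one buffer at a time. A chunk that does not
 * end in '\n' is part of a longer line (or an unterminated last line)
 * and is not counted. */
long count_lines_fgets(FILE *in)
{
    char buf[1024];
    long lines = 0;

    while (fgets(buf, sizeof buf, in) != NULL) {
        size_t len = strlen(buf);
        if (len > 0 && buf[len - 1] == '\n')
            lines++;
    }
    return lines;
}

/* Hypothetical harness: rewinds `in`, runs `counter` over it, stores the
 * line count in *lines, and returns the CPU seconds consumed. */
double time_counter(FILE *in, long (*counter)(FILE *), long *lines)
{
    clock_t start, end;

    assert(in != NULL && counter != NULL && lines != NULL);

    rewind(in);
    start = clock();
    *lines = counter(in);
    end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_counter once with each counter on the same open file gives two line counts, which should match, and two timings to compare. A single run on a small file is unlikely to be meaningful, so repeating the measurement over a large model would be advisable.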
|
I guess the point I'm trying to make is that truly great software goes the
extra mile to handle special cases so that the user doesn't ever have to
worry about them. If some users are having problems with line termination
(and I assume they are since this is the second discussion thread on this
topic in less than 2 weeks) then the software should be fixed. Changing the
spec doesn't help a user to load a poorly-formed file, it only gives the
developer an excuse not to care.
|
While you're correct here, my main point wasn't that the program shouldn't be
made to take care of line endings, but that CR and LF+CR don't seem to ever show
up in LDraw files. And while it's true that good programs should handle unusual
input conditions, you didn't mention the flip side, which is that every extra
line of code is an opportunity for new bugs.
--Travis