Subject: Re: how does a line ends?
Newsgroups: lugnet.cad
Date: Sat, 7 Apr 2007 10:48:36 GMT
Viewed: 1648 times
|
In lugnet.cad, Travis Cobbs wrote:
|
In lugnet.cad, Chris Phillips wrote:
|
There was some discussion about this issue recently. Unfortunately, this is
not the kind of problem that can be solved by tightening up the spec. The
problem is that people use different text editors on different operating
systems to create LDRAW files, resulting in files that do not have
consistent line termination. Changing the specification will not fix all
the files that are already wrong, and it won't fix all the possible text
editors, either.
A better solution, in my opinion, is for all tools to be able to handle any
combination of LF or CR characters as a proper line feed. (And the spec
should spell this out.) This is not all that hard to do, and it makes life
easier for your users. The amount of programming effort to code defensively
is probably less than the effort just one of your users will go through if
they have to work around this problem.
|
I don't fully agree with this. In the nearly seven years since I released
LDView 0.1, I've never once run into an LDraw file that LDView couldn't
handle due to line terminations, and it requires the newline character to be
present at the end of each line. (The preceding carriage return is
optional.) I'm not saying such files don't exist, but if so, they're
exceedingly rare. Additionally, I suspect that they won't work with any of
the major LDraw editors (although they might work with Bricksmith).
|
I've been writing text parsing programs for over 20 years, and have found that
the approach I've suggested works very well at detecting line ends in a
consistent manner. Counting lines in a file by this method agrees with every
compiler and editor that I've used, where other methods do not. I have seen
files that are terminated with CR, with LF, with CR+LF, and with LF+CR.
It is very easy for a programmer (regardless of the OS they are using) to create
programs that do not use proper line breaks. If you doubt it, answer this
quiz without using any reference materials:
1. Which of the following is "correct"?
a. printf("Hello, world!\r");
b. printf("Hello, world!\n");
c. printf("Hello, world!\r\n");
d. printf("Hello, world!\n\r");
2. In each of the above cases, what byte sequence will appear in the output stream?
In truth, there is no absolute answer for either question. Depending on the
library implementation and the OS, you may get different results.
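
To make the quiz concrete, here is a minimal sketch (not from the original post) that writes a string to a temporary file and reads the raw bytes back. The helper name `raw_bytes_written` is hypothetical. It relies on the standard `tmpfile()`, which opens its stream in binary mode, so no newline translation takes place and the bytes come back exactly as written.

```c
#include <stdio.h>

/* Write s to a binary-mode temporary file and read the raw bytes back.
   Returns the number of bytes stored in out, or 0 on failure.
   (Hypothetical helper, for illustration only.) */
size_t raw_bytes_written(const char *s, unsigned char *out, size_t cap)
{
    FILE *fp = tmpfile();   /* the C library opens this in "wb+" mode */
    size_t n = 0;

    if (fp == NULL)
        return 0;

    if (fputs(s, fp) != EOF) {
        rewind(fp);                  /* switch from writing to reading */
        n = fread(out, 1, cap, fp);  /* raw bytes, no translation */
    }
    fclose(fp);
    return n;
}
```

The same string written through a text-mode stream can come out differently: on Windows C runtimes, for example, a text-mode stream typically expands each "\n" to CR+LF, so option (c) would produce CR, CR, LF. That is the variability the quiz is pointing at.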
|
Consequently, I feel that fgets is perfectly legitimate for reading lines, as
long as the program can deal with both DOS and Unix line endings. I don't
actually know for sure, but it was my understanding that fgetc is slow.
Additionally, ungetc is documented to not be guaranteed to work, although in
practice this appears to only be a problem if you call it twice in a row
without an intervening read.
|
Sure, fgetc() is likely to be slower than fgets() because the program is making
a library call for each byte instead of each line. And yet, I always read
text files a byte at a time, and I have never noticed a performance problem.
Yes, ungetc() can typically only push back a single character at a time, which
is exactly how I am using it here. The code I posted can easily be modified to
avoid using ungetc() if that is an issue for you. I've implemented this
algorithm many different ways over the years.
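
As an illustration of one ungetc()-free variant, here is a minimal sketch. It is not the code posted earlier in the thread; the names `LineReader` and `read_line` are hypothetical. Instead of pushing a character back, the reader remembers which character would complete a two-byte line break and discards it if it turns up first on the next call. It assumes the file was opened in binary mode so that the raw CR and LF bytes are visible.

```c
#include <stdio.h>

typedef struct {
    FILE *fp;
    int skip;   /* counterpart of the last terminator, or 0 if none pending */
} LineReader;

/* Reads one line into buf (NUL-terminated, terminator dropped).
   Returns 1 if a line was read, 0 at end of file.
   Lines longer than cap - 1 characters are truncated. */
int read_line(LineReader *r, char *buf, size_t cap)
{
    size_t n = 0;
    int c = fgetc(r->fp);

    /* Swallow the second half of a CR+LF or LF+CR pair from the last call. */
    if (r->skip != 0 && c == r->skip)
        c = fgetc(r->fp);
    r->skip = 0;

    if (c == EOF)
        return 0;

    while (c != EOF && c != '\r' && c != '\n') {
        if (n + 1 < cap)
            buf[n++] = (char)c;
        c = fgetc(r->fp);
    }

    /* Remember which character would complete a two-byte line break. */
    if (c == '\r')
        r->skip = '\n';
    else if (c == '\n')
        r->skip = '\r';

    if (cap > 0)
        buf[n] = '\0';
    return 1;
}
```

Initialize the reader with `skip` set to 0, for example `LineReader r = { fp, 0 };`, and call `read_line` until it returns 0.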
You can pick apart my code until the cows come home, but the underlying
heuristic works: either CR or LF indicates the end of a line, and if the other
character immediately follows, the two are clumped together as a single line break.
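
The heuristic can be sketched on an in-memory buffer as follows. This is an illustrative sketch, not the code from the earlier post, and `count_lines` is a hypothetical name. It counts a final line that has no terminator, and it treats CR, LF, CR+LF, and LF+CR each as one break.

```c
#include <stddef.h>

/* Count lines in a buffer, treating CR, LF, CR+LF, and LF+CR
   as a single line break each. A trailing line without a
   terminator is counted as well. */
size_t count_lines(const char *buf, size_t len)
{
    size_t lines = 0;
    int pending = 0;   /* nonzero if characters were seen since the last break */

    for (size_t i = 0; i < len; i++) {
        char c = buf[i];
        if (c == '\r' || c == '\n') {
            char other = (c == '\r') ? '\n' : '\r';
            if (i + 1 < len && buf[i + 1] == other)
                i++;            /* clump the pair into a single break */
            lines++;
            pending = 0;
        } else {
            pending = 1;
        }
    }
    return lines + (pending ? 1 : 0);
}
```

One caveat of this approach: in a file that mixes conventions, a bare LF immediately followed by a bare CR is indistinguishable from a single LF+CR break, so an empty line between them would not be counted.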
How big is a typical CAD file? How often do you need to load one from disk? Is
the microscopic performance difference even noticeable? Maybe back in the days
of 5 meg hard drives this made a difference, but on today's hardware this is not
even an issue. It takes the user 100 times longer to browse to the file than it
takes the computer to read it into memory and parse the contents. (I've surely
wasted more time typing this sentence than I have spent waiting for fgetc()
calls to return over the past 20 years.)
Splitting hairs over a few CPU cycles in some infrequently-used routines does
little or nothing for the overall performance of the program. OTOH, if the
program has a nervous breakdown because of an entirely predictable situation,
the user can waste a lot of time trying to work around the problem.
I guess the point I'm trying to make is that truly great software goes the extra
mile to handle special cases so that the user doesn't ever have to worry about
them. If some users are having problems with line termination (and I assume
they are since this is the second discussion thread on this topic in less than 2
weeks) then the software should be fixed. Changing the spec doesn't help a user
to load a poorly-formed file, it only gives the developer an excuse not to care.
|
|
Message has 2 Replies: | | Re: how does a line end?
|
| (...) 1. I always use alt. b. "printf("Hello, world!\n");" But as a self-made hobby programmer not knowing all the tweeks and geeks, I wouldn't be surprised if the interpretation of this may vary from one IDE to another, or if there is an .ini file (...) (18 years ago, 7-Apr-07, to lugnet.cad, FTX)
| | | Re: how does a line ends?
|
| (...) You'll note that I didn't really say that there was anything wrong with your parsing routine (other than some personally negative feelings about fgetc and ungetc). After fixing the bugs, it will do exactly what you say it will do. It's just (...) (18 years ago, 7-Apr-07, to lugnet.cad, FTX)
|
Message is in Reply To:
| | Re: how does a line ends?
|
| (...) I don't fully agree with this. In the nearly seven years since I released LDView 0.1, I've never once run into an LDraw file that LDView couldn't handle due to line terminations, and it requires the newline character to be present at the end (...) (18 years ago, 7-Apr-07, to lugnet.cad, FTX)
|
24 Messages in This Thread