Subject: 
Re: how does a line ends?
Newsgroups: 
lugnet.cad
Date: 
Sat, 7 Apr 2007 18:00:03 GMT
  
In lugnet.cad, Chris Phillips wrote:
   I’ve been writing text parsing programs for over 20 years, and have found that the approach I’ve suggested works very well at detecting line ends in a consistent manner. Counting lines in a file by this method agrees with every compiler and editor that I’ve used, where other methods do not. I have seen files that are terminated with CR, with LF, with CR+LF, and with LF+CR.

You’ll note that I didn’t really say that there was anything wrong with your parsing routine (other than some personal negative feelings about fgetc and ungetc). After fixing the bugs, it will do exactly what you say it will do. It’s just that since I haven’t ever run into CR or LF+CR line endings in any LDraw file in close to seven years, I don’t personally feel that supporting them is necessary.


   It is very easy for a programmer (regardless of the OS they are using) to create programs that do not use “proper” line breaks. If you doubt it, answer this quiz without using any reference materials:
1.  Which of the following is "correct"?
    a.  printf("Hello, world!\r");
    b.  printf("Hello, world!\n");
    c.  printf("Hello, world!\r\n");
    d.  printf("Hello, world!\n\r");

2.  In each of the above cases, what byte sequence will appear in the output stream?

For number 1, the correct way in C is always to use b above, and let the standard C libraries take care of the rest for you. If you use fprintf instead of printf, you do need to be aware of the concept of binary mode when opening the file, and deal accordingly. As long as you don’t have the file in binary mode, libc will do “the right thing”. If the file is open in binary mode without an understanding of the consequences, then it’s the fault of the developer.

The answer to number 2 is that it depends on libc, and that is how it should be. The “correct” byte sequence for the environment will show up in the file, as long as the file isn’t in binary mode. (I realize that this doesn’t apply to the question at hand; more below.)
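To make the text-mode versus binary-mode point concrete, here is a minimal sketch. The helper name bytes_written_for_newline and the file names in the usage note are made up for illustration; only standard C library calls are used. It writes option b from the quiz with a caller-chosen fopen mode, then reads the file back in binary mode and counts the bytes that actually landed on disk.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not from the thread): writes "Hello, world!\n" to
 * `path` using the given fopen mode ("w" for text, "wb" for binary), then
 * reopens the file in binary mode and returns the number of bytes stored.
 * Returns -1 on an I/O error. */
long bytes_written_for_newline(const char *path, const char *mode)
{
    FILE *out;
    FILE *in;
    long count = 0;

    assert(path != NULL && mode != NULL);

    out = fopen(path, mode);
    if (out == NULL)
        return -1;
    fprintf(out, "Hello, world!\n");   /* option b from the quiz */
    if (fclose(out) != 0)
        return -1;

    in = fopen(path, "rb");            /* binary mode: no translation on read */
    if (in == NULL)
        return -1;
    while (fgetc(in) != EOF)
        count++;
    fclose(in);
    return count;
}
```

With mode "wb" the result is 14 bytes everywhere, because '\n' is written as a single LF. With mode "w" the result depends on the C library: 14 on Unix-like systems, and 15 on Windows, where text mode expands '\n' to CR+LF. The helper leaves the file behind, so the caller should remove() it afterwards.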


   In truth, there is no absolute answer for either question. Depending on the library implementation and the OS, you may get different results.

This is true, which is where this whole issue comes from in the first place. I’m not saying there’s anything wrong with your line parser. I’m just saying that I don’t personally feel that CR and LF+CR line endings need to be supported in LDraw files. This is my opinion, which you don’t share. There’s nothing wrong with that.


   You can pick apart my code until the cows come home, but the underlying heuristic works. Either CR or LF indicate end of line, and if the other character immediately follows, clump them together as a single line break.

I apologize. I wasn’t trying to pick apart your algorithm. There’s nothing wrong with it; my main argument was that I felt that fgets was acceptable instead for LDraw files.
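For readers following along, the heuristic quoted above (either CR or LF ends a line, and the other character is absorbed if it immediately follows) could be sketched roughly as below. This is not Chris's code or the fgets replacement mentioned later in this post; the name read_line_any_eol and the buffer-full behaviour are assumptions made for illustration. It also assumes the file was opened in binary mode, so the C library has not already translated the line endings.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical sketch of the "CR or LF, absorb the other" heuristic.
 * Reads one line into buf (at most bufSize - 1 characters) and strips the
 * line terminator. Returns buf, or NULL at end-of-file if nothing was read.
 * Assumption: if the buffer fills up, the rest of the line is returned by
 * the next call. */
char *read_line_any_eol(char *buf, int bufSize, FILE *file)
{
    int i = 0;
    int c;

    assert(buf != NULL && bufSize > 1 && file != NULL);

    while ((c = fgetc(file)) != EOF) {
        if (c == '\r' || c == '\n') {
            /* Absorb the partner character only if it follows immediately. */
            int other = (c == '\r') ? '\n' : '\r';
            int next = fgetc(file);
            if (next != other && next != EOF)
                ungetc(next, file);
            buf[i] = '\0';
            return buf;
        }
        if (i == bufSize - 1) {
            /* Buffer full: push the character back and return a partial line. */
            ungetc(c, file);
            buf[i] = '\0';
            return buf;
        }
        buf[i++] = (char)c;
    }

    if (i == 0)
        return NULL;        /* end of file, nothing read */
    buf[i] = '\0';          /* last line without a terminator */
    return buf;
}
```

Given the bytes "a\rb\nc\r\nd\n\re", successive calls return "a", "b", "c", "d" and "e", then NULL. Two identical characters in a row (LF LF, or CR CR) count as two line breaks, so blank lines survive in files that use a single-character terminator.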

  
How big is a typical CAD file? How often do you need to load one from disk? Is the microscopic performance difference even noticeable? Maybe back in the

Actually, quite a bit of file I/O goes into reading an LDraw file, due to the way the parts are formatted. This can be observed by loading a medium size file after a fresh reboot (timing the load), and then repeating the process after the files end up in the cache. The second load will be a little faster. On my computer, LDView takes 4-5 seconds to do the file reading for the 8464.mpd that comes with LDView the first time you load it, and 1-2 seconds the second time. (LDView says Loading... in the status bar during the file reading, then switches to Parsing... after that stage is complete.)


   days of 5 meg hard drives this made a difference, but on today’s hardware this is not even an issue. It takes the user 100 times longer to browse to the file than it takes the computer to read it into memory and parse the contents. (I’ve surely wasted more time typing this sentence than I have spent waiting for fgetc() calls to return over the past 20 years.)

I can agree that may be true, but only if by “spent waiting” you mean the extra time spent waiting vs. fgets (which I think is what you mean).


   Splitting hairs over a few CPU cycles in some infrequently-used routines does little or nothing for the overall performance of the program. OTOH, if the program has a nervous breakdown because of an entirely predictable situation, the user can waste a lot of time trying to work around the problem.

I’ll tell you what. I’ll drop your algorithm into LDView and do some empirical tests on the timing, and get back to you. I’ll post the final version of my fgets replacement along with the timing results.


   I guess the point I’m trying to make is that truly great software goes the extra mile to handle special cases so that the user doesn’t ever have to worry about them. If some users are having problems with line termination (and I assume they are since this is the second discussion thread on this topic in less than 2 weeks) then the software should be fixed. Changing the spec doesn’t help a user to load a poorly-formed file, it only gives the developer an excuse not to care.

While you’re correct here, my main point wasn’t that the program shouldn’t be made to take care of line endings, but that CR and LF+CR don’t seem to ever show up in LDraw files. And while it’s true that good programs should handle unusual input conditions, you didn’t mention the flip side, which is that every extra line of code is an opportunity for new bugs.

--Travis




©2005 LUGNET. All rights reserved. - hosted by steinbruch.info GbR