Subject: Re: how does a line ends?
Newsgroups: lugnet.cad
Date: Sat, 7 Apr 2007 10:48:36 GMT

In lugnet.cad, Travis Cobbs wrote:
> In lugnet.cad, Chris Phillips wrote:
> > There was some discussion about this issue recently. Unfortunately, this is not the kind of problem that can be solved by tightening up the spec. The problem is that people use different text editors on different operating systems to create LDRAW files, resulting in files that do not have consistent line termination. Changing the specification will not fix all the files that are already “wrong”, and it won’t fix all the possible text editors, either.
> >
> > A better solution, in my opinion, is for all tools to be able to handle any combination of LF or CR characters as a proper line feed. (And the spec should spell this out.) This is not all that hard to do, and it makes life easier for your users. The amount of programming effort to code defensively is probably less than the effort just one of your users will go through if they have to work around this problem.
>
> I don’t fully agree with this. In the nearly seven years since I released LDView 0.1, I’ve never once run into an LDraw file that LDView couldn’t handle due to line terminations, and it requires the newline character to be present at the end of each line. (The preceding carriage return is optional.) I’m not saying such files don’t exist, but if so, they’re exceedingly rare. Additionally, I suspect that they won’t work with any of the major LDraw editors (although they might work with Bricksmith).
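For comparison, the rule Travis describes (each line must end in LF, and a preceding CR is optional) can be sketched with fgets() roughly as follows. This is a minimal illustration, not LDView's actual code; the helper name read_line_fgets() is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Read one line with fgets() and strip its terminator, accepting LF or
 * CR+LF. A bare CR is NOT treated as a line end, so CR-only files are
 * not split at their CRs. Returns buf, or NULL at end of file.
 * Lines longer than size-1 bytes come back in pieces (fgets behaviour). */
char *read_line_fgets(FILE *fp, char *buf, int size)
{
    if (fgets(buf, size, fp) == NULL)
        return NULL;
    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n')   /* required LF */
        buf[--len] = '\0';
    if (len > 0 && buf[len - 1] == '\r')   /* optional CR before it */
        buf[--len] = '\0';
    return buf;
}
```

With this rule a line such as "c\rd\n" comes back as a single line containing an embedded CR, which is exactly the case the CR-or-LF heuristic argued for in this thread is meant to cover.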

I’ve been writing text parsing programs for over 20 years, and have found that the approach I’ve suggested works very well at detecting line ends in a consistent manner. Counting lines in a file by this method agrees with every compiler and editor that I’ve used, whereas other methods do not. I have seen files that are terminated with CR, with LF, with CR+LF, and with LF+CR.
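A minimal sketch of counting lines with this heuristic, reading the stream a byte at a time. The helper name count_lines() is hypothetical, and counting a final line that has no terminator is an assumption made for this sketch.

```c
#include <stdio.h>

/* Count lines with the CR/LF heuristic: CR, LF, CR+LF and LF+CR each
 * count as a single line break. A final unterminated line with content
 * is also counted (an assumption of this sketch). */
long count_lines(FILE *fp)
{
    long lines = 0;
    int c;
    int swallow = EOF;   /* terminator to absorb if it comes next; EOF = none */
    int partial = 0;     /* nonzero if the current line has any content */

    while ((c = fgetc(fp)) != EOF) {
        if (c == swallow) {          /* second half of CR+LF or LF+CR */
            swallow = EOF;
            continue;
        }
        if (c == '\r' || c == '\n') {
            lines++;
            partial = 0;
            swallow = (c == '\r') ? '\n' : '\r';
        } else {
            partial = 1;
            swallow = EOF;
        }
    }
    return lines + partial;
}
```

Note that two identical terminators in a row (LF LF, or CR CR) still count as two breaks, so blank lines are preserved.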

It is very easy for a programmer (regardless of the OS they are using) to create programs that do not use “proper” line breaks. If you doubt it, answer this quiz without using any reference materials:
1.  Which of the following is "correct"?
    a.  printf("Hello, world!\r");
    b.  printf("Hello, world!\n");
    c.  printf("Hello, world!\r\n");
    d.  printf("Hello, world!\n\r");

2.  In each of the above cases, what byte sequence will appear in the output stream?
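In truth, there is no absolute answer to either question. Depending on the library implementation and the OS, you may get different results. For example, a \n written to a text-mode stream typically comes out as CR+LF on DOS/Windows but as a single LF on Unix.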
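To see which byte sequence a particular C library and OS actually produce for question 2, one can write a string through a text-mode stream and then read the file back in binary mode. The helper dump_text_mode_bytes() below is hypothetical, and the scratch file name is whatever the caller chooses.

```c
#include <stdio.h>

/* Write s through a text-mode stream, read the file back in binary mode,
 * and print each byte in hex. Returns the number of bytes found on disk,
 * or -1 on error. 'path' is a scratch file chosen by the caller. */
long dump_text_mode_bytes(const char *path, const char *s)
{
    FILE *out = fopen(path, "w");   /* text mode: the library may translate '\n' */
    if (out == NULL)
        return -1;
    fputs(s, out);
    if (fclose(out) != 0)
        return -1;

    FILE *in = fopen(path, "rb");   /* binary mode: bytes are read untranslated */
    if (in == NULL)
        return -1;
    long n = 0;
    int c;
    while ((c = fgetc(in)) != EOF) {
        printf("%02X ", (unsigned)c);
        n++;
    }
    putchar('\n');
    fclose(in);
    return n;
}
```

For example, dump_text_mode_bytes("quiz.txt", "Hello, world!\n") typically prints a line ending in 0A on Unix-like systems and ending in 0D 0A with common DOS/Windows C libraries.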

> Consequently, I feel that fgets is perfectly legitimate for reading lines, as long as the program can deal with both DOS and Unix line endings. I don’t actually know for sure, but it was my understanding that fgetc is slow. Additionally, ungetc is documented to not be guaranteed to work, although in practice this appears to only be a problem if you call it twice in a row without an intervening read.

Sure, fgetc() is likely to be slower than fgets() because the program is making a library call for each byte instead of each line. And yet, I always read text files a byte at a time, and I have never noticed a performance problem. Yes, ungetc() is only guaranteed to push back a single character at a time, which is exactly how I am using it here. The code I posted can easily be modified to avoid using ungetc() if that is an issue for you. I’ve implemented this algorithm many different ways over the years.
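One way to drop ungetc(), sketched here as an assumption rather than the code originally posted, is to keep the single lookahead character in a small reader structure. LineReader, line_reader_init() and line_reader_read() are hypothetical names.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical reader that stores its own one-character lookahead,
 * so the CR/LF heuristic needs no ungetc() call. */
typedef struct {
    FILE *fp;
    int has_pending;   /* nonzero if 'pending' holds a buffered character */
    int pending;
} LineReader;

void line_reader_init(LineReader *r, FILE *fp)
{
    r->fp = fp;
    r->has_pending = 0;
    r->pending = 0;
}

/* Return the buffered lookahead character if there is one, else read. */
static int reader_getc(LineReader *r)
{
    if (r->has_pending) {
        r->has_pending = 0;
        return r->pending;
    }
    return fgetc(r->fp);
}

/* CR, LF, CR+LF and LF+CR each end one line. Returns 1 if a line was
 * read into buf (NUL-terminated), 0 at end of file. */
int line_reader_read(LineReader *r, char *buf, size_t size)
{
    size_t len = 0;
    int c, got_any = 0;

    while ((c = reader_getc(r)) != EOF) {
        got_any = 1;
        if (c == '\r' || c == '\n') {
            int other = (c == '\r') ? '\n' : '\r';
            int next = reader_getc(r);
            if (next != other && next != EOF) {
                r->pending = next;      /* not part of this break: keep it */
                r->has_pending = 1;
            }
            break;
        }
        if (len + 1 < size)
            buf[len++] = (char)c;       /* bytes past the buffer are dropped */
    }
    if (size > 0)
        buf[len] = '\0';
    return got_any;
}
```

The trade-off is that all reads must go through the same LineReader, since the buffered character lives in the structure rather than in the FILE.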

You can pick apart my code until the cows come home, but the underlying heuristic works. Either CR or LF indicates the end of a line, and if the other character immediately follows, clump them together as a single line break.
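A minimal sketch of that heuristic using fgetc() and a single ungetc(). It is not the code originally posted; the name read_line() and its truncation policy (characters beyond the buffer are dropped) are illustrative assumptions.

```c
#include <stdio.h>
#include <stddef.h>

/* Read one line into buf (at most size-1 chars, NUL-terminated).
 * A line ends at CR or LF; if the *other* terminator follows immediately
 * it is consumed too, so CR, LF, CR+LF and LF+CR each end one line.
 * Returns 1 if a line was read, 0 at end of file. */
int read_line(FILE *fp, char *buf, size_t size)
{
    size_t len = 0;
    int c, got_any = 0;

    while ((c = fgetc(fp)) != EOF) {
        got_any = 1;
        if (c == '\r' || c == '\n') {
            int other = (c == '\r') ? '\n' : '\r';
            int next = fgetc(fp);
            if (next != other && next != EOF)
                ungetc(next, fp);       /* not part of this break: push it back */
            break;
        }
        if (len + 1 < size)
            buf[len++] = (char)c;       /* bytes past the buffer are dropped */
    }
    if (size > 0)
        buf[len] = '\0';
    return got_any;                     /* an unterminated last line still counts */
}
```

Because at most one character is pushed back per call, and the next call reads it before anything else, this stays within the single character of pushback that ungetc() is guaranteed to provide.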

How big is a typical CAD file? How often do you need to load one from disk? Is the microscopic performance difference even noticeable? Maybe back in the days of 5 meg hard drives this made a difference, but on today’s hardware this is not even an issue. It takes the user 100 times longer to browse to the file than it takes the computer to read it into memory and parse the contents. (I’ve surely wasted more time typing this sentence than I have spent waiting for fgetc() calls to return over the past 20 years.)

Splitting hairs over a few CPU cycles in some infrequently-used routines does little or nothing for the overall performance of the program. OTOH, if the program has a nervous breakdown because of an entirely predictable situation, the user can waste a lot of time trying to work around the problem.

I guess the point I’m trying to make is that truly great software goes the extra mile to handle special cases so that the user doesn’t ever have to worry about them. If some users are having problems with line termination (and I assume they are, since this is the second discussion thread on this topic in less than two weeks), then the software should be fixed. Changing the spec doesn’t help a user load a poorly formed file; it only gives the developer an excuse not to care.

©2005 LUGNET. All rights reserved. - hosted by steinbruch.info GbR