Subject:	Re: how does a line ends?
Newsgroups:	lugnet.cad
Date:	Sun, 8 Apr 2007 12:04:22 GMT

In lugnet.cad, Anders Isaksson wrote:

Travis Cobbs wrote:

It seems to work, but I’m not 100% confident in its lack of bugs. Timing with 8464.mpd results in about 550ms for loading using fgets and 750ms using myFgets from above. On the one hand, that’s roughly a 36% slowdown based purely on that one change. On the other hand, 750ms isn’t very long, and while 8464.mpd isn’t exactly a huge file, it’s big enough to prove your point that the performance is fine.

As long as you only have one character to ‘unget’, you could probably speed it up by introducing a static char that holds the pushed-back character (or null), instead of going through an ungetc()/fgetc() round trip. OTOH, the if-statement that checks whether there is something in that variable will also take time (see below for better ways).

I thought about using a static instead of ungetc() in the first version of readLine() that I posted. This means that you can only read from one open file at a time, but that is usually an OK restriction as long as the programmer is aware of it. Otherwise it’s a lurking time bomb, though.
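One way to keep the cheap saved character without the one-file-at-a-time restriction is to store it next to the FILE pointer instead of in a static. This is only a minimal sketch: LineReader, lrInit, lrGetc and lrUngetc are hypothetical names, not part of the readLine() or myFgets code posted earlier in this thread.

```c
#include <stdio.h>

/* Hypothetical wrapper: one pushback slot per open file, so two files
   can be read in an interleaved fashion without sharing any state. */
typedef struct
{
   FILE *file;
   int pending;   /* saved character, or EOF when the slot is empty */
} LineReader;

static void lrInit(LineReader *r, FILE *file)
{
   r->file = file;
   r->pending = EOF;
}

/* Return the saved character if there is one, else read from the file. */
static int lrGetc(LineReader *r)
{
   if (r->pending != EOF)
   {
      int c = r->pending;

      r->pending = EOF;
      return c;
   }
   return fgetc(r->file);
}

/* Save one character. There is only a single slot, so a second call
   before the next lrGetc() overwrites the first. */
static void lrUngetc(LineReader *r, int c)
{
   r->pending = c;
}
```

The check in lrGetc() is the same extra if-statement Anders mentions, so this trades the library call for a branch; whether that is a net win would have to be measured.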

You might also win a bit of speed by using a switch instead of the if-statements. Switch statements are usually well optimised by the compiler. Having one case for '\r' first, and another case for '\n' first will also eliminate some more of the if-statements (mis-predicted branching is expensive on today’s CPUs).

The ordering of the if statements was one area where I thought this code might be optimized, and probably where the most speed could be reclaimed from this implementation. (But see below...)

But you would probably get the best performance by opening the file in binary mode, reading full disk blocks into a buffer, and implementing myFgets on top of that. No library calls, ungetc() is only a Ptr--; and so on.

This is a great idea, but a much more complex routine to implement and debug/verify. Frankly, I would expect a decent standard library implementation to do this anyway.
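For reference, here is roughly what that buffered approach could look like. It is a sketch under assumptions, not tested against real LDraw files: BlockReader, brInit, brGetc, brGets and the 4096-byte BLOCK_SIZE are all made up for illustration, and the file is assumed to have been opened with "rb". It keeps empty lines and treats CR, LF and CRLF as a single line ending.

```c
#include <stdio.h>

#define BLOCK_SIZE 4096   /* assumed block size; tune for the platform */

/* Hypothetical buffered reader over a FILE opened in binary mode. */
typedef struct
{
   FILE *file;
   unsigned char block[BLOCK_SIZE];
   unsigned char *ptr;   /* next unread byte */
   unsigned char *end;   /* one past the last valid byte */
} BlockReader;

static void brInit(BlockReader *r, FILE *file)
{
   r->file = file;
   r->ptr = r->end = r->block;
}

/* Next byte, refilling the block with one fread() when it runs dry. */
static int brGetc(BlockReader *r)
{
   if (r->ptr == r->end)
   {
      size_t n = fread(r->block, 1, BLOCK_SIZE, r->file);

      if (n == 0)
         return EOF;
      r->ptr = r->block;
      r->end = r->block + n;
   }
   return *r->ptr++;
}

/* fgets()-like: every line ending is stored as a single '\n'.
   Returns NULL at end of file or if bufSize is less than 2. */
static char *brGets(char *buf, int bufSize, BlockReader *r)
{
   int i = 0;

   while (i < bufSize - 1)
   {
      int c = brGetc(r);

      if (c == EOF)
         break;
      if (c == '\r')
      {
         int next = brGetc(r);

         /* The "unget": step back over the byte just read. This is
            safe because a successful brGetc() always leaves ptr
            past the start of the block. */
         if (next != '\n' && next != EOF)
            r->ptr--;
         c = '\n';
      }
      buf[i++] = (char)c;
      if (c == '\n')
         break;
   }
   if (i == 0)
      return NULL;
   buf[i] = 0;
   return buf;
}
```

The only library call left on the hot path is the occasional fread(); the price is the extra state to carry around and the block-boundary reasoning that has to be verified.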

But really, the first rule of optimization is that before you start optimizing how you’re doing it, optimize what you’re doing to eliminate unnecessary steps. If we relax the semantics of fgets() slightly so that we can ignore empty lines, there is a much simpler way to do this that doesn’t use ungetc() at all. Recognize either CR or LF as a newline, and simply discard any newlines that occur at the start of a line:

#include <stdio.h>

/* Like fgets(), but treats CR and LF alike and skips empty lines. */
char *myFgets(char *buf, int bufSize, FILE *file)
{
   int i = 0;
   int c;

   while (i < bufSize - 1)
   {
      c = fgetc(file);

      if (c == EOF)
      {
         buf[i] = 0;
         if (i > 0)
            return buf;
         else
            return NULL;
      }
      else if (c == '\r' || c == '\n')
      {
         if (i > 0)
         {
            buf[i] = '\n';
            buf[i + 1] = 0;
            return buf;
         }
         // else discard extra newlines at start/end of line
      }
      else
         buf[i++] = (char)c;
   }

   buf[bufSize - 1] = 0;
   return buf;
}

This implementation can avoid any support for ungetc() whatsoever. If we were then to optimize this further using the fread() buffering optimization, it should be even faster.
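Taking the buffering idea to its extreme, the same relaxed semantics can be had by reading the whole file with fread() and splitting on runs of CR/LF in memory. Note that this swaps the block-at-a-time scheme for a single whole-file read, which is my simplification and only reasonable for files that fit comfortably in memory. readAll and nextLine are hypothetical names, the file is assumed to be opened in binary mode and to contain no NUL bytes, and unlike myFgets the returned lines carry no trailing '\n'.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read everything remaining in `file` into one NUL-terminated heap
   buffer. Returns NULL on allocation failure; the caller frees it. */
static char *readAll(FILE *file)
{
   size_t cap = 4096, len = 0;
   char *data = malloc(cap);

   if (data == NULL)
      return NULL;
   for (;;)
   {
      /* Always leave one byte free for the terminating NUL. */
      size_t n = fread(data + len, 1, cap - len - 1, file);

      len += n;
      if (n == 0)
         break;
      if (len == cap - 1)
      {
         char *grown = realloc(data, cap * 2);

         if (grown == NULL)
         {
            free(data);
            return NULL;
         }
         data = grown;
         cap *= 2;
      }
   }
   data[len] = 0;
   return data;
}

/* Return the next non-empty line at *cursor, NUL-terminated in place,
   or NULL when only line endings (or nothing) remain. */
static char *nextLine(char **cursor)
{
   char *line = *cursor + strspn(*cursor, "\r\n");   /* skip CR/LF run */
   size_t len = strcspn(line, "\r\n");

   if (len == 0)
      return NULL;
   if (line[len] != 0)
   {
      line[len] = 0;               /* overwrite the first CR or LF */
      *cursor = line + len + 1;
   }
   else
      *cursor = line + len;        /* last line had no line ending */
   return line;
}
```

A caller would set a char pointer to the start of the buffer returned by readAll() and call nextLine() on it until it returns NULL. There is no per-character library call and nothing to unget, but the whole file stays in memory while it is parsed.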
