To LUGNET HomepageTo LUGNET News HomepageTo LUGNET Guide Homepage
 Help on Searching
 
Post new message to lugnet.admin.generalOpen lugnet.admin.general in your NNTP NewsreaderTo LUGNET News Traffic PageSign In (Members)
 Administrative / General / 8483
8482  |  8484
Subject: 
Re: posting weirdness
Newsgroups: 
lugnet.admin.general
Date: 
Sun, 17 Dec 2000 18:49:45 GMT
Viewed: 
523 times
  
In lugnet.admin.general, Dan Boger writes:
yah, that's what it looks like to me.  Since it would match any newline,
and start adding up the whitespace until it found the beginning of a new
paragraph.  it'll backtrack to the next blank line, try again, and fail
once more.... if you have 80 lines, it'd have to do that 80 times...

It's actually infinitely worse than that.  The NFA it produced actually had
exponential performance, so it would try to do it 2^80 (2 raised to the 80th
power) times, which would take about 302,200,000,000,000,000 seconds or about
ten thousand million years.  (:-u

Freidl gives an example[1] in the hip owls book of an NFA/input combo that
he calculated would take "three hundred thousand million billion trillion
years" to run on his system.


Wouldn't this work in it's stead?

  $body =~ s/\n\s*$//;

That's a beautiful optimization!  Yes, it looks like it executes in O(n) time
on perl 5.004_04 -- and that stands to reason since it should compile down to
a very simple DFA.


this way, you don't have to deal with the whole *+ thing, which is always
bad...

I rewrote the pattern match like this instead...

   $body =~ s/(?:\n[ \t]*)+$//;

heh, this works of course too :)  I'm not sure, but arn't there some other
"whitespace" chars that fall within the \s class?

Ya[2] -- but I think I only really want to trim off spaces and tabs.  If there
are any form-feeds, for example, I'd like to leave them, because if someone
knows enough to put one in, then they probably really want it for a reason.
(Although I can't imagine a reason why someone would want to put an FF at the
bottom without a non-whitespace character following it.)  ASCII CR's can't
occur here because CRLF's have already been collapsed to LF's.  Besides 10
(newline), that leaves 32 (space) and 9 (tab).

All things considered, I think I like your s/\n\s*$// suggestion above best --
it's the most fault-tolerant and maintenance-friendly of all.

--Todd

[1] http://www.amazon.com/exec/obidos/ASIN/1565922573/  p. 144.

[2] $ perl -e 'print "@{[grep{chr($_)=~m/\s/}(0..255)]}\n"'
9 10 12 13 32



Message is in Reply To:
  Re: posting weirdness
 
(...) yah, that's what it looks like to me. Since it would match any newline, and start adding up the whitespace until it found the beginning of a new paragraph. it'll backtrack to the next blank line, try again, and fail once more.... if you have (...) (24 years ago, 17-Dec-00, to lugnet.admin.general)

8 Messages in This Thread:


Entire Thread on One Page:
Nested:  All | Brief | Compact | Dots
Linear:  All | Brief | Compact
    

Custom Search

©2005 LUGNET. All rights reserved. - hosted by steinbruch.info GbR