Subject: Re: Search for "BLOCKQUOTE" in lugnet.faq
Newsgroups: lugnet.admin.general
Date: Sat, 21 Aug 1999 10:14:14 GMT
  
In lugnet.admin.general, Todd Lehman writes:
In lugnet.admin.general, "Robert Munafo" <munafo@gcctech.com> writes:
[...]
I'm not too concerned about being able to search for non-alphanumeric
characters in LUGNET messages, except maybe the hyphen issue someone else
has brought up. However, alphanumeric words like "BLOCKQUOTE" should be
findable even if they have non-alphanumeric characters adjacent to them,
as long as it's something the author originally typed in their message.

Agreed!  Again, I'm sorry for the "bug" and I don't know what I was
thinking when I told the indexer to ignore HTML tags.  I'll fix this when
I re-work the indexer and add timestamps to the index.

I'm tinkering with the indexer a bit and I think I remember now what I
was thinking when I told it to ignore HTML tags before.  I think I wasn't
trying to filter out HTML...  I think I was trying to filter out message
IDs!  (Durnit for not having written that in a comment.)

Message IDs are a real bummer of garbage as far as indexing is concerned.
There are quite a few messages with lines like:

   Frank Filz wrote in message <37A9B055.60E3@mindspring.com>...
   Larry Pieniazek wrote in message <37BAD526.FADBED5A@voyager.net>...

and they just make clutter in the index because they're random strings.
I must've figured that filtering out /<.*?>/ would have the benevolent
side-effect of filtering out HTML.  Of course, a better regex for message IDs
is /<.*?\@.*?>/.  Unfortunately, that still filters out e-mail addresses
written in angle brackets.  Probably better to match on

   m/^(.*)?wrote in message <.*?\@.*?>.*$/

and keep $1, throwing away the rest of the line.  Or something like that.
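
For anyone who wants to poke at the difference between the three patterns,
here is a rough sketch in Python's re module (not the Perl the indexer
presumably uses, and not the indexer's actual code).  The names ANY_ANGLE,
ANGLE_WITH_AT, ATTRIBUTION and strip_attribution_id are made up for
illustration.

```python
import re

# The three candidate filters from the post, translated from Perl to Python.
# All names below are hypothetical; they do not come from the real indexer.
ANY_ANGLE = re.compile(r"<.*?>")          # original filter: also removes HTML tags
ANGLE_WITH_AT = re.compile(r"<.*?@.*?>")  # narrower: requires an "@" in the brackets
ATTRIBUTION = re.compile(r"^(.*)?wrote in message <.*?@.*?>.*$")


def strip_attribution_id(line: str) -> str:
    """Keep the text before 'wrote in message <id>' and drop the rest.

    Lines that do not look like an attribution are returned unchanged,
    so HTML tags and bracketed e-mail addresses elsewhere survive.
    """
    m = ATTRIBUTION.match(line)
    if m is None:
        return line
    return m.group(1) or ""
```

With this, a line such as "Frank Filz wrote in message <...@...>..." is
reduced to the poster's name, while a line that merely contains <BLOCKQUOTE>
or an address in angle brackets passes through untouched.  Only the first
pattern removes a bare <BLOCKQUOTE>; the second leaves it alone but still
removes any bracketed e-mail address.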

--Todd

