| | Text::Query Todd Lehman
|
| | (...) No, it's using a homebrew. I probably should look into Text::Query though, if it's loaded with callbacks and/or easy member-function overloading. What I threw together isn't very smart about word roots, and it isn't very fast either. But it (...) (25 years ago, 29-Jun-99, to lugnet.general, lugnet.off-topic.geek)
|
| | |
| | | | Re: Text::Query Todd Lehman
|
| | | | (...) OK, I just took a look at Text::Query. From what I can tell from reading the docs and the source, it looks as though it's a brute force text scanner rather than an inverted-index generator. So that means it's about 3 to 4 orders of magnitude (...) (25 years ago, 22-Aug-99, to lugnet.off-topic.geek)
|
| | | | |
| | | | | | Re: Text::Query Jeremy Sproat
|
| | | | (...) Yah. I actually was getting excited over the similarity in syntax; I hadn't even thought of the need for indexing such a huge database as LUGNET. BTW, what kind of scanner are you using for the index builder? Does it just break apart words (...) (25 years ago, 24-Aug-99, to lugnet.off-topic.geek)
|
| | | | |
| | | | | | Re: Text::Query Todd Lehman
|
| | | | (...) Not very magical, no. It breaks text apart by anything non-alphanumeric, where the "alpha" part includes ISO-8859-1 international letters like ã, ñ, ß, and ø, etc. It converts everything to lowercase for indexing and collapses apostrophes. (...) (25 years ago, 25-Aug-99, to lugnet.off-topic.geek)
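The scanning rules described in this post can be sketched roughly as follows. This is only an illustration in Python, not the actual LUGNET indexer: the name `tokenize` is hypothetical, and "collapses apostrophes" is assumed here to mean that apostrophes are dropped so that "don't" is indexed as "dont".

```python
import re

# Runs of ASCII alphanumerics plus the ISO-8859-1 letter ranges
# (À-Ö, Ø-ö, ø-ÿ). The multiplication and division signs (U+00D7, U+00F7)
# sit inside that block but are not letters, so they are left out.
_WORD = re.compile(r"[0-9A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF]+")

def tokenize(text):
    """Split text into lowercase index terms (hypothetical helper)."""
    # Assumption: "collapsing" an apostrophe means removing it,
    # so the two halves of a contraction stay together as one word.
    collapsed = text.replace("'", "")
    # Everything that is not alphanumeric acts as a separator.
    return [word.lower() for word in _WORD.findall(collapsed)]

if __name__ == "__main__":
    print(tokenize("Don't forget: Ñandú & Straße!"))
```

Dropping the apostrophe before splitting keeps contractions from being broken into a word plus a stray "t" or "s"; whether the real indexer does exactly that is not stated in the post.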
|
| | | | |
| | | | | | reverse indexes (was Re: Text::Query) Robert Munafo
|
| | | | (...) I'm having trouble figuring out how that's even possible. For example, the first sample line in the 'jeremy' file above appears to say that 'jeremy' occurs as the 19th, 590th, 595th, 600th and 605th words of (URL) (which looks more or less (...) (25 years ago, 25-Aug-99, to lugnet.off-topic.geek)
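For readers unfamiliar with the idea being discussed, a positional inverted index maps each word to the articles it occurs in, together with the word positions inside each article. The sketch below is a minimal in-memory version under stated assumptions: the function name `build_index`, the dictionary layout, and the simplified ASCII-only word splitting are all hypothetical, and the on-disk per-word file format used by LUGNET is not reproduced here.

```python
import re
from collections import defaultdict

def build_index(docs):
    """Map word -> {doc_id: [1-based word positions]} (hypothetical layout)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Simplified splitting for the sketch: lowercase ASCII alphanumerics only.
        words = re.findall(r"[0-9a-z]+", text.lower())
        for pos, word in enumerate(words, start=1):
            index[word].setdefault(doc_id, []).append(pos)
    return index

if __name__ == "__main__":
    docs = {"a1": "Jeremy wrote: hi Jeremy", "a2": "hello"}
    index = build_index(docs)
    print(index["jeremy"])
```

A query for a word then becomes a single dictionary lookup instead of a scan over every article, which is the difference between an inverted index and a brute-force text scanner raised earlier in the thread.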
|
| | | | |
| | | | | | Re: reverse indexes (was Re: Text::Query) Todd Lehman
|
| | | | (...) There is a lot of overhead, yes, but it's cancelled out by: 1) NNTP headers are ignored during indexing (except the Subject and From values). 2) In the NNTP article body, the following are ignored during indexing: - Canonically quoted content (...) (25 years ago, 26-Aug-99, to lugnet.off-topic.geek)
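The filtering described in this post might look roughly like the sketch below. It uses Python's standard `email` parser on a single-part article; the function name `indexable_text` is hypothetical, and "canonically quoted content" is assumed to mean body lines beginning with ">". The rest of the ignore list is truncated in the snippet above, so nothing beyond those two rules is modelled.

```python
from email import message_from_string

def indexable_text(raw_article, headers=("Subject", "From")):
    """Return only the text that would be passed to the indexer (sketch)."""
    msg = message_from_string(raw_article)
    # Keep the values of the selected headers; every other header is ignored.
    kept = [msg.get(name, "") for name in headers]
    # Assumes a plain single-part article, so the payload is a string.
    body = msg.get_payload()
    # Assumption: quoted content is any line whose first non-blank char is ">".
    fresh = [line for line in body.splitlines()
             if not line.lstrip().startswith(">")]
    return "\n".join(kept + fresh)

if __name__ == "__main__":
    raw = ("From: todd@example.com\n"
           "Subject: Re: Text::Query\n"
           "Message-ID: <1@example.com>\n"
           "\n"
           "> quoted line\n"
           "fresh line\n")
    print(indexable_text(raw))
```

The `headers` parameter is there so that further headers can be added to the kept set without changing the function.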
|
| | | | |
| | | | | | Re: reverse indexes (was Re: Text::Query) Matthew Miller
|
| | | | (...) What about "Keywords"? I'm not sure if anyone actually uses that, but seems worth keeping... (25 years ago, 26-Aug-99, to lugnet.off-topic.geek)
|
| | | | |
| | | | | | Indexing on Keywords and Summary headers (was: Re: reverse indexes) Todd Lehman
|
| | | | (...) Oooho. Yessss...! Across all the articles posted so far, I only see one instance* of the Keywords header, but I still agree that it seems worth keeping and indexing on -- I'd even argue that it's a bug not to index on it. Alrighty then, I (...) (25 years ago, 27-Aug-99, to lugnet.off-topic.geek, lugnet.admin.general)
|
| | | | |