Subject: Re: Text::Query
Newsgroups: lugnet.off-topic.geek
Date: Sun, 22 Aug 1999 06:26:40 GMT
Viewed: 1675 times

In lugnet.general, Todd Lehman writes:
> In lugnet.general, Jeremy H. Sproat writes:
> > In lugnet.build, Todd Lehman writes:
> > > Try this search:
> > > http://www.lugnet.com/?q=%2Btan+ledit+ldraw+faq
> >
> > Whoa! Todd dude! You're using Text::Query, aren't you? Come on,
> > fess up.
>
> No, it's using a homebrew. I probably should look into Text::Query though,
> if it's loaded with callbacks and/or easy member-function overloading.
> What I threw together isn't very smart about word roots, and it isn't very
> fast either. But it was easy to write and it'll get the job done for a
> while. Text::Query is probably a far superior solution.
OK, I just took a look at Text::Query. From what I can tell from reading
the docs and the source, it looks as though it's a brute-force text scanner
rather than an inverted-index generator. So that means it's about 3 to 4
orders of magnitude slower than what I need. :-( (But it still looks nice
for small bodies of text, say, less than a megabyte.) :-)
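
To make the "brute-force scanner" point concrete, here is a minimal sketch in Python. It is not how Text::Query is implemented; the function name, the tokenizing regex, and the sample documents are all made up for illustration. The thing to notice is that every query re-reads every document, so query time grows with the total size of the text.

```python
import re

def brute_force_search(documents, terms):
    """Return ids of documents containing every term, by scanning all text.

    Hypothetical helper for illustration only: each call tokenizes the
    whole corpus again, which is why this approach stops scaling.
    """
    wanted = [t.lower() for t in terms]
    hits = []
    for doc_id, text in documents.items():
        # Tokenize the full document on every single query.
        words = set(re.findall(r"[a-z0-9]+", text.lower()))
        if all(t in words for t in wanted):
            hits.append(doc_id)
    return hits

# Made-up sample messages.
docs = {
    1: "LDraw FAQ: how to use tan bricks in LEdit",
    2: "Train layout ideas",
    3: "LEdit and LDraw tips",
}
print(brute_force_search(docs, ["ledit", "ldraw"]))  # [1, 3]
```

That is perfectly fine for a few hundred kilobytes of text, which matches the "less than a megabyte" case above.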
For dynamic content like news, the must-haves of an index and retrieval
system are:
- Rapid retrieval of matches (this implies an inverted index on disk).
- Rapid incremental, real-time indexing (the index should not have to be
rebuilt from scratch periodically).
- Low to moderate disk consumption (say, less than 2x the original overall
content size).
- Easily handle from 100MB to 10GB of indexed text.
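
The first two items on that list can be sketched roughly as follows. This is an in-memory toy in Python, not the homebrew indexer described in this post: the class name, method names, and sample messages are assumptions, and a real system at the 100MB to 10GB scale would keep the postings on disk. It only shows the shape of the idea: a lookup touches just the postings for the query words, and a new message is indexed by itself without rebuilding anything.

```python
import re
from collections import defaultdict

class InvertedIndex:
    """Toy inverted index: maps each word to the set of doc ids containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        # Incremental indexing: only the new document is tokenized.
        for word in set(re.findall(r"[a-z0-9]+", text.lower())):
            self.postings[word].add(doc_id)

    def search(self, terms):
        # Retrieval reads one posting set per term and intersects them;
        # the document text itself is never rescanned.
        sets = [self.postings.get(t.lower(), set()) for t in terms]
        if not sets:
            return []
        return sorted(set.intersection(*sets))

index = InvertedIndex()
index.add(1, "LDraw FAQ: how to use tan bricks in LEdit")
index.add(2, "Train layout ideas")
print(index.search(["ledit", "ldraw"]))  # [1]

# A message that arrives later is added in place, no rebuild needed.
index.add(3, "LEdit and LDraw tips")
print(index.search(["ledit", "ldraw"]))  # [1, 3]
```

The disk-consumption requirement is not addressed by this sketch; that depends on how the postings are encoded and stored.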
This is what I have now, but its top-level interface isn't very smart; it
doesn't do soundex or stemming or fuzzy matching or "funny characters."
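
For readers who haven't run into Soundex: it maps words that sound alike to the same short code, so a search front end can match on the code instead of the exact spelling. Below is a minimal Python sketch of the common "American Soundex" rules. The function and table names are made up for illustration, and nothing here is part of the indexer described above.

```python
# Consonant groups share a digit; vowels, y, h, and w get no digit.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit

def soundex(word):
    """Return the 4-character Soundex code of word ("" if it has no letters)."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (result + "000")[:4]  # pad with zeros or truncate to 4 characters

print(soundex("Robert"), soundex("Rupert"))  # R163 R163
```

In an index like the one described above, one way to use this would be to store the Soundex code of each word alongside the word itself and look up both at query time.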
Here's a product which (I'm not considering but) sounds like something that
would work great on gobs and gobs of content like news:
http://www.1source.com/~pollarda/findex/
If I can find something like that which comes with source, then that would be
a good alternative to what I cobbled together.
--Todd