Subject: Re: Text::Query
Newsgroups: lugnet.off-topic.geek
Date: Sun, 22 Aug 1999 06:26:40 GMT
  
In lugnet.general, Todd Lehman writes:
In lugnet.general, Jeremy H. Sproat writes:
In lugnet.build, Todd Lehman writes:
Try this search:
   http://www.lugnet.com/?q=%2Btan+ledit+ldraw+faq

Whoa!  Todd dude!  You're using Text::Query, aren't you?  Come on,
fess up.

No, it's using a homebrew.  I probably should look into Text::Query though,
if it's loaded with callbacks and/or easy member-function overloading.
What I threw together isn't very smart about word roots, and it isn't very
fast either.  But it was easy to write and it'll get the job done for a
while.  Text::Query is probably a far superior solution.

OK, I just took a look at Text::Query.  From what I can tell from reading
the docs and the source, it looks as though it's a brute force text scanner
rather than an inverted-index generator.  So that means it's about 3 to 4
orders of magnitude slower than what I need.  :-(  (But it still looks nice
for small bodies of text (say, less than a megabyte) :-)
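
To make the cost concrete, here is a minimal Python sketch of a brute-force scanner. It is not Text::Query's actual code (that module is Perl); the function names and the tokenizing rule are assumptions made up for illustration. The point is that every query re-reads all of the text:

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words (assumed rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def brute_force_search(documents, query):
    """Return the sorted ids of documents that contain every query word.

    Hypothetical helper: each call re-tokenizes all documents, so the cost
    of a single query grows with the total size of the text.
    """
    wanted = set(tokenize(query))
    hits = []
    for doc_id, text in documents.items():
        if wanted <= set(tokenize(text)):
            hits.append(doc_id)
    return sorted(hits)
```

That is fine for a small set of documents, but the per-query work scales with the whole corpus rather than with the number of matches.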

For dynamic content like news, the must-haves of an indexing and retrieval
system are:

   - Rapid retrieval of matches (this implies an inverted index on disk).
   - Rapid incremental, real-time indexing (the index should not have to be
     rebuilt from scratch periodically).
   - Low to moderate disk consumption (say, less than 2x the original overall
     content size).
   - Easily handle from 100MB to 10GB of indexed text.
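
The first two requirements can be sketched roughly as follows, in Python. This is not the homebrew LUGNET code; the class and method names are hypothetical, and the index lives in memory here purely as a simplifying assumption, whereas a real one at this scale would keep its postings on disk:

```python
import re
from collections import defaultdict

class InvertedIndex:
    """Toy inverted index: maps each word to the set of document ids containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add_document(self, doc_id, text):
        """Index one new document incrementally, without rebuilding anything."""
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            self.postings[word].add(doc_id)

    def search(self, query):
        """Return the sorted ids of documents containing every query word."""
        words = re.findall(r"[a-z0-9]+", query.lower())
        if not words:
            return []
        # .get() avoids creating empty entries for words never indexed.
        sets = [self.postings.get(word, set()) for word in words]
        return sorted(set.intersection(*sets))
```

A lookup touches only the postings for the query words, so retrieval time depends on how many documents match those words, not on the total amount of indexed text. Adding a message only touches the postings for the words in that message.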

This is what I have now, but its top-level interface isn't very smart; it
doesn't do soundex or stemming or fuzzy matching or "funny characters."

Here's a product that I'm not considering, but which sounds like something
that would work great on gobs and gobs of content like news:

   http://www.1source.com/~pollarda/findex/

If I can find something like that which comes with source, then that would be
a good alternative to what I cobbled together.

--Todd


