Subject: News search function reactivated (was: News search function temporarily disabled)
Newsgroups: lugnet.admin.general, lugnet.off-topic.geek, lugnet.announce
Followup-To: lugnet.admin.general, lugnet.off-topic.geek
Date: Tue, 2 Jan 2001 07:46:46 GMT
  
The LUGNET News search function is now re-enabled.  I completely revamped
the index data structures and list-merge algorithm and rewrote the core
query engine in C.  It's a much more solid implementation.

Everyone's patience during the outage is much appreciated!


Functional improvements:

* Word proximity sensitivity -- two or more words closer together match better
  than the same words far apart.

* Word-order sensitivity -- words in a specific order match better than the
  same words out of order.  For example, "new lego" returns different matches
  than "lego new" (try it!).

* And you can still prefix words with + or - to require inclusion or
  exclusion, respectively.
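
To make the three behaviors above concrete, here is a minimal sketch in Python. It is not the actual C query engine; the function names, the whitespace tokenization, and the scoring constants (one point per matched word, a bonus of 1/gap for adjacent query words, halved when they appear out of order) are all assumptions chosen only for illustration.

```python
def parse_query(query):
    """Split a query into ordered terms plus required (+) and excluded (-) sets.

    Hypothetical helper: the real engine's query syntax handling is not
    described in the post beyond the + and - prefixes.
    """
    terms, required, excluded = [], set(), set()
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            terms.append(token[1:])
            required.add(token[1:])
        elif token.startswith("-") and len(token) > 1:
            excluded.add(token[1:])
        else:
            terms.append(token)
    return terms, required, excluded


def pair_score(positions_a, positions_b):
    """Score word A followed by word B from their word offsets in one document.

    Closer pairs score higher. Pairs in query order (A before B) get full
    credit; reversed pairs get half. Both constants are assumptions.
    """
    best = 0.0
    for pa in positions_a:
        for pb in positions_b:
            gap = abs(pb - pa)
            if gap == 0:
                continue
            order_bonus = 1.0 if pb > pa else 0.5
            best = max(best, order_bonus / gap)
    return best


def score_document(text, query):
    """Return a score for the document, or None if it does not match."""
    terms, required, excluded = parse_query(query)
    positions = {}
    for offset, word in enumerate(text.lower().split()):
        positions.setdefault(word, []).append(offset)
    if any(t not in positions for t in required):
        return None  # a +word is missing
    if any(t in positions for t in excluded):
        return None  # a -word is present
    hits = [t for t in terms if t in positions]
    if not hits:
        return None
    score = float(len(hits))  # base score: one point per matched query word
    for a, b in zip(terms, terms[1:]):
        if a in positions and b in positions:
            score += pair_score(positions[a], positions[b])
    return score
```

With this sketch, the document "new lego sets" scores higher for the query "new lego" than for "lego new", and higher than "new big red lego" does for "new lego", which mirrors the order and proximity effects described above.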


Cosmetic improvements:

* Graphical horizontal bar showing match rankings.  (Close matches appear
  with wider bars than lesser matches.)

* More streamlined and easy-to-read results header.


Internal improvements:

* Approximately 100 times faster (once the query engine receives the
  request).  Typical CPU time is less than 0.1 seconds even for
  queries that generate tens of thousands of word hits.  (Actual times
  may vary depending on disk activity as word-hit lists are accessed.)

* The query engine can take any arbitrary list of news articles as a search
  filter (include or exclude).  This is how subgroup-searches are handled
  now and will pave the way for cooler things later.
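
One way an include/exclude filter like the one above could work is as a linear merge over sorted lists of article IDs. The sketch below is in Python rather than C, and it assumes (the post does not say) that both the hit list and the filter list are sorted and contain no duplicates. All names are hypothetical.

```python
def intersect_sorted(a, b):
    """Return the IDs present in both sorted lists, using a linear merge."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def subtract_sorted(a, b):
    """Return the IDs in sorted list a that do not appear in sorted list b."""
    out = []
    j = 0
    for x in a:
        # Advance through b until it catches up with x.
        while j < len(b) and b[j] < x:
            j += 1
        if j == len(b) or b[j] != x:
            out.append(x)
    return out


def apply_filter(hits, article_ids, mode="include"):
    """Restrict hits to article_ids (include) or drop them (exclude)."""
    if mode == "include":
        return intersect_sorted(hits, article_ids)
    if mode == "exclude":
        return subtract_sorted(hits, article_ids)
    raise ValueError("mode must be 'include' or 'exclude'")
```

Under these assumptions a subgroup search is just `apply_filter(hits, ids_in_subgroup, "include")`, and each pass touches every ID at most once.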


To do soon:

* Implement date range restrictions on searches.  Currently a search covers
  the entire corpus of documents and assigns equal date-weight to all
  documents regardless of age.  Date restriction is actually working now at
  the inner levels and at the URL level, but there is not yet a forms-based
  "advanced search" user interface for specifying a target date and proximity.
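
For readers wondering what a "target date and proximity" could mean in practice, here is one possible interpretation, sketched in Python. The post does not describe the actual weighting, so the linear falloff, the function name, and the parameter names are all assumptions.

```python
from datetime import date


def date_weight(article_date, target_date, proximity_days):
    """Return 1.0 at the target date, falling linearly to 0.0 at proximity_days away.

    Hypothetical weighting: articles further than proximity_days from the
    target date get weight 0.0 and would effectively drop out of the results.
    """
    if proximity_days <= 0:
        raise ValueError("proximity_days must be positive")
    distance = abs((article_date - target_date).days)
    return max(0.0, 1.0 - distance / proximity_days)
```

A match score could then be multiplied by this weight, so that the current equal-weight behavior corresponds to a weight of 1.0 for every article.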


To do someday:

* Facilitate searching within specific threads.  This is a low-level data
  list issue.

* Facilitate searching within arbitrary collections of groups (as opposed to
  a single group or group hierarchy).  This is mostly a user-interface issue.

* Facilitate searching within search results (i.e., "search only within these
  results below").

* Rework the text indexer so that it doesn't throw out "funny characters" in
  words, and then reindex the entire document corpus from scratch.  (Currently,
  words like "won't" are converted to "wont" and words like "S@H" are ignored
  entirely.)

* Filter out canceled articles from returned results.
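
As an illustration of the indexer item above, a tokenizer that keeps "funny characters" inside words might look like the following Python sketch. The choice of which characters to keep (apostrophes and @ signs between alphanumeric runs) and the lowercasing are assumptions; the real indexer's rules are not given in the post.

```python
import re

# A word is a run of letters/digits that may contain single apostrophes or
# @ signs between alphanumeric runs, e.g. "won't" or "S@H".
WORD_RE = re.compile(r"[A-Za-z0-9]+(?:['@][A-Za-z0-9]+)*")


def tokenize(text):
    """Return lowercased index words, keeping internal ' and @ characters."""
    return [match.group(0).lower() for match in WORD_RE.finditer(text)]
```

With this pattern "won't" stays one word instead of becoming "wont", and "S@H" is indexed as "s@h" instead of being dropped. Any change of this kind would still require reindexing the whole corpus, as noted above.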

--Todd


p.s.  The article index database currently contains more than 10,000,000
word-hits.



Message has 4 Replies:
  Re: News search function reactivated (was: News search function temporarily disabled)
 
(...) Very good! Although the new search doesn't return most recent articles first like it used to. Is that how it should work? Now I can't see most recent posts that contain the keyword I want to search for, which makes the search pretty much (...) (24 years ago, 2-Jan-01, to lugnet.admin.general, lugnet.off-topic.geek)
  Re: News search function reactivated (was: News search function temporarily disabled)
 
(...) Could I suggest some amendments to the "To do someday" list (for an 'advanced search' only)? - Search by author - Search by subject line contents - Search by date range (or open-ended-- i.e. after date X or before date X) - Search for articles (...) (24 years ago, 2-Jan-01, to lugnet.admin.general)
  Re: News search function reactivated (was: News search function temporarily disabled)
 
(...) Could you tell us the URL syntax for those of us willing to modify URLs? (24 years ago, 2-Jan-01, to lugnet.admin.general, lugnet.off-topic.geek)
  Re: News search function reactivated (was: News search function temporarily disabled)
 
(...) All geeks capitulate sooner or later on perl vs C. Of course Larry (and many others I work with) would tell you to write that stuff in Java but that would be a step backwards. Congratulations! Of course its stability depends greatly on your (...) (24 years ago, 3-Jan-01, to lugnet.off-topic.geek)

Message is in Reply To:
  News search function temporarily disabled
 
The system crashed again. Either it's running out of memory and that's causing a downward spiral or it's running out of CPU cycles and starving enough processes to build up and cause a meltdown. Either way some tuning needs to be done. This is going (...) (24 years ago, 11-Dec-00, to lugnet.admin.general, lugnet.off-topic.geek, lugnet.announce) ! 
