The LUGNET News search function is now re-enabled. I completely revamped
the index data structures and list-merge algorithm and rewrote the core
query engine in C. It's a much more solid implementation.
Everyone's patience during the outage is much appreciated!
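(For the curious, here is a minimal illustrative sketch -- not the actual
engine code -- of the kind of sorted list-merge such a query engine can use
to intersect two word-hit lists. The article IDs, list contents, and the
"intersect" helper are all made up for the example.)

    /* Illustrative only: intersect two sorted word-hit lists with a
     * linear merge -- the basic step of a list-merge query engine.
     * Article IDs are assumed to be sorted ascending in each list. */
    #include <stdio.h>
    #include <stddef.h>

    /* Write the IDs present in both a[] and b[] into out[]; return count. */
    static size_t intersect(const int *a, size_t na,
                            const int *b, size_t nb, int *out)
    {
        size_t i = 0, j = 0, n = 0;
        while (i < na && j < nb) {
            if (a[i] < b[j])       i++;
            else if (a[i] > b[j])  j++;
            else { out[n++] = a[i]; i++; j++; }  /* hit in both lists */
        }
        return n;
    }

    int main(void)
    {
        int lego[] = { 3, 8, 15, 42, 99 };   /* articles containing "lego" */
        int new_[] = { 8, 15, 77, 99 };      /* articles containing "new"  */
        int both[4];
        size_t n = intersect(lego, 5, new_, 4, both);
        for (size_t i = 0; i < n; i++)
            printf("article %d matches both words\n", both[i]);
        return 0;
    }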
Functional improvements:
* Word proximity sensitivity -- two or more words closer together match better
than the same words far apart.
* Word-order sensitivity -- words in a specific order match better than the
same words out of order. For example, "new lego" returns different matches
than "lego new" (try it!). A rough scoring sketch follows this list.
* And you can still prefix words with + or - to require inclusion or
exclusion, respectively.
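(Again, just a sketch of the idea rather than the real ranking code: one
simple way to reward proximity and query order is to score each pair of word
hits by inverse distance and boost the score when the words appear in the
same order as the query. The "pair_score" helper and the 2x bonus are
arbitrary examples.)

    /* Illustrative only: score a pair of word hits within one article.
     * Closer hits score higher, and hits in the same order as the
     * query get an (arbitrary) 2x bonus. */
    #include <stdio.h>
    #include <stdlib.h>

    /* pos1, pos2: word offsets of the first and second query term. */
    static double pair_score(int pos1, int pos2)
    {
        int dist = abs(pos2 - pos1);
        if (dist < 1)
            dist = 1;                        /* guard against division by zero */
        double score = 1.0 / (double)dist;   /* proximity: closer is better    */
        if (pos2 > pos1)
            score *= 2.0;                    /* same order as the query        */
        return score;
    }

    int main(void)
    {
        /* "new lego": adjacent and in order vs. reversed vs. far apart */
        printf("adjacent, in order : %.3f\n", pair_score(10, 11));
        printf("adjacent, reversed : %.3f\n", pair_score(11, 10));
        printf("far apart, in order: %.3f\n", pair_score(10, 50));
        return 0;
    }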
Cosmetic improvements:
* Graphical horizontal bars showing match rankings. (Stronger matches appear
with wider bars than weaker matches.)
* More streamlined, easier-to-read results header.
Internal improvements:
* Approximately 100 times faster (once the query engine receives the
request). Typical CPU utilization is less than 0.1 seconds even for
queries that generate tens of thousands of word hits. (Actual times
may vary depending on disk activity as word-hit lists are accessed.)
* The query engine can take any arbitrary list of news articles as a search
filter (include or exclude). This is how subgroup searches are handled
now and will pave the way for cooler things later. (A sketch of this
filtering idea follows this list.)
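(A hypothetical sketch of what such an article-list filter might look like at
the lowest level: walk a sorted hit list against a sorted include-or-exclude
article list. The "apply_filter" helper and the IDs are invented for the
example; the real engine's interface surely differs.)

    /* Hypothetical sketch: filter a sorted word-hit list against a
     * sorted article list, either keeping only members (include=1)
     * or dropping members (include=0). */
    #include <stdio.h>
    #include <stddef.h>

    static size_t apply_filter(const int *hits, size_t nh,
                               const int *filt, size_t nf,
                               int include, int *out)
    {
        size_t i, j = 0, n = 0;
        for (i = 0; i < nh; i++) {
            while (j < nf && filt[j] < hits[i])
                j++;                               /* advance the filter list */
            int member = (j < nf && filt[j] == hits[i]);
            if (member == include)
                out[n++] = hits[i];
        }
        return n;
    }

    int main(void)
    {
        int hits[] = { 3, 8, 15, 42, 99 };  /* word hits                   */
        int filt[] = { 8, 42 };             /* e.g. articles in a subgroup */
        int out[5];
        size_t n = apply_filter(hits, 5, filt, 2, 1, out);  /* include */
        for (size_t i = 0; i < n; i++)
            printf("kept article %d\n", out[i]);
        return 0;
    }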
To do soon:
* Implement date range restrictions on searches. Currently a search covers the
entire corpus of documents and assigns equal date-weight to all documents
regardless of age. This is actually working now at the inner levels and
at the URL level, but there is not yet a forms-based "advanced search"
user interface for specifying a target date and proximity. (A rough sketch
of the date-weighting idea follows this item.)
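(Purely speculative sketch of the date-weighting idea: give full weight to
articles at the target date and let the weight fall off with distance. The
"date_weight" helper and the 30-day half-life are arbitrary examples, not
what the engine actually does.)

    /* Speculative sketch: weight an article by how close its date is
     * to a target date.  Weight halves every 30 days (arbitrary). */
    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>

    static double date_weight(long days_off)
    {
        const double half_life = 30.0;
        return pow(0.5, (double)labs(days_off) / half_life);
    }

    int main(void)
    {
        printf("same day      : %.3f\n", date_weight(0));
        printf("one month off : %.3f\n", date_weight(30));
        printf("one year off  : %.3f\n", date_weight(365));
        return 0;
    }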
To do someday:
* Facilitate searching within specific threads. This is a low-level data
list issue.
* Facilitate searching within arbitrary collections of groups (as opposed to
a single group or group hierarchy). This is mostly a user-interface issue.
* Facilitate searching within search results (i.e., "search only within these
results below").
* Rework the text indexer so that it doesn't throw out "funny characters" in
words, and then reindex the entire document corpus from scratch. (Currently,
words like "won't" are converted to "wont" and words like "S@H" are ignored
entirely.)
* Filter out canceled articles from returned results.
--Todd
p.s. The article index database currently contains more than 10,000,000
word-hits.