General Discussions about LUGNET : 8619



Subject:	Re: News search function reactivated (was: News search function temporarily disabled)
Newsgroups:	lugnet.admin.general
Date:	Wed, 3 Jan 2001 02:48:21 GMT

In lugnet.admin.general, David Eaton writes:
> In lugnet.admin.general, Todd Lehman writes:
> > Is that the sort of functionality you're looking for?
>
> Pretty much... actually, I was thinking more along the lines of an advanced
> search form though:
>
> Search for: ______________________ (uses +'s and -'s as is)
> Search for text in subject line [] (checkbox)
> Posted by: _______________________ (uses +'s and -'s... or no symbols, too)
> Search only for heads of a thread [] (checkbox)
> Posted before: ___ /____ / ____
> Posted after: ___ / ____/ ____
>
> But throwing in symbols/wildcards on the command line instead of in a form
> works for me too :)

Ya, something like that'd be good to slap on top once the base functionality is done. :-) Nobody wants to *have* to remember how all the squiggly and square-brace thingums in a search box work. :-)

> > If you search for
> >
> >    david eaton <10
> >
> > then it'll show you things matching "david eaton" that were posted "about 10
> > days ago" (plus or minus 10 days -- with higher matches given to those that
> > are closer to the 10-days-ago-point).
>
> So-- looks like if today is day 100, "david eaton <10" would search days
> 80-100, giving highest precedence to things closest to day 90...

Ya, precisely. It was originally (summer of '99) a smooth bell-shaped curve, y = exp(-.5 * x^2), with x being the amount of deviation from the target and y being the fitness value. But that was chewing up about 10^-6 seconds per hit in one of the tight inner loops (i.e., wasting 0.07 CPU seconds on a word like 'lego' with ~70000 hits), so I threw it out and replaced it with a linear spike curve, y = max(0, 1-|x|). That doubled the overall throughput and still gave decent results. The main advantage of the bell curve is that y>0 for all x, but that can also be a disadvantage. The sharp y = max(0, 1-|x|) curve has a nice sudden cutoff: y is exactly 0 for x<=-1 and x>=1.
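For anyone curious how the "<N" age weighting above might look in code, here is a minimal Python sketch. It is not LUGNET's query engine: the token syntax parsing, the assumption that x is (age - N) / N (which reproduces David's day-80-to-100 window peaking at day 90), and the helper names split_query, deviation, tent_weight and bell_weight are all hypothetical, chosen for illustration.

```python
from __future__ import annotations

import math
import re

# Hypothetical query syntax: a token like "<10" means "posted about 10 days ago".
AGE_TOKEN = re.compile(r"^<(\d+(?:\.\d+)?)$")


def split_query(query: str) -> tuple[list[str], float | None]:
    """Separate ordinary search words from an optional '<N' target age in days."""
    words: list[str] = []
    target: float | None = None
    for token in query.split():
        match = AGE_TOKEN.match(token)
        if match:
            target = float(match.group(1))
        else:
            words.append(token.lower())
    return words, target


def deviation(age_days: float, target_days: float) -> float:
    """Assumed definition of x: distance from the target age, scaled by the target.

    With target_days = 10, x runs from -1 at age 0 to +1 at age 20.
    """
    if target_days <= 0:
        raise ValueError("target age must be positive")
    return (age_days - target_days) / target_days


def tent_weight(x: float) -> float:
    """Linear spike y = max(0, 1 - |x|): exactly zero once |x| >= 1."""
    return max(0.0, 1.0 - abs(x))


def bell_weight(x: float) -> float:
    """Original bell curve y = exp(-0.5 * x^2): positive for every x."""
    return math.exp(-0.5 * x * x)


if __name__ == "__main__":
    words, target = split_query("david eaton <10")
    today = 100
    for posted_day in range(75, 101, 5):
        x = deviation(today - posted_day, target)
        print(posted_day, tent_weight(x), round(bell_weight(x), 3))
```

With today = day 100 and "<10", a day-90 post gets weight 1.0, days 85 and 95 get 0.5, and days 80 and 100 drop to exactly 0, whereas the bell curve still hands day 75 a small positive weight. At the quoted ~10^-6 s per evaluation, 70,000 hits cost 70,000 x 10^-6 = 0.07 s, which is the overhead the cheaper tent curve avoids.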
:-) In a way I'm kinda glad the bell curve was so slow to compute.

> cool! I think that's actually really useful :)
>
> > I'm thinkin' this'll be quite useful for digging up stuff that's "about a
> > week ago" or "about 3 months ago" or "oh, about 2 years ago" -- when it's
> > tough to remember an exact date.
>
> Poifect! That's something I had been wanting for a while since I'll know
> (for example) that I posted something last spring, but I want to make sure
> NOT to search for anything after, say, June, or before March... very cool
> indeed :)

Ya, I can't count the number of times I've wanted to go back "about so many days or weeks" to look for something. I'll remember something and not know exactly what date it was posted, but I'll remember roughly how long ago it was. So it's effectively a fuzzy date search with variable focus (wide, narrow, etc.).

> > > - Search for articles containing a URL
> > Not sure how to handle this yet.
>
> Yeah, that's just one of those "Oh, if only" things-- mostly for when I'm
> looking for someone who posted a link to their site... occasionally I've
> tried to do this by putting "http" on the query string, although it doesn't
> rule out posts that give their URLs as "www.foo.com/~blah/cool_page.html".

I'm not sure why I included "http" in the stopword list. I apologize for that. (Probably because it would have generated zillions upon zillions of word hits; "com" is the most frequently indexed word here.) "http" could certainly stand to go back into the index now that the query engine is so much faster. Maybe even other words like "it" and "that" and "the."

Here's the list of stopwords, BTW... are there any you see here that stand out in your mind as having given you problems in the past?

  a an the it its it's this that what i i'm im my we me us you do be am is
  are was can has of for from with to in out on off at as if and but or not
  no have so http www

One thing I *really* hate about stopword lists is that they're so language-centric (i.e., a stopword in one language might be a darn-tootin' regular good word in another language). Also, all single-letter words are ignored (e.g., "a" and "i" for English and "y" for Spanish).

> > > - option to ONLY return the heads of threads
> > Ahh yes -- I'll put that on the "must do" list. That won't be too hard once
> > the thread lists are generated internally for other purposes.
>
> Cool!

I've planned ahead here. For each group, there'll be a list of the articles that are heads of threads in that group. That list can then be used to generate more compact views into the group, or it can be fed into the query engine as an "include only these" filter. In memory, once loaded, the article filter lists are 1-bit flags -- 1 bit per article position -- so even a list covering a quarter million articles consumes only about 30 KB of memory (250,000 bits / 8 = 31,250 bytes) for the fraction of a second that it's needed.

--Todd
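As a rough illustration of how a stopword list like the one above, plus the "ignore single-letter words" rule, might be applied when indexing, here is a small Python sketch. The tokenizer regex (ASCII letters, digits and apostrophes) and the function name index_terms are assumptions; the post doesn't describe the real indexer's word-splitting rules.

```python
from __future__ import annotations

import re

# The stopword list as posted (English-centric, as noted in the post).
STOPWORDS = frozenset(
    """
    a an the it its it's this that what i i'm im my we me us you do be am is
    are was can has of for from with to in out on off at as if and but or not
    no have so http www
    """.split()
)

# Hypothetical tokenizer: lowercase runs of ASCII letters, digits, apostrophes.
TOKEN = re.compile(r"[a-z0-9']+")


def index_terms(text: str, stopwords: frozenset[str] = STOPWORDS) -> list[str]:
    """Return the words that would be indexed, dropping stopwords and
    single-letter words. Pass a different stopword set for another language."""
    terms: list[str] = []
    for word in TOKEN.findall(text.lower()):
        word = word.strip("'")  # "'tis" -> "tis", but "it's" stays intact
        if len(word) <= 1 or word in stopwords:
            continue
        terms.append(word)
    return terms


if __name__ == "__main__":
    print(index_terms("I think the best design is at www.foo.com/~blah"))
    print(index_terms("pan y vino"))
```

Making the stopword set a parameter is one way to soften the language-centric problem: each language (or each group) could pass its own set instead of sharing one English list. Note that with "www" stopped, a bare "www.foo.com" still indexes "foo" and "com", which fits "com" being the most frequently indexed word.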
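The 1-bit-per-article filter lists described above map naturally onto a packed bit array. Below is a minimal Python sketch using a bytearray; the class name ArticleFilter, the 0-based integer article positions, and the (position, score) hit format are assumptions for illustration, not the actual LUGNET data structures.

```python
from __future__ import annotations


class ArticleFilter:
    """Hypothetical 1-bit-per-article filter list (bit set = article included)."""

    def __init__(self, num_articles: int) -> None:
        self.num_articles = num_articles
        # One bit per article position, rounded up to whole bytes.
        self.bits = bytearray((num_articles + 7) // 8)

    def add(self, position: int) -> None:
        """Mark the article at a 0-based position, e.g. a head of thread."""
        if not 0 <= position < self.num_articles:
            raise IndexError(position)
        self.bits[position >> 3] |= 1 << (position & 7)

    def __contains__(self, position: int) -> bool:
        if not 0 <= position < self.num_articles:
            return False
        return bool(self.bits[position >> 3] & (1 << (position & 7)))

    def restrict(self, hits: list[tuple[int, float]]) -> list[tuple[int, float]]:
        """Apply as an 'include only these' filter to (position, score) hits."""
        return [(pos, score) for pos, score in hits if pos in self]


if __name__ == "__main__":
    heads = ArticleFilter(250_000)  # a quarter million article positions
    for pos in (3, 17, 42):  # made-up heads-of-thread positions
        heads.add(pos)
    print(len(heads.bits))  # storage in bytes
    print(heads.restrict([(3, 0.9), (4, 0.8), (42, 0.5)]))
```

For 250,000 positions the bytearray is 31,250 bytes, matching the "about 30 KB" figure, and filtering a hit list is one shift, mask and AND per hit.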

Message has 2 Replies:

		Article bit-flags (was: Re: News search function reactivated)
(...) Oh, one other thing...planning ahead: Another potential application of article bit-flags is read/unread lists on a person-by-person basis via the web interface. I know this is something that people have been asking for for a long time. When (...) (24 years ago, 3-Jan-01, to lugnet.admin.general)

		Re: News search function reactivated (was: News search function temporarily disabled)
(...) As an algorithmical guess, I think I'd probably attempt something a bit different... If someone enters: the I'd probably want to ignore it. But if they entered: the best design I might want to consider the 'the'. Dunno. I'd probably test an (...) (24 years ago, 3-Jan-01, to lugnet.admin.general)

Message is in Reply To:

		Re: News search function reactivated (was: News search function temporarily disabled)
(...) Pretty much... actually, I was thinking more along the lines of an advanced search form though: Search for: ___...___ (uses +'s and -'s as is) Search for text in subject line [] (checkbox) Posted by: ___...___ (uses +'s and -'s... or no (...) (24 years ago, 2-Jan-01, to lugnet.admin.general)

45 Messages in This Thread:
