Subject: Re: News search function reactivated (was: News search function temporarily disabled)
Newsgroups: lugnet.admin.general
Date: Wed, 3 Jan 2001 01:51:23 GMT
In lugnet.admin.general, Frank Filz writes:
> Todd Lehman wrote:
> > I know what you mean, though, about being able to restrict a search to
> > _specifically_ some exact subject or author. I'll think about how I might
> > be able to handle this in the future -- it would be a separate index
> > database for each of the two fields.
>
> Do you index the name in "X-real-life-name"?
Ya, let's see... As it assembles the text to index, it first grabs
X-Real-Life-Name:, then either Original-From: or From:, then Subject:,
then Keywords:, then Summary:, and finally the non-quoted, non-signature
parts of the body.
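A minimal Python sketch of that assembly order, for illustration only. It assumes the headers arrive as a dict, that quoted lines start with ">", that the signature begins at a line that is exactly "-- " (the usual Usenet separator), and that Original-From: wins over From: when both are present. The helper name build_index_text is hypothetical; this is not the actual LUGNET indexer code.

```python
def build_index_text(headers: dict[str, str], body: str) -> str:
    """Concatenate header fields and unquoted body text in the order
    described above (hypothetical helper, not LUGNET's code)."""
    parts = []
    parts.append(headers.get("X-Real-Life-Name", ""))
    # Assumption: Original-From: takes precedence over From: when present.
    parts.append(headers.get("Original-From") or headers.get("From", ""))
    for name in ("Subject", "Keywords", "Summary"):
        parts.append(headers.get(name, ""))
    for line in body.splitlines():
        if line == "-- ":
            break  # assumed sig delimiter: nothing after it is indexed
        if line.startswith(">"):
            continue  # quoted text from earlier posts is skipped
        parts.append(line)
    # Drop empty fields (e.g. missing Keywords:) and join with spaces.
    return " ".join(p for p in parts if p.strip())
```

Missing headers simply contribute nothing, so a post without Keywords: or Summary: still produces a clean string.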
So, for example, for your post that I'm replying to, it would generate:
  frank filz frank filz re news search function reactivated was news search
  function temporarily disabled do you index the name in x real life name
  one thought index the special strings from and subject the the search
  etc., etc.
It would then remove a few stopwords and feed the result to the actual
indexer.
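The normalization visible in that example (lowercasing, treating punctuation and hyphens as word breaks, then dropping stopwords) could be sketched like this. The stopword list is a made-up placeholder, since the post doesn't say which words are dropped, and keeping each surviving word's original position is just one possible design choice, picked here because a proximity search needs positions later.

```python
import re

# Placeholder stopword list (assumption; the real list is not given).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "do", "you"}

def tokenize(text: str) -> list[str]:
    # Lowercase, then treat any run of non-alphanumerics as a separator,
    # so "X-real-life-name" becomes "x real life name".
    return re.findall(r"[a-z0-9]+", text.lower())

def index_terms(text: str) -> list[tuple[int, str]]:
    # Drop stopwords but keep each remaining word's original position.
    return [(pos, tok) for pos, tok in enumerate(tokenize(text))
            if tok not in STOPWORDS]
```

For instance, index_terms('Do you index the name in "X-real-life-name"?') keeps index, name, x, real, life, name at positions 2, 4, 6, 7, 8, 9.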
> One thought, index the special strings "from:" and "subject:". The the
> search:
>
> from: ffilz
>
> Should rank my posts highly due to proximity. Of course it would be
> better to index the real life name as if it was preceded by from: also
> so that you could search:
>
> from: filz
>
> and find my posts.
Ah. That's a neat trick! It's a little English-centric, but it's still a
very simple and elegant partial solution...one of those "80% of the
benefits for only 20% of the work" types of things. Unfortunately, it would
mean reindexing the entire news corpus from scratch, because the marker
words would be insertions in the numeric word-order lists, shifting the
position of every word after them. So it's probably something to do along
with other additions the next time the index is rebuilt from scratch.
The last time I rebuilt it, I think it took a whole day, and that was with
about 1/4 as many articles in the system. (The indexer is optimized for fast
incremental indexing rather than fast one-time building.)
--Todd