In lugnet.admin.general, Dan Jezek writes:
> > [...]
> > example, to limit posts to the last 10 days, use
> >
> > &qs=864000
>
> It works great! ... but the &qs doesn't carry over to the next page of
> results. So if I want to see more pages, I have to edit the querystring on
> each page.
oops, doy! I didn't put in the propagation of that URL term. I don't consider
it 100% "documented" yet (it's still subject to change without notice), but I
still shouldn't have missed that. Thanks. I'll fix that.
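A minimal sketch of what propagating the term could look like, in Python for illustration only (this is not the site's actual code). The paging parameter name `start` and the step size are hypothetical; only `q` and `qs` appear in this thread.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def next_page_url(current_url, paging_param="start", step=10):
    """Build a "more results" link that keeps every existing query term.

    `paging_param` and `step` are assumptions made for this sketch.
    Repeated query keys are collapsed to their last value by dict().
    """
    parts = urlsplit(current_url)
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    # Advance the offset; all other terms (including qs) ride along unchanged.
    params[paging_param] = str(int(params.get(paging_param, "0")) + step)
    return urlunsplit(parts._replace(query=urlencode(params)))


print(next_page_url("http://news.lugnet.com/?q=sigma&qs=864000"))
```

Because the link is rebuilt from the full set of incoming terms rather than from a fixed list, a term added later would carry over without further changes.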
The reason it's subject to change is partially because the letter 's' in 'qs'
is named after the word (or Greek letter, rather) 'sigma' -- sigma being 1
standard deviation in the bell curve function f(x) = exp(-x^2/2) -- and that
formula isn't being used anymore in the searches, and partially because 'qs'
might better someday be used for "query subject." Anyway, it's still not 100%
in stone. But it'll work until it breaks.
> Since you already have the inner workings of this in place, it
> would be really easy to just add a textbox named "qs" and add the &qs= to
> the bottom "5 more, 10 more"... links. With a little more effort, you could
> include radio buttons to have the user select how many days, months or years
> they want to go back and have your search engine convert it to seconds
> depending on what the user selects.
Yup, that's the idea!!! Say, where is that old article about sigma and
advanced options...ah! so easy to find now! :-)
http://news.lugnet.com/?q=url+query+qs+qt+sigma+%3C//1.5
(See topmost result and related thread for more background.)
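The conversion behind such radio buttons could be sketched as follows, in Python for illustration. The 10-day example earlier in the thread (&qs=864000) works out to 10 x 86400, so this sketch assumes the value is in seconds. The function name, the unit labels, and the 30-day month and 365-day year are assumptions, not anything the search engine is known to use.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400

# Approximate unit lengths in days (assumed for this sketch).
DAYS_PER_UNIT = {"days": 1, "months": 30, "years": 365}


def qs_value(amount, unit):
    """Convert a user's "go back N days/months/years" choice to seconds."""
    return amount * DAYS_PER_UNIT[unit] * SECONDS_PER_DAY


print(qs_value(10, "days"))  # 864000, matching the 10-day example
```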
> > It's actually in the nature of search engines to generate thousands of
> > results.
>
> If given thousands of results, most search engines have some advanced
> options like sorting.
Well, they -are- sorted. They're always sorted -- always highest probability
of relevance first, lowest last. Usually, the metric for relevance is a
combination of non-temporal factors such as word frequencies, word proximities,
and word orderings. I don't know of any search engine that doesn't sort (on
some criteria) the matches it finds. But anyway, I think you meant sorting
by time?
I wonder if a little link at the top to re-deploy the search taking recentness
into account (or conversely, turning it off if it's on) would be useful?
> > What's more important is the first page returned -- i.e., the ranking.
> > Typically one doesn't dig down past the first few, so you rarely
> > actually go visit all the thousands.
>
> I'd be interested in seeing some statistics on how far the average user goes
> when given back let's say 10, 100 and 1,000 pages of results. It would help
> in the design of an effective search engine.
Me too. I'd expect a f(x)=1/x type of curve, but it would be fun to see actual
numbers. :-)
--Todd
In lugnet.admin.general, Todd Lehman writes:
> oops, doy! I didn't put in the propagation of that URL term. I don't consider
> it 100% "documented" yet (it's still subject to change without notice), but I
> still shouldn't have missed that. Thanks. I'll fix that.
> The reason it's subject to change is partially because the letter 's' in 'qs'
> is named after the word (or Greek letter, rather) 'sigma' -- sigma being 1
> standard deviation in the bell curve function f(x) = exp(-x^2/2) -- and that
> formula isn't being used anymore in the searches, and partially because 'qs'
> might better someday be used for "query subject." Anyway, it's still not 100%
> in stone. But it'll work until it breaks.
Wow! So you have terms for the ampersand options in a URL? My standpoint
on this would be to put everything in a form and kill 2 birds with 1 stone -
not having to think of how to name URL terms (unless you enjoy doing that)
and having the search more user-friendly (not everyone will remember the
options or find it easy to edit the URL).
> > If given thousands of results, most search engines have some advanced
> > options like sorting.
>
> Well, they -are- sorted. They're always sorted -- always highest probability
> of relevance first, lowest last. Usually, the metric for relevance is a
> combination of non-temporal factors such as word frequencies, word proximities,
> and word orderings. I don't know of any search engine that doesn't sort (on
> some criteria) the matches it finds. But anyway, I think you meant sorting
> by time?
No, I meant having the option to pick what I want the results to be
sorted on. Dejanews has a great power search:
http://www.deja.com/home_ps.shtml
which includes the option to sort by relevance, subject, forum, author and
date. That's how I would like to see the sort options here. But knowing
that you most likely don't have the resources that dejanews has and how
flawlessly Lugnet runs on the current setup, I'm satisfied with editing the
URL for now :-)
> > I'd be interested in seeing some statistics on how far the average user goes
> > when given back let's say 10, 100 and 1,000 pages of results. It would help
> > in the design of an effective search engine.
>
> Me too. I'd expect a f(x)=1/x type of curve, but it would be fun to see actual
> numbers. :-)
It could be done. Include another version of jump.cgi into the 5 more, 10
more... on the search results page and log the number of results returned,
the IP address and the query subject. Then run an average, min, max query
grouped by all 3 fields. Sounds complicated, depends on how badly you want
to see the results. I wouldn't want to go through the process of
implementing that but would really like to see the results :-)
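One way to read the aggregation step, sketched in Python for illustration: take one log record per results page viewed, find the deepest offset each visitor reached for each query, and summarize those depths. The record layout `(ip, query, offset)` and the function name are assumptions; nothing in the thread specifies a log format.

```python
from collections import defaultdict
from statistics import mean


def depth_stats(records):
    """Summarize how deep visitors go into the search results.

    `records` is an iterable of (ip, query, offset) tuples, one per
    results page viewed (hypothetical layout). Raises ValueError if
    `records` is empty.
    """
    deepest = defaultdict(int)
    for ip, query, offset in records:
        key = (ip, query)
        deepest[key] = max(deepest[key], offset)
    depths = list(deepest.values())
    return {"min": min(depths), "max": max(depths), "mean": mean(depths)}


sample = [
    ("10.0.0.1", "sigma", 0),
    ("10.0.0.1", "sigma", 10),
    ("10.0.0.2", "qs", 0),
]
print(depth_stats(sample))
```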
In lugnet.admin.general, Dan Jezek writes:
> Wow! So you have terms for the ampersand options in a URL? My standpoint
> on this would be to put everything in a form and kill 2 birds with 1 stone -
> not having to think of how to name URL terms (unless you enjoy doing that)
> and having the search more user-friendly (not everyone will remember the
> options or find it easy to edit the URL).
Ya, exactly -- first name the URL components carefully and then put a
user-friendly layer on top of it. Best of both worlds.
> No, I meant having the option to pick what I want the results to be
> sorted on. Dejanews has a great power search:
>
> http://www.deja.com/home_ps.shtml
>
> which includes the option to sort by relevance, subject, forum, author and
> date. That's how I would like to see the sort options here.
Ah, I see. Yeah, that could be helpful in certain cases, if you're scouring
tons of results! I've needed to look things up on Deja.com, so I know what
you mean.
> But knowing
> that you most likely don't have the resources that dejanews has and how
> flawlessly Lugnet runs on the current setup, I'm satisfied with editing the
> URL for now :-)
There's an alternate form that avoids the &qs= thingie, so you don't have to
edit the URLs:
http://news.lugnet.com/admin/general/?n=8613
> It could be done. Include another version of jump.cgi into the 5 more, 10
> more... on the search results page and log the number of results returned,
> the IP address and the query subject.
These don't actually run through jump.cgi. But they're already logged by
httpd anyway. (That's how the jump.cgi logging is implemented as well.)
> Then run an average, min, max query
> grouped by all 3 fields. Sounds complicated, depends on how badly you want
> to see the results. I wouldn't want to go through the process of
> implementing that but would really like to see the results :-)
Hmm, it's all there now, except for logging the number of results produced.
I guess it could be as simple as open for append, flock, print, and close on
a filehandle inside of the search page...lemme think about it. Analyzing the
results and making a graph would be a snap with gnuplot.
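The open-for-append, lock, print, close sequence could look roughly like this. It is written in Python for illustration (the thread does not show the site's own code), it relies on `fcntl`, which is available on Unix-like systems only, and the log path, function name, and tab-separated layout are all assumptions.

```python
import fcntl


def log_result_count(path, query, n_results):
    """Append one line per search: the query and how many results it produced."""
    with open(path, "a") as fh:
        # Exclusive lock so concurrent requests don't interleave their lines.
        fcntl.flock(fh, fcntl.LOCK_EX)
        try:
            fh.write(f"{query}\t{n_results}\n")
            fh.flush()  # push the line out before the lock is released
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)
```

A whitespace-separated file like this can be read by gnuplot directly, provided the query text itself contains no spaces or tabs.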
I think it would be especially fun to compare the graph now to the way it was
(would have been) before the change...but alas, that data was never captured
for the old query engine and it's too late now.
--Todd
In lugnet.admin.general, Todd Lehman writes:
> > > It's actually in the nature of search engines to generate thousands of
> > > results.
> >
> > If given thousands of results, most search engines have some advanced
> > options like sorting.
>
> Well, they -are- sorted. They're always sorted -- always highest probability
> of relevance first, lowest last. Usually, the metric for relevance is a
> combination of non-temporal factors such as word frequencies, word
> proximities, and word orderings. I don't know of any search engine that
> doesn't sort (on some criteria) the matches it finds. But anyway, I think
> you meant sorting by time?
>
> I wonder if a little link at the top to re-deploy the search taking recentness
> into account (or conversely, turning it off if it's on) would be useful?
Todd, this would be really useful. I'll often search for a recent post, only
remembering the poster's name and maybe one or two keywords, and that the
post was in the past few days. I don't need two-year-old messages nearly as
frequently. Could you change the display so that when results have the same
score, more recent posts are displayed first? I think a score weighting for
recentness would be even more useful as part of the default setting --
obviously that would be up to you.
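The tie-breaking part of this request can be sketched as a two-key sort, shown here in Python for illustration. The field names `score` and `posted` are hypothetical, and this covers only the equal-score case, not a recentness weighting of the score itself.

```python
from datetime import date


def rank(results):
    """Order by score, highest first; among equal scores, newest first."""
    return sorted(results, key=lambda r: (r["score"], r["posted"]), reverse=True)


hits = [
    {"id": "a", "score": 80, "posted": date(1999, 3, 1)},
    {"id": "b", "score": 80, "posted": date(2001, 1, 15)},
    {"id": "c", "score": 95, "posted": date(1998, 7, 4)},
]
print([h["id"] for h in rank(hits)])  # ['c', 'b', 'a']
```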
--DaveL