To LUGNET HomepageTo LUGNET News HomepageTo LUGNET Guide Homepage
 Help on Searching
 
Post new message to lugnet.admin.generalOpen lugnet.admin.general in your NNTP NewsreaderTo LUGNET News Traffic PageSign In (Members)
 Administrative / General / 4854
4853  |  4855
Subject: 
Re: Article scoring
Newsgroups: 
lugnet.admin.general
Date: 
Mon, 6 Mar 2000 04:06:35 GMT
Highlighted: 
(details)
Viewed: 
941 times
  
Upon more reflection, I'm thinking that the "square-rooted-average"

         sum(V,1,n)
   w  =  ----------
          sqrt(n)

isn't so great afterall.  Its biggest plus is that it distinguishes nicely
between sets of votes which have the same average and a different number of
voters, for example three +1's versus three hundred +1's.  Its biggest minus,
however, is that the amount of distinction is just too likely to be too much
for everyday use.

For example, I'm very concerned about what would happen to an astoundingly
great message posted to a relatively low-traffic group such as one of the
European .loc groups.  Even if the entire population of readers of such a
group rated a message positively, it still wouldn't stand a chance against
a lesser-ranked article with ten or twenty times as many readers (voters in
this case).  And this would be doubly unfortunate if the message contained
something understandable to non-speakers of the language (for example, links
to model photos).

So:  Let's rephrase the original requirements of the scoring/weighting/
ranking fuction...

1.  A large set of equal votes, e.g., {+1,+1,+1,+1,+1,+1,+1,+1,+1,+1}, *must*
    produce a higher rating than a relatively small set of equal votes, e.g.,
    {+1,+1,+1}.

2.  However, a large set of equal votes should not produce a rating *too* much
    higher than a relatively small set of votes.

3.  A single vote should count for *something* (it should move the object's
    rating away from 0 assuming a nonzero vote), but a single vote should not
    count -too- much.  In particular, a single vote must *not* be able to
    "peg" either end of the rating scale (e.g., +1 or -1).

4.  For all n>0, n equal votes of +1 must produce a rating slightly higher
    than n-1 equal votes of +1.  As n -> infinity, the difference between such
    ratings for successive values of n should approach zero.

OK, that ought to about do it.

Now, the old way (dividing sum(V,1,n) by sqrt(n)) could produce arbitrarily
large ratings.  Most would tend to be beteween -10 and +10, but theoretically
there is no bound on the magnitude of the numbers.  This is a lot like the
way Warp Factors used to work on the old Star Trek shows:  You'd take the
Warp Factor, cube it, and that's how many times the speed of light you were
going (so, Warp 8 was a little more than twice as fast as Warp 6).  The new
way I'm considering (a modified classic average) is a lot like the new Warp
Factors on Star Trek -- where you can't go faster than Warp 10.  In fact,
there isn't anything faster than Warp 10; you can get as close as you want
to 10, but you can never reach it (except for the cheat they did in a hokey
ST:Voy episode...anyway...)  Warp 9.999 is now *much* faster than Warp 9.99,
and Warp 9.99 is much faster than Warp 9.9, etc....

OK, how does that apply to averages?  First, let's say we want to restrict the
final rating to the unit interval [-1,+1] (+0.99 would be like Warp 9.9).
Just taking a simple classic average sum(V,1,n)/n gives values in this range,
but for small values of n (1 in particular, and often 2, and sometimes even 3
and 4), an average can often "peg" the scale at exactly -1 or exactly +1.
That's not good -- and it's super-duper bad when n=1.

So let's "soften" the average by multiplying it by a number greater than 0
but less than 1.  And let's compute that number such that, for small values
of n, that number is small, and for large values of n, that number gets very
close to 1.  This sounds like a simple single-variable hyperbolic curve of
the 1/x variety -- except that we need it to approach 1 as x->infinity rather
than 0 as x->infinity -- so let's try the variant  1 - 1/x.

      n    1 - 1/n
   ----  -----------
      1     0.000
      2     0.500
      3     0.667
      4     0.750
      5     0.800
      6     0.833
      7     0.857
      :      :
     10     0.900
     20     0.950
    100     0.990
    200     0.995
   1000     0.999

That's pretty good!  The only problem is that with n=1, 1-1/n comes out to
zero, which means that the first vote would be multiplied by zero, giving it
no impact at all (until other votes came in later).

Let's try another variant...let's pretend that there are 2 votes when there
is actually only 1, and 3 when there are only 2, and 4 when there are only
3, etc.  So let's add 1 to n and try 1 - 1/(n+1) instead...

      n  1 - 1/(n+1)
   ----  -----------
      1     0.500
      2     0.667
      3     0.750
      4     0.800
      5     0.833
      6     0.857
      7     0.875
      :      :
     10     0.909
     20     0.952
    200     0.995
   1000     0.999

That's better!  Now, having only one vote produces a nonzero output (just
like we want), but not too strongly as to peg the scale at either end.  Having
two votes is even better, and having three is even better, etc.

And now, having only 5 votes is only 0.157 different from having 100 votes --
still plenty enough of a difference to sort by ratings, but not so much of a
difference as to go unnoticed by linear-threshold-based filters.

OK, let's put this  1 - 1/(n+1)  multiplier to work!  (Here's where this gets
really fun!...)

The regular classic "average" function takes the sum of votes V[1] through
V[n] and divides that by the number of votes n.

         sum(V,1,n)
   w  =  ----------
              n

But we don't want that -- that's very bad for small values of n.  We want
*almost* that, but we want it multiplied by this new  1 - 1/(n+1)  softener:


         sum(V,1,n) (           1    )
   w  =  ---------- (  1  -  ------- )
              n     (         n + 1  )

That mess can be simplified:

         sum(V,1,n) (  n + 1         1    )
      =  ---------- ( -------  -  ------- )
              n     (  n + 1       n + 1  )

         sum(V,1,n) (    n    )
      =  ---------- ( ------- )
              n     (  n + 1  )

         sum(V,1,n)
      =  ----------
            n + 1

Now, that's a nice cool tiny formula, to be sure, but there's actually
something extra-cool about it:  Summing from 1 to n over a set V[1..n] of
votes is the same as summing from 1 to n+1 over a set V[1..n+1] whose n+1'st
member is always zero.  In other words, we can write:

         sum(V,1,n) + 0
      =  --------------  ,    V[1..n] defined
              n + 1

          sum(V,1,n+1)
      =  -------------- ,     V[1..n+1] defined, with V[n+1]=0
              n + 1

In other words, what's written above is actually *still* the classic average
formula -- it's just disguised!  To un-disguise it, we can simply proscribe
that every object being voted upon has n votes from humans and one vote
(always zero) from the system -- or, equivalently, that every object being
voted upon has n votes, n-1 of which are from humans and one of which (always
equal to zero) is from the system.  That brings things full-circle back to
this:

          sum(V,1,n)
      =  ------------ ,     V[1..n] defined, with V[1]=0
               n

What amuses me most about this is that the 1 - 1/(n+1)  softener can actually
exist as *part of* the classic pure-linear averaging function just by a very
slight and trivial alteration of the way votes are accumulated.

--Todd



Message has 3 Replies:
  Re: Article scoring
 
In lugnet.admin.general, Todd Lehman writes: [snip a lot of math] (...) where in here does it take into account how many people are in the group? Can you weigh in the number of people who are subscribed to that group? Since they are the ones most (...) (25 years ago, 6-Mar-00, to lugnet.admin.general)
  Re: Article scoring
 
Wow, that's cool! The geek in me is waking up... So basically what you said here, is that every message gets one vote from the system as a softner - and that the vote is always zero. Wow. Again, cool. -Shiri (a real geek at heart) (25 years ago, 6-Mar-00, to lugnet.admin.general)  
  Re: Article scoring
 
(...) Isn't that redundant? Steve Bliss (25 years ago, 6-Mar-00, to lugnet.off-topic.fun)

Message is in Reply To:
  Re: Article scoring
 
(...) Oh -- while I was out grocery shopping I just realized this: What is sqrt(n)? It's nothing more than a fancy way of writing n^(1/2). So: What if the exponent didn't have to be exactly 1/2, but instead was allowed to vary? Then what dynamics (...) (25 years ago, 4-Mar-00, to lugnet.admin.general)  

52 Messages in This Thread:



















Entire Thread on One Page:
Nested:  All | Brief | Compact | Dots
Linear:  All | Brief | Compact

This Message and its Replies on One Page:
Nested:  All | Brief | Compact | Dots
Linear:  All | Brief | Compact
    

Custom Search

©2005 LUGNET. All rights reserved. - hosted by steinbruch.info GbR