General Discussions about LUGNET : 4854


Administrative / General / 4854	4853 \| 4855

Subject:	Re: Article scoring
Newsgroups:	lugnet.admin.general
Date:	Mon, 6 Mar 2000 04:06:35 GMT
Highlighted:	(details)
Viewed:	2286 times

Upon more reflection, I'm thinking that the "square-rooted-average" sum(V,1,n) w = ---------- sqrt(n) isn't so great afterall. Its biggest plus is that it distinguishes nicely between sets of votes which have the same average and a different number of voters, for example three +1's versus three hundred +1's. Its biggest minus, however, is that the amount of distinction is just too likely to be too much for everyday use. For example, I'm very concerned about what would happen to an astoundingly great message posted to a relatively low-traffic group such as one of the European .loc groups. Even if the entire population of readers of such a group rated a message positively, it still wouldn't stand a chance against a lesser-ranked article with ten or twenty times as many readers (voters in this case). And this would be doubly unfortunate if the message contained something understandable to non-speakers of the language (for example, links to model photos). So: Let's rephrase the original requirements of the scoring/weighting/ ranking fuction... 1. A large set of equal votes, e.g., {+1,+1,+1,+1,+1,+1,+1,+1,+1,+1}, *must* produce a higher rating than a relatively small set of equal votes, e.g., {+1,+1,+1}. 2. However, a large set of equal votes should not produce a rating *too* much higher than a relatively small set of votes. 3. A single vote should count for *something* (it should move the object's rating away from 0 assuming a nonzero vote), but a single vote should not count -too- much. In particular, a single vote must *not* be able to "peg" either end of the rating scale (e.g., +1 or -1). 4. For all n>0, n equal votes of +1 must produce a rating slightly higher than n-1 equal votes of +1. As n -> infinity, the difference between such ratings for successive values of n should approach zero. OK, that ought to about do it. Now, the old way (dividing sum(V,1,n) by sqrt(n)) could produce arbitrarily large ratings. Most would tend to be beteween -10 and +10, but theoretically there is no bound on the magnitude of the numbers. This is a lot like the way Warp Factors used to work on the old Star Trek shows: You'd take the Warp Factor, cube it, and that's how many times the speed of light you were going (so, Warp 8 was a little more than twice as fast as Warp 6). The new way I'm considering (a modified classic average) is a lot like the new Warp Factors on Star Trek -- where you can't go faster than Warp 10. In fact, there isn't anything faster than Warp 10; you can get as close as you want to 10, but you can never reach it (except for the cheat they did in a hokey ST:Voy episode...anyway...) Warp 9.999 is now *much* faster than Warp 9.99, and Warp 9.99 is much faster than Warp 9.9, etc.... OK, how does that apply to averages? First, let's say we want to restrict the final rating to the unit interval [-1,+1] (+0.99 would be like Warp 9.9). Just taking a simple classic average sum(V,1,n)/n gives values in this range, but for small values of n (1 in particular, and often 2, and sometimes even 3 and 4), an average can often "peg" the scale at exactly -1 or exactly +1. That's not good -- and it's super-duper bad when n=1. So let's "soften" the average by multiplying it by a number greater than 0 but less than 1. And let's compute that number such that, for small values of n, that number is small, and for large values of n, that number gets very close to 1. This sounds like a simple single-variable hyperbolic curve of the 1/x variety -- except that we need it to approach 1 as x->infinity rather than 0 as x->infinity -- so let's try the variant 1 - 1/x. n 1 - 1/n ---- ----------- 1 0.000 2 0.500 3 0.667 4 0.750 5 0.800 6 0.833 7 0.857 : : 10 0.900 20 0.950 100 0.990 200 0.995 1000 0.999 That's pretty good! The only problem is that with n=1, 1-1/n comes out to zero, which means that the first vote would be multiplied by zero, giving it no impact at all (until other votes came in later). Let's try another variant...let's pretend that there are 2 votes when there is actually only 1, and 3 when there are only 2, and 4 when there are only 3, etc. So let's add 1 to n and try 1 - 1/(n+1) instead... n 1 - 1/(n+1) ---- ----------- 1 0.500 2 0.667 3 0.750 4 0.800 5 0.833 6 0.857 7 0.875 : : 10 0.909 20 0.952 200 0.995 1000 0.999 That's better! Now, having only one vote produces a nonzero output (just like we want), but not too strongly as to peg the scale at either end. Having two votes is even better, and having three is even better, etc. And now, having only 5 votes is only 0.157 different from having 100 votes -- still plenty enough of a difference to sort by ratings, but not so much of a difference as to go unnoticed by linear-threshold-based filters. OK, let's put this 1 - 1/(n+1) multiplier to work! (Here's where this gets really fun!...) The regular classic "average" function takes the sum of votes V[1] through V[n] and divides that by the number of votes n. sum(V,1,n) w = ---------- n But we don't want that -- that's very bad for small values of n. We want *almost* that, but we want it multiplied by this new 1 - 1/(n+1) softener: sum(V,1,n) ( 1 ) w = ---------- ( 1 - ------- ) n ( n + 1 ) That mess can be simplified: sum(V,1,n) ( n + 1 1 ) = ---------- ( ------- - ------- ) n ( n + 1 n + 1 ) sum(V,1,n) ( n ) = ---------- ( ------- ) n ( n + 1 ) sum(V,1,n) = ---------- n + 1 Now, that's a nice cool tiny formula, to be sure, but there's actually something extra-cool about it: Summing from 1 to n over a set V[1..n] of votes is the same as summing from 1 to n+1 over a set V[1..n+1] whose n+1'st member is always zero. In other words, we can write: sum(V,1,n) + 0 = -------------- , V[1..n] defined n + 1 sum(V,1,n+1) = -------------- , V[1..n+1] defined, with V[n+1]=0 n + 1 In other words, what's written above is actually *still* the classic average formula -- it's just disguised! To un-disguise it, we can simply proscribe that every object being voted upon has n votes from humans and one vote (always zero) from the system -- or, equivalently, that every object being voted upon has n votes, n-1 of which are from humans and one of which (always equal to zero) is from the system. That brings things full-circle back to this: sum(V,1,n) = ------------ , V[1..n] defined, with V[1]=0 n What amuses me most about this is that the 1 - 1/(n+1) softener can actually exist as *part of* the classic pure-linear averaging function just by a very slight and trivial alteration of the way votes are accumulated. --Todd

Message has 3 Replies:

		Re: Article scoring
In lugnet.admin.general, Todd Lehman writes: [snip a lot of math] (...) where in here does it take into account how many people are in the group? Can you weigh in the number of people who are subscribed to that group? Since they are the ones most (...) (26 years ago, 6-Mar-00, to lugnet.admin.general)

		Re: Article scoring
Wow, that's cool! The geek in me is waking up... So basically what you said here, is that every message gets one vote from the system as a softner - and that the vote is always zero. Wow. Again, cool. -Shiri (a real geek at heart) (26 years ago, 6-Mar-00, to lugnet.admin.general)

		Re: Article scoring
(...) Isn't that redundant? Steve Bliss (26 years ago, 6-Mar-00, to lugnet.off-topic.fun)

Message is in Reply To:

		Re: Article scoring
(...) Oh -- while I was out grocery shopping I just realized this: What is sqrt(n)? It's nothing more than a fancy way of writing n^(1/2). So: What if the exponent didn't have to be exactly 1/2, but instead was allowed to vary? Then what dynamics (...) (26 years ago, 4-Mar-00, to lugnet.admin.general)

52 Messages in This Thread:

Entire Thread on One Page:: Nested: All | Brief | Compact | Dots
Linear: All | Brief | Compact
This Message and its Replies on One Page:: Nested: All | Brief | Compact | Dots
Linear: All | Brief | Compact

Custom Search