Subject: Re: .loc.au stats for October
Newsgroups: lugnet.loc.au
Date: Tue, 5 Nov 2002 14:02:09 GMT
Viewed: 1106 times

In lugnet.loc.au, Kerry Raymond writes:
> > Of course, to do this we'd need to work out how to measure the quality per
> > word or post quality values...
>
> The quality of a post can be coarsely approximated by applying quality metrics
> to each individual word, giving an overall posting content quality. So, say, a
> message involving quality words like "sensate", "post-expressionalism" and
> "45.7%" would generate a higher quality score than a message involving words
> like "lots", "stuff", and "wow". Indeed, one can even apply severe negative
> volume metrics to such words as "wasssssssssuuupppp" and "insectoids", thus
> reducing any postings involving such words to hideously low levels,
> unredeemable even in such contexts as "45.7% of all insectoid sets incorporate
> post-expressionalism detectable to the more sensate builder". Any message
> involving lots of numbers is almost certainly good, as these are either set
> numbers, part numbers, or Richie's monthly statistics, so numbers will
> generate high quality word ratings.
>
> However, the posting content quality is not the final score. You must then
> multiply it by bytes of new content divided by the bytes of included quoted
> content. This ensures that "Me too" messages are forced down the quality
> metric no matter how frequently post-expressionalism is mentioned.
>
> Then you take any quoted content and recursively apply the quality analysis to
> it, and then compare the quality of the new content with the quoted content.
> The quality of the quoted content is then a ceiling on the possible quality of
> the new content. This is intended to discourage the proliferation of threads
> of low quality by downgrading all subsequent contributions no matter how
> sensate.
>
> This produces the primary quality score for the posting.
>
> Finally, some people are fundamentally low-quality authors of postings, and
> this must be incorporated to produce the modified quality score. This cannot
> be done initially but can be introduced once a certain volume of primary
> quality score data has been collected. By determining average primary quality
> for a given author, one can determine an author quality as a moving average.
> You then take the ratio of specific author quality divided by average author
> quality to get the author-multiplier, which is then (as the name suggests)
> multiplied by the primary quality score to produce the modified quality
> score.
>
> Note. It is very important to determine the specific author quality and
> average author quality scores from the *primary* quality scores, and not from
> the modified quality scores. As can be appreciated, the use of modified
> quality scores to determine specific author quality and average author quality
> will introduce unbounded escalating feedback.
>
> Having determined the modified quality score for each posting, it is then
> entirely mechanical to determine the total quality of posts by that author in
> any given time period (e.g. a month) as well as to derive the average quality
> of posts by that author over the same time period.
>
> However, over a sufficiently long period, the average quality of posts over
> a given period is likely to approximate the author-multiplier (as a relative
> but not absolute scale). This makes it very difficult for anyone to
> significantly lift their game quality-wise. So, the use of moving average for
> computing the author-multiplier will need (over time, but probably not
> immediately) to incorporate an aging of older post quality data, probably
> based on some kind of inverse Poisson differential decay. I'm not sure what
> chi-value to use to stretch the probability curve, but I would think we'd be
> aiming for a half-life of around 3 months, so probably something in the range
> of 1.5 to 2 should be OK (using base "e" not 10, of course).
>
> So, while some people will argue that quantitative evaluation of quality is
> fundamentally flawed no matter what interpretation of Chomsky conceptual
> analysis you use, I say (quoting from The Matrix here) "never send a person to
> do a machine's job". My method can be entirely automated (essential for
> ongoing maintenance) and we could have aggregated ratings on each newsgroup,
> allowing the LUGnet traffic page to be based on quality rather than quantity
> metrics.
> Then when loc.au wants to take on the Italians (say), then we have to do it
> based on superior quality of postings and not just hitting send a lot.
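
A rough Python sketch of the scoring Kerry describes above, for anyone who wants to play with it. Everything numeric here is an assumption: the post names the words but not their scores, so WORD_SCORES, NUMBER_SCORE and all the function names are made up for illustration. The aging step uses plain exponential decay with a 90-day half-life as a stand-in, since the post does not pin down what an "inverse Poisson differential decay" would actually be.

```python
import re

# Hypothetical scores: the post names these words but gives no numbers.
WORD_SCORES = {
    "sensate": 3.0, "post-expressionalism": 3.0,
    "lots": -1.0, "stuff": -1.0, "wow": -1.0,
    "insectoids": -50.0,
}
NUMBER_SCORE = 2.0  # assumed bonus for any token containing a digit


def content_quality(text):
    """Sum per-word scores; unknown words score zero."""
    total = 0.0
    for token in text.lower().split():
        word = token.strip('".,;:!?()*')
        if any(ch.isdigit() for ch in word):
            total += NUMBER_SCORE
        else:
            total += WORD_SCORES.get(word, 0.0)
    return total


def split_post(text):
    """Split a post into new text and quoted text (one '>' level removed)."""
    new, quoted = [], []
    for line in text.splitlines():
        if line.lstrip().startswith(">"):
            quoted.append(re.sub(r"^\s*>\s?", "", line))
        else:
            new.append(line)
    return "\n".join(new), "\n".join(quoted)


def primary_quality(text):
    """Content quality, scaled by new/quoted bytes, capped by the quoted score."""
    new, quoted = split_post(text)
    score = content_quality(new)
    if quoted.strip():
        score *= len(new.encode()) / len(quoted.encode())
        # Recursion handles nested quotes: each call strips one '>' level.
        score = min(score, primary_quality(quoted))
    return score


def decayed_average(scores_with_age, half_life_days=90.0):
    """Average of (score, age_in_days) pairs; a weight halves every half-life.

    Assumes at least one pair is supplied.
    """
    weights = [0.5 ** (age / half_life_days) for _, age in scores_with_age]
    weighted = sum(s * w for (s, _), w in zip(scores_with_age, weights))
    return weighted / sum(weights)


def modified_quality(primary, author_average, overall_average):
    """Apply the author-multiplier (author average / overall average)."""
    return primary * (author_average / overall_average)
```

In this sketch a reply that quotes a low-scoring post can never score above it, however sensate its new content, and both averages passed to modified_quality would be computed from primary scores only, as the Note in the quoted text insists.
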

My concern with taking on the Italians is that they post in Italian.
Therefore any quality metric has to be multilingual. That goes without
saying, so that's presumably why Kerry didn't say it, but I needed to boost
my new/quoted material ratio, you know.

More important, though, the metric needs to take into account the
variance between languages in such areas as average word length, average
sentence length, average vocabulary and so forth, so that posts in German
don't score higher than they deserve because they use words like
"Hauptbahnhoffwurstyunge" where English would use "central station sausage
seller" in everyday usage.

Clearly English would be penalised here for using shorter words, although it
should be noted that there would be some advantage from increased sentence
length, and a full statistical analysis would be needed to determine whether
this was a sufficient offset.

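
One way the word-length correction might be sketched in Python: divide a post's average word length by a per-language baseline before scoring it. The baseline figures below are placeholders, not measured values, and the function names are hypothetical; real baselines would have to come from the full statistical analysis mentioned above.

```python
# Placeholder baselines (average word length per language). These are NOT
# measured values; they only illustrate the shape of the correction.
BASELINE_WORD_LEN = {"en": 5.0, "de": 6.0}


def avg_word_length(text):
    """Mean length of whitespace-separated words; 0.0 for empty text."""
    words = text.split()
    return sum(len(w) for w in words) / len(words) if words else 0.0


def normalised_word_length(text, lang):
    """Average word length relative to the language's assumed baseline."""
    return avg_word_length(text) / BASELINE_WORD_LEN[lang]
```

With these placeholder baselines the German compound still comes out well ahead of the four-word English phrase, so a single per-language divisor is clearly not a sufficient offset on its own.
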
> Indeed, increasing the quality ratings on a newsgroup can be achieved either
> by sending many high-quality posts, or by discouraging the sending of
> low-quality postings.

Yes, but how do you do this? Who bells the cat?

> Thus some newsgroups could achieve a higher quality aggregate through
> having no postings than most of the market newsgroups (which are likely to
> generate large negative aggregates).
>
> Waddya reckon?

Good start, but this should have been posted to off-topic.geek...

++Lar (who in college once wrote a program to count average word length,
sentence length and word frequency, and who then fed the corpus of Genesis
lyrics in as well as the corpus of Foreigner lyrics, and, entirely
straight-faced, asserted to his linguistics class that, based on the
analysis (wider word usage, less repetition, longer sentence length),
Genesis was a more literate band than Foreigner, and who therefore has some
sympathy for this undertaking of Kerry's)