Subject: Re: .loc.au stats for October
Newsgroups: lugnet.loc.au
Date: Tue, 5 Nov 2002 23:37:12 GMT
Viewed: 1023 times
  
My concern here with taking on the Italians is that they post in Italian.
Therefore any quality metric has to be multilingual. That goes without
saying so that's presumably why Kerry didn't say it, but I needed to boost
my new/quoted material ratio, you know.

I had thought of discussing the I18N issues but decided against it for two
cogent reasons:

1. if the Italians were not alerted to it, we could have quickly slipped the
quality word metrics into place using only English -- thus all postings in
Italian would rate as 0 quality. Of course, this would be cheating but if we
want to beat the Italians, we may need to play the ungame and not just the
game.

2. if I covered every aspect in my original post, there would have been no
opportunity for high quality scoring follow-ups such as Larry's.
More important, though, is that the metric needs to take into account the
variance between languages in such areas as average word length, average
sentence length, average vocabulary and so forth, so that posts in German
don't unfairly score higher than they deserve because they use words like
"Hauptbahnhoffwurstyunge" where English would use "central station sausage
seller" in everyday usage.

Or in parts of North America, the even more colloquial  "hotdog stand" might be
used, and that would clearly be a loser in any quality metric.

But on the other hand, any posting written in German is likely to be of higher
quality than one written in English. Frankly anyone who can maintain a
vocabulary of words with such variety and length as those in German clearly
isn't going to waste that learning by merely saying "me too" (or appropriately
translated) in LUGNET. As you rightly point out, German has a specific word for
such a precise concept as "Hauptbahnhoffwurstyunge" instead of the far more
pedestrian English version requiring 4 words of pretty average quality.
Similarly, there are words in German such as "infobahnwurstyune" for which
there is no known English translation as English speakers just cannot
conceptualise such a thing. Surely a German posting using such a word ought to
rate more highly than an English posting which cannot even express the concept.

Indeed, increasing the quality ratings on a newsgroup can be achieved either
by sending many high-quality posts, or by discouraging the sending of
low-quality postings.

Yes but how do you do this? Who bells the cat?

Well, I like to lead by example here, Larry. Look at the responses to my
posting on quality metrics. Do you see quality responses? Yes! Do you see
anyone mentioning "wasssssuuuuupppp" or "insectoids"? No! If enough of us do
our bit by sending quality posts, then with luck, we will intellectually
intimidate the low-quality posters into purchasing a dictionary (or at least
finding one on-line) before replying. It's hard to say "me too" if you haven't
the faintest idea what you are agreeing to.

++Lar (who in college once wrote a program to count average word length,
sentence length and word frequency, and who then fed the corpus of Genesis
lyrics in as well as the corpus of Foreigner lyrics, and who, entirely
straightfacedly, asserted to his linguistics class that, based on the
analysis, (wider word usage, less repetition, longer sentence length)
Genesis was a more literate band than Foreigner, and who therefore has some
sympathy for this undertaking of Kerry's)
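
For anyone wanting to try this at home, here is a minimal sketch of the kind of counting program described above. It is not Larry's actual program; the function name `text_stats`, the crude sentence splitting on `.`, `!` and `?`, and the use of distinct-words-over-total-words as a stand-in for "wider word usage" are all assumptions made for illustration.

```python
import re
from collections import Counter


def text_stats(text):
    """Return simple 'literacy' statistics for a block of text.

    Hypothetical helper: sentences are split naively on . ! ? and
    words are runs of Unicode letters, so it copes with non-English text
    but splits contractions such as "isn't" into two words.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[^\W\d_]+", text.lower())
    freq = Counter(words)
    n_words = len(words)
    return {
        # Mean number of letters per word
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        # Mean number of words per sentence
        "avg_sentence_length": n_words / len(sentences) if sentences else 0.0,
        # Distinct words / total words: higher means less repetition
        "distinct_word_ratio": len(freq) / n_words if n_words else 0.0,
        # The three most frequent words with their counts
        "most_common": freq.most_common(3),
    }


if __name__ == "__main__":
    print(text_stats("Me too. Me too. I concur entirely."))
```

Feeding two corpora through `text_stats` and comparing the resulting numbers is all the "analysis" amounts to; whether that measures literacy is left to the linguistics class.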

Here at DSTC, we have a group that does research into word adjacency in various
text corpii (yes, it is a marvellous word, isn't it?) in order to intuit the
semantic context of the text. The motivation for this work is to then create
search engines that, instead of trying to match the actual keywords supplied
against a WWW page, translate the keywords into a semantic context vector and
then match the WWW pages against the vector. So, if you typed in "Antarctic bird
breeding", it could match a page that did not mention any of those 3 words but
did mention "penguin nesting", based on the observed high level of adjacency of
such words in many corpii of text. I don't work in that area myself, but I do
attend their seminars etc.
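
A rough sketch of the general idea, not the DSTC group's actual method: build a vector for each word from the words that appear near it in some corpus, sum those vectors for a query or a page, and compare the results by cosine similarity. The names `cooccurrence_vectors`, `context_vector` and `cosine`, the window size of 2, and the three-line toy corpus are all invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Words are runs of Unicode letters, lowercased.
    return re.findall(r"[^\W\d_]+", text.lower())


def cooccurrence_vectors(docs, window=2):
    """Map each word to a Counter of the words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for doc in docs:
        words = tokenize(doc)
        for i, word in enumerate(words):
            lo = max(0, i - window)
            hi = min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][words[j]] += 1
    return vectors


def context_vector(text, vectors):
    """Sum the adjacency vectors of every known word in `text`."""
    total = Counter()
    for word in tokenize(text):
        total.update(vectors.get(word, {}))
    return total


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    # Toy corpus, made up for this example.
    corpus = [
        "penguin nesting antarctic bird breeding colony",
        "antarctic penguin breeding season",
        "lego brick train station",
    ]
    vectors = cooccurrence_vectors(corpus)
    query = context_vector("Antarctic bird breeding", vectors)
    print(cosine(query, context_vector("penguin nesting", vectors)))
    print(cosine(query, context_vector("lego train", vectors)))
```

With this toy corpus the "penguin nesting" page scores above zero against the query even though it shares no keywords with it, while the "lego train" page scores zero. A real system would need a far larger corpus and some weighting of common words.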

Kerry



Message has 2 Replies:
  Re: .loc.au stats for October
 
You are of course correct. "Me too" posts are generally of a fairly low quality. I suggest that we use "I concur" instead, or "you are of course correct." Regarding online dictionaries: I use a little .exe file that latches itself to IE. When you (...) (22 years ago, 6-Nov-02, to lugnet.loc.au)
  Re: .loc.au stats for October
 
(...) It would appear it's working (URL) .debate be far behind? And then .general? The mind boggles. Good show, Kerry! (22 years ago, 6-Nov-02, to lugnet.loc.au)
