Todd, I've read several messages from the thread, but not all of them,
so please excuse me if this has already been mentioned.
I think the way web browsers automatically handle URLs in a message body
would also suit this job. If you type the whole URL starting with
"http://", the browser turns it into a hypertext link, but if you
only type "lugnet.com" it does nothing. In the same way, if the poster
wants the set numbers mentioned in a message to be shown automatically
somewhere on the page, he/she should use a format that your scripts can
identify, like (I'm just making it up):
"I found a 6331 for 10 USD last week in blah blah.." ---do nothing
"I found a //6331// for 10 USD last week in blah blah.." ---show its
picture thumbnail.
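To make the idea concrete, here is a rough sketch of how such a marker could be picked up when the page is displayed, leaving the stored article untouched. The function name linkify_sets and the SEARCH_URL value are made up for illustration; I don't know what your scripts actually look like.

```shell
#!/bin/sh
# Hypothetical display-time filter: reads article text on stdin and writes
# marked-up text on stdout. The article file itself is never rewritten.
# SEARCH_URL is a placeholder assumption, not a real LUGNET address.
SEARCH_URL="http://www.example.com/search?q="

linkify_sets() {
    # Only 3- or 4-digit numbers wrapped in //...// become links;
    # bare numbers and ordinary "http://" URLs pass through unchanged.
    sed 's#//\([0-9]\{3,4\}\)//#<a href="'"${SEARCH_URL}"'\1">\1</a>#g'
}
```

It would be used as a filter, for example `linkify_sets < article.txt`, so the poster decides which numbers are set numbers and the script never has to guess.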
Selçuk
Todd Lehman wrote in message <3788db9e.1624282@lugnet.com>...
> In lugnet.publish, Kevin Loch <kloch@NOSPMkl.net> writes:
> > That's a great idea! I would love to see this! How about detecting any
> > three or four digit numbers and linking them to the search page.
> >
> > just run this every 5-10 min.:
> >
> > #!/bin/sh
> > filetmp1=/tmp/hypertmp1
> > filetmp2=/tmp/hypertmp2
> > filelist=/tmp/hyperlist
> > newsroot=/var/news/
> > #
> > cd ${newsroot}
> > find . -type f > ${filelist}
> > for f in `cat ${filelist}`
> > do
> > sed 's/[0-9][0-9][0-9][0-9]/<a href=http:\/\/www.lugnet.com\/pause\/search\/?=&>&<\/a>/g' ${f} > ${filetmp1}
> > sed 's/[0-9][0-9][0-9]/<a href=http:\/\/www.lugnet.com\/pause\/search\/?=&>&<\/a>/g' ${filetmp1} > ${filetmp2}
> > mv ${filetmp2} ${f}
> > done
> >
> > That would be really REALLY cool!
>
>
> Yikes! -- Kevin, surely you jest! :)
>
>
> Problems with the general approach above:
>
> * Iterates over tens of thousands of files every few minutes, rewriting
> each one whether it needs to or not. Doesn't track which articles
> have and have not yet been modified. Wastes 1-2 minutes of CPU time
> on each run and does thousands of unnecessary disk I/O operations.
>
> * Destroys the original article content as originally posted by the
> user. Makes it impossible to tell whether the original article had
> just a number or a full hyperlink. Makes it impossible to revert to
> the original content if a bug is discovered. Also destroys the
> original file timestamp.
>
> * Turns most lines of text into unreadable >80 column messes.
>
> * Rewrites the original raw NNTP article with embedded HTML, which
> won't display correctly on correctly working newsreaders because the
> article content-type is still text/plain.
>
> * Rewrites the text with hard-coded URLs that are subject to change.
>
> * Only updates articles periodically rather than instantaneously. Someone
> views the article on the homepage 2 minutes after it's posted and they
> don't see the hyperlinks. Someone else views it 10 minutes later and
> they -do- see the hyperlinks. (This is the problem that the threading
> display currently has -- it's on a 1-minute cron job.)
>
>
> Bugs in the code above:
>
> * Doesn't check whether the previous invocation from cron is still
> running or has completed. If any run were to take longer than the
> cron interval, two copies would then run simultaneously, and then
> three, and then four, and within a few hours the system could crash.
>
> * Matches 3- and 4-digit numbers within words, for example "12345678"
> or "foo6991bar". Worse, matches same within URLs that it itself
> wrote earlier, causing unrestrained and infinite expansion.
>
>
> Most of these problems could be avoided by processing the article right as
> it arrives rather than during a periodic cron sweep later, but altering the
> body-content of the actual article is still playing with fire.
>
> --Todd
>
> p.s. (Don't get me wrong -- I agree in theory that having links somehow
> would be a nice thing...I've thought this through carefully many times
> before and don't have a good solution yet.)
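
On the overlapping cron runs mentioned in the bug list above: one common guard is an atomic lock directory. This is only a sketch under my own assumptions; LOCK_DIR, acquire_lock, release_lock and run_once are hypothetical names, and it does nothing about a stale lock left behind by a crashed run.

```shell
#!/bin/sh
# Assumed lock location; any path on a local filesystem would do.
LOCK_DIR="${LOCK_DIR:-/tmp/linkify.lock}"

acquire_lock() {
    # mkdir either creates the directory or fails if it already exists,
    # so only one process can succeed at a time.
    mkdir "$LOCK_DIR" 2>/dev/null
}

release_lock() {
    rmdir "$LOCK_DIR" 2>/dev/null
}

run_once() {
    # Skip this run entirely if an earlier one still holds the lock.
    if ! acquire_lock; then
        echo "previous run still active, skipping" >&2
        return 1
    fi
    "$@"
    status=$?
    release_lock
    return $status
}
```

A cron entry would then call something like `run_once ./sweep.sh` (script name made up), so a slow sweep causes the next one to be skipped instead of piling up.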