Subject: Re: Perl-based bulk set IDer...
Newsgroups: lugnet.general, lugnet.admin.general
Date: Wed, 21 Jul 1999 06:24:53 GMT
In lugnet.general, Micah Jaffe writes:
[...]
Some notes: it's pretty dumb about scanning for sets.  It picks up any 3-4
digit numbers and assumes it's a Lego set id.  It's also pretty dumb about
scanning the HTML returned from the lugnet search query, meaning if the
format of the HTML were to change drastically, then the script is b0rken.
What can I say, it was something I whipped up while avoiding something less
interesting at work for a couple hours.  It also doesn't try in any way to
deal with id collisions (i.e. if you feed it 6077, it'll just spit it out
as an unknown set, called "(???)").
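
For illustration, the scanning described above could look roughly like the sketch below. It is in Python rather than the original Perl, the name find_set_ids is made up, and treating only standalone runs of 3 or 4 digits as ids is an assumption (the original script may well match digits inside longer numbers too).

```python
import re

# Assumption: a candidate set id is a standalone run of 3 or 4 digits,
# i.e. not part of a longer number such as 12345.
SET_ID_RE = re.compile(r"(?<!\d)\d{3,4}(?!\d)")


def find_set_ids(text):
    """Return candidate set ids in order of first appearance, without duplicates."""
    seen = []
    for candidate in SET_ID_RE.findall(text):
        if candidate not in seen:
            seen.append(candidate)
    return seen
```

Like the script it imitates, this is "pretty dumb": any year, price, or phone-number fragment of the right length is reported as a set id.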

Say, did you know about the "output=plain" option?  For example,

   http://www.lugnet.com/pause/search/?query=6848&output=plain

It's a simpler output mode which makes parsing easier, and it's much less
bandwidth (about 1/10 the page size).  Actually, rather than parsing, it
was really created for the purpose of just dumping the HTTP output to STDOUT.
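
A minimal sketch of building such a query URL, using only the two parameters shown in the example above (query and output). The function name plain_search_url is hypothetical, and nothing here is fetched from the server.

```python
from urllib.parse import urlencode

# Base address taken from the example URL in the post.
BASE_URL = "http://www.lugnet.com/pause/search/"


def plain_search_url(set_id):
    """Build a search URL for one set id that asks for the plain output mode."""
    return BASE_URL + "?" + urlencode({"query": set_id, "output": "plain"})
```

Calling plain_search_url("6848") reproduces the example URL given above.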


This is a work independent of Lugnet and is not affiliated officially in
any way.  As far as I can see this doesn't violate any of the Lugnet Terms
of Use statement and I've tried to liberally acknowledge that all set
information is coming from Lugnet.

No, it doesn't directly violate any of the written terms of use, but it's
not always considered good netiquette to write scripts that query other
servers via HTTP, especially repeated bulk queries.  If it starts to bog
down the server, I'll have to block requests of that type, so please try
to keep it to reasonable levels.  (I've already had to block several poorly
written crawlers/robots which were trying to clomp through all the news
articles via HTTP at high speed -- a bad crawler no-no).
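
One way a script might keep itself to "reasonable levels" is to space its requests out. The sketch below is only an illustration: the helper name throttled and the 2-second default are assumptions, not a limit stated anywhere in this post.

```python
import time


def throttled(items, min_interval=2.0, sleep=time.sleep, clock=time.monotonic):
    """Yield items one at a time, leaving at least min_interval seconds between them.

    min_interval=2.0 is an arbitrary assumption; choose something the
    server operator would consider polite.
    """
    last = None
    for item in items:
        if last is not None:
            wait = min_interval - (clock() - last)
            if wait > 0:
                sleep(wait)
        last = clock()
        yield item
```

A bulk lookup would then loop with "for set_id in throttled(ids):" and make one request per iteration.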

If you plan to run queries frequently, I'd suggest downloading local copies
of the tab-delimited listings at <http://www.lugnet.com/pause/lists.html> and
then simply looking the info up from there rather than making across-the-net
HTML queries on-the-fly.
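
A rough sketch of that local-lookup approach follows. The column layout of the real listings is not described here, so this assumes the set number is in the first tab-separated column and the set name in the second; the names load_sets and lookup are made up. Keeping a list of names per number also copes with id collisions such as 6077.

```python
import csv
from collections import defaultdict


def load_sets(lines):
    """Map set number -> list of names from tab-delimited lines.

    Assumption: column 0 is the set number and column 1 is the set name.
    """
    table = defaultdict(list)
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) >= 2:
            table[row[0]].append(row[1])
    return table


def lookup(table, set_id):
    """Return every name recorded for set_id, or ["(???)"] if it is unknown."""
    return table.get(set_id, ["(???)"])
```

With a downloaded copy saved locally, "with open(path) as f: table = load_sets(f)" builds the table once, and every later lookup is a dictionary access with no network traffic.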

--Todd



