Subject: Re: Raw FAQ data format (Was: Format of FAQ items)
Newsgroups: lugnet.faq
Date: Sat, 24 Apr 1999 16:58:01 GMT
  
In lugnet.faq, Jacob Sparre Andersen writes:
I'll try to stick to the raw data format here, and list the
ideas I have been able to extract from the discussion.

Sounds mostly good.  Catch my exceptions down below.

- The content of the entries should be marked up as a
  subset of HTML ("lynx -dump" is a possible tool for
  translation to plain text).

Or some other tool; but I agree, a well-defined subset of HTML can and should
be used.
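
As a rough illustration of what "a well-defined subset" could mean in practice, here is a minimal Python sketch that renders a whitelisted set of tags to plain text and reports anything outside the whitelist. The tag list and the names `ALLOWED_TAGS`, `SubsetToText` and `to_plain_text` are assumptions made up for this example; the thread has not settled on an actual subset, and this is not a replacement for "lynx -dump".

```python
from html.parser import HTMLParser

# Hypothetical whitelist; the actual subset has not been decided in the thread.
ALLOWED_TAGS = {"p", "br", "b", "i", "a", "ul", "li", "pre"}


class SubsetToText(HTMLParser):
    """Collects plain text and records any tag outside ALLOWED_TAGS."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &reg; into characters.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.rejected = set()

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            self.rejected.add(tag)
        elif tag == "li":
            self.parts.append("\n  * ")   # render list items as bullets
        elif tag in ("p", "br"):
            self.parts.append("\n")       # paragraph or line break

    def handle_data(self, data):
        self.parts.append(data)


def to_plain_text(markup):
    """Return (plain_text, sorted list of tags that are not in the subset)."""
    parser = SubsetToText()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts).strip(), sorted(parser.rejected)
```

Calling `to_plain_text` on an entry body gives both the text rendering and a list of offending tags, so the same pass could double as a validity check for the raw format.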

- These header entries have been suggested:
     Subject          [the question]
     Category         [category (and sub-category?) name]
     Content-Language [ISO 639 language code]
     Topic-Level      [integer, 0 is beginner/easy/simple]
     Version:         [author and ISO date]
     Newsgroups:      [comma-separated list of newsgroups
                       the question could appear in]
     Translation:     [translator, from language, latest
                       version string from the translated
                       entry]

(Please keep in mind, Jacob, that these are nits I'm picking.  :-)

"Newsgroups" would be more appropriately named "Location", indicating not just
a newsgroup but specific directories in the LUGNET data hierarchy.  Also,
something like "Original-Language" makes more sense than "Translation".

Also, I'm leaning more towards "Revision" rather than "Version".  BTW, what
*is* the format of an ISO date?  Good idea, Jacob -- we might as well fit
within existing standards.  :-,
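
On the ISO date question: the ISO 8601 calendar date form is YYYY-MM-DD, for example 1999-04-24. A small Python sketch, assuming the date part of a "Revision" header is written that way (the variable names are only for illustration):

```python
from datetime import date

# Format a date in ISO 8601 calendar form (YYYY-MM-DD).
stamp = date(1999, 4, 24).isoformat()

# Parse the same form back into a date object; raises ValueError if malformed.
parsed = date.fromisoformat("1997-12-24")

# ISO dates sort correctly as plain strings, which is handy for revisions.
ordered = sorted(["1999-04-24", "1997-12-24"])
```

One practical upside of this form is that string order and chronological order agree, so revision lines can be compared without parsing.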

Plus:
       Include:         [applies the headers of the included file]

- ASCII + HTML entities are allowed in the headers.

At least the &reg;-style chars.  I don't see much need for more HTML in the
headers.
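
To make the header idea concrete, here is a minimal Python sketch of reading one raw entry. It assumes a mail-style layout that nobody in the thread has actually agreed on: "Name: value" lines, indented continuation lines, and a blank line before the content. The function name `parse_entry` and the sample entry are invented for the example, and "Include" is not handled.

```python
import html


def parse_entry(raw):
    """Split a raw FAQ entry into (headers dict, body string).

    Assumed layout (not settled in the thread): 'Name: value' header
    lines, indented continuation lines, then a blank line and the body.
    """
    head, _, body = raw.partition("\n\n")
    headers = {}
    name = None
    for line in head.splitlines():
        if line[:1] in (" ", "\t") and name is not None:
            # Continuation line: append to the previous header's value.
            headers[name] += " " + line.strip()
        else:
            name, _, value = line.partition(":")
            name = name.strip()
            headers[name] = value.strip()
    # Headers are ASCII plus entities, so decode &reg; and friends here.
    return {k: html.unescape(v) for k, v in headers.items()}, body


# Made-up sample entry, for illustration only.
SAMPLE = (
    "Subject: What does LEGO&reg; mean?\n"
    "Content-Language: en\n"
    "Topic-Level: 0\n"
    "Revision: Example Author, 1999-04-24\n"
    "Location: lugnet.faq,\n"
    "    lugnet.general\n"
    "\n"
    "<p>Body goes here.</p>\n"
)

headers, body = parse_entry(SAMPLE)
```

Keeping the header values as ASCII with entities and decoding them only at read time means the raw files stay safe to move through any newsreader or mail gateway.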

Now I have some questions and ideas:
- Should we use ASCII or Latin-1 for the content character
  set?
- The content should of course not be a full HTML document.

Both of these are starting to get over my head.  My knee-jerk reaction to the
ASCII question is to just use the lower 128 (not counting the very lowest 32
of course :-), and use some form of encoding for any other characters -- at
least for the raw FAQ format.
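
One way the "lower 128 plus some form of encoding" idea could look, sketched in Python. Numeric character references are my assumption for the encoding; the post only says "some form of encoding". The function names are made up, and note that this does not strip the control characters below 32, it only replaces characters above 127.

```python
import html


def to_raw_ascii(text):
    """Replace every non-ASCII character with a numeric reference (&#NNN;)."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def from_raw_ascii(raw):
    """Decode numeric and named references back into characters."""
    return html.unescape(raw)


encoded = to_raw_ascii("Caf\u00e9 \u00ae")      # e-acute and registered sign
restored = from_raw_ascii(encoded)
```

The round trip is lossless for text that contains no literal ampersands; a literal "&" in the content would need to be written as `&amp;` first, as in HTML proper.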

Cheers,
- jsproat


