Subject: Re: Raw FAQ data format (Was: Format of FAQ items)
Newsgroups: lugnet.faq
Date: Sun, 25 Apr 1999 04:15:25 GMT
  
In lugnet.faq, jsproat@geocities.com (Sproaticus) writes:
>> - The content of the entries should be marked up as a
>>   subset of HTML ("lynx -dump" is a possible tool for
>>   translation to plain text).
>
> Or some other tool; but I agree, a well-defined subset of HTML can
> and should be used.

Oh man, I'm HOT on "lynx -dump -force_html"!!  It doesn't do an absolutely
perfect job, but it comes *so* close, and I'll bet it can get even
closer by specifying a custom config file on the command line.


>> - ASCII + HTML entities are allowed in the headers.
>
> At least the &reg;-style chars.  I don't see much need for more HTML in the
> headers.

Agreed -- only &xxx; entities ought to be allowed in the headers, IMO...
And if the content charset is Latin-1 instead of pure 7-bit ASCII, then this
can be further reduced to &lt; &gt; &quot; &amp;.
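
A header check for that rule could be sketched as follows, assuming the rule is "no tags, and every `&` must begin a well-formed entity", with an optional Latin-1 mode that accepts only the four markup entities. The regular expression and the function name `header_is_valid` are illustrative, not an agreed format.

```python
import re

# An entity is &name; or &#digits; (hex references are left out for brevity).
ENTITY = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+);")

# With a Latin-1 charset, only the markup characters need entities.
LATIN1_MINIMAL = {"&lt;", "&gt;", "&quot;", "&amp;"}


def header_is_valid(header, latin1=False):
    """Return True if a header line obeys the entities-only rule."""
    if "<" in header or ">" in header:
        return False  # HTML elements (tags) are not allowed in headers
    if latin1 and any(e not in LATIN1_MINIMAL for e in ENTITY.findall(header)):
        return False  # e.g. &reg; should be written directly as the character
    # Strip every well-formed entity; any '&' left over is a stray one.
    return "&" not in ENTITY.sub("", header)
```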


>> Now I have some questions and ideas:
>> - Should we use ASCII or Latin-1 for the content character
>>   set?
>> - The content should of course not be a full HTML document.

Both of these are starting to get over my head.  My knee-jerk reaction to the
ASCII question is to just use the lower 128 (not counting the very lowest 32
of course :-), and use some form of encoding for any other characters -- at
least for the raw FAQ format.
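
One possible "form of encoding" for the pure-ASCII option is to write every character above the lower 128 as a decimal numeric character reference. The sketch below uses Python's built-in `xmlcharrefreplace` error handler for that; allowing tab and newline among the control characters is an assumption, and `to_ascii_raw` is a hypothetical name.

```python
def to_ascii_raw(text):
    """Return text as 7-bit ASCII, writing other characters as &#NNN; refs."""
    for ch in text:
        code = ord(ch)
        # Control characters (0-31 and DEL) are rejected; tab and newline
        # are assumed to be allowed, since a raw text file needs line breaks.
        if (code < 32 or code == 127) and ch not in "\t\n":
            raise ValueError(f"control character {code} is not allowed")
    # "xmlcharrefreplace" turns each non-ASCII character into a decimal
    # numeric character reference such as &#252;.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")
```

This step does not escape `&`, `<` or `>`; in a full pipeline those would be escaped first, so that the references produced here are not themselves escaped a second time.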

Hmm.  Well, either way, the following three characters will have to be
written as entities:

   &  =>  &amp;
   <  =>  &lt;
   >  =>  &gt;

and *perhaps* the double-quote character should be forced to be written as
an entity as well:

   "  =>  &quot;
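
The escaping itself is a few string replacements, as in this minimal sketch (the name `escape_markup` is hypothetical). The one subtlety is ordering: `&` has to be replaced first, or the `&` inside a freshly written `&lt;` would be escaped again.

```python
def escape_markup(text, escape_quote=False):
    """Escape the characters that are special in HTML text."""
    # '&' must go first; otherwise '&lt;' would become '&amp;lt;'.
    text = text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    if escape_quote:
        # The optional fourth rule: double quotes as entities as well.
        text = text.replace('"', "&quot;")
    return text
```

Python's standard `html.escape(s, quote=False)` performs the same three replacements; with `quote=True` it also escapes the single quote (as `&#x27;`), which goes beyond the rules proposed here.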

But apart from those, wouldn't it simplify editing a ton (and make it much
safer) if characters at code 128 and above were just written directly in their
Latin-1 encoding, e.g.:

   ®  instead of  &reg;
   å  instead of  &aring;
   ü  instead of  &uuml;
   ñ  instead of  &ntilde;

I can convert  HTML <=> Latin-1  extremely easily on the fly.
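
A conversion in both directions might be sketched with the entity tables in Python's `html.entities` module. This is not the converter referred to above, only an illustration under two assumptions: the four markup entities always stay as entities, and only codes 160 to 255 are converted, since 128 to 159 are control codes in Latin-1 and have no entity names. Characters above 255 pass through unchanged.

```python
import re
from html.entities import codepoint2name, name2codepoint

KEEP = {"amp", "lt", "gt", "quot"}  # markup escapes that must stay entities
NAMED = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def html_to_latin1(text):
    """Replace named entities for Latin-1 characters with the characters."""
    def repl(match):
        name = match.group(1)
        code = name2codepoint.get(name)
        if name in KEEP or code is None or code > 255:
            return match.group(0)  # leave markup escapes and others alone
        return chr(code)
    return NAMED.sub(repl, text)


def latin1_to_html(text):
    """Replace characters 160-255 with their named entities."""
    return "".join(
        f"&{codepoint2name[ord(ch)]};" if 160 <= ord(ch) <= 255 else ch
        for ch in text
    )
```

For example, `html_to_latin1("LEGO&reg; &amp; K&aring;re")` gives `"LEGO® &amp; Kåre"`, and `latin1_to_html` turns it back.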

--Todd



Message has 1 Reply:
  Re: Raw FAQ data format (Was: Format of FAQ items)
 
(...) If we ban HTML _elements_ from the headers, then we don't need to escape '<' and '>'. There has never been a need to escape '"'. If we want to allow numeric character references outside Latin-1 (like '&#805;') we still have to escape (...) (26-Apr-99, to lugnet.faq)

Message is in Reply To:
  Re: Raw FAQ data format (Was: Format of FAQ items)
 
(...) Sounds mostly good. Catch my exceptions down below. (...) Or some other tool; but I agree, a well-defined subset of HTML can and should be used. (...) (Please keep in mind Jacob, that these are nits I'm picking. :-) "Newsgroups" would be more (...) (24-Apr-99, to lugnet.faq)


©2005 LUGNET. All rights reserved. - hosted by steinbruch.info GbR