Subject: Re: Raw FAQ data format (Was: Format of FAQ items)
Newsgroups: lugnet.faq
Date: Sun, 25 Apr 1999 04:15:25 GMT
Viewed: 1765 times

In lugnet.faq, jsproat@geocities.com (Sproaticus) writes:
> > - The content of the entries should be marked up as a
> > subset of HTML ("lynx -dump" is a possible tool for
> > translation to plain text).
>
> Or some other tool; but I agree, a well-defined subset of HTML can
> and should be used.
Oh man, I'm HOT on "lynx -dump -force_html"!! It doesn't do an absolutely
perfect job, but it comes *so* close, and I'll bet it can get even
closer by specifying a custom config file on the command line.
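
A minimal sketch of how the "lynx -dump -force_html" step might be driven from a script. The helper names build_lynx_command and html_to_text are hypothetical, as is any config file path passed in; -stdin and -cfg=FILE are standard lynx options, but the exact output depends on the installed lynx version and its configuration.

```python
import shutil
import subprocess


def build_lynx_command(cfg_path=None):
    """Build the argv for rendering HTML read from stdin as plain text."""
    cmd = ["lynx", "-dump", "-force_html", "-stdin"]
    if cfg_path is not None:
        # Assumption: a custom config file is supplied by the caller.
        cmd.append("-cfg=" + cfg_path)
    return cmd


def html_to_text(html_fragment, cfg_path=None):
    """Pipe an HTML fragment through lynx and return the plain-text dump."""
    if shutil.which("lynx") is None:
        raise RuntimeError("lynx is not installed on this system")
    result = subprocess.run(
        build_lynx_command(cfg_path),
        # Characters outside Latin-1 are sent as numeric character references.
        input=html_fragment.encode("latin-1", errors="xmlcharrefreplace"),
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("latin-1", errors="replace")
```

Calling html_to_text("<p>Some <b>FAQ</b> entry</p>") would return whatever lynx renders for that fragment.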
> > - ASCII + HTML entities are allowed in the headers.
>
> At least the &reg;-style chars. I don't see much need for more HTML in the
> headers.
Agreed -- only &xxx; entities ought to be allowed in the headers, IMO...
And if the content charset is Latin-1 instead of pure 7-bit ASCII, then this
can be further reduced to &lt; &gt; &quot; &amp;.
> > Now I have some questions and ideas:
> > - Should we use ASCII or Latin-1 for the content character
> > set?
> > - The content should of course not be a full HTML document.
>
> Both of these are starting to get over my head. My knee-jerk reaction to the
> ASCII question is to just use the lower 128 (not counting the very lowest 32
> of course :-), and use some form of encoding for any other characters -- at
> least for the raw FAQ format.
Hmm. Well, either way, the following three characters will have to be
written as entities:
& => &amp;
< => &lt;
> => &gt;
and *perhaps* the double-quote character should be forced to be written as
an entity as well:
" => &quot;
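
The mandatory escapes listed above can be sketched in a few lines of Python. The function name escape_markup and its escape_quotes switch are hypothetical, chosen only for illustration; the switch reflects that escaping the double quote is still an open question here.

```python
def escape_markup(text, escape_quotes=True):
    """Escape the characters that are significant in HTML markup."""
    # The ampersand must be handled first, otherwise the "&" that the
    # later replacements introduce would be escaped a second time.
    text = text.replace("&", "&amp;")
    text = text.replace("<", "&lt;")
    text = text.replace(">", "&gt;")
    if escape_quotes:
        text = text.replace('"', "&quot;")
    return text
```

For example, escape_markup('a < b & "c"') gives 'a &lt; b &amp; &quot;c&quot;'.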
But apart from those, wouldn't it simplify editing a ton (and make it much
much safer) if characters above 128 were just written directly in their
Latin-1 encoding, i.e.--?
® instead of &reg;
å instead of &aring;
ü instead of &uuml;
ñ instead of &ntilde;
I can convert HTML <=> Latin-1 extremely easily on the fly.
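
As an illustration only (not the converter referred to above), a two-way conversion between Latin-1 characters and HTML entities could look like the following sketch. The function names latin1_to_entities and entities_to_latin1 are hypothetical; the entity table comes from Python's standard html.entities module.

```python
import html
import re
from html.entities import codepoint2name

ENTITY_RE = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|\w+);")


def latin1_to_entities(text):
    """Rewrite every character above 127 as a named or numeric entity."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            out.append(ch)
        elif code in codepoint2name:
            out.append("&" + codepoint2name[code] + ";")
        else:
            out.append("&#%d;" % code)
    return "".join(out)


def entities_to_latin1(text):
    """Expand entities for characters 128-255 and leave all others alone.

    Markup-significant entities such as &amp; and &lt; stay escaped,
    so the result is still safe to treat as HTML.
    """
    def expand(match):
        decoded = html.unescape(match.group(0))
        if len(decoded) == 1 and 127 < ord(decoded) < 256:
            return decoded
        return match.group(0)

    return ENTITY_RE.sub(expand, text)
```

With this sketch, latin1_to_entities("®åüñ") gives "&reg;&aring;&uuml;&ntilde;", and entities_to_latin1 turns that string back into "®åüñ".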
--Todd
Message has 1 Reply:
Re: Raw FAQ data format (Was: Format of FAQ items)
(...) If we ban HTML _elements_ from the headers, then we don't need to escape '<' and '>'. There has never been a need to escape '"'. If we want to allow numeric character references outside Latin-1 (like '&#805;') we still have to escape (...) (26 years ago, 26-Apr-99, to lugnet.faq)

Message is in Reply To:
Re: Raw FAQ data format (Was: Format of FAQ items)
(...) Sounds mostly good. Catch my exceptions down below. (...) Or some other tool; but I agree, a well-defined subset of HTML can and should be used. (...) (Please keep in mind Jacob, that these are nits I'm picking. :-) "Newsgroups" would be more (...) (26 years ago, 24-Apr-99, to lugnet.faq)