Subject: Re: Raw FAQ data format (Was: Format of FAQ items)
Newsgroups: lugnet.faq
Date: Sun, 25 Apr 1999 04:15:25 GMT
Viewed: 2892 times

In lugnet.faq, jsproat@geocities.com (Sproaticus) writes:
> > - The content of the entries should be marked up as a 
> >   subset of HTML ("lynx -dump" is a possible tool for 
> >   translation to plain text). 
>  
> Or some other tool; but I agree, a well-defined subset of HTML can 
> and should be used. 
 
Oh man, I'm HOT on "lynx -dump -force_html"!!  It doesn't do an absolutely 
perfect job, but it comes *so* close, and I'll bet it can get even
closer by specifying a custom config file on the command line. 
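
For anyone who wants to script this, here is a minimal sketch (not necessarily how the translation is actually run here) that pipes an HTML fragment through lynx from Python. It assumes lynx is installed and on the PATH. The helper names and the config file name "faq-lynx.cfg" are made up for illustration.

```python
import subprocess


def lynx_dump_command(cfg_path=None):
    """Build the lynx command line for HTML -> plain text conversion.

    -dump       write the rendered page to stdout and exit
    -force_html treat the input as HTML regardless of file name
    -stdin      read the document from standard input
    -cfg=FILE   use an alternate lynx configuration file (optional)
    """
    cmd = ["lynx", "-dump", "-force_html", "-stdin"]
    if cfg_path is not None:
        # Hypothetical custom config file, as suggested in the post.
        cmd.append("-cfg=" + cfg_path)
    return cmd


def html_to_text(html, cfg_path=None):
    """Render an HTML fragment to plain text by running lynx."""
    result = subprocess.run(
        lynx_dump_command(cfg_path),
        input=html,
        capture_output=True,
        text=True,
        check=True,  # raise if lynx exits with an error
    )
    return result.stdout
```

Something like html_to_text("<p>Some <b>FAQ</b> entry</p>", "faq-lynx.cfg") would then return the rendered text; the exact layout depends on the lynx version and its configuration.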
 
 
> > - ASCII + HTML entities are allowed in the headers. 
>  
> At least the &reg;-style chars.  I don't see much need for more HTML in the
> headers. 
 
Agreed -- only &xxx; entities ought to be allowed in the headers, IMO... 
And if the content charset is Latin-1 instead of pure 7-bit ASCII, then this 
can be further reduced to &lt; &gt; &quot; &amp;.
 
 
> > Now I have some questions and ideas: 
> > - Should we use ASCII or Latin-1 for the content character 
> >   set? 
> > - The content should of course not be a full HTML document.
>  
> Both of these are starting to get over my head.  My knee-jerk reaction to the 
> ASCII question is to just use the lower 128 (not counting the very lowest 32 
> of course :-), and use some form of encoding for any other characters -- at 
> least for the raw FAQ format. 
 
Hmm.  Well, either way, the following three characters will have to be
written as entities:

   &  =>  &amp;
   <  =>  &lt;
   >  =>  &gt;

and *perhaps* the double-quote character should be forced to be written as
an entity as well:

   "  =>  &quot;
 
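As a rough illustration of the escaping rule just described, here is a small Python sketch. The function name and the escape_quote switch are hypothetical; the only thing taken from the discussion is which characters get turned into entities.

```python
def escape_markup(text, escape_quote=False):
    """Escape the characters that must always be written as entities.

    The ampersand has to be handled first, otherwise the '&' in the
    freshly written '&lt;' and '&gt;' would be escaped a second time.
    """
    text = text.replace("&", "&amp;")
    text = text.replace("<", "&lt;")
    text = text.replace(">", "&gt;")
    if escape_quote:
        # Optional: also force the double-quote character to an entity.
        text = text.replace('"', "&quot;")
    return text
```

With this, escape_markup("AT&T <b>") gives "AT&amp;T &lt;b&gt;", and the double quote is only touched when escape_quote=True is passed.
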
But apart from those, wouldn't it simplify editing a ton (and make it much 
much safer) if characters above 128 were just written directly in their 
Latin-1 encoding, i.e.--? 
 
   ®  instead of  &reg;
   å  instead of  &aring;
   ü  instead of  &uuml;
   ñ  instead of  &ntilde;
 
I can convert  HTML <=> Latin-1  extremely easily on the fly. 
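
One way such a conversion could look, sketched in Python rather than whatever is actually used on the server: the standard html.entities tables map entity names to code points and back. The function names are invented for this example, and the sketch only handles named entities in the Latin-1 range 160-255; it deliberately leaves &amp; &lt; &gt; &quot; and numeric references alone.

```python
import re
from html.entities import codepoint2name, name2codepoint

# Entities that must stay escaped even in Latin-1 content.
RESERVED = {"amp", "lt", "gt", "quot"}

_NAMED_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def entities_to_latin1(text):
    """Replace named entities such as &reg; with the Latin-1 character."""
    def replace(match):
        name = match.group(1)
        codepoint = name2codepoint.get(name)
        if name in RESERVED or codepoint is None or not 160 <= codepoint <= 255:
            return match.group(0)  # leave unknown or reserved entities as-is
        return chr(codepoint)
    return _NAMED_ENTITY.sub(replace, text)


def latin1_to_entities(text):
    """Replace Latin-1 characters 160-255 with their named entities."""
    pieces = []
    for ch in text:
        codepoint = ord(ch)
        if 160 <= codepoint <= 255 and codepoint in codepoint2name:
            pieces.append("&" + codepoint2name[codepoint] + ";")
        else:
            pieces.append(ch)
    return "".join(pieces)
```

Because the reserved entities are skipped in one direction and plain ASCII is skipped in the other, text that has already been escaped survives a round trip unchanged.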
 
--Todd 