Subject: 
Re: ???Question???
Newsgroups: 
lugnet.publish, lugnet.admin.general
Date: 
Fri, 19 May 2000 09:40:03 GMT
  
Do we need an admin.geek group?

In lugnet.admin.general, Todd Lehman writes:
In lugnet.publish, Matthew Miller writes:
Urg. It may not be obvious to those of you viewing this message with MS
Windows, but the above message isn't ASCII text (or ISO 8859-1 Latin-1,
either -- even though the header claims it is!). It's Microsoft's
non-standard [1] character set. This makes the message look pretty weird
when viewed on a non-MS system -- all of the apostrophes show up as question
marks (or don't show up at all).

Since asking everyone to not use Microsoft products to read LUGNET is
probably a bit harsh [2], Todd, how about automatically scanning for this
and correcting it when messages are posted?
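A minimal sketch of what such a scan might look like, assuming the server can see the raw message bytes. In ISO 8859-1 the byte range 0x80-0x9F holds only control characters, so any byte in that range in a message labelled Latin-1 is a strong hint that the text is really in Microsoft's character set. The function name `find_c1_bytes` is made up for illustration and is not part of any LUGNET code.

```python
def find_c1_bytes(raw: bytes) -> list[tuple[int, int]]:
    """Return (offset, byte value) for every byte in the range 0x80-0x9F.

    ISO 8859-1 reserves these positions for control characters, so
    printable text labelled Latin-1 should not normally contain them.
    """
    return [(i, b) for i, b in enumerate(raw) if 0x80 <= b <= 0x9F]


# Hypothetical message body containing bytes 146, 147 and 148.
sample = b"It\x92s a \x93test\x94"
hits = find_c1_bytes(sample)
for offset, value in hits:
    print(f"suspicious byte {value} at offset {offset}")
```

A scan like this only flags the message; what to do next (convert, reject, or leave it to the client) is a separate decision.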

Hmmmm.  I agree that it's pretty horrendous for plaintext, but I think that
so-called "smart quotes" are a pretty great thing for HTML (as long as the
correct standard character entities are output, of course! :) when done
properly.

What currently happens in the web interface when someone views a message with
these is that they get mapped into HTML entities like this:

  145 --> &#145;
  146 --> &#146;
  147 --> &#147;
  148 --> &#148;

Unfortunately, those positions don't seem to be defined in HTML 3.2, so
they'll only show up "correctly" (meaning, as intended by the author of
the message) on non-MS systems when someone uses MS fonts or fonts with
equivalent character mappings.

I'm happy to see that HTML 4.0 defines[1] these...

  &lsquo;  <==>   &#8216;   (equivalent to 145)
  &rsquo;  <==>   &#8217;   (equivalent to 146)
  &ldquo;  <==>   &#8220;   (equivalent to 147)
  &rdquo;  <==>   &#8221;   (equivalent to 148)

...but I haven't tested these in popular browsers to see if they're worth
using yet.  I switched from &#153; to &#8482; for the TM symbol a while back
and that has worked well.
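As a rough sketch of that mapping, assuming the message arrives as raw bytes and everything outside the five listed positions can be treated as Latin-1: the table below uses only the HTML 4.0 numeric references mentioned above (plus 153 for the TM symbol). The names `CP1252_TO_ENTITY` and `to_html` are invented for this example and do not describe how the LUGNET web interface actually works.

```python
import html

# Byte position in Microsoft's character set -> HTML 4.0 numeric reference.
CP1252_TO_ENTITY = {
    145: "&#8216;",  # left single quote  (&lsquo;)
    146: "&#8217;",  # right single quote (&rsquo;)
    147: "&#8220;",  # left double quote  (&ldquo;)
    148: "&#8221;",  # right double quote (&rdquo;)
    153: "&#8482;",  # trademark sign
}


def to_html(raw: bytes) -> str:
    """Convert raw message bytes to HTML text, one byte at a time."""
    parts = []
    for b in raw:
        if b in CP1252_TO_ENTITY:
            parts.append(CP1252_TO_ENTITY[b])
        else:
            # Assumption: every other byte is plain Latin-1.
            # Escape &, < and > but leave ASCII quotes alone.
            parts.append(html.escape(bytes([b]).decode("latin-1"), quote=False))
    return "".join(parts)


print(to_html(b"\x93Hi\x94 it\x92s & <b>"))
```

Because each of the five positions gets its own distinct reference, nothing is lost: the original byte can still be recovered from the output.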


There's an already existing tool:
<http://www.fourmilab.ch/webtools/demoroniser/>
(That page also has more good info on the problem.)

I looked quickly at the source (admittedly, not a thorough scouring); it
looks like the mapping it applies is non-invertible, especially in the case
of 147 and 148.  :-(
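To illustrate what "non-invertible" means here, the sketch below compares two tables. `FOLD_TO_ASCII` is a hypothetical lossy table written for this example (it is not copied from the demoroniser source) in which the opening and closing quotes collapse to the same ASCII character. `TO_UNICODE` keeps each position distinct.

```python
# Hypothetical lossy mapping: 147 and 148 both become a plain ASCII quote.
FOLD_TO_ASCII = {145: "'", 146: "'", 147: '"', 148: '"'}

# One-to-one mapping to the corresponding Unicode characters.
TO_UNICODE = {145: "\u2018", 146: "\u2019", 147: "\u201c", 148: "\u201d"}


def is_invertible(mapping: dict[int, str]) -> bool:
    """A mapping can be undone only if no two inputs share an output."""
    return len(set(mapping.values())) == len(mapping)


print(is_invertible(FOLD_TO_ASCII))  # False: 147 and 148 are indistinguishable afterwards
print(is_invertible(TO_UNICODE))     # True: each output identifies its input
```

Once two positions share an output, there is no way to tell afterwards which one the author typed.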

It may be better simply to reject MS-moronised messages altogether than to
attempt to convert them at the receiving end, because at least that way the
original meaning isn't destroyed.  (Actually, I'm not in favor of either of
those options anywhere near as much as leaving the conversion up to each
individual client on the fly at display time.)

--Todd

[1] http://www.w3.org/TR/1998/REC-html40-19980424/sgml/entities.html


