Publishing : 1993


Subject:	Re: ???Question???
Newsgroups:	lugnet.publish, lugnet.admin.general
Date:	Fri, 19 May 2000 03:38:21 GMT
Viewed:	1796 times

In lugnet.publish, Matthew Miller writes:
> Urg. It may not be obvious to those of you viewing this message with MS
> Windows, but the above message isn't ascii text (or ISO 8859-1 Latin-1,
> either -- even though the header claims it is!). It's Microsoft's
> non-standard [1] character set. This makes the message look pretty weird
> when viewed on a non-MS system -- all of the apostrophes show up as question
> marks (or don't show up at all).
>
> Since asking everyone to not use Microsoft products to read LUGnet is
> probably a bit harsh [2], Todd, how about automatically scanning for this
> and correcting it when messages are posted?

Hmmmm. I agree that it's pretty horrendous for plaintext, but I think that so-called "smart quotes" are a pretty great thing for HTML when done properly (that is, as long as the correct standard character entities are output, of course! :).

What currently happens in the web interface when someone views a message with these is that they get mapped into HTML entities like this:

   145 --> &#145;
   146 --> &#146;
   147 --> &#147;
   148 --> &#148;

Unfortunately, those positions don't seem to be defined in HTML 3.2, so they'll only show up "correctly" (meaning, as intended by the author of the message) on non-MS systems when someone uses MS fonts or fonts with equivalent character mappings.

I'm happy to see that HTML 4.0 defines[1] these...

   &lsquo; <==> &#8216;  (equivalent to 145)
   &rsquo; <==> &#8217;  (equivalent to 146)
   &ldquo; <==> &#8220;  (equivalent to 147)
   &rdquo; <==> &#8221;  (equivalent to 148)

...but I haven't tested these in popular browsers to see if they're worth using yet. I switched from &#153; to &trade; for the TM symbol a while back and that has worked well.

> There's an already existing tool:
> <http://www.fourmilab.ch/webtools/demoroniser/>
> (That page also has more good info on the problem.)

I looked quickly at the source (admittedly, not a thorough scouring); it looks like the mapping it applies is non-invertible, especially in the case of 147 and 148. :-(

It may be better simply to reject MS-moronised messages altogether than to attempt to convert them at the receiving end, because at least that way the original meaning isn't destroyed. (Actually, I'm not in favor of either of those options anywhere near as much as leaving the conversion up to each individual client on-the-fly at display-time.)

--Todd

[1] http://www.w3.org/TR/1998/REC-html40-19980424/sgml/entities.html
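
As a rough illustration of the display-time conversion discussed in the message, here is a minimal Python sketch. It is not LUGNET's actual code: the names `CP1252_TO_ENTITY`, `has_c1_bytes`, and `to_html` are hypothetical, and the table covers only the four quote positions and the TM symbol mentioned above. It assumes the message body arrives as raw bytes labelled ISO 8859-1.

```python
# Hypothetical sketch: map the Windows-specific byte positions named in the
# message to HTML 4.0 named entities when rendering a message as HTML.
CP1252_TO_ENTITY = {
    0x91: "&lsquo;",  # 145, left single quote
    0x92: "&rsquo;",  # 146, right single quote / apostrophe
    0x93: "&ldquo;",  # 147, left double quote
    0x94: "&rdquo;",  # 148, right double quote
    0x99: "&trade;",  # 153, trademark sign
}

# Characters that must always be escaped in HTML text.
HTML_SPECIAL = {0x26: "&amp;", 0x3C: "&lt;", 0x3E: "&gt;"}


def has_c1_bytes(raw: bytes) -> bool:
    """Return True if raw uses bytes 0x80-0x9F, where ISO 8859-1
    assigns no printable characters."""
    return any(0x80 <= b <= 0x9F for b in raw)


def to_html(raw: bytes) -> str:
    """Convert a message body to HTML text, one byte at a time."""
    out = []
    for b in raw:
        if b in CP1252_TO_ENTITY:
            out.append(CP1252_TO_ENTITY[b])
        elif b in HTML_SPECIAL:
            out.append(HTML_SPECIAL[b])
        else:
            # Every other byte is read as Latin-1 (byte value == code point).
            # Unmapped bytes in 0x80-0x9F pass through unchanged.
            out.append(chr(b))
    return "".join(out)


if __name__ == "__main__":
    sample = b"\x93It\x92s LEGO\x99\x94 <ok>"
    print(has_c1_bytes(sample))
    print(to_html(sample))
```

Because each of the four quote positions gets its own entity, this table is one-to-one and could be reversed, unlike a mapping that folds 147 and 148 into the same plain `"`. The stored message is left untouched; only the rendered output changes.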

Message has 2 Replies:

		Re: ???Question???
do we need a admin.geek group? (...) (26 years ago, 19-May-00, to lugnet.publish, lugnet.admin.general)

		Re: ???Question???
(...) My understanding is that the ISO Latin 1 8-bit character set reserves those characters (among others) for control codes. I can't actually check, because the standard isn't available online (paper version costs about 56 CHF....). But this is (...) (26 years ago, 20-May-00, to lugnet.publish, lugnet.admin.general)

Message is in Reply To:

		Re: ???Question???
(...) Urg. It may not be obvious to those of you viewing this message with MS Windows, but the above message isn't ascii text (or ISO 8859-1 Latin-1, either -- even though the header claims it is!). It's Microsoft's non-standard [1] character set. (...) (26 years ago, 11-May-00, to lugnet.publish, lugnet.admin.general)

11 Messages in This Thread:
