Publishing : 1996



Subject:	Re: ???Question???
Newsgroups:	lugnet.publish, lugnet.admin.general
Date:	Sat, 20 May 2000 03:36:49 GMT
Reply-To:	mattdm@mattdm.org{avoidspam}

Todd Lehman <lehman@javanet.com> wrote:
> 145 -->
> 146 -->
> 147 -->
> 148 -->
>
> Unfortunately, those positions don't seem to be defined in HTML 3.2, so
> they'll only show up "correctly" (meaning, as intended by the author of
> the message) on non-MS systems when someone uses MS fonts or fonts with
> equivalent character mappings.

My understanding is that the ISO Latin 1 8-bit character set reserves those characters (among others) for control codes. I can't actually check, because the standard isn't available online (paper version costs about 56 CHF....). But this is certainly the case for Unicode. Those characters are:

145 -> Private use one
146 -> Private use two
147 -> Set Transmit State
148 -> Cancel Character

(Ref: <http://charts.unicode.org/PDF/U0080.pdf>)

> I'm happy to see that HTML 4.0 defines[1] these...
> &lsquo; <==> &#8216; (equivalent to 145)
> &rsquo; <==> &#8217; (equivalent to 146)
> &ldquo; <==> &#8220; (equivalent to 147)
> &rdquo; <==> &#8221; (equivalent to 148)

Those numeric values are the Unicode code points <http://charts.unicode.org/PDF/U2000.pdf>.

> I looked quickly at the source (admittedly, not a thorough scouring); it
> looks like the mapping it applies is non-invertible, especially in the case
> of 147 and 148. :-(

Mapping them to the Unicode entities may be preferable. They work for me in Navigator 4.7 on Win98 (have to wait till I get home to test on Linux). But even if it doesn't work on some platforms yet, at least it's breaking because the client isn't yet up to standards.

On the news side -- NNTP is technically 7-bit ASCII, but almost always is 8-bit clean, and people certainly treat it that way. RFC 2130 (is there a more recent document on this topic?) suggests that news messages specify the character set they are using in the header -- unfortunately, MS products actually *lie*.

> It may be better simply to reject MS-moronised messages altogether than to
> attempt to convert it at the receiving end, because at least that way the
> original meaning isn't destroyed. (Actually, I'm not in favor of either of
> those options anywhere near as much as leaving the conversion up to each
> individual client on-the-fly at display-time.)

How is the client supposed to know that it is to do conversion? One partial fix would be to correct wrong headers to say "MS-Latin-1"....

--
Matthew Miller ---> mattdm@mattdm.org
Quotes 'R' Us ---> http://quotes-r-us.org/
Boston University Linux ---> http://linux.bu.edu/
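To make the mapping discussed above concrete, here is a rough Python sketch (not taken from the thread or from any news gateway's source). It assumes the offending bytes really are Windows-1252, which Python calls "cp1252"; the helper name to_numeric_entities is made up for illustration. It shows that bytes 145-148 decode to C1 control positions under Latin-1 but to the curly quotes under cp1252, and that writing them as numeric character references keeps the mapping invertible.

```python
import unicodedata

# The four byte values discussed in the thread.
raw = bytes([145, 146, 147, 148])

# Read as ISO 8859-1, the bytes land on C1 control code points U+0091..U+0094.
as_latin1 = raw.decode("latin-1")

# Read as Windows-1252 (an assumption about what the sender meant),
# the same bytes are the curly quotes.
as_cp1252 = raw.decode("cp1252")


def to_numeric_entities(text):
    """Hypothetical helper: write every non-ASCII character as &#NNNN;."""
    return "".join(f"&#{ord(ch)};" if ord(ch) > 127 else ch for ch in text)


for byte, ch in zip(raw, as_cp1252):
    print(byte, "->", f"U+{ord(ch):04X}", unicodedata.name(ch),
          to_numeric_entities(ch))

# Nothing is lost: encoding back to cp1252 gives the original bytes.
round_trip = as_cp1252.encode("cp1252")
```

The point of the round trip is that each of the four bytes gets its own distinct code point, unlike a conversion that flattens 147 and 148 to the same plain ASCII quote.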

Message has 1 Reply:

		Re: ???Question???
PS: if my tone seems annoyed or even antagonistic in the past few messages, it's not at anyone here -- it's at Microsoft. I try to avoid MS-bashing as much as I can, but this is blatantly evil [1]. It's like the Kerberos thing, but arguably worse -- (...) (24 years ago, 20-May-00, to lugnet.publish, lugnet.admin.general)

Message is in Reply To:

		Re: ???Question???
(...) Hmmmm. I agree that it's pretty horrendous for plaintext, but I think that so-called "smart quotes" are a pretty great thing for HTML (as long as the correct standard character entities are output, of course! :) when done properly. What (...) (24 years ago, 19-May-00, to lugnet.publish, lugnet.admin.general)

11 Messages in This Thread