Subject: Re: LEGOfactory.com launches... and it's big!
Newsgroups: lugnet.cad
Date: Sat, 27 Nov 2004 00:54:22 GMT
In lugnet.cad, Chris Dee wrote:
> There are problems with the XML files. The encoding is defined as UTF-8, yet the
> Danish language descriptions contain accented characters (ASCII values > 127)
> which are not supported by UTF-8, so cause IE and Perl XML::Simple to choke.
> Changing the encoding to ISO-8859-1 makes them much easier to process
> programmatically.
>
> Chris
You're right, the files aren't UTF-8. If anyone is interested, you can convert
ISO-8859-1 text to UTF-8 like this:
- Characters 0-127 (0x00-0x7F) stay the same.
- Characters 128-191 (0x80-0xBF) are prefixed by 194 (0xC2) but are otherwise
  unchanged.
- Characters 192-255 (0xC0-0xFF) are prefixed by 195 (0xC3) and have 64 (0x40)
  subtracted from them, so they fall in the 0x80-0xBF range.
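A minimal Python sketch of the three rules above, assuming the input is raw ISO-8859-1 bytes. The function name `latin1_to_utf8` and the Danish sample string are hypothetical, chosen only for illustration. The sketch cross-checks the hand-rolled result against Python's built-in codecs; in practice `data.decode("iso-8859-1").encode("utf-8")` does the same job in one line.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Convert ISO-8859-1 bytes to UTF-8 by applying the three rules byte by byte."""
    out = bytearray()
    for b in data:
        if b <= 0x7F:
            # 0x00-0x7F: plain ASCII, copied unchanged
            out.append(b)
        elif b <= 0xBF:
            # 0x80-0xBF: prefix 0xC2, byte itself kept as-is
            out += bytes((0xC2, b))
        else:
            # 0xC0-0xFF: prefix 0xC3, subtract 0x40 so it lands in 0x80-0xBF
            out += bytes((0xC3, b - 0x40))
    return bytes(out)


if __name__ == "__main__":
    # Hypothetical Danish sample, stored as ISO-8859-1 bytes
    sample = "R\u00f8d klods, bl\u00e5 plade".encode("iso-8859-1")
    converted = latin1_to_utf8(sample)
    # Cross-check against Python's built-in codecs
    assert converted == sample.decode("iso-8859-1").encode("utf-8")
    print(converted)  # printed as a bytes literal, so it is ASCII-safe
```

Because every ISO-8859-1 byte maps directly to the code point U+0000-U+00FF, the whole conversion reduces to these two prefix bytes; no lookup table is needed.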
BTW, at first I thought you meant to say that UTF-8 doesn't support accented
characters, but then I figured out what you meant. To clarify: UTF-8 supports
all Unicode characters from U+0000 to U+10FFFF, but anything above U+007F is
encoded as a prefix (lead) byte followed by one to three continuation bytes.
The prefixes range from 0xC2-0xF4, and the continuation bytes range from
0x80-0xBF, so the only bytes that can never appear in valid UTF-8 are 0xC0,
0xC1, and 0xF5-0xFF. (Any other byte is only invalid in the wrong position,
e.g. a prefix byte that isn't followed by the right number of continuation
bytes.) In the LDD XML files, there are a few each of 0xF6, 0xF8, and 0xFC.
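A rough Python sketch of the validity rules above. The names `NEVER_VALID`, `never_valid_offsets`, and `first_utf8_error` are hypothetical. The first function only flags the bytes that can never appear in UTF-8; the second relies on Python's strict UTF-8 decoder, which also catches bytes that are invalid only in context.

```python
from typing import List, Optional, Tuple

# Bytes that can never occur anywhere in well-formed UTF-8:
# 0xC0/0xC1 could only start overlong 2-byte forms, and 0xF5-0xFF would
# start sequences beyond U+10FFFF (or are not lead bytes at all).
NEVER_VALID = {0xC0, 0xC1} | set(range(0xF5, 0x100))


def never_valid_offsets(data: bytes) -> List[Tuple[int, int]]:
    """Return (offset, byte) for every byte that is never legal in UTF-8."""
    return [(i, b) for i, b in enumerate(data) if b in NEVER_VALID]


def first_utf8_error(data: bytes) -> Optional[int]:
    """Return the offset of the first invalid sequence, or None if valid.

    Python's strict decoder also rejects bytes that are only invalid in
    context, e.g. a lead byte not followed by enough continuation bytes.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


if __name__ == "__main__":
    # ISO-8859-1 "Rød", ISO-8859-1 "æble", and properly encoded UTF-8 "Rød"
    for sample in (b"R\xf8d", b"\xe6ble", "R\u00f8d".encode("utf-8")):
        print(sample, never_valid_offsets(sample), first_utf8_error(sample))
```

For example, ISO-8859-1 "Rød" (`b"R\xf8d"`) is flagged at offset 1 by both functions, while `b"\xe6ble"` ("æble") contains no never-valid byte but still fails decoding at offset 0, because 0xE6 is a lead byte that must be followed by two continuation bytes.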
Andy
Message is in reply to: Re: LEGOfactory.com launches... and it's big!
(Chris Dee, 25-Nov-04, to lugnet.cad)