Subject: Re: LEGOfactory.com launches... and it's big!
Newsgroups: lugnet.cad
Date: Sat, 27 Nov 2004 00:54:22 GMT
  
In lugnet.cad, Chris Dee wrote:
There are problems with the XML files. The encoding is defined as UTF-8, yet the
Danish language descriptions contain accented characters (ASCII values > 127)
which are not supported by UTF-8, so cause IE and Perl XML::Simple to choke.
Changing the encoding to ISO-8859-1 makes them much easier to process
programmatically.

Chris

You're right, the files aren't UTF-8.  If anyone is interested, you can convert
ISO-8859-1 text to UTF-8 like this:

bytes 0-127 (0x00-0x7F) stay the same
bytes 128-191 (0x80-0xBF) are prefixed by 194 (0xC2) but are otherwise unchanged
bytes 192-255 (0xC0-0xFF) are prefixed by 195 (0xC3) and have 64 (0x40)
subtracted from them, so they fall in the 0x80-0xBF range
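
For anyone who wants to try this, here is a minimal Python sketch of the three rules above. The function name latin1_to_utf8 and the sample bytes are made up for illustration; in practice Python's built-in codecs do the same job, and the last line checks the hand-rolled version against them.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Convert ISO-8859-1 bytes to UTF-8 using the three rules above.

    Hypothetical helper for illustration only.
    """
    out = bytearray()
    for b in data:
        if b < 0x80:
            # 0x00-0x7F: plain ASCII, copied unchanged
            out.append(b)
        elif b < 0xC0:
            # 0x80-0xBF: prefix with 0xC2, byte itself unchanged
            out.append(0xC2)
            out.append(b)
        else:
            # 0xC0-0xFF: prefix with 0xC3 and subtract 0x40
            out.append(0xC3)
            out.append(b - 0x40)
    return bytes(out)


# Illustrative input: 'R', 0xF8 (o with stroke in ISO-8859-1), 'd'
sample = bytes([0x52, 0xF8, 0x64])

# Cross-check against Python's own codecs
assert latin1_to_utf8(sample) == sample.decode("iso-8859-1").encode("utf-8")
```

The one-liner data.decode("iso-8859-1").encode("utf-8") is what you would normally use; the loop is only there to show the byte arithmetic.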

BTW, at first I thought you meant that UTF-8 doesn't support accented
characters, but then I figured out what you meant. To clarify: UTF-8 supports
all Unicode characters from U+0000 to U+10FFFF, but anything above U+007F is
encoded as a multi-byte sequence: a lead byte followed by one to three
continuation bytes. The lead bytes range from 0xC2-0xF4 and the continuation
bytes range from 0x80-0xBF, so the only byte values that can never appear in
valid UTF-8 are 0xC0, 0xC1, and 0xF5-0xFF. (The LDD XML files contain a few each
of 0xF6, 0xF8, and 0xFC.)
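
A quick way to check a file for this kind of problem, sketched in Python. The names NEVER_VALID, find_never_valid_bytes and is_valid_utf8 are made up for this example, and the sample bytes are illustrative rather than taken from the actual LDD files.

```python
# Byte values that can never occur anywhere in valid UTF-8
NEVER_VALID = frozenset([0xC0, 0xC1]) | frozenset(range(0xF5, 0x100))


def find_never_valid_bytes(data: bytes) -> dict:
    """Count occurrences of bytes that cannot appear in any UTF-8 stream."""
    counts = {}
    for b in data:
        if b in NEVER_VALID:
            counts[b] = counts.get(b, 0) + 1
    return counts


def is_valid_utf8(data: bytes) -> bool:
    """Full validity check, delegating to Python's strict UTF-8 decoder."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Illustrative ISO-8859-1 input containing 0xF8
sample = bytes([0x62, 0x6C, 0xF8, 0x64])
bad = find_never_valid_bytes(sample)
assert bad == {0xF8: 1}
assert not is_valid_utf8(sample)
```

Note that the byte scan is only a quick indicator. A file can avoid all of those values and still be invalid UTF-8, for example a lone 0xE6 with no continuation bytes after it, so is_valid_utf8 is the check to trust.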

Andy





©2005 LUGNET. All rights reserved. - hosted by steinbruch.info GbR