In lugnet.off-topic.test, David Eaton writes:
> Hmm.... maybe it's something to do with how you're parsing web-input? I'm
> using the web interface to enter messages, and I'm typing:
> & # 1 6 3 ;
> (but without the spaces) and it comes out as a single character (the pound
> sign: "£") when I view the message. Also, it comes out as the same single
> character when I click on "view raw message". When I type:
> & p o u n d ;
> (without the spaces) it just comes out as "&pound;", it doesn't come out as
> the pound sign.
Thanks for reporting this. When I said earlier that it "didn't" convert
stuff like that to raw form, I meant that I believed it didn't, because
it shouldn't (it wasn't supposed to have been). Alas, I did something dumb
in a regex: In an HTML-to-ASCII conversion function, there was a sequence
of three regexen for converting the three types of HTML entities (regular
ones like "&pound;", base-10 ones like "&#163;", and base-16 ones like
"&#xA3;"). But doing this in three passes rather than in a single pass is
very dangerous -- because something like
&amp;#163;
gets first converted to
&#163;
and then to
£
and it shouldn't've been doing a double-conversion. So I fixed it to be a
single regex transform on the string (as it should have been all along).
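The difference between the two approaches can be sketched in Python. This is a hypothetical illustration, not the actual LUGNET code: the function names and the five-entry `NAMED` table are assumptions. Three sequential `re.sub` passes let the output of one pass feed the next, while one regex with an alternation decodes each reference exactly once, because `re.sub` never rescans text it has already substituted:

```python
import re

# Hypothetical, deliberately tiny named-entity table for illustration;
# a real decoder would know every HTML named entity.
NAMED = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "pound": "\u00a3"}


def _named(m):
    # Unknown names are left exactly as written.
    return NAMED.get(m.group(1), m.group(0))


def _numeric(code, original):
    # Leave out-of-range code points untouched instead of raising ValueError.
    return chr(code) if code <= 0x10FFFF else original


def decode_three_pass(s):
    """Buggy: each pass rescans the output of the previous pass."""
    s = re.sub(r"&([A-Za-z]+);", _named, s)
    s = re.sub(r"&#([0-9]+);", lambda m: _numeric(int(m.group(1)), m.group(0)), s)
    s = re.sub(r"&#[xX]([0-9A-Fa-f]+);",
               lambda m: _numeric(int(m.group(1), 16), m.group(0)), s)
    return s


# One pattern for all three forms: decimal, hexadecimal, or named.
ENTITY = re.compile(r"&(?:#([0-9]+)|#[xX]([0-9A-Fa-f]+)|([A-Za-z]+));")


def _any_entity(m):
    dec, hexa, name = m.groups()
    if dec is not None:
        return _numeric(int(dec), m.group(0))
    if hexa is not None:
        return _numeric(int(hexa, 16), m.group(0))
    return NAMED.get(name, m.group(0))


def decode_single_pass(s):
    """Each entity is decoded once; replacement text is never rescanned."""
    return ENTITY.sub(_any_entity, s)
```

With these definitions, `decode_three_pass("&amp;#163;")` yields a bare "£", while `decode_single_pass("&amp;#163;")` yields "&#163;", which is the intended one-level decode.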
> So I'm thinking that maybe what happens is when you read in
> the web-input, you read it in as a single character?
First it's sent by the browser via the HTTP 'POST' method (encoded in %xx
URL format), then when it's received by the server, it's reencoded as HTML
(the basic block of text my scripts work with). So when you entered
&#163;
into the textarea box, it was sent to the server as:
%26%23163%3B
and this was then reencoded into HTML as:
&amp;#163;
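The encoding steps above can be mimicked with Python's standard library, with `urlencode`/`parse_qs` standing in for the browser and the server's form decoding and `html.escape` standing in for the re-encoding into HTML. This is only a sketch under those assumptions (the scripts themselves aren't shown, the field name `body` is hypothetical, and browsers may differ in which punctuation they escape):

```python
import html
from urllib.parse import parse_qs, urlencode

# What the user typed into the <TEXTAREA>.
typed = "&#163;"

# Browser side: an application/x-www-form-urlencoded POST body.
# The field name "body" is hypothetical.
posted = urlencode({"body": typed})
print(posted)  # body=%26%23163%3B

# Server side: undo the %xx encoding...
received = parse_qs(posted)["body"][0]
# ...then re-encode the text as HTML, so "&" becomes "&amp;".
as_html = html.escape(received, quote=False)
print(as_html)  # &amp;#163;
```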
All good so far at the low level -- but then the higher-level code which
handles the actual data of the article needs to make sure it's line-wrapped
at 79 columns if the user's browser doesn't support the WRAP=HARD attribute
of the <TEXTAREA> tag. So it has to be temporarily converted back to ASCII
for line-wrapping, and then back to HTML again.
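That HTML to ASCII to HTML round trip can be sketched as follows, assuming Python's `html.unescape` (which handles named, decimal, and hex references in a single pass) and `textwrap.fill` as stand-ins for the real conversion and wrapping code. The `decode_passes` parameter is hypothetical: decoding twice only approximates the effect of the three sequential regexes.

```python
import html
import textwrap


def rewrap_html(article_html, width=79, decode_passes=1):
    """Round-trip stored HTML through plain text so it can be hard-wrapped.

    decode_passes=1 is the correct behaviour; decode_passes=2 is a rough
    stand-in for the double conversion the three-regex version caused.
    """
    text = article_html
    for _ in range(decode_passes):
        # html.unescape decodes named, decimal and hex references in one pass.
        text = html.unescape(text)
    # Wrap each line separately so existing line breaks are kept.
    wrapped = "\n".join(textwrap.fill(line, width=width)
                        for line in text.split("\n"))
    # Back to HTML; quotes need no escaping in element content.
    return html.escape(wrapped, quote=False)
```

`rewrap_html("&amp;#163;")` returns "&amp;#163;" unchanged, while `rewrap_html("&amp;#163;", decode_passes=2)` returns a bare "£", the symptom David saw.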
Hence, with the triple-regex mess, it was coming out finally as "£"
rather than as "&#163;" as it should have been...
--Todd

Message is in Reply To:
Re: Just testing character codes
(...) Hmm.... maybe it's something to do with how you're parsing web-input? I'm using the web interface to enter messages, and I'm typing: & # 1 6 3 ; (but without the spaces) and it comes out as a single character (the pound sign: "£") when I view (...) (28-Sep-99, to lugnet.off-topic.test, lugnet.publish)