
Subject:	Re: Just testing character codes
Newsgroups:	lugnet.off-topic.test, lugnet.publish, lugnet.admin.general
Followup-To:	lugnet.admin.general
Date:	Sat, 2 Oct 1999 13:28:33 GMT

In lugnet.off-topic.test, David Eaton writes:
> Hmm.... maybe it's something to do with how you're parsing web-input? I'm
> using the web interface to enter messages, and I'm typing:
> & # 1 6 3 ;
> (but without the spaces) and it comes out as a single character (the pound
> sign: "£") when I view the message. Also, it comes out as the same single
> character when I click on "view raw message". When I type:
> & p o u n d ;
> (without the spaces) it just comes out as "&pound;", it doesnt come out as
> the pound sign.

Thanks for reporting this. When I said earlier that it "didn't" convert
stuff like that to raw form, I meant that I believed that it didn't, because
it shouldn't (it wasn't supposed to). Alas, I did something dumb in a regex:

In an HTML-to-ASCII conversion function, there was a sequence of three
regexen for converting the three types of HTML entities (regular ones like
"&pound;", base-10 ones like "&#163;", and base-16 ones like "&#xA3;"). But
doing this in three passes rather than in a single pass is very dangerous --
because something like &amp;#163; first gets converted to &#163; and then to
£, and it shouldn't've been doing a double-conversion. So I fixed it to be a
single regex transform on the string (as it should have been all along).

> So I'm thinking that maybe what happens is when you read in
> the web-input, you read it in as a single character?

First it's sent by the browser via the HTTP 'POST' method (encoded in %xx
URL format), then when it's received by the server, it's reencoded as HTML
(the basic block of text my scripts work with). So when you entered &#163;
into the textarea box, it was sent to the server as:

   %26#163;

and this was then reencoded into HTML as:

   &amp;#163;

All good so far at the low level -- but then the higher-level code which
handles the actual data of the article needs to make sure it's line-wrapped
at 79 columns if the user's browser doesn't support the WRAP=HARD attribute
of the <TEXTAREA> tag. So it has to be temporarily converted back to ASCII
for line-wrapping, and then back to HTML again. Hence, with the triple-regex
mess, it was coming out finally as "£" rather than as "&#163;" as it should
have been...

--Todd
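
For anyone who wants to see the double-conversion concretely, here is a rough sketch in Python. It is not the actual LUGNET code; the function names, the tiny entity table, and the regexes are all made up for illustration. It only assumes what the post describes: text typed by the user is escaped into HTML, and an HTML-to-ASCII step later decodes named, base-10, and base-16 entities.

```python
import re

# Hypothetical, deliberately tiny entity table; a real one would be much larger.
NAMED = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "pound": "\u00a3"}


def encode_html(text):
    """Escape literal text as HTML ('&' must be escaped first)."""
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def decode_three_pass(html):
    """Buggy variant: each pass can match entities created by an earlier pass."""
    # Pass 1: named entities such as &amp; and &pound;
    html = re.sub(r"&([A-Za-z]+);",
                  lambda m: NAMED.get(m.group(1), m.group(0)), html)
    # Pass 2: base-10 entities such as &#163;
    html = re.sub(r"&#([0-9]+);", lambda m: chr(int(m.group(1))), html)
    # Pass 3: base-16 entities such as &#xA3;
    html = re.sub(r"&#[xX]([0-9A-Fa-f]+);",
                  lambda m: chr(int(m.group(1), 16)), html)
    return html


# One pattern with three alternatives, so every entity is decoded exactly once.
ENTITY = re.compile(r"&(?:#[xX]([0-9A-Fa-f]+)|#([0-9]+)|([A-Za-z]+));")


def decode_single_pass(html):
    """Fixed variant: replacement text is never rescanned for more entities."""
    def replace(match):
        hex_digits, dec_digits, name = match.groups()
        if hex_digits:
            return chr(int(hex_digits, 16))
        if dec_digits:
            return chr(int(dec_digits))
        return NAMED.get(name, match.group(0))  # leave unknown names alone
    return ENTITY.sub(replace, html)


typed = "&#163;"                      # what the user typed into the textarea
stored = encode_html(typed)           # "&amp;#163;"
print(decode_three_pass(stored))      # "£"      (double-converted, wrong)
print(decode_single_pass(stored))     # "&#163;" (round-trips, as intended)
```

The single-pass version works because a regex substitution moves on past each replacement, so the "&" produced from "&amp;" never gets a chance to combine with the "#163;" that follows it.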

