Subject: 
Re: Just testing character codes
Newsgroups: 
lugnet.off-topic.test, lugnet.publish, lugnet.admin.general
Followup-To: 
lugnet.admin.general
Date: 
Sat, 2 Oct 1999 13:28:33 GMT
  
In lugnet.off-topic.test, David Eaton writes:
Hmm.... maybe it's something to do with how you're parsing web-input? I'm
using the web interface to enter messages, and I'm typing:
& # 1 6 3 ;
(but without the spaces) and it comes out as a single character (the pound
sign: "£") when I view the message. Also, it comes out as the same single
character when I click on "view raw message". When I type:
& p o u n d ;
(without the spaces) it just comes out as "&pound;", it doesn't come out as
the pound sign.

Thanks for reporting this.  When I said earlier that it "didn't" convert
stuff like that to raw form, I meant that I believed it didn't, because
it shouldn't have (it wasn't supposed to).  Alas, I did something dumb
in a regex:  In an HTML-to-ASCII conversion function, there was a sequence
of three regexen for converting the three types of HTML entities (regular
ones like "&pound;", base-10 ones like "&#163;", and base-16 ones like
"&#xA3;").  But doing this in three passes rather than in a single pass is
very dangerous -- because something like

   &amp;#163;

gets first converted to

   &#163;

and then to

   £

and it shouldn't've been doing a double-conversion.  So I fixed it to be a
single regex transform on the string (as it should have been all along).
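
A minimal Python sketch of the difference, for illustration only.  This is
not the actual LUGNET code; the tiny NAMED table and the function names are
made up, and a real table would cover all the named entities.

```python
import re

# Hypothetical, deliberately tiny entity table (assumption for this sketch).
NAMED = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "pound": "\u00a3"}


def decode_three_pass(text):
    """Buggy version: each pass rescans the output of the previous pass."""
    # Pass 1: named entities, e.g. &pound; or &amp;
    text = re.sub(r"&([A-Za-z]+);",
                  lambda m: NAMED.get(m.group(1), m.group(0)), text)
    # Pass 2: base-10 entities, e.g. &#163;
    text = re.sub(r"&#([0-9]+);", lambda m: chr(int(m.group(1))), text)
    # Pass 3: base-16 entities, e.g. &#xA3;
    text = re.sub(r"&#[xX]([0-9A-Fa-f]+);",
                  lambda m: chr(int(m.group(1), 16)), text)
    return text


# One pattern with three alternatives, so every entity is decoded exactly
# once and the decoded output is never scanned again.
ENTITY = re.compile(r"&(?:([A-Za-z]+)|#([0-9]+)|#[xX]([0-9A-Fa-f]+));")


def _replace(match):
    name, dec, hexa = match.groups()
    if name is not None:
        return NAMED.get(name, match.group(0))  # leave unknown names alone
    if dec is not None:
        return chr(int(dec))
    return chr(int(hexa, 16))


def decode_single_pass(text):
    """Fixed version: a single regex transform over the string."""
    return ENTITY.sub(_replace, text)
```

With these definitions, decode_three_pass("&amp;#163;") returns the pound
sign (the double-conversion), while decode_single_pass("&amp;#163;") returns
the literal text "&#163;".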


So I'm thinking that maybe what happens is when you read in
the web-input, you read it in as a single character?

First it's sent by the browser via the HTTP 'POST' method (encoded in %xx
URL format), then when it's received by the server, it's reencoded as HTML
(the basic block of text my scripts work with).  So when you entered

   &#163;

into the textarea box, it was sent to the server as:

   %26#163;

and this was then reencoded into HTML as:

   &amp;#163;

All good so far at the low level -- but then the higher-level code which
handles the actual data of the article needs to make sure it's line-wrapped
at 79 columns if the user's browser doesn't support the WRAP=HARD attribute
of the <TEXTAREA> tag.  So it has to be temporarily converted back to ASCII
for line-wrapping, and then back to HTML again.

Hence, with the triple-regex mess, it was coming out finally as "&pound;"
rather than as "&amp;#163;" as it should have been...
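
The round trip described above can be sketched roughly as follows in Python.
This is only an illustration under assumptions, not the server's real code:
the function name process_post_body is hypothetical, the standard-library
html and textwrap modules stand in for the HTML/ASCII conversion and the
79-column wrapping, and the input is assumed to be one already-extracted
form field value in %xx encoding.

```python
import html
import textwrap
from urllib.parse import unquote_plus


def process_post_body(encoded, width=79):
    """Sketch: %xx-encoded field -> HTML -> plain text (wrap) -> HTML."""
    # 1. Undo the %xx URL encoding applied by the browser.
    raw = unquote_plus(encoded)
    # 2. Re-encode as HTML, so a typed "&" becomes "&amp;".
    as_html = html.escape(raw, quote=False)
    # 3. Temporarily convert back to plain text for line-wrapping.
    #    html.unescape decodes each entity once, in a single pass.
    plain = html.unescape(as_html)
    # 4. Wrap each existing line separately so blank lines survive.
    wrapped = "\n".join(
        textwrap.fill(line, width=width) for line in plain.split("\n")
    )
    # 5. Convert the wrapped text back to HTML.
    return html.escape(wrapped, quote=False)
```

Because the decode in step 3 happens once per entity, an input of
"%26%23163%3B" (the typed characters & # 1 6 3 ; fully %-encoded) ends up as
"&amp;#163;" and not as a pound sign.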

--Todd


