Q1066 Funny characters received from hidden form
Symptom: Copying accented characters to a hidden form produces spurious characters.
Workaround: Use a character encoding script:
This script lets the author write and read their own source with native strings and special characters, while still displaying them correctly in all JavaScript-enabled browsers. Gone are the clumsy entities and unescaped constants. For source editing, of course, a source saved in PC format has to be converted to Mac format before a Mac user opens it, otherwise the editor will show garbage characters. For browsing, however, a script written on, say, a PC will have its output displayed correctly on all computers, and that is the point. The basic routine is:
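A minimal sketch of such a routine follows. It is an illustration under stated assumptions, not the author's listing: the entry format of `uniChars` (raw character immediately followed by its numeric character reference) and the function name `encodeChars` are hypothetical.

```javascript
// Hypothetical conversion table (an assumption, not the original listing):
// each entry is the raw character immediately followed by its numeric
// character reference.
var uniChars = "" +
  "à&#224;" +
  "â&#226;" +
  "ç&#231;" +
  "è&#232;" +
  "é&#233;" +
  "ê&#234;" +
  "î&#238;" +
  "ô&#244;" +
  "ù&#249;" +
  "û&#251;";

// Replace every character found in uniChars by the reference that follows it.
function encodeChars(text) {
  var out = "";
  for (var i = 0; i < text.length; i++) {
    var ch = text.charAt(i);
    // Only non-ASCII characters are looked up, so the ASCII text of the
    // references themselves ("&", "#", digits, ";") can never match.
    var at = ch.charCodeAt(0) > 127 ? uniChars.indexOf(ch) : -1;
    if (at < 0) {
      out += ch; // not in the table: copy unchanged
    } else {
      var end = uniChars.indexOf(";", at);
      out += uniChars.substring(at + 1, end + 1);
    }
  }
  return out;
}
```

With this sketch, `encodeChars("café")` returns `caf&#233;`, which could then be passed to `document.write`. Characters missing from the table pass through unchanged.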
Whatever computer loads and runs the source, this works because a character such as "é" has the same code in the uniChars string and in the script's native strings, so the conversion always occurs properly, even if on a Mac the source actually shows a garbage character for "é" in both the uniChars string and the script's strings. Note that I have been too lazy to fill the uniChars template with all 8-bit characters; currently it holds only the most common French diacritic characters. However, adding other characters is as easy as inserting new lines along the same scheme. A complete table should have all characters from hex 80 to FF. (Editor's note: The Scandinavian and some other national characters are not allowed in script in Netscape 3; unescape the escaped character if necessary. See the Bug list.) After updating the table, uncommenting the last line once allows a quick test that there is no typo in the additions. Also note that the scheme is reusable for any other Unicode set outside ISO Latin-1 without any change to the template. It does not matter at all; all that is required is a file with lines like:
where the source's byte for the content of 'é' is actually E9h, whatever character is actually displayed on screen. The very same source used by an ISO Latin-1 script would work the same in, say, Cyrillic mode for Cyrillic characters.
JCS - edited by MHP
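The one-shot typo test mentioned above could look roughly like the following sketch. The line format (raw character followed by `&#NNN;`), the sample `tableLines`, and the name `findTypos` are all assumptions made for illustration. The check also assumes the source is read in the encoding it was saved in, so that 'é' really carries code E9h (233).

```javascript
// Hypothetical table lines (an assumption): one raw character immediately
// followed by its numeric character reference.
var tableLines = [
  "à&#224;",
  "è&#232;",
  "é&#233;"
];

// Return the lines whose raw character does not match the code written in
// the reference, or whose shape is not "<char>&#<digits>;".
function findTypos(lines) {
  var bad = [];
  for (var i = 0; i < lines.length; i++) {
    var m = /^([^\x00-\x7F])&#(\d+);$/.exec(lines[i]);
    if (!m || m[1].charCodeAt(0) !== parseInt(m[2], 10)) {
      bad.push(lines[i]);
    }
  }
  return bad;
}
```

Running `findTypos(tableLines)` once after each table update returns an empty array when every added line is consistent. A line such as `"é&#232;"`, where the code belongs to a different character, is reported.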