Q1066 Funny characters received from hidden form

irt.org | FAQ | JavaScript | Bugs | Q1066

Symptom: Copying accented characters to a hidden form produces spurious characters.

Affects:

Workaround: Use a character encoding script:

<SCRIPT LANGUAGE="JavaScript"><!--
var uniChars = ''
var uniCodes = ""

uniChars+= '°' ; uniCodes+= 'B0'
uniChars+= '²' ; uniCodes+= 'B2'
uniChars+= 'à' ; uniCodes+= 'E0'
uniChars+= 'â' ; uniCodes+= 'E2'
uniChars+= 'ç' ; uniCodes+= 'E7'
uniChars+= 'è' ; uniCodes+= 'E8'
uniChars+= 'é' ; uniCodes+= 'E9'
uniChars+= 'ê' ; uniCodes+= 'EA'
uniChars+= 'ô' ; uniCodes+= 'F4'
uniChars+= 'ù' ; uniCodes+= 'F9'
uniChars+= 'û' ; uniCodes+= 'FB'

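function unicode( str ) {
// Convert a native string to Unicode string: every character found in
// uniChars is replaced by the ISO-Latin-1 character whose hex code sits
// at the same position (two digits per entry) in uniCodes.

        var n, p, j, c, s = ""

        for( j = 0, n = str.length ; j < n ; j++ ) {
                c = str.charAt(j)
                p = uniChars.indexOf(c)
                if( p < 0 ) s += c      // not in the table: copy as-is
                else s += unescape( "%" + uniCodes.substring(2*p, 2*p+2) )
        }

        return s
}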
//--></SCRIPT>

<FORM NAME="form1"><INPUT TYPE=TEXT NAME="field1" size=45></FORM>

<SCRIPT LANGUAGE="JavaScript"><!--
// This string is defined with special chars. The .htm source will
// store them encoded in the native format of my computer.

var str = "Des mathématiques en français: a² = b² + c²"

// Before use, convert the native string into one that displays
// correctly on all computers

str = unicode( str )

// Now I can use it safely everywhere:

document.write("<P>" + str + "<\/P>")
document.form1.field1.value = str

//--></SCRIPT>

This script lets authors write and read their own sources with native strings and special characters, while still displaying them correctly in all JavaScript-enabled browsers. No more clumsy HTML entities or escaped constants passed to unescape().

Of course, for editing purposes, if the source is saved in PC format, a Mac user who retrieves it has to convert it to Mac format; otherwise the editor will show some garbage characters. For browsing, however, a script written on, say, a PC will have its output displayed correctly on all computers, and that's the point.

Whatever computer loads and runs the source, the unicode() routine works: when the source contains, say, an "é", the same byte appears in the uniChars string and in the script's native strings, so the lookup always finds it and the conversion always occurs properly, even if on a Mac the source shows a garbage character for "é" in both the uniChars string and the script's string.
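The reasoning above can be checked outside a browser. The following sketch assumes, hypothetically, that the author saved the page on a Mac in Mac Roman (where "é" is byte 0x8E and "ç" is byte 0x8D) and that the browser reads those bytes as the ISO-Latin-1 characters U+008E and U+008D. Because the table and the script's string contain the same bytes, the lookup still maps them to the intended codes. It runs under Node.js, which still provides the legacy global unescape().

```javascript
// Hypothetical simulation: the page was saved in Mac Roman, where "é" is
// byte 0x8E and "ç" is byte 0x8D. A browser reading the bytes as
// ISO-Latin-1 sees the characters U+008E and U+008D instead.
var uniChars = "";
var uniCodes = "";

uniChars += "\u008E"; uniCodes += "E9"; // Mac Roman byte for "é"
uniChars += "\u008D"; uniCodes += "E7"; // Mac Roman byte for "ç"

// Same lookup as the unicode() routine in the script above.
function unicode(str) {
  var s = "";
  for (var j = 0; j < str.length; j++) {
    var c = str.charAt(j);
    var p = uniChars.indexOf(c);
    if (p < 0) s += c; // not in the table: copy unchanged
    else s += unescape("%" + uniCodes.substring(2 * p, 2 * p + 2));
  }
  return s;
}

// The script's own string was saved with the same native bytes as the table.
var nativeStr = "Des math\u008Ematiques en fran\u008Dais";
console.log(unicode(nativeStr)); // Des mathématiques en français
```

Plain ASCII characters are not in the table, so they pass through unchanged.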

Note that I haven't bothered to fill the uniChars template with all 8-bit characters; currently it holds only the most common French diacritics, plus ° and ². However, adding other characters is as easy as inserting new lines following the same scheme. A complete table would cover all characters from hex 80 to FF.
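As a sketch of extending the table, the rows below add a few German characters with their ISO-Latin-1 codes, following the same scheme; German is only an illustrative choice. In a modern engine that already decodes the source as Unicode, each row maps a character to itself, so running it only confirms the row layout and that the two strings stay in step.

```javascript
var uniChars = "";
var uniCodes = "";

// A few of the existing French rows...
uniChars += 'é' ; uniCodes += 'E9';
uniChars += 'ç' ; uniCodes += 'E7';

// ...and hypothetical new rows for German, same scheme (ISO-Latin-1 codes).
uniChars += 'ä' ; uniCodes += 'E4';
uniChars += 'ö' ; uniCodes += 'F6';
uniChars += 'ü' ; uniCodes += 'FC';
uniChars += 'ß' ; uniCodes += 'DF';
uniChars += 'Ä' ; uniCodes += 'C4';
uniChars += 'Ö' ; uniCodes += 'D6';
uniChars += 'Ü' ; uniCodes += 'DC';

// Same lookup as the unicode() routine in the script above.
function unicode(str) {
  var s = "";
  for (var j = 0; j < str.length; j++) {
    var c = str.charAt(j);
    var p = uniChars.indexOf(c);
    if (p < 0) s += c; // not in the table: copy unchanged
    else s += unescape("%" + uniCodes.substring(2 * p, 2 * p + 2));
  }
  return s;
}

console.log(uniChars.length, uniCodes.length); // 9 18
console.log(unicode("Grüße aus Köln"));        // Grüße aus Köln
```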

(Editor's note: Scandinavian and some other national characters are not allowed in scripts in Netscape 3; if necessary, use unescape() on the escaped character instead. See the Bug list.)
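A minimal illustration of the editor's workaround, assuming ISO-Latin-1 codes (Å = C5, ø = F8): the Scandinavian letters are written as %XX escapes and restored with unescape(), so the script source itself contains only 7-bit ASCII. The Danish sample text is hypothetical.

```javascript
// ISO-Latin-1 codes: Å = C5, å = E5, Æ = C6, æ = E6, Ø = D8, ø = F8.
// Nothing but 7-bit ASCII appears in this source; unescape() rebuilds
// the national characters at run time.
var city = unescape("%C5rhus");
var line = unescape("Hilsen fra %C5rhus og K%F8benhavn");

console.log(city); // Århus
console.log(line); // Hilsen fra Århus og København
```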

After updating the table, uncommenting the last line for a single run is a quick way to check that the additions contain no typos.
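The original test line is not shown in this FAQ, so it is not reproduced here. As a hypothetical alternative, a small helper such as checkTable() (a name introduced only for this sketch) can catch the most common slips when adding rows: a code that is not exactly two hex digits, a mismatch between the number of characters and codes, or a character listed twice (only its first row would ever be used, since indexOf() returns the first match).

```javascript
// Hypothetical helper, not the original test line.
// Returns a list of problems in a uniChars/uniCodes pair;
// an empty list means the table is consistent.
function checkTable(chars, codes) {
  var problems = [];
  if (codes.length !== 2 * chars.length) {
    problems.push("uniCodes has " + codes.length + " digits for " +
                  chars.length + " characters (expected " + 2 * chars.length + ")");
  }
  for (var p = 0; p < chars.length; p++) {
    var c = chars.charAt(p);
    var hex = codes.substring(2 * p, 2 * p + 2);
    if (!/^[0-9A-F]{2}$/i.test(hex)) {
      problems.push("bad code '" + hex + "' for '" + c + "'");
    }
    if (chars.indexOf(c) !== p) {
      problems.push("'" + c + "' appears more than once; only its first row is used");
    }
  }
  return problems;
}

console.log(checkTable("àâç", "E0E2E7")); // []
console.log(checkTable("àâç", "E0E2E"));  // length mismatch and bad code 'E'
```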

Also note that the scheme can be reused for any character set outside ISO-Latin-1 without any change to the template. Which characters it holds doesn't matter at all; all that is required is a file with lines like:

uniChars+= 'é' ; uniCodes+= 'E9'

where the byte stored in the source between the quotes is actually E9h, whatever character is displayed on screen for it. The very same source used by an ISO-Latin-1 script would work the same way in, say, Cyrillic mode for Cyrillic characters.
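The Cyrillic claim rests on the table producing bytes rather than glyphs. The sketch below is a simplified model of the byte-oriented browsers of that era (an assumption; modern engines treat unescape("%E9") as U+00E9 regardless of the page's character set): the same code E9 shows as "é" when the page is read as ISO-Latin-1 and as "й" when it is read as windows-1251. Only the Cyrillic letter block of windows-1251 (C0 to FF) is modelled.

```javascript
// Simplified, hypothetical model of how a page's character set turns a
// byte into a displayed character.
function showAsLatin1(code) {
  // In ISO-8859-1 every byte equals the Unicode code point of its character.
  return String.fromCharCode(code);
}

function showAsCp1251(code) {
  // windows-1251 maps C0..DF to А..Я and E0..FF to а..я, contiguously.
  // Bytes below C0 are not modelled in this sketch.
  return code >= 0xC0 ? String.fromCharCode(0x0410 + (code - 0xC0)) : "?";
}

var row = "E9";               // from a line like: uniCodes+= 'E9'
var code = parseInt(row, 16); // 233

console.log(showAsLatin1(code)); // é
console.log(showAsCp1251(code)); // й
```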

JCS - edited by MHP