Feedback: irt.org FAQ Knowledge Base Q122

Sent by Nguyen THien Bang on December 08, 1999 at 22:20:18: - feedback #660

Worth:
Very worth reading

Comments:
I want to learn more about this item.

Sent by Sten Drescher on February 29, 2000 at 18:09:41: - feedback #879

Worth:
Not worth reading

Comments:
The answer to this FAQ refers to Jason Nugent's article "Addressing Form Field Validation with Regular Expressions and JavaScript 1.2". Unfortunately, Mr Nugent provides an incorrect regular expression for validating email addresses. The regular expression Mr Nugent supplies is /^\w+((-\w+)|(\.\w+))*\@[A-Za-z0-9]+((\.|-)[A-Za-z0-9]+)*\.[A-Za-z0-9]+$/. This appears to be Mr Nugent's attempt at writing a regular expression to correspond to what RFC 822 refers to as an addr-spec. From Appendix D of RFC 822, we find:

addr-spec = local-part "@" domain ; global address

Comparing this to Mr Nugent's regular expression, this means that he is using /\w+((-\w+)|(\.\w+))*/ for local-part and /[A-Za-z0-9]+((\.|-)[A-Za-z0-9]+)*\.[A-Za-z0-9]+/ for domain. Looking back at RFC 822:
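To make the split concrete, here is a minimal sketch that tests the two halves separately. The names localPart, domain, and matchesByParts are hypothetical and chosen only for illustration, as are the example.com sample addresses; only the expression itself comes from Mr Nugent's article.

```javascript
// Mr Nugent's full expression, as quoted above.
const nugent = /^\w+((-\w+)|(\.\w+))*\@[A-Za-z0-9]+((\.|-)[A-Za-z0-9]+)*\.[A-Za-z0-9]+$/;

// The same expression cut at the "@" (hypothetical names),
// with each half anchored so it can be tested on its own.
const localPart = /^\w+((-\w+)|(\.\w+))*$/;
const domain = /^[A-Za-z0-9]+((\.|-)[A-Za-z0-9]+)*\.[A-Za-z0-9]+$/;

// Hypothetical helper: split the address at its last "@" and test each half.
function matchesByParts(address) {
  const at = address.lastIndexOf("@");
  if (at < 0) return false;
  return localPart.test(address.slice(0, at)) &&
         domain.test(address.slice(at + 1));
}

console.log(nugent.test("jane.doe@example.com"));    // true
console.log(matchesByParts("jane.doe@example.com")); // true
```

Since neither half can itself match an "@", testing the halves separately should give the same verdict as the full expression, which makes it easier to see which side of the address causes a rejection.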

local-part = word *("." word) ; uninterpreted
; case-preserved

This looks good so far, since Mr Nugent said:

\w+ matches a whole word.

But once we take a look at the RFC 822 definition of word it starts to break down:

word = atom / quoted-string

Hmmmm. Mr Nugent makes no attempt to look for a quoted-string, and what is an atom anyways?

atom = 1*<any CHAR except specials, SPACE and CTLs>

Well, that sounds like it's close to the JavaScript meaning of \w, which Mr Nugent gives as:

\w matches a "word" character (alphanumerics and the "_" character).

However, an RFC 822 special is:

specials = "(" / ")" / "<" / ">" / "@"   ; Must be in quoted-
         / "," / ";" / ":" / "\" / <">   ; string, to use
         / "." / "[" / "]"               ; within a word.
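As a rough sketch, the atom definition can be written as a JavaScript character class: every ASCII character except the controls, SPACE, and the specials listed above. The names atom and extra are hypothetical. The expression is my own reading of the RFC 822 grammar, not something taken from Mr Nugent's article, and it covers atom only, not quoted-string.

```javascript
// One or more characters that are not a control (0-31, 127), not SPACE (32),
// not outside ASCII, and not one of the RFC 822 specials.
const atom = /^[^\x00-\x20\x7f-\uffff()<>@,;:\\".\[\]]+$/;

// Collect the printable ASCII characters that an atom allows but \w does not.
const extra = [];
for (let code = 33; code <= 126; code++) {
  const ch = String.fromCharCode(code);
  if (atom.test(ch) && !/\w/.test(ch)) extra.push(ch);
}

console.log(extra.join(""));       // !#$%&'*+-/=?^`{|}~
console.log(atom.test("a=b"));     // true
console.log(/^\w+$/.test("a=b"));  // false
```

Under this reading, eighteen punctuation characters are legal in an atom yet fall outside \w.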

Besides the "_" and "-" that Mr Nugent explicitly accounted for in his regexp, an atom may contain many other non-alphanumeric characters, such as "~", "`", "+", "=", and so on, all of which Mr Nugent's regexp rejects.
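A quick check, offered only as an illustration, shows the effect. The example.com addresses below are made-up samples, and the names rejected and accepted are hypothetical. Each rejected address has a local-part that is a legal RFC 822 atom or quoted-string.

```javascript
// Mr Nugent's expression, as quoted earlier in this feedback.
const nugent = /^\w+((-\w+)|(\.\w+))*\@[A-Za-z0-9]+((\.|-)[A-Za-z0-9]+)*\.[A-Za-z0-9]+$/;

// Made-up addresses whose local-part is legal under RFC 822.
const rejected = [
  "user+tag@example.com",     // "+" is allowed in an atom
  "a=b@example.com",          // so is "="
  "~jane@example.com",        // and "~"
  '"john doe"@example.com'    // a quoted-string local-part
];

// Made-up addresses using only the characters the expression accounts for.
const accepted = ["jane.doe@example.com", "jane-doe@example.com", "jane_doe@example.com"];

for (const address of rejected) {
  console.log(address, nugent.test(address)); // false for each
}
for (const address of accepted) {
  console.log(address, nugent.test(address)); // true for each
}
```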

While I will not go into it here, Mr Nugent's regexp has similar, although not as extensive, problems in the domain portion. I would encourage you to ask Mr Nugent, or someone else, to revise his article to include an accurate parsing mechanism for email addresses. Until then, you should remove the erroneous answer to Q122.