Validating Email Address in Web Forms – The Hazards of Complexity
by Ben Gross
Validating data in web forms reduces the likelihood of inadvertent submission of data that is incorrectly formatted, inconsistent, or incomplete. It is often useful to validate email addresses, especially if the addresses are going to be used for receipts or other types of follow-up. Validation (and basic bounds checking) can also reduce the chance that the email address field could be used as an attack vector.
Email addresses can be significantly more complicated than commonly thought. It is therefore important to consult the most current RFCs for email standards and ICANN announcements for new types of top-level domain names; otherwise, valid email addresses may be blocked. For example, the plus character is valid within the local portion of an email address. The plus is typically used as an optional feature for sub-addressing and is supported by many mail servers, by Cyrus IMAP installations, and by Gmail. However, the plus sign is frequently rejected as invalid by web forms.
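To make the sub-addressing example concrete, here is a minimal Python sketch of how a plus-tagged address might be split into its parts. The function name split_subaddress is hypothetical, and the sketch assumes the simple unquoted form mailbox+tag@domain; it does not handle quoted local parts or servers that use a different separator.

```python
from typing import Optional, Tuple


def split_subaddress(address: str, separator: str = "+") -> Tuple[str, Optional[str], str]:
    """Split an address into (mailbox, tag, domain).

    Illustrative only: assumes an unquoted local part and a single
    separator character, which is an assumption rather than a rule
    that every mail server follows.
    """
    # Split on the last "@" so the domain is whatever follows it.
    local, _, domain = address.rpartition("@")
    # Everything after the first separator in the local part is the tag.
    mailbox, found, tag = local.partition(separator)
    return mailbox, (tag if found else None), domain
```

A form that rejects the plus sign would turn away an address such as ben+receipts@example.com even though the mailbox behind it is perfectly deliverable.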
Unless there is a specific need for sophisticated email address validation, I recommend that sites limit themselves to very basic validation, such as simply checking for an @ sign and possibly for characters on either side of it. When sophisticated validation is used, it is important to test the algorithm and make sure it is kept up to date. The Stack Overflow thread How far should one take e-mail address validation? details many of the problems with being too clever when validating addresses. There will always be users who purposefully submit incorrect data, and while validation can limit this somewhat, simply sending a verification email is a far more effective method.
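The very basic check described above might look something like the following Python sketch. The function name looks_like_email and the 254-character cap are assumptions for illustration; the cap is a commonly cited upper bound used here as a simple bounds check, not a value taken from this article.

```python
MAX_ADDRESS_LENGTH = 254  # assumed bound for illustration; adjust as needed


def looks_like_email(value: str) -> bool:
    """Return True if value has an @ sign with text on both sides.

    Deliberately permissive: it makes no attempt to enforce RFC
    syntax, so addresses containing a plus sign are accepted.
    """
    value = value.strip()
    # Basic bounds checking: reject empty or unreasonably long input.
    if not value or len(value) > MAX_ADDRESS_LENGTH:
        return False
    # Split on the last "@"; all three pieces must be non-empty.
    local, at, domain = value.rpartition("@")
    return bool(at and local and domain)
```

A check this loose will accept many undeliverable addresses, which is the point: it only catches obvious slips, and the verification email does the real work.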
Dave Child’s early posts from 2004, Email Address Validation and Email Address Validation Updated, laid out many of the complexities of more sophisticated email address validation. The comments to the posts brought up edge cases where the script resulted in both false positives and false negatives. Child has continued to revise the script and it is available as a Google Code project php-email-address-validation.
Douglas Lovell’s 2007 Linux Journal article Validate an E-Mail Address with PHP, the Right Way attempted to present an even more complex email validation algorithm, along with detailed notes on the requirements relating to the various updated RFCs. The comments on this article also bring up many edge cases, which demonstrate the complexity of accurately validating email addresses. Jochen Topf’s articles, the Anatomy of a Mail Address and Characters in the local part of a mail address, are good introductions to the problem as well.
Dominic Sayers wrote a series of posts iterating on an increasingly refined algorithm, which resulted in his RFC-compliant email address validator. Sayers also produced a set of unit tests with a large collection of email addresses in order to compare his own algorithm against others. His PHP code is regularly updated and is also available on Google Code. Cal Henderson (formerly of Flickr) wrote his own RFC (2)822 & 3696 Email Address Parser in PHP, which also passes 100% of Sayers’s unit tests.
The chapter on inline validation from Luke Wroblewski’s excellent book Web Form Design: Filling in the Blanks describes how inline validation can improve the usability of web forms. He suggests that users should receive immediate feedback on whether or not a given input will be accepted as well as suggestions for correcting invalid input. His blog post Web Form Design: Boingo shows a real world example where inline validation would improve the user experience for a registration form. A recent report Web forms design guidelines: an eyetracking study from cxpartners’ Chui Chui Tan provides even more suggestions on how to best handle inline validation.
In this article, I primarily discuss server-side validation rather than validation by SMTP commands, such as looking for 250 and 550 SMTP response codes as presented in How to check if an email address exists without sending an email?. If the email address is to be used in a mailing list, I recommend that systems send an email with a URL that must be clicked for verification, so that the address qualifies as double opt-in for compliance with CAN-SPAM and most major Email Service Provider requirements.
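The verification step recommended above can be sketched in Python as follows. Everything named here is hypothetical: the VERIFY_URL endpoint, the in-memory pending and confirmed stores, and the function names. A real system would persist tokens in a database, expire them after some time, and actually send the message; this sketch only shows the token round trip.

```python
import secrets
from urllib.parse import urlencode

VERIFY_URL = "https://example.com/verify"  # hypothetical endpoint

pending = {}       # token -> address awaiting confirmation
confirmed = set()  # addresses whose owners clicked the link


def start_verification(address: str) -> str:
    """Create a one-time token and return the URL to put in the email."""
    token = secrets.token_urlsafe(32)  # unguessable random token
    pending[token] = address
    return f"{VERIFY_URL}?{urlencode({'token': token})}"


def confirm(token: str) -> bool:
    """Mark the address as confirmed if the token is known and unused."""
    address = pending.pop(token, None)  # pop makes the token single-use
    if address is None:
        return False
    confirmed.add(address)
    return True
```

Only addresses that end up in the confirmed set would be added to the mailing list, which is what makes the sign-up double opt-in.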
* This article originally appeared as Validating Email Address in Web Forms – The Hazards of Complexity in my Messaging News “On Message” column.