How to validate an email address using regular expressions

Anyone who writes web applications has likely built forms for users to enter their e-mail address. An e-mail address is considered part of our basic identity now – like our name and address – and everyone is expected to have one.

But then, inevitably, once we’ve collected e-mails from all our customers and we go to use them – we get bounces. Which leaves us to clean up the mess – and there usually is a mess to clean up if we can’t tell the customer their order is ready or that we need information from them. Worse yet, we might just assume the information got through correctly and go on our merry way – leaving our customer dangling.

And while we’re quietly cursing our chubby-fingered (yet all-important) customers, we wonder (or scream!) to ourselves: why don’t we just write some code to validate the e-mail addresses when they’re entered?

But anyone who’s really looked into doing this will tell you that it gets deep quickly. What seems like it should be straightforward ends up arcane and impossible.

To begin with, e-mail address formats are covered by RFC 822 (since superseded by RFC 2822 and later RFC 5322) – which is filled with impenetrable discussions on “sequences of lexical symbols” such as “atoms”, “special characters”, “domain-literals” and “comments”.

“Comments”? Yes, e-mail addresses can contain comments. I tested them too – and they work. A comment is (to the best of my knowledge) any text placed in parentheses anywhere in the e-mail address. For example, my e-mail address can be:


* kevin@kbedell.com, or
* kev(you da man!)in@kbedell.com, or
* kevin@k(evin)bedell.com

All these work – I tried them. Try validating that. I dare you.

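Another bit of a twist is that you can also specify an IP address instead of a domain name. For example, I’m not only “kevin@kbedell.com”, I’m also kevin@216.80.243.82. (Strictly speaking, RFC 822 writes such a “domain-literal” in square brackets, as in kevin@[216.80.243.82].)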

To make matters worse, as you might expect, many mail servers won’t accept e-mail even when the address is valid. For example, my mail server won’t accept kevin@216.80.243.82 – the anti-spam controls bounce it.

Imagine – all that work to validate it, and it still won’t work. Makes you want to spend your days surfing these pages…

I even ran across one brave soul who came up with a regular expression that he was sure could validate an e-mail address. Here it is:


function isValidEmail(emailAddress) {
  // Local part: dot-separated atoms or a quoted string.
  // Domain: a bracketed IPv4 literal or a dotted host name.
  var re = /^(([^<>()[\]\\.,;:\s@\"]+(\.[^<>()[\]\\.,;:\s@\"]+)*)|(\".+\"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/;
  return re.test(emailAddress);
}

Wow. That’s a mouthful. Of course, I’m so jaded by now that I’m sure he must’ve missed something. Or that the emails will just get bounced anyway.
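To see what such an expression does with the addresses mentioned earlier, here is a small sketch that runs each of them through the same pattern. The `samples` list is mine, and the bracketed IP form in it is an assumption added for comparison; it does not appear among the addresses I tested by hand.

```javascript
// The expression quoted above, written with plain ASCII double quotes.
function isValidEmail(emailAddress) {
  var re = /^(([^<>()[\]\\.,;:\s@\"]+(\.[^<>()[\]\\.,;:\s@\"]+)*)|(\".+\"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/;
  return re.test(emailAddress);
}

// Sample addresses taken from this post, plus one bracketed IP form.
var samples = [
  "kevin@kbedell.com",
  "kev(you da man!)in@kbedell.com",  // comment in the local part
  "kevin@k(evin)bedell.com",         // comment in the domain
  "kevin@216.80.243.82",             // bare IP address
  "kevin@[216.80.243.82]"            // bracketed IP address (assumed form)
];

samples.forEach(function (address) {
  console.log(address + " -> " + isValidEmail(address));
});
```

Only the plain address and the bracketed IP form pass. Both comment-bearing addresses and the bare IP address are rejected, even though they were delivered when I tried them.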

So is validating an email address impossible? Here’s the answer: It’s easy!

You don’t have to be a genius to validate email addresses. All you have to do is send a test e-mail to the customer! Really – this is the only way. If it gets through, the address is valid. If it bounces, then it’s not.
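One common way to act on a test e-mail is to put a one-time token in it and treat the address as valid once the customer follows the link. The sketch below assumes Node.js; `startValidation`, `confirmAddress`, `isConfirmed`, the in-memory stores, the `example.com` URL and the `sendMail` callback are all hypothetical names for illustration, and actual delivery is left to whatever mail library you already use.

```javascript
var crypto = require("crypto");

// Hypothetical in-memory stores; a real application would use a database.
var pending = new Map();    // token -> address awaiting confirmation
var confirmed = new Set();  // addresses that have been confirmed

// Send the test e-mail. sendMail(address, body) is a caller-supplied
// function (an assumption here) that performs the actual delivery.
function startValidation(address, sendMail) {
  var token = crypto.randomBytes(16).toString("hex");
  pending.set(token, address);
  sendMail(address, "Please confirm your address: https://example.com/confirm?token=" + token);
  return token;
}

// Call this when the customer follows the link in the test e-mail.
// Returns true once per token; unknown or reused tokens return false.
function confirmAddress(token) {
  var address = pending.get(token);
  if (address === undefined) {
    return false;
  }
  pending.delete(token);
  confirmed.add(address);
  return true;
}

function isConfirmed(address) {
  return confirmed.has(address);
}
```

An address that bounces simply never gets confirmed, so no parsing of the address itself is needed.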

Now let’s just hope no one ever changes their email address once we validate it…

Are my points “valid”?

[Note: This is a repost of a post I originally wrote for The O'Reilly Network in December of 2002]
