I like my mailbox organised. And I like things to be automated. Fortunately, email systems support aliases for their users, so more than one email address reaches the same person. This allows for automatic filtering depending on which address the message was sent to.

What’s even better is that these systems can match a pattern to make generic aliases (e.g., user-REPLACEME@example.net for user user@example.net). This way, you can create valid email addresses on the fly, without having to tinker with anything (e.g., user-gascompany@example.net for the gas company to contact the user).

Now, the dash (-) is not the most common character used for that purpose; the plus character (+) is seen more often. Gmail is a notable example, though not the only one. If you have an account there, try sending an email to YOURUSERNAME+test@gmail.com.
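To make the idea concrete, here is a minimal sketch of how a filter might pull the base user and the tag out of such an address. The function name `split_subaddress` and its `separator` parameter are made up for illustration; real mail servers do this through their own configuration, and the separator they use varies.

```python
from typing import Optional, Tuple


def split_subaddress(address: str, separator: str = "+") -> Tuple[str, Optional[str], str]:
    """Split an address into (user, tag, domain).

    Hypothetical helper: the tag is None when the local part
    does not contain the separator.
    """
    if "@" not in address:
        raise ValueError("not an email address: missing '@'")
    # Split on the last "@" so the domain is always the final component.
    local_part, _, domain = address.rpartition("@")
    # Split on the first separator; anything after it is the tag.
    user, found, tag = local_part.partition(separator)
    return user, (tag if found else None), domain


print(split_subaddress("user+test@gmail.com"))
print(split_subaddress("user-gascompany@example.net", separator="-"))
```

A filtering rule can then key on the tag (say, file everything tagged `gascompany` into a bills folder) while delivery still goes to the base user.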

And this is where my problem lies. Once again, I was happily filling in a form that asked for my email address, entered an address with a + in it, and had it rejected because it “contain[ed] invalid characters.” It really annoys me that some people who call themselves professionals in IT-related fields seem unable to follow a standard properly, assuming they looked for one in the first place…

Because there is a standard describing exactly which characters are valid in an email address: RFC 5322, the current standard for the Internet Message Format. Amongst other things, it describes the format of an email address, from page 17 onwards. Granted, it is not the easiest read, so let me single out the relevant parts.

addr-spec       =   local-part "@" domain
local-part      =   dot-atom / quoted-string / obs-local-part

The dot-atom is described on page 13 of the same document.

atext           =   ALPHA / DIGIT /    ; Printable US-ASCII
                    "!" / "#" /        ;  characters not including
                    "$" / "%" /        ;  specials.  Used for atoms.
                    "&" / "'" /
                    "*" / "+" /
                    "-" / "/" /
                    "=" / "?" /
                    "^" / "_" /
                    "`" / "{" /
                    "|" / "}" /
                    "~"
[...]
dot-atom-text   =   1*atext *("." 1*atext)
dot-atom        =   [CFWS] dot-atom-text [CFWS]

In summary, the local part (usually the username) of a valid email address can be, amongst other things, one or more runs of at least one character each, separated by single dots (dot-atom-text). Moreover, the list of characters allowed in these runs (atext) does contain + (and other unusual but valid ones like #, $, &, ? or {).
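The dot-atom rule quoted above translates fairly directly into a regular expression. What follows is only a sketch, not a full RFC 5322 validator: it checks the dot-atom form of the local part and deliberately ignores quoted-string, obs-local-part, surrounding comments and whitespace (CFWS), and any length limits. The names `ATEXT`, `DOT_ATOM_TEXT` and `is_dot_atom_local_part` are mine, not the RFC's.

```python
import re

# atext from RFC 5322: ALPHA, DIGIT, and the listed printable symbols.
# The "-" is placed last so it is a literal inside the character class.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"

# dot-atom-text = 1*atext *("." 1*atext)
DOT_ATOM_TEXT = re.compile(rf"{ATEXT}+(?:\.{ATEXT}+)*")


def is_dot_atom_local_part(local_part: str) -> bool:
    """Return True if local_part matches dot-atom-text in its entirety."""
    return DOT_ATOM_TEXT.fullmatch(local_part) is not None


for candidate in ["user+test", "user-gascompany", "first.last", "user..name", ".user"]:
    print(candidate, is_dot_atom_local_part(candidate))
```

Note that the doubled dot and the leading dot are rejected by the grammar itself, whereas + and - pass without any special casing.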

So please, people: when designing forms and verifying email addresses, make sure your verification procedure actually matches the standard, rather than hand-waving your way around and guessing what works and what doesn’t.

Edit (2015-06-30): I just realised that there is even an RFC describing the use of subaddress filtering: RFC 5233.

Edit (2016-10-26): Finally, a good writeup on the 100% correct way to validate email addresses (WebArchive link)!
