Validating email in the world of PHP

Introduction

Recently I found myself, AGAIN, facing the problem of validating an email address in PHP.

You might think that this problem has been addressed a million times before and that there is a foolproof solution out there.

One of the best results Google yields is certainly this Stack Overflow article.
It gives a very good outline of why a simple regex is, in most languages, not enough to comply with the very complex format outlined in RFC 5321.

However, PCREs (Perl Compatible Regular Expressions), which are available in PHP, can do the job according to the Stack Overflow post. The article links to the following page: http://ex-parrot.com/~pdw/Mail-RFC822-Address.html

As you can see, it contains a ridiculous regex which, as the author states, has been autogenerated. When I tried to run the regex, PHP complained with the error:
Warning: preg_match(): Compilation failed: unmatched parentheses at offset 1551 in

The author also provided the original Perl module, which is pretty easy to build. I tried that too; more on the results later.

Another promising approach was http://isemail.info/about , where the author has written a very easy to integrate PHP function that basically uses a for loop to iterate over the email address and validate it according to his implementation of RFC 5321.
It can also do DNS lookups on the domain to check if an MX record is present.

The final and most promising approach was a PHP state machine extracted from Barebones CMS (Ultimate email toolkit), which can also suggest what the correct email address might be if the validation fails.
You could use such a class to be very accommodating to the user.

Example: User types “arne.tarara.@googlemail.com” – A classic!

The class would then yield “Validation failed! Suggestion: arne.tarara@googlemail.com”

Also the class could do DNS lookups, so basically a one-size-fits-all solution!

Alright, let’s dive into the tests.

Testing Results

Candidates

  • Basic PCRE, often used ‘/^([a-zA-Z0-9_\-\+\~\^\{\}]+[\.]?)+@{1}([a-zA-Z0-9_\-\+\~\^\{\}]+[\.]?)+\.[A-Za-z0-9]{2,}$/’
  • PHP internal function filter_var() http://php.net/manual/de/function.filter-var.php
  • is_email() PHP function http://isemail.info/about
  • PHP state machine for email testing. Extracted from Barebones CMS https://barebonescms.com/documentation/ultimate_email_toolkit/
  • Complex PCRE http://ex-parrot.com/~pdw/Mail-RFC822-Address.html

Testing set

I used the tests bundled with is_email() on https://code.google.com/p/isemail/downloads/detail?name=is_email-3.01.zip&can=2&q= and I also used the tests bundled with the Perl module on http://ex-parrot.com/~pdw/Mail-RFC822-Address.html

Both contained negatives and positives. However, I had to eliminate all the positives from the Perl module, because it claimed “abigail@example.com ” (note the space at the end) to be a valid email address, which it clearly is not. Other cases were “abigail @example.com” etc.
The most confusing case was “*()@[]”. How is that a valid email? The local part, well, maybe, but the hostname?

I reduced the set, as said before, which resulted in:


array (
1 =>
array (
45 => 'test@io',
46 => 'test@iana.org',
47 => 'test@nominet.org.uk',
48 => 'test@about.museum',
49 => 'a@iana.org',
50 => 'test@e.com',
51 => 'test@iana.a',
52 => 'test.test@iana.org',
53 => '!#$%&`*+/=?^`{|}~@iana.org',
54 => '123@iana.org',
55 => 'test@123.com',
56 => 'abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghiklm@iana.org',
57 => 'test@abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghikl.com',
58 => 'test@mason-dixon.com',
59 => 'test@g--a.com',
60 => 'test@iana.co-uk',
61 => 'a@a.b.c.d.e.f.g.h.i.j.k.l.m.n.o.p.q.r.s.t.u.v.w.x.y.z.a.b.c.d.e.f.g.h.i.j.k.l.m.n.o.p.q.r.s.t.u.v.w.x.y.z.a.b.c.d.e.f.g.h.i.j.k.l.m.n.o.p.q.r.s.t.u.v.w.x.y.z.a.b.c.d.e.f.g.h.i.j.k.l.m.n.o.p.q.r.s.t.u.v.w.x.y.z.a.b.c.d.e.f.g.h.i.j.k.l.m.n.o.p.q.r.s.t.u.v',
62 => 'abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghiklm@abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghikl.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghikl.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghi',
63 => 'test@xn--hxajbheg2az3al.xn--jxalpdlp',
64 => 'xn--test@iana.org',
65 => 'test@test.com',
66 => 'test@nic.no',
67 => 'test@org',
68 => 'test@iana.123',
69 => 'test@255.255.255.255',
70 => 'test@[IPv6:1111:2222:3333:4444::255.255.255.255]',
71 => 'test@[IPv6:1111:2222:3333:4444:5555:6666:255.255.255.255]',
72 => 'test@[IPv6:::]',
73 => 'test@[IPv6:::3333:4444:5555:6666:7777:8888]',
74 => 'test@[IPv6:1111:2222:3333:4444:5555::8888]',
75 => '"test"@iana.org',
76 => '""@iana.org',
77 => '"\\a"@iana.org',
78 => '"\\""@iana.org',
79 => '"test\\ test"@iana.org',
80 => 'test@[255.255.255.255]',
81 => 'test@[IPv6:1111:2222:3333:4444:5555:6666:7777:8888]',
82 => 'test@[IPv6:1111:2222:3333:4444:5555:6666::8888]',
83 => '"\\\\"@iana.org',
),
0 =>
array (
0 => 'Just a string',
1 => 'string',
2 => '(comment)',
3 => '()@example.com',
4 => 'fred(&)barny@example.com',
5 => 'fred\\ barny@example.com',
6 => 'Abigail ',
7 => 'Abigail <abigail(fo(o)@example.com>',
8 => 'Abigail <abigail(fo)o)@example.com>',
9 => '"Abi"gail" <abigail@example.com>',
10 => 'abigail@[exa]ple.com]',
11 => 'abigail@[exa[ple.com]',
12 => 'abigail@[exaple].com]',
13 => 'abigail@',
14 => '@example.com',
15 => 'phrase: abigail@example.com abigail@example.com ;',
16 => 'invalid£char@example.com',
17 => '',
18 => 'test',
19 => '@',
20 => 'test@',
21 => '@io',
22 => '@iana.org',
23 => '.test@iana.org',
24 => 'test.@iana.org',
25 => 'test..iana.org',
26 => 'test_exa-mple.com',
27 => 'test\\@test@iana.org',
30 => 'abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghiklmn@iana.org',
31 => 'test@abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghiklm.com',
32 => 'test@-iana.org',
33 => 'test@iana-.com',
34 => 'test@.iana.org',
35 => 'test@iana.org.',
36 => 'test@iana..com',
37 => 'abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghiklm@abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghikl.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghikl.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghij',
38 => 'a@abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghikl.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghikl.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghikl.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefg.hij',
39 => 'a@abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghikl.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghikl.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghikl.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefg.hijk',
42 => '"""@iana.org',
47 => 'test"@iana.org',
48 => '"test@iana.org',
49 => '"test"test@iana.org',
50 => 'test"text"@iana.org',
51 => '"test""test"@iana.org',
52 => '"test"."test"@iana.org',
54 => '"test".test@iana.org',
55 => '"test␀"@iana.org',
56 => '"test\\␀"@iana.org',
57 => '"abcdefghijklmnopqrstuvwxyz abcdefghijklmnopqrstuvwxyz abcdefghj"@iana.org',
58 => '"abcdefghijklmnopqrstuvwxyz abcdefghijklmnopqrstuvwxyz abcdefg\\h"@iana.org',
60 => 'test@a[255.255.255.255]',
61 => 'test@[255.255.255]',
62 => 'test@[255.255.255.255.255]',
63 => 'test@[255.255.255.256]',
64 => 'test@[1111:2222:3333:4444:5555:6666:7777:8888]',
65 => 'test@[IPv6:1111:2222:3333:4444:5555:6666:7777]',
67 => 'test@[IPv6:1111:2222:3333:4444:5555:6666:7777:8888:9999]',
68 => 'test@[IPv6:1111:2222:3333:4444:5555:6666:7777:888G]',
71 => 'test@[IPv6:1111:2222:3333:4444:5555:6666::7777:8888]',
72 => 'test@[IPv6::3333:4444:5555:6666:7777:8888]',
74 => 'test@[IPv6:1111::4444:5555::8888]',
76 => 'test@[IPv6:1111:2222:3333:4444:5555:255.255.255.255]',
78 => 'test@[IPv6:1111:2222:3333:4444:5555:6666:7777:255.255.255.255]',
80 => 'test@[IPv6:1111:2222:3333:4444:5555:6666::255.255.255.255]',
81 => 'test@[IPv6:1111:2222:3333:4444:::255.255.255.255]',
82 => 'test@[IPv6::255.255.255.255]',
83 => ' test @iana.org',
84 => 'test@ iana .com',
85 => 'test . test@iana.org',
86 => '␍␊ test@iana.org',
87 => '␍␊ ␍␊ test@iana.org',
88 => '(comment)test@iana.org',
89 => '((comment)test@iana.org',
90 => '(comment(comment))test@iana.org',
91 => 'test@(comment)iana.org',
92 => 'test(comment)test@iana.org',
93 => 'test@(comment)[255.255.255.255]',
94 => '(comment)abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghiklm@iana.org',
95 => 'test@(comment)abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghikl.com',
96 => '(comment)test@abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghik.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghik.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijk.abcdefghijklmnopqrstuvwxyzabcdefghijk.abcdefghijklmnopqrstu',
97 => 'test@iana.org␊',
98 => 'test@iana.org-',
99 => '"test@iana.org',
100 => '(test@iana.org',
101 => 'test@(iana.org',
102 => 'test@[1.2.3.4',
103 => '"test\\"@iana.org',
104 => '(comment\\)test@iana.org',
105 => 'test@iana.org(comment\\)',
106 => 'test@iana.org(comment\\',
107 => 'test@[RFC-5322-domain-literal]',
108 => 'test@[RFC-5322]-domain-literal]',
109 => 'test@[RFC-5322-[domain-literal]',
110 => 'test@[RFC-5322-\\␇-domain-literal]',
111 => 'test@[RFC-5322-\\␉-domain-literal]',
112 => 'test@[RFC-5322-\\]-domain-literal]',
113 => 'test@[RFC-5322-domain-literal\\]',
114 => 'test@[RFC-5322-domain-literal\\',
115 => 'test@[RFC 5322 domain literal]',
116 => 'test@[RFC-5322-domain-literal] (comment)',
117 => '@iana.org',
118 => 'test@.org',
119 => '""@iana.org',
120 => '"\\"@iana.org',
121 => '()test@iana.org',
122 => 'test@iana.org␍',
123 => '␍test@iana.org',
124 => '"␍test"@iana.org',
125 => '(␍)test@iana.org',
126 => 'test@iana.org(␍)',
127 => '␊test@iana.org',
128 => '"␊"@iana.org',
129 => '"\\␊"@iana.org',
130 => '(␊)test@iana.org',
131 => '␇@iana.org',
132 => 'test@␇.org',
133 => '"␇"@iana.org',
134 => '"\\␇"@iana.org',
135 => '(␇)test@iana.org',
136 => '␍␊test@iana.org',
137 => '␍␊ ␍␊test@iana.org',
138 => ' ␍␊test@iana.org',
139 => ' ␍␊ test@iana.org',
140 => ' ␍␊ ␍␊test@iana.org',
141 => ' ␍␊␍␊test@iana.org',
142 => ' ␍␊␍␊ test@iana.org',
143 => 'test@iana.org␍␊ ',
144 => 'test@iana.org␍␊ ␍␊ ',
145 => 'test@iana.org␍␊',
146 => 'test@iana.org␍␊ ␍␊',
147 => 'test@iana.org ␍␊',
148 => 'test@iana.org ␍␊ ',
149 => 'test@iana.org ␍␊ ␍␊',
150 => 'test@iana.org ␍␊␍␊',
151 => 'test@iana.org ␍␊␍␊ ',
152 => ' test@iana.org',
153 => 'test@iana.org ',
154 => 'test@[IPv6:1::2:]',
155 => '"test\\©"@iana.org',
156 => 'test@iana/icann.org',
157 => 'test.(comment)test@iana.org',
158 => '"\\"@iana.org',
),
);

Please note: I indexed the array with 1 (“true”) for valid emails and 0 (“false”) for invalid ones.
Since not all libraries had DNS checking available, the “true” array contains some emails that are syntactically correct but do NOT have a matching MX record.
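
The numbers reported for each candidate below (n, invalid results, accuracy, duration) can be collected with a small harness along these lines. This is only a sketch of how such a run might look, not the script actually used for this post: the function name runTestSet() and the shape of its return value are made up for illustration, and filter_var() is just one example of a validator to plug in.

```php
<?php
/**
 * Runs a validator over a test set shaped like the array above:
 * key 1 holds the valid addresses, key 0 the invalid ones.
 * (Hypothetical helper, named for this example only.)
 */
function runTestSet(callable $validator, array $testSet): array
{
    $n = 0;
    $failed = [];
    $start = microtime(true);

    foreach ($testSet as $expected => $addresses) {
        foreach ($addresses as $address) {
            $n++;
            // A result is "invalid" when the validator disagrees with the test set
            if ((bool) $validator($address) !== (bool) $expected) {
                $failed[] = $address;
            }
        }
    }

    return [
        'n'        => $n,
        'invalid'  => count($failed),
        'accuracy' => $n > 0 ? round(100 * ($n - count($failed)) / $n, 3) : 0.0,
        'duration' => microtime(true) - $start,
        'failed'   => $failed,
    ];
}

// Tiny excerpt of the test set, just to show the call
$sample = [
    1 => ['test@iana.org', 'a@iana.org'],
    0 => ['Just a string', 'test@'],
];

$result = runTestSet(
    function (string $email): bool {
        return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
    },
    $sample
);

printf(
    "n: %d\nInvalid results: %d\nAccuracy: %s%%\nDuration: %s s\n",
    $result['n'],
    $result['invalid'],
    $result['accuracy'],
    $result['duration']
);
```

Keeping the list of failed addresses around makes it easy to see afterwards which cases a candidate tripped over.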

Basic Regex

'/^([a-zA-Z0-9_\-\+\~\^\{\}]+[\.]?)+@{1}([a-zA-Z0-9_\-\+\~\^\{\}]+[\.]?)+\.[A-Za-z0-9]{2,}$/'
n: 181
Invalid results: 29
Accuracy: 83.978%
Duration: 0.11357188224792 s

This basic regex fails to validate the new TLD-less domain names such as “test@io” and also does not like domains such as “test@iana.a”. On the other hand, domains with double dots show up as false positives: “test@iana..com”

True, “.a” is not a valid TLD at the moment, but it could become one. So it would be better to leave this check to MX record matching …

Also “!#$%&`*+/=?^`{|}~@iana.org” failed, which is syntactically correct according to the test set from http://isemail.info/about

However, it can handle IPv6 guys like “test@[IPv6:1111:2222:3333:4444:5555:6666:7777:8888]” without choking.
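
To try the TLD and double-dot cases mentioned in this section yourself, a few lines are enough. The regex is copied unchanged from the candidates list; the constant and function names are only chosen for this example.

```php
<?php
// The basic PCRE from the candidates list, copied unchanged
const BASIC_EMAIL_REGEX = '/^([a-zA-Z0-9_\-\+\~\^\{\}]+[\.]?)+@{1}([a-zA-Z0-9_\-\+\~\^\{\}]+[\.]?)+\.[A-Za-z0-9]{2,}$/';

// Hypothetical wrapper: true if the basic regex accepts the address
function matchesBasicRegex(string $email): bool
{
    return preg_match(BASIC_EMAIL_REGEX, $email) === 1;
}

$samples = ['test@iana.org', 'test@io', 'test@iana.a', 'test@iana..com'];

foreach ($samples as $email) {
    printf("%-16s %s\n", $email, matchesBasicRegex($email) ? 'accepted' : 'rejected');
}
```

The pattern insists on a dot followed by at least two characters at the end, which is why “test@io” and “test@iana.a” are rejected. The optional dot inside the repeated group followed by the mandatory dot is what lets “test@iana..com” slip through.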

is_email() PHP function

http://isemail.info/about

n: 181
Invalid results: 0
Accuracy: 100%
Duration: 0.037806987762451 s

Well, since it runs on its own test set, with just 16 added “invalids”, this result is not very surprising. Still good!

PHP internal function filter_var()

http://php.net/manual/de/function.filter-var.php

n: 181
Invalid results: 8
Accuracy: 95.58%
Duration: 0.0053420066833496 s

The accuracy is way better than that of the simple PCRE. It fails on fewer addresses, but some of them are the same ones the simple PCRE could not handle, such as “test@[IPv6:1111:2222:3333:4444:5555:6666::8888]”.
The filter_var() function fails on IPv6 addresses such as “test@[IPv6:1111:2222:3333:4444:5555:6666:7777:8888]” and “test@[IPv6:::]”.
It also cannot handle escaped spaces, as in “"test\ test"@iana.org”, or TLD-less domains, as in “test@io”.

It also produces false positives such as “test@org”, which is invalid.

Another blog ran a much more in-depth test of this function and other regex functions: http://fightingforalostcause.net/content/misc/2006/compare-email-regex.php

The author states that the filter_var function is the best one he has in his database. However, I could not find his test set as an easy download, so I could not really compare with his data.
Still, it seems that this is a very solid regex.

PHP state machine for email testing. Extracted from Barebones CMS

https://barebonescms.com/documentation/ultimate_email_toolkit/

Well, no data? Yep.

The problem is, the function gets into an infinite loop while parsing the address "test\ test"@iana.org

Looking at the source code, a non-terminating while loop is the culprit:


while ($email != "")
{
    $currchr = substr($email, 0, 1);
    $nextchr = substr($email, 1, 1);

    if ($currchr == "\\")
    {
        if ($nextchr == "\\" || $nextchr == "\"")
        {
            $local .= substr($email, 0, 2);
            $email = substr($email, 2);
        }
        else if (ord($nextchr) >= 33 && ord($nextchr) <= 126)
        {
            $local .= substr($email, 1, 1);
            $email = substr($email, 2);
        }
    }
    else if ($currchr == "\"")  break;
    else if (ord($currchr) >= 33 && ord($nextchr) <= 126)
    {
        $local .= substr($email, 0, 1);
        $email = substr($email, 1);
    }
    else  $email = substr($email, 1);
}

As you can see, the block ‘if ($currchr == "\\")’ has no final else branch. Since a backslash followed by a space is not accounted for, the string is never trimmed any further, and the result is an infinite loop.
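
For illustration, here is how the loop could be made to terminate. This is my own sketch and not the toolkit's actual fix: the function name consumeQuotedLocalPart() is made up, the loop is wrapped in a function so it can be run in isolation, and I assume that dropping the stray backslash (the same thing the loop already does with other unexpected characters) is acceptable. I also compare ord($currchr) on both sides of the range check in the plain-character branch, which I assume was the intention.

```php
<?php
/**
 * Hypothetical stand-alone version of the loop above.
 * Expects the text right after the opening quote of a quoted local part
 * and returns [collected local part, unparsed remainder].
 */
function consumeQuotedLocalPart(string $email): array
{
    $local = '';

    while ($email !== '') {
        $currchr = substr($email, 0, 1);
        $nextchr = substr($email, 1, 1);

        if ($currchr === "\\") {
            if ($nextchr === "\\" || $nextchr === "\"") {
                // Escaped backslash or quote: keep both characters
                $local .= substr($email, 0, 2);
                $email = substr($email, 2);
            } elseif ($nextchr !== '' && ord($nextchr) >= 33 && ord($nextchr) <= 126) {
                // Escaped printable character: keep it without the backslash
                $local .= $nextchr;
                $email = substr($email, 2);
            } else {
                // The missing branch: drop the lone backslash so the loop always advances
                $email = substr($email, 1);
            }
        } elseif ($currchr === "\"") {
            // Closing quote reached
            break;
        } elseif (ord($currchr) >= 33 && ord($currchr) <= 126) {
            $local .= $currchr;
            $email = substr($email, 1);
        } else {
            // Anything else (e.g. a space) is skipped
            $email = substr($email, 1);
        }
    }

    return [$local, $email];
}

// The input that hung the original loop, minus its opening quote
var_dump(consumeQuotedLocalPart('test\\ test"@iana.org'));
```

Every branch now shortens $email by at least one character, so the loop has to end no matter what the input looks like.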

Failed!

Complex PCRE

http://ex-parrot.com/~pdw/Mail-RFC822-Address.html
(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
)+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:
\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(
?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[
\t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\0
31]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\
](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+
(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:
(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)
?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\
r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[
\t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)
?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t]
)*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*
)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
)+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)
*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+
|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r
\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:
\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t
]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031
]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](
?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?
:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?
:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?
:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?
[ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\]
\000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|
\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>
@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"
(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t]
)*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?
:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[
\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-
\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(
?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;
:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([
^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\"
.\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\
]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\
[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\
r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\]
\000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]
|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \0
00-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\
.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,
;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?
:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*
(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[
^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]
]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*(
?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(
?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[
\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t
])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t
])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?
:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|
\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:
[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\
]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)
?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["
()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)
?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>
@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[
\t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,
;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t]
)*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?
(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:
\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[
"()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])
*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])
+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\
.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(
?:\r\n)?[ \t])*))*)?;\s*)

As said before, the regex itself did not compile for me in PHP, so I built the Perl module.

n: 181
Invalid results: 53
Accuracy: 70.72%
Duration: 0.053420066833496 s

Well, this is a poor result. What happened?

It may be due to the fact that the tests are built for the current RFCs, while the Perl module is built for RFC 822.

Problem: RFC 822 is no longer the right reference. According to isemail.info and Wikipedia, RFC 5322 is the right place to look (http://en.wikipedia.org/wiki/Email_address)

I am not going into detail about which of the addresses failed and which did not, because the result is too poor.

Results:

Keep it simple: The winner is is_email() from isemail.info

Conclusion and Usage

isemail.info provides a solid PHP-based email testing class in accordance with RFC 5322, which defines the address format used in email messages (SMTP itself is specified in RFC 5321).

The class has a light footprint and is easy to use. It can also do DNS lookups and factor the presence or absence of an MX record into the result of the is_email() function.

As perfect as this result is in theory, there is a drawback in the implementation of the function itself.
It uses the internal PHP function dns_get_record() to make the DNS check. If that function is not available (because PHP was not compiled with it), the check is simply omitted.

The problem with “dns_get_record()” is that it relies on the underlying mechanics of the OS and does not expose a PHP stream you could configure. As a result, you cannot set a timeout for the function call.

So let’s say you have an unresponsive DNS server: the call will run forever without hitting a timeout. I tested this myself by setting up a DNS server on my localhost that keeps the connection open forever.

The result was that the function did not return. Using set_time_limit(1) did not have any effect either, since time spent in system calls does NOT COUNT toward the running time of the script.

As good as “is_email()” is for syntactical checks, it is just as bad in a real-life scenario where the DNS check hangs. If you want to live-check an email, let's say in a webshop, the page becomes unresponsive and the user will eventually abort the registration!

The golden solution: Patching is_email()

I decided to patch the is_email() class and integrate the well-known Net_DNS2 class (http://pear.php.net/package/Net_DNS2)

This class uses a 5-second timeout and can even be more graceful if the domain has an A record but no MX.
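
The idea can be sketched roughly as follows. This is not the patched is_email() code itself, just an illustration under some assumptions: domainAcceptsMail() is a made-up helper, the lookup is passed in as a callable so the MX-then-A fallback can be tried without a network, and the Net_DNS2 wiring at the bottom assumes the PEAR package is installed and that its resolver accepts a 'timeout' option and returns the records in the answer property of the response.

```php
<?php
/**
 * Hypothetical helper: returns true if the domain has an MX record or,
 * failing that, an A record. $lookup receives (domain, record type) and
 * returns an array of records; it may throw on timeouts or resolver errors.
 */
function domainAcceptsMail(string $domain, callable $lookup): bool
{
    foreach (['MX', 'A'] as $type) {
        try {
            if (count($lookup($domain, $type)) > 0) {
                return true;
            }
        } catch (Exception $e) {
            // Timeout or resolver error: treat it as "no record of this type"
        }
    }

    return false;
}

// Wiring with Net_DNS2; skipped when the package is not available
if (class_exists('Net_DNS2_Resolver')) {
    $resolver = new Net_DNS2_Resolver(['timeout' => 5]);

    $lookup = function (string $domain, string $type) use ($resolver): array {
        // query() throws a Net_DNS2_Exception on failure, e.g. on a timeout
        return $resolver->query($domain, $type)->answer;
    };

    var_dump(domainAcceptsMail('iana.org', $lookup));
}
```

With a timeout on every query, the worst case is a bounded wait (here two lookups of up to 5 seconds each) instead of a request that never returns. Whether a timeout should count as "reject" or "accept anyway" is a design decision; in a webshop it may be friendlier to let the address through.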

Since all the classes are under the BSD license, I have attached my modded class along with a short usage example.

License: http://www.opensource.org/licenses/bsd-license.php BSD License

Example:




Download: ZIP-File