Validating email in the world of PHP

Introduction

Recently I found myself, AGAIN, facing the problem of validating an email address in PHP.

You might think that this problem has been addressed a million times before and that there is a foolproof solution out there.

One of the best results Google yields is certainly this Stack Overflow article.
It gives a very good outline of why a simple regex is, in most languages, not enough to comply with the very complex format outlined in RFC 5321.

However, PCREs (Perl Compatible Regular Expressions), which are available in PHP, can do the job according to the Stack Overflow post. The article links to the following page: http://ex-parrot.com/~pdw/Mail-RFC822-Address.html

As you can see, it contains a ridiculous regex which, as the author states, has been autogenerated. When I tried to run the regex, PHP complained with the error:
Warning: preg_match(): Compilation failed: unmatched parentheses at offset 1551 in

The author also provided the original Perl module, which is pretty easy to compile. I tried that too; more on the results later.

Another promising approach was http://isemail.info/about , where the author has written a very easy to integrate PHP function that basically uses a for loop to iterate over the email address and validate it according to his implementation of RFC 5321.
This function can also do DNS lookups on the domain to check if an MX record is present.

The final and most promising approach was a PHP state machine extracted from Barebones CMS (Ultimate E-mail Toolkit), which could also suggest what the correct email address might be if the validation failed.
You could use such a class to be very accommodating to the user.

Example: User types “arne.tarara.@googlemail.com” – A classic!

The class would then yield “Validation failed! Suggestion: arne.tarara@googlemail.com”

Also the class could do DNS lookups, so basically a one-size-fits-all solution!

Alright, let’s dive into the tests.

Testing Results

Candidates

  • Basic PCRE, often used ‘/^([a-zA-Z0-9_\-\+\~\^\{\}]+[\.]?)+@{1}([a-zA-Z0-9_\-\+\~\^\{\}]+[\.]?)+\.[A-Za-z0-9]{2,}$/’
  • PHP internal function filter_var() http://php.net/manual/de/function.filter-var.php
  • is_email() PHP function http://isemail.info/about
  • PHP state machine for email testing. Extracted from Barebone CMS https://barebonescms.com/documentation/ultimate_email_toolkit/
  • Complex PCRE http://ex-parrot.com/~pdw/Mail-RFC822-Address.html

Testing set

I used the tests bundled with is_email() on https://code.google.com/p/isemail/downloads/detail?name=is_email-3.01.zip&can=2&q= and I also used the tests bundled with the perl module on http://ex-parrot.com/~pdw/Mail-RFC822-Address.html

Both contained negatives and positives. However, I had to eliminate all the positives from the Perl module, because it claimed “abigail@example.com ” (note the space at the end) to be a valid email address, which it clearly is not. Other cases were “abigail @example.com” etc.
The most confusing case was: “*()@[]” => How is that a valid email? the local part, well maybe, but the hostname?

I reduced the set, as said before, which resulted in:


array (
1 =>
array (
45 => 'test@io',
46 => 'test@iana.org',
47 => 'test@nominet.org.uk',
48 => 'test@about.museum',
49 => 'a@iana.org',
50 => 'test@e.com',
51 => 'test@iana.a',
52 => 'test.test@iana.org',
53 => '!#$%&`*+/=?^`{|}~@iana.org',
54 => '123@iana.org',
55 => 'test@123.com',
56 => 'abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghiklm@iana.org',
57 => 'test@abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghikl.com',
58 => 'test@mason-dixon.com',
59 => 'test@g--a.com',
60 => 'test@iana.co-uk',
61 => 'a@a.b.c.d.e.f.g.h.i.j.k.l.m.n.o.p.q.r.s.t.u.v.w.x.y.z.a.b.c.d.e.f.g.h.i.j.k.l.m.n.o.p.q.r.s.t.u.v.w.x.y.z.a.b.c.d.e.f.g.h.i.j.k.l.m.n.o.p.q.r.s.t.u.v.w.x.y.z.a.b.c.d.e.f.g.h.i.j.k.l.m.n.o.p.q.r.s.t.u.v.w.x.y.z.a.b.c.d.e.f.g.h.i.j.k.l.m.n.o.p.q.r.s.t.u.v',
62 => 'abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghiklm@abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghikl.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghikl.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghi',
63 => 'test@xn--hxajbheg2az3al.xn--jxalpdlp',
64 => 'xn--test@iana.org',
65 => 'test@test.com',
66 => 'test@nic.no',
67 => 'test@org',
68 => 'test@iana.123',
69 => 'test@255.255.255.255',
70 => 'test@[IPv6:1111:2222:3333:4444::255.255.255.255]',
71 => 'test@[IPv6:1111:2222:3333:4444:5555:6666:255.255.255.255]',
72 => 'test@[IPv6:::]',
73 => 'test@[IPv6:::3333:4444:5555:6666:7777:8888]',
74 => 'test@[IPv6:1111:2222:3333:4444:5555::8888]',
75 => '"test"@iana.org',
76 => '""@iana.org',
77 => '"\\a"@iana.org',
78 => '"\\""@iana.org',
79 => '"test\\ test"@iana.org',
80 => 'test@[255.255.255.255]',
81 => 'test@[IPv6:1111:2222:3333:4444:5555:6666:7777:8888]',
82 => 'test@[IPv6:1111:2222:3333:4444:5555:6666::8888]',
83 => '"\\\\"@iana.org',
),
0 =>
array (
0 => 'Just a string',
1 => 'string',
2 => '(comment)',
3 => '()@example.com',
4 => 'fred(&)barny@example.com',
5 => 'fred\\ barny@example.com',
6 => 'Abigail ',
7 => 'Abigail <abigail(fo(o)@example.com>',
8 => 'Abigail <abigail(fo)o)@example.com>',
9 => '"Abi"gail" <abigail@example.com>',
10 => 'abigail@[exa]ple.com]',
11 => 'abigail@[exa[ple.com]',
12 => 'abigail@[exaple].com]',
13 => 'abigail@',
14 => '@example.com',
15 => 'phrase: abigail@example.com abigail@example.com ;',
16 => 'invalid£char@example.com',
17 => '',
18 => 'test',
19 => '@',
20 => 'test@',
21 => '@io',
22 => '@iana.org',
23 => '.test@iana.org',
24 => 'test.@iana.org',
25 => 'test..iana.org',
26 => 'test_exa-mple.com',
27 => 'test\\@test@iana.org',
30 => 'abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghiklmn@iana.org',
31 => 'test@abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghiklm.com',
32 => 'test@-iana.org',
33 => 'test@iana-.com',
34 => 'test@.iana.org',
35 => 'test@iana.org.',
36 => 'test@iana..com',
37 => 'abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghiklm@abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghikl.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghikl.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghij',
38 => 'a@abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghikl.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghikl.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghikl.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefg.hij',
39 => 'a@abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghikl.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghikl.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghikl.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefg.hijk',
42 => '"""@iana.org',
47 => 'test"@iana.org',
48 => '"test@iana.org',
49 => '"test"test@iana.org',
50 => 'test"text"@iana.org',
51 => '"test""test"@iana.org',
52 => '"test"."test"@iana.org',
54 => '"test".test@iana.org',
55 => '"test␀"@iana.org',
56 => '"test\\␀"@iana.org',
57 => '"abcdefghijklmnopqrstuvwxyz abcdefghijklmnopqrstuvwxyz abcdefghj"@iana.org',
58 => '"abcdefghijklmnopqrstuvwxyz abcdefghijklmnopqrstuvwxyz abcdefg\\h"@iana.org',
60 => 'test@a[255.255.255.255]',
61 => 'test@[255.255.255]',
62 => 'test@[255.255.255.255.255]',
63 => 'test@[255.255.255.256]',
64 => 'test@[1111:2222:3333:4444:5555:6666:7777:8888]',
65 => 'test@[IPv6:1111:2222:3333:4444:5555:6666:7777]',
67 => 'test@[IPv6:1111:2222:3333:4444:5555:6666:7777:8888:9999]',
68 => 'test@[IPv6:1111:2222:3333:4444:5555:6666:7777:888G]',
71 => 'test@[IPv6:1111:2222:3333:4444:5555:6666::7777:8888]',
72 => 'test@[IPv6::3333:4444:5555:6666:7777:8888]',
74 => 'test@[IPv6:1111::4444:5555::8888]',
76 => 'test@[IPv6:1111:2222:3333:4444:5555:255.255.255.255]',
78 => 'test@[IPv6:1111:2222:3333:4444:5555:6666:7777:255.255.255.255]',
80 => 'test@[IPv6:1111:2222:3333:4444:5555:6666::255.255.255.255]',
81 => 'test@[IPv6:1111:2222:3333:4444:::255.255.255.255]',
82 => 'test@[IPv6::255.255.255.255]',
83 => ' test @iana.org',
84 => 'test@ iana .com',
85 => 'test . test@iana.org',
86 => '␍␊ test@iana.org',
87 => '␍␊ ␍␊ test@iana.org',
88 => '(comment)test@iana.org',
89 => '((comment)test@iana.org',
90 => '(comment(comment))test@iana.org',
91 => 'test@(comment)iana.org',
92 => 'test(comment)test@iana.org',
93 => 'test@(comment)[255.255.255.255]',
94 => '(comment)abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghiklm@iana.org',
95 => 'test@(comment)abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghikl.com',
96 => '(comment)test@abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghik.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghik.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijk.abcdefghijklmnopqrstuvwxyzabcdefghijk.abcdefghijklmnopqrstu',
97 => 'test@iana.org␊',
98 => 'test@iana.org-',
99 => '"test@iana.org',
100 => '(test@iana.org',
101 => 'test@(iana.org',
102 => 'test@[1.2.3.4',
103 => '"test\\"@iana.org',
104 => '(comment\\)test@iana.org',
105 => 'test@iana.org(comment\\)',
106 => 'test@iana.org(comment\\',
107 => 'test@[RFC-5322-domain-literal]',
108 => 'test@[RFC-5322]-domain-literal]',
109 => 'test@[RFC-5322-[domain-literal]',
110 => 'test@[RFC-5322-\\␇-domain-literal]',
111 => 'test@[RFC-5322-\\␉-domain-literal]',
112 => 'test@[RFC-5322-\\]-domain-literal]',
113 => 'test@[RFC-5322-domain-literal\\]',
114 => 'test@[RFC-5322-domain-literal\\',
115 => 'test@[RFC 5322 domain literal]',
116 => 'test@[RFC-5322-domain-literal] (comment)',
117 => '@iana.org',
118 => 'test@.org',
119 => '""@iana.org',
120 => '"\\"@iana.org',
121 => '()test@iana.org',
122 => 'test@iana.org␍',
123 => '␍test@iana.org',
124 => '"␍test"@iana.org',
125 => '(␍)test@iana.org',
126 => 'test@iana.org(␍)',
127 => '␊test@iana.org',
128 => '"␊"@iana.org',
129 => '"\\␊"@iana.org',
130 => '(␊)test@iana.org',
131 => '␇@iana.org',
132 => 'test@␇.org',
133 => '"␇"@iana.org',
134 => '"\\␇"@iana.org',
135 => '(␇)test@iana.org',
136 => '␍␊test@iana.org',
137 => '␍␊ ␍␊test@iana.org',
138 => ' ␍␊test@iana.org',
139 => ' ␍␊ test@iana.org',
140 => ' ␍␊ ␍␊test@iana.org',
141 => ' ␍␊␍␊test@iana.org',
142 => ' ␍␊␍␊ test@iana.org',
143 => 'test@iana.org␍␊ ',
144 => 'test@iana.org␍␊ ␍␊ ',
145 => 'test@iana.org␍␊',
146 => 'test@iana.org␍␊ ␍␊',
147 => 'test@iana.org ␍␊',
148 => 'test@iana.org ␍␊ ',
149 => 'test@iana.org ␍␊ ␍␊',
150 => 'test@iana.org ␍␊␍␊',
151 => 'test@iana.org ␍␊␍␊ ',
152 => ' test@iana.org',
153 => 'test@iana.org ',
154 => 'test@[IPv6:1::2:]',
155 => '"test\\©"@iana.org',
156 => 'test@iana/icann.org',
157 => 'test.(comment)test@iana.org',
158 => '"\\"@iana.org',
),
);

Please note: I indexed the array with “true” for valid emails and “false” for invalid.
Since not all libraries had DNS checking available, the “true” array contains some emails which are syntactically correct but do NOT have a matching MX record.
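
For reference, the figures reported below (n, invalid results, accuracy) can be computed with a small harness along these lines. This is only a sketch: the function name evaluate_validator() and the tiny sample set are made up for illustration, and it assumes the test set has the same shape as the array above (key 1 for valid, key 0 for invalid addresses).

```php
<?php
// Hypothetical harness, not the script used for the measurements in this post.
// $testSet: [1 => [valid addresses...], 0 => [invalid addresses...]]
function evaluate_validator(callable $validator, array $testSet): array
{
    $total = 0;
    $wrong = [];
    foreach ($testSet as $expected => $addresses) {
        foreach ($addresses as $address) {
            $total++;
            // Normalise whatever the validator returns to a strict boolean.
            $actual = (bool) $validator($address);
            if ($actual !== (bool) $expected) {
                $wrong[] = $address;
            }
        }
    }
    $correct = $total - count($wrong);
    return [
        'n'        => $total,
        'invalid'  => count($wrong),
        'accuracy' => $total > 0 ? round(100 * $correct / $total, 3) : 0.0,
        'failed'   => $wrong,
    ];
}

// Example candidate: PHP's built-in filter, wrapped so it returns a boolean.
$filterVar = function (string $address): bool {
    return filter_var($address, FILTER_VALIDATE_EMAIL) !== false;
};

// Tiny illustrative sample; the real run would pass the full array from above.
$sample = [
    1 => ['test@iana.org'],
    0 => ['test', '@iana.org'],
];
$report = evaluate_validator($filterVar, $sample);
printf("n: %d\nInvalid results: %d\nAccuracy: %s%%\n",
    $report['n'], $report['invalid'], $report['accuracy']);
```

Any candidate that can be wrapped in a callable returning true or false fits into the same loop, which keeps the comparison between the candidates fair.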

Basic Regex

'/^([a-zA-Z0-9_\-\+\~\^\{\}]+[\.]?)+@{1}([a-zA-Z0-9_\-\+\~\^\{\}]+[\.]?)+\.[A-Za-z0-9]{2,}$/'
n: 181
Invalid results: 29
Accuracy: 83.978%
Duration: 0.11357188224792 s

This basic regex fails to validate dotless domain names such as “test@io” and also does not like domains such as “test@iana.a”. Domains with double dots show up as false positives: “test@iana..com”

True, “.a” is not a valid TLD at the moment, but it could become one. So it would be better to leave this check to MX record matching …

Also “!#$%&`*+/=?^`{|}~@iana.org” failed, which is syntactically correct according to the testset from http://isemail.info/about

However, it can handle IPv6 literals like “test@[IPv6:1111:2222:3333:4444:5555:6666:7777:8888]” without choking.

is_email() PHP function

http://isemail.info/about

n: 181
Invalid results: 0
Accuracy: 100%
Duration: 0.037806987762451 s

Well, since it runs on its own test set, with just 16 added “invalids”, this result is not very surprising. Still good!

PHP internal function filter_var()

http://php.net/manual/de/function.filter-var.php

n: 181
Invalid results: 8
Accuracy: 95.58%
Duration: 0.0053420066833496 s

The accuracy is way better than that of the simple PCRE. It fails on fewer addresses, but on the same ones that the simple PCRE could not handle, such as “test@[IPv6:1111:2222:3333:4444:5555:6666::8888]”.
The filter_var() function also fails on IPv6 addresses such as “test@[IPv6:1111:2222:3333:4444:5555:6666:7777:8888]” and “test@[IPv6:::]”.
It also cannot handle escaped spaces, as in “"test\ test"@iana.org”, or dotless domains, as in “test@io”.

Also it accounts false-positives such as “test@org”, which is invalid.

Another blog ran a far more in-depth test of this function and other regex functions: http://fightingforalostcause.net/content/misc/2006/compare-email-regex.php

The author states that the filter_var function is the best he has in his database. However, I could not find his test set as an easy download, so I could not really compare with his data.
Still, it seems that this is a very solid regex.

PHP state machine for email testing. Extracted from Barebone CMS

https://barebonescms.com/documentation/ultimate_email_toolkit/

Well, no data? Yep.

The problem is, the function gets into an infinite loop while parsing the address "test\ test"@iana.org

Looking at the source code, a non-terminating while loop is the culprit:


while ($email != "")
{
    $currchr = substr($email, 0, 1);
    $nextchr = substr($email, 1, 1);

    if ($currchr == "\\")
    {
        if ($nextchr == "\\" || $nextchr == "\"")
        {
            $local .= substr($email, 0, 2);
            $email = substr($email, 2);
        }
        else if (ord($nextchr) >= 33 && ord($nextchr) <= 126)
        {
            $local .= substr($email, 1, 1);
            $email = substr($email, 2);
        }
    }
    else if ($currchr == "\"")  break;
    else if (ord($currchr) >= 33 && ord($nextchr) <= 126)
    {
        $local .= substr($email, 0, 1);
        $email = substr($email, 1);
    }
    else  $email = substr($email, 1);
}

As you can see, no else branch is defined inside the block 'if ($currchr == "\\")'. Since a space after the backslash is not accounted for, the string is not trimmed any further, which results in an infinite loop.
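
One way to make the loop above terminate is to add the missing else branch so that the backslash is always consumed. The following is only a sketch and not a patch from the toolkit's author: the wrapper function consume_quoted_local() and its return value are made up so the loop can run on its own, and the added branch merely guarantees progress; whether dropping the backslash is the right parsing decision is a separate question.

```php
<?php
// Hypothetical stand-alone wrapper around the loop quoted above.
// Input: the address text following the opening double quote.
// Output: [collected local part, remaining unparsed text].
function consume_quoted_local(string $email): array
{
    $local = "";
    while ($email != "") {
        $currchr = substr($email, 0, 1);
        $nextchr = substr($email, 1, 1);

        if ($currchr == "\\") {
            if ($nextchr == "\\" || $nextchr == "\"") {
                $local .= substr($email, 0, 2);
                $email = substr($email, 2);
            } else if (ord($nextchr) >= 33 && ord($nextchr) <= 126) {
                $local .= substr($email, 1, 1);
                $email = substr($email, 2);
            } else {
                // ADDED (assumption): skip the backslash so that every
                // iteration shortens $email and the loop cannot hang.
                $email = substr($email, 1);
            }
        } else if ($currchr == "\"") {
            break;
        } else if (ord($currchr) >= 33 && ord($nextchr) <= 126) {
            $local .= substr($email, 0, 1);
            $email = substr($email, 1);
        } else {
            $email = substr($email, 1);
        }
    }
    return [$local, $email];
}

// The input that caused the hang now returns.
var_export(consume_quoted_local('test\\ test"@iana.org'));
```

With this change every path through the loop body removes at least one character from $email, which is the property the original excerpt is missing.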

Failed!

Complex PCRE

http://ex-parrot.com/~pdw/Mail-RFC822-Address.html
(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
)+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:
\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(
?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[
\t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\0
31]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\
](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+
(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:
(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)
?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\
r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[
\t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)
?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t]
)*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*
)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
)+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)
*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+
|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r
\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:
\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t
]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031
]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](
?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?
:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?
:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?
:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?
[ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\]
\000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|
\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>
@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"
(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t]
)*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?
:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[
\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-
\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(
?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;
:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([
^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\"
.\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\
]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\
[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\
r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\]
\000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]
|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \0
00-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\
.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,
;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?
:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*
(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[
^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]
]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*(
?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(
?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[
\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t
])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t
])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?
:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|
\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:
[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\
]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)
?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["
()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)
?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>
@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[
\t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,
;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t]
)*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?
(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:
\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[
"()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])
*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])
+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\
.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(
?:\r\n)?[ \t])*))*)?;\s*)

As said before, the regex itself did not compile for me in PHP, so I built the Perl module.

n: 181
Invalid results: 53
Accuracy: 70.72%
Duration: 0.053420066833496 s

Well, this is a poor result. What happened?

It may be due to the fact that the tests are built for a newer RFC, whereas the Perl module is built for RFC 822.

Problem: RFC 822 is not the right reference. According to isemail.info and Wikipedia, RFC 5322 is the right place to look (http://en.wikipedia.org/wiki/Email_address)

I am not going into detail about which of the addresses failed and which did not, because the result is too poor.

Results:

Keep it simple: The winner is is_email() from isemail.info

Conclusion and Usage

isemail.info provides a solid PHP-based email testing class in accordance with RFC 5322, which is the relevant RFC for email address syntax.

The class has a light footprint and is easy to use. It can also do DNS lookups and fold the presence or absence of an MX record into the result of the is_email() function.

As theoretically perfect as this result is, there is a drawback in the implementation of the function itself.
It uses the internal PHP function dns_get_record() to make the DNS check. If that function is not available (because PHP was compiled without it), the check is simply omitted.

The problem with dns_get_record() is that it uses the underlying mechanics of the OS and does not expose a PHP stream to work with. As a result, you cannot set a timeout for the function call.

So let’s say you have an unresponsive DNS server: the call will run forever without hitting a timeout. I tested this myself by setting up a DNS server on my localhost that keeps the connection open forever.

The result was that the function did not return. Using set_time_limit(1) had no effect either, since time spent in system calls does NOT COUNT towards the running time of the script.

As good as is_email() is for syntactical checks, it is just as bad in a real-life scenario. If you want to live-check an email address, let’s say in a webshop, the page becomes unresponsive and the user will eventually abort the registration!

The golden solution: Patching is_email()

I decided to patch the is_email() class and integrate the well-known Net_DNS2 class (http://pear.php.net/package/Net_DNS2)

This class uses a 5 second timeout and can even be more graceful if the domain does have an A record, but no MX.
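
The idea of "MX first, A record as a graceful fallback, and never hang" can be sketched as follows. This is not the patched class from the download; domain_accepts_mail(), net_dns2_lookup() and the returned labels are names invented for this illustration. The lookup is passed in as a callable so the decision logic can be tried without a network; the Net_DNS2 adapter assumes the PEAR package is installed and loaded, and is only executed if you call it.

```php
<?php
// Decision logic only. $lookup is a hypothetical callable
// (string $domain, string $type): array that returns the records found
// and throws an Exception when the lookup fails or times out.
function domain_accepts_mail(string $domain, callable $lookup): string
{
    try {
        if (count($lookup($domain, 'MX')) > 0) {
            return 'mx';          // domain publishes a mail exchanger
        }
        if (count($lookup($domain, 'A')) > 0) {
            return 'a-fallback';  // no MX, but the host itself resolves
        }
        return 'none';            // neither MX nor A record found
    } catch (Exception $e) {
        return 'unknown';         // resolver failed or timed out
    }
}

// Possible adapter on top of Net_DNS2 (assumption: the package is loaded).
// Net_DNS2 reports lookup failures as Net_DNS2_Exception, which the
// function above treats as 'unknown'.
function net_dns2_lookup(string $domain, string $type): array
{
    $resolver = new Net_DNS2_Resolver(['timeout' => 5]);
    return $resolver->query($domain, $type)->answer;
}

// Offline demonstration with a fake lookup that only knows an A record.
$fake = function (string $domain, string $type): array {
    return $type === 'A' ? ['192.0.2.10'] : [];
};
echo domain_accepts_mail('example.test', $fake), "\n";
```

In a live form check, an 'unknown' result would probably be treated as "accept for now" rather than blocking the user, since a slow resolver says nothing about the address itself.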

Since all the classes were under the BSD license, I attached my modded class along with a short usage example.

License: http://www.opensource.org/licenses/bsd-license.php BSD License

Example:




Download: ZIP-File


cURL cannot follow redirects when open_basedir or safe_mode is enabled

Luckily we live in a time where the PHP safe_mode is deprecated.

However, some legacy webspaces still have this feature enabled, and open_basedir is often active as well.

When using cURL this can be a bit of a bummer, because it prevents you from following redirects. This may be due to the fact that cURL, as a native extension, would otherwise be able to follow symlinks in the filesystem and access files it should not be allowed to access.

The problem

You encounter the following error:


curl_setopt_array() [function.curl-setopt-array]: CURLOPT_FOLLOWLOCATION cannot be activated when safe_mode is enabled or an open_basedir is set

The solution

You will find many great approaches to circumventing this restriction when it comes to HTTP connections.
They work by parsing the Location header directly from the returned data and issuing a new request.

On php.net you will find many solutions. While some are broken, many work. However, I could not find a solution that worked for my problem:

I wanted to follow a redirect on a site that needed cookies and needed a correct switch of the request method from POST to GET.

Typically the workarounds copy the cURL handle, which makes it lose the cookies. They also do not reset the request method the way normal browsers do.
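
The two decisions described above, where to go next and which request method to use, can be isolated in a small pure function. This is a sketch under assumptions: next_redirect_request() is a made-up helper, it expects the raw response headers as one string (as returned with CURLOPT_HEADER enabled), and it simplifies browser behaviour by switching every method to GET on 301, 302 and 303 while keeping the method on 307 and 308.

```php
<?php
// Hypothetical helper: decide the follow-up request for a redirect response.
// Returns null if the response is not a redirect or carries no Location.
function next_redirect_request(int $status, string $rawHeaders, string $method): ?array
{
    if (!in_array($status, [301, 302, 303, 307, 308], true)) {
        return null;
    }
    // Header names are case-insensitive and header lines end in CRLF.
    if (!preg_match('/^Location:[ \t]*(.*?)[ \t]*\r?$/mi', $rawHeaders, $m) || $m[1] === '') {
        return null;
    }
    // Simplified browser rule: 307/308 keep the method, everything else
    // is re-requested with GET (and therefore without a request body).
    $nextMethod = in_array($status, [307, 308], true) ? $method : 'GET';
    return ['url' => $m[1], 'method' => $nextMethod];
}

$headers = "HTTP/1.1 302 Found\r\nLocation: http://example.com/next\r\nSet-Cookie: a=b\r\n\r\n";
var_export(next_redirect_request(302, $headers, 'POST'));
```

The cookie side is not covered by this helper: it depends on reusing the same cURL handle for the follow-up request instead of copying it, and on the cookie engine being enabled on that handle.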

The code

This code worked for me. Hopefully it works for you.

Note that this code is an improvement on the code from zsalab originally posted on php.net


function curl_exec_follow(/*resource*/ $ch, /*int*/ &$maxredirect = null, $postfields = null) {
    $mr = $maxredirect === null ? 5 : intval($maxredirect);
    if (ini_get('open_basedir') == '' && (ini_get('safe_mode') == 'Off' || ini_get('safe_mode') == '')) {
        // No restriction active: let cURL follow the redirects natively.
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, $mr > 0);
        curl_setopt($ch, CURLOPT_MAXREDIRS, $mr);
    } else {
        // Restricted environment: follow the redirects by hand on the same handle.
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
        if ($mr > 0) {
            $newurl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);

            curl_setopt($ch, CURLOPT_HEADER, true);
            curl_setopt($ch, CURLOPT_FORBID_REUSE, false);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

            do {
                curl_setopt($ch, CURLOPT_URL, $newurl);
                $header = curl_exec($ch);
                if (curl_errno($ch)) {
                    $code = 0;
                } else {
                    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
                    if ($code == 301 || $code == 302) {
                        // Header names are case-insensitive, hence the /i modifier.
                        preg_match('/Location:(.*?)\n/i', $header, $matches);
                        $newurl = trim(array_pop($matches));
                        curl_setopt($ch, CURLOPT_POSTFIELDS, null); // also switch modes after redirect
                        curl_setopt($ch, CURLOPT_HTTPGET, true);
                    } else {
                        // Not a redirect: leave the loop.
                        $code = 0;
                    }
                }

            } while ($code && --$mr);
            if (!$mr) {
                // Redirect budget used up.
                if ($maxredirect === null) {
                    trigger_error('Too many redirects. When following redirects, libcurl hit the maximum amount.', E_USER_WARNING);
                } else {
                    $maxredirect = 0;
                }
                return false;
            }
            curl_setopt($ch, CURLOPT_URL, $newurl);
        }
    }

    // Final request against the resolved URL.
    return curl_exec($ch);
}

Note

Your first option, however, should be to fix the PHP settings, as safe_mode is more of a safety issue than a help. open_basedir is not the best option either.

The code I posted here slows down the request by 50%. So use it only if absolutely needed.

You can use it as your default, though, as it falls back to the native CURLOPT_FOLLOWLOCATION feature if possible. (Thanks to zsalab)

What's going on at Yahoo Sponsored Search? Yahoo Ads showing on Google?

Recently I made a new campaign using the German Yahoo! Sponsored Search.

What I did expect was for the ad to be delivered on de.search.yahoo.com as well as on bing.com, because this is what the Search Alliance is all about.

As we all love about Yahoo, the ads cannot be blocked from showing on weird third-party sites which Yahoo! thinks are interesting for my campaign. Google allows restricting delivery to Google Search only.

Have a look at the screenshot and see what I mean. Do you think the campaign I’m running on Yahoo! is interesting for customers on erkaltung.com (German for common cold) AND consultdomain.de? And why on earth is my site showing on buzzdock.com although the campaign is targeted for Germany? Probably, we will never know …

 

Yahoo! Sponsored Search Metrics

However, what confused me the most is that Google.de is showing as a URL where the ad got delivered. I do know about the Yahoo-Google advertising agreement, but I thought this only meant that ads from Google can get delivered on Yahoo!, not the other way round?

Does this mean that I can place the over-long ads from Yahoo! on Google Search?

Post a comment under this post if you got any idea why this is happening.

– Arne

Switched from BlackBerry to Android – Sorry to all you BlackBerry guys …

Hey Folks,

Recently I changed my mobile from ye olde BlackBerry 9780 to the brand new Samsung Galaxy Nexus.

First of all, the mobile is totally awesome and so far I have only had positive experiences. No crashes, perfect app integration and the OS runs very smoothly.

Sadly, for those of you who waited for the Wunderlist for BlackBerry app, this means that I am not doing any more development for BlackBerry.

I hope the guys at Wunderkinder are making good progress on their native app and make it available to everyone out there soon.

 

– Arne

Progress on Wunderlist for Blackberry …

Hey Guys,

Just wanted to keep you up to date: the Wunderlist project for BlackBerry has not been dropped.

Currently I am facing major issues in deploying the application to the BlackBerry. It seems to me that the code signing servers from RIM are still not really working as they should.

Only 1 out of 10 signing attempts works in my virtual machine, and for now I have given up on developing further in my virtual machine.

Only 1 out of 10 attempts can complete the request to the signing servers

In November I will be getting my new Windows 7 machine and will resume work on this project. Until then it has to be frozen, because at the moment it is pure HORROR to develop for BlackBerry.

Why do I have to sign an application that I want to run on my own phone, and why is it only possible to work under Windows? RIM, I am telling you, if you make it so hard for developers you will definitely lose your place in the market.

Hopefully development will go easier on my Windows machine in November. I will keep you guys up to date.

 

Best Regards,

Arne

Using Wunderlist with RIM Blackberry

Maybe some of you are using Wunderlist on OS X or on any other supported platform.

So am I, and I can only say that I love it, since it is free and very good for implementing GTD.

My last GTD tool, iGTD, has really gotten a bit old, since its sync options are totally weird. Syncing with my iCal and then syncing to my BlackBerry Bold 9780 never really worked. Sometimes I got the appointments and tasks twice, sometimes they were in the wrong calendar and so on …

Wunderlist does a great job, but it still lacks BlackBerry support. Since I love BlackBerry and certainly don’t wanna move to an iPhone or even pay for one, I decided to develop my own app that runs on BlackBerry and can sync my Wunderlist data.

 

Until now, only the roadmap has been set, but it is fairly straightforward. If you wanna download the plugin already … well, you are a bit too early. But check back later, or subscribe to my RSS feed, and you will be informed when the plugin is ready. You can also post a comment on this post, and I will email you when it is ready.

For all you interested guys, here is how the plugin will work:

  • Syncing the wunderlist.db via Dropbox
  • Reading the wunderlist.db via native HTML5 database support
  • Blackberry integration via PhoneGap

There are quite a few steps to go, but the proof of concept has already been done. In the first version, which I will be releasing next week, the plugin can only read data from the wunderlist.db. In later versions you will be able to add new notes to your Inbox.

I will never implement the whole category, tag and so on stuff, because I think this functionality does the trick when you are on your mobile. The organization of the tasks can still be done @ home.

Best,

Arne

Ported the old entries from MODx Revolution

Hey guys,

Just wanted to inform you that all the old entries from MODx Revolution are now ported to WordPress.

I did not retain the links, nor did I redirect them via 301 to the new locations … “How do you like that, Google!”

I really love it when you know how to do it better but are just too lazy. Since I do not run an SEO blog, I think I can handle the trust Google loses in my site by just killing the links ;)

Best,

Arne

Configure SMTP-Auth on exim4

Ever tried to get SMTP-Auth running on your own exim4 instance?

Well it is really not much of a problem if you are running Debian and have access to the infamous internet.

This post is basically a copy-cat of the great post from debian-administration.org on HowTo Setup Basic SMTP AUTH in Exim4

The post was a great help, but afterwards my server was still not accepting my SMTP request to send an email. Some people in the comments complained that it was still not working for them, but since the post has been inactive for more than a year I decided to post it on my blog.

So let's start …

We assume you have exim4 running, all mails get delivered to the corresponding home directories and you can access your server via SMTP (port 25) without SSL or TLS to send an email to a non-relayed host (that is, to a local mail recipient).

I will now copy the steps from debian-administration.org in case the post goes offline …

We need to generate a self-signed SSL-certificate by calling

 
/usr/share/doc/exim4-base/examples/exim-gencert
Be sure to add the certificate to your keychain once you connect later on.

Then go to
/etc/exim4/conf.d/auth/30_exim4-config_examples
and uncomment this whole bunch
# plain_server:
# driver = plaintext
# public_name = PLAIN
# server_condition = "${if crypteq{$auth3}{${extract{1}{:}{${lookup{$auth2}lsearch{CONFDIR/passwd}{$value}{*:*}}}}}{1}{0}}"
# server_set_id = $auth2
# server_prompts = :
# .ifndef AUTH_SERVER_ALLOW_NOTLS_PASSWORDS
# server_advertise_condition = ${if eq{$tls_cipher}{}{}{*}}
# .endif

and this whole bunch
# login_server:
# driver = plaintext
# public_name = LOGIN
# server_prompts = "Username:: : Password::"
# server_condition = "${if crypteq{$auth2}{${extract{1}{:}{${lookup{$auth1}lsearch{CONFDIR/passwd}{$value}{*:*}}}}}{1}{0}}"
# server_set_id = $auth1
# .ifndef AUTH_SERVER_ALLOW_NOTLS_PASSWORDS
# server_advertise_condition = ${if eq{$tls_cipher}{}{}{*}}
# .endif

These steps will enable you to log in via AUTH PLAIN and AUTH LOGIN. Depending on your email program you may need one or the other. It is safe to enable both; your program will choose the correct one automatically.

Then the tutorial says to add the line
MAIN_TLS_ENABLE = true
to the file
/etc/exim4/conf.d/main/01_exim4-config_listmacrosdefs
This configuration is correct but it makes debugging HARD, since your server no longer responds with 250-AUTH PLAIN when you do an EHLO localhost via telnet on your server. You first have to do a STARTTLS or use openssl in the first place ;).

Before we do that, create a new user via
/usr/share/doc/exim4-base/examples/exim-adduser
and then restart via
update-exim4.conf
/etc/init.d/exim4 restart

Now we connect through openssl by calling this command
openssl s_client -host my.server.name -port 25 -starttls smtp
 

and everything should be working fine.
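
To actually try a login by hand inside that openssl session, AUTH PLAIN expects a single base64 token built from an (empty) authorization identity, the user name and the password, separated by NUL bytes. A minimal sketch for generating it; the function name auth_plain_token() and the credentials test / 1234 are placeholders for illustration, not values from this setup:

```php
<?php
// Build the argument for the SMTP command "AUTH PLAIN <token>".
// Layout before encoding: authzid NUL authcid NUL password,
// with the authorization identity (authzid) left empty.
function auth_plain_token(string $user, string $password): string
{
    return base64_encode("\0" . $user . "\0" . $password);
}

// Placeholder credentials; use the user created with exim-adduser instead.
echo "AUTH PLAIN " . auth_plain_token('test', '1234') . "\n";
```

After the EHLO in the TLS session, pasting the printed line should be answered with a 235 status if the credentials match the entry in the passwd file.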

If you receive an error like “435 Unable to authenticate at present”, then maybe exim4 cannot read your passwd file under /etc/exim4/passwd. For debugging, try setting its permissions to 777, but once it works, set them back to the correct value according to the group exim4 is in.

 

A good German post on testing SMTP-Auth with telnet is on computer-tipps.info : Testing SMTP with Telnet