Using catch from Throwable instead of Exception

When I write PHP-CLI applications, or use PHP purely for scripting, I often find myself needing a safe block.

The block is basically a try-catch block that wraps the whole code.
In PHP-CLI applications it often allows certain steps to be retried via fgets(STDIN).
In scripts it mostly sends an error mail and issues a ROLLBACK command to the database.

If the project I am working on is not framework-based but spans multiple files, I often register error handlers to get alerted when anything doesn’t work as expected. My classic code looks roughly like this:


<?php

function bindCustomErrorHandler() {
    // Report everything and route all errors to our custom handler.
    error_reporting(-1);
    ini_set('display_startup_errors', 1);

    set_error_handler('custom_error_handler');
}

function unbindCustomErrorHandler() {
    // Silence all errors and restore the previous handler.
    error_reporting(0);
    ini_set('display_startup_errors', 0);
    restore_error_handler();
}

function generateCallTrace()
{
    $e = new Exception();
    $trace = explode("\n", $e->getTraceAsString());

    array_shift($trace); // remove call to this method
    array_pop($trace); // remove {main}

    return "\t" . implode("\n\t", $trace);
}

function sendErrorMail($message, $attach_dump = true, $additional_recipients = array()) {

    # since this is the last bastion in error handling, we may NOT produce an error here
    unbindCustomErrorHandler();

    # default result, so that the return value is always defined
    $mailer_return = false;

    // code to mail the error

    # rebuild the error handler
    bindCustomErrorHandler();

    return $mailer_return;
}

// Our custom error handler.
// $vars (the error context) is optional because PHP 8 no longer passes it.
function custom_error_handler($number, $message, $file, $line, $vars = array())
{
    $email = "
An error ($number) occurred on line $line and in the file: $file.
----
$message
----
Backtrace:
". generateCallTrace(). "
------
Vars: ". print_r($vars, true). PHP_EOL ;

    sendErrorMail($email, true, array(__MAIL__ADDRESS__));

    // Notices and everything from 2048 (E_STRICT) upwards are only reported;
    // all other errors abort the request.
    if ( ($number !== E_NOTICE) && ($number < 2048) ) {
        if(!headers_sent() && defined('ERROR_TARGET')) {
            header('Location: '. ERROR_TARGET);
            die();
        } elseif(defined('ERROR_TARGET')) {
            die('Error! Please try again later. Will be redirecting: <meta http-equiv="refresh" content="0; URL='.ERROR_TARGET.'">');
        } else {
            sendErrorMail("Warning: Error target not defined for error and header already sent!");
            die("Error! Please try again later.");
        }
    }
    return true;
}

function exception_handler($e) {

    sendErrorMail("Uncaught Exception on line {$e->getLine()} of file {$e->getFile()}: {$e->getMessage()}\n\nTrace: {$e->getTraceAsString()}\n\n", true, array(__MAIL__ADDRESS__));

    if(!headers_sent() && defined('ERROR_TARGET')) {
        header('Location: '. ERROR_TARGET);
        die();
    } elseif(defined('ERROR_TARGET')) {
        die('Error! Please try again later. Will be redirecting: <meta http-equiv="refresh" content="0; URL='.ERROR_TARGET.'">');
    } else {
        sendErrorMail("Warning: Error target not defined for exception and header already sent!");
        die("Error! Please try again later.");
    }
}
set_exception_handler('exception_handler');

function shutDownFunction() {
    $error = error_get_last();
    if(!is_null($error)) {

        sendErrorMail('Shutdown Error - will try to redirect after this: '.var_export($error, true), true, array(__MAIL__ADDRESS__));

        if(!headers_sent() && defined('ERROR_TARGET')) {
            header('Location: '. ERROR_TARGET);
            die();
        } elseif(defined('ERROR_TARGET')) {
            die('Error! Please try again later. Will be redirecting: <meta http-equiv="refresh" content="0; URL='.ERROR_TARGET.'">');
        } else {
            sendErrorMail("Warning: Error target not defined for shutdown_handler and header already sent!");
            die("Error! Please try again later.");
        }
    }
}

register_shutdown_function('shutDownFunction');

bindCustomErrorHandler();

date_default_timezone_set('Europe/Berlin');
libxml_use_internal_errors(true);

It has very few comments, but I think it is self-explanatory. Basically, for all errors that can be detected from inside PHP, I register a handler that tells me that an error occurred.
If possible I attach dumps and tracebacks. For E_NOTICE I continue processing.

One important thing I learned is to unregister the error handler while sending the mail, or rather to suppress all errors altogether. When using third-party libraries here, an unexpected E_NOTICE may come at an inconvenient time 😉

What this code can’t rescue are errors that lead straight to an HTTP 500 response. Therefore: Always watch your logs!

The new way in PHP 7

However, it is always better to catch errors where they occur. There you can attach better dumps and have more context.

In PHP 7 this is now also possible for runtime errors such as accessing an element in an unexpected way (json_decode[‘x’] I am looking at you!) or syntax errors in included files. These are thrown as Error objects (a ParseError in the latter case), which implement Throwable but do not extend Exception.

So, unlike in Ruby, where rescuing the most basic Exception class is generally discouraged (as you can easily interfere with process management), catching the most basic type is fine in PHP.
Signals such as SIGTERM or SIGINT can also be handled, but not through try-catch (PHP uses pcntl_signal() for that); SIGKILL cannot be caught at all.

Therefore, catching Throwable instead of Exception should always be considered a good option.
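
A minimal sketch of such a safe block, assuming PHP 7 or later. The helper name runSafely() and the layout of its result array are made up for illustration; the only point is the catch (\Throwable $e) clause, which also receives Error objects that a catch (\Exception $e) would miss.

```php
<?php
// Hypothetical helper: runs $task and converts anything thrown,
// Exception or Error, into a result array instead of a fatal error.
function runSafely(callable $task): array
{
    try {
        return array('ok' => true, 'value' => $task());
    } catch (\Throwable $e) {
        return array(
            'ok'    => false,
            'error' => get_class($e) . ': ' . $e->getMessage(),
        );
    }
}

$result = runSafely(function () {
    $data = json_decode('{"x": 1}'); // returns an object, not an array
    return $data['x'];               // throws an Error in PHP 7+
});

// Prints the class name and message of whatever was caught.
echo $result['ok'] ? 'ok' : $result['error'], PHP_EOL;
```

In a CLI tool, the failure branch is where a retry prompt via fgets(STDIN) could go; in a script, it is the place for the error mail and the ROLLBACK.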

Validating email in the world of PHP

Introduction

Recently I found myself, AGAIN, faced with the problem of validating an email address in PHP.

You might think that this problem has been addressed a million times before and that there is a foolproof solution out there.

One of the best results Google yields is certainly this Stack Overflow article.
It gives a very good outline of why a simple regex is, in most languages, not enough to comply with the very complex format outlined in RFC 5321.

However, PCREs (Perl Compatible Regular Expressions), which are available in PHP, can do the job according to the Stack Overflow post. The article links to the following page: http://ex-parrot.com/~pdw/Mail-RFC822-Address.html

As you can see, it contains a ridiculous regex which, as the author states, has been autogenerated. When I tried to run the regex, PHP complained with the error:
Warning: preg_match(): Compilation failed: unmatched parentheses at offset 1551 in

The author also provided the original Perl module, which is pretty easy to build. I tried that too; more on the results later.

Another promising approach was http://isemail.info/about , where the author has written a PHP function that is very easy to integrate and basically uses a for loop to iterate over the email address and validate it according to his implementation of RFC 5321.
This class can also do DNS lookups on the domain to check whether an MX record is present.

The final and most promising approach was a PHP state machine extracted from the PHP-CMS Barebone (Ultimate email toolkit), which can also suggest what the correct email address might be if the validation fails.
You could use such a class to be very accommodating to the user.

Example: User types “arne.tarara.@googlemail.com” – A classic!

The class would then yield “Validation failed! Suggestion: arne.tarara@googlemail.com”

The class can also do DNS lookups, so basically a one-size-fits-all solution!

Alright, let’s dive into the tests.

Testing Results

Candidates

  • Basic PCRE, often used ‘/^([a-zA-Z0-9_\-\+\~\^\{\}]+[\.]?)+@{1}([a-zA-Z0-9_\-\+\~\^\{\}]+[\.]?)+\.[A-Za-z0-9]{2,}$/’
  • PHP internal function filter_var() http://php.net/manual/de/function.filter-var.php
  • is_email() PHP function http://isemail.info/about
  • PHP state machine for email testing. Extracted from Barebone CMS https://barebonescms.com/documentation/ultimate_email_toolkit/
  • Complex PCRE http://ex-parrot.com/~pdw/Mail-RFC822-Address.html

Testing set

I used the tests bundled with is_email() on https://code.google.com/p/isemail/downloads/detail?name=is_email-3.01.zip&can=2&q= and I also used the tests bundled with the perl module on http://ex-parrot.com/~pdw/Mail-RFC822-Address.html

Both contained negatives and positives. However, I had to eliminate all the positives from the Perl module, because it claimed “abigail@example.com ” (note the space at the end) to be a valid email address, which it clearly is not. Other such cases were “abigail @example.com” etc.
The most confusing case was: “*()@[]” => How is that a valid email? the local part, well maybe, but the hostname?

I reduced the set, as said before, which resulted in:


array (
1 =>
array (
45 => 'test@io',
46 => 'test@iana.org',
47 => 'test@nominet.org.uk',
48 => 'test@about.museum',
49 => 'a@iana.org',
50 => 'test@e.com',
51 => 'test@iana.a',
52 => 'test.test@iana.org',
53 => '!#$%&`*+/=?^`{|}~@iana.org',
54 => '123@iana.org',
55 => 'test@123.com',
56 => 'abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghiklm@iana.org',
57 => 'test@abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghikl.com',
58 => 'test@mason-dixon.com',
59 => 'test@g--a.com',
60 => 'test@iana.co-uk',
61 => 'a@a.b.c.d.e.f.g.h.i.j.k.l.m.n.o.p.q.r.s.t.u.v.w.x.y.z.a.b.c.d.e.f.g.h.i.j.k.l.m.n.o.p.q.r.s.t.u.v.w.x.y.z.a.b.c.d.e.f.g.h.i.j.k.l.m.n.o.p.q.r.s.t.u.v.w.x.y.z.a.b.c.d.e.f.g.h.i.j.k.l.m.n.o.p.q.r.s.t.u.v.w.x.y.z.a.b.c.d.e.f.g.h.i.j.k.l.m.n.o.p.q.r.s.t.u.v',
62 => 'abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghiklm@abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghikl.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghikl.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghi',
63 => 'test@xn--hxajbheg2az3al.xn--jxalpdlp',
64 => 'xn--test@iana.org',
65 => 'test@test.com',
66 => 'test@nic.no',
67 => 'test@org',
68 => 'test@iana.123',
69 => 'test@255.255.255.255',
70 => 'test@[IPv6:1111:2222:3333:4444::255.255.255.255]',
71 => 'test@[IPv6:1111:2222:3333:4444:5555:6666:255.255.255.255]',
72 => 'test@[IPv6:::]',
73 => 'test@[IPv6:::3333:4444:5555:6666:7777:8888]',
74 => 'test@[IPv6:1111:2222:3333:4444:5555::8888]',
75 => '"test"@iana.org',
76 => '""@iana.org',
77 => '"\\a"@iana.org',
78 => '"\\""@iana.org',
79 => '"test\\ test"@iana.org',
80 => 'test@[255.255.255.255]',
81 => 'test@[IPv6:1111:2222:3333:4444:5555:6666:7777:8888]',
82 => 'test@[IPv6:1111:2222:3333:4444:5555:6666::8888]',
83 => '"\\\\"@iana.org',
),
0 =>
array (
0 => 'Just a string',
1 => 'string',
2 => '(comment)',
3 => '()@example.com',
4 => 'fred(&)barny@example.com',
5 => 'fred\\ barny@example.com',
6 => 'Abigail ',
7 => 'Abigail <abigail(fo(o)@example.com>',
8 => 'Abigail <abigail(fo)o)@example.com>',
9 => '"Abi"gail" <abigail@example.com>',
10 => 'abigail@[exa]ple.com]',
11 => 'abigail@[exa[ple.com]',
12 => 'abigail@[exaple].com]',
13 => 'abigail@',
14 => '@example.com',
15 => 'phrase: abigail@example.com abigail@example.com ;',
16 => 'invalid£char@example.com',
17 => '',
18 => 'test',
19 => '@',
20 => 'test@',
21 => '@io',
22 => '@iana.org',
23 => '.test@iana.org',
24 => 'test.@iana.org',
25 => 'test..iana.org',
26 => 'test_exa-mple.com',
27 => 'test\\@test@iana.org',
30 => 'abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghiklmn@iana.org',
31 => 'test@abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghiklm.com',
32 => 'test@-iana.org',
33 => 'test@iana-.com',
34 => 'test@.iana.org',
35 => 'test@iana.org.',
36 => 'test@iana..com',
37 => 'abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghiklm@abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghikl.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghikl.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghij',
38 => 'a@abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghikl.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghikl.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghikl.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefg.hij',
39 => 'a@abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghikl.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghikl.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghikl.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefg.hijk',
42 => '"""@iana.org',
47 => 'test"@iana.org',
48 => '"test@iana.org',
49 => '"test"test@iana.org',
50 => 'test"text"@iana.org',
51 => '"test""test"@iana.org',
52 => '"test"."test"@iana.org',
54 => '"test".test@iana.org',
55 => '"test␀"@iana.org',
56 => '"test\\␀"@iana.org',
57 => '"abcdefghijklmnopqrstuvwxyz abcdefghijklmnopqrstuvwxyz abcdefghj"@iana.org',
58 => '"abcdefghijklmnopqrstuvwxyz abcdefghijklmnopqrstuvwxyz abcdefg\\h"@iana.org',
60 => 'test@a[255.255.255.255]',
61 => 'test@[255.255.255]',
62 => 'test@[255.255.255.255.255]',
63 => 'test@[255.255.255.256]',
64 => 'test@[1111:2222:3333:4444:5555:6666:7777:8888]',
65 => 'test@[IPv6:1111:2222:3333:4444:5555:6666:7777]',
67 => 'test@[IPv6:1111:2222:3333:4444:5555:6666:7777:8888:9999]',
68 => 'test@[IPv6:1111:2222:3333:4444:5555:6666:7777:888G]',
71 => 'test@[IPv6:1111:2222:3333:4444:5555:6666::7777:8888]',
72 => 'test@[IPv6::3333:4444:5555:6666:7777:8888]',
74 => 'test@[IPv6:1111::4444:5555::8888]',
76 => 'test@[IPv6:1111:2222:3333:4444:5555:255.255.255.255]',
78 => 'test@[IPv6:1111:2222:3333:4444:5555:6666:7777:255.255.255.255]',
80 => 'test@[IPv6:1111:2222:3333:4444:5555:6666::255.255.255.255]',
81 => 'test@[IPv6:1111:2222:3333:4444:::255.255.255.255]',
82 => 'test@[IPv6::255.255.255.255]',
83 => ' test @iana.org',
84 => 'test@ iana .com',
85 => 'test . test@iana.org',
86 => '␍␊ test@iana.org',
87 => '␍␊ ␍␊ test@iana.org',
88 => '(comment)test@iana.org',
89 => '((comment)test@iana.org',
90 => '(comment(comment))test@iana.org',
91 => 'test@(comment)iana.org',
92 => 'test(comment)test@iana.org',
93 => 'test@(comment)[255.255.255.255]',
94 => '(comment)abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghiklm@iana.org',
95 => 'test@(comment)abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghikl.com',
96 => '(comment)test@abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghik.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghik.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijk.abcdefghijklmnopqrstuvwxyzabcdefghijk.abcdefghijklmnopqrstu',
97 => 'test@iana.org␊',
98 => 'test@iana.org-',
99 => '"test@iana.org',
100 => '(test@iana.org',
101 => 'test@(iana.org',
102 => 'test@[1.2.3.4',
103 => '"test\\"@iana.org',
104 => '(comment\\)test@iana.org',
105 => 'test@iana.org(comment\\)',
106 => 'test@iana.org(comment\\',
107 => 'test@[RFC-5322-domain-literal]',
108 => 'test@[RFC-5322]-domain-literal]',
109 => 'test@[RFC-5322-[domain-literal]',
110 => 'test@[RFC-5322-\\␇-domain-literal]',
111 => 'test@[RFC-5322-\\␉-domain-literal]',
112 => 'test@[RFC-5322-\\]-domain-literal]',
113 => 'test@[RFC-5322-domain-literal\\]',
114 => 'test@[RFC-5322-domain-literal\\',
115 => 'test@[RFC 5322 domain literal]',
116 => 'test@[RFC-5322-domain-literal] (comment)',
117 => '@iana.org',
118 => 'test@.org',
119 => '""@iana.org',
120 => '"\\"@iana.org',
121 => '()test@iana.org',
122 => 'test@iana.org␍',
123 => '␍test@iana.org',
124 => '"␍test"@iana.org',
125 => '(␍)test@iana.org',
126 => 'test@iana.org(␍)',
127 => '␊test@iana.org',
128 => '"␊"@iana.org',
129 => '"\\␊"@iana.org',
130 => '(␊)test@iana.org',
131 => '␇@iana.org',
132 => 'test@␇.org',
133 => '"␇"@iana.org',
134 => '"\\␇"@iana.org',
135 => '(␇)test@iana.org',
136 => '␍␊test@iana.org',
137 => '␍␊ ␍␊test@iana.org',
138 => ' ␍␊test@iana.org',
139 => ' ␍␊ test@iana.org',
140 => ' ␍␊ ␍␊test@iana.org',
141 => ' ␍␊␍␊test@iana.org',
142 => ' ␍␊␍␊ test@iana.org',
143 => 'test@iana.org␍␊ ',
144 => 'test@iana.org␍␊ ␍␊ ',
145 => 'test@iana.org␍␊',
146 => 'test@iana.org␍␊ ␍␊',
147 => 'test@iana.org ␍␊',
148 => 'test@iana.org ␍␊ ',
149 => 'test@iana.org ␍␊ ␍␊',
150 => 'test@iana.org ␍␊␍␊',
151 => 'test@iana.org ␍␊␍␊ ',
152 => ' test@iana.org',
153 => 'test@iana.org ',
154 => 'test@[IPv6:1::2:]',
155 => '"test\\©"@iana.org',
156 => 'test@iana/icann.org',
157 => 'test.(comment)test@iana.org',
158 => '"\\"@iana.org',
),
);

Please note: I indexed the array with 1 (“true”) for valid emails and 0 (“false”) for invalid ones.
Since not all libraries offer DNS checking, the “true” array contains some emails that are syntactically correct but do NOT have a matching MX record.
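
The figures reported for each candidate (n, invalid results, accuracy, duration) can be produced by a small harness along these lines. This is only a sketch of how such a measurement might look: the function name evaluateValidator() and the naive sample validator are assumptions for illustration, not the script that produced the numbers in this post.

```php
<?php
/**
 * Hypothetical harness: runs $validator over a test set keyed by
 * 1 (valid addresses) and 0 (invalid addresses) and counts mismatches.
 */
function evaluateValidator(callable $validator, array $testSet): array
{
    $n = 0;
    $invalid = 0;
    $start = microtime(true);

    foreach ($testSet as $expected => $addresses) {
        foreach ($addresses as $address) {
            $n++;
            // A result is "invalid" when the verdict differs from the label.
            if ((bool) $validator($address) !== (bool) $expected) {
                $invalid++;
            }
        }
    }

    return array(
        'n'        => $n,
        'invalid'  => $invalid,
        'accuracy' => $n > 0 ? round(($n - $invalid) / $n * 100, 3) : 0.0,
        'duration' => microtime(true) - $start,
    );
}

// Tiny excerpt of the set above, checked with a deliberately naive validator.
$sample = array(
    1 => array('test@iana.org', 'test@io'),
    0 => array('Just a string', '@iana.org'),
);
$naive = function ($address) {
    return strpos($address, '@') !== false;
};

$result = evaluateValidator($naive, $sample);
printf("n: %d\nInvalid results: %d\nAccuracy: %s%%\n", $result['n'], $result['invalid'], $result['accuracy']);
```

Any of the candidates can be plugged in as $validator, as long as it returns something that casts to true for accepted addresses.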

Basic Regex

'/^([a-zA-Z0-9_\-\+\~\^\{\}]+[\.]?)+@{1}([a-zA-Z0-9_\-\+\~\^\{\}]+[\.]?)+\.[A-Za-z0-9]{2,}$/'
n: 181
Invalid results: 29
Accuracy: 83.978%
Duration: 0.11357188224792 s

This basic regex fails to validate the new TLD-less domain names such as “test@io” and also does not like domains such as “test@iana.a”. On the other hand, domains with double dots such as “test@iana..com” slip through as false positives.

True, “.a” is not a valid TLD at the moment, but it could become one. So it would be better to leave this check to MX record matching …

Also “!#$%&`*+/=?^`{|}~@iana.org” failed, which is syntactically correct according to the test set from http://isemail.info/about

However, it can handle IPv6 guys like “test@[IPv6:1111:2222:3333:4444:5555:6666:7777:8888]” without choking.
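
To make the first three observations reproducible, here is a short sketch that runs the basic pattern against a handful of addresses from the test set. The variable names are made up for illustration; the pattern itself is copied unchanged from above.

```php
<?php
// The basic pattern from the candidates list, unchanged.
$basicPattern = '/^([a-zA-Z0-9_\-\+\~\^\{\}]+[\.]?)+@{1}([a-zA-Z0-9_\-\+\~\^\{\}]+[\.]?)+\.[A-Za-z0-9]{2,}$/';

$samples = array('test@iana.org', 'test@io', 'test@iana.a', 'test@iana..com');

foreach ($samples as $address) {
    // preg_match() returns 1 on a match and 0 otherwise.
    $verdict = preg_match($basicPattern, $address) === 1 ? 'accepted' : 'rejected';
    echo str_pad($address, 16), $verdict, PHP_EOL;
}
```

The pattern requires a dot followed by at least two characters at the end of the domain, which explains why “test@io” and “test@iana.a” are rejected, while the optional dot inside the repeated group lets “test@iana..com” through.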

is_email() PHP function

http://isemail.info/about

n: 181
Invalid results: 0
Accuracy: 100%
Duration: 0.037806987762451 s

Well, since it runs on its own test set, with just 16 added “invalids”, this result is not very surprising. Still good!

PHP internal function filter_var()

http://php.net/manual/de/function.filter-var.php

n: 181
Invalid results: 8
Accuracy: 95.58%
Duration: 0.0053420066833496 s

The accuracy is way better than that of the simple PCRE. It fails on fewer addresses, but partly on the same ones the simple PCRE could not handle.
The filter_var() function fails on IPv6 address literals such as “test@[IPv6:1111:2222:3333:4444:5555:6666::8888]”, “test@[IPv6:1111:2222:3333:4444:5555:6666:7777:8888]” and “test@[IPv6:::]”.
It also cannot handle escaped spaces, as in “"test\ test"@iana.org”, or TLD-less domains such as “test@io”.

It also stumbles over “test@org”, which the test set above lists as valid.

Another blog ran a much more in-depth test of this function and other regex functions: http://fightingforalostcause.net/content/misc/2006/compare-email-regex.php

The author states that the filter_var() function is the best he has in his database. However, I could not find his test set as an easy download, so I could not really compare it with my data.
Still, it seems that this is a very solid regex.
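
For reference, a minimal usage sketch of filter_var() with the FILTER_VALIDATE_EMAIL filter. The wrapper name isSyntacticallyValid() is made up for this example; filter_var() itself returns the filtered address on success and false on failure, so the result has to be compared strictly against false.

```php
<?php
// Hypothetical wrapper that turns filter_var()'s mixed return value into a bool.
function isSyntacticallyValid(string $address): bool
{
    return filter_var($address, FILTER_VALIDATE_EMAIL) !== false;
}

foreach (array('test@iana.org', 'Just a string', '@iana.org') as $address) {
    echo $address, ' => ', isSyntacticallyValid($address) ? 'valid' : 'invalid', PHP_EOL;
}
```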

PHP state machine for email testing. Extracted from Barebone CMS

https://barebonescms.com/documentation/ultimate_email_toolkit/

Well, no data? Yep.

The problem is that the function gets into an infinite loop while parsing the address "test\ test"@iana.org

Looking at the source code, a non-terminating while loop is the culprit:


while ($email != "")
{
    $currchr = substr($email, 0, 1);
    $nextchr = substr($email, 1, 1);

    if ($currchr == "\\")
    {
        if ($nextchr == "\\" || $nextchr == "\"")
        {
            $local .= substr($email, 0, 2);
            $email = substr($email, 2);
        }
        else if (ord($nextchr) >= 33 && ord($nextchr) <= 126)
        {
            $local .= substr($email, 1, 1);
            $email = substr($email, 2);
        }
    }
    else if ($currchr == "\"") break;
    else if (ord($currchr) >= 33 && ord($nextchr) <= 126)
    {
        $local .= substr($email, 0, 1);
        $email = substr($email, 1);
    }
    else $email = substr($email, 1);
}

As you can see, the block ‘if ($currchr == “\\”)’ has no final else branch. Since a backslash followed by a space is not accounted for, the string is not trimmed any further in that case, which results in an infinite loop.
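
One possible way to make the loop terminate is to add the missing else branch so that every iteration consumes at least one character. The following is only a sketch under my own assumptions, not the toolkit's actual fix: the function name parseQuotedLocalPart() and its return format are hypothetical, and the last range check tests $currchr on both sides.

```php
<?php
// Hypothetical standalone version of the loop. Expects the part of the
// address after the opening quote and returns array($local, $rest).
function parseQuotedLocalPart(string $email): array
{
    $local = '';

    while ($email != "") {
        $currchr = substr($email, 0, 1);
        $nextchr = (string) substr($email, 1, 1);

        if ($currchr == "\\") {
            if ($nextchr == "\\" || $nextchr == "\"") {
                $local .= substr($email, 0, 2);
                $email = (string) substr($email, 2);
            } elseif ($nextchr !== '' && ord($nextchr) >= 33 && ord($nextchr) <= 126) {
                $local .= $nextchr;
                $email = (string) substr($email, 2);
            } else {
                // The missing branch: drop the lone backslash so that
                // the string always gets shorter.
                $email = (string) substr($email, 1);
            }
        } elseif ($currchr == "\"") {
            break; // closing quote reached
        } elseif (ord($currchr) >= 33 && ord($currchr) <= 126) {
            $local .= $currchr;
            $email = (string) substr($email, 1);
        } else {
            $email = (string) substr($email, 1);
        }
    }

    return array($local, $email);
}

// The input that sent the original loop spinning now terminates.
list($local, $rest) = parseQuotedLocalPart('test\\ test"@iana.org');
echo $local, ' | ', $rest, PHP_EOL;
```

Whether dropping the escaped space is the right behavior is a separate question; the sketch only shows that progress on every iteration is enough to avoid the hang.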

Failed!

Complex PCRE

http://ex-parrot.com/~pdw/Mail-RFC822-Address.html
(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
)+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:
\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(
?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[
\t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\0
31]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\
](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+
(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:
(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)
?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\
r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[
\t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)
?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t]
)*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*
)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
)+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)
*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+
|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r
\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:
\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t
]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031
]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](
?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?
:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?
:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?
:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?
[ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\]
\000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|
\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>
@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"
(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t]
)*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?
:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[
\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-
\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(
?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;
:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([
^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\"
.\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\
]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\
[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\
r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\]
\000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]
|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \0
00-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\
.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,
;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?
:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*
(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[
^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]
]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*(
?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(
?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[
\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t
])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t
])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?
:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|
\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:
[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\
]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)
?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["
()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)
?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>
@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[
\t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,
;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t]
)*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?
(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:
\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[
"()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])
*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])
+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\
.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(
?:\r\n)?[ \t])*))*)?;\s*)

As said before, the regex itself did not compile for me in PHP, so I built the Perl module.

n: 181
Invalid results: 53
Accuracy: 70.72%
Duration: 0.053420066833496 s

Well, this is a poor result. What happened?

It may be due to the fact that the tests are built for a different RFC than the Perl module, which is built for RFC 822.

The problem: RFC 822 is no longer the right reference. According to isemail.info and Wikipedia, RFC 5322 is the right place to look (http://en.wikipedia.org/wiki/Email_address)

I am not going into detail about which of the addresses failed and which did not, because the result is too poor.

Results:

Keep it simple: The winner is is_email() from isemail.info

Conclusion and Usage

isemail.info provides a solid PHP-based email testing class in accordance with RFC 5322, which defines the address syntax for Internet email.

The class has a light footprint and is easy to use. It can also do DNS lookups and factor the presence or absence of an MX record into the result of the is_email() function.

As theoretically perfect as this result is, there is a drawback in the implementation of the function itself.
It uses the internal PHP function dns_get_record() for the DNS check. If that function is not available (because PHP was compiled without it), the check is simply omitted.

The problem with dns_get_record() is that it relies on the underlying mechanics of the OS and does not expose a PHP stream to work with. As a result, you cannot set a timeout for the function call.

So let’s say you have an unresponsive DNS server: the call will run forever without hitting a timeout. I tested this myself by setting up a DNS server on my localhost that keeps the connection open forever.

The result was that the function did not return. Using set_time_limit(1) did not have any effect either, since time spent in system calls does NOT COUNT towards the running time of the script.

As good as is_email() is for syntactical checks, it is just as bad in a real-life scenario. If you want to live-check an email, let’s say in a webshop, the page becomes unresponsive and the user will eventually abort the registration!

The golden solution: Patching is_email()

I decided to patch the is_email() class and integrate the well-known Net_DNS2 class (http://pear.php.net/package/Net_DNS2)

This class uses a 5 second timeout and can even be more lenient if the domain has an A record but no MX record.

Since all the classes are under the BSD license, I attached my modded class along with a short usage example.

License: http://www.opensource.org/licenses/bsd-license.php BSD License

Example:




Download: ZIP-File

cURL cannot follow redirects when open_basedir or safe_mode is enabled

Luckily we live in a time where the PHP safe_mode is deprecated.

However, some legacy webspaces still have this feature enabled, and open_basedir is often active as well.

When using cURL this can be a bit of a bummer, because it prevents you from following redirects. The restriction presumably exists because cURL, as a native extension, could otherwise be redirected to local files (for example via a file:// URL) that the script should not be allowed to access.

The problem

You encounter the following error:


curl_setopt_array() [function.curl-setopt-array]: CURLOPT_FOLLOWLOCATION cannot be activated when safe_mode is enabled or an open_basedir is set

The solution

You will find many great approaches to circumventing this restriction when it comes to HTTP connections.
They work by parsing the Location header directly from the returned data and issuing a new request.

On php.net you will find many solutions. Some are broken, but many work. However, I could not find a solution that worked for my problem:

I wanted to follow a redirect on a site that needed cookies and required a correct switch of the request method from POST to GET.

Typically the workarounds copy the cURL handle, which makes it lose the cookies. They also do not reset the request method, as normal browsers do.
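
The two building blocks of such a workaround can be sketched without any network access. This assumes PHP 7.1 or later; the helper names extractLocation() and switchesToGet() are made up for illustration, and treating 303 like 301 and 302 is my own addition here, based on how browsers commonly behave.

```php
<?php
/**
 * Hypothetical helper: pulls the Location header out of a raw response
 * (headers plus body, as returned when CURLOPT_HEADER is enabled).
 */
function extractLocation(string $rawResponse): ?string
{
    // Only inspect the header block, not the body.
    $headerEnd = strpos($rawResponse, "\r\n\r\n");
    $headers = $headerEnd === false ? $rawResponse : substr($rawResponse, 0, $headerEnd);

    // Header names are case-insensitive, hence the "i" modifier.
    if (preg_match('/^Location:[ \t]*(\S.*?)[ \t\r]*$/mi', $headers, $matches)) {
        return $matches[1];
    }
    return null;
}

/**
 * Hypothetical helper: browsers commonly turn a POST into a GET after
 * a 301, 302 or 303 redirect, while 307 and 308 keep the method.
 */
function switchesToGet(int $statusCode, string $method): bool
{
    return strtoupper($method) === 'POST'
        && in_array($statusCode, array(301, 302, 303), true);
}

$raw = "HTTP/1.1 302 Found\r\nlocation: /login?next=1\r\nSet-Cookie: a=b\r\n\r\nbody";
echo extractLocation($raw), PHP_EOL;
echo switchesToGet(302, 'POST') ? 'switch to GET' : 'keep method', PHP_EOL;
```

Note that a Location value may be a relative URL, as in this sample; resolving it against the previous URL is left out of the sketch.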

The code

This code worked for me. Hopefully it works for you.

Note that this code is an improvement on the code from zsalab originally posted on php.net


function curl_exec_follow(/*resource*/ $ch, /*int*/ &$maxredirect = null, $postfields = null) {
    $mr = $maxredirect === null ? 5 : intval($maxredirect);
    if (ini_get('open_basedir') == '' && (ini_get('safe_mode') == 'Off' || ini_get('safe_mode') == '')) {
        // No restrictions: let cURL follow redirects natively.
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, $mr > 0);
        curl_setopt($ch, CURLOPT_MAXREDIRS, $mr);
    } else {
        // Restricted environment: follow redirects manually, reusing the same handle.
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
        if ($mr > 0) {
            $newurl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);

            curl_setopt($ch, CURLOPT_HEADER, true);
            curl_setopt($ch, CURLOPT_FORBID_REUSE, false);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

            do {
                curl_setopt($ch, CURLOPT_URL, $newurl);
                $header = curl_exec($ch);
                if (curl_errno($ch)) {
                    $code = 0;
                } else {
                    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
                    if ($code == 301 || $code == 302) {
                        preg_match('/Location:(.*?)\n/', $header, $matches);
                        $newurl = trim(array_pop($matches));
                        // Also switch from POST to GET after the redirect, as browsers do.
                        curl_setopt($ch, CURLOPT_POSTFIELDS, null);
                        curl_setopt($ch, CURLOPT_HTTPGET, true);
                    } else {
                        $code = 0;
                    }
                }

            } while ($code && --$mr);
            if (!$mr) {
                if ($maxredirect === null) {
                    trigger_error('Too many redirects. When following redirects, libcurl hit the maximum amount.', E_USER_WARNING);
                } else {
                    $maxredirect = 0;
                }
                return false;
            }
            curl_setopt($ch, CURLOPT_URL, $newurl);
        }
    }

    return curl_exec($ch);
}

Note

Your first option should, however, be to fix the PHP settings, as safe_mode is more of a safety issue than a help. open_basedir is not the best option either.

The code I posted here slows down the request by 50%. So use it only if absolutely needed.

You can use it as the default, though, as it falls back to the native CURLOPT_FOLLOWLOCATION feature if possible. (Thanks to zsalab)