PHP validation/regex for URL

PhpRegexUrlValidation

Php Problem Overview


I've been looking for a simple regex for URLs, does anybody have one handy that works well? I didn't find one with the zend framework validation classes and have seen several implementations.

Php Solutions


Solution 1 - Php

Use the filter_var() function to validate whether a string is URL or not:

var_dump(filter_var('example.com', FILTER_VALIDATE_URL));

It is bad practice to use regular expressions when not necessary.

EDIT: Be careful, this solution is not unicode-safe and not XSS-safe. If you need a complex validation, maybe it's better to look somewhere else.

Solution 2 - Php

I used this on a few projects, I don't believe I've run into issues, but I'm sure it's not exhaustive:

$text = preg_replace(
  '#((https?|ftp)://(\S*?\.\S*?))([\s)\[\]{},;"\':<]|\.\s|$)#i',
  "'<a href=\"$1\" target=\"_blank\">$3</a>$4'",
  $text
);

Most of the random junk at the end is to deal with situations like http://domain.com. in a sentence (to avoid matching the trailing period). I'm sure it could be cleaned up but since it worked. I've more or less just copied it over from project to project.

Solution 3 - Php

As per the PHP manual - parse_url should not be used to validate a URL.

Unfortunately, it seems that filter_var('example.com', FILTER_VALIDATE_URL) does not perform any better.

Both parse_url() and filter_var() will pass malformed URLs such as http://...

Therefore in this case - regex is the better method.

Solution 4 - Php

As per John Gruber (Daring Fireball):

Regex:

(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))

using in preg_match():

preg_match("/(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))/", $url)

Here is the extended regex pattern (with comments):

(?xi)
\b
(                       # Capture 1: entire matched URL
  (?:
    https?://               # http or https protocol
    |                       #   or
    www\d{0,3}[.]           # "www.", "www1.", "www2." … "www999."
    |                           #   or
    [a-z0-9.\-]+[.][a-z]{2,4}/  # looks like domain name followed by a slash
  )
  (?:                       # One or more:
    [^\s()<>]+                  # Run of non-space, non-()<>
    |                           #   or
    \(([^\s()<>]+|(\([^\s()<>]+\)))*\)  # balanced parens, up to 2 levels
  )+
  (?:                       # End with:
    \(([^\s()<>]+|(\([^\s()<>]+\)))*\)  # balanced parens, up to 2 levels
    |                               #   or
    [^\s`!()\[\]{};:'".,<>?«»“”‘’]        # not a space or one of these punct chars
  )
)

For more details please look at: http://daringfireball.net/2010/07/improved_regex_for_matching_urls

Solution 5 - Php

Just in case you want to know if the url really exists:

function url_exist($url){//se passar a URL existe
    $c=curl_init();
    curl_setopt($c,CURLOPT_URL,$url);
    curl_setopt($c,CURLOPT_HEADER,1);//get the header
    curl_setopt($c,CURLOPT_NOBODY,1);//and *only* get the header
    curl_setopt($c,CURLOPT_RETURNTRANSFER,1);//get the response as a string from curl_exec(), rather than echoing it
    curl_setopt($c,CURLOPT_FRESH_CONNECT,1);//don't use a cached version of the url
    if(!curl_exec($c)){
		//echo $url.' inexists';
		return false;
	}else{
		//echo $url.' exists';
		return true;
	}
    //$httpcode=curl_getinfo($c,CURLINFO_HTTP_CODE);
    //return ($httpcode<400);
}

Solution 6 - Php

I don't think that using regular expressions is a smart thing to do in this case. It is impossible to match all of the possibilities and even if you did, there is still a chance that url simply doesn't exist.

Here is a very simple way to test if url actually exists and is readable :

if (preg_match("#^https?://.+#", $link) and @fopen($link,"r")) echo "OK";

(if there is no preg_match then this would also validate all filenames on your server)

Solution 7 - Php

I've used this one with good success - I don't remember where I got it from

$pattern = "/\b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|]/i";

Solution 8 - Php

    function validateURL($URL) {
      $pattern_1 = "/^(http|https|ftp):\/\/(([A-Z0-9][A-Z0-9_-]*)(\.[A-Z0-9][A-Z0-9_-]*)+.(com|org|net|dk|at|us|tv|info|uk|co.uk|biz|se)$)(:(\d+))?\/?/i";
      $pattern_2 = "/^(www)((\.[A-Z0-9][A-Z0-9_-]*)+.(com|org|net|dk|at|us|tv|info|uk|co.uk|biz|se)$)(:(\d+))?\/?/i";		
      if(preg_match($pattern_1, $URL) || preg_match($pattern_2, $URL)){
        return true;
      } else{
        return false;
      }
    }

Solution 9 - Php

The best URL Regex that worked for me:

function valid_URL($url){
	return preg_match('%^(?:(?:https?|ftp)://)(?:\S+(?::\S*)?@|\d{1,3}(?:\.\d{1,3}){3}|(?:(?:[a-z\d\x{00a1}-\x{ffff}]+-?)*[a-z\d\x{00a1}-\x{ffff}]+)(?:\.(?:[a-z\d\x{00a1}-\x{ffff}]+-?)*[a-z\d\x{00a1}-\x{ffff}]+)*(?:\.[a-z\x{00a1}-\x{ffff}]{2,6}))(?::\d+)?(?:[^\s]*)?$%iu', $url);
}

Examples:

valid_URL('https://twitter.com'); // true
valid_URL('http://twitter.com');  // true
valid_URL('http://twitter.co');   // true
valid_URL('http://t.co');         // true
valid_URL('http://twitter.c');    // false
valid_URL('htt://twitter.com');   // false

valid_URL('http://example.com/?a=1&b=2&c=3'); // true
valid_URL('http://127.0.0.1');    // true
valid_URL('');                    // false
valid_URL(1);                     // false

Source: http://urlregex.com/

Solution 10 - Php

Edit:
As incidence pointed out this code has been DEPRECATED with the release of PHP 5.3.0 (2009-06-30) and should be used accordingly.


Just my two cents but I've developed this function and have been using it for a while with success. It's well documented and separated so you can easily change it.

// Checks if string is a URL
// @param string $url
// @return bool
function isURL($url = NULL) {
	if($url==NULL) return false;
	
	$protocol = '(http://|https://)';
	$allowed = '([a-z0-9]([-a-z0-9]*[a-z0-9]+)?)';
	
	$regex = "^". $protocol . // must include the protocol
			 '(' . $allowed . '{1,63}\.)+'. // 1 or several sub domains with a max of 63 chars
			 '[a-z]' . '{2,6}'; // followed by a TLD
	if(eregi($regex, $url)==true) return true;
	else return false;
}

Solution 11 - Php

And there is your answer =) Try to break it, you can't!!!

function link_validate_url($text) {
$LINK_DOMAINS = 'aero|arpa|asia|biz|com|cat|coop|edu|gov|info|int|jobs|mil|museum|name|nato|net|org|pro|travel|mobi|local';
  $LINK_ICHARS_DOMAIN = (string) html_entity_decode(implode("", array( // @TODO completing letters ...
    "&#x00E6;", // æ
    "&#x00C6;", // Æ
    "&#x00C0;", // À
    "&#x00E0;", // à
    "&#x00C1;", // Á
    "&#x00E1;", // á
    "&#x00C2;", // Â
    "&#x00E2;", // â
    "&#x00E5;", // å
    "&#x00C5;", // Å
    "&#x00E4;", // ä
    "&#x00C4;", // Ä
    "&#x00C7;", // Ç
    "&#x00E7;", // ç
    "&#x00D0;", // Ð
    "&#x00F0;", // ð
    "&#x00C8;", // È
    "&#x00E8;", // è
    "&#x00C9;", // É
    "&#x00E9;", // é
    "&#x00CA;", // Ê
    "&#x00EA;", // ê
    "&#x00CB;", // Ë
    "&#x00EB;", // ë
    "&#x00CE;", // Î
    "&#x00EE;", // î
    "&#x00CF;", // Ï
    "&#x00EF;", // ï
    "&#x00F8;", // ø
    "&#x00D8;", // Ø
    "&#x00F6;", // ö
    "&#x00D6;", // Ö
    "&#x00D4;", // Ô
    "&#x00F4;", // ô
    "&#x00D5;",	// Õ
    "&#x00F5;",	// õ
    "&#x0152;", // Œ
    "&#x0153;", // œ
    "&#x00FC;", // ü
    "&#x00DC;", // Ü
    "&#x00D9;", // Ù
    "&#x00F9;", // ù
    "&#x00DB;", // Û
    "&#x00FB;", // û
    "&#x0178;", // Ÿ
    "&#x00FF;", // ÿ 
    "&#x00D1;", // Ñ
    "&#x00F1;", // ñ
    "&#x00FE;", // þ
    "&#x00DE;", // Þ
    "&#x00FD;", // ý
    "&#x00DD;", // Ý
    "&#x00BF;", // ¿
  )), ENT_QUOTES, 'UTF-8');

  $LINK_ICHARS = $LINK_ICHARS_DOMAIN . (string) html_entity_decode(implode("", array(
    "&#x00DF;", // ß
  )), ENT_QUOTES, 'UTF-8');
  $allowed_protocols = array('http', 'https', 'ftp', 'news', 'nntp', 'telnet', 'mailto', 'irc', 'ssh', 'sftp', 'webcal');

  // Starting a parenthesis group with (?: means that it is grouped, but is not captured
  $protocol = '((?:'. implode("|", $allowed_protocols) .'):\/\/)';
  $authentication = "(?:(?:(?:[\w\.\-\+!$&'\(\)*\+,;=" . $LINK_ICHARS . "]|%[0-9a-f]{2})+(?::(?:[\w". $LINK_ICHARS ."\.\-\+%!$&'\(\)*\+,;=]|%[0-9a-f]{2})*)?)?@)";
  $domain = '(?:(?:[a-z0-9' . $LINK_ICHARS_DOMAIN . ']([a-z0-9'. $LINK_ICHARS_DOMAIN . '\-_\[\]])*)(\.(([a-z0-9' . $LINK_ICHARS_DOMAIN . '\-_\[\]])+\.)*('. $LINK_DOMAINS .'|[a-z]{2}))?)';
  $ipv4 = '(?:[0-9]{1,3}(\.[0-9]{1,3}){3})';
  $ipv6 = '(?:[0-9a-fA-F]{1,4}(\:[0-9a-fA-F]{1,4}){7})';
  $port = '(?::([0-9]{1,5}))';

  // Pattern specific to external links.
  $external_pattern = '/^'. $protocol .'?'. $authentication .'?('. $domain .'|'. $ipv4 .'|'. $ipv6 .' |localhost)'. $port .'?';

  // Pattern specific to internal links.
  $internal_pattern = "/^(?:[a-z0-9". $LINK_ICHARS ."_\-+\[\]]+)";
  $internal_pattern_file = "/^(?:[a-z0-9". $LINK_ICHARS ."_\-+\[\]\.]+)$/i";

  $directories = "(?:\/[a-z0-9". $LINK_ICHARS ."_\-\.~+%=&,$'#!():;*@\[\]]*)*";
  // Yes, four backslashes == a single backslash.
  $query = "(?:\/?\?([?a-z0-9". $LINK_ICHARS ."+_|\-\.~\/\\\\%=&,$'():;*@\[\]{} ]*))";
  $anchor = "(?:#[a-z0-9". $LINK_ICHARS ."_\-\.~+%=&,$'():;*@\[\]\/\?]*)";

  // The rest of the path for a standard URL.
  $end = $directories .'?'. $query .'?'. $anchor .'?'.'$/i';

  $message_id = '[^@].*@'. $domain;
  $newsgroup_name = '(?:[0-9a-z+-]*\.)*[0-9a-z+-]*';
  $news_pattern = '/^news:('. $newsgroup_name .'|'. $message_id .')$/i';

  $user = '[a-zA-Z0-9'. $LINK_ICHARS .'_\-\.\+\^!#\$%&*+\/\=\?\`\|\{\}~\'\[\]]+';
  $email_pattern = '/^mailto:'. $user .'@'.'(?:'. $domain .'|'. $ipv4 .'|'. $ipv6 .'|localhost)'. $query .'?$/';

  if (strpos($text, '<front>') === 0) {
    return false;
  }
  if (in_array('mailto', $allowed_protocols) && preg_match($email_pattern, $text)) {
    return false;
  }
  if (in_array('news', $allowed_protocols) && preg_match($news_pattern, $text)) {
    return false;
  }
  if (preg_match($internal_pattern . $end, $text)) {
    return false;
  }
  if (preg_match($external_pattern . $end, $text)) {
    return false;
  }
  if (preg_match($internal_pattern_file, $text)) {
    return false;
  }

  return true;
}

Solution 12 - Php

function is_valid_url ($url="") {

        if ($url=="") {
            $url=$this->url;
        }

        $url = @parse_url($url);

        if ( ! $url) {


            return false;
        }

        $url = array_map('trim', $url);
        $url['port'] = (!isset($url['port'])) ? 80 : (int)$url['port'];
        $path = (isset($url['path'])) ? $url['path'] : '';

        if ($path == '') {
            $path = '/';
        }

        $path .= ( isset ( $url['query'] ) ) ? "?$url[query]" : '';



        if ( isset ( $url['host'] ) AND $url['host'] != gethostbyname ( $url['host'] ) ) {
            if ( PHP_VERSION >= 5 ) {
                $headers = get_headers("$url[scheme]://$url[host]:$url[port]$path");
            }
            else {
                $fp = fsockopen($url['host'], $url['port'], $errno, $errstr, 30);

                if ( ! $fp ) {
                    return false;
                }
                fputs($fp, "HEAD $path HTTP/1.1\r\nHost: $url[host]\r\n\r\n");
                $headers = fread ( $fp, 128 );
                fclose ( $fp );
            }
            $headers = ( is_array ( $headers ) ) ? implode ( "\n", $headers ) : $headers;
            return ( bool ) preg_match ( '#^HTTP/.*\s+[(200|301|302)]+\s#i', $headers );
        }

        return false;
    }

Solution 13 - Php

Inspired in this .NET StackOverflow question and in this referenced article from that question there is this URI validator (URI means it validates both URL and URN).

if( ! preg_match( "/^([a-z][a-z0-9+.-]*):(?:\\/\\/((?:(?=((?:[a-z0-9-._~!$&'()*+,;=:]|%[0-9A-F]{2})*))(\\3)@)?(?=(\\[[0-9A-F:.]{2,}\\]|(?:[a-z0-9-._~!$&'()*+,;=]|%[0-9A-F]{2})*))\\5(?::(?=(\\d*))\\6)?)(\\/(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/]|%[0-9A-F]{2})*))\\8)?|(\\/?(?!\\/)(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/]|%[0-9A-F]{2})*))\\10)?)(?:\\?(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/?]|%[0-9A-F]{2})*))\\11)?(?:#(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/?]|%[0-9A-F]{2})*))\\12)?$/i", $uri ) )
{
    throw new \RuntimeException( "URI has not a valid format." );
}

I have successfully unit-tested this function inside a ValueObject I made named Uri and tested by UriTest.

UriTest.php (Contains valid and invalid cases for both URLs and URNs)

<?php

declare( strict_types = 1 );

namespace XaviMontero\ThrasherPortage\Tests\Tour;

use XaviMontero\ThrasherPortage\Tour\Uri;

class UriTest extends \PHPUnit_Framework_TestCase
{
    private $sut;

    public function testCreationIsOfProperClassWhenUriIsValid()
    {
        $sut = new Uri( 'http://example.com' );
        $this->assertInstanceOf( 'XaviMontero\\ThrasherPortage\\Tour\\Uri', $sut );
    }

    /**
     * @dataProvider urlIsValidProvider
     * @dataProvider urnIsValidProvider
     */
    public function testGetUriAsStringWhenUriIsValid( string $uri )
    {
        $sut = new Uri( $uri );
        $actual = $sut->getUriAsString();

        $this->assertInternalType( 'string', $actual );
        $this->assertEquals( $uri, $actual );
    }

    public function urlIsValidProvider()
    {
        return
            [
                [ 'http://example-server' ],
                [ 'http://example.com' ],
                [ 'http://example.com/' ],
                [ 'http://subdomain.example.com/path/?parameter1=value1&parameter2=value2' ],
                [ 'random-protocol://example.com' ],
                [ 'http://example.com:80' ],
                [ 'http://example.com?no-path-separator' ],
                [ 'http://example.com/pa%20th/' ],
                [ 'ftp://example.org/resource.txt' ],
                [ 'file://../../../relative/path/needs/protocol/resource.txt' ],
                [ 'http://example.com/#one-fragment' ],
                [ 'http://example.edu:8080#one-fragment' ],
            ];
    }

    public function urnIsValidProvider()
    {
        return
            [
                [ 'urn:isbn:0-486-27557-4' ],
                [ 'urn:example:mammal:monotreme:echidna' ],
                [ 'urn:mpeg:mpeg7:schema:2001' ],
                [ 'urn:uuid:6e8bc430-9c3a-11d9-9669-0800200c9a66' ],
                [ 'rare-urn:uuid:6e8bc430-9c3a-11d9-9669-0800200c9a66' ],
                [ 'urn:FOO:a123,456' ]
            ];
    }

    /**
     * @dataProvider urlIsNotValidProvider
     * @dataProvider urnIsNotValidProvider
     */
    public function testCreationThrowsExceptionWhenUriIsNotValid( string $uri )
    {
        $this->expectException( 'RuntimeException' );
        $this->sut = new Uri( $uri );
    }

    public function urlIsNotValidProvider()
    {
        return
            [
                [ 'only-text' ],
                [ 'http//missing.colon.example.com/path/?parameter1=value1&parameter2=value2' ],
                [ 'missing.protocol.example.com/path/' ],
                [ 'http://example.com\\bad-separator' ],
                [ 'http://example.com|bad-separator' ],
                [ 'ht tp://example.com' ],
                [ 'http://exampl e.com' ],
                [ 'http://example.com/pa th/' ],
                [ '../../../relative/path/needs/protocol/resource.txt' ],
                [ 'http://example.com/#two-fragments#not-allowed' ],
                [ 'http://example.edu:portMustBeANumber#one-fragment' ],
            ];
    }

    public function urnIsNotValidProvider()
    {
        return
            [
                [ 'urn:mpeg:mpeg7:sch ema:2001' ],
                [ 'urn|mpeg:mpeg7:schema:2001' ],
                [ 'urn?mpeg:mpeg7:schema:2001' ],
                [ 'urn%mpeg:mpeg7:schema:2001' ],
                [ 'urn#mpeg:mpeg7:schema:2001' ],
            ];
    }
}

Uri.php (Value Object)

<?php

declare( strict_types = 1 );

namespace XaviMontero\ThrasherPortage\Tour;

class Uri
{
    /** @var string */
    private $uri;

    public function __construct( string $uri )
    {
        $this->assertUriIsCorrect( $uri );
        $this->uri = $uri;
    }

    public function getUriAsString()
    {
        return $this->uri;
    }

    private function assertUriIsCorrect( string $uri )
    {
        // https://stackoverflow.com/questions/30847/regex-to-validate-uris
        // http://snipplr.com/view/6889/regular-expressions-for-uri-validationparsing/

        if( ! preg_match( "/^([a-z][a-z0-9+.-]*):(?:\\/\\/((?:(?=((?:[a-z0-9-._~!$&'()*+,;=:]|%[0-9A-F]{2})*))(\\3)@)?(?=(\\[[0-9A-F:.]{2,}\\]|(?:[a-z0-9-._~!$&'()*+,;=]|%[0-9A-F]{2})*))\\5(?::(?=(\\d*))\\6)?)(\\/(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/]|%[0-9A-F]{2})*))\\8)?|(\\/?(?!\\/)(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/]|%[0-9A-F]{2})*))\\10)?)(?:\\?(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/?]|%[0-9A-F]{2})*))\\11)?(?:#(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/?]|%[0-9A-F]{2})*))\\12)?$/i", $uri ) )
        {
            throw new \RuntimeException( "URI has not a valid format." );
        }
    }
}

Running UnitTests

There are 65 assertions in 46 tests. Caution: there are 2 data-providers for valid and 2 more for invalid expressions. One is for URLs and the other for URNs. If you are using a version of PhpUnit of v5.6* or earlier then you need to join the two data providers into a single one.

xavi@bromo:~/custom_www/hello-trip/mutant-migrant$ vendor/bin/phpunit
PHPUnit 5.7.3 by Sebastian Bergmann and contributors.

..............................................                    46 / 46 (100%)

Time: 82 ms, Memory: 4.00MB

OK (46 tests, 65 assertions)

Code coverage

There's is 100% of code-coverage in this sample URI checker.

Solution 14 - Php

"/(http(s?):\/\/)([a-z0-9\-]+\.)+[a-z]{2,4}(\.[a-z]{2,4})*(\/[^ ]+)*/i"
  1. (http(s?)://) means http:// or https://

  2. ([a-z0-9-]+.)+ => 2.0[a-z0-9-] means any a-z character or any 0-9 or (-)sign)

                  2.1 (+) means the character can be one or more ex: a1w, 
                      a9-,c559s, f)
                      
                  2.2 \. is (.)sign
                  
                  2.3. the (+) sign after ([a-z0-9\-]+\.) mean do 2.1,2.2,2.3 
                     at least 1 time 
                   ex: abc.defgh0.ig, aa.b.ced.f.gh. also in case www.yyy.com
    
                  3.[a-z]{2,4} mean a-z at least 2 character but not more than 
                               4 characters for check that there will not be 
                               the case 
                               ex: https://www.google.co.kr.asdsdagfsdfsf
                  
                  4.(\.[a-z]{2,4})*(\/[^ ]+)* mean 
                    
                    4.1 \.[a-z]{2,4} means like number 3 but start with 
                        (.)sign 
    
                    4.2 * means (\.[a-z]{2,4})can be use or not use never mind
                     
                    4.3 \/ means \
                    4.4 [^ ] means any character except blank
                    4.5 (+) means do 4.3,4.4,4.5 at least 1 times
                    4.6 (*) after (\/[^ ]+) mean use 4.3 - 4.5 or not use 
                        no problem
    
                    use for case https://stackoverflow.com/posts/51441301/edit
    
                    5. when you use regex write in "/ /" so it come
    

    "/(http(s?)://)([a-z0-9-]+.)+[a-z]{2,4}(.[a-z]{2,4})(/[^ ]+)/i"

                    6. almost forgot: letter i on the back mean ignore case of 
                       Big letter or small letter ex: A same as a, SoRRy same 
                       as sorry.
    

Note : Sorry for bad English. My country not use it well.

Solution 15 - Php

OK, so this is a little bit more complex then a simple regex, but it allows for different types of urls.

Examples:

All which should be marked as valid.

function is_valid_url($url) {
    // First check: is the url just a domain name? (allow a slash at the end)
    $_domain_regex = "|^[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*(\.[A-Za-z]{2,})/?$|";
    if (preg_match($_domain_regex, $url)) {
        return true;
    }

    // Second: Check if it's a url with a scheme and all
    $_regex = '#^([a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))$#';
    if (preg_match($_regex, $url, $matches)) {
        // pull out the domain name, and make sure that the domain is valid.
        $_parts = parse_url($url);
        if (!in_array($_parts['scheme'], array( 'http', 'https' )))
            return false;

        // Check the domain using the regex, stops domains like "-example.com" passing through
        if (!preg_match($_domain_regex, $_parts['host']))
            return false;

        // This domain looks pretty valid. Only way to check it now is to download it!
        return true;
    }

    return false;
}

Note that there is a in_array check for the protocols that you want to allow (currently only http and https are in that list).

var_dump(is_valid_url('google.com'));         // true
var_dump(is_valid_url('google.com/'));        // true
var_dump(is_valid_url('http://google.com'));  // true
var_dump(is_valid_url('http://google.com/')); // true
var_dump(is_valid_url('https://google.com')); // true

Solution 16 - Php

For anyone developing with WordPress, just use

esc_url_raw($url) === $url

to validate a URL (here's WordPress' documentation on esc_url_raw). It handles URLs much better than filter_var($url, FILTER_VALIDATE_URL) because it is unicode and XSS-safe. (Here is a good article mentioning all the problems with filter_var).

Solution 17 - Php

Peter's Regex doesn't look right to me for many reasons. It allows all kinds of special characters in the domain name and doesn't test for much.

Frankie's function looks good to me and you can build a good regex from the components if you don't want a function, like so:

^(http://|https://)(([a-z0-9]([-a-z0-9]*[a-z0-9]+)?){1,63}\.)+[a-z]{2,6}

Untested but I think that should work.

Also, Owen's answer doesn't look 100% either. I took the domain part of the regex and tested it on a Regex tester tool http://erik.eae.net/playground/regexp/regexp.html

I put the following line:

(\S*?\.\S*?)

in the "regexp" section and the following line:

> -hello.com

under the "sample text" section.

The result allowed the minus character through. Because \S means any non-space character.

Note the regex from Frankie handles the minus because it has this part for the first character:

[a-z0-9]

Which won't allow the minus or any other special character.

Solution 18 - Php

Here is the way I did it. But I want to mentoin that I am not so shure about the regex. But It should work thou :)

$pattern = "#((http|https)://(\S*?\.\S*?))(\s|\;|\)|\]|\[|\{|\}|,|”|\"|'|:|\<|$|\.\s)#i";
		$text = preg_replace_callback($pattern,function($m){
				return "<a href=\"$m[1]\" target=\"_blank\">$m[1]</a>$m[4]";
			},
			$text);

This way you won't need the eval marker on your pattern.

Hope it helps :)

Solution 19 - Php

Here's a simple class for URL Validation using RegEx and then cross-references the domain against popular RBL (Realtime Blackhole Lists) servers:

Install:

require 'URLValidation.php';

Usage:

require 'URLValidation.php';
$urlVal = new UrlValidation(); //Create Object Instance

Add a URL as the parameter of the domain() method and check the the return.

$urlArray = ['http://www.bokranzr.com/test.php?test=foo&test=dfdf', 'https://en-gb.facebook.com', 'https://www.google.com'];
foreach ($urlArray as $k=>$v) {

    echo var_dump($urlVal->domain($v)) . ' URL: ' . $v . '<br>';

}

Output:

bool(false) URL: http://www.bokranzr.com/test.php?test=foo&test=dfdf
bool(true) URL: https://en-gb.facebook.com
bool(true) URL: https://www.google.com

As you can see above, www.bokranzr.com is listed as malicious website via an RBL so the domain was returned as false.

Solution 20 - Php

I've found this to be the most useful for matching a URL..

^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$

Solution 21 - Php

There is a PHP native function for that:

$url = 'http://www.yoururl.co.uk/sub1/sub2/?param=1&param2/';

if ( ! filter_var( $url, FILTER_VALIDATE_URL ) ) {
    // Wrong
}
else {
    // Valid
}

Returns the filtered data, or FALSE if the filter fails.

Check it here

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionAndreLiemView Question on Stackoverflow
Solution 1 - PhpStanislavView Answer on Stackoverflow
Solution 2 - PhpOwenView Answer on Stackoverflow
Solution 3 - PhpcatchdaveView Answer on Stackoverflow
Solution 4 - PhpabhiomkarView Answer on Stackoverflow
Solution 5 - PhpRogerView Answer on Stackoverflow
Solution 6 - PhppromatyView Answer on Stackoverflow
Solution 7 - PhpPeter BaileyView Answer on Stackoverflow
Solution 8 - PhpVikash KumarView Answer on Stackoverflow
Solution 9 - PhpFred VanelliView Answer on Stackoverflow
Solution 10 - PhpFrankieView Answer on Stackoverflow
Solution 11 - PhpGeorge MilonasView Answer on Stackoverflow
Solution 12 - PhpjiniView Answer on Stackoverflow
Solution 13 - PhpXavi MonteroView Answer on Stackoverflow
Solution 14 - PhpSome_North_korea_kidView Answer on Stackoverflow
Solution 15 - PhpTim GroeneveldView Answer on Stackoverflow
Solution 16 - PhpthespacecamelView Answer on Stackoverflow
Solution 17 - PhpjoedevonView Answer on Stackoverflow
Solution 18 - PhpThomas VenturiniView Answer on Stackoverflow
Solution 19 - PhpKitson88View Answer on Stackoverflow
Solution 20 - PhpJeremy MooreView Answer on Stackoverflow
Solution 21 - PhpFredmatView Answer on Stackoverflow