Best way to check if a URL is valid
Php Problem Overview
I want to use PHP to check whether the string stored in the $myoutput variable contains valid link syntax or is just normal text. The function or solution I'm looking for should recognize all link formats, including ones with GET parameters.
A solution suggested on many sites, actually requesting the URL (using cURL or the file_get_contents() function), is not possible in my case, and I would like to avoid it.
I thought about regular expressions or another solution.
Php Solutions
Solution 1 - Php
You can use a native Filter Validator
filter_var($url, FILTER_VALIDATE_URL);
> Validates value as URL (according to http://www.faqs.org/rfcs/rfc2396), optionally with required components. Beware a valid URL may not specify the HTTP protocol http:// so further validation may be required to determine the URL uses an expected protocol, e.g. ssh:// or mailto:. Note that the function will only find ASCII URLs to be valid; internationalized domain names (containing non-ASCII characters) will fail.
Example:
if (filter_var($url, FILTER_VALIDATE_URL) === false) {
    die('Not a valid URL');
}
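As the quoted note says, a syntactically valid URL may use a scheme you don't expect. A minimal sketch of the extra check, assuming you only want http and https links (the helper name is_http_url is made up for this example):

```php
<?php
// Hypothetical helper: accept only syntactically valid http(s) URLs.
function is_http_url(string $url): bool
{
    // Step 1: syntax check with the native filter.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    // Step 2: inspect the scheme that the filter accepted.
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    return in_array($scheme, ['http', 'https'], true);
}

var_dump(is_http_url('https://example.com/page.php?id=5&lang=en')); // bool(true)
var_dump(is_http_url('mailto:someone@example.com'));                // bool(false)
var_dump(is_http_url('just some text'));                            // bool(false)
```

The list of allowed schemes is an assumption; extend the array if you also need ftp or others.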
Solution 2 - Php
Here is the best tutorial I found:
http://www.w3schools.com/php/filter_validate_url.asp
<?php
$url = "http://www.qbaki.com";
// Remove all illegal characters from a url
$url = filter_var($url, FILTER_SANITIZE_URL);
// Validate url
if (filter_var($url, FILTER_VALIDATE_URL) !== false) {
    echo("$url is a valid URL");
} else {
    echo("$url is not a valid URL");
}
?>
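Possible flags:
FILTER_FLAG_SCHEME_REQUIRED - URL must be RFC compliant (like http://example). Deprecated in PHP 7.3 and removed in PHP 8.0; FILTER_VALIDATE_URL always implies it.
FILTER_FLAG_HOST_REQUIRED - URL must include host name (like http://www.example.com). Deprecated in PHP 7.3 and removed in PHP 8.0; FILTER_VALIDATE_URL always implies it.
FILTER_FLAG_PATH_REQUIRED - URL must have a path after the domain name (like http://www.example.com/example1/)
FILTER_FLAG_QUERY_REQUIRED - URL must have a query string (like http://www.example.com/example.php?name=Peter&age=37)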
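A short sketch of how the path and query flags can be passed as the third argument of filter_var(). The variable names and example URLs are only for illustration:

```php
<?php
// No path after the host, so the PATH_REQUIRED check fails.
$noPath = filter_var('http://www.example.com', FILTER_VALIDATE_URL, FILTER_FLAG_PATH_REQUIRED);
var_dump($noPath); // bool(false)

// Combine flags with the bitwise OR operator (|), not the logical ||.
$flags = FILTER_FLAG_PATH_REQUIRED | FILTER_FLAG_QUERY_REQUIRED;
$withBoth = filter_var(
    'http://www.example.com/example.php?name=Peter&age=37',
    FILTER_VALIDATE_URL,
    $flags
);
var_dump($withBoth); // the URL string itself, because both parts are present
```

On success filter_var() returns the filtered string rather than true, so compare against false with !== or ===.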
Solution 3 - Php
Using filter_var() will fail for URLs with non-ASCII chars, e.g. http://pt.wikipedia.org/wiki/Guimarães. The following function encodes all non-ASCII chars in the path (e.g. http://pt.wikipedia.org/wiki/Guimar%C3%A3es) before calling filter_var().
Hope this helps someone.
<?php
function validate_url($url) {
    // parse_url() returns null when there is no path and false on malformed input
    $path = parse_url($url, PHP_URL_PATH);
    if (is_string($path) && $path !== '') {
        // Encode each path segment separately so the slashes survive
        $encoded_path = array_map('urlencode', explode('/', $path));
        $url = str_replace($path, implode('/', $encoded_path), $url);
    }
    return filter_var($url, FILTER_VALIDATE_URL) !== false;
}
// example
if (!validate_url("http://somedomain.com/some/path/file1.jpg")) {
    echo "NOT A URL";
}
else {
    echo "IS A URL";
}
Solution 4 - Php
function is_url($uri){
    if(preg_match( '/^(http|https):\\/\\/[a-z0-9_]+([\\-\\.]{1}[a-z_0-9]+)*\\.[_a-z]{2,5}'.'((:[0-9]{1,5})?\\/.*)?$/i' ,$uri)){
        return $uri;
    }
    else{
        return false;
    }
}
Solution 5 - Php
Personally, I would use a regular expression here. The code below worked perfectly for me.
$baseUrl = url('/'); // url() is a framework helper (presumably Laravel's); in my case https://www.xrepeater.com
$posted_url = "home";
// Test with one by one
/*$posted_url = "/home";
$posted_url = "xrepeater.com";
$posted_url = "www.xrepeater.com";
$posted_url = "http://www.xrepeater.com";
$posted_url = "https://www.xrepeater.com";
$posted_url = "https://xrepeater.com/services";
$posted_url = "xrepeater.dev/home/test";
$posted_url = "home/test";*/
$regularExpression = "((https?|ftp)\:\/\/)?"; // SCHEME Check
$regularExpression .= "([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?@)?"; // User and Pass Check
$regularExpression .= "([a-z0-9-.]*)\.([a-z]{2,3})"; // Host or IP Check
$regularExpression .= "(\:[0-9]{2,5})?"; // Port Check
$regularExpression .= "(\/([a-z0-9+\$_-]\.?)+)*\/?"; // Path Check
$regularExpression .= "(\?[a-z+&\$_.-][a-z0-9;:@&%=+\/\$_.-]*)?"; // GET Query String Check
$regularExpression .= "(#[a-z_.-][a-z0-9+\$_.-]*)?"; // Anchor Check
if(preg_match("/^$regularExpression$/i", $posted_url)) {
    // Does the string already start with a scheme?
    if(preg_match("@^(https?|ftp)://@i",$posted_url)) {
        // Collapse an accidentally repeated http:// prefix
        $final_url = preg_replace("@(http://)+@i",'http://',$posted_url);
        // return "*** - ***Match : ".$final_url;
    }
    else {
        $final_url = 'http://'.$posted_url;
        // return "*** / ***Match : ".$final_url;
    }
}
else {
    if (substr($posted_url, 0, 1) === '/') {
        // return "*** / ***Not Match :".$final_url."<br>".$baseUrl.$posted_url;
        $final_url = $baseUrl.$posted_url;
    }
    else {
        // return "*** - ***Not Match :".$posted_url."<br>".$baseUrl."/".$posted_url;
        $final_url = $baseUrl."/".$posted_url;
    }
}
Solution 6 - Php
Actually, filter_var($url, FILTER_VALIDATE_URL) doesn't work very well for this. It only checks the syntax, not whether the host exists. A real URL passes, but if you type something like "http://weirtgcyaurbatc", it will still say it's valid.
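To illustrate the point, here is a small sketch. The second check, requiring a dot in the host name, is only one possible tightening and an assumption of this example (it also rejects hosts such as localhost); has_dotted_host is a hypothetical helper name:

```php
<?php
// The syntax check alone accepts a made-up, single-label host.
$syntaxOk = filter_var('http://weirtgcyaurbatc', FILTER_VALIDATE_URL) !== false;
var_dump($syntaxOk); // bool(true)

// Hypothetical stricter check: the host must contain at least one dot.
function has_dotted_host(string $url): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    return is_string($host) && strpos($host, '.') !== false;
}

var_dump(has_dotted_host('http://weirtgcyaurbatc'));  // bool(false)
var_dump(has_dotted_host('http://www.example.com'));  // bool(true)
```

Neither check proves that the host really exists; that would need a DNS lookup or an actual request.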
Solution 7 - Php
You can use this function, but it will return false if the host name cannot be resolved, for example when the website is offline or DNS is unavailable.
function isValidUrl($url) {
    $url = parse_url($url);
    if (!isset($url["host"])) return false;
    // gethostbyname() returns the unmodified host name when the lookup fails
    return !(gethostbyname($url["host"]) == $url["host"]);
}
Solution 8 - Php
Given the issues with filter_var() requiring a scheme such as http://, I use:
$is_url = filter_var($filename, FILTER_VALIDATE_URL) !== false || is_string(parse_url($filename, PHP_URL_SCHEME));
Solution 9 - Php
Another way to check whether a given URL is valid is to try to access it. The function below fetches the headers from the given URL, which ensures that the URL is valid AND that the web server is alive:
function is_url($url){
    //Check if URL is empty
    if(empty($url)) {
        return false;
    }
    //get_headers() returns false on failure, so guard before searching
    $response = @get_headers($url);
    if($response === false) {
        return false;
    }
    //Accept any HTTP version as long as one status line reports 200
    return (bool)preg_grep('#^HTTP/\S+\s+200\b#', $response);
    /*Example get_headers() result (associative form):
    Array
    (
    [0] => HTTP/1.1 200 OK
    [Date] => Sat, 29 May 2004 12:28:14 GMT
    [Server] => Apache/1.3.27 (Unix) (Red-Hat/Linux)
    [Last-Modified] => Wed, 08 Jan 2003 23:11:55 GMT
    [ETag] => "3f80f-1b6-3e1cb03b"
    [Accept-Ranges] => bytes
    [Content-Length] => 438
    [Connection] => close
    [Content-Type] => text/html
    )*/
}
Solution 10 - Php
// Written as a plain function; add "public" when using it as a class method
function testing($Url=''){
    $ch = curl_init($Url);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $data = curl_exec($ch);
    $httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    // Any 2xx status code counts as reachable
    return $httpcode >= 200 && $httpcode < 300;
}
Solution 11 - Php
I came across this article from 2012. It takes into account variables that may or may not be plain URLs.
The author of the article, David Müeller, provides this function that he says "...could be worth wile [sic]," along with some examples of filter_var and its shortcomings.
/**
* Modified version of `filter_var`.
*
* @param mixed $url Could be a URL or possibly much more.
* @return bool
*/
function validate_url( $url ) {
	$url = trim( $url );
	// FILTER_VALIDATE_URL always requires a scheme and a host, so no flags are needed
	// (FILTER_FLAG_SCHEME_REQUIRED and FILTER_FLAG_HOST_REQUIRED were removed in PHP 8.0).
	return (
		( strpos( $url, 'http://' ) === 0 || strpos( $url, 'https://' ) === 0 ) &&
		filter_var( $url, FILTER_VALIDATE_URL ) !== false
	);
}
Solution 12 - Php
If anyone is interested in using cURL for validation, you can use the following code.
<?php
// Written as a plain function; add "public" when using it as a class method
function validationUrl($Url){
    if ($Url == NULL){
        return false;
    }
    $ch = curl_init($Url);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $data = curl_exec($ch);
    $httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    // Any 2xx status code counts as valid
    return $httpcode >= 200 && $httpcode < 300;
}