Parsing domain from a URL

Php

Php Problem Overview


I need to build a function which parses the domain from a URL.

So, with

http://google.com/dhasjkdas/sadsdds/sdda/sdads.html

or

http://www.google.com/dhasjkdas/sadsdds/sdda/sdads.html

it should return google.com

with

http://google.co.uk/dhasjkdas/sadsdds/sdda/sdads.html

it should return google.co.uk.

Php Solutions


Solution 1 - Php

Check out parse_url():

$url = 'http://google.com/dhasjkdas/sadsdds/sdda/sdads.html';
$parse = parse_url($url);
echo $parse['host']; // prints 'google.com'

parse_url() doesn't handle badly mangled URLs well (it can return false for them), but it is fine if you generally expect well-formed URLs.
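To make "badly mangled" concrete, parse_url() signals trouble in two different ways that a caller has to tell apart. The sketch below assumes PHP 7 or later, and host_or_null() is a hypothetical helper name, not part of PHP:

```php
<?php
// Hypothetical helper: return the host of $url, or null when there is none.
function host_or_null(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    // parse_url() returns false for a seriously malformed URL and null when the
    // URL parses but has no host component (for example, no scheme was given).
    return is_string($host) ? $host : null;
}

var_dump(host_or_null('http://google.com/dhasjkdas/sadsdds/sdda/sdads.html')); // string(10) "google.com"
var_dump(host_or_null('google.com/dhasjkdas')); // NULL: the whole string is treated as a path
var_dump(host_or_null('http:///example.com'));  // NULL: parse_url() returned false
```

Checking with is_string() covers both failure values in one test, which is usually what calling code wants.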

Solution 2 - Php

$domain = str_ireplace('www.', '', parse_url($url, PHP_URL_HOST));

This returns google.com for both http://google.com/... and http://www.google.com/...
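One caveat: str_ireplace() removes "www." wherever it occurs in the host, not only at the start. A minimal sketch that anchors the match to the beginning instead; strip_leading_www() is a hypothetical name, and it assumes the URL has a scheme so that parse_url() can find the host:

```php
<?php
// Hypothetical helper: host of $url without a single leading "www." label.
function strip_leading_www(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return null; // no host, or the URL could not be parsed
    }
    // "^" anchors the pattern, so only a "www." at the very start is removed.
    return preg_replace('/^www\./i', '', $host);
}

echo strip_leading_www('http://www.google.com/dhasjkdas/sadsdds/sdda/sdads.html'), PHP_EOL; // google.com
echo strip_leading_www('http://wwww.example.com/'), PHP_EOL; // wwww.example.com
echo str_ireplace('www.', '', 'wwww.example.com'), PHP_EOL;   // wexample.com
```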

Solution 3 - Php

From http://us3.php.net/manual/en/function.parse-url.php#93983

> For some odd reason, parse_url returns the host (ex. example.com) as the path when no scheme is provided in the input url. So I've written a quick function to get the real host:

function getHost($Address) { 
   $parseUrl = parse_url(trim($Address)); 
   return trim($parseUrl['host'] ? $parseUrl['host'] : array_shift(explode('/', $parseUrl['path'], 2))); 
} 

getHost("example.com"); // Gives example.com 
getHost("http://example.com"); // Gives example.com 
getHost("www.example.com"); // Gives www.example.com 
getHost("http://example.com/xyz"); // Gives example.com 

Solution 4 - Php

function get_domain($url = SITE_URL)
{
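    preg_match("/[a-z0-9\-]{1,63}\.[a-z\.]{2,6}$/", parse_url($url, PHP_URL_HOST), $_domain_tld);
    return $_domain_tld[0] ?? null; // null when the host doesn't match the pattern
}

get_domain('http://www.cdl.gr'); //www.cdl.gr, not cdl.gr (see the note below)
get_domain('http://cdl.gr'); //cdl.gr
get_domain('http://www2.cdl.gr'); //www2.cdl.gr, not cdl.gr

Note that [a-z\.]{2,6} also matches dots, so for a short host such as www.cdl.gr the six characters "cdl.gr" fit inside that part of the pattern and the match starts at the subdomain. Longer hosts such as www.google.co.uk do come out as google.co.uk.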

Solution 5 - Php

The code that was meant to work 100% (Solution 8 below) didn't quite cut it for me. I patched the example a little, but found code that wasn't helping and other problems with it, so I split it into a couple of functions (to avoid fetching the list from Mozilla every time, and to remove the cache system). This has been tested against a set of 1000 URLs and seemed to work.

function domain($url)
{
    global $subtlds;
    $url = strtolower($url);

    // $url is expected without a scheme, e.g. 'www.example.com'
    $host = parse_url('http://'.$url, PHP_URL_HOST);

    // Default: the last two labels (example.com)
    preg_match("/[^\.\/]+\.[^\.\/]+$/", $host, $matches);
    foreach ($subtlds as $sub) {
        // Host ends in a known multi-label suffix (e.g. .co.uk): take three labels instead
        if (preg_match('/\.'.preg_quote($sub, '/').'$/', $host)) {
            preg_match("/[^\.\/]+\.[^\.\/]+\.[^\.\/]+$/", $host, $matches);
        }
    }

    return $matches[0] ?? null;
}

function get_tlds() {
    // Canonical location of the Public Suffix List
    $address = 'https://publicsuffix.org/list/public_suffix_list.dat';
    $content = file($address) ?: [];
    $subtlds = [];
    foreach ($content as $line) {
        $line = trim($line);
        if ($line == '') continue;
        if (substr($line, 0, 2) == '//') continue;     // skip comment lines
        $line = preg_replace("/[^a-zA-Z0-9\.]/", '', $line);
        if ($line == '') continue;
        if ($line[0] == '.') $line = substr($line, 1);  // "*.ck" becomes ".ck", then "ck"
        if (!strstr($line, '.')) continue;             // keep multi-label suffixes only
        $subtlds[] = $line;
    }

    $subtlds = array_merge(array(
            'co.uk', 'me.uk', 'net.uk', 'org.uk', 'sch.uk', 'ac.uk',
            'gov.uk', 'nhs.uk', 'police.uk', 'mod.uk', 'asn.au', 'com.au',
            'net.au', 'id.au', 'org.au', 'edu.au', 'gov.au', 'csiro.au'
        ), $subtlds);

    return array_unique($subtlds);
}

Then use it like

$subtlds = get_tlds();
echo domain('www.example.com'); //outputs: example.com
echo domain('www.example.uk.com'); //outputs: example.uk.com
echo domain('www.example.fr'); //outputs: example.fr

I know I should have turned this into a class, but didn't have time.

Solution 6 - Php

If you want to extract the host from a string such as http://google.com/dhasjkdas/sadsdds/sdda/sdads.html, parse_url() is an acceptable solution.

But if you want to extract the domain or its parts, you need a package that uses the Public Suffix List. You can use string functions around parse_url(), but they will sometimes produce incorrect results.

I recommend TLDExtract for domain parsing; here is sample code that shows the difference:

$extract = new LayerShifter\TLDExtract\Extract();

# For 'http://google.com/dhasjkdas/sadsdds/sdda/sdads.html'

$url = 'http://google.com/dhasjkdas/sadsdds/sdda/sdads.html';

parse_url($url, PHP_URL_HOST); // will return google.com

$result = $extract->parse($url);
$result->getFullHost(); // will return 'google.com'
$result->getRegistrableDomain(); // will return 'google.com'
$result->getSuffix(); // will return 'com'

# For 'http://search.google.com/dhasjkdas/sadsdds/sdda/sdads.html'

$url = 'http://search.google.com/dhasjkdas/sadsdds/sdda/sdads.html';

parse_url($url, PHP_URL_HOST); // will return 'search.google.com'

$result = $extract->parse($url);
$result->getFullHost(); // will return 'search.google.com'
$result->getRegistrableDomain(); // will return 'google.com'

Solution 7 - Php

Please consider replacing the accepted solution with the following:

parse_url() will always include any sub-domain(s), so this function doesn't parse domain names very well. Here are some examples:

$url = 'http://www.google.com/dhasjkdas/sadsdds/sdda/sdads.html';
$parse = parse_url($url);
echo $parse['host']; // prints 'www.google.com'

echo parse_url('https://subdomain.example.com/foo/bar', PHP_URL_HOST);
// Output: subdomain.example.com

echo parse_url('https://subdomain.example.co.uk/foo/bar', PHP_URL_HOST);
// Output: subdomain.example.co.uk

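Instead, you may consider this pragmatic solution. It covers many, but not all, domain names: it treats any two-letter second-to-last label (co, ac, ...) as part of the suffix, so lower-level domains such as 'sos.state.oh.us' are not covered, and a host such as sub.ab.com is returned whole.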

function getDomain($url) {
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return null; // no host, or the URL could not be parsed
    }

    if (filter_var($host, FILTER_VALIDATE_IP)) {
        // IP address returned as domain
        return $host; //* or replace with null if you don't want an IP back
    }

    // Drop a leading "www." only, not every occurrence
    $domain_array = explode(".", preg_replace('/^www\./', '', $host));
    $count = count($domain_array);
    if ($count >= 3 && strlen($domain_array[$count-2]) == 2) {
        // SLD (example.co.uk)
        return implode('.', array_slice($domain_array, $count-3, 3));
    } else if ($count >= 2) {
        // TLD (example.com)
        return implode('.', array_slice($domain_array, $count-2, 2));
    }
    return $host; // single-label host such as "localhost"
}

// Your domains
	echo getDomain('http://google.com/dhasjkdas/sadsdds/sdda/sdads.html'); // google.com
	echo getDomain('http://www.google.com/dhasjkdas/sadsdds/sdda/sdads.html'); // google.com
	echo getDomain('http://google.co.uk/dhasjkdas/sadsdds/sdda/sdads.html'); // google.co.uk

// TLD
	echo getDomain('https://shop.example.com'); // example.com
	echo getDomain('https://foo.bar.example.com'); // example.com
	echo getDomain('https://www.example.com'); // example.com
	echo getDomain('https://example.com'); // example.com

// SLD
	echo getDomain('https://more.news.bbc.co.uk'); // bbc.co.uk
	echo getDomain('https://www.bbc.co.uk'); // bbc.co.uk
	echo getDomain('https://bbc.co.uk'); // bbc.co.uk

// IP
    echo getDomain('https://1.2.3.45');  // 1.2.3.45

Finally, Jeremy Kendall's PHP Domain Parser allows you to parse the domain name from a url. League URI Hostname Parser will also do the job.

Solution 8 - Php

Here is the code I made that finds only the domain name, 100% of the time, since it takes Mozilla's list of sub-TLDs into account. The only thing you have to check is how you cache that file, so that you don't query Mozilla every time.

For some strange reason, domains like co.uk were missing from the list I was fetching, so I add them manually; array_unique() removes any duplicates if they are present. It's not the cleanest solution, but I hope it helps someone.

//=====================================================
static function domain($url)
{
    $url = strtolower($url);

    // Canonical location of the Public Suffix List
    $address = 'https://publicsuffix.org/list/public_suffix_list.dat';
    if (!$subtlds = @kohana::cache('subtlds', null, 60))
    {
        $subtlds = [];
        $content = file($address) ?: [];
        foreach ($content as $line)
        {
            $line = trim($line);
            if ($line == '') continue;
            if (substr($line, 0, 2) == '//') continue;   // comment line
            $line = preg_replace("/[^a-zA-Z0-9\.]/", '', $line);
            if ($line == '') continue;
            if ($line[0] == '.') $line = substr($line, 1);
            if (!strstr($line, '.')) continue;           // multi-label suffixes only
            $subtlds[] = $line;
        }
        $subtlds = array_merge(Array(
            'co.uk', 'me.uk', 'net.uk', 'org.uk', 'sch.uk', 'ac.uk',
            'gov.uk', 'nhs.uk', 'police.uk', 'mod.uk', 'asn.au', 'com.au',
            'net.au', 'id.au', 'org.au', 'edu.au', 'gov.au', 'csiro.au',
            ), $subtlds);

        $subtlds = array_unique($subtlds);
        @kohana::cache('subtlds', $subtlds);
    }

    // Host = everything after an optional http:// or https:// up to the first "/"
    preg_match('/^(https?:[\/]{2,})?([^\/]+)/i', $url, $matches);
    $host = $matches[2] ?? '';

    // Default: last two labels; three if the host ends in a known multi-label suffix
    preg_match("/[^\.\/]+\.[^\.\/]+$/", $host, $matches);
    foreach ($subtlds as $sub)
    {
        if (preg_match('/\.'.preg_quote($sub, '/').'$/', $host))
            preg_match("/[^\.\/]+\.[^\.\/]+\.[^\.\/]+$/", $host, $matches);
    }

    return $matches[0] ?? null;
}

Solution 9 - Php

You can pass PHP_URL_HOST as the second parameter of parse_url() to get just the host:

$url = 'http://google.com/dhasjkdas/sadsdds/sdda/sdads.html';
$host = parse_url($url, PHP_URL_HOST);
print $host; // prints 'google.com'

Solution 10 - Php

I've found that @philfreo's solution (referenced from php.net) gives good results, but in some cases it triggers PHP "Notice" and "Strict Standards" messages. Here is a fixed version of that code:

function getHost($url) { 
   $parseUrl = parse_url(trim($url)); 
   if(isset($parseUrl['host']))
   {
	   $host = $parseUrl['host'];
   }
   else
   {
		$path = explode('/', $parseUrl['path']);
		$host = $path[0];
   }
   return trim($host); 
} 
  
echo getHost("http://example.com/anything.html"); 			// example.com
echo getHost("http://www.example.net/directory/post.php");	// www.example.net
echo getHost("https://example.co.uk");						// example.co.uk
echo getHost("www.example.net");							// www.example.net
echo getHost("subdomain.example.net/anything");				// subdomain.example.net
echo getHost("example.net");								// example.net

Solution 11 - Php

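function getTrimmedUrl($link)
{
    // Lower-case first so "WWW." and "HTTPS://" are stripped too,
    // then remove the scheme and every "www." occurrence
    $str = str_replace(["www.", "https://", "http://"], '', strtolower($link));
    // Everything before the first "/" is the host
    $link = explode("/", $str);
    return $link[0];
}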

Solution 12 - Php

$domain = parse_url($url, PHP_URL_HOST);
echo implode('.', array_slice(explode('.', $domain), -2, 2));

Solution 13 - Php

parse_url() didn't work for me; it only returned the path. Switching to basics (PHP 5.3+ is needed for the third argument of strstr()):

$url  = str_replace('http://', '', strtolower( $s->website));
if (strpos($url, '/'))  $url = strstr($url, '/', true);

Solution 14 - Php

I have edited for you:

function getHost($Address) {
    $parseUrl = parse_url(trim($Address));
    // No scheme: parse_url() puts the host in 'path', so take its first segment
    $host = trim($parseUrl['host'] ?? explode('/', $parseUrl['path'], 2)[0]);

    $parts = explode( '.', $host );
    $num_parts = count($parts);
    $h = '';   // initialise so .= doesn't raise an "undefined variable" notice

    if ($parts[0] == "www") {
        // skip the leading "www" label
        for ($i=1; $i < $num_parts; $i++) {
            $h .= $parts[$i] . '.';
        }
    } else {
        for ($i=0; $i < $num_parts; $i++) {
            $h .= $parts[$i] . '.';
        }
    }
    return substr($h,0,-1);   // drop the trailing '.'
}

A leading www. label is removed (www.domain.ltd becomes domain.ltd); other subdomains are kept, so sub1.subn.domain.ltd is returned unchanged.

Solution 15 - Php

I'm adding this answer late since this is the answer that pops up most on Google...

You can use PHP to...

$url = "https://www.google.co.uk";
$host = parse_url($url, PHP_URL_HOST);
// $host == "www.google.co.uk" (without the scheme, parse_url() finds no host at all)

to grab the host, but not the private domain to which the host refers. (For example, www.google.co.uk is the host, but google.co.uk is the private domain.)

To grab the private domain, you need to know the list of public suffixes under which one can register a private domain. This list happens to be curated by Mozilla at https://publicsuffix.org/

The code below works once an array of public suffixes has been created. Simply call

$domain = get_private_domain("www.google.co.uk");

with the remaining code...

// find some way to parse the above list of public suffixes,
// then add them to a PHP array keyed by suffix, e.g. ['co.uk' => true, ...]
$suffix = [... all valid public suffix ...];

function get_public_suffix($host) {
  $parts = explode('.', $host);   // split() was removed in PHP 7
  while (count($parts) > 0) {
    // longest candidate first: www.google.co.uk, google.co.uk, co.uk, uk
    if (is_public_suffix(implode('.', $parts)))
      return implode('.', $parts);

    array_shift($parts);
  }

  return false;
}

function is_public_suffix($host) {
  global $suffix;
  return isset($suffix[$host]);
}

function get_private_domain($host) {
  $public = get_public_suffix($host);
  if ($public === false)
    return false;   // no known public suffix

  $public_parts = explode('.', $public);
  $all_parts = explode('.', $host);

  $private = [];

  // take as many labels from the end as the public suffix has...
  for ($x = 0; $x < count($public_parts); ++$x)
    $private[] = array_pop($all_parts);

  // ...plus one more label: the private (registrable) part
  if (count($all_parts) > 0)
    $private[] = array_pop($all_parts);

  return implode('.', array_reverse($private));
}
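As one way to do the "find some way to parse the above list" step, here is a minimal sketch that turns the text of the list (for example, the file at https://publicsuffix.org/list/public_suffix_list.dat) into an array keyed by suffix, plus a resolver that uses the same longest-suffix-plus-one-label idea without a global. Both function names are hypothetical. As assumptions of the sketch, wildcard ("*.ck") and exception ("!www.ck") rules are skipped, and a host that matches no rule returns null instead of falling back to the list's default rule:

```php
<?php
// Hypothetical helper: PSL text -> ['com' => true, 'co.uk' => true, ...]
// Simplification: wildcard ("*.") and exception ("!") rules are skipped.
function load_public_suffixes(string $pslText): array
{
    $suffixes = [];
    foreach (preg_split('/\R/', $pslText) as $line) {
        $line = trim($line);
        // Blank lines and "//" comment lines carry no rules.
        if ($line === '' || strncmp($line, '//', 2) === 0) {
            continue;
        }
        // A rule ends at the first whitespace character.
        $rule = strtolower(preg_split('/\s+/', $line)[0]);
        if ($rule[0] === '*' || $rule[0] === '!') {
            continue;
        }
        $suffixes[$rule] = true;
    }
    return $suffixes;
}

// Hypothetical helper: longest listed suffix of $host plus one more label.
function registrable_domain(string $host, array $suffixes): ?string
{
    $labels = explode('.', strtolower($host));
    for ($i = 0; $i < count($labels); $i++) {
        // $i = 0 is the whole host, so the first hit is the longest suffix.
        $candidate = implode('.', array_slice($labels, $i));
        if (isset($suffixes[$candidate])) {
            // $i === 0 means the host itself is a public suffix: nothing registrable.
            return $i === 0 ? null : implode('.', array_slice($labels, $i - 1));
        }
    }
    return null; // no rule matched
}

$sample = "// ===BEGIN ICANN DOMAINS===\ncom\nuk\nco.uk\n*.ck\n!www.ck\n";
$suffixes = load_public_suffixes($sample);
echo registrable_domain('www.google.co.uk', $suffixes), PHP_EOL; // google.co.uk
```

Passing the table as a parameter keeps the resolver testable with a small hand-written list like the one above.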

Solution 16 - Php

This generally works well as long as the input URL is not total junk. It removes the subdomain by keeping only the last two labels of the host.

$host = parse_url( $Row->url, PHP_URL_HOST );
$parts = explode( '.', $host );
$parts = array_reverse( $parts );
$domain = $parts[1].'.'.$parts[0];

Example

Input: http://www2.website.com:8080/some/file/structure?some=parameters

Output: website.com

Solution 17 - Php

Combining the answers of worldofjr and Alix Axel into one small function that will handle most use-cases:

function get_url_hostname($url) {

    $parse = parse_url($url);
    return str_ireplace('www.', '', $parse['host']);

}

get_url_hostname('http://www.google.com/example/path/file.html'); // google.com

Solution 18 - Php

None of these solutions worked for me with these test cases:

public function getTestCases(): array
{
    return [
        //input                              expected
        ['http://google.com/dhasjkdas',      'google.com'],
        ['https://google.com/dhasjkdas',     'google.com'],
        ['https://www.google.com/dhasjkdas', 'google.com'],
        ['http://www.google.com/dhasjkdas',  'google.com'],
        ['www.google.com/dhasjkdas',         'google.com'],
        ['google.com/dhasjkdas',             'google.com'],
    ];
}

but wrapping this answer into a function worked in all cases: https://stackoverflow.com/a/65659814/5884988
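For comparison, here is a minimal sketch that satisfies those six cases. It is not the linked answer, and normalized_host() is a hypothetical name. It assumes that input without a "scheme://" (or "//") prefix starts with the host, prepends "http://" in that case so parse_url() reports a host, and then drops a single leading www.:

```php
<?php
// Hypothetical helper: host of $input, tolerating a missing scheme and dropping "www.".
function normalized_host(string $input): ?string
{
    $input = trim($input);
    // No "scheme://" and no "//" prefix: assume the string starts with the host.
    if (!preg_match('~^([a-z][a-z0-9+.-]*:)?//~i', $input)) {
        $input = 'http://' . $input;
    }
    $host = parse_url($input, PHP_URL_HOST);
    if (!is_string($host)) {
        return null; // the URL could not be parsed
    }
    return preg_replace('/^www\./i', '', strtolower($host));
}

$cases = [
    'http://google.com/dhasjkdas',
    'https://google.com/dhasjkdas',
    'https://www.google.com/dhasjkdas',
    'http://www.google.com/dhasjkdas',
    'www.google.com/dhasjkdas',
    'google.com/dhasjkdas',
];
foreach ($cases as $case) {
    echo $case, ' => ', normalized_host($case), PHP_EOL; // google.com for every case
}
```

Like the other string-based approaches, this does not find the registrable domain of a deeper subdomain; that still needs the Public Suffix List.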

Solution 19 - Php

If you only need the host name of the current request (not of an arbitrary URL string), you can use the following:

<?php
   echo $_SERVER['SERVER_NAME'];
?>

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | zuk1 | View Question on Stackoverflow
Solution 1 - Php | Owen | View Answer on Stackoverflow
Solution 2 - Php | Alix Axel | View Answer on Stackoverflow
Solution 3 - Php | philfreo | View Answer on Stackoverflow
Solution 4 - Php | nikmauro | View Answer on Stackoverflow
Solution 5 - Php | Shaun | View Answer on Stackoverflow
Solution 6 - Php | Oleksandr Fediashov | View Answer on Stackoverflow
Solution 7 - Php | Kristoffer Bohmann | View Answer on Stackoverflow
Solution 8 - Php | Luka | View Answer on Stackoverflow
Solution 9 - Php | Oleg Matei | View Answer on Stackoverflow
Solution 10 - Php | fatih | View Answer on Stackoverflow
Solution 11 - Php | rk3263025 | View Answer on Stackoverflow
Solution 12 - Php | Michael | View Answer on Stackoverflow
Solution 13 - Php | Will | View Answer on Stackoverflow
Solution 14 - Php | NotFound Life | View Answer on Stackoverflow
Solution 15 - Php | Andy Jones | View Answer on Stackoverflow
Solution 16 - Php | T. Brian Jones | View Answer on Stackoverflow
Solution 17 - Php | Michael Giovanni Pumo | View Answer on Stackoverflow
Solution 18 - Php | Rawburner | View Answer on Stackoverflow
Solution 19 - Php | Md. Maruf Hossain | View Answer on Stackoverflow