How to extract domain name from url?

Regex Problem Overview

How do I extract the domain name from a URL using bash? For example, http://example.com/ should become example.com. It must work for any TLD, not just .com.

Regex Solutions

Solution 1 - Regex

You can use a simple awk one-liner to extract the domain name:

echo http://example.com/index.php | awk -F'[/:]' '{print $4}'

OUTPUT: example.com

:-)
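To see why field 4 is the host, it can help to print every field awk produces when it splits on "/" or ":". The following is only a sketch; show_fields is a hypothetical helper name, not part of the original answer.

```bash
# show_fields (hypothetical helper): print each field awk sees when the
# input is split on "/" or ":", one per line, as "index=[value]".
show_fields() {
    printf '%s\n' "$1" |
        awk -F'[/:]' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }'
}

show_fields 'http://example.com/index.php'
```

Fields 2 and 3 are empty because they sit between the ":" and the two slashes. Note that a URL carrying credentials (user:password@host) shifts the host to a later field, so the one-liner assumes there is no user info part.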

Solution 2 - Regex

$ URI="http://user:password@example.com:80/"
$ echo $URI | sed -e 's/[^/]*\/\/\([^@]*@\)\?\([^:/]*\).*/\2/'
example.com

See http://en.wikipedia.org/wiki/URI_scheme (URI scheme).

Solution 3 - Regex

basename "http://example.com"

Of course, this won't work with a URI that has a path, such as http://www.example.com/index.html, but you could do the following:

basename $(dirname "http://www.example.com/index.html")

Or for more complex URIs:

echo "http://www.example.com/somedir/someotherdir/index.html" | cut -d'/' -f3

-d means "delimiter" and -f means "field"; in the above example, the third field delimited by the forward slash '/' is www.example.com.

Solution 4 - Regex

echo $URL | cut -d'/' -f3 | cut -d':' -f1

Works for URLs:

http://host.example.com
http://host.example.com/hi/there
http://host.example.com:2345/hi/there
http://host.example.com:2345
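
The list above can be checked with a short loop. This is a minimal sketch; host_of is a hypothetical wrapper around the same cut pipeline.

```bash
# host_of (hypothetical wrapper): field 3 when splitting on "/" is the
# authority part; field 1 of that when splitting on ":" drops the port.
host_of() {
    printf '%s\n' "$1" | cut -d'/' -f3 | cut -d':' -f1
}

for url in \
    'http://host.example.com' \
    'http://host.example.com/hi/there' \
    'http://host.example.com:2345/hi/there' \
    'http://host.example.com:2345'
do
    printf '%s -> %s\n' "$url" "$(host_of "$url")"
done
```

Each line should end in host.example.com. The pipeline assumes the URL has a scheme and no user:password@ part.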

Solution 5 - Regex

sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_'

e.g.

$ sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_' <<< 'http://example.com'
example.com

$ sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_' <<< 'https://example.com'
example.com

$ sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_' <<< 'http://example.com:1234/some/path'
example.com

$ sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_' <<< 'http://user:pass@example.com:1234/some/path'
example.com

$ sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_' <<< 'http://user:pass@example.com:1234/some/path#fragment'
example.com

$ sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_' <<< 'http://user:pass@example.com:1234/some/path#fragment?params=true'
example.com
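
For reuse, the expression can be wrapped in a function with each part of the regex annotated. This is a sketch; url_host is a hypothetical name, and the sed expression is the one from this answer.

```bash
# url_host (hypothetical name): print the host part of a URL.
url_host() {
    # .*://        scheme and separator
    # ([^/@]*@)?   optional user:password@ part (group 1)
    # ([^/:]+)     host, up to the first "/" or ":" (group 2)
    # .*           port, path, query and fragment
    printf '%s\n' "$1" | sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_'
}
```

Two limitations follow from the pattern: input without "://" does not match and is printed unchanged, and because the leading .* is greedy, a second "://" later in the URL (for example inside a query string) is the one that gets matched.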

Solution 6 - Regex

#!/usr/bin/perl -w
use strict;

my $url = $ARGV[0];

if($url =~ /([^:]*:\/\/)?([^\/]+\.[^\/]+)/g) {
  print $2;
}

Usage:

./test.pl 'https://example.com'
example.com

./test.pl 'https://www.example.com/'
www.example.com

./test.pl 'example.org/'
example.org

./test.pl 'example.org'
example.org

./test.pl 'example'  -> no output

And if you just want the domain and not the full host + domain, use this instead:

#!/usr/bin/perl -w
use strict;

my $url = $ARGV[0];
if($url =~ /([^:]*:\/\/)?([^\/]*\.)*([^\/\.]+\.[^\/]+)/g) {
  print $3;
}
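
Since the question asks for bash, the same "last two labels" idea can be sketched with parameter expansion, assuming the host has already been extracted. last_two_labels is a hypothetical helper; like the regex above it knows nothing about multi-part suffixes, so www.example.co.uk yields co.uk.

```bash
# last_two_labels (hypothetical helper): keep the last two dot-separated
# labels of a host name. No public-suffix awareness (assumption).
last_two_labels() {
    local host="$1"
    local tld="${host##*.}"   # text after the last dot
    local rest="${host%.*}"   # text before the last dot
    if [ "$rest" = "$host" ]; then
        # No dot at all: nothing to trim.
        printf '%s\n' "$host"
    else
        printf '%s\n' "${rest##*.}.$tld"
    fi
}
```

For example, last_two_labels www.example.com prints example.com.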

Solution 7 - Regex

Instead of using a regex, you can use Python's urlparse (urllib.parse in Python 3):

URL=http://www.example.com

python3 -c "from urllib.parse import urlparse
url = urlparse('$URL')
print(url.netloc)"

You can run it inline like this or put it in a small script. However, this still expects a valid scheme identifier, and judging from your comment, your input doesn't necessarily provide one. You can specify a default scheme, but urlparse only recognizes the netloc if it starts with '//':

url = urlparse('//www.example.com/index.html','http')

So you will have to prepend those manually, i.e.:

python3 -c "from urllib.parse import urlparse
if '$URL'.find('://') == -1:
    url = urlparse('//$URL', 'http')
else:
    url = urlparse('$URL')
print(url.netloc)"
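
The same "prepend a scheme when it is missing" step can also be done in the shell before handing the URL to any parser. A minimal sketch, assuming http is an acceptable default as in the Python example above; ensure_scheme is a hypothetical helper name.

```bash
# ensure_scheme (hypothetical helper): return the URL unchanged if it
# already contains "://", otherwise prefix it with "http://".
ensure_scheme() {
    case "$1" in
        *://*) printf '%s\n' "$1" ;;
        *)     printf 'http://%s\n' "$1" ;;
    esac
}
```

For example, ensure_scheme www.example.com/index.html prints http://www.example.com/index.html, while a URL that already has a scheme is passed through as-is.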

Solution 8 - Regex

There is very little information on how you get those URLs; please show more next time (for example, whether the URLs contain parameters). Meanwhile, simple string manipulation is enough for your sample URL:

$ s="http://example.com/index.php"
$ echo "${s%/*}"  # remove the last "/" and everything after it
http://example.com
$ s=${s%/*}
$ echo "${s#http://}"  # remove the leading http://
example.com

Other ways, using sed (GNU):

$ echo $s | sed 's/http:\/\///;s|\/.*||'
example.com

Or use awk:

$ echo $s| awk '{gsub("http://|/.*","")}1'
example.com

Solution 9 - Regex

The following will output "example.com":

URI="http://user@example.com/foo/bar/baz/?lala=foo"
ruby -ruri -e "p URI.parse('$URI').host"

For more on what you can do with Ruby's URI class, consult the docs.

Solution 10 - Regex

Here's the Node.js way; it works with or without ports and deep paths:

//get-hostname.js
'use strict';

const url = require('url');
const parts = url.parse(process.argv[2]);

console.log(parts.hostname);

Can be called like:

node get-hostname.js http://foo.example.com:8080/test/1/2/3.html
//foo.example.com

Docs: https://nodejs.org/api/url.html

Solution 11 - Regex

A solution that covers more cases can be based on sed regexps:

echo http://example.com/index.php | sed -e 's#^https://\|^http://##' -e 's#:.*##' -e 's#/.*##'

That works for URLs like http://example.com/index.php, http://example.com:4040/index.php, and https://example.com/index.php.
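
The \| alternation used above is not part of POSIX basic regular expressions, so it may not behave the same in every sed. A sketch of an equivalent using extended regexps (-E) follows; strip_to_host is a hypothetical name.

```bash
# strip_to_host (hypothetical name): remove an http:// or https:// prefix,
# then everything from the first ":" or "/" onwards.
strip_to_host() {
    printf '%s\n' "$1" | sed -E -e 's#^https?://##' -e 's#[:/].*##'
}
```

Like the original, this assumes there is no user:password@ part, since the first ":" is taken to start the port.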

Solution 12 - Regex

With Ruby, you can use the Domainatrix gem:

http://www.pauldix.net/2009/12/parse-domains-from-urls-easily-with-domainatrix.html

require 'rubygems'
require 'domainatrix'
s = 'http://www.champa.kku.ac.th/dir1/dir2/file?option1&option2';
url = Domainatrix.parse(s)
url.domain
=> "kku"

great tool! :-)

Solution 13 - Regex

Pure Bash implementation without any sub-shell or sub-process:

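# Extract the host from a URL
#   $1: URL
shopt -s extglob  # the +(...) pattern below needs extended globbing
function extractHost {
    local s="$1"
    s="${s/#*:\/\/}" # Parameter Expansion & Pattern Matching: strip the scheme
    echo -n "${s/%+(:*|\/*)}" # strip the port and path
}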

E.g. extractHost "docker://1.2.3.4:1234/a/v/c" will output 1.2.3.4
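
A variant that avoids extended globbing and also drops a user:password@ part can be sketched with plain prefix and suffix removal. extract_host_plain is a hypothetical name; the sketch assumes a path, if present, starts with "/" and does not handle bracketed IPv6 literals.

```bash
# extract_host_plain (hypothetical name): host from a URL using only
# POSIX-style parameter expansion, no sub-process.
extract_host_plain() {
    local s="$1"
    s="${s#*://}"             # drop the scheme
    s="${s%%/*}"              # keep the authority (before the first "/")
    s="${s##*@}"              # drop any user:password@ prefix
    printf '%s\n' "${s%%:*}"  # drop the port
}
```

For example, extract_host_plain "docker://1.2.3.4:1234/a/v/c" prints 1.2.3.4.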

Content Type	Original Author	Original Content on Stackoverflow
Question	Ben Smith	View Question on Stackoverflow
Solution 1 - Regex	Soj	View Answer on Stackoverflow
Solution 2 - Regex	user300653	View Answer on Stackoverflow
Solution 3 - Regex	musashiXXX	View Answer on Stackoverflow
Solution 4 - Regex	keyoxy	View Answer on Stackoverflow
Solution 5 - Regex	Armand	View Answer on Stackoverflow
Solution 6 - Regex	Dark Castle	View Answer on Stackoverflow
Solution 7 - Regex	Garns	View Answer on Stackoverflow
Solution 8 - Regex	ghostdog74	View Answer on Stackoverflow
Solution 9 - Regex	Michael Kohl	View Answer on Stackoverflow
Solution 10 - Regex	chovy	View Answer on Stackoverflow
Solution 11 - Regex	user3837712	View Answer on Stackoverflow
Solution 12 - Regex	Tilo	View Answer on Stackoverflow
Solution 13 - Regex	vbem	View Answer on Stackoverflow