How to extract the domain name from a URL?

Tags: Regex, Bash, Url

Regex Problem Overview


How do I extract the domain name from a URL using Bash? For example, http://example.com/ should become example.com. It must work for any TLD, not just .com.

Regex Solutions


Solution 1 - Regex

You can use a simple AWK one-liner to extract the domain name:

echo http://example.com/index.php | awk -F'[/:]' '{print $4}'

OUTPUT: example.com

:-)
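To see why the host lands in $4, here is a small sketch (the function name show_fields is hypothetical) that prints every field awk produces when the URL is split on / or :. It also shows a limitation: with a user:password@ prefix, $4 is no longer the host.

```shell
#!/usr/bin/env bash
# show_fields: hypothetical helper that prints each awk field of a URL
# when splitting on "/" or ":".
show_fields() {
  printf '%s\n' "$1" |
    awk -F'[/:]' '{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }'
}

show_fields 'http://example.com/index.php'
# Fields: http, "", "", example.com, index.php, so the host is $4

show_fields 'http://user:password@example.com/'
# With credentials, $4 is "user" and $5 is "password@example.com"
```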

Solution 2 - Regex

$ URI="http://user:password@example.com:80/"
$ echo "$URI" | sed -e 's/[^/]*\/\/\([^@]*@\)\?\([^:/]*\).*/\2/'
example.com
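The \? quantifier in that expression is a GNU extension to basic regular expressions. A sketch of a more portable variant, assuming any POSIX sed, spells the optional user-info group as \{0,1\} instead, and also excludes / from the user-info part so that an @ later in the path is not mistaken for credentials. The function name host_from_uri is hypothetical.

```shell
#!/usr/bin/env bash
# host_from_uri: hypothetical helper name.
# \(...\)\{0,1\} is the POSIX BRE spelling of an optional group
# (GNU's \? is an extension); [^/@]* keeps the user info inside the authority.
host_from_uri() {
  printf '%s\n' "$1" |
    sed -e 's/[^/]*\/\/\([^/@]*@\)\{0,1\}\([^:/]*\).*/\2/'
}

host_from_uri 'http://user:password@example.com:80/'   # example.com
host_from_uri 'https://example.com/mail/a@b'           # example.com
```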

See the Wikipedia article on URI schemes: http://en.wikipedia.org/wiki/URI_scheme

Solution 3 - Regex

basename "http://example.com"

Of course, this won't work with a URI like http://www.example.com/index.html (basename would return index.html), but you could do the following:

basename "$(dirname "http://www.example.com/index.html")"

Or for more complex URIs:

echo "http://www.example.com/somedir/someotherdir/index.html" | cut -d'/' -f3

-d sets the delimiter and -f selects the field. Splitting the URL on '/' yields http:, an empty field (between the two slashes), and www.example.com, so the third field is www.example.com.
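The same third-field split can be done inside Bash without starting cut. This sketch uses read -a with IFS set to /; because / is not a whitespace character, the empty field between the two slashes is kept, so the host is element 2 of the zero-based array. The function name third_field is hypothetical.

```shell
#!/usr/bin/env bash
# third_field: hypothetical name; mirrors cut -d'/' -f3 with Bash builtins only.
third_field() {
  local -a parts
  # IFS=/ applies only to this read; fields: "http:", "", "www.example.com", ...
  IFS=/ read -r -a parts <<< "$1"
  printf '%s\n' "${parts[2]}"
}

third_field 'http://www.example.com/somedir/someotherdir/index.html'   # www.example.com
```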

Solution 4 - Regex

echo "$URL" | cut -d'/' -f3 | cut -d':' -f1

This works for URLs such as:

http://host.example.com
http://host.example.com/hi/there
http://host.example.com:2345/hi/there
http://host.example.com:2345
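One gap in that pipeline is user information: for http://user:password@host.example.com/ the first field after cut -d':' is user. A minimal sketch, assuming the only extra case to handle is an optional user:password@ prefix, strips everything up to the last @ with Bash parameter expansion before removing the port. host_via_cut is a hypothetical name.

```shell
#!/usr/bin/env bash
# host_via_cut: hypothetical helper name.
host_via_cut() {
  local h
  h=$(printf '%s\n' "$1" | cut -d'/' -f3)   # authority: [user[:password]@]host[:port]
  h=${h##*@}                                 # drop the user info, if any
  printf '%s\n' "${h%%:*}"                   # drop the port, if any
}

for u in http://host.example.com \
         http://host.example.com/hi/there \
         http://host.example.com:2345/hi/there \
         http://host.example.com:2345 \
         http://user:password@host.example.com:2345/hi; do
  host_via_cut "$u"   # prints host.example.com each time
done
```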

Solution 5 - Regex

sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_'

For example:

$ sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_' <<< 'http://example.com'
example.com

$ sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_' <<< 'https://example.com'
example.com

$ sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_' <<< 'http://example.com:1234/some/path'
example.com

$ sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_' <<< 'http://user:password@example.com:1234/some/path'
example.com

$ sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_' <<< 'http://user:password@example.com:1234/some/path#fragment'
example.com

$ sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_' <<< 'http://user:password@example.com:1234/some/path#fragment?params=true'
example.com
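Because .*:// is greedy, that expression picks up the last :// on the line, so a URL that carries another URL in its query string returns the wrong host. A sketch of one way to avoid this, assuming the scheme itself contains no : or /, is to anchor the scheme at the start of the line with ^[^:/]*://. The function name host_sed_anchored is hypothetical.

```shell
#!/usr/bin/env bash
# host_sed_anchored: hypothetical name. ^[^:/]*:// can only match the leading
# scheme, so a later "://" inside the query string is not mistaken for it.
host_sed_anchored() {
  printf '%s\n' "$1" | sed -E -e 's_^[^:/]*://([^/@]*@)?([^/:]+).*_\2_'
}

u='http://example.com/login?next=https://other.example.org/'
printf '%s\n' "$u" | sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_'   # other.example.org
host_sed_anchored "$u"                                              # example.com
```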

Solution 6 - Regex

#!/usr/bin/perl -w
use strict;

my $url = $ARGV[0];

# Optional "scheme://", then a host that contains at least one dot
if ($url =~ /([^:]*:\/\/)?([^\/]+\.[^\/]+)/) {
  print "$2\n";
}

Usage:

./test.pl 'https://example.com'
example.com

./test.pl 'https://www.example.com/'
www.example.com

./test.pl 'example.org/'
example.org

./test.pl 'example.org'
example.org

./test.pl 'example'  -> no output

If you want just the domain rather than the full host name, use this instead:

#!/usr/bin/perl -w
use strict;

my $url = $ARGV[0];
# $3 is the host minus its leading labels, e.g. example.com for www.example.com
if ($url =~ /([^:]*:\/\/)?([^\/]*\.)*([^\/\.]+\.[^\/]+)/) {
  print "$3\n";
}
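That second regex effectively keeps roughly the last two dot-separated labels of the host. A Bash sketch of the same idea makes its limitation easy to see: for hosts under a multi-label public suffix such as co.uk, it returns the suffix itself rather than the registered domain, which is why a public-suffix-aware library (like the Domainatrix gem mentioned in another answer) is needed for those cases. last_two_labels is a hypothetical name.

```shell
#!/usr/bin/env bash
# last_two_labels: hypothetical name; a naive "registered domain" guess.
last_two_labels() {
  local host=$1
  local -a labels
  IFS=. read -r -a labels <<< "$host"
  local n=${#labels[@]}
  if (( n < 2 )); then
    printf '%s\n' "$host"   # nothing to trim, e.g. "localhost"
  else
    printf '%s\n' "${labels[n-2]}.${labels[n-1]}"
  fi
}

last_two_labels www.example.com     # example.com
last_two_labels www.example.co.uk   # co.uk (wrong: the registered domain is example.co.uk)
```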

Solution 7 - Regex

Instead of using a regex, you can use Python's urlparse (in Python 3 it lives in urllib.parse):

URL=http://www.example.com

python3 -c "from urllib.parse import urlparse
url = urlparse('$URL')
print(url.netloc)"

You can either use it like this or put it in a small script. However, this still expects a valid scheme identifier, and judging by your comment your input doesn't necessarily provide one. You can specify a default scheme, but urlparse expects the netloc to start with '//':

url = urlparse('//www.example.com/index.html','http')

So you have to prepend the // yourself, e.g.:

python3 -c "from urllib.parse import urlparse
if '://' not in '$URL':
  url = urlparse('//$URL', 'http')
else:
  url = urlparse('$URL')
print(url.netloc)"
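Interpolating $URL into the Python source breaks as soon as the URL contains a single quote. A sketch that sidesteps this, assuming python3 is on the PATH, passes the URL as a command-line argument and reads it from sys.argv. It uses the hostname attribute of the parse result rather than netloc, because hostname already drops user:password@ and :port (and lowercases the name). url_host is a hypothetical name.

```shell
#!/usr/bin/env bash
# url_host: hypothetical helper; the URL travels as argv[1], never as Python source.
url_host() {
  python3 -c '
import sys
from urllib.parse import urlparse

url = sys.argv[1]
if "://" not in url:
    url = "//" + url  # urlparse only finds a netloc after "//"
print(urlparse(url, "http").hostname or "")
' "$1"
}

url_host 'http://user:password@www.example.com:8080/index.html'   # www.example.com
url_host 'www.example.com/index.html'                             # www.example.com
```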

Solution 8 - Regex

There is very little information on how you get those URLs, so please show more next time (for example, do the URLs contain parameters?). Meanwhile, here is simple string manipulation for your sample URL:

$ s="http://example.com/index.php"
$ echo "${s%/*}"  # get rid of the last "/" onwards
http://example.com
$ s=${s%/*}
$ echo "${s#http://}"  # get rid of http://
example.com

Other ways, using sed (GNU):

$ echo "$s" | sed 's/http:\/\///;s|\/.*||'
example.com

Using awk:

$ echo "$s" | awk '{gsub("http://|/.*","")}1'
example.com
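The same parameter-expansion idea extends to other schemes, ports, credentials and query strings by trimming one piece at a time. This is a sketch assuming input of the form [scheme://][user@]host[:port][/path][?query][#fragment] with no IPv6 literal; strip_to_host is a hypothetical name.

```shell
#!/usr/bin/env bash
# strip_to_host: hypothetical helper built only from Bash parameter expansion.
strip_to_host() {
  local s=$1
  s=${s#*://}               # drop "scheme://" (shortest match, so only the first one)
  s=${s%%/*}                # drop "/path..."
  s=${s%%[?#]*}             # drop "?query" or "#fragment" when there is no path
  s=${s##*@}                # drop "user:password@"
  printf '%s\n' "${s%%:*}"  # drop ":port"
}

strip_to_host 'http://example.com/index.php'                  # example.com
strip_to_host 'https://user:password@example.com:8443/a?b=c'  # example.com
strip_to_host 'example.com?q=1'                               # example.com
```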

Solution 9 - Regex

The following outputs example.com:

URI="http://user@example.com/foo/bar/baz/?lala=foo"
ruby -ruri -e "puts URI.parse('$URI').host"

For more on what Ruby's URI class can do, consult its documentation.

Solution 10 - Regex

Here's the Node.js way; it works with or without ports and deep paths:

//get-hostname.js
'use strict';

const url = require('url');
const parts = url.parse(process.argv[2]);

console.log(parts.hostname);

Can be called like:

node get-hostname.js http://foo.example.com:8080/test/1/2/3.html
foo.example.com

Docs: https://nodejs.org/api/url.html

Solution 11 - Regex

A solution that covers more cases can be built from sed regexps:

echo http://example.com/index.php | sed -e 's#^https://\|^http://##' -e 's#:.*##' -e 's#/.*##'

This works for URLs such as http://example.com/index.php, http://example.com:4040/index.php and https://example.com/index.php.
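The \| alternation in the first expression is a GNU extension to basic regular expressions. A sketch of an equivalent that should also run on BSD/macOS sed, assuming its -E flag for extended regular expressions, makes the s of https optional and removes everything from the first / or : in one step. The function name host_sed_portable is hypothetical.

```shell
#!/usr/bin/env bash
# host_sed_portable: hypothetical name.
# https? makes the "s" optional (ERE); [/:].* removes the port and/or path.
host_sed_portable() {
  printf '%s\n' "$1" | sed -E -e 's#^https?://##' -e 's#[/:].*##'
}

host_sed_portable http://example.com/index.php        # example.com
host_sed_portable http://example.com:4040/index.php   # example.com
host_sed_portable https://example.com/index.php       # example.com
```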

Solution 12 - Regex

With Ruby you can use the Domainatrix library (gem):

http://www.pauldix.net/2009/12/parse-domains-from-urls-easily-with-domainatrix.html

require 'rubygems'
require 'domainatrix'
s = 'http://www.champa.kku.ac.th/dir1/dir2/file?option1&option2'
url = Domainatrix.parse(s)
url.domain # => "kku"

Great tool! :-)

Solution 13 - Regex

Pure Bash implementation without any sub-shell or sub-process:

# Extract the host from a URL
#   $1: URL
shopt -s extglob   # the +(...) pattern below is an extglob pattern
function extractHost {
    local s="$1"
    s="${s/#*:\/\/}" # Parameter Expansion & Pattern Matching
    echo -n "${s/%+(:*|\/*)}"
}

For example, extractHost "docker://1.2.3.4:1234/a/v/c" outputs 1.2.3.4.
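Another builtin-only route, swapping glob patterns for Bash's own regular-expression operator, is [[ string =~ regex ]], which stores capture groups in the BASH_REMATCH array. This sketch assumes the same [scheme://][user@]host[:port]/... shape as the other answers; host_regex is a hypothetical name. Keeping the regex in a variable avoids quoting problems inside [[ ]].

```shell
#!/usr/bin/env bash
# host_regex: hypothetical name.
host_regex() {
  # Group 1: optional "scheme://", group 2: optional "user@",
  # group 3: the host (no ":", "/", "?" or "#")
  local re='^([a-zA-Z][a-zA-Z0-9+.-]*://)?([^/@]*@)?([^:/?#]+)'
  if [[ $1 =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[3]}"
  else
    return 1
  fi
}

host_regex 'docker://1.2.3.4:1234/a/v/c'            # 1.2.3.4
host_regex 'http://user:password@example.com:80/'   # example.com
```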

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Original authors on Stack Overflow:

Question: Ben Smith
Solution 1 - Regex: Soj
Solution 2 - Regex: user300653
Solution 3 - Regex: musashiXXX
Solution 4 - Regex: keyoxy
Solution 5 - Regex: Armand
Solution 6 - Regex: Dark Castle
Solution 7 - Regex: Garns
Solution 8 - Regex: ghostdog74
Solution 9 - Regex: Michael Kohl
Solution 10 - Regex: chovy
Solution 11 - Regex: user3837712
Solution 12 - Regex: Tilo
Solution 13 - Regex: vbem