How do you extract IP addresses from files using a regex in a linux shell?

Tags: Regex, Linux, Bash, Unix, Command Line

Regex Problem Overview


How can I extract part of a text by regexp in a Linux shell? Let's say I have a file where every line contains an IP address, but at a different position. What is the simplest way to extract those IP addresses using common Unix command-line tools?

Regex Solutions


Solution 1 - Regex

You could use grep to pull them out.

grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' file.txt

Solution 2 - Regex

Most of the examples here will match on 999.999.999.999 which is not technically a valid IP address.

The following will match on only valid IP addresses (including network and broadcast addresses).

grep -E -o '(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)' file.txt

Omit the -o if you want to see the entire line that matched.
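
One caveat: because the pattern is unanchored, grep -o can still pull a valid-looking substring such as 56.1.1.1 out of 256.1.1.1. Below is a minimal sketch of one way around that, assuming it is acceptable to treat every run of digits and dots as a single token. The names strict_octet, strict_ip_re and extract_strict_ips are made up for this example and are not part of the answer above.

```shell
# Hypothetical helper, reads standard input.
strict_octet='(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
strict_ip_re="$strict_octet\.$strict_octet\.$strict_octet\.$strict_octet"

extract_strict_ips() {
  # 1. Put every run of digits and dots on its own line.
  # 2. Trim dots left over from sentence punctuation.
  # 3. Keep only lines that are a valid address from start to end (-x).
  tr -c '0-9.\n' '\n' |
    sed 's/^\.*//; s/\.*$//' |
    grep -E -x "$strict_ip_re"
}
```

Usage would be something like extract_strict_ips < file.txt. The trade-off is that a token such as 1.2.3.4.5 is rejected as a whole instead of yielding 1.2.3.4.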

Solution 3 - Regex

This works fine for me in access logs.

cat access_log | egrep -o '([0-9]{1,3}\.){3}[0-9]{1,3}'

Let's break it part by part.

[0-9]{1,3} means one to three occurrences of the range mentioned in []. In this case it is 0-9, so it matches patterns like 10 or 183.

  • Followed by a '.'. We need to escape this, as '.' is a regex metacharacter that would otherwise match any single character.

So now we are at patterns like '123.' '12.' etc.

  • This pattern repeats itself three times (with the '.'), so we enclose it in parentheses: ([0-9]{1,3}\.){3}

  • And lastly the pattern repeats itself, but this time without the '.'. That is why we kept it separate in the previous step: [0-9]{1,3}

If the IPs are at the beginning of each line, as in my case, use:

egrep -o '^([0-9]{1,3}\.){3}[0-9]{1,3}'

where '^' is an anchor that restricts the match to the start of a line.
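
The breakdown above can be sketched as a pattern assembled from its parts. This is only an illustration: the variable and function names (loose_octet, dotted, pattern, extract_ips, extract_leading_ips) are invented here, and it assumes a grep that supports -E and -o.

```shell
loose_octet='[0-9]{1,3}'           # one to three digits
dotted="($loose_octet\.)"          # an octet followed by an escaped, literal dot
pattern="${dotted}{3}$loose_octet" # three dotted octets, then a final octet

# Print every match, wherever it sits on the line.
extract_ips() {
  grep -E -o "$pattern" "$@"
}

# Print only matches that start at the beginning of a line.
extract_leading_ips() {
  grep -E -o "^$pattern" "$@"
}
```

For an access log, extract_leading_ips access_log would then play the role of the anchored egrep command.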

Solution 4 - Regex

I usually start with grep, to get the regexp right.

# [multiple failed attempts here]
grep    '[0-9]*\.[0-9]*\.[0-9]*\.[0-9]*'                 file  # good?
grep -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' file  # good enough

Then I'd try and convert it to sed to filter out the rest of the line. (After reading this thread, you and I aren't going to do that anymore: we're going to use grep -o instead)

sed -ne 's/.*\([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\).*/\1/p  # FAIL

That's when I usually get annoyed with sed for not using the same regexes as anyone else. So I move to perl.

$ perl -nle '/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/ and print $&'

Perl's good to know in any case. If you've got a teeny bit of CPAN installed, you can even make it more reliable at little cost:

$ perl -MRegexp::Common=net -nE '/$RE{net}{IPV4}/ and say $&' file(s)
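
For what it's worth, the sed attempt earlier in this answer seems to fail for two separate reasons. In sed's default (basic) regex syntax an unescaped {1,3} is literal text, and the leading .* is greedy, so it would swallow the first digits of the address even once the syntax is fixed. Here is a rough sketch that avoids both, assuming a sed that accepts -E (GNU and BSD sed do) and at most one address of interest per line; with several addresses it prints only one of them, typically the last. The name sed_extract_ip is made up.

```shell
sed_extract_ip() {
  # (^|.*[^0-9.]) forces the address to start at the line start or right
  # after a character that is neither a digit nor a dot, so the greedy
  # prefix cannot eat into the first octet.
  sed -nE 's/(^|.*[^0-9.])([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}).*/\2/p' "$@"
}
```

grep -o is still the simpler tool for this job; the sketch is mostly useful when the substitution is part of a larger sed script.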

Solution 5 - Regex

I wrote a little script to see my log files better, it's nothing special, but might help a lot of the people who are learning perl. It does DNS lookups on the IP addresses after it extracts them.

Solution 6 - Regex

You can use sed. But if you know perl, that might be easier, and more useful to know in the long run:

perl -ne '/(\d+\.\d+\.\d+\.\d+)/ && print "$1\n"' < file

Solution 7 - Regex

You can use some shell helper I made: https://github.com/philpraxis/ipextract

included them here for convenience:

#!/bin/sh
# Each function filters standard input; grep -E replaces the obsolescent egrep.
ipextract ()
{
grep -E --only-matching '(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
}

ipextractnet ()
{
grep -E --only-matching '(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)/[[:digit:]]+'
}

ipextracttcp ()
{
grep -E --only-matching '[[:digit:]]+/tcp'
}

ipextractudp ()
{
grep -E --only-matching '[[:digit:]]+/udp'
}

ipextractsctp ()
{
grep -E --only-matching '[[:digit:]]+/sctp'
}

ipextractfqdn ()
{
# Inside a bracket expression a backslash is literal, so '-' goes last, unescaped.
grep -E --only-matching '[a-zA-Z0-9]+[a-zA-Z0-9.-]*\.[a-zA-Z]{2,}'
}

Load it / source it (when stored in ipextract file) from shell:

$ . ipextract

Use them:

$ ipextract < /etc/hosts
127.0.0.1
255.255.255.255
$

For some example of real use:

ipextractfqdn < /var/log/snort/alert | sort -u
dmesg | ipextractudp

Solution 8 - Regex

grep -E -o "([0-9]{1,3}\.){3}[0-9]{1,3}"

Solution 9 - Regex

For those who want a ready-made solution for extracting IP addresses from an Apache log and counting how many times each IP address has visited the website, use this line:

grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' error.log | sort | uniq -c | sort -nr > occurences.txt

A nice method to ban hackers. Next you can:

  1. Delete the lines with fewer than 20 visits
  2. Cut everything up to the single space with a regexp, so only the IP addresses remain
  3. Cut the last 1-3 digits (the final octet) of each IP address with a regexp, so only network addresses remain
  4. Add deny from and a space at the beginning of each line
  5. Save the resulting file as .htaccess
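
The five steps can be sketched as one pipeline. This is a hedged illustration, not the author's script: make_deny_rules is a made-up name, it expects the "count address" lines produced by uniq -c on standard input, and it assumes that dropping the last octet is what "network address" means here.

```shell
# Hypothetical helper: turn "count ip" lines into Apache deny rules.
make_deny_rules() {
  min_hits=${1:-20}
  # Step 1 and 2: keep lines with enough visits, print only the address.
  awk -v min="$min_hits" '$1 >= min { print $2 }' |
    # Step 3: drop the final octet to get a network prefix.
    sed -E 's/\.[0-9]{1,3}$//' |
    sort -u |
    # Step 4: prefix each network with "deny from ".
    sed 's/^/deny from /'
}
```

With the file from the command above, step 5 would be something like make_deny_rules 20 < occurences.txt > .htaccess. Note that deny from is the older access-control syntax; Apache 2.4 prefers Require directives, so check which one your server expects.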

Solution 10 - Regex

I'd suggest perl. (\d+\.\d+\.\d+\.\d+) should probably do the trick.

EDIT: Just to make it more like a complete program, you could do something like the following (not tested):

#!/usr/bin/perl -w
use strict;

while (<>) {
    if (/(\d+\.\d+\.\d+\.\d+)/) {
        print "$1\n";
    }
}

This handles one IP per line. If you have more than one IP per line, you need to use the /g option. man perlretut gives you a more detailed tutorial on regular expressions.

Solution 11 - Regex

You could use awk as well. Something like ...

awk '{for (i = 1; i <= NF; i++) if ($i ~ /[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/) print $i}' file

May require cleaning: the pattern is deliberately loose and the whole field is printed. Just a quick and dirty response to show basically how to do it with awk.

Solution 12 - Regex

All of the previous answers have one or more problems. The accepted answer allows IP numbers like 999.999.999.999. The currently second most upvoted answer requires prefixing with 0 such as 127.000.000.001 or 008.008.008.008 instead of 127.0.0.1 or 8.8.8.8. Apama has it almost right, but that expression requires that the IP number is the only thing on the line, with no leading or trailing space allowed, nor can it select IPs from the middle of a line.

I think the correct regex can be found on http://www.regextester.com/22

So if you want to extract all IP addresses from a file use:

grep -Eo "(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])" file.txt

If you don't want duplicates use:

grep -Eo "(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])" file.txt | sort | uniq

Please comment if there are still problems in this regex. It is easy to find many wrong regexes for this problem; I hope this one has no real issues.

Solution 13 - Regex

Everyone here is using really long-handed regular expressions but actually understanding the regex of POSIX will allow you to use a small grep command like this for printing IP addresses.

grep -Eo "(([0-9]{1,3})\.){3}([0-9]{1,3})"

(Side note) This doesn't ignore invalid IPs but it is very simple.

Solution 14 - Regex

I have tried all the answers, but all of them had one or more problems. To list a few:

  1. Some detected 123.456.789.111 as a valid IP
  2. Some don't detect 127.0.00.1 as a valid IP
  3. Some don't detect IPs that start with zero, like 08.8.8.8

So here I post a regex that works in all of the above conditions. It uses Perl-style syntax ((?:...) and \d), so it needs a PCRE-capable tool such as grep -P or perl.

> Note: I have extracted more than 2 million IPs without any problem with the following regex.

(?:(?:1\d\d|2[0-5][0-5]|2[0-4]\d|0?[1-9]\d|0?0?\d)\.){3}(?:1\d\d|2[0-5][0-5]|2[0-4]\d|0?[1-9]\d|0?0?\d)

Solution 15 - Regex

I wrote an informative blog article about this topic: How to Extract IPv4 and IPv6 IP Addresses from Plain Text Using Regex.

In the article there's a detailed guide of the most common different patterns for IPs, often required to be extracted and isolated from plain text using regular expressions.
This guide is based on CodVerter's IP Extractor source code tool for handling IP addresses extraction and detection when necessary.

If you wish to validate and capture IPv4 Address this pattern can do the job:

\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)[.]){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b

or to validate and capture IPv4 Address with Prefix ("slash notation"):

\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)[.]){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)/[0-9]{1,2}\b

or to capture subnet mask or wildcard mask:

(255|254|252|248|240|224|192|128|0)[.](255|254|252|248|240|224|192|128|0)[.](255|254|252|248|240|224|192|128|0)[.](255|254|252|248|240|224|192|128|0)

or, to filter out subnet mask addresses, use a regex negative lookahead:

\b((?!(255|254|252|248|240|224|192|128|0)[.](255|254|252|248|240|224|192|128|0)[.](255|254|252|248|240|224|192|128|0)[.](255|254|252|248|240|224|192|128|0)))(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)[.]){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b

For IPv6 validation you can go to the article link I have added at the top of this answer.
Here is an example for capturing all the common patterns (taken from CodVerter's IP Extractor Help Sample): [image]

If you wish you can test the IPv4 regex here.

Solution 16 - Regex

The awk example above didn't work for me, and I needed to do it with awk specifically, so I came up with this method:

$ awk 'match($0, /[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/) {print substr($0, RSTART, RLENGTH)}' your_sample_file.log

You can also just use pipes if you're getting the data from somewhere else, e.g. ipconfig.

I also realized the method matches invalid IP addresses.

Here is an extended version that only matches valid IPv4 Addresses:

$ awk 'match($0, /(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)/) {print substr($0, RSTART, RLENGTH)}' sample_file.log

Hope it helps someone else.
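
Both commands above report only the first address on each line. If a line can hold several, one possible sketch is to loop with match() and keep cutting off what was already consumed. awk_extract_all is a made-up name, and the pattern uses [0-9]+ instead of {1,3} on the assumption that some awk builds lack interval expressions.

```shell
awk_extract_all() {
  awk '{
    line = $0
    # Keep matching on whatever is left of the line after the last hit.
    while (match(line, /[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/)) {
      print substr(line, RSTART, RLENGTH)
      line = substr(line, RSTART + RLENGTH)
    }
  }' "$@"
}
```

Like the first command, this loose pattern also accepts invalid addresses.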

Solution 17 - Regex

It's a REALLY brute-force type of solution, and I haven't had time to handle things like subnet masks.

Since many awk variants lack backreferences in regex, range notation in regex {n,m}, FPAT, and an array target for match(), I have to try my best to emulate some of that functionality here.

The regex itself is very basic, and that's very much intentional, since each candidate that passes the first-layer filter is then fed into the IPv4 validation function to ensure its values are in range.

Additionally, I use a second array to handle duplicates (although it only de-dupes in the ASCII string sense: for now, an address with leading zeros shows up once for each unique ASCII string representation of it).

I know it's an ultra brute-force and unseemly solution, but there's only so much lemonade I can make out of the lemons I have.

 echo "${bbbbbbbb}" \
 \
  | mawk 'function validIP4(_,__,___) {
    
      __^=__=___=4;--__
      if(--___!=gsub("[.]","_",_)) { 
          return !___ }
      ++___
      do {
          if ((+_<-_)||(__<+_)||(--___<-___)) {
             _="[|]"
             break 
      } } while (sub("^[^_]+[_]","",_))
      
      return _!="[|]" 

   } BEGIN {   FS = RS = "^$"
            __=(__= (__="[0]*([012]?[0-9])?[0-9][.]")__)__
       sub("...$","",__) 

   } END {
      
     gsub(/[^0-9.]+/,OFS)
     gsub(__,"=&~")
     gsub(/[~][^0-9.=~]+[=]/,"~=")

     gsub(/^[^=~]+[=~]|[=~][^=~]+$/,"")
     split($(_<_),___,"[=~]+")

     for(_ in ___) { 
         if ( ! (____[__=___[_]]++)) { 
         if (validIP4(__)) { 
               print (__)   } } } }' \
                                      \
  | gsort -t'.' -k 1,1n -k 2,2n -k 3,3n -k 4,4n \
  | gcat -n              \
  | rs -t -c$'\n' -C= 0 4 \
  | column -s= -t \
  |  lgp3 5

     1	00.69.84.243        76	23.108.43.3        151	79.127.56.148       226	172.241.192.165
     2	00.71.110.228       77	23.108.43.19       152	80.48.119.28        227	172.245.220.154
     3	00.105.215.18       78	23.108.43.55       153	80.76.60.2          228	175.196.182.58
     4	00.123.2.171        79	23.108.43.94       154	80.244.229.102      229	176.74.9.62
     5	00.123.228.2        80	23.108.43.120      155	81.8.52.78          230	176.214.97.55

     6	00.201.223.164      81	23.108.43.208      156	83.166.241.233      231	177.128.44.131
     7	01.51.106.70        82	23.108.43.244      157	85.25.4.28          232	177.129.53.114
     8	01.144.14.232       83	23.108.75.98       158	85.25.91.156        233	178.88.185.2
     9	01.148.85.50        84	23.108.75.164      159	85.25.91.161        234	180.180.171.123
    10	01.174.10.170       85	23.225.64.59       160	85.25.117.171       235	180.183.15.198

    11	02.64.120.219       86	36.37.177.186      161	85.25.150.32        236	180.250.153.129
    12	02.68.128.214       87	36.94.161.219      162	85.25.201.22        237	181.36.230.242
    13	02.129.196.242      88	37.48.82.87        163	85.195.104.71       238	181.191.141.43
    14	02.134.127.15       89	37.144.180.52      164	85.208.211.163      239	182.253.186.140
    15	03.28.246.130       90	41.65.236.56       165	85.209.149.130      240	185.24.233.208

    16	03.73.194.2         91	41.65.251.86       166	88.119.195.35       241	185.61.152.137
    17	03.80.77.1          92	41.79.65.241       167	91.107.15.221       242	185.74.7.51
    18	03.81.77.194        93	41.161.92.138      168	91.188.246.246      243	185.93.205.236
    19	03.97.200.52        94	41.164.68.42       169	93.184.8.74         244	185.138.114.113
    20	3.120.173.144       95	41.164.68.194      170	94.16.15.100        245	186.3.85.131

    21	03.134.97.233       96	41.205.24.155      171	94.75.76.3          246	186.5.117.82
    22	03.148.72.192       97	43.255.113.232     172	94.228.204.229      247	186.46.168.42
    23	03.150.113.147      98	45.5.68.18         173	95.181.150.121      248	186.96.50.39
    24	03.159.46.18        99	45.5.68.25         174	95.181.151.105      249	186.154.211.106
    25	03.162.181.132     100	45.43.63.230       175	110.74.200.177      250	186.167.48.138

    26	03.177.45.7        101	45.67.212.99       176	112.163.123.242     251	186.202.176.153
    27	03.177.45.10       102	45.67.230.13       177	113.161.59.136      252	186.233.186.60
    28	03.177.45.11       103	45.71.203.110      178	115.87.196.88       253	186.251.71.193
    29	03.217.169.100     104	45.87.249.80       179	116.212.155.229     254	187.217.54.84
    30	03.232.215.194     105	45.122.233.76      180	117.54.114.101      255	188.94.225.177

    31	04.208.138.14      106	45.131.213.170     181	117.54.114.102      256	188.95.89.81
    32	04.244.75.205      107	45.158.158.29      182	117.54.114.103      257	188.133.153.143
    33	5.39.189.39        108	45.179.193.70      183	119.82.241.21       258	188.138.89.50
    34	05.149.219.201     109	45.183.142.126     184	120.72.20.225       259	188.138.90.226
    35	5.149.219.201      110	45.184.103.68      185	121.1.41.162        260	188.166.218.243

    36	5.189.229.42       111	45.184.155.7       186	123.31.30.100       261	190.128.225.115
    37	07.151.182.247     112	45.189.113.63      187	125.25.33.241       262	190.217.7.73
    38	07.154.221.245     113	45.189.117.237     188	125.25.206.28       263	190.217.19.243
    39	07.244.242.103     114	45.192.141.247     189	133.242.146.103     264	192.3.219.94
    40	08.177.248.47      115	45.229.32.190      190	137.74.93.21        265	192.99.38.64

    41	08.177.248.213     116	45.250.65.15       191	137.184.57.245      266	192.140.42.83
    42	08.177.248.217     117	46.99.146.232      192	139.5.151.182       267	192.155.107.59
    43	8.210.83.33        118	46.243.220.70      193	139.59.233.24       268	192.254.104.201
    44	8.213.128.19       119	46.246.80.6        194	139.255.58.212      269	194.5.193.183
    45	8.213.128.30       120	47.74.114.83       195	140.238.19.26       270	194.114.128.149

    46	8.213.128.41       121	47.88.79.154       196	151.106.13.221      271	194.233.67.98
    47	8.213.128.106      122	47.91.44.217       197	151.106.18.126      272	194.233.69.41
    48	8.213.128.123      123	47.243.75.115      198	152.26.229.67       273	194.233.73.103
    49	8.213.128.131      124	47.254.28.2        199	152.32.143.109      274	194.233.73.104
    50	8.213.128.149      125	49.156.47.162      200	153.122.106.94      275	194.233.73.105

    51	8.213.128.152      126	50.195.227.153     201	153.122.107.129     276	194.233.73.107
    52	8.213.128.158      127	50.235.149.74      202	154.85.35.235       277	194.233.73.109
    53	8.213.128.171      128	50.250.56.129      203	154.95.36.182       278	194.233.88.38
    54	8.213.128.172      129	51.68.199.120      204	154.236.162.59      279	195.80.49.3
    55	8.213.128.202      130	51.77.141.29       205	154.236.168.179     280	195.80.49.4

    56	8.213.128.214      131	51.81.32.81        206	154.236.177.101     281	195.80.49.5
    57	8.213.129.23       132	51.159.3.223       207	154.236.179.226     282	195.80.49.6
    58	8.213.129.36       133	51.178.182.23      208	157.100.26.69       283	195.80.49.7
    59	8.213.129.51       134	54.80.246.241      209	159.65.69.186       284	195.80.49.253
    60	8.213.129.57       135	61.9.48.169        210	159.65.133.175      285	195.80.49.254

    61	8.213.129.243      136	61.9.53.157        211	159.203.13.121      286	195.158.30.232
    62	8.214.41.50        137	62.75.219.49       212	160.16.242.164      287	197.149.247.82
    63	8.218.213.95       138	62.75.229.77       213	161.22.34.142       288	197.243.20.178
    64	09.200.156.102     139	62.78.84.159       214	164.132.137.241     289	198.46.200.70
    65	13.237.147.45      140	62.138.8.42        215	167.71.207.46       290	198.229.231.13

    66	20.47.108.204      141	62.204.35.69       216	167.86.81.208       291	212.112.113.178
    67	20.113.24.12       142	63.161.104.189     217	167.249.180.42      292	212.154.234.46
    68	23.19.7.136        143	66.29.154.103      218	168.205.100.36      293	212.174.44.87
    69	23.19.10.93        144	66.29.154.105      219	169.57.1.85         294	213.32.75.44
    70	23.81.127.253      145	69.163.252.140     220	170.81.35.26        295	213.230.69.193

    71	23.105.78.193      146	76.118.227.8       221	170.83.60.19        296	213.230.71.230
    72	23.105.78.252      147	77.83.86.65        222	170.155.5.235       297	213.230.90.106
    73	23.105.86.52       148	77.83.87.217       223	171.233.151.214     298	221.159.192.122
    74	23.108.42.228      149	77.104.97.3        224	172.241.156.1       299	222.158.197.138
    75	23.108.42.238      150	77.236.243.125     225	172.241.192.104     300	222.252.23.5
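
For readers who want the same two-layer idea in a more readable form, here is a simplified sketch. It is not a translation of the script above: the names extract_valid_ip4, valid_ip4, cand and seen are invented, and it skips the sorting and column formatting entirely.

```shell
extract_valid_ip4() {
  awk '
    # Return 1 when the string has four parts and each is at most 255.
    function valid_ip4(s,    parts, n, i) {
      n = split(s, parts, "[.]")
      if (n != 4) return 0
      for (i = 1; i <= 4; i++)
        if (length(parts[i]) > 3 || parts[i] + 0 > 255) return 0
      return 1
    }
    {
      line = $0
      # Layer 1: deliberately loose pattern, no range checks.
      while (match(line, /[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/)) {
        cand = substr(line, RSTART, RLENGTH)
        line = substr(line, RSTART + RLENGTH)
        # Layer 2: range validation; seen[] drops exact string duplicates.
        if (!(cand in seen) && valid_ip4(cand)) print cand
        seen[cand] = 1
      }
    }' "$@"
}
```

As in the original, de-duplication is by string only, so 08.8.8.8 and 8.8.8.8 would both be printed.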

Solution 18 - Regex

If you are not given a specific file and need to extract IP addresses, you have to search recursively. The grep command searches text or files for lines matching a given pattern and displays the matches.

grep -roE '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' | grep -oE '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}'

-r We can search the entire directory tree i.e. the current directory and all levels of sub-directories. It denotes recursive searching.

-o Print only the matching string

-E Use extended regular expression

If we had not used the second grep command after the pipe, we would have got each IP address along with the path of the file where it was found.
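
A possible shortcut, assuming your grep supports -h (GNU and BSD grep do): -h suppresses the file name prefix, so a single grep is enough. The function name extract_ips_recursive is hypothetical.

```shell
extract_ips_recursive() {
  # Search the given directory (default: current one) recursively and
  # print only the matched addresses, without file names.
  dir=${1:-.}
  grep -rhoE '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' "$dir"
}
```

For example, extract_ips_recursive /var/log | sort -u would list each distinct address once.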

Solution 19 - Regex

cat ip_address.txt | grep '^[0-9]\{1,3\}[.][0-9]\{1,3\}[.][0-9]\{1,3\}[.][0-9]\{1,3\}[,].*$\|^.*[,][0-9]\{1,3\}[.][0-9]\{1,3\}[.][0-9]\{1,3\}[.][0-9]\{1,3\}[,].*$\|^.*[,][0-9]\{1,3\}[.][0-9]\{1,3\}[.][0-9]\{1,3\}[.][0-9]\{1,3\}$'

Let's assume the file is comma-delimited and the IP address can appear at the beginning, at the end, or somewhere in the middle of a line.

The first regexp looks for an IP address at the beginning of the line. The second regexp, after the first \|, looks for an IP address in the middle. Because the address must sit directly between delimiters, each number is limited to exactly 1 to 3 digits, so false IPs like 12345.12.34.1 are excluded.

The third regexp looks for an IP address at the end of the line.
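
If only the address itself is wanted rather than the whole line, the same comma-delimited assumption can be sketched with awk by testing each field on its own. csv_ip_fields is a made-up name; [0-9][0-9]?[0-9]? stands in for {1,3} in case the awk at hand lacks interval expressions.

```shell
csv_ip_fields() {
  awk -F, '{
    # Print every field that consists of exactly four groups of 1 to 3 digits.
    for (i = 1; i <= NF; i++)
      if ($i ~ /^[0-9][0-9]?[0-9]?\.[0-9][0-9]?[0-9]?\.[0-9][0-9]?[0-9]?\.[0-9][0-9]?[0-9]?$/)
        print $i
  }' "$@"
}
```

Usage would be csv_ip_fields ip_address.txt. Fields with surrounding spaces are not matched, which mirrors the strict delimiters in the grep version.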

Solution 20 - Regex

I wanted to get only IP addresses that began with "10", from any file in a directory:

grep -o -nr "10\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}" /var/www

Solution 21 - Regex

For CentOS 6.3:

ifconfig eth0 | grep 'inet addr' | awk '{print $2}' | awk 'BEGIN {FS=":"} {print $2}'

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | Kazimieras Aliulis | View Question on Stack Overflow
Solution 1 - Regex | brien | View Answer on Stack Overflow
Solution 2 - Regex | Sarel Botha | View Answer on Stack Overflow
Solution 3 - Regex | Sankalp | View Answer on Stack Overflow
Solution 4 - Regex | JB. | View Answer on Stack Overflow
Solution 5 - Regex | James | View Answer on Stack Overflow
Solution 6 - Regex | Avi | View Answer on Stack Overflow
Solution 7 - Regex | Phil L. | View Answer on Stack Overflow
Solution 8 - Regex | shaa0601 | View Answer on Stack Overflow
Solution 9 - Regex | pbies | View Answer on Stack Overflow
Solution 10 - Regex | PolyThinker | View Answer on Stack Overflow
Solution 11 - Regex | Allen Ratcliff | View Answer on Stack Overflow
Solution 12 - Regex | anneb | View Answer on Stack Overflow
Solution 13 - Regex | Yokai | View Answer on Stack Overflow
Solution 14 - Regex | Mohsen Sarkar | View Answer on Stack Overflow
Solution 15 - Regex | Jonathan Applebaum | View Answer on Stack Overflow
Solution 16 - Regex | Synt4x 4rr0r | View Answer on Stack Overflow
Solution 17 - Regex | RARE Kpop Manifesto | View Answer on Stack Overflow
Solution 18 - Regex | Spursh_Ujjawal | View Answer on Stack Overflow
Solution 19 - Regex | Aparna | View Answer on Stack Overflow
Solution 20 - Regex | Shōgun8 | View Answer on Stack Overflow
Solution 21 - Regex | apachebeard | View Answer on Stack Overflow