How to print matched regex pattern using awk?

Tags: Regex, Awk

Regex Problem Overview


Using awk, I need to find a word in a file that matches a regex pattern.

I only want to print the word matched with the pattern.

So if in the line, I have:

xxx yyy zzz

And pattern:

/yyy/

I want to only get:

yyy

EDIT: Thanks to kurumi, I managed to write something like this:

awk '{
        for(i=1; i<=NF; i++) {
                tmp=match($i, /[0-9]..?.?[^A-Za-z0-9]/)
                if(tmp) {
                        print $i
                }
        }
}' "$1"

And this is what I needed :) Thanks a lot!

Regex Solutions


Solution 1 - Regex

This is the most basic form:

awk '/pattern/{ print $0 }' file

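This asks awk to search for the pattern between the slashes and print every line that matches. By default a line is called a record and is denoted by $0. The awk documentation covers these basics in more detail.

If you only want to print the matched word, loop over the fields and compare each one (here with an exact string comparison):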

awk '{for(i=1;i<=NF;i++){ if($i=="yyy"){print $i} } }' file
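The loop above compares each field to a fixed string. If the goal is to print fields that match a regex instead, a minimal sketch could look like the following. The helper name print_matching_fields is hypothetical, and passing the pattern through -v assumes it contains no backslash escapes.

```shell
# Hypothetical helper: print every whitespace-separated field
# that matches the regex given as the first argument.
print_matching_fields() {
    awk -v re="$1" '{
        for (i = 1; i <= NF; i++)
            if ($i ~ re)      # regex match instead of string equality
                print $i
    }'
}

printf 'xxx yyy zzz\nabc yyyz 123\n' | print_matching_fields 'yyy'
```

Note that this prints the whole field, so yyyz is printed as well as yyy.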

Solution 2 - Regex

It sounds like you are trying to emulate GNU's grep -o behaviour. This will do that providing you only want the first match on each line:

awk 'match($0, /regex/) {
    print substr($0, RSTART, RLENGTH)
}
' file

Here's an example, using GNU's awk implementation (gawk):

awk 'match($0, /a.t/) {
    print substr($0, RSTART, RLENGTH)
}
' /usr/share/dict/words | head
act
act
act
act
aft
ant
apt
art
art
art

Read about match, substr, RSTART and RLENGTH in the awk manual.

After that you may wish to extend this to deal with multiple matches on the same line.
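One possible way to handle multiple matches is to call match() repeatedly on the remainder of the line. This is only a sketch: the helper name print_all_matches is hypothetical, and the pattern is assumed to contain no backslash escapes because it is passed through -v.

```shell
# Hypothetical helper: print every non-overlapping match of the regex
# given as the first argument, one match per output line.
print_all_matches() {
    awk -v re="$1" '{
        rest = $0
        while (match(rest, re)) {
            if (RLENGTH == 0)
                break                      # an empty match would loop forever
            print substr(rest, RSTART, RLENGTH)
            rest = substr(rest, RSTART + RLENGTH)   # continue after this match
        }
    }'
}

printf 'cat bat\nno\nhat mat rat\n' | print_all_matches '[a-z]at'
```

Because each iteration searches only the remainder of the line, a pattern anchored with ^ can match again at the start of every remainder, so this sketch suits unanchored patterns best.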

Solution 3 - Regex

gawk can get the matching part of every line using this as action:

{ if (match($0,/your regexp/,m)) print m[0] }

> match(string, regexp [, array])
>
> If array is present, it is cleared, and then the zeroth element of array is set to the entire portion of string matched by regexp. If regexp contains parentheses, the integer-indexed elements of array are set to contain the portion of string matching the corresponding parenthesized subexpression.

Source: http://www.gnu.org/software/gawk/manual/gawk.html#String-Functions

Solution 4 - Regex

If Perl is an option, you can try this:

perl -lne 'print $1 if /(regex)/' file

To make the match case-insensitive, add the i modifier:

perl -lne 'print $1 if /(regex)/i' file

To print everything AFTER the match:

perl -lne 'if ($found){print} else{if (/regex(.*)/){print $1; $found++}}' textfile

To print the match and everything after the match:

perl -lne 'if ($found){print} else{if (/(regex.*)/){print $1; $found++}}' textfile

Solution 5 - Regex

If you are only interested in the last line of input and you expect to find only one match (for example, part of the summary line of a shell command), you can also try this very compact code, adapted from https://stackoverflow.com/questions/5466411/print-regexp-matches-in-awk:

$ echo "xxx yyy zzz" | awk '{match($0,"yyy",a)}END{print a[0]}'
yyy

Or the more complex version with a partial result:

$ echo "xxx=a yyy=b zzz=c" | awk '{match($0,"yyy=([^ ]+)",a)}END{print a[1]}'
b

Warning: the three-argument form of match() only exists in gawk, not in mawk.

Here is another nice solution using a lookbehind regex in grep instead of awk. It does not need gawk, but the -P option requires GNU grep built with PCRE support:

$ echo "xxx=a yyy=b zzz=c" | grep -Po '(?<=yyy=)[^ ]+'
b
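For awk implementations without the three-argument match(), the same value can be extracted with RSTART and RLENGTH alone. A minimal sketch, assuming the key contains no regex metacharacters; the helper name value_of is hypothetical. Unlike the END variant above, it prints once per matching line.

```shell
# Hypothetical helper: print the value that follows "key=" on each line,
# using only the two-argument match() available in POSIX awk.
value_of() {
    awk -v key="$1" '{
        if (match($0, key "=[^ ]+"))
            # skip the key and the "=" sign, keep only the value
            print substr($0, RSTART + length(key) + 1, RLENGTH - length(key) - 1)
    }'
}

echo "xxx=a yyy=b zzz=c" | value_of yyy
```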

Solution 6 - Regex

Slightly off topic: this can also be done with grep. Posting it here in case anyone is looking for a grep solution:

echo 'xxx yyy zzze ' | grep -oE 'yyy'

Solution 7 - Regex

sed can also be elegant in this situation. This example replaces each line with the matched group "yyy":

$ cat testfile
xxx yyy zzz
yyy xxx zzz
$ cat testfile | sed -r 's#^.*(yyy).*$#\1#g'
yyy
yyy

Relevant manual page: https://www.gnu.org/software/sed/manual/sed.html#Back_002dreferences-and-Subexpressions
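One caveat with the substitution above is that sed prints lines that do not match unchanged. A sketch of a variant that prints only the lines where the substitution succeeded, assuming a sed that accepts -E for extended regular expressions (GNU and BSD sed do):

```shell
# -n suppresses the default output; the trailing p flag prints a line
# only when the substitution was actually made on it.
printf 'xxx yyy zzz\nno match here\nyyy xxx zzz\n' |
    sed -n -E 's/^.*(yyy).*$/\1/p'
```

Here the middle input line is dropped instead of being echoed back.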

Solution 8 - Regex

If you know what column the text/pattern you're looking for (e.g. "yyy") is in, you can just check that specific column to see if it matches, and print it.

For example, given a file called asdf.txt with the following contents:

xxx yyy zzz

to only print the second column if it matches the pattern "yyy", you could do something like this:

awk '$2 ~ /yyy/ {print $2}' asdf.txt

Note that this also matches any line whose second column merely contains "yyy", like these:

xxx yyyz zzz
xxx zyyyz
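If the second column should match only when it is exactly "yyy", one option is to anchor the pattern. A small sketch using the sample lines above as input:

```shell
# ^ and $ anchor the regex to the start and end of the field,
# so yyyz and zyyyz no longer match.
printf 'xxx yyy zzz\nxxx yyyz zzz\nxxx zyyyz\n' |
    awk '$2 ~ /^yyy$/ {print $2}'
```

For a fixed word like this, the string comparison $2 == "yyy" should behave the same way.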

Solution 9 - Regex

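echo "abc123def" | awk '

# Return the part of haystack matched by the regex needle, optionally
# dropping ltrim characters from the left and rtrim from the right.
function MATCH(haystack, needle, ltrim, rtrim)
{
    # Omitted arguments are uninitialized; adding 0 makes them numeric zero.
    ltrim += 0
    rtrim += 0

    # With no match, match() returns 0 and RLENGTH is -1, so substr() yields "".
    return substr(haystack, match(haystack, needle) + ltrim, RLENGTH - ltrim - rtrim)
}

{
    print $0 " - " MATCH($0, "123")             # 123
    print $0 " - " MATCH($0, "[0-9]*d", 0, 1)   # 123
    print $0 " - " MATCH($0, "1234")            # empty result (no match)
}'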

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

| Content Type | Original Author | Original Content on Stackoverflow |
| --- | --- | --- |
| Question | marverix | View Question on Stackoverflow |
| Solution 1 - Regex | kurumi | View Answer on Stackoverflow |
| Solution 2 - Regex | Johnsyweb | View Answer on Stackoverflow |
| Solution 3 - Regex | royas | View Answer on Stackoverflow |
| Solution 4 - Regex | Chris Koknat | View Answer on Stackoverflow |
| Solution 5 - Regex | Daniel Alder | View Answer on Stackoverflow |
| Solution 6 - Regex | Zeus | View Answer on Stackoverflow |
| Solution 7 - Regex | Konrad Brodzik | View Answer on Stackoverflow |
| Solution 8 - Regex | kimbo | View Answer on Stackoverflow |
| Solution 9 - Regex | Ren Hoek | View Answer on Stackoverflow |