Using multiple delimiters in awk

Awk Problem Overview

I have a file that contains the following lines:

/logs/tc0001/tomcat/tomcat7.1/conf/catalina.properties:app.env.server.name = demo.example.com
/logs/tc0001/tomcat/tomcat7.2/conf/catalina.properties:app.env.server.name = quest.example.com
/logs/tc0001/tomcat/tomcat7.5/conf/catalina.properties:app.env.server.name = www.example.com

From the lines above I want to extract three fields: the 2nd and 4th path components and the last one (*.example.com). So far I am getting the following output:

cat file | awk -F'/' '{print $3 "\t" $5}'
tc0001   tomcat7.1
tc0001   tomcat7.2
tc0001   tomcat7.5

How do I also extract the last field, the domain name that comes after '='? How do I use multiple delimiters to extract fields?

Awk Solutions

Solution 1 - Awk

The delimiter can be a regular expression.

awk -F'[/=]' '{print $3 "\t" $5 "\t" $8}' file

Produces:

tc0001   tomcat7.1    demo.example.com  
tc0001   tomcat7.2    quest.example.com  
tc0001   tomcat7.5    www.example.com
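
Note that with -F'[/=]' the last field still carries the space that follows '=' in the input. A minimal sketch, assuming that space is unwanted: let the separator regex absorb the blanks around '='. The temporary file and the variable name trimmed are hypothetical, used only for illustration.

```shell
# Sample input copied from the question, written to a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
/logs/tc0001/tomcat/tomcat7.1/conf/catalina.properties:app.env.server.name = demo.example.com
/logs/tc0001/tomcat/tomcat7.2/conf/catalina.properties:app.env.server.name = quest.example.com
/logs/tc0001/tomcat/tomcat7.5/conf/catalina.properties:app.env.server.name = www.example.com
EOF

# Split on "/" or on "=" together with any spaces around it,
# so the domain name arrives without a leading blank.
trimmed=$(awk -F'/| *= *' -v OFS='\t' '{print $3, $5, $8}' "$tmp")
printf '%s\n' "$trimmed"

rm -f "$tmp"
```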

Solution 2 - Awk

Good news! The awk field separator can be a regular expression. You just need to use -F"<separator1>|<separator2>|...":

awk -F"/|=" -vOFS='\t' '{print $3, $5, $NF}' file

Returns:

tc0001  tomcat7.1  demo.example.com
tc0001  tomcat7.2  quest.example.com
tc0001  tomcat7.5  www.example.com

Here:

-F"/|=" sets the input field separator to either / or =.
-vOFS='\t' uses the -v flag to set a variable. OFS is the built-in variable for the Output Field Separator, and here it is set to the tab character. The -v flag is needed because, unlike FS with -F, OFS has no dedicated command-line option.
{print $3, $5, $NF} prints the 3rd, 5th and last fields based on the input field separator.
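
As a sketch of an equivalent form, both separators can also be assigned in a BEGIN block instead of on the command line. The variables line and result are hypothetical names, and only one sample line is used to keep it short. Because the separator is exactly / or =, the last field keeps the space that followed '='.

```shell
# One sample line from the question.
line='/logs/tc0001/tomcat/tomcat7.1/conf/catalina.properties:app.env.server.name = demo.example.com'

# FS and OFS are set in BEGIN, before any input is read,
# which has the same effect as -F"/|=" -vOFS='\t'.
result=$(printf '%s\n' "$line" |
  awk 'BEGIN { FS = "/|="; OFS = "\t" } { print $3, $5, $NF }')
printf '%s\n' "$result"
```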

See another example:

$ cat file
hello#how_are_you
i#am_very#well_thank#you

This file has two field separators, # and _. To print the second field regardless of which of the two separates it, make both of them separators:

$ awk -F"#|_" '{print $2}' file
how
am

where the fields are numbered as follows:

hello#how_are_you           i#am_very#well_thank#you
^^^^^ ^^^ ^^^ ^^^           ^ ^^ ^^^^ ^^^^ ^^^^^ ^^^
  1    2   3   4            1  2   3    4    5    6
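
To check this numbering for yourself, a small sketch that prints every field of the second sample line next to its index may help. The variable name numbered is hypothetical.

```shell
# Loop over all NF fields and print each with its index.
numbered=$(printf '%s\n' 'i#am_very#well_thank#you' |
  awk -F'#|_' '{ for (i = 1; i <= NF; i++) print i ": " $i }')
printf '%s\n' "$numbered"
```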

Solution 3 - Awk

Another option is to pass -F a regex (here a bracket expression) to print the text between the left and right parentheses ().

The file content:

528(smbw)
529(smbt)
530(smbn)
10115(smbs)

The command:

awk -F"[()]" '{print $2}' filename

result:

smbw
smbt
smbn
smbs

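Using awk to print just the text between []:

Use awk -F'[][]'. awk -F'[[]]' will not work, because inside a bracket expression a literal ] must come first; '[[]]' is read as the bracket expression [[] followed by a literal ].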

http://stanlo45.blogspot.com/2020/06/awk-multiple-field-separators.html
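
A short sketch of the square-bracket case, assuming input shaped like the parenthesis example but with [] instead (the two sample lines and the variable name inner are made up for illustration):

```shell
# "]" is placed first inside the bracket expression so it is taken
# literally; "[" follows it, then the closing "]".
inner=$(printf '%s\n' '528[smbw]' '529[smbt]' | awk -F'[][]' '{ print $2 }')
printf '%s\n' "$inner"
```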

Solution 4 - Awk

If your whitespace is consistent, you could use it as a delimiter. Also, instead of inserting \t directly, you could set the output separator and it will be inserted automatically:

< file awk -v OFS='\t' -v FS='[/ ]' '{print $3, $5, $NF}'

Solution 5 - Awk

For a field separator made of any digit 2 through 5, the letter a, a #, or a space, where a run of these separating characters must be at least 2 and at most 6 characters long (some awk implementations may not enable {m,n} interval expressions by default), for example:

awk -F'[2-5a# ]{2,6}' ...

I am sure variations of this exist using ( ) and parameters

Solution 6 - Awk

Perl one-liner:

perl -F'/[\/=]/' -lane 'print "$F[2]\t$F[4]\t$F[7]"' file

These command-line options are used:

-n loop around every line of the input file, put the line in the $_ variable, do not automatically print every line
-l removes newlines before processing, and adds them back in afterwards
-a autosplit mode – perl will automatically split input lines into the @F array. Defaults to splitting on whitespace
-F autosplit modifier, in this example splits on either / or =
-e execute the perl code

Perl is closely related to awk; however, the @F autosplit array starts at index 0 ($F[0]), while awk fields start at $1.

Solution 7 - Awk

I see many good answers are already up on the board, but I would still like to add my piece of code too (here sam is the input file):

awk -F"/" '{print $3 " " $5 " " $7}' sam | sed 's/ cat.* =//g'

Solution 8 - Awk

Using Raku (formerly known as Perl_6)

raku -ne '.split(/ <[/=]> /).[2,4,7].put;'

Sample Input:

/logs/tc0001/tomcat/tomcat7.1/conf/catalina.properties:app.env.server.name = demo.example.com
/logs/tc0001/tomcat/tomcat7.2/conf/catalina.properties:app.env.server.name = quest.example.com
/logs/tc0001/tomcat/tomcat7.5/conf/catalina.properties:app.env.server.name = www.example.com

Sample Output:

tc0001 tomcat7.1  demo.example.com
tc0001 tomcat7.2  quest.example.com
tc0001 tomcat7.5  www.example.com

Above is a solution coded in Raku, a member of the Perl family of programming languages. Briefly, input is read linewise with the -ne (linewise, non-autoprinting) command-line flags. Lines are split on a regex consisting of a custom character class (/ or =) created with the <[ ]> operator. Elements [2,4,7] are then put to give the results above.

Of course, the above is a 'bare-bones' implementation, and Raku being a Perl-family language, TMTOWTDI applies. So lines can also be split on literal characters separated by a | "OR" operator. Element numbering (which is zero-indexed in both Perl and Raku) can be tightened up by adding the :skip-empty adverb to the split routine. Whitespace can be trim-med away from each element (using map), and the desired elements (now [1,3,6]) are join-ed with \t tabs, giving the following result:

raku -ne '.split(/ "/" | "=" /, :skip-empty).map(*.trim).[1,3,6].join("\t").put;' file
tc0001	tomcat7.1	demo.example.com
tc0001	tomcat7.2	quest.example.com
tc0001	tomcat7.5	www.example.com

https://raku.org

Attribution (original content on Stack Overflow):

Content Type	Original Author
Question	Satish
Solution 1 - Awk	embedded.kyle
Solution 2 - Awk	fedorqui
Solution 3 - Awk	Stan Lovisa
Solution 4 - Awk	Thor
Solution 5 - Awk	genome
Solution 6 - Awk	Chris Koknat
Solution 7 - Awk	Sadhun
Solution 8 - Awk	jubilatious1