Using multiple delimiters in awk
Problem Overview
I have a file that contains the following lines:
/logs/tc0001/tomcat/tomcat7.1/conf/catalina.properties:app.env.server.name = demo.example.com
/logs/tc0001/tomcat/tomcat7.2/conf/catalina.properties:app.env.server.name = quest.example.com
/logs/tc0001/tomcat/tomcat7.5/conf/catalina.properties:app.env.server.name = www.example.com
From the lines above I want to extract three fields: numbers 2 and 4, and the last one (*.example.com). So far I am getting the following output:
cat file | awk -F'/' '{print $3 "\t" $5}'
tc0001 tomcat7.1
tc0001 tomcat7.2
tc0001 tomcat7.5
How do I also extract the last field, the domain name that comes after '='? How do I use multiple delimiters to extract fields?
Awk Solutions
Solution 1 - Awk
The delimiter can be a regular expression.
awk -F'[/=]' '{print $3 "\t" $5 "\t" $8}' file
Produces:
tc0001 tomcat7.1 demo.example.com
tc0001 tomcat7.2 quest.example.com
tc0001 tomcat7.5 www.example.com
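With -F'[/=]' the last field keeps the space that follows = in the input. A minimal sketch of one way around that, assuming the sample lines from the question are saved to a temporary file (the variable names sample and result are illustrative only), is to fold the optional spaces into the separator itself:

```shell
# Sample lines copied from the question, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
/logs/tc0001/tomcat/tomcat7.1/conf/catalina.properties:app.env.server.name = demo.example.com
/logs/tc0001/tomcat/tomcat7.2/conf/catalina.properties:app.env.server.name = quest.example.com
/logs/tc0001/tomcat/tomcat7.5/conf/catalina.properties:app.env.server.name = www.example.com
EOF

# " *[/=] *" treats a / or = together with any spaces around it as one
# separator, so the domain arrives in the last field without padding.
result=$(awk -F' *[/=] *' -v OFS='\t' '{print $3, $5, $NF}' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

Because the spaces are consumed by the separator, no extra trimming step is needed afterwards.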
Solution 2 - Awk
Good news! The awk field separator can be a regular expression. You just need to use -F"<separator1>|<separator2>|...":
awk -F"/|=" -vOFS='\t' '{print $3, $5, $NF}' file
Returns:
tc0001 tomcat7.1 demo.example.com
tc0001 tomcat7.2 quest.example.com
tc0001 tomcat7.5 www.example.com
Here:
- -F"/|=" sets the input field separator to either / or =.
- -vOFS='\t' uses the -v flag to set a variable. OFS is the built-in variable for the Output Field Separator, and here it is set to the tab character. The -v flag is necessary because OFS has no dedicated option the way the input separator has -F.
- {print $3, $5, $NF} prints the 3rd, 5th and last fields, as split by the input field separator.
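The same two variables can also be assigned inside the program. The sketch below is an illustration rather than part of the original answer: it assumes the question's sample lines in a temporary file, and the names sample, with_flags and with_begin are made up for the example. It runs both forms so the outputs can be compared:

```shell
# Sample lines copied from the question, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
/logs/tc0001/tomcat/tomcat7.1/conf/catalina.properties:app.env.server.name = demo.example.com
/logs/tc0001/tomcat/tomcat7.2/conf/catalina.properties:app.env.server.name = quest.example.com
/logs/tc0001/tomcat/tomcat7.5/conf/catalina.properties:app.env.server.name = www.example.com
EOF

# Form 1: separators given on the command line.
with_flags=$(awk -F'/|=' -v OFS='\t' '{print $3, $5, $NF}' "$sample")

# Form 2: FS and OFS assigned in a BEGIN block, which runs before the
# first input line is read and split.
with_begin=$(awk 'BEGIN { FS = "/|="; OFS = "\t" } {print $3, $5, $NF}' "$sample")

printf '%s\n' "$with_begin"
rm -f "$sample"
```

Note that with this separator the last field still begins with the space that follows = in the input.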
See another example:
$ cat file
hello#how_are_you
i#am_very#well_thank#you
This file has two field separators, # and _. If we want to print the second field regardless of which separator is used, let's make both of them separators!
$ awk -F"#|_" '{print $2}' file
how
am
Where the fields are numbered as follows:
hello#how_are_you    i#am_very#well_thank#you
^^^^^ ^^^ ^^^ ^^^    ^ ^^ ^^^^ ^^^^ ^^^^^ ^^^
  1    2   3   4     1 2   3    4     5    6
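The numbering above can be checked by letting awk label each field itself. This is a small sketch using the same two input lines; the variable name fields and the index=value output format are just choices made for the illustration:

```shell
# Print every field of each line prefixed with its index, so the
# numbering produced by the "#|_" separator is visible.
fields=$(printf '%s\n' 'hello#how_are_you' 'i#am_very#well_thank#you' |
  awk -F'#|_' '{
    line = ""
    for (i = 1; i <= NF; i++)
      line = line (i > 1 ? " " : "") i "=" $i
    print line
  }')
printf '%s\n' "$fields"
# 1=hello 2=how 3=are 4=you
# 1=i 2=am 3=very 4=well 5=thank 6=you
```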
Solution 3 - Awk
Another option is to pass a regex to -F, here to print the text between the left and right parentheses ( ).
The file content:
528(smbw)
529(smbt)
530(smbn)
10115(smbs)
The command:
awk -F"[()]" '{print $2}' filename
result:
smbw
smbt
smbn
smbs
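To print just the text between [ and ], use awk -F'[][]'. Note that awk -F'[[]]' will not work: inside a bracket expression a literal ] has to come first, so [[]] is read as the set [[] followed by a literal ], which only matches the two-character string "[]".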
http://stanlo45.blogspot.com/2020/06/awk-multiple-field-separators.html
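As a quick check of the square-bracket form, here is a sketch on made-up input: the lines 528[smbw] and 529[smbt] are adapted from the parentheses example and are not from the original answer, and the variable name names is illustrative.

```shell
# "[][]" is a bracket expression containing ] (listed first, so it is
# taken literally) and [, so either bracket acts as a field separator.
names=$(printf '%s\n' '528[smbw]' '529[smbt]' | awk -F'[][]' '{print $2}')
printf '%s\n' "$names"
# smbw
# smbt
```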
Solution 4 - Awk
If your whitespace is consistent, you could use it as a delimiter as well. Also, instead of inserting \t directly, you could set the output separator and it will be included automatically:
< file awk -v OFS='\t' -v FS='[/ ]' '{print $3, $5, $NF}'
Solution 5 - Awk
For a field separator made of any digit 2 through 5, the letter a, #, or a space, where the run of separator characters must be at least 2 and at most 6 characters long (this relies on interval expressions, which some older awk implementations do not support), for example:
awk -F'[2-5a# ]{2,6}' ...
I am sure variations of this exist using ( ) and parameters.
Solution 6 - Awk
Perl one-liner:
perl -F'/[\/=]/' -lane 'print "$F[2]\t$F[4]\t$F[7]"' file
These command-line options are used:
- -n loops around every line of the input file, puts the line in the $_ variable, and does not automatically print every line.
- -l removes newlines before processing, and adds them back in afterwards.
- -a enables autosplit mode: perl will automatically split input lines into the @F array. Defaults to splitting on whitespace.
- -F is the autosplit modifier; in this example it splits on either / or =.
- -e executes the perl code.
Perl is closely related to awk; however, the @F autosplit array starts at index $F[0], while awk fields start with $1.
Solution 7 - Awk
I see many perfect answers are already up on the board, but I would still like to add my piece of code too:
awk -F"/" '{print $3 " " $5 " " $7}' sam | sed 's/ cat.* =//g'
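The sed step can also be folded into awk. The following is only a sketch of that idea, not the answer's own method: it assumes the question's sample lines in a temporary file, and the names sample and trimmed are illustrative.

```shell
# Sample lines copied from the question, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
/logs/tc0001/tomcat/tomcat7.1/conf/catalina.properties:app.env.server.name = demo.example.com
/logs/tc0001/tomcat/tomcat7.2/conf/catalina.properties:app.env.server.name = quest.example.com
/logs/tc0001/tomcat/tomcat7.5/conf/catalina.properties:app.env.server.name = www.example.com
EOF

# Split on / only, then delete everything up to and including "= "
# from the 7th field, leaving just the domain name.
trimmed=$(awk -F'/' '{ sub(/.*= */, "", $7); print $3, $5, $7 }' "$sample")
printf '%s\n' "$trimmed"

rm -f "$sample"
```

Here sub() edits the 7th field in place, and the comma-separated print joins the three fields with the default output separator, a single space.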
Solution 8 - Awk
Using Raku (formerly known as Perl_6)
raku -ne '.split(/ <[/=]> /).[2,4,7].put;'
Sample Input:
/logs/tc0001/tomcat/tomcat7.1/conf/catalina.properties:app.env.server.name = demo.example.com
/logs/tc0001/tomcat/tomcat7.2/conf/catalina.properties:app.env.server.name = quest.example.com
/logs/tc0001/tomcat/tomcat7.5/conf/catalina.properties:app.env.server.name = www.example.com
Sample Output:
tc0001 tomcat7.1 demo.example.com
tc0001 tomcat7.2 quest.example.com
tc0001 tomcat7.5 www.example.com
Above is a solution coded in Raku, a member of the Perl family of programming languages. Briefly, input is read linewise with the -ne (linewise, non-autoprinting) command-line flags. Lines are split on a regex consisting of a custom character class (/=) created with the <[ ]> operator. Elements [2,4,7] are then put to give the results above.
Of course, the above is a 'bare-bones' implementation, and Raku being a Perl-family language, TMTOWTDI applies. So lines can be split on literal characters separated by a | "OR" operator. Element numbering (which is zero-indexed in both Perl and Raku) can be tightened up by adding the :skip-empty adverb to the split routine. Whitespace can be trimmed away from each element (using map with trim), and the desired elements (now [1,3,6]) are joined with \t tabs, giving the following result:
raku -ne '.split(/ "/" | "=" /, :skip-empty).map(*.trim).[1,3,6].join("\t").put;' file
tc0001 tomcat7.1 demo.example.com
tc0001 tomcat7.2 quest.example.com
tc0001 tomcat7.5 www.example.com