grepping using the "|" alternative operator
Regex Problem Overview
The following is a sample of a large file named AT5G60410.gff:
Chr5 TAIR10 gene 24294890 24301147 . + . ID=AT5G60410;Note=protein_coding_gene;Name=AT5G60410
Chr5 TAIR10 mRNA 24294890 24301147 . + . ID=AT5G60410.1;Parent=AT5G60410;Name=AT5G60410.1;Index=1
Chr5 TAIR10 protein 24295226 24300671 . + . ID=AT5G60410.1-Protein;Name=AT5G60410.1;Derives_from=AT5G60410.1
Chr5 TAIR10 exon 24294890 24295035 . + . Parent=AT5G60410.1
Chr5 TAIR10 five_prime_UTR 24294890 24295035 . + . Parent=AT5G60410.1
Chr5 TAIR10 exon 24295134 24295249 . + . Parent=AT5G60410.1
Chr5 TAIR10 five_prime_UTR 24295134 24295225 . + . Parent=AT5G60410.1
Chr5 TAIR10 CDS 24295226 24295249 . + 0 Parent=AT5G60410.1,AT5G60410.1-Protein;
Chr5 TAIR10 exon 24295518 24295598 . + . Parent=AT5G60410.1
I am having some trouble extracting specific lines from this file using grep. I wanted to extract all lines of type "gene" or type "exon", as specified in the third column. I was surprised when this did not work:
grep 'gene|exon' AT5G60410.gff
No results are returned. Where have I gone wrong?
Regex Solutions
Solution 1 - Regex
You need to escape the | character. The following should do the job:
grep "gene\|exon" AT5G60410.gff
Solution 2 - Regex
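By default, grep uses basic regular expressions, in which several special characters, including |, are treated as ordinary characters unless they are escaped. In GNU grep, the escaped form \| means alternation (this is a GNU extension rather than part of POSIX basic syntax), so you could use the following: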
grep 'gene\|exon' AT5G60410.gff
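However, you can switch grep to extended regular expressions, in which an unescaped | means alternation, by using either of the following forms (grep -E is the standard spelling; egrep is an older equivalent):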
egrep 'gene|exon' AT5G60410.gff
grep -E 'gene|exon' AT5G60410.gff
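The difference between the two modes can be checked on a small copy of the data. The following is a minimal sketch, not a definitive recipe: the file name sample.gff and the temporary directory are made up for illustration, and the four lines are taken from the sample in the question (with spaces as separators).

```shell
# Build a small sample in a scratch directory (hypothetical file name).
tmpdir=$(mktemp -d)
sample="$tmpdir/sample.gff"
cat > "$sample" <<'EOF'
Chr5 TAIR10 gene 24294890 24301147 . + . ID=AT5G60410;Note=protein_coding_gene;Name=AT5G60410
Chr5 TAIR10 mRNA 24294890 24301147 . + . ID=AT5G60410.1;Parent=AT5G60410;Name=AT5G60410.1;Index=1
Chr5 TAIR10 exon 24294890 24295035 . + . Parent=AT5G60410.1
Chr5 TAIR10 CDS 24295226 24295249 . + 0 Parent=AT5G60410.1,AT5G60410.1-Protein;
EOF

# Basic syntax: "|" is literal, so this searches for the text "gene|exon".
# grep -c exits non-zero when the count is 0, hence "|| true".
basic_count=$(grep -c 'gene|exon' "$sample" || true)

# Extended syntax: "|" means alternation, so "gene" or "exon" matches.
extended_count=$(grep -c -E 'gene|exon' "$sample")

echo "basic: $basic_count, extended: $extended_count"
rm -r "$tmpdir"
```

On this sample the basic form counts no lines and the extended form counts two (the gene line and the exon line), which matches the behaviour described in the question.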
Solution 3 - Regex
This is a different way of grepping for a few choices:
grep -e gene -e exon AT5G60410.gff
The -e switch specifies a pattern to match. It can be repeated to give several patterns, and a line is printed if it matches any of them.
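Note that each -e pattern, like the alternation forms, matches anywhere on the line, not only in the third column that the question asks about. The following is a rough sketch of the difference, assuming whitespace-separated columns. The file name sample.gff is made up, and the third data line is a hypothetical mRNA entry (not from the original file) added only to show a line that mentions "gene" outside column three. The awk comparison is one possible way to restrict the test to the third field, not something the answers above propose.

```shell
# Build a small sample in a scratch directory (hypothetical file name).
tmpdir=$(mktemp -d)
sample="$tmpdir/sample.gff"

# Two lines from the question plus one hypothetical mRNA line whose
# attributes contain the word "gene".
cat > "$sample" <<'EOF'
Chr5 TAIR10 gene 24294890 24301147 . + . ID=AT5G60410;Note=protein_coding_gene;Name=AT5G60410
Chr5 TAIR10 exon 24294890 24295035 . + . Parent=AT5G60410.1
Chr5 TAIR10 mRNA 24294890 24301147 . + . ID=AT5G60410.1;Note=hypothetical_gene_note
EOF

# Several -e patterns: a line counts if "gene" or "exon" appears anywhere.
anywhere=$(grep -c -e gene -e exon "$sample")

# awk compares only field 3, so the hypothetical mRNA line is skipped.
third_col=$(awk '$3 == "gene" || $3 == "exon" { n++ } END { print n + 0 }' "$sample")

echo "grep -e: $anywhere lines, awk on column 3: $third_col lines"
rm -r "$tmpdir"
```

For the sample lines shown in the question, both approaches would select the same lines; the distinction only matters when "gene" or "exon" also occurs in another column.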
Solution 4 - Regex
This will work:
grep "gene\|exon" AT5G60410.gff
Solution 5 - Regex
I found this question while googling for a problem I was having with a command piped into a grep command that used the alternation operator in its regex, so I thought that I would contribute my more specialized answer.
The error I faced turned out to be with the preceding pipe operator (i.e. |) and not with the alternation operator (also |, the same character as the pipe operator) in the grep regex at all. The answer for me was to properly quote or escape special shell characters such as & before assuming the issue was with the alternation operator in my grep regex.
For example, the command I executed on my local machine was:
get http://localhost/foobar-& | grep "fizz\|buzz"
This command resulted in the following error:
-bash: syntax error near unexpected token `|'
This error was corrected by changing my command to:
get "http://localhost/foobar-&" | grep "fizz\|buzz"
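Unquoted, the & character tells the shell to end the command and run it in the background, which leaves the following | with no command on its left. By enclosing the URL, including the & character, in double quotes I was able to resolve my issue. The answer had nothing to do with the alternation operator at all.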
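The effect can be reproduced without a web request. In this minimal sketch, echo stands in for the get command (an assumption made only so the example runs anywhere), and each variant is run in a child sh so that the syntax error does not abort the surrounding script. grep -E is used here instead of the escaped \| form purely for portability.

```shell
# Unquoted: the shell reads "&" as "run in background", so the "|" that
# follows has no command on its left and parsing fails.
sh -c 'echo fizz-& | grep -E "fizz|buzz"' >/dev/null 2>&1
unquoted_status=$?

# Quoted: "&" is an ordinary character inside the argument, and the
# pipeline into grep works as intended.
quoted_output=$(sh -c 'echo "fizz-&" | grep -E "fizz|buzz"')
quoted_status=$?

echo "unquoted exit status: $unquoted_status"
echo "quoted exit status: $quoted_status, output: $quoted_output"
```

The unquoted variant exits with a non-zero status because of the syntax error, while the quoted variant succeeds and passes the whole string, & included, to grep.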