Grepping using the "|" alternation operator

Regex, Linux, Grep

Regex Problem Overview


The following is a sample of a large file named AT5G60410.gff:

Chr5	TAIR10	gene	24294890	24301147	.	+	.	ID=AT5G60410;Note=protein_coding_gene;Name=AT5G60410
Chr5	TAIR10	mRNA	24294890	24301147	.	+	.	ID=AT5G60410.1;Parent=AT5G60410;Name=AT5G60410.1;Index=1
Chr5	TAIR10	protein	24295226	24300671	.	+	.	ID=AT5G60410.1-Protein;Name=AT5G60410.1;Derives_from=AT5G60410.1
Chr5	TAIR10	exon	24294890	24295035	.	+	.	Parent=AT5G60410.1
Chr5	TAIR10	five_prime_UTR	24294890	24295035	.	+	.	Parent=AT5G60410.1
Chr5	TAIR10	exon	24295134	24295249	.	+	.	Parent=AT5G60410.1
Chr5	TAIR10	five_prime_UTR	24295134	24295225	.	+	.	Parent=AT5G60410.1
Chr5	TAIR10	CDS	24295226	24295249	.	+	0	Parent=AT5G60410.1,AT5G60410.1-Protein;
Chr5	TAIR10	exon	24295518	24295598	.	+	.	Parent=AT5G60410.1

I am having some trouble extracting specific lines from this file using grep. I want to extract all lines of type "gene" or type "exon", as specified in the third column. I was surprised when this did not work:

grep 'gene|exon' AT5G60410.gff

No results are returned. Where have I gone wrong?

Regex Solutions


Solution 1 - Regex

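You need to escape the |. In grep's default basic regular expression mode an unescaped | is a literal character, whereas GNU grep reads \| as alternation. The following should do the job: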

grep "gene\|exon" AT5G60410.gff
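The difference between the two forms can be checked on a tiny file. This is a minimal sketch assuming GNU grep; the three sample lines are a shortened, hypothetical stand-in for AT5G60410.gff, and the variable names are illustrative only.

```shell
# Build a small tab-separated sample resembling the GFF excerpt (hypothetical data).
sample=$(mktemp)
{
  printf 'Chr5\tTAIR10\tgene\t24294890\t24301147\n'
  printf 'Chr5\tTAIR10\tmRNA\t24294890\t24301147\n'
  printf 'Chr5\tTAIR10\texon\t24294890\t24295035\n'
} > "$sample"

# Unescaped: in basic regex mode the | is a literal character, so only a line
# containing the text "gene|exon" would match. grep -c still prints 0, but it
# exits non-zero when nothing matches, hence the "|| true".
literal_hits=$(grep -c 'gene|exon' "$sample" || true)

# Escaped: GNU grep reads \| as alternation, so the gene and exon lines match.
alternation_hits=$(grep -c 'gene\|exon' "$sample")

echo "literal: $literal_hits, alternation: $alternation_hits"
rm -f "$sample"
```

POSIX leaves the meaning of \| in a basic regular expression undefined, so this form may behave differently with a non-GNU grep.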

Solution 2 - Regex

By default, grep uses basic regular expressions, in which several special characters, including |, +, ?, ( and ), are treated as literal characters unless they are escaped. So, with GNU grep, you could use the following:

grep 'gene\|exon' AT5G60410.gff

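However, you can switch grep to extended regular expressions, where | means alternation without a backslash, by using either of the following forms (grep -E is the preferred one; recent GNU grep releases warn that egrep is obsolescent):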

egrep 'gene|exon' AT5G60410.gff
grep -E 'gene|exon' AT5G60410.gff
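Note that these patterns match "gene" or "exon" anywhere on a line, not only in the third column. The sketch below shows one possible way to anchor the match to the third tab-separated field with grep -E. The sample lines are hypothetical (the Note attribute on the mRNA line is invented to show the difference), and the pattern assumes the file is strictly tab-delimited.

```shell
# Hypothetical sample; the mRNA line mentions "gene" only in its attributes.
sample=$(mktemp)
{
  printf 'Chr5\tTAIR10\tgene\t24294890\t24301147\tID=AT5G60410;Note=protein_coding_gene\n'
  printf 'Chr5\tTAIR10\tmRNA\t24294890\t24301147\tID=AT5G60410.1;Note=protein_coding_gene\n'
  printf 'Chr5\tTAIR10\texon\t24294890\t24295035\tParent=AT5G60410.1\n'
} > "$sample"

# Unanchored: matches "gene" or "exon" anywhere, so the mRNA line is counted too.
loose_hits=$(grep -E -c 'gene|exon' "$sample")

# Anchored: skip two tab-terminated fields from the start of the line,
# then require the third field to be exactly "gene" or "exon".
tab=$(printf '\t')
strict_hits=$(grep -E -c "^[^${tab}]*${tab}[^${tab}]*${tab}(gene|exon)${tab}" "$sample")

echo "loose: $loose_hits, strict: $strict_hits"
rm -f "$sample"
```

The literal tab is placed in a variable because GNU grep's -E mode does not interpret \t as a tab character.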

Solution 3 - Regex

This is a different way of grepping for a few choices:

grep -e gene -e exon AT5G60410.gff

The -e switch can be given more than once. Each occurrence adds a pattern, and a line is printed if it matches any of them.
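As a quick check, the repeated -e form should select the same lines as the alternation form. This sketch uses a shortened, hypothetical sample file, and also shows that -e combines with -F when the patterns are plain strings rather than regular expressions.

```shell
# Shortened, hypothetical stand-in for the GFF file.
sample=$(mktemp)
{
  printf 'Chr5\tTAIR10\tgene\t24294890\t24301147\n'
  printf 'Chr5\tTAIR10\tmRNA\t24294890\t24301147\n'
  printf 'Chr5\tTAIR10\texon\t24294890\t24295035\n'
} > "$sample"

# Two separate patterns; a line is kept if either one matches.
multi_e=$(grep -e gene -e exon "$sample")
multi_e_count=$(grep -c -e gene -e exon "$sample")

# The same selection written as one extended-regex alternation.
extended=$(grep -E 'gene|exon' "$sample")

# -F treats each -e pattern as a fixed string, with no regex syntax involved.
fixed=$(grep -F -e gene -e exon "$sample")

if [ "$multi_e" = "$extended" ] && [ "$fixed" = "$extended" ]; then
  echo "all three forms selected the same $multi_e_count lines"
fi
rm -f "$sample"
```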

Solution 4 - Regex

This will work:

grep "gene\|exon" AT5G60410.gff

Solution 5 - Regex

I found this question while googling for a particular problem I was having involving a piped command to a grep command that used the alternation operator in a regex, so I thought that I would contribute my more specialized answer.

The error I faced turned out to come from the shell pipeline (the pipe operator, |) and not from the alternation operator in the grep regex (also written |, which is why the two are easy to confuse). The answer for me was to properly quote or escape special shell characters such as & before assuming the issue was with the alternation operator in my grep regex.

For example, the command I executed on my local machine was:

get http://localhost/foobar-& | grep "fizz\|buzz"

This command resulted in the following error:

-bash: syntax error near unexpected token `|'

This error was corrected by changing my command to:

get "http://localhost/foobar-&" | grep "fizz\|buzz"

Quoting the URL stops the shell from interpreting & as its background operator, which had ended the command early and left the following | with nothing on its left. The problem had nothing to do with the alternation operator at all.
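The effect can be reproduced without any network access. In this sketch echo is a stand-in for the get command above (an assumption made purely for illustration), and bash -n is used so that the unquoted command line is only parsed, never executed.

```shell
# Unquoted: the shell reads & as "run in the background", which ends the
# command, so the | that follows has no command on its left.
# bash -n checks syntax only and runs nothing.
if bash -n -c 'echo http://localhost/foobar-& | grep "fizz\|buzz"' 2>/dev/null; then
  unquoted_status=ok
else
  unquoted_status=syntax_error
fi

# Quoted: the & is now part of the argument, and the pipeline parses and runs.
# The alternation matches because the text contains "foo".
quoted_hits=$(echo "http://localhost/foobar-&" | grep -c 'foo\|bar')

echo "unquoted: $unquoted_status, quoted matches: $quoted_hits"
```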

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | MattLBeck | View Question on Stack Overflow
Solution 1 - Regex | Jeff Foster | View Answer on Stack Overflow
Solution 2 - Regex | a'r | View Answer on Stack Overflow
Solution 3 - Regex | Nathan Fellman | View Answer on Stack Overflow
Solution 4 - Regex | ennuikiller | View Answer on Stack Overflow
Solution 5 - Regex | entpnerd | View Answer on Stack Overflow