Always include first line in grep
Bash Problem Overview
I often grep CSV files with column names on the first line. Therefore, I want the output of grep to always include the first line (to get the column names) as well as any lines matching the grep pattern. What is the best way to do this?
Bash Solutions
Solution 1 - Bash
sed:
sed '1p;/pattern/!d' input.txt
awk:
awk 'NR==1 || /pattern/' input.txt
grep1:
grep1() { awk -v pattern="${1:?pattern is empty}" 'NR==1 || $0~pattern' "${2:-/dev/stdin}"; }
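A quick sanity check of the `grep1` helper, redefined here so the snippet is self-contained (the CSV data is made up):

```shell
# Same helper as above
grep1() { awk -v pattern="${1:?pattern is empty}" 'NR==1 || $0~pattern' "${2:-/dev/stdin}"; }

# Hypothetical CSV with a header row
printf 'id,name,city\n1,alice,berlin\n2,bob,paris\n' > /tmp/people.csv

grep1 paris /tmp/people.csv
# id,name,city
# 2,bob,paris
```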
Solution 2 - Bash
grep can't select lines by line number, but awk can, so here's an example that outputs lines containing "Incoming" - plus the first line, whatever it is:
awk 'NR == 1 || /Incoming/' foo.csv
You could make a script (a bit excessive, but it works). I made a file, grep+1, and put this in it:
#!/bin/sh
pattern="$1" ; shift
exec awk 'NR == 1 || /'"$pattern"'/' "$@"
Now one can:
./grep+1 Incoming foo.csv
edit: removed the "{print;}", which is awk's default action.
Solution 3 - Bash
You could include an alternate pattern that matches one of the column names. If a column were called COL then this would work:
$ grep -E 'COL|pattern' file.csv
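One caveat: COL could also match somewhere inside a data row. Anchoring the header alternative to the start of the line reduces that risk (the sample file is made up):

```shell
printf 'COL,other\nx,1\nrow with COL inside,2\ny match,3\n' > /tmp/c.csv

# '^COL' only matches at the start of a line, so the row with
# "COL" in the middle is not accidentally printed
grep -E '^COL|match' /tmp/c.csv
# COL,other
# y match,3
```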
Solution 4 - Bash
You can use sed instead of grep to do this:
sed -n -e '1p' -e '/pattern/p' < "$FILE"
Note that this will print the first line twice if it happens to contain the pattern.
-n tells sed not to print lines by default.
-e '1p' prints the first line.
-e '/pattern/p' prints each line that matches the pattern.
Solution 5 - Bash
Another option:
$ cat data.csv | (read line; echo "$line"; grep SEARCH_TERM)
Example:
$ printf 'title\nvalue1\nvalue2\nvalue3\n' | (read line; echo "$line"; grep value2)
Output:
title
value2
Solution 6 - Bash
This is a very general solution, for example if you want to sort a file while keeping the first line in place. Basically, "pass the first line through as-is, then do whatever I want (awk/grep/sort/whatever) on the rest of the data."
Try this in a script, perhaps calling it keepfirstline (don't forget chmod +x keepfirstline and to put it in your PATH):
#!/bin/bash
IFS='' read -r JUST1LINE
printf "%s\n" "$JUST1LINE"
exec "$@"
It can be used as follows:
cat your.data.csv | keepfirstline grep SearchTerm > results.with.header.csv
or perhaps, if you want to filter with awk
cat your.data.csv | keepfirstline awk '$1 < 3' > results.with.header.csv
I often like to sort a file, but keeping the header in the first line
cat your.data.csv | keepfirstline sort
keepfirstline executes the command it's given (grep SearchTerm), but only after reading and printing the first line.
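The same idea can be tried inline as a shell function (the `exec` is dropped here so the function doesn't replace the current shell; the data is made up):

```shell
# Function version of the script above, for a quick demonstration
keepfirstline() { IFS='' read -r first; printf '%s\n' "$first"; "$@"; }

printf 'name\ncarol\nalice\nbob\n' | keepfirstline sort
# name
# alice
# bob
# carol
```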
Solution 7 - Bash
Just do
head -1 <filename>
and then run grep on the same file.
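Spelled out as a single command group, and skipping the header in the grep step so it can't be printed twice (file name and pattern are made up):

```shell
printf 'col1,col2\nfoo,1\nbar,2\n' > /tmp/sample.csv

# Print the header, then grep everything after line 1
{ head -n 1 /tmp/sample.csv; tail -n +2 /tmp/sample.csv | grep bar; }
# col1,col2
# bar,2
```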
Solution 8 - Bash
So, I posted a completely different short answer above a while back.
However, for those pining for a command that behaves like grep, taking the same options (this script requires the long --option=value form when an option takes an argument) and coping with odd characters in filenames: have fun pulling this apart.
Essentially it's a grep that always emits the first line. If you think a file with no matching lines should skip emitting that first (header) line, that's left as an exercise for the reader. I saved it as grep+1.
#!/bin/bash
# grep+1 [<option>...] [<regex>] [<file>...]
# Emits the first line of each input and ignores it otherwise.
# For grep options that have optargs, only the --forms will work here.
declare -a files options
regex_seen=false
regex=
double_dash_seen=false
for arg in "$@" ; do
  is_file_or_rx=true
  case "$arg" in
    --) double_dash_seen=true ; continue ;;
    -*) is_file_or_rx=$double_dash_seen ;;
  esac
  if $is_file_or_rx ; then
    if ! $regex_seen ; then
      regex="$arg"
      regex_seen=true
    else
      files+=("$arg")   # append to the file list
    fi
  else
    options+=("$arg")   # append to the option list
  fi
done
# We could open all the files at once in the shell and pass the handles into
# one grep call, but that would limit how many we can process to the fd limit.
# So instead, here's the simpler approach with a series of grep calls.
if $regex_seen ; then
  if [ ${#files[@]} -gt 0 ] ; then
    for file in "${files[@]}" ; do
      head -n 1 "$file"
      tail -n +2 "$file" | grep --label="$file" "${options[@]}" "$regex"
    done
  else
    grep "${options[@]}" # stdin
  fi
else
  grep "${options[@]}" # probably --help
fi
#--eof
Solution 9 - Bash
All the answers are correct. Just another idea: when you want to grep the output of a command (and not a file) and keep the first line, it can be done like this ;-)
df -h | grep -E '(^Filesystem|/mnt)' # <<< returns usage of devices, with mountpoint '/mnt/...'
ps aux | grep -E '(^USER|grep)' # <<< returns all grep-process
The -E option of grep enables extended regular expressions. The | in the pattern acts as an "or", so in the df example we look for lines:
- starting with Filesystem (the leading '^' in the first sub-expression means "line starts with")
- and lines that contain /mnt
Another way could be to pipe the output into a temp file and grep its contents, as shown in other answers. This can be helpful if you don't know the content of the first line.
head -1 <file> && grep ff <file>
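A minimal sketch of the temp-file variant, using a synthetic stand-in for the command output so the first line's content doesn't need to be known:

```shell
tmp=$(mktemp)
printf 'HEADER\nfoo\nbar foo\nbaz\n' > "$tmp"   # stand-in for some command's output

head -n 1 "$tmp"   # always print the first line
grep foo "$tmp"    # then the matching lines
rm -f "$tmp"
# HEADER
# foo
# bar foo
```

Note that, as with the sed solution above, the first line is printed twice if it happens to match the pattern.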