Always include first line in grep
Bash Problem Overview
I often grep CSV files with column names on the first line. Therefore, I want the output of grep to always include the first line (to get the column names) as well as any lines matching the grep pattern. What is the best way to do this?
Bash Solutions
Solution 1 - Bash
sed:
sed '1p;/pattern/!d' input.txt
awk:
awk 'NR==1 || /pattern/' input.txt
grep1:
grep1() { awk -v pattern="${1:?pattern is empty}" 'NR==1 || $0~pattern' "${2:-/dev/stdin}"; }
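A quick sanity check of the `grep1` helper, redefined here so the snippet is self-contained (the CSV data is made up):

```shell
# Same helper as above
grep1() { awk -v pattern="${1:?pattern is empty}" 'NR==1 || $0~pattern' "${2:-/dev/stdin}"; }

# Hypothetical CSV with a header row
printf 'id,name,city\n1,alice,berlin\n2,bob,paris\n' > /tmp/people.csv

grep1 paris /tmp/people.csv
# id,name,city
# 2,bob,paris
```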
Solution 2 - Bash
grep can't select lines by line number, but awk can, so here's an example that outputs lines containing "Incoming" - plus the first line, whatever it is:
awk 'NR == 1 || /Incoming/' foo.csv
You could make a script (a bit excessive, but it works). I made a file, grep+1, and put this in it:
#!/bin/sh
pattern="$1" ; shift
exec awk 'NR == 1 || /'"$pattern"'/' "$@"
Now one can:
./grep+1 Incoming foo.csv
edit: removed the "{print;}", which is awk's default action.
Solution 3 - Bash
You could include an alternate pattern that matches one of the column names. If a column were called COL then this would work:
$ grep -E 'COL|pattern' file.csv
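One caveat: COL could also match somewhere inside a data row. Anchoring the header alternative to the start of the line reduces that risk (the sample file is made up):

```shell
printf 'COL,other\nx,1\nrow with COL inside,2\ny match,3\n' > /tmp/c.csv

# '^COL' only matches at the start of a line, so the row with
# "COL" in the middle is not accidentally printed
grep -E '^COL|match' /tmp/c.csv
# COL,other
# y match,3
```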
Solution 4 - Bash
You can use sed instead of grep to do this:
sed -n -e '1p' -e '/pattern/p' < "$FILE"
Note that this will print the first line twice if it happens to contain the pattern.
-n tells sed not to print lines by default.
-e '1p' prints the first line.
-e '/pattern/p' prints each line that matches the pattern.
Solution 5 - Bash
Another option:
$ cat data.csv | (read line; echo "$line"; grep SEARCH_TERM)
Example:
$ printf 'title\nvalue1\nvalue2\nvalue3\n' | (read line; echo "$line"; grep value2)
Output:
title
value2
Solution 6 - Bash
This is a very general solution, for example if you want to sort a file while keeping the first line in place. Basically, "pass the first line through as-is, then do whatever I want (awk/grep/sort/whatever) on the rest of the data."
Try this in a script, perhaps calling it keepfirstline (don't forget chmod +x keepfirstline and to put it in your PATH):
#!/bin/bash
IFS='' read -r JUST1LINE
printf "%s\n" "$JUST1LINE"
exec "$@"
It can be used as follows:
cat your.data.csv | keepfirstline grep SearchTerm > results.with.header.csv
or perhaps, if you want to filter with awk
cat your.data.csv | keepfirstline awk '$1 < 3' > results.with.header.csv
I often like to sort a file, but keeping the header in the first line
cat your.data.csv | keepfirstline sort
keepfirstline executes the command it's given (grep SearchTerm), but only after reading and printing the first line.
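The same idea can be tried inline as a shell function (the `exec` is dropped here so the function doesn't replace the current shell; the data is made up):

```shell
# Function version of the script above, for a quick demonstration
keepfirstline() { IFS='' read -r first; printf '%s\n' "$first"; "$@"; }

printf 'name\ncarol\nalice\nbob\n' | keepfirstline sort
# name
# alice
# bob
# carol
```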
Solution 7 - Bash
Just do
head -1 <filename>
and then run grep on the same file.
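Spelled out as a single command group, and skipping the header in the grep step so it can't be printed twice (file name and pattern are made up):

```shell
printf 'col1,col2\nfoo,1\nbar,2\n' > /tmp/sample.csv

# Print the header, then grep everything after line 1
{ head -n 1 /tmp/sample.csv; tail -n +2 /tmp/sample.csv | grep bar; }
# col1,col2
# bar,2
```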
Solution 8 - Bash
So, I posted a completely different short answer above a while back.
However, for those pining for a command that behaves like grep, taking the same options (this script requires the long --option=value form when an option takes an argument) and coping with odd characters in filenames: have fun pulling this apart.
Essentially it's a grep that always emits the first line. If you think a file with no matching lines should skip emitting that first (header) line, that's left as an exercise for the reader. I saved it as grep+1.
#!/bin/bash
# grep+1 [<option>...] [<regex>] [<file>...]
# Emits the first line of each input and ignores it otherwise.
# For grep options that have optargs, only the --forms will work here.
declare -a files options
regex_seen=false
regex=
double_dash_seen=false
for arg in "$@" ; do
  is_file_or_rx=true
  case "$arg" in
    --) double_dash_seen=true ; continue ;;
    -*) is_file_or_rx=$double_dash_seen ;;
  esac
  if $is_file_or_rx ; then
    if ! $regex_seen ; then
      regex="$arg"
      regex_seen=true
    else
      files+=("$arg")   # append to the file list
    fi
  else
    options+=("$arg")   # append to the option list
  fi
done
# We could open all the files at once in the shell and pass the handles into
# one grep call, but that would limit how many we can process to the fd limit.
# So instead, here's the simpler approach with a series of grep calls.
if $regex_seen ; then
  if [ ${#files[@]} -gt 0 ] ; then
    for file in "${files[@]}" ; do
      head -n 1 "$file"
      tail -n +2 "$file" | grep --label="$file" "${options[@]}" "$regex"
    done
  else
    grep "${options[@]}" # stdin
  fi
else
  grep "${options[@]}" # probably --help
fi
#--eof
Solution 9 - Bash
All the answers are correct. Just another idea: when you want to grep the output of a command (and not a file) and keep the first line, it can be done like this ;-)
df -h | grep -E '(^Filesystem|/mnt)' # <<< returns usage of devices, with mountpoint '/mnt/...'
ps aux | grep -E '(^USER|grep)' # <<< returns all grep-process
The -E option of grep enables extended regular expressions. The | in the pattern acts as an "or", so in the df example we look for lines:
- starting with Filesystem (the leading '^' in the first sub-expression means "line starts with")
- and lines that contain /mnt
Another way could be to pipe the output into a temp file and grep its contents, as shown in other answers. This can be helpful if you don't know the content of the first line.
head -1 <file> && grep ff <file>
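A minimal sketch of the temp-file variant, using a synthetic stand-in for the command output so the first line's content doesn't need to be known:

```shell
tmp=$(mktemp)
printf 'HEADER\nfoo\nbar foo\nbaz\n' > "$tmp"   # stand-in for some command's output

head -n 1 "$tmp"   # always print the first line
grep foo "$tmp"    # then the matching lines
rm -f "$tmp"
# HEADER
# foo
# bar foo
```

Note that, as with the sed solution above, the first line is printed twice if it happens to match the pattern.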