Count all occurrences of a string in lots of files with grep
Grep Problem Overview
I have a bunch of log files. I need to find out how many times a string occurs in all files.
grep -c string *
returns
...
file1:1
file2:0
file3:0
...
Using a pipe I was able to get only files that have one or more occurrences:
grep -c string * | grep -v :0
...
file4:5
file5:1
file6:2
...
How can I get only the combined count? (If it returns file4:5, file5:1, file6:2, I want to get back 8.)
Grep Solutions
Solution 1 - Grep
This works for multiple occurrences per line:
grep -o string * | wc -l
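As a rough sketch of the command above, the following wraps it in a small function and runs it on throwaway files that mirror the counts from the question (5, 1 and 2). The function name count_occurrences, the temporary directory and the file contents are assumptions made for illustration; -F (fixed string) and -- (end of options) are additions that are not in the original command.

```shell
#!/bin/sh
# count_occurrences is a hypothetical helper name, not part of grep.
# -o prints each match on its own line, -F treats the pattern as a fixed
# string, and -- stops option parsing so patterns starting with "-" are safe.
count_occurrences() {
    pattern=$1
    shift
    # tr strips the padding that some wc implementations add.
    grep -oF -- "$pattern" "$@" | wc -l | tr -d ' '
}

# Demo data in a throwaway directory (file names and contents are made up).
demo_dir=$(mktemp -d)
printf 'string string\nstring\nstring string\n' > "$demo_dir/file4"   # 5 matches
printf 'no match here\none string\n' > "$demo_dir/file5"              # 1 match
printf 'string and string\n' > "$demo_dir/file6"                      # 2 matches

total=$(count_occurrences string "$demo_dir"/*)
echo "total occurrences: $total"   # prints: total occurrences: 8
rm -rf "$demo_dir"
```

With several files grep prefixes each match with its file name, but every match still occupies exactly one line, so wc -l gives the combined count.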
Solution 2 - Grep
cat * | grep -c string
Solution 3 - Grep
grep -oh string * | wc -w
This counts multiple occurrences on a line. Because wc -w counts words, it is only accurate when the matched string contains no whitespace.
Solution 4 - Grep
Instead of using -c, just pipe it to wc -l.
grep string * | wc -l
This will print each matching line and then count the number of lines.
This will miss instances where the string occurs 2+ times on one line, though.
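To make the caveat concrete, here is a small sketch that runs both approaches on the same made-up sample file. The file contents and variable names are assumptions for illustration only.

```shell
#!/bin/sh
# Made-up sample: one line with two occurrences, one line with one, one with none.
sample=$(mktemp)
printf 'string and string again\nonly one string here\nnothing\n' > "$sample"

# Counts matching lines: a line with two matches is counted once.
line_count=$(grep string "$sample" | wc -l | tr -d ' ')
# Counts every match, because -o prints each match on its own line.
match_count=$(grep -o string "$sample" | wc -l | tr -d ' ')

echo "matching lines: $line_count"    # prints: matching lines: 2
echo "occurrences:    $match_count"   # prints: occurrences:    3
rm -f "$sample"
```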
Solution 5 - Grep
cat * | grep -c string
One of the rare useful applications of cat.
Solution 6 - Grep
You can add -R to search recursively (and avoid using cat) and -I to ignore binary files.
grep -RIc string .
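Note that -c prints one count per file rather than a single total. If the combined count is what you want, one possible variation (an assumption on my part, not part of the answer above) is to swap -c for -o and -h and count the output lines. The directory tree below is made up for the demonstration.

```shell
#!/bin/sh
# Throwaway tree with made-up names, used only to illustrate the command.
root=$(mktemp -d)
mkdir -p "$root/sub/deeper"
printf 'string string\n' > "$root/a.log"                          # 2 matches
printf 'string\nnope\n' > "$root/sub/b.log"                       # 1 match
printf 'string here, string there\n' > "$root/sub/deeper/c.log"   # 2 matches

# -R recurse, -I skip binary files, -o one match per line, -h no file names.
recursive_total=$(grep -RIoh string "$root" | wc -l | tr -d ' ')
echo "occurrences under the tree: $recursive_total"   # prints 5
rm -rf "$root"
```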
Solution 7 - Grep
Something different from all the previous answers:
perl -lne '$count++ for m/<pattern>/g;END{print $count}' *
Solution 8 - Grep
Obligatory AWK solution:
grep -c string * | awk 'BEGIN{FS=":"}{x+=$2}END{print x}'
Take care if your file names include ":" though.
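The colon caveat can be seen in a short sketch. The file names below are invented, and one of them contains a colon on purpose. Summing the last field ($NF) instead of $2 is one possible workaround, offered here as an assumption rather than as part of the answer above.

```shell
#!/bin/sh
# Made-up file names; one of them contains a colon on purpose.
work=$(mktemp -d)
printf 'string\nstring\n' > "$work/plain.log"              # 2 matching lines
printf 'string\nstring\nstring\n' > "$work/host:8080.log"  # 3 matching lines

# grep -c prints "host:8080.log:3", so $2 is "8080.log", which awk reads as 8080.
naive_sum=$(cd "$work" && grep -c string * | awk 'BEGIN{FS=":"}{x+=$2}END{print x+0}')
# The count is always the last colon-separated field, so $NF is safe.
safe_sum=$(cd "$work" && grep -c string * | awk 'BEGIN{FS=":"}{x+=$NF}END{print x+0}')

echo "summing \$2:  $naive_sum"   # prints 8082
echo "summing \$NF: $safe_sum"    # prints 5
rm -rf "$work"
```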
Solution 9 - Grep
If you want number of occurrences per file (example for string "tcp"):
grep -RIci "tcp" . | awk -v FS=":" -v OFS="\t" '$2>0 { print $2, $1 }' | sort -hr
Example output:
53 ./HTTPClient/src/HTTPClient.cpp
21 ./WiFi/src/WiFiSTA.cpp
19 ./WiFi/src/ETH.cpp
13 ./WiFi/src/WiFiAP.cpp
4 ./WiFi/src/WiFiClient.cpp
4 ./HTTPClient/src/HTTPClient.h
3 ./WiFi/src/WiFiGeneric.cpp
2 ./WiFi/examples/WiFiClientBasic/WiFiClientBasic.ino
2 ./WiFiClientSecure/src/ssl_client.cpp
1 ./WiFi/src/WiFiServer.cpp
Explanation:
grep -RIci NEEDLE . - looks for the string NEEDLE recursively from the current directory (following symlinks), ignoring binaries and case, and counting the matching lines in each file
awk ... - ignores files with zero matches and reformats each line as count, tab, file name
sort -hr - sorts lines in reverse order by the numbers in the first column
Of course, it works with other grep commands with option -c (count) as well. For example:
grep -c "tcp" *.txt | awk -v FS=":" -v OFS="\t" '$2>0 { print $2, $1 }' | sort -hr
Solution 10 - Grep
The AWK solution which also handles file names including colons:
grep -c string * | sed 's/^.*://' | awk '{x+=$1}END{print x}'
Keep in mind that this method still does not find multiple occurrences of string on the same line.
Solution 11 - Grep
You can use a simple grep to get the per-file counts. I will use the -i option to make sure STRING/StrING/string all get captured. Note that -c counts matching lines even when combined with -o, so several matches on one line count once.
Command that prints the file names:
grep -oci string * | grep -v :0
Command line that removes the file names and prints 0 if there is a file without occurrences:
grep -ochi string *
Solution 12 - Grep
Short recursive variant:
find . -type f -exec cat {} + | grep -c 'string'
Solution 13 - Grep
Here is a faster-than-grep AWK alternative way of doing this, which handles multiple matches of <url> per line, within a collection of XML files in a directory:
awk '/<url>/{m=gsub("<url>","");total+=m}END{print total}' some_directory/*.xml
This works well in cases where some XML files don't have line breaks.
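As a small illustration of the no-line-break case, the sketch below runs the same awk program on two invented XML files, one of which is a single line with no trailing newline. The directory, file names and contents are assumptions for the demo.

```shell
#!/bin/sh
# Made-up XML files: the first has no line breaks at all.
xml_dir=$(mktemp -d)
printf '<urlset><url>a</url><url>b</url><url>c</url></urlset>' > "$xml_dir/one_line.xml"
printf '<urlset>\n<url>d</url>\n</urlset>\n' > "$xml_dir/multi_line.xml"

# gsub returns the number of substitutions it made, which is the number of
# <url> matches on the current line; the running sum is printed at the end.
url_total=$(awk '/<url>/{m=gsub("<url>","");total+=m}END{print total}' "$xml_dir"/*.xml)
echo "url tags: $url_total"   # prints: url tags: 4
rm -rf "$xml_dir"
```

Neither </url> nor <urlset> matches the pattern <url>, so only the opening tags are counted.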
Solution 14 - Grep
Grep only solution which I tested with grep for windows:
grep -ro "pattern to find in files" "Directory to recursively search" | grep -c "pattern to find in files"
This solution will count all occurrences even if there are multiple on one line. -r recursively searches the directory and -o will "show only the part of a line matching PATTERN". That is what splits up multiple occurrences on a single line and makes grep print each match on a new line. Those newline-separated results are then piped back into grep with -c to count the number of occurrences using the same pattern.
Solution 15 - Grep
Another one-liner using basic command line tools that handles multiple occurrences per line.
cat * | sed 's/string/\nstring /g' | grep string | wc -l
Solution 16 - Grep
awk -v RS='' -v FPAT='fast' '{print NF,FILENAME}' <file1..N>
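This needs GNU awk, since FPAT is a gawk extension. RS='' reads each blank-line-separated block as one record, and FPAT='fast' defines a field as one occurrence of fast, so NF is the number of occurrences in that record. It is printed together with the file name.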