How to count occurrences of a word in all the files of a directory?
Linux Problem Overview
I’m trying to count the occurrences of a particular word across a whole directory. Is this possible?
Say, for example, there is a directory with 100 files, any of which may contain the word “aaa”. How would I count the number of “aaa” in all the files under that directory?
I tried something like:
zegrep "xception" `find . -name '*auth*application*'` | wc -l
But it’s not working.
Linux Solutions
Solution 1 - Linux
grep -roh aaa . | wc -w
This greps recursively through all files and directories under the current directory for aaa and outputs only the matches (-o), not the entire line, without file name prefixes (-h). wc -w then counts how many words were printed.
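As a quick sanity check, here is a minimal sketch that runs the same pipeline against a throwaway directory. The file names and contents are made up for illustration, and it assumes a grep that supports -r, -o and -h (GNU and BSD grep both do).

```shell
# Hypothetical sample data in a temporary directory
dir=$(mktemp -d)
mkdir -p "$dir/sub"
printf 'aaa bbb aaa\n' > "$dir/one.txt"
printf 'ccc aaa\n'     > "$dir/sub/two.txt"

# -r: recurse, -o: print each match on its own line, -h: suppress file names
count=$(grep -roh aaa "$dir" | wc -w)
echo "$count"

rm -rf "$dir"
```

Because the pattern is a single word, wc -w and wc -l give the same number here; wc -l is the safer choice if the pattern could match text containing spaces.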
Solution 2 - Linux
Another solution based on find and grep.
find . -type f -exec grep -o aaa {} \; | wc -l
Should correctly handle filenames with spaces in them.
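A possible variation, sketched here with made-up sample files: ending -exec with + instead of \; lets find pass many file names to a single grep process. In that case -h is needed so that grep does not prefix each match with the file name (harmless for wc -l, but tidier).

```shell
# Hypothetical sample data, including a file name with spaces
dir=$(mktemp -d)
printf 'aaa aaa\n' > "$dir/name with spaces.txt"
printf 'aaa\n'     > "$dir/plain.txt"

# '+' batches file names into one grep invocation instead of one per file
total=$(find "$dir" -type f -exec grep -oh aaa {} + | wc -l)
echo "$total"

rm -rf "$dir"
```

Both forms should give the same count; the + form just starts fewer processes on large trees.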
Solution 3 - Linux
Use grep in its simplest way. Try grep --help for more info.
- To get the count of a word in a particular file (note that grep -c counts matching lines, not individual occurrences):
grep -c <word> <file_name>
Example:
grep -c 'aaa' abc_report.csv
Output:
445
- To get count of a word in the whole directory:
grep -c -R <word>
Example:
grep -c -R 'aaa'
Output:
abc_report.csv:445
lmn_report.csv:129
pqr_report.csv:445
my_folder/xyz_report.csv:408
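One caveat worth illustrating: grep -c reports the number of matching lines, so a line containing the word twice is counted once. The sketch below, using made-up sample files, contrasts that with counting every occurrence, and also shows one way to add up the per-file numbers that grep -c -R prints.

```shell
# Hypothetical sample data
dir=$(mktemp -d)
printf 'aaa aaa\naaa\nbbb\n' > "$dir/a.txt"
printf 'aaa\n'               > "$dir/b.txt"

lines=$(grep -c aaa "$dir/a.txt")            # number of matching lines
matches=$(grep -o aaa "$dir/a.txt" | wc -l)  # number of individual occurrences

# Sum the per-file line counts; the count is the last ':'-separated field
total_lines=$(grep -c -R aaa "$dir" | awk -F: '{ sum += $NF } END { print sum }')
echo "$lines $matches $total_lines"

rm -rf "$dir"
```

Here a.txt has two matching lines but three occurrences, so the two approaches disagree as soon as a word repeats on a line.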
Solution 4 - Linux
Let's use AWK!
$ function wordfrequency() { awk 'BEGIN { FS="[^a-zA-Z]+" } { for (i=1; i<=NF; i++) { word = tolower($i); words[word]++ } } END { for (w in words) printf("%3d %s\n", words[w], w) } ' | sort -rn; }
$ cat your_file.txt | wordfrequency
This lists the frequency of each word occurring in the provided file. If you want to see the occurrences of your word, you can just do this:
$ cat your_file.txt | wordfrequency | grep yourword
To find occurrences of your word across all files in a directory (non-recursively), you can do this:
$ cat * | wordfrequency | grep yourword
To find occurrences of your word across all files in a directory (and its subdirectories), you can do this:
$ find . -type f | xargs cat | wordfrequency | grep yourword
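Plain find | xargs splits file names on whitespace, so names containing spaces can break the recursive variant. A hedged sketch of a more robust form follows; it assumes find and xargs support -print0 and -0 (GNU and BSD versions do), and it swaps in a short tr pipeline for the wordfrequency function so the example is self-contained. The sample files are made up.

```shell
# Hypothetical sample data, including a file name with a space
dir=$(mktemp -d)
printf 'Foo bar foo\n' > "$dir/my file.txt"
printf 'foo\n'         > "$dir/other.txt"

# -print0 / -0 separate file names with NUL bytes, so spaces are safe.
# tr lowercases the text, then turns every run of non-letters into a newline;
# grep -cx counts the lines that are exactly the word.
n=$(find "$dir" -type f -print0 | xargs -0 cat \
      | tr 'A-Z' 'a-z' | tr -cs 'a-z' '\n' | grep -cx foo)
echo "$n"

rm -rf "$dir"
```

Using grep -x (or grep -w on the wordfrequency output) also avoids counting words that merely contain yourword as a substring.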
Source: AWK-ward Ruby
Solution 5 - Linux
find . -type f | xargs perl -p -e 's/ /\n/g' | grep aaa | wc -l
Solution 6 - Linux
cat the files together and grep the output:
cat $(find /usr/share/doc/ -name '*.txt') | zegrep -ic '\<exception\>'
If you want 'exceptional' to match as well, don't use the '\<' and '\>' around the word.
Solution 7 - Linux
How about starting with:
cat * | sed 's/ /\n/g' | grep '^aaa$' | wc -l
as in the following transcript:
pax$ cat file1
this is a file number 1
pax$ cat file2
And this file is file number 2,
a slightly larger file
pax$ cat file[12] | sed 's/ /\n/g' | grep 'file$' | wc -l
4
The sed converts spaces to newlines (you may want to include other space characters as well such as tabs, with sed 's/[ \t]/\n/g'). The grep just gets those lines that have the desired word, then the wc counts those lines for you.
Now there may be edge cases where this script doesn't work but it should be okay for the vast majority of situations.
If you wanted a whole tree (not just a single directory level), you can use something like:
( find . -name '*.txt' -exec cat {} ';' ) | sed 's/ /\n/g' | grep '^aaa$' | wc -l
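One of those edge cases is portability: \n in the replacement part of a sed s command works in GNU sed but is not portable (BSD/macOS sed emits a literal n). A rough alternative, not the method above but a swapped-in tr pipeline, is sketched below using the same two sample files as the transcript.

```shell
# Same sample data as in the transcript above
dir=$(mktemp -d)
printf 'this is a file number 1\n' > "$dir/file1"
printf 'And this file is file number 2,\na slightly larger file\n' > "$dir/file2"

# tr -s turns each run of whitespace (spaces, tabs, newlines) into one newline,
# giving one word per line; grep -cx counts lines that equal the word exactly
n=$(cat "$dir"/file[12] | tr -s '[:space:]' '\n' | grep -cx file)
echo "$n"

rm -rf "$dir"
```

This gives 4 for the transcript data, matching the sed version, and handles tabs without extra work. Punctuation still sticks to words ("2," is not "2"), so that edge case remains.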
Solution 8 - Linux
There's also a grep regex syntax for matching words only:
# based on Carlos Campderrós's solution posted in this thread
man grep | less -p '\<'
grep -roh '\<aaa\>' . | wc -l
For a different word matching regex syntax see:
man re_format | less -p '\[\[:<:\]\]'
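To see what the word-boundary anchors change, here is a small sketch on a made-up sample line. It assumes GNU grep (the \< and \> anchors and -w are extensions that most current greps support, but not all).

```shell
# Hypothetical sample data: two standalone "aaa", two embedded in longer words
dir=$(mktemp -d)
printf 'aaa aaab xaaa aaa\n' > "$dir/sample.txt"

substr=$(grep -roh 'aaa' "$dir" | wc -l)      # every substring match
whole=$(grep -roh '\<aaa\>' "$dir" | wc -l)   # whole-word matches only
with_w=$(grep -rohw 'aaa' "$dir" | wc -l)     # -w: shorthand for whole-word matching
echo "$substr $whole $with_w"

rm -rf "$dir"
```

The plain pattern also counts the aaa inside aaab and xaaa, while the anchored pattern and -w count only the standalone words.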