Shell command to find lines common in two files


Shell Problem Overview


I'm sure I once found a shell command which could print the common lines from two or more files. What is its name?

It was much simpler than diff.

Shell Solutions


Solution 1 - Shell

The command you are seeking is comm, e.g.:

comm -12 1.sorted.txt 2.sorted.txt

Here:

-1 : suppress column 1 (lines unique to 1.sorted.txt)

-2 : suppress column 2 (lines unique to 2.sorted.txt)

This leaves only column 3, the lines common to both files. Note that comm requires its inputs to be sorted, hence the .sorted.txt names.
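A minimal demonstration (the fruit files here are made up for illustration; both are already in sorted order):

```shell
# Two small, already-sorted input files
printf 'apple\nbanana\ncherry\n' > fruits1.txt
printf 'banana\ncherry\ndate\n'  > fruits2.txt

# With no options, comm prints three columns:
# unique to file 1, unique to file 2, common to both
comm fruits1.txt fruits2.txt

# -12 suppresses the first two columns, leaving only the common lines
comm -12 fruits1.txt fruits2.txt
# banana
# cherry
```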

Solution 2 - Shell

To easily apply the comm command to unsorted files, use Bash's process substitution:

$ bash --version
GNU bash, version 3.2.51(1)-release
Copyright (C) 2007 Free Software Foundation, Inc.
$ cat > abc
123
567
132
$ cat > def
132
777
321

So the files abc and def have one line in common, the one with "132". Using comm on unsorted files:

$ comm abc def
123
    132
567
132
    777
    321
$ comm -12 abc def # No output! The common line is not found
$

The last command produced no output; the common line was not found.

Now use comm on sorted files, sorting the files with process substitution:

$ comm <( sort abc ) <( sort def )
123
            132
    321
567
    777
$ comm -12 <( sort abc ) <( sort def )
132

Now we get the 132 line!

Solution 3 - Shell

To complement the Perl one-liner (Solution 7), here's its awk equivalent:

awk 'NR==FNR{arr[$0];next} $0 in arr' file1 file2

This reads all lines from file1 into the array arr[], and then checks, for each line of file2, whether it already exists in the array (i.e. in file1). The lines that are found are printed in the order in which they appear in file2. Note that the comparison in arr uses the entire line from file2 as the array index, so it only reports exact matches on entire lines.
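Running it on throwaway files (reusing the sample data from Solution 2) reproduces the behaviour described above; unlike comm, the inputs do not need to be sorted:

```shell
printf '123\n567\n132\n' > file1
printf '132\n777\n321\n' > file2

# NR==FNR is true only while the first file is being read, so every
# line of file1 is stored as a key in arr; for file2, a line is
# printed when it is already a key in arr.
awk 'NR==FNR{arr[$0];next} $0 in arr' file1 file2
# 132
```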

Solution 4 - Shell

Maybe you mean comm ?

> Compare sorted files FILE1 and FILE2 line by line.
>
> With no options, produce three-column output. Column one contains lines unique to FILE1, column two contains lines unique to FILE2, and column three contains lines common to both files.

The secret to finding this information is the info pages. For GNU programs, they are much more detailed than the man pages. Try info coreutils and it will list all the small useful utilities.

Solution 5 - Shell

While

fgrep -v -f 1.txt 2.txt > 3.txt

gives you the differences of two files (what is in 2.txt and not in 1.txt), you could easily do a

fgrep -f 1.txt 2.txt > 3.txt

to collect all common lines, which should provide an easy solution to your problem. If your files are sorted, though, you should still prefer comm.

Note: You can use grep -F instead of fgrep.
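One caveat worth knowing: -f treats each line of 1.txt as a pattern that may match anywhere inside a line of 2.txt, so a short pattern can match a longer line. Adding -x restricts grep to whole-line matches (the sample files below are made up):

```shell
printf '13\n567\n' > 1.txt
printf '132\n567\n' > 2.txt

# Substring matching: pattern "13" matches inside the line "132"
grep -F -f 1.txt 2.txt
# 132
# 567

# -x requires the whole line to match the pattern
grep -Fx -f 1.txt 2.txt
# 567
```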

Solution 6 - Shell

If the two files are not sorted yet, you can use:

comm -12 <(sort a.txt) <(sort b.txt)

and it will work, avoiding the error message comm: file 2 is not in sorted order when doing comm -12 a.txt b.txt.

Solution 7 - Shell

perl -ne 'print if ($seen{$_} .= @ARGV) =~ /10$/'  file1 file2

Solution 8 - Shell

awk 'NR==FNR{a[$1]++;next} a[$1]' file1 file2

Note that, unlike Solution 3, this variant compares only the first whitespace-separated field of each line, not the whole line.

Solution 9 - Shell

On limited version of Linux (like a QNAP (NAS) I was working on):

  • comm did not exist
  • grep -f file1 file2 can cause some problems, as noted by @ChristopherSchultz, and grep -F -f file1 file2 was really slow: more than 5 minutes without finishing on files over 20 MB, versus 2-3 seconds with the method below

So here is what I did:

sort file1 > file1.sorted
sort file2 > file2.sorted

diff file1.sorted file2.sorted | grep "<" | sed 's/^< *//' > files.diff
diff file1.sorted files.diff | grep "<" | sed 's/^< *//' > files.same.sorted

The first diff extracts the lines that are only in file1; the second removes those lines from file1 again, leaving exactly the lines common to both files.

If files.same.sorted should be in the same order as the original files, add this line for the same order as file1:

awk 'FNR==NR {a[$0]=$0; next}; $0 in a {print a[$0]}' files.same.sorted file1 > files.same

Or, for the same order as file2:

awk 'FNR==NR {a[$0]=$0; next}; $0 in a {print a[$0]}' files.same.sorted file2 > files.same

Solution 10 - Shell

For how to do this for multiple files, see the linked answer to Finding matching lines across many files.


Combining these two answers, I think you can get the result you need without sorting the files:

#!/bin/bash
ans="matching_lines"

for file1 in *
do
    for file2 in *
    do
        if [ "$file1" != "$ans" ] && [ "$file2" != "$ans" ] && [ "$file1" != "$file2" ]; then
            echo "Comparing: $file1 $file2 ..." >> "$ans"
            perl -ne 'print if ($seen{$_} .= @ARGV) =~ /10$/' "$file1" "$file2" >> "$ans"
        fi
    done
done

Simply save it, make it executable (chmod +x compareFiles.sh), and run it. It will take all the files in the current working directory and perform an all-vs-all comparison, leaving the result in the "matching_lines" file.

Things to be improved:

  • Skip directories

  • Avoid comparing all the files two times (file1 vs file2 and file2 vs file1).

  • Maybe add the line number next to the matching string

Solution 11 - Shell

Not exactly what you were asking, but something that may still be useful for a slightly different scenario.

If you just want to check quickly whether there is any repeated line across a bunch of files, you can use this quick solution:

cat a_bunch_of_files* | sort | uniq | wc -l

If the number of lines you get is less than the one you get from

cat a_bunch_of_files* | wc -l
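For example (the file names are made up), using wc -l so the line counts are directly comparable:

```shell
printf 'a\nb\n' > f1.txt
printf 'b\nc\n' > f2.txt

cat f1.txt f2.txt | wc -l                # 4 lines in total
cat f1.txt f2.txt | sort | uniq | wc -l  # 3 distinct lines: "b" is repeated
```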

Solution 12 - Shell

rm -f file3.out

while read -r line1
do
        while read -r line2
        do
                if [[ $line1 == "$line2" ]]; then
                        echo "$line1" >> file3.out
                fi
        done < file2.out
done < file1.out

This should do it, although note that it re-reads file2.out once per line of file1.out, so it is slow for large files.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | too much php | View Question on Stackoverflow
Solution 1 - Shell | Jonathan Leffler | View Answer on Stackoverflow
Solution 2 - Shell | Stephan Wehner | View Answer on Stackoverflow
Solution 3 - Shell | Tatjana Heuser | View Answer on Stackoverflow
Solution 4 - Shell | Johannes Schaub - litb | View Answer on Stackoverflow
Solution 5 - Shell | ferdy | View Answer on Stackoverflow
Solution 6 - Shell | Basj | View Answer on Stackoverflow
Solution 7 - Shell | user2592005 | View Answer on Stackoverflow
Solution 8 - Shell | R S John | View Answer on Stackoverflow
Solution 9 - Shell | Master DJon | View Answer on Stackoverflow
Solution 10 - Shell | akarpovsky | View Answer on Stackoverflow
Solution 11 - Shell | Kiteloopdesign | View Answer on Stackoverflow
Solution 12 - Shell | Alan Joseph | View Answer on Stackoverflow