Find unique lines

Tags: linux, sorting, unique, uniq

Linux Problem Overview


How can I find the unique lines and remove all duplicates from a file? My input file is

1
1
2
3
5
5
7
7

I would like the result to be:

2
3

sort file | uniq will not do the job: it prints every value one time, duplicates included.

Linux Solutions


Solution 1 - Linux

uniq has the option you need:

   -u, --unique
          only print unique lines

$ cat file.txt
1
1
2
3
5
5
7
7
$ uniq -u file.txt
2
3
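Note that uniq -u only compares adjacent lines, so it gives the right answer here because file.txt is already sorted. For unsorted input, sort first; a quick sketch:

```shell
# uniq -u prints only the lines that are not repeated, but it compares
# adjacent lines only, so unsorted input must be sorted first.
printf '5\n1\n1\n2\n3\n5\n7\n7\n' | sort | uniq -u
# prints:
# 2
# 3
```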

Solution 2 - Linux

Use as follows:

sort < filea | uniq > fileb

Solution 3 - Linux

You can also print only the unique values in "file" using the cat command, piping to sort and uniq:

cat file | sort | uniq -u

Solution 4 - Linux

While sort takes O(n log n) time, I prefer using

awk '!seen[$0]++'

awk '!seen[$0]++' is an abbreviation for awk '!seen[$0]++ {print}': it prints the line ($0) if seen[$0] is zero, i.e. the first time that line is seen. It takes more memory but only O(n) time.
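Keep in mind that !seen[$0]++ keeps the first copy of every line, so it removes duplicates rather than printing only the lines that occur exactly once. To get the question's expected output without sorting, a two-pass awk variant (a sketch; it reads the file twice) stays O(n):

```shell
# First pass (NR==FNR) counts occurrences; second pass prints only the
# lines whose count is exactly 1, preserving the original order.
awk 'NR==FNR { count[$0]++; next } count[$0] == 1' file file
```

With the question's input this prints 2 and 3, in file order.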

Solution 5 - Linux

You can use:

sort data.txt | uniq -u

This sorts the data and filters out every line that appears more than once.

Solution 6 - Linux

uniq -u has been driving me crazy because it did not work.

So instead of that, if you have python (most Linux distros and servers already have it):

# Python
# Assuming the data file is notUnique.txt,
# with one value per line; otherwise adjust the parsing accordingly.

uniqueData = []
seen = set()
with open('notUnique.txt') as f:
    for line in f:
        line = line.strip()
        if line and line not in seen:  # skip blank lines and duplicates
            seen.add(line)
            uniqueData.append(line)

print(uniqueData)

Another option (fewer keystrokes):

set(open('notUnique.txt').read().split('\n'))

Note that due to empty lines, the final set may contain '' or whitespace-only strings. You can remove those later, or just get away with copying from the terminal ;)

Just FYI, from the uniq man page:

"Note: 'uniq' does not detect repeated lines unless they are adjacent. You may want to sort the input first, or use 'sort -u' without 'uniq'. Also, comparisons honor the rules specified by 'LC_COLLATE'."

One correct way to invoke it:

sort nonUnique.txt | uniq

Example run:

$ cat x
3
1
2
2
2
3
1
3

$ uniq x
3
1
2
3
1
3

$ uniq -u x
3
1
3
1
3

$ sort x | uniq
1
2
3

Spaces might be printed, so be prepared!

Solution 7 - Linux

I find this easier.

sort -u input_filename > output_filename

-u stands for unique.
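One caveat: sort -u keeps one copy of every line (it deduplicates), which is not the same as printing only the lines that occur exactly once. A quick comparison on the question's input:

```shell
# sort -u deduplicates: one copy of every line survives.
printf '1\n1\n2\n3\n5\n5\n7\n7\n' | sort -u
# prints: 1 2 3 5 7 (one per line)

# sort | uniq -u drops repeated lines entirely.
printf '1\n1\n2\n3\n5\n5\n7\n7\n' | sort | uniq -u
# prints: 2 3 (one per line)
```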

Solution 8 - Linux

uniq -u < file will do the job, provided the file is already sorted.

Solution 9 - Linux

uniq should do fine if your file is, or can be, sorted. If you can't sort the file for some reason, you can use awk:

awk '{a[$0]++} END {for (i in a) if (a[i] < 2) print i}'
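For example (note that the for (i in a) iteration order is unspecified, so pipe the result through sort if the output order matters):

```shell
# Counts every line, then prints the ones seen fewer than twice;
# works on unsorted input.
printf '3\n1\n2\n2\n3\n' | awk '{a[$0]++} END {for (i in a) if (a[i] < 2) print i}'
# prints: 1
```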

Solution 10 - Linux

sort -d "file name" | uniq -u

This worked for me on a similar problem. Use sort if the input is not already ordered; you can drop it if it is.

Solution 11 - Linux

This was the first thing I tried:

skilla:~# uniq -u all.sorted  

76679787
76679787 
76794979
76794979 
76869286
76869286 
......

After doing a cat -e all.sorted

skilla:~# cat -e all.sorted 
$
76679787$
76679787 $
76701427$
76701427$
76794979$
76794979 $
76869286$
76869286 $

Every second line has a trailing space :( After removing all trailing spaces it worked!

Thank you!

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | amprantino | View Question on Stackoverflow
Solution 1 - Linux | Lev Levitsky | View Answer on Stackoverflow
Solution 2 - Linux | kasavbere | View Answer on Stackoverflow
Solution 3 - Linux | octocatsup | View Answer on Stackoverflow
Solution 4 - Linux | hychou | View Answer on Stackoverflow
Solution 5 - Linux | blacker | View Answer on Stackoverflow
Solution 6 - Linux | ashmew2 | View Answer on Stackoverflow
Solution 7 - Linux | Anant Mittal | View Answer on Stackoverflow
Solution 8 - Linux | Shiplu Mokaddim | View Answer on Stackoverflow
Solution 9 - Linux | user4401178 | View Answer on Stackoverflow
Solution 10 - Linux | a_rookie_seeking_answers | View Answer on Stackoverflow
Solution 11 - Linux | amprantino | View Answer on Stackoverflow