Counting duplicates in a sorted sequence using command-line tools

Bash, Command Line, Sorting, Count, Duplicates

Bash Problem Overview


I have a command (cmd1) that greps through a log file to filter out a set of numbers. The numbers are in random order, so I use sort -gr to get a reverse sorted list of numbers. There may be duplicates within this sorted list. I need to find the count for each unique number in that list.

For example, if the output of cmd1 is:

100 
100 
100 
99 
99 
26 
25 
24 
24

I need another command that I can pipe the above output into, so that I get:

100     3
99      2
26      1
25      1
24      2

Bash Solutions


Solution 1 - Bash

How about:

$ echo "100 100 100 99 99 26 25 24 24" \
    | tr " " "\n" \
    | sort \
    | uniq -c \
    | sort -k2nr \
    | awk '{printf("%s\t%s\n",$2,$1)}'

The result is:

100	3
99	2
26	1
25	1
24	2

Solution 2 - Bash

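uniq -c does most of what you want, assuming sorted input: it prefixes each distinct line with the number of times it occurred. The option is specified by POSIX, and it works with GNU uniq 8.23 at least. Note that the count is printed first, so the columns come out in the opposite order to the one requested.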
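To illustrate why sorted input matters, here is a minimal sketch using a made-up three-line input (not the asker's real cmd1). uniq only merges adjacent equal lines, so the same data gives different counts before and after sorting. The exact padding in front of the count varies between uniq implementations, so no literal output is shown.

```shell
# Unsorted input: the two 100s are not adjacent, so uniq reports them separately.
printf '%s\n' 100 99 100 | uniq -c

# Sorted first: each distinct value appears once, with its total count.
printf '%s\n' 100 99 100 | sort -nr | uniq -c
```

The first pipeline should print three lines (each with a count of 1), the second two lines (100 with a count of 2, then 99 with a count of 1).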

Solution 3 - Bash

If order is not important:

$ echo "100 100 100 99 99 26 25 24 24" | awk '{for(i=1;i<=NF;i++)a[$i]++}END{for(o in a) printf "%s %s ",o,a[o]}'
26 1 100 3 99 2 24 2 25 1
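The one-liner above reads all values from a single line and prints everything on one line. As a rough variant, assuming the input arrives one number per line (as it would from the asker's pipeline), the same awk counting idea can print one pair per line. The function name count_unordered is made up for this sketch, and the trailing sort is only needed if the order matters after all.

```shell
# Hypothetical helper: count each distinct first field read from stdin.
# awk's "for (v in count)" visits keys in an unspecified order.
count_unordered() {
    awk '{ count[$1]++ } END { for (v in count) print v "\t" count[v] }'
}

# Sample data standing in for the output of cmd1; sort restores descending order.
printf '%s\n' 100 100 100 99 99 26 25 24 24 | count_unordered | sort -k1,1nr
```

This approach does not need the input to be sorted at all, since awk keeps a running count per value.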

Solution 4 - Bash

Numerically sort the numbers in reverse, then count the duplicates, then swap the left and the right words. Align into columns.

printf '%d\n' 100 99 26 25 100 24 100 24 99 \
   | sort -nr | uniq -c | awk '{printf "%-8s%s\n", $2, $1}'

100     3
99      2
26      1
25      1
24      2

Solution 5 - Bash

In Bash, we can use an associative array to count instances of each input value. Assuming the command is stored in the variable cmd1, e.g.:

#!/bin/bash

cmd1='printf %d\n 100 99 26 25 100 24 100 24 99'
# An associative array must be declared explicitly (needs Bash 4.0 or later)
declare -A a

Then we can count values in the array variable a using the ++ arithmetic operator on the relevant array entries:

while read -r i
do
    ((++a["$i"]))
done < <($cmd1)

We can print the resulting values:

for i in "${!a[@]}"
do
    echo "$i ${a[$i]}"
done

If the order of output is important, we might need an external sort of the keys:

for i in $(printf '%s\n' "${!a[@]}" | sort -nr)
do
    echo "$i ${a[$i]}"
done
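Putting the pieces together, the following is a minimal sketch of the same idea as a single function that reads from standard input, so it can sit at the end of a pipeline. It assumes Bash 4.0 or later for associative arrays. The names count_values, counts, and value are made up here, and the printf line merely stands in for the real cmd1.

```shell
#!/bin/bash

# Hypothetical helper: count each distinct line on stdin,
# then print "value<TAB>count" in descending numeric order.
count_values() {
    local -A counts=()
    local value
    while IFS= read -r value; do
        [[ -n $value ]] || continue                        # skip empty lines
        counts[$value]=$(( ${counts[$value]:-0} + 1 ))     # start from 0 if unset
    done
    for value in "${!counts[@]}"; do
        printf '%s\t%s\n' "$value" "${counts[$value]}"
    done | sort -k1,1nr                                    # keys come out unordered
}

# Sample data standing in for the output of cmd1.
printf '%s\n' 100 99 26 25 100 24 100 24 99 | count_values
```

Sorting the finished pairs, rather than the keys, avoids relying on word splitting of a command substitution in the for loop.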

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | letronje | View Question on Stackoverflow
Solution 1 - Bash | Stephen Paul Lesniewski | View Answer on Stackoverflow
Solution 2 - Bash | Ibrahim | View Answer on Stackoverflow
Solution 3 - Bash | ghostdog74 | View Answer on Stackoverflow
Solution 4 - Bash | ericcurtin | View Answer on Stackoverflow
Solution 5 - Bash | Toby Speight | View Answer on Stackoverflow