Sort CSV file by multiple columns using the "sort" command
CsvSortingUnixCsv Problem Overview
I have a CSV-like file, and I would like to sort it by column priority, like "ORDER BY" in SQL. For example, given the following rows,
3;1;2
1;3;2
1;2;3
2;3;1
2;1;3
3;2;1
If "ORDER BY" were column2, column1, column3
, the result would be:
2;1;3
3;1;2
1;2;3
3;2;1
1;3;2
2;3;1
I'd like to know how to get this same result using the sort
command on Unix.
Csv Solutions
Solution 1 - Csv
You need to use two options for the sort
command:
--field-separator
(or-t
)--key=<start,end>
(or-k
), to specify the sort key, i.e. which range of columns (start through end index) to sort by. Since you want to sort on 3 columns, you'll need to specify-k
3 times, for columns2,2
,1,1
, and3,3
.
To put it all together,
sort -t ';' -k 2,2 -k 1,1 -k 3,3
Note that sort
can't handle the situation in which fields contain the separator, even if it's escaped or quoted.
Also note: this is an old question, which belongs on UNIX.SE, and was also asked there a year later.
Old answer: depending on your system's version of sort
, the following might also work:
sort --field-separator=';' --key=2,1,3
Or, you might get "stray character in field spec".
According to the sort manual, if you don't specify the end column of the sort key, it defaults to the end of the line.
Solution 2 - Csv
Suppose you have another row 3;10;3
in your unsorted.csv
file. Then I guess you expect a numerically sorted result:
2;1;3
3;1;2
1;2;3
3;2;1
1;3;2
2;3;1
3;10;3
and not an alphabetically sorted one:
2;1;3
3;1;2
3;10;3
1;2;3
3;2;1
1;3;2
2;3;1
To get that, you have to use -n
:
sort --field-separator=';' -n -k 2,2 -k 1,1 -k 3,3 unsorted.csv
It is worth mentioning that 2,2
has to be used. If only 2
is used, then sort
takes the string from beginning of field 2 to the end. 2,2
makes sure that only field 2
is used.
Solution 3 - Csv
Charlie's answer above didn't work for me on Cygwin (sort version 2.0, GNU textutils), the following did:
sort -t"," -k2 -k1 -k1
Solution 4 - Csv
..and if anyone followed the 'sort' solution but now wants to get more than the single unique entry per line (i.e. the top X number of unique entries), once you've sorted the file using 'sort', you can use a little app I created here:
https://github.com/danieliversen/MiscStuff/blob/master/scripts/findTopUniques.java