An efficient way to transpose a file in Bash


Bash Problem Overview

I have a huge tab-separated file formatted like this

X column1 column2 column3
row1 0 1 2
row2 3 4 5
row3 6 7 8
row4 9 10 11

I would like to transpose it in an efficient way using only bash commands (I could write a Perl script of ten or so lines to do that, but it would probably be slower to execute than native bash functions). So the output should look like

X row1 row2 row3 row4
column1 0 3 6 9
column2 1 4 7 10
column3 2 5 8 11

I thought of a solution like this

cols=`head -n 1 input | wc -w`
for (( i=1; i <= $cols; i++))
do cut -f $i input | tr $'\n' $'\t' | sed -e "s/\t$/\n/g" >> output
done

But it's slow and doesn't seem the most efficient solution. I've seen a solution for vi in this post, but it's still too slow. Any thoughts/suggestions/brilliant ideas? :-)

Bash Solutions

Solution 1 - Bash

awk '
{
    for (i=1; i<=NF; i++)  {
        a[NR,i] = $i
    }
}
NF>p { p = NF }
END {
    for(j=1; j<=p; j++) {
        str=a[1,j]
        for(i=2; i<=NR; i++){
            str=str" "a[i,j];
        }
        print str
    }
}' file


$ more file
0 1 2
3 4 5
6 7 8
9 10 11

$ awk -f test.awk file
0 3 6 9
1 4 7 10
2 5 8 11

Performance against Perl solution by Jonathan on a 10,000-line file

$ head -5 file
1 0 1 2
2 3 4 5
3 6 7 8
4 9 10 11
1 0 1 2

$  wc -l < file
10000

$ time perl file >/dev/null

real    0m0.480s
user    0m0.442s
sys     0m0.026s

$ time awk -f test.awk file >/dev/null

real    0m0.382s
user    0m0.367s
sys     0m0.011s

$ time perl file >/dev/null

real    0m0.481s
user    0m0.431s
sys     0m0.022s

$ time awk -f test.awk file >/dev/null

real    0m0.390s
user    0m0.370s
sys     0m0.010s

EDIT by Ed Morton (@ghostdog74 feel free to delete if you disapprove).

Maybe this version with some more explicit variable names will help answer some of the questions below and generally clarify what the script is doing. It also uses tabs as the separator which the OP had originally asked for so it'd handle empty fields and it coincidentally pretties-up the output a bit for this particular case.

$ cat tst.awk
BEGIN { FS=OFS="\t" }
{
    for (rowNr=1;rowNr<=NF;rowNr++) {
        cell[rowNr,NR] = $rowNr
    }
    maxRows = (NF > maxRows ? NF : maxRows)
    maxCols = NR
}
END {
    for (rowNr=1;rowNr<=maxRows;rowNr++) {
        for (colNr=1;colNr<=maxCols;colNr++) {
            printf "%s%s", cell[rowNr,colNr], (colNr < maxCols ? OFS : ORS)
        }
    }
}

$ awk -f tst.awk file
X       row1    row2    row3    row4
column1 0       3       6       9
column2 1       4       7       10
column3 2       5       8       11

The above solutions will work in any awk (except old, broken awk of course - there YMMV).

The above solutions do read the whole file into memory though - if the input files are too large for that then you can do this:

$ cat tst.awk
BEGIN { FS=OFS="\t" }
{ printf "%s%s", (FNR>1 ? OFS : ""), $ARGIND }
ENDFILE {
    print ""
    if (ARGIND < NF) {
        ARGV[ARGC] = FILENAME
        ARGC++
    }
}
$ awk -f tst.awk file
X       row1    row2    row3    row4
column1 0       3       6       9
column2 1       4       7       10
column3 2       5       8       11

which uses almost no memory but reads the input file once for every field on a line (N passes for N columns), so it will be much slower than the version that reads the whole file into memory. It also assumes the number of fields is the same on every line, and it uses GNU awk for ENDFILE and ARGIND, but any awk can do the same with tests on FNR==1 and END.
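As a quick sanity check, the portable in-memory approach can be exercised end to end on a couple of rows of the sample data (a sketch of my own; the file names sample.txt and transposed.txt are made up here):

```shell
#!/bin/sh
# Build a small tab-separated sample like the OP's and transpose it
# with the portable (POSIX awk) in-memory approach.
printf 'X\tcolumn1\tcolumn2\n'  > sample.txt
printf 'row1\t0\t1\n'          >> sample.txt
printf 'row2\t3\t4\n'          >> sample.txt

awk 'BEGIN { FS=OFS="\t" }
{
    for (i=1; i<=NF; i++) cell[i,NR] = $i      # cell[field, line]
    maxRows = (NF > maxRows ? NF : maxRows)    # widest line -> output rows
    maxCols = NR                               # line count  -> output cols
}
END {
    for (i=1; i<=maxRows; i++)
        for (j=1; j<=maxCols; j++)
            printf "%s%s", cell[i,j], (j < maxCols ? OFS : ORS)
}' sample.txt > transposed.txt

cat transposed.txt
```

The first output line is the old first column (X, row1, row2), which is exactly the transposition asked for.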

Solution 2 - Bash


rs is a BSD utility which also comes with macOS, but it is available from package managers on other platforms. It is named after the reshape function in APL.

Use sequences of spaces and tabs as column separator:

rs -T

Use tab as column separator:

rs -c -C -T

Use comma as column separator:

rs -c, -C, -T

-c changes the input column separator and -C changes the output column separator. A lone -c or -C sets the separator to tab. -T transposes rows and columns.

Do not use -t instead of -T, because it automatically selects the number of output columns so that the output lines fill the width of the display (which is 80 characters by default but which can be changed with -w).

When an output column separator is specified using -C, an extra column separator character is added to the end of each row, but you can remove it with sed:

$ seq 4|paste -d, - -|rs -c, -C, -T
1,3,
2,4,
$ seq 4|paste -d, - -|rs -c, -C, -T|sed s/.\$//
1,3
2,4

This fails with tables where the first line ends with one or more empty columns, because the number of columns is determined based on the number of columns on the first row:

$ rs -c, -C, -T<<<$'1,\n3,4'


$ seq 4|paste -d, - -|awk '{for(i=1;i<=NF;i++)a[i][NR]=$i}END{for(i in a)for(j in a[i])printf"%s"(j==NR?"\n":FS),a[i][j]}' FS=,
1,3
2,4

This uses arrays of arrays which is a gawk extension. macOS comes with a version of nawk from 2007 which does not support arrays of arrays.

To use space as a separator without collapsing sequences of space and tab characters, use FS='[ ]'.
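The difference between the two separators is easy to demonstrate with awk itself (a quick sketch):

```shell
#!/bin/sh
# With the default FS (" "), awk strips leading blanks and collapses runs,
# so empty fields disappear; with FS="[ ]" every single space separates.
printf 'a  b\n' | awk -F' '   '{print NF}'    # 2 fields: the run collapses
printf 'a  b\n' | awk -F'[ ]' '{print NF}'    # 3 fields: "a", "", "b"
```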


$ seq 4|paste -d, - -|ruby -e'$<.map{|x|x.chomp.split(",",-1)}.transpose.each{|x|puts x*","}'
1,3
2,4

The -1 argument to split disables discarding empty fields at the end:

$ ruby -e'p"a,,".split(",")'
["a"]
$ ruby -e'p"a,,".split(",",-1)'
["a", "", ""]

Function form:

$ tp(){ ruby -e's=ARGV[0];STDIN.readlines.map{|x|x.chomp.split(s==" "?/ /:s,-1)}.transpose.each{|x|puts x*s}' -- "${1-$'\t'}";}
$ seq 4|paste -d, - -|tp ,
1,3
2,4

s==" "?/ /:s is used above because when the argument to the split function is a single space, it enables awk-like special behavior where strings are split based on contiguous runs of spaces and tabs:

$ ruby -e'p" a  \tb ".split(/ /,-1)'
["", "a", "", "\tb", ""]
$ ruby -e'p" a  \tb ".split(" ",-1)'
["a", "b", ""]


tp(){ jq -R .|jq --arg x "${1-$'\t'}" -sr 'map(./$x)|transpose|map(join($x))[]';}

jq -R . prints each input line as a JSON string literal, -s (--slurp) creates an array for the input lines after parsing each line as JSON, and -r (--raw-output) outputs the contents of strings instead of JSON string literals. The / operator is overloaded to split strings.


$ printf %s\\n 1,2 3,4|Rscript -e 'write.table(t(read.table("stdin",sep=",")),"",sep=",",quote=F,col.names=F,row.names=F)'
1,3
2,4

If you replace Rscript with R, it echoes the code that is being run to STDOUT. It also results in the error ignoring SIGPIPE signal if it is followed by a command like head -n1 which exits before it has read the whole STDIN.

write.table prints to STDOUT when the argument for the output file is an empty string.

Solution 3 - Bash

A Python solution:

python -c "import sys; print('\n'.join(' '.join(c) for c in zip(*(l.split() for l in sys.stdin.readlines() if l.strip()))))" < input > output

The above is based on the following:

import sys

for c in zip(*(l.split() for l in sys.stdin.readlines() if l.strip())):
    print(' '.join(c))

This code does assume that every line has the same number of columns (no padding is performed).
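Where that assumption does not hold, a padded variant can be sketched in the same python -c style (my own adaptation, not part of the original answer; the "-" fill value is an arbitrary choice):

```shell
#!/bin/sh
# Same zip-based transpose, but itertools.zip_longest pads ragged rows
# instead of truncating them.
printf '1 2 3\n4 5\n' | python3 -c "
import sys
from itertools import zip_longest
rows = (l.split() for l in sys.stdin if l.strip())
print('\n'.join(' '.join(c) for c in zip_longest(*rows, fillvalue='-')))
" > padded.txt
cat padded.txt
```

The short second row shows up as a trailing "-" in the last output line.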

Solution 4 - Bash

The transpose project on SourceForge is a coreutils-like C program for exactly that.

gcc transpose.c -o transpose
./transpose -t input > output #works with stdin, too.

Solution 5 - Bash

Have a look at [GNU datamash][1], which can be used like datamash transpose. A future version will also support cross tabulation (pivot tables).

Here is how you would do it with space separated columns:

datamash transpose -t ' ' < file > transposed_file

[1]: https://www.gnu.org/software/datamash/ "GNU datamash"

Solution 6 - Bash

Pure BASH, no additional process. A nice exercise:

#!/bin/bash
declare -a array=( )                      # we build a 1-D-array

read -a line < "$1"                       # read the headline

COLS=${#line[@]}                          # save number of columns

index=0
while read -a line ; do
    for (( COUNTER=0; COUNTER<${#line[@]}; COUNTER++ )); do
        array[$index]=${line[$COUNTER]}   # store the whole file in one flat array
        ((index++))
    done
done < "$1"

for (( ROW = 0; ROW < COLS; ROW++ )); do
  for (( COUNTER = ROW; COUNTER < ${#array[@]}; COUNTER += COLS )); do
    printf "%s\t" ${array[$COUNTER]}
  done
  printf "\n"
done

Solution 7 - Bash

GNU datamash is perfectly suited for this problem, with a single line of code and support for arbitrarily large files!

datamash -W transpose infile > outfile

Solution 8 - Bash

Here is a moderately solid Perl script to do the job. There are many structural analogies with @ghostdog74's awk solution.

#!/bin/perl -w
# SO 1729824

use strict;

my(%data);          # main storage
my($maxcol) = 0;
my($rownum) = 0;
while (<>)
{
    my(@row) = split /\s+/;
    my($colnum) = 0;
    foreach my $val (@row)
    {
        $data{$rownum}{$colnum++} = $val;
    }
    $rownum++;
    $maxcol = $colnum if $colnum > $maxcol;
}

my $maxrow = $rownum;
for (my $col = 0; $col < $maxcol; $col++)
{
    for (my $row = 0; $row < $maxrow; $row++)
    {
        printf "%s%s", ($row == 0) ? "" : "\t",
                defined $data{$row}{$col} ? $data{$row}{$col} : "";
    }
    print "\n";
}
With the sample data size, the performance difference between perl and awk was negligible (1 millisecond out of 7 total). With a larger data set (100x100 matrix, entries 6-8 characters each), perl slightly outperformed awk - 0.026s vs 0.042s. Neither is likely to be a problem.

Representative timings for Perl 5.10.1 (32-bit) vs awk (version 20040207 when given '-V') vs gawk 3.1.7 (32-bit) on MacOS X 10.5.8 on a file containing 10,000 lines with 5 columns per line:

Osiris JL: time gawk -f tr.awk xxx  > /dev/null

real	0m0.367s
user	0m0.279s
sys	0m0.085s
Osiris JL: time perl -f xxx > /dev/null

real	0m0.138s
user	0m0.128s
sys	0m0.008s
Osiris JL: time awk -f tr.awk xxx  > /dev/null

real	0m1.891s
user	0m0.924s
sys	0m0.961s
Osiris-2 JL: 

Note that gawk is vastly faster than awk on this machine, but still slower than perl. Clearly, your mileage will vary.

Solution 9 - Bash

There is a purpose-built utility for this, the

GNU datamash utility

apt install datamash  

datamash transpose < yourfile

Taken from this site.

Solution 10 - Bash

Assuming all your rows have the same number of fields, this awk program solves the problem:

{for (f=1;f<=NF;f++) col[f] = col[f]":"$f} END {for (f=1;f<=NF;f++) print col[f]}

In words, as you loop over the rows, for every field f grow a ':'-separated string col[f] containing the elements of that field. After you are done with all the rows, print each one of those strings on a separate line. You can then replace ':' with the separator you want (say, a space) by piping the output through tr ':' ' '.


$ echo "1 2 3\n4 5 6"
1 2 3
4 5 6

$ echo "1 2 3\n4 5 6" | awk '{for (f=1;f<=NF;f++) col[f] = col[f]":"$f} END {for (f=1;f<=NF;f++) print col[f]}' | tr ':' ' '
 1 4
 2 5
 3 6

Solution 11 - Bash

If you have sc installed, you can do:

psc -r < inputfile | sc -W% - > outputfile

Solution 12 - Bash

I normally use this little awk snippet for this requirement:

  awk '{for (i=1; i<=NF; i++) a[i,NR]=$i
        max=(max<NF?NF:max)}
       END {for (i=1; i<=max; i++)
             {for (j=1; j<=NR; j++)
                 printf "%s%s", a[i,j], (j==NR?RS:FS)
             }
       }' file

This just loads all the data into a bidimensional array a[line,column] and then prints it back as a[column,line], so that it transposes the given input.

This needs to keep track of the maximum amount of columns the initial file has, so that it is used as the number of rows to print back.
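For instance, on a two-line space-separated file (a quick sketch with a made-up demo.txt):

```shell
#!/bin/sh
# Transpose a 2x3 matrix with the a[line,column] -> a[column,line] trick.
printf '1 2 3\n4 5 6\n' > demo.txt
awk '{for (i=1; i<=NF; i++) a[i,NR]=$i
      max=(max<NF?NF:max)}
     END {for (i=1; i<=max; i++) {
             for (j=1; j<=NR; j++)
                 printf "%s%s", a[i,j], (j==NR?RS:FS)
          }}' demo.txt > out.txt
cat out.txt
```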

Solution 13 - Bash

A hackish perl solution can look like this. It's nice because it doesn't load the whole file in memory; it writes intermediate temp files and then uses the all-wonderful paste.

#!/usr/bin/perl
use warnings;
use strict;

my $counter;
open INPUT, "<$ARGV[0]" or die ("Unable to open input file!");
while (my $line = <INPUT>) {
	chomp $line;
	my @array = split ("\t",$line);
	open OUTPUT, ">temp$." or die ("unable to open output file!");
	print OUTPUT join ("\n",@array);
	close OUTPUT;
	$counter = $.;
}
close INPUT;

# paste files together
my $execute = "paste ";
foreach (1..$counter) {
	$execute .= "temp$_ ";
}
$execute .= "> $ARGV[1]";
system $execute;

Solution 14 - Bash

The only improvement I can see to your own example is using awk which will reduce the number of processes that are run and the amount of data that is piped between them:

/bin/rm output 2> /dev/null

cols=`head -n 1 input | wc -w` 
for (( i=1; i <= $cols; i++))
do
  awk '{printf ("%s%s", tab, $'$i'); tab="\t"} END {print ""}' input
done >> output

Solution 15 - Bash

Some *nix standard util one-liners, no temp files needed. NB: the OP wanted an efficient fix, (i.e. faster), and the top answers are usually faster than this answer. These one-liners are for those who like *nix software tools, for whatever reasons. In rare cases, (e.g. scarce IO & memory), these snippets can actually be faster than some of the top answers.

Call the input file foo.

  1. If we know foo has four columns:

     for f in 1 2 3 4 ; do cut -d ' ' -f $f foo | xargs echo ; done
  2. If we don't know how many columns foo has:

     n=$(head -n 1 foo | wc -w)
     for f in $(seq 1 $n) ; do cut -d ' ' -f $f foo | xargs echo ; done

xargs has a size limit and would therefore produce incomplete output with a long file. The limit is system dependent, e.g.:

    { timeout '.01' xargs --show-limits ; } 2>&1 | grep Max

> Maximum length of command we could actually use: 2088944
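On systems providing getconf, the exec-time ceiling can also be queried directly (a sketch; the exact figure varies by system):

```shell
#!/bin/sh
# ARG_MAX caps the total bytes of argv + environment for a single exec().
getconf ARG_MAX
```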

  3. tr & echo:

     for f in 1 2 3 4; do cut -d ' ' -f $f foo | tr '\n' ' ' ; echo; done

...or if the number of columns is unknown:

    n=$(head -n 1 foo | wc -w)
    for f in $(seq 1 $n); do 
        cut -d ' ' -f $f foo | tr '\n' ' ' ; echo
    done

4. Using set, which, like xargs, has similar command-line size limitations:

    for f in 1 2 3 4 ; do set - $(cut -d ' ' -f $f foo) ; echo $@ ; done

Solution 16 - Bash

I used fgm's solution (thanks fgm!), but needed to eliminate the tab characters at the end of each row, so modified the script thus:

declare -a array=( )                      # we build a 1-D-array

read -a line < "$1"                       # read the headline

COLS=${#line[@]}                          # save number of columns

index=0
while read -a line; do
    for (( COUNTER=0; COUNTER<${#line[@]}; COUNTER++ )); do
        array[$index]=${line[$COUNTER]}
        ((index++))
    done
done < "$1"

for (( ROW = 0; ROW < COLS; ROW++ )); do
  for (( COUNTER = ROW; COUNTER < ${#array[@]}; COUNTER += COLS )); do
    printf "%s" ${array[$COUNTER]}
    if [ $COUNTER -lt $(( ${#array[@]} - $COLS )) ]; then
    	printf "\t"
    fi
  done
  printf "\n"
done

Solution 17 - Bash

I was just looking for a similar bash transpose but with support for padding. Here is the script I wrote based on fgm's solution, which seems to work. If it can be of help...

declare -a array=( )                      # we build a 1-D-array
declare -a ncols=( )                      # we build a 1-D-array containing number of elements of each row

SEPARATOR="\t"                            # assumed defaults; adjust to taste
PADDING=" "
MAXROWS=0
index=0
indexCol=0
while read -a line; do
    ncols[$indexCol]=${#line[@]}
    ((indexCol++))
    if [ ${#line[@]} -gt ${MAXROWS} ]; then
        MAXROWS=${#line[@]}
    fi
    for (( COUNTER=0; COUNTER<${#line[@]}; COUNTER++ )); do
        array[$index]=${line[$COUNTER]}
        ((index++))
    done
done < "$1"

for (( ROW = 0; ROW < MAXROWS; ROW++ )); do
  COUNTER=$ROW
  for (( indexCol=0; indexCol < ${#ncols[@]}; indexCol++ )); do
    if [ $ROW -ge ${ncols[indexCol]} ]; then
      printf "$PADDING"
    else
      printf "%s" ${array[$COUNTER]}
    fi
    if [ $((indexCol+1)) -lt ${#ncols[@]} ]; then
      printf "$SEPARATOR"
    fi
    COUNTER=$(( COUNTER + ncols[indexCol] ))
  done
  printf "\n"
done

Solution 18 - Bash

Not very elegant, but this "single-line" command solves the problem quickly:

cols=4; for((i=1;i<=$cols;i++)); do \
            awk '{print $'$i'}' input | tr '\n' ' '; echo; \
        done

Here cols is the number of columns; you can replace the 4 with $(head -n 1 input | wc -w).

Solution 19 - Bash

I was looking for a solution to transpose any kind of matrix (nxn or mxn) with any kind of data (numbers or strings) and came up with the following solution:


ARCHIVO="$1"                                              # input file (comma-separated)
Line2Trans=$(wc -l < "$ARCHIVO")                          # number of input lines
Col2Trans=$(head -1 "$ARCHIVO" | awk -F',' '{print NF}')  # number of input columns

for ((i=1; $i <= Line2Trans; i++));do
	for ((j=1; $j <=Col2Trans ; j++));do
		awk -v var1="$i" -v var2="$j" 'BEGIN { FS = "," } ; NR==var1 {print $var2}' $ARCHIVO >> Column_$i
	done
done

paste -d',' `ls -mv Column_* | sed 's/,//g'` >> $ARCHIVO

Solution 20 - Bash

If you only want to grab a single (comma delimited) line $N out of a file and turn it into a column:

head -$N file | tail -1 | tr ',' '\n'
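The pipeline is easy to verify on a small sample (a sketch with made-up file names):

```shell
#!/bin/sh
# Pull line 2 of a small CSV and turn it into a column.
printf 'a,b,c\n1,2,3\n' > rows.csv
N=2
head -$N rows.csv | tail -1 | tr ',' '\n' > col.txt
cat col.txt
```

sed -n "${N}p" rows.csv would be an equivalent way to grab line N.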

Solution 21 - Bash

Another awk solution, limited only by the amount of memory you have.

awk '{ for (i=1; i<=NF; i++) RtoC[i]= (RtoC[i]? RtoC[i] FS $i: $i) }
    END{ for (i in RtoC) print RtoC[i] }' infile

This joins every field with the same position together, and in END prints the result, which would be the first row in the first column, the second row in the second column, etc. It will output:

X row1 row2 row3 row4
column1 0 3 6 9
column2 1 4 7 10
column3 2 5 8 11

Solution 22 - Bash

Here is a Bash one-liner that is based on simply converting each line to a column and paste-ing them together:

echo '' > tmp1;  \
cat m.txt | while read l ; \
            do    paste tmp1 <(echo $l | tr -s ' ' \\n) > tmp2; \
                  cp tmp2 tmp1; \
            done; \
cat tmp1


0 1 2
4 5 6
7 8 9
10 11 12
  1. creates tmp1 file so it's not empty.

  2. reads each line and transforms it into a column using tr

  3. pastes the new column to the tmp1 file

  4. copies result back into tmp1.

PS: I really wanted to use io-descriptors but couldn't get them to work.
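A slightly more careful rendering of the same idea (my own sketch: it uses mktemp instead of the fixed tmp1/tmp2 names, and seeds the accumulator from the first line so no leading empty column appears):

```shell
#!/bin/sh
# Transpose m.txt by turning each line into a column and paste-ing
# the columns together one at a time.
printf '1 2 3\n4 5 6\n' > m.txt
acc=$(mktemp) col=$(mktemp) next=$(mktemp)
first=1
while read -r line; do
    printf '%s\n' $line > "$col"      # unquoted $line: one word per line
    if [ "$first" = 1 ]; then
        cp "$col" "$acc"; first=0     # first line seeds the accumulator
    else
        paste "$acc" "$col" > "$next" # glue the new column on the right
        mv "$next" "$acc"
    fi
done < m.txt
cp "$acc" transposed.txt
rm -f "$acc" "$col"
cat transposed.txt
```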

Solution 23 - Bash


#!/bin/bash
aline="$(head -n 1 file.txt)"
set -- $aline
colNum=$#                     # number of columns

#set -x
while read line; do
  set -- $line
  for i in $(seq $colNum); do
    eval col$i="\"\$col$i \$$i\""
  done
done < file.txt

for i in $(seq $colNum); do
  eval echo \${col$i}
done

Another version using set and eval.

Solution 24 - Bash

Another bash variant

$ cat file 
XXXX	col1	col2	col3
row1	0	    1   	2
row2	3	    4   	5
row3	6	    7   	8
row4	9	    10  	11



#!/bin/bash
I=0
while read line; do
    ((I++))
    i=0
    for item in $line; { printf -v A$I[$i] $item; ((i++)); }
done < file
indexes=$(seq 0 $((i-1)))

for i in $indexes; {
    J=1
    while ((J<=I)); do
        arr="A$J[$i]"
        printf "${!arr}\t"
        ((J++))
    done
    echo
}


$ ./test 
XXXX	row1	row2	row3	row4	
col1	0   	3	    6   	9	
col2	1	    4	    7	    10	
col3	2	    5   	8	    11

Solution 25 - Bash

Here's a Haskell solution. When compiled with -O2, it runs slightly faster than ghostdog's awk and slightly slower than Stephan's thinly wrapped c python on my machine for repeated "Hello world" input lines. Unfortunately GHC's support for passing command line code is non-existent as far as I can tell, so you will have to write it to a file yourself. It will truncate the rows to the length of the shortest row.

transpose :: [[a]] -> [[a]]
transpose = foldr (zipWith (:)) (repeat [])

main :: IO ()
main = interact $ unlines . map unwords . transpose . map words . lines

Solution 26 - Bash

An awk solution that stores the whole file in an array in memory:

    awk '$0!~/^$/{    i++;
                  split($0,arr,FS);
                  for (j in arr) {
                      out[i,j]=arr[j];
                      if (maxr<j){ maxr=j}     # max number of output rows.
                  }
                }
        END {
            maxc=i                 # max number of output columns.
            for     (j=1; j<=maxr; j++) {
                for (i=1; i<=maxc; i++) {
                    printf( "%s:", out[i,j])
                }
                printf( "%s\n","" )
            }
        }' infile

But we may "walk" the file as many times as output rows are needed:

maxf="$(awk '{if (mf<NF) mf=NF} END{print mf}' infile)"
rowcount=$maxf
for (( i=1; i<=rowcount; i++ )); do
    awk -v i="$i" -F " " '{printf("%s\t ", $i)}' infile
    echo
done
Which (for a low count of output rows) is faster than the previous code.

Solution 27 - Bash

A oneliner using R...

  cat file | Rscript -e "d <- read.table(file('stdin'), sep=' ', row.names=1, header=T); write.table(t(d), file=stdout(), quote=F, col.names=NA) "

Solution 28 - Bash

I've used the two scripts below to do similar operations before. The first is in awk, which is a lot faster than the second, which is in "pure" bash. You might be able to adapt it to your own application.

awk '
{
    for (i = 1; i <= NF; i++) {
        s[i] = s[i]?s[i] FS $i:$i
    }
}
END {
    for (i in s) {
        print s[i]
    }
}' file.txt

declare -a arr

while IFS= read -r line
do
    i=0
    for word in $line
    do
        [[ ${arr[$i]} ]] && arr[$i]="${arr[$i]} $word" || arr[$i]=$word
        ((i++))
    done
done < file.txt

for ((i=0; i < ${#arr[@]}; i++))
do
    echo ${arr[i]}
done

Solution 29 - Bash

A simple four-line answer that keeps things readable:

col="$(head -1 file.txt | wc -w)"
for i in $(seq 1 $col); do
    awk '{ print $'$i' }' file.txt | paste -s -d "\t"
done

Solution 30 - Bash

I'm a little late to the game but how about this:

cat table.tsv | python -c "import pandas as pd, sys; pd.read_csv(sys.stdin, sep='\t').T.to_csv(sys.stdout, sep='\t')"

or zcat if it's gzipped.

This assumes you have pandas installed for your version of Python.

Solution 31 - Bash

for i in $(seq $(head -n1 file.txt | tr ' ' '\n' | wc -l)); do
  cut -d' ' -f"$i" file.txt | paste -s -d' ' -
done


seq $(head -n1 file.txt | tr " " "\n" | wc -l) | xargs -I{} sh -c 'cut -d" " -f"{}" file.txt | paste -s -d" " -'


All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

| Content | Original Author | Source |
|---|---|---|
| Question | Federico Giorgi | View Question on Stackoverflow |
| Solution 1 - Bash | ghostdog74 | View Answer on Stackoverflow |
| Solution 2 - Bash | nisetama | View Answer on Stackoverflow |
| Solution 3 - Bash | Stephan202 | View Answer on Stackoverflow |
| Solution 4 - Bash | flying sheep | View Answer on Stackoverflow |
| Solution 5 - Bash | pixelbeat | View Answer on Stackoverflow |
| Solution 6 - Bash | Fritz G. Mehner | View Answer on Stackoverflow |
| Solution 7 - Bash | Pal | View Answer on Stackoverflow |
| Solution 8 - Bash | Jonathan Leffler | View Answer on Stackoverflow |
| Solution 9 - Bash | nelaaro | View Answer on Stackoverflow |
| Solution 10 - Bash | Guilherme Freitas | View Answer on Stackoverflow |
| Solution 11 - Bash | Dennis Williamson | View Answer on Stackoverflow |
| Solution 12 - Bash | fedorqui | View Answer on Stackoverflow |
| Solution 13 - Bash | Federico Giorgi | View Answer on Stackoverflow |
| Solution 14 - Bash | Simon C | View Answer on Stackoverflow |
| Solution 15 - Bash | agc | View Answer on Stackoverflow |
| Solution 16 - Bash | dtw | View Answer on Stackoverflow |
| Solution 17 - Bash | user3251704 | View Answer on Stackoverflow |
| Solution 18 - Bash | Felipe | View Answer on Stackoverflow |
| Solution 19 - Bash | Another.Chemist | View Answer on Stackoverflow |
| Solution 20 - Bash | allanbcampbell | View Answer on Stackoverflow |
| Solution 21 - Bash | αғsнιη | View Answer on Stackoverflow |
| Solution 22 - Bash | kirill_igum | View Answer on Stackoverflow |
| Solution 23 - Bash | Dyno Fu | View Answer on Stackoverflow |
| Solution 24 - Bash | Ivan | View Answer on Stackoverflow |
| Solution 25 - Bash | stelleg | View Answer on Stackoverflow |
| Solution 26 - Bash | user2350426 | View Answer on Stackoverflow |
| Solution 27 - Bash | dputhier | View Answer on Stackoverflow |
| Solution 28 - Bash | Sam | View Answer on Stackoverflow |
| Solution 29 - Bash | Penny Liu | View Answer on Stackoverflow |
| Solution 30 - Bash | O.rka | View Answer on Stackoverflow |
| Solution 31 - Bash | Jiangge Zhang | View Answer on Stackoverflow |