Python: CSV write by column rather than row

PythonCsv

Python Problem Overview


I have a python script that generates a bunch of data in a while loop. I need to write this data to a CSV file, so it writes by column rather than row.

For example in loop 1 of my script I generate:

(1, 2, 3, 4)

I need this to reflect in my csv script like so:

Result_1    1
Result_2    2
Result_3    3
Result_4    4

On my second loop i generate:

(5, 6, 7, 8)

I need this to look in my csv file like so:

Result_1    1    5
Result_2    2    6
Result_3    3    7
Result_4    4    8

and so forth until the while loop finishes. Can anybody help me?


EDIT

The while loop can last over 100,000 loops

Python Solutions


Solution 1 - Python

The reason csv doesn't support that is because variable-length lines are not really supported on most filesystems. What you should do instead is collect all the data in lists, then call zip() on them to transpose them after.

>>> l = [('Result_1', 'Result_2', 'Result_3', 'Result_4'), (1, 2, 3, 4), (5, 6, 7, 8)]
>>> zip(*l)
[('Result_1', 1, 5), ('Result_2', 2, 6), ('Result_3', 3, 7), ('Result_4', 4, 8)]

Solution 2 - Python

wr.writerow(item)  #column by column
wr.writerows(item) #row by row

This is quite simple if your goal is just to write the output column by column.

If your item is a list:

yourList = []

with open('yourNewFileName.csv', 'w', ) as myfile:
    wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
    for word in yourList:
        wr.writerow([word])

Solution 3 - Python

Updating lines in place in a file is not supported on most file system (a line in a file is just some data that ends with newline, the next line start just after that).

As I see it you have two options:

  1. Have your data generating loops be generators, this way they won't consume a lot of memory - you'll get data for each row "just in time"
  2. Use a database (sqlite?) and update the rows there. When you're done - export to CSV

Small example for the first method:

from itertools import islice, izip, count
print list(islice(izip(count(1), count(2), count(3)), 10))

This will print

[(1, 2, 3), (2, 3, 4), (3, 4, 5), (4, 5, 6), (5, 6, 7), (6, 7, 8), (7, 8, 9), (8, 9, 10), (9, 10, 11), (10, 11, 12)]

even though count generate an infinite sequence of numbers

Solution 4 - Python

what about Result_* there also are generated in the loop (because i don't think it's possible to add to the csv file)

i will go like this ; generate all the data at one rotate the matrix write in the file:

A = []

A.append(range(1, 5))  # an Example of you first loop

A.append(range(5, 9))  # an Example of you second loop

data_to_write = zip(*A)

# then you can write now row by row

Solution 5 - Python

Let's assume that (1) you don't have a large memory (2) you have row headings in a list (3) all the data values are floats; if they're all integers up to 32- or 64-bits worth, that's even better.

On a 32-bit Python, storing a float in a list takes 16 bytes for the float object and 4 bytes for a pointer in the list; total 20. Storing a float in an array.array('d') takes only 8 bytes. Increasingly spectacular savings are available if all your data are int (any negatives?) that will fit in 8, 4, 2 or 1 byte(s) -- especially on a recent Python where all ints are longs.

The following pseudocode assumes floats stored in array.array('d'). In case you don't really have a memory problem, you can still use this method; I've put in comments to indicate the changes needed if you want to use a list.

# Preliminary:
import array # list: delete
hlist = []
dlist = []
for each row: 
    hlist.append(some_heading_string)
    dlist.append(array.array('d')) # list: dlist.append([])
# generate data
col_index = -1
for each column:
    col_index += 1
    for row_index in xrange(len(hlist)):
        v = calculated_data_value(row_index, colindex)
        dlist[row_index].append(v)
# write to csv file
for row_index in xrange(len(hlist)):
    row = [hlist[row_index]]
    row.extend(dlist[row_index])
    csv_writer.writerow(row)

Solution 6 - Python

Read it in by row and then transpose it in the command line. If you're using Unix, install csvtool and follow the directions in: https://unix.stackexchange.com/a/314482/186237

Solution 7 - Python

As an alternate streaming approach:

  • dump each col into a file

  • use python or unix paste command to rejoin on tab, csv, whatever.

Both steps should handle steaming just fine.

Pitfalls:

  • if you have 1000s of columns, you might run into the unix file handle limit!

Solution 8 - Python

After thinkering for a while i was able to come up with an easier way of achieving same goal. Assuming you have the code as below:

fruitList = ["Mango", "Apple", "Guava", "Grape", "Orange"]
vegList = ["Onion", "Garlic", "Shallot", "Pumpkin", "Potato"]
with open("NEWFILE.csv", "w") as csvfile:
    writer = csv.writer(csvfile)
    for value in range(len(fruitList)):
        writer.writerow([fruitList[value], vegList[value]])

Solution 9 - Python

zip will only take number of elements equal to the shortest length list. If your columns are of equal length, you need to use zip_longest

import csv
from itertools import zip_longest

data = [[1,2,3,4],[5,6]]
columns_data = zip_longest(*data)

with open("file.csv","w") as f:
    writer = csv.writer(f)
    writer.writerows(columns_data)

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionHarpalView Question on Stackoverflow
Solution 1 - PythonIgnacio Vazquez-AbramsView Answer on Stackoverflow
Solution 2 - Pythonthe curious mindView Answer on Stackoverflow
Solution 3 - Pythonlazy1View Answer on Stackoverflow
Solution 4 - PythonmouadView Answer on Stackoverflow
Solution 5 - PythonJohn MachinView Answer on Stackoverflow
Solution 6 - PythonAnthony EbertView Answer on Stackoverflow
Solution 7 - PythonGregg LindView Answer on Stackoverflow
Solution 8 - PythonRubyView Answer on Stackoverflow
Solution 9 - PythonAbhi25tView Answer on Stackoverflow