Python import csv to list

PythonCsv

Python Problem Overview


I have a CSV file with about 2000 records.

Each record has a string, and a category to it:

This is the first line,Line1
This is the second line,Line2
This is the third line,Line3

I need to read this file into a list that looks like this:

data = [('This is the first line', 'Line1'),
        ('This is the second line', 'Line2'),
        ('This is the third line', 'Line3')]

How can import this CSV to the list I need using Python?

Python Solutions


Solution 1 - Python

Using the csv module:

import csv

with open('file.csv', newline='') as f:
    reader = csv.reader(f)
    data = list(reader)

print(data)

Output:

[['This is the first line', 'Line1'], ['This is the second line', 'Line2'], ['This is the third line', 'Line3']]

If you need tuples:

import csv

with open('file.csv', newline='') as f:
    reader = csv.reader(f)
    data = [tuple(row) for row in reader]

print(data)

Output:

[('This is the first line', 'Line1'), ('This is the second line', 'Line2'), ('This is the third line', 'Line3')]

Old Python 2 answer, also using the csv module:

import csv
with open('file.csv', 'rb') as f:
    reader = csv.reader(f)
    your_list = list(reader)

print your_list
# [['This is the first line', 'Line1'],
#  ['This is the second line', 'Line2'],
#  ['This is the third line', 'Line3']]

Solution 2 - Python

Updated for Python 3:

import csv

with open('file.csv', newline='') as f:
    reader = csv.reader(f)
    your_list = list(reader)

print(your_list)

Output:

[['This is the first line', 'Line1'], ['This is the second line', 'Line2'], ['This is the third line', 'Line3']]

Solution 3 - Python

Pandas is pretty good at dealing with data. Here is one example how to use it:

import pandas as pd

# Read the CSV into a pandas data frame (df)
#   With a df you can do many things
#   most important: visualize data with Seaborn
df = pd.read_csv('filename.csv', delimiter=',')

# Or export it in many ways, e.g. a list of tuples
tuples = [tuple(x) for x in df.values]

# or export it as a list of dicts
dicts = df.to_dict().values()

One big advantage is that pandas deals automatically with header rows.

If you haven't heard of Seaborn, I recommend having a look at it.

See also: How do I read and write CSV files with Python?

Pandas #2

import pandas as pd

# Get data - reading the CSV file
import mpu.pd
df = mpu.pd.example_df()

# Convert
dicts = df.to_dict('records')

The content of df is:

     country   population population_time    EUR
0    Germany   82521653.0      2016-12-01   True
1     France   66991000.0      2017-01-01   True
2  Indonesia  255461700.0      2017-01-01  False
3    Ireland    4761865.0             NaT   True
4      Spain   46549045.0      2017-06-01   True
5    Vatican          NaN             NaT   True

The content of dicts is

[{'country': 'Germany', 'population': 82521653.0, 'population_time': Timestamp('2016-12-01 00:00:00'), 'EUR': True},
 {'country': 'France', 'population': 66991000.0, 'population_time': Timestamp('2017-01-01 00:00:00'), 'EUR': True},
 {'country': 'Indonesia', 'population': 255461700.0, 'population_time': Timestamp('2017-01-01 00:00:00'), 'EUR': False},
 {'country': 'Ireland', 'population': 4761865.0, 'population_time': NaT, 'EUR': True},
 {'country': 'Spain', 'population': 46549045.0, 'population_time': Timestamp('2017-06-01 00:00:00'), 'EUR': True},
 {'country': 'Vatican', 'population': nan, 'population_time': NaT, 'EUR': True}]

Pandas #3

import pandas as pd

# Get data - reading the CSV file
import mpu.pd
df = mpu.pd.example_df()

# Convert
lists = [[row[col] for col in df.columns] for row in df.to_dict('records')]

The content of lists is:

[['Germany', 82521653.0, Timestamp('2016-12-01 00:00:00'), True],
 ['France', 66991000.0, Timestamp('2017-01-01 00:00:00'), True],
 ['Indonesia', 255461700.0, Timestamp('2017-01-01 00:00:00'), False],
 ['Ireland', 4761865.0, NaT, True],
 ['Spain', 46549045.0, Timestamp('2017-06-01 00:00:00'), True],
 ['Vatican', nan, NaT, True]]

Solution 4 - Python

Update for Python3:
import csv
from pprint import pprint

with open('text.csv', newline='') as file:
    reader = csv.reader(file)
    res = list(map(tuple, reader))

pprint(res)

Output:

[('This is the first line', ' Line1'), ('This is the second line', ' Line2'), ('This is the third line', ' Line3')]

If csvfile is a file object, it should be opened with newline=''.
csv module

Solution 5 - Python

If you are sure there are no commas in your input, other than to separate the category, you can read the file line by line and split on ,, then push the result to List

That said, it looks like you are looking at a CSV file, so you might consider using the modules for it

Solution 6 - Python

result = []
for line in text.splitlines():
    result.append(tuple(line.split(",")))

Solution 7 - Python

You can use the list() function to convert csv reader object to list

import csv

with open('input.csv', newline='') as csv_file:
    reader = csv.reader(csv_file, delimiter=',')
    rows = list(reader)
    print(rows)

Solution 8 - Python

A simple loop would suffice:

lines = []
with open('test.txt', 'r') as f:
    for line in f.readlines():
        l,name = line.strip().split(',')
        lines.append((l,name))

print lines

Solution 9 - Python

As said already in the comments you can use the csv library in python. csv means comma separated values which seems exactly your case: a label and a value separated by a comma.

Being a category and value type I would rather use a dictionary type instead of a list of tuples.

Anyway in the code below I show both ways: d is the dictionary and l is the list of tuples.

import csv

file_name = "test.txt"
try:
    csvfile = open(file_name, 'rt')
except:
    print("File not found")
csvReader = csv.reader(csvfile, delimiter=",")
d = dict()
l =  list()
for row in csvReader:
    d[row[1]] = row[0]
    l.append((row[0], row[1]))
print(d)
print(l)

Solution 10 - Python

Unfortunately I find none of the existing answers particularly satisfying.

Here is a straightforward and complete Python 3 solution, using the csv module.

import csv

with open('../resources/temp_in.csv', newline='') as f:
    reader = csv.reader(f, skipinitialspace=True)
    rows = list(reader)

print(rows)

Notice the skipinitialspace=True argument. This is necessary since, unfortunately, OP's CSV contains whitespace after each comma.

Output:

[['This is the first line', 'Line1'], ['This is the second line', 'Line2'], ['This is the third line', 'Line3']]

Solution 11 - Python

Extending your requirements a bit and assuming you do not care about the order of lines and want to get them grouped under categories, the following solution may work for you:

>>> fname = "lines.txt"
>>> from collections import defaultdict
>>> dct = defaultdict(list)
>>> with open(fname) as f:
...     for line in f:
...         text, cat = line.rstrip("\n").split(",", 1)
...         dct[cat].append(text)
...
>>> dct
defaultdict(<type 'list'>, {' CatA': ['This is the first line', 'This is the another line'], ' CatC': ['This is the third line'], ' CatB': ['This is the second line', 'This is the last line']})

This way you get all relevant lines available in the dictionary under key being the category.

Solution 12 - Python

Here is the easiest way in Python 3.x to import a CSV to a multidimensional array, and its only 4 lines of code without importing anything!

#pull a CSV into a multidimensional array in 4 lines!

L=[]                            #Create an empty list for the main array
for line in open('log.txt'):    #Open the file and read all the lines
    x=line.rstrip()             #Strip the \n from each line
    L.append(x.split(','))      #Split each line into a list and add it to the
                                #Multidimensional array
print(L)

Solution 13 - Python

Next is a piece of code which uses csv module but extracts file.csv contents to a list of dicts using the first line which is a header of csv table

import csv
def csv2dicts(filename):
  with open(filename, 'rb') as f:
    reader = csv.reader(f)
    lines = list(reader)
    if len(lines) < 2: return None
    names = lines[0]
    if len(names) < 1: return None
    dicts = []
    for values in lines[1:]:
      if len(values) != len(names): return None
      d = {}
      for i,_ in enumerate(names):
        d[names[i]] = values[i]
      dicts.append(d)
    return dicts
  return None

if __name__ == '__main__':
  your_list = csv2dicts('file.csv')
  print your_list

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionMorganTNView Question on Stackoverflow
Solution 1 - PythonMaciej GolView Answer on Stackoverflow
Solution 2 - PythonseokhoonleeView Answer on Stackoverflow
Solution 3 - PythonMartin ThomaView Answer on Stackoverflow
Solution 4 - PythonAbstProcDoView Answer on Stackoverflow
Solution 5 - PythonMiquelView Answer on Stackoverflow
Solution 6 - PythonAcid_SnakeView Answer on Stackoverflow
Solution 7 - PythonKiddoView Answer on Stackoverflow
Solution 8 - PythonHunter McMillenView Answer on Stackoverflow
Solution 9 - PythonFrancesco BoiView Answer on Stackoverflow
Solution 10 - PythonAMCView Answer on Stackoverflow
Solution 11 - PythonJan VlcinskyView Answer on Stackoverflow
Solution 12 - PythonJason BoucherView Answer on Stackoverflow
Solution 13 - PythonAlexey AntonenkoView Answer on Stackoverflow