How to convert string representation of list to a list?

Python Problem Overview

I was wondering what the simplest way is to convert a string representation of a list like the following to a list:

x = '[ "A","B","C" , " D"]'

Even in cases where the user puts spaces in between the commas, and spaces inside of the quotes, I need to handle that as well and convert it to:

x = ["A", "B", "C", "D"]

I know I can strip spaces with strip() and split() and check for non-letter characters. But the code was getting very kludgy. Is there a quick function that I'm not aware of?

Python Solutions

Solution 1 - Python

>>> import ast
>>> x = '[ "A","B","C" , " D"]'
>>> x = ast.literal_eval(x)
>>> x
['A', 'B', 'C', ' D']
>>> x = [n.strip() for n in x]
>>> x
['A', 'B', 'C', 'D']

ast.literal_eval:

> With ast.literal_eval you can safely evaluate an expression node or a string containing a Python literal or container display. The string or node provided may only consist of the following Python literal structures: strings, bytes, numbers, tuples, lists, dicts, booleans, and None.

Solution 2 - Python

The json module is a better solution whenever there is a stringified list of dictionaries. The json.loads(your_data) function can be used to convert it to a list.

>>> import json
>>> x = '[ "A","B","C" , " D"]'
>>> json.loads(x)
['A', 'B', 'C', ' D']

Similarly

>>> x = '[ "A","B","C" , {"D":"E"}]'
>>> json.loads(x)
['A', 'B', 'C', {'D': 'E'}]

Solution 3 - Python

The eval is dangerous - you shouldn't execute user input.

If you have 2.6 or newer, use ast instead of eval:

>>> import ast
>>> ast.literal_eval('["A","B" ,"C" ," D"]')
["A", "B", "C", " D"]

Once you have that, strip the strings.

If you're on an older version of Python, you can get very close to what you want with a simple regular expression:

>>> x='[  "A",  " B", "C","D "]'
>>> re.findall(r'"\s*([^"]*?)\s*"', x)
['A', 'B', 'C', 'D']

This isn't as good as the ast solution, for example it doesn't correctly handle escaped quotes in strings. But it's simple, doesn't involve a dangerous eval, and might be good enough for your purpose if you're on an older Python without ast.

Solution 4 - Python

There is a quick solution:

x = eval('[ "A","B","C" , " D"]')

Unwanted whitespaces in the list elements may be removed in this way:

x = [x.strip() for x in eval('[ "A","B","C" , " D"]')]

Solution 5 - Python

Inspired from some of the answers above that work with base python packages I compared the performance of a few (using Python 3.7.3):

Method 1: ast

import ast
list(map(str.strip, ast.literal_eval(u'[ "A","B","C" , " D"]')))
# ['A', 'B', 'C', 'D']

import timeit
timeit.timeit(stmt="list(map(str.strip, ast.literal_eval(u'[ \"A\",\"B\",\"C\" , \" D\"]')))", setup='import ast', number=100000)
# 1.292875313000195

Method 2: json

import json
list(map(str.strip, json.loads(u'[ "A","B","C" , " D"]')))
# ['A', 'B', 'C', 'D']

import timeit
timeit.timeit(stmt="list(map(str.strip, json.loads(u'[ \"A\",\"B\",\"C\" , \" D\"]')))", setup='import json', number=100000)
# 0.27833264000014424

Method 3: no import

list(map(str.strip, u'[ "A","B","C" , " D"]'.strip('][').replace('"', '').split(',')))
# ['A', 'B', 'C', 'D']

import timeit
timeit.timeit(stmt="list(map(str.strip, u'[ \"A\",\"B\",\"C\" , \" D\"]'.strip('][').replace('\"', '').split(',')))", number=100000)
# 0.12935059100027502

I was disappointed to see what I considered the method with the worst readability was the method with the best performance... there are tradeoffs to consider when going with the most readable option... for the type of workloads I use python for I usually value readability over a slightly more performant option, but as usual it depends.

Solution 6 - Python

import ast
l = ast.literal_eval('[ "A","B","C" , " D"]')
l = [i.strip() for i in l]

Solution 7 - Python

If it's only a one dimensional list, this can be done without importing anything:

>>> x = u'[ "A","B","C" , " D"]'
>>> ls = x.strip('[]').replace('"', '').replace(' ', '').split(',')
>>> ls
['A', 'B', 'C', 'D']

Solution 8 - Python

This u can do,

x = '[ "A","B","C" , " D"]'
print(list(eval(x)))

** best one is the accepted answer

Though this is not a safe way, the best answer is the accepted one. wasn't aware of the eval danger when answer was posted.

Solution 9 - Python

Assuming that all your inputs are lists and that the double quotes in the input actually don't matter, this can be done with a simple regexp replace. It is a bit perl-y but works like a charm. Note also that the output is now a list of unicode strings, you didn't specify that you needed that, but it seems to make sense given unicode input.

import re
x = u'[ "A","B","C" , " D"]'
junkers = re.compile('[[" \]]')
result = junkers.sub('', x).split(',')
print result
--->  [u'A', u'B', u'C', u'D']

The junkers variable contains a compiled regexp (for speed) of all characters we don't want, using ] as a character required some backslash trickery. The re.sub replaces all these characters with nothing, and we split the resulting string at the commas.

Note that this also removes spaces from inside entries u'["oh no"]' ---> [u'ohno']. If this is not what you wanted, the regexp needs to be souped up a bit.

Solution 10 - Python

If you know that your lists only contain quoted strings, this pyparsing example will give you your list of stripped strings (even preserving the original Unicode-ness).

>>> from pyparsing import *
>>> x =u'[ "A","B","C" , " D"]'
>>> LBR,RBR = map(Suppress,"[]")
>>> qs = quotedString.setParseAction(removeQuotes, lambda t: t[0].strip())
>>> qsList = LBR + delimitedList(qs) + RBR
>>> print qsList.parseString(x).asList()
[u'A', u'B', u'C', u'D']

If your lists can have more datatypes, or even contain lists within lists, then you will need a more complete grammar - like this one on the pyparsing wiki, which will handle tuples, lists, ints, floats, and quoted strings. Will work with Python versions back to 2.4.

Solution 11 - Python

No need to import anything and no need evaluate. You can do this in one line for most basic use cases, including the one given in original question.

One liner

l_x = [i.strip() for i in x[1:-1].replace('"',"").split(',')]

Explanation

x = '[ "A","B","C" , " D"]'
# str indexing to eliminate the brackets
# replace as split will otherwise retain the quotes in returned list
# split to conv to list
l_x = x[1:-1].replace('"',"").split(',')

Outputs:

for i in range(0, len(l_x)):
    print(l_x[i])
# vvvv output vvvvv
'''
 A
B
C 
  D
'''
print(type(l_x)) # out: class 'list'
print(len(l_x)) # out: 4

You can parse and clean up this list as needed using list comprehension.

l_x = [i.strip() for i in l_x] # list comprehension to clean up
for i in range(0, len(l_x)):
    print(l_x[i])
# vvvvv output vvvvv
'''
A
B
C
D
'''

Nested lists

If you have nested lists, it does get a bit more annoying. Without using regex (which would simplify the replace), and assuming you want to return a flattened list (and the zen of python says flat is better than nested):

x = '[ "A","B","C" , " D", ["E","F","G"]]'
l_x = x[1:-1].split(',')
l_x = [i    .replace(']', '')    .replace('[', '')    .replace('"', '')    .strip() for i in l_x]
# returns ['A', 'B', 'C', 'D', 'E', 'F', 'G']

If you need to retain the nested list it gets a bit uglier, but can still be done just with re and list comprehension:

import re
x = '[ "A","B","C" , " D", "["E","F","G"]","Z", "Y", "["H","I","J"]", "K", "L"]'
# clean it up so regex is simpler
x = x.replace('"', '').replace(' ', '') 
# look ahead for the bracketed text that signifies nested list
l_x = re.split(r',(?=\[[A-Za-z0-9\',]+\])|(?<=\]),', x[1:-1])
print(l_x)
# flatten and split the non nested list items
l_x0 = [item for items in l_x for item in items.split(',') if not '[' in items]
# convert the nested lists to lists
l_x1 = [
    i[1:-1].split(',') for i in l_x if '[' in i 
]
# add the two lists 
l_x = l_x0 + l_x1

This last solution will work on any list stored as a string, nested or not.

Solution 12 - Python

To further complete @Ryan 's answer using json, one very convenient function to convert unicode is the one posted here: https://stackoverflow.com/a/13105359/7599285

ex with double or single quotes:

>print byteify(json.loads(u'[ "A","B","C" , " D"]')
>print byteify(json.loads(u"[ 'A','B','C' , ' D']".replace('\'','"')))
['A', 'B', 'C', ' D']
['A', 'B', 'C', ' D']

Solution 13 - Python

This usually happens when you load list stored as string to CSV

If you have your list stored in CSV in form like OP asked:

x = '[ "A","B","C" , " D"]'

Here is how you can load it back to list:

import csv
with open('YourCSVFile.csv') as csv_file:
    reader = csv.reader(csv_file, delimiter=',')
    rows = list(reader)

listItems = rows[0]

listItems is now list

Solution 14 - Python

You may run into such problem while dealing with scraped data stored as Pandas DataFrame.

This solution works like charm if the list of values is present as text.

def textToList(hashtags):
    return hashtags.strip('[]').replace('\'', '').replace(' ', '').split(',')

hashtags = "[ 'A','B','C' , ' D']"
hashtags = textToList(hashtags)

Output: ['A', 'B', 'C', 'D']

> No external library required.

Solution 15 - Python

I would like to provide a more intuitive patterning solution with regex. The below function takes as input a stringified list containing arbitrary strings.

Stepwise explanation: You remove all whitespacing,bracketing and value_separators (provided they are not part of the values you want to extract, else make the regex more complex). Then you split the cleaned string on single or double quotes and take the non-empty values (or odd indexed values, whatever the preference).

def parse_strlist(sl):
import re
clean = re.sub("[\[\],\s]","",sl)
splitted = re.split("[\'\"]",clean)
values_only = [s for s in splitted if s != '']
return values_only

testsample: "['21',"foo" '6', '0', " A"]"

Solution 16 - Python

So, following all the answers I decided to time the most common methods:

from time import time
import re
import json

 
my_str = str(list(range(19)))
print(my_str)

reps = 100000

start = time()
for i in range(0, reps):
    re.findall("\w+", my_str)
print("Regex method:\t", (time() - start) / reps)

start = time()
for i in range(0, reps):
    json.loads(my_str)
print("json method:\t", (time() - start) / reps)

start = time()
for i in range(0, reps):
    ast.literal_eval(my_str)
print("ast method:\t\t", (time() - start) / reps)

start = time()
for i in range(0, reps):
    [n.strip() for n in my_str]
print("strip method:\t", (time() - start) / reps)

     
     
    regex method:	 6.391477584838867e-07
    json method:	 2.535374164581299e-06
    ast method:		 2.4425282478332518e-05
    strip method:	 4.983267784118653e-06

So in the end regex wins!

Solution 17 - Python

you can save yourself the .strip() fcn by just slicing off the first and last characters from the string representation of the list (see third line below)

>>> mylist=[1,2,3,4,5,'baloney','alfalfa']
>>> strlist=str(mylist)
['1', ' 2', ' 3', ' 4', ' 5', " 'baloney'", " 'alfalfa'"]
>>> mylistfromstring=(strlist[1:-1].split(', '))
>>> mylistfromstring[3]
'4'
>>> for entry in mylistfromstring:
...     print(entry)
...     type(entry)
... 
1
<class 'str'>
2
<class 'str'>
3
<class 'str'>
4
<class 'str'>
5
<class 'str'>
'baloney'
<class 'str'>
'alfalfa'
<class 'str'>

Solution 18 - Python

and with pure python - not importing any libraries

[x for x in  x.split('[')[1].split(']')[0].split('"')[1:-1] if x not in[',',' , ',', ']]

Solution 19 - Python

This solution is simpler than some I read above but requires to match all features of the list

x = '[ "A","B","C" , " D"]'
[i.strip() for i in x.split('"') if len(i.strip().strip(',').strip(']').strip('['))>0]

>> ['A', 'B', 'C', 'D']

Content Type	Original Author	Original Content on Stackoverflow
Question	harijay	View Question on Stackoverflow
Solution 1 - Python	Roger Pate	View Answer on Stackoverflow
Solution 2 - Python	Ryan	View Answer on Stackoverflow
Solution 3 - Python	Mark Byers	View Answer on Stackoverflow
Solution 4 - Python	Alexei Sholik	View Answer on Stackoverflow
Solution 5 - Python	kinzleb	View Answer on Stackoverflow
Solution 6 - Python	tosh	View Answer on Stackoverflow
Solution 7 - Python	ruohola	View Answer on Stackoverflow
Solution 8 - Python	Tomato Master	View Answer on Stackoverflow
Solution 9 - Python	dirkjot	View Answer on Stackoverflow
Solution 10 - Python	PaulMcG	View Answer on Stackoverflow
Solution 11 - Python	born_naked	View Answer on Stackoverflow
Solution 12 - Python	CptHwK	View Answer on Stackoverflow
Solution 13 - Python	Hrvoje	View Answer on Stackoverflow
Solution 14 - Python	dobydx	View Answer on Stackoverflow
Solution 15 - Python	Jordy Van Landeghem	View Answer on Stackoverflow
Solution 16 - Python	passs	View Answer on Stackoverflow
Solution 17 - Python	JCMontalbano	View Answer on Stackoverflow
Solution 18 - Python	Ioannis Nasios	View Answer on Stackoverflow
Solution 19 - Python	CassAndr	View Answer on Stackoverflow