Split by comma and strip whitespace in Python

Python, Whitespace, Strip

Python Problem Overview


I have some Python code that splits on commas, but doesn't strip the whitespace:

>>> string = "blah, lots  ,  of ,  spaces, here "
>>> mylist = string.split(',')
>>> print(mylist)
['blah', ' lots  ', '  of ', '  spaces', ' here ']

I would rather end up with whitespace removed like this:

['blah', 'lots', 'of', 'spaces', 'here']

I am aware that I could loop through the list and strip() each item but, as this is Python, I'm guessing there's a quicker, easier and more elegant way of doing it.

Python Solutions


Solution 1 - Python

Use a list comprehension: it is simpler than a for loop and just as easy to read.

my_string = "blah, lots  ,  of ,  spaces, here "
result = [x.strip() for x in my_string.split(',')]
# result is ["blah", "lots", "of", "spaces", "here"]

See: Python docs on List Comprehension
A good 2 second explanation of list comprehension.
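If this comes up in several places, the comprehension can be wrapped in a small helper. The following is a minimal sketch; the name split_and_strip and its sep parameter are made up for illustration and are not part of the original answer.

```python
def split_and_strip(text, sep=","):
    """Split text on sep and strip surrounding whitespace from each piece."""
    return [piece.strip() for piece in text.split(sep)]


my_string = "blah, lots  ,  of ,  spaces, here "
result = split_and_strip(my_string)
print(result)  # ['blah', 'lots', 'of', 'spaces', 'here']

# Empty fields are preserved as empty strings; filter them out if unwanted.
with_gaps = split_and_strip("a, , b,")
print(with_gaps)                    # ['a', '', 'b', '']
print([p for p in with_gaps if p])  # ['a', 'b']
```

Whether empty fields should be kept or dropped depends on the data, so the helper leaves that decision to the caller.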

Solution 2 - Python

I came to add:

map(str.strip, string.split(','))

but saw it had already been mentioned by Jason Orendorff in a comment.

Glenn Maynard's comment on the same answer suggests list comprehensions over map, and I started to wonder why. I assumed he meant for performance reasons, but of course he might have meant for stylistic reasons, or something else (Glenn?).

So a quick (possibly flawed?) test on my box (Python 2.6.5 on Ubuntu 10.04) applying the three methods in a loop revealed:

$ time ./list_comprehension.py  # [word.strip() for word in string.split(',')]
real	0m22.876s

$ time ./map_with_lambda.py     # map(lambda s: s.strip(), string.split(','))
real	0m25.736s

$ time ./map_with_str.strip.py  # map(str.strip, string.split(','))
real	0m19.428s

making map(str.strip, string.split(',')) the winner, although it seems they are all in the same ballpark.

Certainly though map (with or without a lambda) should not necessarily be ruled out for performance reasons, and for me it is at least as clear as a list comprehension.
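For anyone who wants to repeat the comparison on a current interpreter, here is a rough sketch using the standard timeit module instead of separate scripts. The function names and the iteration count are arbitrary choices, not taken from the original test. On Python 3, map returns an iterator, so the two map variants are wrapped in list(), which adds some cost of its own. Timings will differ between machines and Python versions, so no numbers are shown.

```python
import timeit

string = "blah, lots  ,  of ,  spaces, here "


def with_comprehension():
    return [word.strip() for word in string.split(",")]


def with_map_lambda():
    return list(map(lambda s: s.strip(), string.split(",")))


def with_map_str_strip():
    return list(map(str.strip, string.split(",")))


candidates = [with_comprehension, with_map_lambda, with_map_str_strip]

# All three must agree before their timings are worth comparing.
expected = ["blah", "lots", "of", "spaces", "here"]
assert all(func() == expected for func in candidates)

for func in candidates:
    # number=100_000 is an arbitrary repetition count for this sketch.
    seconds = timeit.timeit(func, number=100_000)
    print(f"{func.__name__}: {seconds:.3f}s")
```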

Solution 3 - Python

Split using a regular expression. Note that I made the case more general by adding leading spaces. The list comprehension removes the empty strings at the front and back.

>>> import re
>>> string = "  blah, lots  ,  of ,  spaces, here "
>>> pattern = re.compile(r"^\s+|\s*,\s*|\s+$")
>>> print([x for x in pattern.split(string) if x])
['blah', 'lots', 'of', 'spaces', 'here']

This works even if ^\s+ doesn't match:

>>> string = "foo,   bar  "
>>> print([x for x in pattern.split(string) if x])
['foo', 'bar']

Here's why you need ^\s+:

>>> pattern = re.compile(r"\s*,\s*|\s+$")
>>> print([x for x in pattern.split(string) if x])
['  blah', 'lots', 'of', 'spaces', 'here']

See the leading spaces in blah?

Clarification: above uses the Python 3 interpreter, but results are the same in Python 2.
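A possible variation, not from the original answer, is to strip the ends of the string first and then split on a comma with optional whitespace around it. The anchored alternatives and the filtering step are then no longer needed. This is only a sketch of that alternative:

```python
import re

string = "  blah, lots  ,  of ,  spaces, here "

# Strip the ends first, then split on a comma with optional whitespace around it.
parts = re.split(r"\s*,\s*", string.strip())
print(parts)  # ['blah', 'lots', 'of', 'spaces', 'here']

# Unlike the filtered version, empty fields between commas are kept.
print(re.split(r"\s*,\s*", "a,,b"))  # ['a', '', 'b']
```

The two approaches therefore differ on input with empty fields: the filter drops them, while this variant preserves them.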

Solution 4 - Python

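Just remove the spaces from the string before you split it. Note that this also removes spaces inside items, so it only suits lists whose items contain no internal spaces.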

mylist = my_string.replace(' ','').split(',')
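To see how removing spaces differs from stripping, compare the two on items that contain spaces. The sample string below is made up for illustration:

```python
my_string = "New York, Los Angeles ,  San Francisco "

# Removing every space also removes the spaces inside each item.
removed = my_string.replace(" ", "").split(",")
# Stripping only touches the whitespace at the ends of each item.
stripped = [item.strip() for item in my_string.split(",")]

print(removed)   # ['NewYork', 'LosAngeles', 'SanFrancisco']
print(stripped)  # ['New York', 'Los Angeles', 'San Francisco']
```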

Solution 5 - Python

I know this has already been answered, but if you end up doing this a lot, regular expressions may be a better way to go:

>>> import re
>>> re.sub(r'\s', '', string).split(',')
['blah', 'lots', 'of', 'spaces', 'here']

The \s matches any whitespace character, and we just replace it with an empty string ''. You can find more info here: http://docs.python.org/library/re.html#re.sub
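As a rough sketch of the "doing this a lot" case, the pattern can be compiled once and reused. The names WHITESPACE and split_without_whitespace are hypothetical. The re module already caches recently used patterns, so any speed gain is likely modest; the main benefit is a named, reusable helper. The example also shows that \s covers tabs and newlines, which replace(' ', '') leaves in place.

```python
import re

WHITESPACE = re.compile(r"\s")  # compiled once, reused on every call


def split_without_whitespace(text):
    """Remove every whitespace character, then split on commas."""
    return WHITESPACE.sub("", text).split(",")


messy = "blah,\tlots \n, of ,\r\n spaces, here "
print(split_without_whitespace(messy))
# ['blah', 'lots', 'of', 'spaces', 'here']

# Replacing only the space character leaves tabs and newlines behind.
print(messy.replace(" ", "").split(","))
# ['blah', '\tlots\n', 'of', '\r\nspaces', 'here']
```

Like any approach that deletes whitespace before splitting, this also removes whitespace inside items.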

Solution 6 - Python

map(lambda s: s.strip(), mylist) would be a little better than explicitly looping. Or, for the whole thing at once: map(lambda s: s.strip(), string.split(','))
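One thing to keep in mind on Python 3: map returns a lazy iterator rather than a list, so the result needs to be wrapped in list() if a list is wanted. A short sketch, using str.strip in place of the lambda:

```python
string = "blah, lots  ,  of ,  spaces, here "

lazy = map(str.strip, string.split(","))
print(type(lazy).__name__)  # map

mylist = list(lazy)
print(mylist)  # ['blah', 'lots', 'of', 'spaces', 'here']

# A map object can be consumed only once.
print(list(lazy))  # []
```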

Solution 7 - Python

re (as in regular expressions) allows splitting on multiple characters at once:

>>> import re
>>> string = "blah, lots  ,  of ,  spaces, here "
>>> re.split(', ', string)
['blah', 'lots  ', ' of ', ' spaces', 'here ']

This doesn't work well for your example string, but works nicely for a comma-space separated list. For your example string, you can give re.split a character class to get a "split-on-this-or-that" effect, splitting on either a comma or a space.

>>> re.split('[, ]', string)
['blah', '', 'lots', '', '', '', '', 'of', '', '', '', 'spaces', '', 'here', '']

Unfortunately, that's ugly, but a filter will do the trick:

>>> list(filter(None, re.split('[, ]', string)))
['blah', 'lots', 'of', 'spaces', 'here']

Voila!
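A variation worth considering, offered here as a sketch and not as part of the original answer, is to add a + quantifier so that a whole run of commas and spaces counts as one delimiter. Most of the empty strings then never appear, and stripping the input first removes the last one. The second sample string is made up to show a caveat.

```python
import re

string = "blah, lots  ,  of ,  spaces, here "

# "+" treats a whole run of commas and spaces as a single delimiter.
print(re.split(r"[, ]+", string))
# ['blah', 'lots', 'of', 'spaces', 'here', '']

# Stripping first removes the trailing delimiter and its empty string.
parts = re.split(r"[, ]+", string.strip())
print(parts)  # ['blah', 'lots', 'of', 'spaces', 'here']

# Caveat: any approach that splits on spaces also breaks up multi-word items.
print(re.split(r"[, ]+", "red apple, pear"))  # ['red', 'apple', 'pear']
```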

Solution 8 - Python

import re
# Split on a comma or a space, then drop the empty strings.
result = [x for x in re.split(r',| ', your_string) if x]

This works fine for me.

Solution 9 - Python

s = 'bla, buu, jii'

# Split on commas, then strip each piece as it is printed.
for st in s.split(','):
    print(st.strip())

Solution 10 - Python

import re
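# Strip the ends first so a trailing space does not produce an empty final item.
mylist = re.split(r'\s*[,\s]\s*', string.strip())

Simply put, the delimiter is a comma or a single whitespace character, with or without preceding or following whitespace.

Please try it!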

Solution 11 - Python

Instead of splitting the string first and then worrying about whitespace, you can deal with the whitespace first and then split:

string.replace(" ", "").split(",")

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | Mr_Chimp | View Question on Stack Overflow
Solution 1 - Python | Sean Vieira | View Answer on Stack Overflow
Solution 2 - Python | Sean | View Answer on Stack Overflow
Solution 3 - Python | tbc0 | View Answer on Stack Overflow
Solution 4 - Python | user489041 | View Answer on Stack Overflow
Solution 5 - Python | Brad Montgomery | View Answer on Stack Overflow
Solution 6 - Python | user470379 | View Answer on Stack Overflow
Solution 7 - Python | Dannid | View Answer on Stack Overflow
Solution 8 - Python | Zieng | View Answer on Stack Overflow
Solution 9 - Python | Parikshit Pandya | View Answer on Stack Overflow
Solution 10 - Python | ghchoi | View Answer on Stack Overflow
Solution 11 - Python | crazysra | View Answer on Stack Overflow