How do I reliably split a string in Python, when it may not contain the pattern, or all n elements?


Python Problem Overview


In Perl I can do:

my ($x, $y) = split /:/, $str;

And it will work whether or not the string contains the pattern.

In Python, however, this won't work:

a, b = "foo".split(":")  # ValueError: not enough values to unpack

What's the canonical way to prevent errors in such cases?

Python Solutions


Solution 1 - Python

If you're splitting into just two parts (as in your example), you can use str.partition(), which always gives you exactly three values to unpack:

>>> a, sep, b = 'foo'.partition(':')
>>> a, sep, b
('foo', '', '')

str.partition() always returns a 3-tuple, whether the separator is found or not.
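Because the middle element is the separator itself, an empty separator slot tells you the separator was missing. The following is a minimal sketch; split_once is a hypothetical helper name used only for illustration:

```python
def split_once(s, sep=":"):
    """Hypothetical helper: split s at the FIRST sep and report whether sep was present."""
    head, found, tail = s.partition(sep)  # always a 3-tuple
    return head, tail, bool(found)        # found is "" when sep is absent

print(split_once("key:value"))  # ('key', 'value', True)
print(split_once("foo"))        # ('foo', '', False)
print(split_once("a:b:c"))      # ('a', 'b:c', True)

# str.rpartition() splits at the LAST occurrence; when sep is absent,
# the whole string lands in the final slot instead of the first
print("a:b:c".rpartition(":"))  # ('a:b', ':', 'c')
print("foo".rpartition(":"))    # ('', '', 'foo')
```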

Another option in Python 3.x is extended iterable unpacking:

>>> a, *b = 'foo'.split(':')
>>> a, b
('foo', [])

This assigns the first split item to a and the list of remaining items (if any) to b.
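Since b is always a list, it may be empty. A minimal sketch of that behaviour, also using str.split()'s maxsplit argument (not mentioned in the original answer) to keep the remainder in one piece:

```python
# With an explicit separator, str.split() never returns an empty list
# ("".split(":") == [""]), so a, *b always has something to assign to a.
a, *b = "foo".split(":")
print(a, b)  # foo []

a, *b = "key:val:more".split(":")
print(a, b)  # key ['val', 'more']

# maxsplit=1 stops after the first separator, so b holds at most one string
a, *b = "key:val:more".split(":", 1)
print(a, b)  # key ['val:more']
```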

Solution 2 - Python

Since you are on Python 3, it is easy. PEP 3132, Extended Iterable Unpacking, introduced a welcome simplification of the syntax for assigning to tuples. Previously, when assigning to a tuple of variables, the number of targets on the left had to be exactly equal to the number of items on the right.

In Python 3 we can mark any one variable on the left as a list by prefixing it with an asterisk (*). It grabs as many values as it can while still populating the variables to its right (so it need not be the rightmost item). This avoids many awkward slices when we don't know the length of a tuple.
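To illustrate that the starred name need not be last, here is a small sketch that keeps the final field separately; the path-like string is just sample data:

```python
# The starred name can sit anywhere; here it collects everything but the last field
*dirs, filename = "usr/local/bin/python3".split("/")
print(dirs, filename)  # ['usr', 'local', 'bin'] python3

# It may also end up empty when there is only one field
*dirs, filename = "python3".split("/")
print(dirs, filename)  # [] python3
```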

a, *b = "foo".split(":")  
print("a:", a, "b:", b)

Gives:

a: foo b: []

EDIT following comments and discussion:

This behaves considerably differently from the Perl version, but it is the Python 3 way. re.split() would be closer to Perl's split, but invoking the regex engine to split around a single character is unnecessary overhead.

With multiple elements in Python:

s = 'hello:world:sailor'
a, *b = s.split(":")
print("a:", a, "b:", b)

gives:

a: hello b: ['world', 'sailor']

However, in Perl:

my $s = 'hello:world:sailor';
my ($a, $b) = split /:/, $s;
print "a: $a b: $b\n";

gives:

a: hello b: world

It can be seen that additional elements are ignored, or lost, in Perl. That is fairly easy to replicate in Python if required:

s = 'hello:world:sailor'
a, *b = s.split(":")
b = b[0] if b else None  # keep only the second field, like Perl; None (undef) if absent
print("a:", a, "b:", b)
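To mimic Perl more generally, padding missing fields with None (standing in for undef) and dropping extra ones, a small helper can do both. split_fixed is a hypothetical name, not a standard library function, and it only approximates Perl (for example, it does not reproduce Perl's removal of trailing empty fields):

```python
from itertools import chain, islice, repeat

def split_fixed(s, sep, n, fill=None):
    """Hypothetical helper: return exactly n fields.

    Extra fields are dropped and missing ones become `fill`, roughly like
    my ($x, $y) = split /:/, $s; in Perl.
    """
    parts = s.split(sep)
    # Append an endless supply of `fill`, then take the first n items
    return list(islice(chain(parts, repeat(fill)), n))

a, b = split_fixed("hello:world:sailor", ":", 2)
print(a, b)  # hello world

a, b = split_fixed("foo", ":", 2)
print(a, b)  # foo None
```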

So, the Perl equivalent of a, *b = s.split(":") would be:

my ($a, @b) = split /:/, $s;

NB: $a and $b should generally be avoided in Perl because they have a special meaning inside sort blocks. I have used them here for consistency with the Python example.

Python has an extra trick up its sleeve: the starred target can be any element of the tuple on the left:

s = "one:two:three:four"
a, *b, c = s.split(':')
print("a:", a, "b:", b, "c:", c)

Gives:

a: one b: ['two', 'three'] c: four
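The unstarred targets are still mandatory, so a, *b, c needs at least two fields. A short sketch of where that pattern stops being safe for strings without enough separators:

```python
# a, *b, c needs at least two fields: one for a and one for c
a, *b, c = "one:two".split(":")
print(a, b, c)  # one [] two

too_short = False
try:
    # Only one field is available, so a and c cannot both be filled
    x, *y, z = "one".split(":")
except ValueError as exc:
    too_short = True
    print("ValueError:", exc)
```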

In the Perl equivalent, however, the array (@b) is greedy and the scalar $c is left undef:

use strict;
use warnings;

my $s = 'one:two:three:four';
my ($a, @b, $c) = split /:/, $s;
print "a: $a b: @b c: $c\n";

Gives:

Use of uninitialized value $c in concatenation (.) or string at gash.pl line 8.
a: one b: two three four c: 

Solution 3 - Python

You are always free to catch the exception.

For example:

some_string = "foo"

try:
    a, b = some_string.split(":")
except ValueError:
    a = some_string
    b = ""

If assigning the whole original string to a and an empty string to b is the desired behaviour, I would probably use str.partition() as Eugene Yarmash suggests. However, this solution gives you more control over exactly what happens when there is no separator in the string, which might be useful in some cases.
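One caveat: the same ValueError is raised when the string has too many separators, so "a:b:c" would also fall into the except branch. A minimal sketch that limits the split with maxsplit=1 so only a missing separator can trigger the fallback; split_pair is a hypothetical helper name:

```python
def split_pair(some_string, sep=":"):
    """Hypothetical helper: split into exactly two parts, falling back to (whole, "")."""
    try:
        # maxsplit=1 yields at most two pieces, so only a missing
        # separator can make this unpacking fail
        a, b = some_string.split(sep, 1)
    except ValueError:
        a, b = some_string, ""
    return a, b

print(split_pair("foo"))        # ('foo', '')
print(split_pair("key:value"))  # ('key', 'value')
print(split_pair("a:b:c"))      # ('a', 'b:c')
```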

Solution 4 - Python

str.split() always returns a list, and a, b = ... always expects exactly two items. Instead, you can index into the list, for example l = string.split(':'); a = l[0]; ....

Here is a one-liner: a, b = (string.split(':') + [None]*2)[:2]
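The padding guarantees at least two items and the slice guarantees at most two, so fields beyond the second are silently discarded. A sketch comparing that with a maxsplit=1 variant (an addition, not part of the original answer) that keeps the remainder:

```python
s = "one:two:three"

# Pad with None, then keep only the first two items; "three" is discarded
a, b = (s.split(":") + [None] * 2)[:2]
print(a, b)  # one two

# With maxsplit=1 there are at most two pieces, so one None of padding is
# enough and the remainder is kept in the second variable
head, tail = (s.split(":", 1) + [None])[:2]
print(head, tail)  # one two:three

# A missing separator leaves the second variable as None
key, value = ("foo".split(":", 1) + [None])[:2]
print(key, value)  # foo None
```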

Solution 5 - Python

How about using regular expressions?

import re 
string = 'one:two:three:four'

In 3.x:

a, *b = re.split(':', string)

In 2.x (splitting only once):

parts = re.split(':', string)
a, b = parts[0], parts[1:]

This way you can also split on a regular expression pattern (e.g. \d).
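A short sketch of splitting on a pattern such as \d, combined with the unpacking shown above; the sample strings are made up for illustration, and maxsplit is passed as a keyword argument to re.split():

```python
import re

# Split on any single digit; the pattern is a regular expression, not a literal
parts = re.split(r"\d", "a1b2c")
print(parts)  # ['a', 'b', 'c']

# Split on a run of digits, at most once, then unpack safely
a, *rest = re.split(r"\d+", "abc123def456ghi", maxsplit=1)
print(a, rest)  # abc ['def456ghi']

# A pattern that never matches returns the whole string in a one-item list
a, *rest = re.split(r"\d+", "no digits here")
print(a, rest)  # no digits here []
```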

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Question: planetp
Solution 1 - Python: Eugene Yarmash
Solution 2 - Python: cdarke
Solution 3 - Python: Philippe Aubertin
Solution 4 - Python: Aaron Schif
Solution 5 - Python: Cheyn Shmuel