How would I get everything before a : in a string Python

PythonRegexStringSplit

Python Problem Overview


I am looking for a way to get all of the letters in a string before a : but I have no idea on where to start. Would I use regex? If so how?

string = "Username: How are you today?"

Can someone show me a example on what I could do?

Python Solutions


Solution 1 - Python

Just use the split function. It returns a list, so you can keep the first element:

>>> s1.split(':')
['Username', ' How are you today?']
>>> s1.split(':')[0]
'Username'

Solution 2 - Python

Using index:

>>> string = "Username: How are you today?"
>>> string[:string.index(":")]
'Username'

The index will give you the position of : in string, then you can slice it.

If you want to use regex:

>>> import re
>>> re.match("(.*?):",string).group()
'Username'                       

match matches from the start of the string.

you can also use itertools.takewhile

>>> import itertools
>>> "".join(itertools.takewhile(lambda x: x!=":", string))
'Username'

Solution 3 - Python

You don't need regex for this

>>> s = "Username: How are you today?"

You can use the split method to split the string on the ':' character

>>> s.split(':')
['Username', ' How are you today?']

And slice out element [0] to get the first part of the string

>>> s.split(':')[0]
'Username'

Solution 4 - Python

I have benchmarked these various technics under Python 3.7.0 (IPython).

TLDR
  • fastest (when the split symbol c is known): pre-compiled regex.
  • fastest (otherwise): s.partition(c)[0].
  • safe (i.e., when c may not be in s): partition, split.
  • unsafe: index, regex.
Code
import string, random, re

SYMBOLS = string.ascii_uppercase + string.digits
SIZE = 100

def create_test_set(string_length):
    for _ in range(SIZE):
        random_string = ''.join(random.choices(SYMBOLS, k=string_length))
        yield (random.choice(random_string), random_string)

for string_length in (2**4, 2**8, 2**16, 2**32):
    print("\nString length:", string_length)
    print("  regex (compiled):", end=" ")
    test_set_for_regex = ((re.compile("(.*?)" + c).match, s) for (c, s) in test_set)
    %timeit [re_match(s).group() for (re_match, s) in test_set_for_regex]
    test_set = list(create_test_set(16))
    print("  partition:       ", end=" ")
    %timeit [s.partition(c)[0] for (c, s) in test_set]
    print("  index:           ", end=" ")
    %timeit [s[:s.index(c)] for (c, s) in test_set]
    print("  split (limited): ", end=" ")
    %timeit [s.split(c, 1)[0] for (c, s) in test_set]
    print("  split:           ", end=" ")
    %timeit [s.split(c)[0] for (c, s) in test_set]
    print("  regex:           ", end=" ")
    %timeit [re.match("(.*?)" + c, s).group() for (c, s) in test_set]
Results
String length: 16
  regex (compiled): 156 ns ± 4.41 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
  partition:        19.3 µs ± 430 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
  index:            26.1 µs ± 341 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
  split (limited):  26.8 µs ± 1.26 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
  split:            26.3 µs ± 835 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
  regex:            128 µs ± 4.02 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

String length: 256
  regex (compiled): 167 ns ± 2.7 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
  partition:        20.9 µs ± 694 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
  index:            28.6 µs ± 2.73 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
  split (limited):  27.4 µs ± 979 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
  split:            31.5 µs ± 4.86 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
  regex:            148 µs ± 7.05 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

String length: 65536
  regex (compiled): 173 ns ± 3.95 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
  partition:        20.9 µs ± 613 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
  index:            27.7 µs ± 515 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
  split (limited):  27.2 µs ± 796 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
  split:            26.5 µs ± 377 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
  regex:            128 µs ± 1.5 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

String length: 4294967296
  regex (compiled): 165 ns ± 1.2 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
  partition:        19.9 µs ± 144 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
  index:            27.7 µs ± 571 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
  split (limited):  26.1 µs ± 472 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
  split:            28.1 µs ± 1.69 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
  regex:            137 µs ± 6.53 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Solution 5 - Python

partition() may be better then split() for this purpose as it has the better predicable results for situations you have no delimiter or more delimiters.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
Question0CoolView Question on Stackoverflow
Solution 1 - PythonfredtantiniView Answer on Stackoverflow
Solution 2 - PythonHackaholicView Answer on Stackoverflow
Solution 3 - PythonCory KramerView Answer on Stackoverflow
Solution 4 - PythonAristideView Answer on Stackoverflow
Solution 5 - PythonMarv-CZView Answer on Stackoverflow