Split text after the second occurrence of character

Python, Python 3.x, String, Split

Python Problem Overview


I need to split text before the second occurrence of the '-' character, but what I have now produces inconsistent results. I've tried various combinations of rsplit and worked through other solutions on SO, without success.

Sample file name to split: 'some-sample-filename-to-split' returned in data.filename. In this case, I would only like to have 'some-sample' returned.

fname, extname = os.path.splitext(data.filename)
file_label = fname.rsplit('/',1)[-1]
file_label2 = file_label.rsplit('-',maxsplit=3)
print(file_label2,'\n','---------------','\n')

Python Solutions


Solution 1 - Python

You can do something like this:

>>> a = "some-sample-filename-to-split"
>>> "-".join(a.split("-", 2)[:2])
'some-sample'

a.split("-", 2) splits the string only at the first two occurrences of -, so it returns at most three parts.

a.split("-", 2)[:2] will give the first 2 elements in the list. Then simply join the first 2 elements.
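
To make the intermediate values concrete, here is a small sketch of the steps above. It also contrasts them with the rsplit call from the question, which counts separators from the right and so probably explains the inconsistent results. The second sample string ("a-b-c") is made up for illustration.

```python
a = "some-sample-filename-to-split"

parts = a.split("-", 2)      # split at the first two hyphens only
first_two = parts[:2]        # keep the first two fields
label = "-".join(first_two)  # put the hyphen back between them

print(parts)  # ['some', 'sample', 'filename-to-split']
print(label)  # some-sample

# rsplit counts from the right, so what ends up at index 0 depends on
# how many hyphens the name happens to contain.
print(a.rsplit("-", maxsplit=3)[0])        # some-sample
print("a-b-c".rsplit("-", maxsplit=3)[0])  # a
```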

OR

You could use a regular expression: ^([\w]+-[\w]+)

>>> import re
>>> reg = r'^([\w]+-[\w]+)'
>>> re.match(reg, a).group()
'some-sample'
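
Note that \w matches only letters, digits and underscores, so the pattern above finds nothing when a field contains other characters such as a dot. One possible alternative, sketched here on the assumption that a field is simply "anything except a hyphen", uses a negated character class. The helper name first_two_fields and the second sample string are hypothetical.

```python
import re

def first_two_fields(text):
    # [^-]* matches any run of characters that are not hyphens
    match = re.match(r"[^-]*-[^-]*", text)
    # Fall back to the whole string when there is no hyphen at all
    return match.group() if match else text

print(first_two_fields("some-sample-filename-to-split"))  # some-sample
print(first_two_fields("my.file-v1.2-final"))             # my.file-v1.2
```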

EDIT: As discussed in the comments, when the string contains only one hyphen you want just the part before it. Here is what you need:

def hyphen_split(a):
    if a.count("-") == 1:
        return a.split("-")[0]
    return "-".join(a.split("-", 2)[:2])

>>> hyphen_split("some-sample-filename-to-split")
'some-sample'
>>> hyphen_split("some-sample")
'some'

Solution 2 - Python

A generic way to split a string into two parts at the nth occurrence of the separator would be:

def split(strng, sep, pos):
    strng = strng.split(sep)
    return sep.join(strng[:pos]), sep.join(strng[pos:])

If pos is negative, the occurrences are counted from the end of the string.

>>> strng = 'some-sample-filename-to-split'
>>> split(strng, '-', 3)
('some-sample-filename', 'to-split')
>>> split(strng, '-', -4)
('some', 'sample-filename-to-split')
>>> split(strng, '-', 1000)
('some-sample-filename-to-split', '')
>>> split(strng, '-', -1000)
('', 'some-sample-filename-to-split')
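
Applied to the original question, the same helper can be called with pos set to 2 and only the first part kept. The sketch below repeats the function under the hypothetical name split_at_nth so that it does not look like a built-in, and shows how it behaves when the input has only one hyphen.

```python
def split_at_nth(text, sep, n):
    # Same logic as split() above, only renamed for this example
    parts = text.split(sep)
    return sep.join(parts[:n]), sep.join(parts[n:])

head, tail = split_at_nth("some-sample-filename-to-split", "-", 2)
print(head)  # some-sample
print(tail)  # filename-to-split

# With a single hyphen there is no second occurrence, so the whole
# string ends up in the first part (unlike hyphen_split in Solution 1).
print(split_at_nth("some-sample", "-", 2))  # ('some-sample', '')
```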

Solution 3 - Python

You can use str.index():

def hyphen_split(s):
    pos = s.index('-')
    try:
        return s[:s.index('-', pos + 1)]
    except ValueError:
        return s[:pos]

Test:

>>> hyphen_split("some-sample-filename-to-split")
'some-sample'
>>> hyphen_split("some-sample")
'some'
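
One caveat: the first s.index('-') call sits outside the try block, so a string with no hyphen at all raises ValueError. If that case can occur, a variant based on str.find(), which returns -1 instead of raising, might look like the sketch below. The name hyphen_split_safe and the choice to return the input unchanged when there is no hyphen are assumptions, not part of the original answer.

```python
def hyphen_split_safe(s):
    first = s.find('-')
    if first == -1:
        # No hyphen at all: return the input unchanged (an assumption)
        return s
    second = s.find('-', first + 1)
    if second == -1:
        # Only one hyphen: keep the part before it, as in the answer above
        return s[:first]
    return s[:second]

print(hyphen_split_safe("some-sample-filename-to-split"))  # some-sample
print(hyphen_split_safe("some-sample"))                    # some
print(hyphen_split_safe("nohyphen"))                       # nohyphen
```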

Solution 4 - Python

You could use regular expressions:

import re

file_label = re.search('(.*?-.*?)-', fname).group(1)
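
The pattern only matches when the name contains at least two hyphens. Otherwise re.search returns None and the .group(1) call fails with AttributeError. A guarded version could look like this sketch, where the function name label_before_second_hyphen and the fallback to the unchanged name are assumptions made for illustration.

```python
import re

def label_before_second_hyphen(fname):
    # Lazy quantifiers stop at the first and the second hyphen
    match = re.search(r"(.*?-.*?)-", fname)
    # Fall back to the full name when there is no second hyphen
    return match.group(1) if match else fname

print(label_before_second_hyphen("some-sample-filename-to-split"))  # some-sample
print(label_before_second_hyphen("some-sample"))                    # some-sample
```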

Solution 5 - Python

When you are working with a pandas DataFrame and the split has to be applied to every value in a column, applying a lambda function to the column is a simple alternative to a regex:

df['column_name'].apply(lambda x: "-".join(x.split('-',2)[:2]))
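
A self-contained sketch of that call, assuming pandas is installed. The column names filename and label and the sample values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "filename": ["some-sample-filename-to-split", "a-b", "single"],
})

# Keep everything before the second hyphen in each value
df["label"] = df["filename"].apply(lambda x: "-".join(x.split("-", 2)[:2]))

print(df["label"].tolist())  # ['some-sample', 'a-b', 'single']
```

Values with fewer than two hyphens come back unchanged, which differs from the hyphen_split functions in Solutions 1 and 3.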

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | DBS | View Question on Stackoverflow
Solution 1 - Python | JRodDynamite | View Answer on Stackoverflow
Solution 2 - Python | Nuno André | View Answer on Stackoverflow
Solution 3 - Python | Mike Müller | View Answer on Stackoverflow
Solution 4 - Python | Christian | View Answer on Stackoverflow
Solution 5 - Python | Shashank Singh Yadav | View Answer on Stackoverflow