Python split() without removing the delimiter

Python Problem Overview


This code almost does what I need:

for line in all_lines:
    s = line.split('>')

Except that it removes all the '>' delimiters. So

<html><head>

turns into

['<html', '<head', '']

Is there a way to use the split() method but keep the delimiter instead of removing it, giving this result?

['<html>','<head>']

Python Solutions


Solution 1 - Python

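d = ">"
for line in all_lines:
    # Split on ">", drop empty pieces, and re-append ">" to each remaining piece.
    # Note: this also appends ">" to trailing text that never had one.
    s = [e + d for e in line.split(d) if e]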
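To see what Solution 1's comprehension does at the edges, here it is wrapped in a helper, `split_keep_gt` (a hypothetical name used only for illustration). It works when every piece ends with ">", but it also appends ">" to trailing text that never had one, and to the newline left on lines read from a file:

```python
def split_keep_gt(line, d=">"):
    # Solution 1's comprehension: split on d, drop empty pieces,
    # and re-append d to every remaining piece.
    return [e + d for e in line.split(d) if e]

print(split_keep_gt("<html><head>"))    # ['<html>', '<head>']
print(split_keep_gt("<p>text"))         # ['<p>', 'text>']  (spurious '>')
print(split_keep_gt("<html><head>\n"))  # ['<html>', '<head>', '\n>']
```

Stripping the newline first, for example with `line.rstrip("\n")`, avoids the last case; the trailing-text case needs a different approach.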

Solution 2 - Python

If you are parsing HTML with split(), you are most likely doing it wrong, unless you are writing a one-shot script for a fixed, trusted input file. If the code is supposed to work on arbitrary HTML, how will you handle something like <a title='growth > 8%' href='#something'>?

Anyway, the following works for me:

>>> import re
>>> re.split('(<[^>]*>)', '<body><table><tr><td>')[1::2]
['<body>', '<table>', '<tr>', '<td>']
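The `[1::2]` slice works because `re.split` also returns the text matched by capturing groups, so separators (here, whole tags) sit at the odd indices. The sketch below shows that intermediate list, then runs the same pattern on the quoted-attribute tag from the warning above. As a swapped-in alternative (not part of the answer), it collects raw start-tag text with the standard library's `html.parser.HTMLParser` and its `get_starttag_text()` method; the `TagCollector` class is a hypothetical name for illustration:

```python
import re
from html.parser import HTMLParser

# Capturing group: the tags themselves appear at the odd indices.
print(re.split('(<[^>]*>)', '<body><table>'))  # ['', '<body>', '', '<table>', '']

tricky = "<a title='growth > 8%' href='#something'>"

# The regex stops at the first '>', even inside a quoted attribute value.
print(re.split('(<[^>]*>)', tricky)[1::2])  # ["<a title='growth >"]

class TagCollector(HTMLParser):
    """Collect the raw text of each start tag and a rebuilt end tag."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the start tag as it appeared in the input.
        self.tags.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.tags.append(f"</{tag}>")

parser = TagCollector()
parser.feed(tricky + "</a>")
parser.close()
print(parser.tags)  # ["<a title='growth > 8%' href='#something'>", '</a>']
```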

Solution 3 - Python

How about this:

import re
s = '<html><head>'
re.findall('[^>]+>', s)  # ['<html>', '<head>']
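Solution 3's pattern only returns pieces that end in ">", so any text after the last ">" is silently dropped. One possible variant, assuming you want that trailing text kept as a final element, adds an alternative branch anchored at the end of the string:

```python
import re

s = "<html><head>title"

# Original pattern: the trailing "title" has no '>' and is dropped.
print(re.findall(r'[^>]+>', s))          # ['<html>', '<head>']

# Variant: also match a final run of non-'>' characters at the end.
print(re.findall(r'[^>]*>|[^>]+$', s))   # ['<html>', '<head>', 'title']
```

Using `*` instead of `+` in the first branch also keeps a lone ">" (as in "a>>b", which gives ['a>', '>', 'b']) instead of skipping it.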

Solution 4 - Python

Just split it, then append a trailing ">" to every element of the resulting list except the last one.
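Solution 4's recipe can be written as a small helper. The name `split_keep` and its `delim` parameter are illustrative, not from the answer. Every piece except the last was followed by the delimiter in the original string, so only those get it back; the last piece is kept only when it is non-empty (it is empty exactly when the string ends with the delimiter):

```python
def split_keep(text, delim=">"):
    """Split text on delim, keeping delim at the end of each piece."""
    parts = text.split(delim)
    # Every piece except the last was followed by delim in the original.
    kept = [p + delim for p in parts[:-1]]
    # The last piece is "" when text ends with delim; keep it only if non-empty.
    if parts[-1]:
        kept.append(parts[-1])
    return kept

print(split_keep("<html><head>"))   # ['<html>', '<head>']
print(split_keep("<p>text"))        # ['<p>', 'text']
```

Because nothing is dropped or added, `"".join(split_keep(s)) == s` holds for any string `s`, which makes a handy sanity check.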

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | some1 | View Question on Stackoverflow
Solution 1 - Python | P.Melch | View Answer on Stackoverflow
Solution 2 - Python | gb. | View Answer on Stackoverflow
Solution 3 - Python | Óscar López | View Answer on Stackoverflow
Solution 4 - Python | orangething | View Answer on Stackoverflow