How to split a DOS path into its components in Python

Python Problem Overview


I have a string variable which represents a DOS path, e.g.:

var = "d:\stuff\morestuff\furtherdown\THEFILE.txt"

I want to split this string into:

[ "d", "stuff", "morestuff", "furtherdown", "THEFILE.txt" ]

I have tried using split() and replace() but they either only process the first backslash or they insert hex numbers into the string.

I need to convert this string variable into a raw string somehow so that I can parse it.

What's the best way to do this?

I should also add that the contents of var, i.e. the path that I'm trying to parse, is actually the return value of a command line query. It's not path data that I generate myself. It's stored in a file, and the command line tool is not going to escape the backslashes.

Python Solutions


Solution 1 - Python

I would do

import os
path = os.path.normpath(path)
path.split(os.sep)

First normalize the path string into a proper string for the OS. After that, os.sep should be safe to use as the delimiter for the string's split method.
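
As a minimal sketch of this approach, assuming the input is always a DOS/Windows-style path: the helper name split_dos_path is made up for illustration, and ntpath (the Windows flavour of os.path in the standard library) is swapped in for os.path so that the result is the same on any platform.

```python
import ntpath

def split_dos_path(path):
    """Normalize a DOS-style path, then split it on the backslash."""
    # normpath converts forward slashes to backslashes and collapses
    # redundant separators and ".." segments.
    normalized = ntpath.normpath(path)
    return normalized.split(ntpath.sep)

print(split_dos_path(r"d:\stuff\morestuff\furtherdown\THEFILE.txt"))
```

The drive keeps its colon here ('d:'); it can be stripped afterwards if only the letter is wanted.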

Solution 2 - Python

I've been bitten loads of times by people writing their own path fiddling functions and getting it wrong. Spaces, slashes, backslashes, colons -- the possibilities for confusion are not endless, but mistakes are easily made anyway. So I'm a stickler for the use of os.path, and recommend it on that basis.

(However, the path to virtue is not the one most easily taken, and many people when finding this are tempted to take a slippery path straight to damnation. They won't realise until one day everything falls to pieces, and they -- or, more likely, somebody else -- has to work out why everything has gone wrong, and it turns out somebody made a filename that mixes slashes and backslashes -- and some person suggests that the answer is "not to do that". Don't be any of these people. Except for the one who mixed up slashes and backslashes -- you could be them if you like.)

You can get the drive and path+file like this:

drive, path_and_file = os.path.splitdrive(path)

Get the path and the file:

path, file = os.path.split(path_and_file)

Getting the individual folder names is not especially convenient, but it is the sort of honest middling discomfort that heightens the pleasure of later finding something that actually works well:

folders = []
while True:
    path, folder = os.path.split(path)

    if folder != "":
        folders.append(folder)
    else:
        # Nothing left to split off: keep the root (if any) and stop.
        if path != "":
            folders.append(path)
        break

folders.reverse()

(This pops a "\" at the start of folders if the path was originally absolute. You could lose a bit of code if you didn't want that.)
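
Putting the steps above together, here is a rough sketch as a single function. The name drive_and_folders is hypothetical, and ntpath is used in place of os.path as an assumption that the input is a DOS path, so the behaviour doesn't depend on the machine running the code.

```python
import ntpath

def drive_and_folders(path):
    """Return (drive, components) for a DOS-style path."""
    drive, rest = ntpath.splitdrive(path)
    folders = []
    while True:
        rest, folder = ntpath.split(rest)
        if folder != "":
            folders.append(folder)
        else:
            # Either the root ("\\") or nothing at all is left.
            if rest != "":
                folders.append(rest)
            break
    folders.reverse()
    return drive, folders

drive, folders = drive_and_folders(r"d:\stuff\morestuff\furtherdown\THEFILE.txt")
print(drive, folders)
```

Unlike the snippet above, this sketch leaves the file name in the list as the last component.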

Solution 3 - Python

In Python >=3.4 this has become much simpler. You can now use pathlib.Path.parts to get all the parts of a path.

Example:

>>> from pathlib import Path
>>> Path('C:/path/to/file.txt').parts
('C:\\', 'path', 'to', 'file.txt')
>>> Path(r'C:\path\to\file.txt').parts
('C:\\', 'path', 'to', 'file.txt')

On a Windows install of Python 3 this will assume that you are working with Windows paths, and on *nix it will assume that you are working with posix paths. This is usually what you want, but if it isn't you can use the classes pathlib.PurePosixPath or pathlib.PureWindowsPath as needed:

>>> from pathlib import PurePosixPath, PureWindowsPath
>>> PurePosixPath('/path/to/file.txt').parts
('/', 'path', 'to', 'file.txt')
>>> PureWindowsPath(r'C:\path\to\file.txt').parts
('C:\\', 'path', 'to', 'file.txt')
>>> PureWindowsPath(r'\\host\share\path\to\file.txt').parts
('\\\\host\\share\\', 'path', 'to', 'file.txt')

Edit: There is also a backport to python 2 available: pathlib2
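
Applied to the path from the question, a small sketch might look like the following. The helper name dos_parts is an invention for this example; it assumes the bare drive letter is wanted as the first element, as in the question, and uses PureWindowsPath so it runs the same on any OS.

```python
from pathlib import PureWindowsPath

def dos_parts(path):
    """Split a DOS path into ['d', 'stuff', ...] style components."""
    p = PureWindowsPath(path)
    parts = list(p.parts)
    if p.anchor:
        # The first element of .parts is the anchor (e.g. 'd:\\'); drop it.
        parts = parts[1:]
    drive = p.drive.rstrip(":")
    return ([drive] if drive else []) + parts

print(dos_parts(r"d:\stuff\morestuff\furtherdown\THEFILE.txt"))
```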

Solution 4 - Python

You can simply use the most Pythonic approach (IMHO):

import os

your_path = r"d:\stuff\morestuff\furtherdown\THEFILE.txt"
path_list = your_path.split(os.sep)
print(path_list)

Which will give you:

['d:', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']

The key here is to use os.sep instead of '\\' or '/', as this makes it system independent.

To remove the colon from the drive letter (although I don't see any reason why you would want to do that), you can write:

path_list[0] = path_list[0][0]

Solution 5 - Python

For a somewhat more concise solution, consider the following:

import os

def split_path(p):
    a, b = os.path.split(p)
    return (split_path(a) if len(a) and len(b) else []) + [b]

Solution 6 - Python

The problem here starts with how you're creating the string in the first place.

a = "d:\stuff\morestuff\furtherdown\THEFILE.txt"

Done this way, Python is trying to special case these: \s, \m, \f, and \T. In your case, \f is being treated as a formfeed (0x0C) while the other backslashes are handled correctly. What you need to do is one of these:

b = "d:\\stuff\\morestuff\\furtherdown\\THEFILE.txt"      # doubled backslashes
c = r"d:\stuff\morestuff\furtherdown\THEFILE.txt"         # raw string, no doubling necessary

Then once you split either of these, you'll get the result you want.
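
A short illustration of the difference, using a shortened made-up path fragment so that the only escape sequence involved is \f:

```python
# In a normal literal, \f is read as a single form feed character (0x0C).
plain = "more\further"
# In a raw literal, the backslash and the 'f' are kept as two characters.
raw = r"more\further"

print(len(plain), len(raw))
print("\x0c" in plain)
print(raw.split("\\"))
```

Splitting plain on a backslash finds nothing to split on, because the backslash is no longer in the string at all.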

Solution 7 - Python

I can't actually contribute a real answer to this one (as I came here hoping to find one myself), but to me the number of differing approaches and all the caveats mentioned is the surest indicator that Python's os.path module desperately needs this as a built-in function.

Solution 8 - Python

The stuff about mypath.split("\\") would be better expressed as mypath.split(os.sep). sep is the path separator for your particular platform (e.g., \ for Windows, / for Unix), and the Python build knows which one to use. If you use sep, then your code will be platform agnostic.
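
One thing worth keeping in mind with os.sep is that it reflects the platform the code runs on, not the platform the path came from. A small sketch, using the standard ntpath and posixpath modules to stand in for os.path on Windows and on Unix respectively:

```python
import ntpath
import posixpath

dos_path = r"d:\stuff\morestuff\furtherdown\THEFILE.txt"

# What os.sep would give on Windows: the backslash.
windows_parts = dos_path.split(ntpath.sep)
# What os.sep would give on Linux or macOS: the forward slash.
posix_parts = dos_path.split(posixpath.sep)

print(windows_parts)
print(posix_parts)
```

So if a DOS path has to be parsed on a non-Windows machine, splitting on ntpath.sep (or using ntpath directly) is probably the safer choice.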

Solution 9 - Python

The functional way, with a generator.

import os

def split(path):
    (drive, head) = os.path.splitdrive(path)
    # Stop at the root, or when a relative path has been used up.
    while head not in (os.sep, ""):
        (head, tail) = os.path.split(head)
        yield tail

In action:

>>> print([x for x in split(os.path.normpath('/path/to/filename'))])
['filename', 'to', 'path']
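
Note that the generator yields the components from last to first. A small sketch of getting them back in their original order follows; posixpath stands in for os.path so the result doesn't depend on the platform, and the name split_reversed is hypothetical.

```python
import posixpath

def split_reversed(path):
    """Yield the components of a POSIX path, last one first."""
    head = path
    # Stop at the root, or when a relative path has been used up.
    while head not in (posixpath.sep, ""):
        head, tail = posixpath.split(head)
        yield tail

# Collect the components and reverse them into natural order.
parts = list(split_reversed("/path/to/filename"))[::-1]
print(parts)
```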

Solution 10 - Python

You can recursively os.path.split the string

import os
def parts(path):
    p,f = os.path.split(path)
    return parts(p) + [f] if f else [p]

Testing this against some path strings, and reassembling the path with os.path.join

>>> for path in [
...         r'd:\stuff\morestuff\furtherdown\THEFILE.txt',
...         '/path/to/file.txt',
...         'relative/path/to/file.txt',
...         r'C:\path\to\file.txt',
...         r'\\host\share\path\to\file.txt',
...     ]:
...     print parts(path), os.path.join(*parts(path))
... 
['d:\\', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt'] d:\stuff\morestuff\furtherdown\THEFILE.txt
['/', 'path', 'to', 'file.txt'] /path\to\file.txt
['', 'relative', 'path', 'to', 'file.txt'] relative\path\to\file.txt
['C:\\', 'path', 'to', 'file.txt'] C:\path\to\file.txt
['\\\\', 'host', 'share', 'path', 'to', 'file.txt'] \\host\share\path\to\file.txt

The first element of the list may need to be treated differently depending on how you want to deal with drive letters, UNC paths and absolute and relative paths. Changing the last [p] to [os.path.splitdrive(p)] forces the issue by splitting the drive letter and directory root out into a tuple.

import os
def parts(path):
    p,f = os.path.split(path)
    return parts(p) + [f] if f else [os.path.splitdrive(p)]

[('d:', '\\'), 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']
[('', '/'), 'path', 'to', 'file.txt']
[('', ''), 'relative', 'path', 'to', 'file.txt']
[('C:', '\\'), 'path', 'to', 'file.txt']
[('', '\\\\'), 'host', 'share', 'path', 'to', 'file.txt']

Edit: I have realised that this answer is very similar to that given above by user1556435. I'm leaving my answer up as the handling of the drive component of the path is different.

Solution 11 - Python

A really easy and simple way to do it:

var.replace('\\', '/').split('/')
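
A slightly fuller sketch of the same idea, assuming empty components (from a leading, trailing or doubled slash) are not wanted. The function name split_any_slash is made up for this example.

```python
def split_any_slash(path):
    """Split on both kinds of slash, dropping empty components."""
    unified = path.replace("\\", "/")
    return [part for part in unified.split("/") if part]

print(split_any_slash(r"d:\stuff\morestuff\furtherdown\THEFILE.txt"))
print(split_any_slash("/var/tmp/"))
```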

Solution 12 - Python

It works for me:

>>> a=r"d:\stuff\morestuff\furtherdown\THEFILE.txt"
>>> a.split("\\")
['d:', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']

Sure, you might also need to strip out the colon from the first component, but keeping it makes it possible to reassemble the path.

The r modifier marks the string literal as "raw"; notice how embedded backslashes are not doubled.

Solution 13 - Python

I use the following because, since it relies on the os.path.basename function, it doesn't add any slashes to the returned list. It also works with any platform's slashes, i.e. Windows's \\ or Unix's /. Furthermore, it doesn't add the \\\\ that Windows uses for server paths :)

import os

def SplitPath(split_path):
    pathSplit_lst = []
    while os.path.basename(split_path):
        pathSplit_lst.append(os.path.basename(split_path))
        split_path = os.path.dirname(split_path)
    pathSplit_lst.reverse()
    return pathSplit_lst

So for:

\\\\server\\folder1\\folder2\\folder3\\folder4

You get:

['server','folder1','folder2','folder3','folder4']

Solution 14 - Python

Just as others explained, your problem stemmed from using \, which is the escape character in a string literal/constant. OTOH, if you had obtained that file path string from another source (read from a file, the console, or returned by an os function), there wouldn't have been a problem splitting on '\\'.

And just as others suggested, if you want to use \ in a program literal, you have to either double it as \\ or prefix the whole literal with r, like r'lite\ral' or r"lite\ral", to avoid the parser converting that \ and r into a CR (carriage return) character.

There is one more way though: just don't use backslash \ pathnames in your code! Since last century, Windows has recognized and worked fine with pathnames that use the forward slash / as the directory separator. Somehow not many people know that, but it works:

>>> var = "d:/stuff/morestuff/furtherdown/THEFILE.txt"
>>> var.split('/')
['d:', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']

This by the way will make your code work on Unix, Windows and Mac... because all of them do use / as directory separator... even if you don't want to use the predefined constants of module os.

Solution 15 - Python

Let's assume you have a file filedata.txt with the content:

d:\stuff\morestuff\furtherdown\THEFILE.txt
d:\otherstuff\something\otherfile.txt

You can read and split the file paths:

>>> for i in open("filedata.txt").readlines():
...     print(i.strip().split("\\"))
... 
['d:', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']
['d:', 'otherstuff', 'something', 'otherfile.txt']
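
Since the question says the paths come from a file rather than from string literals, here is a self-contained sketch of that case. The function name read_split_paths is hypothetical, and the temporary file only exists so the example can run on its own.

```python
import os
import tempfile

def read_split_paths(filename):
    """Read one DOS path per line and split each on the backslash."""
    with open(filename, encoding="utf-8") as handle:
        # Text read from a file contains real backslashes, so no
        # escaping or raw-string conversion is needed before splitting.
        return [line.strip().split("\\") for line in handle if line.strip()]

with tempfile.TemporaryDirectory() as tmp:
    filename = os.path.join(tmp, "filedata.txt")
    with open(filename, "w", encoding="utf-8") as handle:
        handle.write("d:\\stuff\\morestuff\\furtherdown\\THEFILE.txt\n")
        handle.write("d:\\otherstuff\\something\\otherfile.txt\n")
    result = read_split_paths(filename)

print(result)
```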

Solution 16 - Python

re.split() can help a little more than string.split()

import re    
var = r"d:\stuff\morestuff\furtherdown\THEFILE.txt"
re.split( r'[\\/]', var )
['d:', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']

If you also want to support Linux and Mac paths, just add filter(None, result), which removes the unwanted '' entries from the split, since those paths start with '/' or '//', for example '//mount/...' or '/var/tmp/'.

import re    
var = "/var/stuff/morestuff/furtherdown/THEFILE.txt"
result = re.split( r'[\\/]', var )
list(filter(None, result))
['var', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']





Solution 17 - Python

I'm not actually sure if this fully answers the question, but I had a fun time writing this little function that keeps a stack, sticks to os.path-based manipulations, and returns the list/stack of items.

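from os.path import split

def components(path):
    ret = []
    while path:
        head, crust = split(path)
        if head == path:  # reached a root such as "/"; keep it and stop
            ret.insert(0, head)
            break
        path = head
        ret.insert(0, crust)
    return ret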

Solution 18 - Python

The line of code below can handle:

  1. C:/path/path
  2. C://path//path
  3. C:\path\path
  4. C:\\path\\path

import re
path = re.split(r'[/\\]+', path)
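
A runnable sketch of the regex approach against those four inputs. It assumes a character class that matches one or more slashes of either kind; the names SEPARATORS and split_on_slashes are made up for the example.

```python
import re

# One or more forward slashes or backslashes count as a single separator.
SEPARATORS = re.compile(r"[/\\]+")

def split_on_slashes(path):
    return SEPARATORS.split(path)

for candidate in ["C:/path/path", "C://path//path", r"C:\path\path", r"C:\\path\\path"]:
    print(split_on_slashes(candidate))
```

A path that starts with a slash would still produce a leading empty string, which can be filtered out if it isn't wanted.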

Solution 19 - Python

A recursive one, for fun.

Not the most elegant answer, but should work everywhere:

import os

def split_path(path):
    head = os.path.dirname(path)
    tail = os.path.basename(path)
    if head == os.path.dirname(head):
        return [tail]
    return split_path(head) + [tail]

Solution 20 - Python

Adapted the solution of @Mike Robins avoiding empty path elements at the beginning:

import os

def parts(path):
    p, f = os.path.split(os.path.normpath(path))
    # [p or f] keeps the first component of a relative path as well as a root.
    return parts(p) + [f] if f and p else [p or f]

os.path.normpath() is actually required only once and could be done in a separate entry function to the recursion.

Solution 21 - Python

from os import path as os_path

and then

def split_path_iter(string, lst):
    head, tail = os_path.split(string)
    # head == string means a root such as '/' was reached.
    if head == '' or head == string:
        return [string] + lst
    else:
        return split_path_iter(head, [tail] + lst)

def split_path(string):
    return split_path_iter(string, [])

or, inspired by the above answers (more elegant):

def split_path(string):
    head, tail = os_path.split(string)
    # head == string means a root such as '/' was reached.
    if head == '' or head == string:
        return [string]
    else:
        return split_path(head) + [tail]

Solution 22 - Python

It is a shame that Python's os.path doesn't have something like os.path.splitall.

Anyhow, this is what works for me. Credit: https://www.oreilly.com/library/view/python-cookbook/0596001673/ch04s16.html

import os

a = '/media//max/Data/'

def splitall(path):
    # https://www.oreilly.com/library/view/python-cookbook/0596001673/ch04s16.html
    allparts = []
    while 1:
        parts = os.path.split(path)
        if parts[0] == path:  # sentinel for absolute paths
            allparts.insert(0, parts[0])
            break
        elif parts[1] == path: # sentinel for relative paths
            allparts.insert(0, parts[1])
            break
        else:
            path = parts[0]
            allparts.insert(0, parts[1])
    return allparts

x = splitall(a)
print(x)

z = os.path.join(*x)
print(z)

output:

['/', 'media', 'max', 'Data', '']
/media/max/Data/

Solution 23 - Python

Use ntpath.split()
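
For illustration, a minimal example of what that looks like. ntpath is the Windows implementation of os.path and can be imported on any platform, so this should behave the same on Linux and macOS.

```python
import ntpath

dos_path = r"d:\stuff\morestuff\furtherdown\THEFILE.txt"

# split() separates the last component from everything before it.
head, tail = ntpath.split(dos_path)
# splitdrive() separates the drive from the rest of the path.
drive, rest = ntpath.splitdrive(dos_path)

print(head, tail)
print(drive, rest)
```

ntpath.split only peels off the last component, so getting every component still needs a loop or recursion like the ones in the other answers.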

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | BeeBand | View Question on Stackoverflow
Solution 1 - Python | Tompa | View Answer on Stackoverflow
Solution 2 - Python | please delete me | View Answer on Stackoverflow
Solution 3 - Python | freidrichen | View Answer on Stackoverflow
Solution 4 - Python | Maciek D. | View Answer on Stackoverflow
Solution 5 - Python | user1556435 | View Answer on Stackoverflow
Solution 6 - Python | Craig Trader | View Answer on Stackoverflow
Solution 7 - Python | antred | View Answer on Stackoverflow
Solution 8 - Python | Chris | View Answer on Stackoverflow
Solution 9 - Python | Benoit | View Answer on Stackoverflow
Solution 10 - Python | Mike Robins | View Answer on Stackoverflow
Solution 11 - Python | yuval kalan | View Answer on Stackoverflow
Solution 12 - Python | unwind | View Answer on Stackoverflow
Solution 13 - Python | Jay | View Answer on Stackoverflow
Solution 14 - Python | Nas Banov | View Answer on Stackoverflow
Solution 15 - Python | zoli2k | View Answer on Stackoverflow
Solution 16 - Python | Asi | View Answer on Stackoverflow
Solution 17 - Python | mallyvai | View Answer on Stackoverflow
Solution 18 - Python | Gour Bera | View Answer on Stackoverflow
Solution 19 - Python | DuGNu | View Answer on Stackoverflow
Solution 20 - Python | Frank-Rene Schäfer | View Answer on Stackoverflow
Solution 21 - Python | Smiley1000 | View Answer on Stackoverflow
Solution 22 - Python | Mahmoud Elshahat | View Answer on Stackoverflow
Solution 23 - Python | deft_code | View Answer on Stackoverflow