Extract file name from path, no matter what the os/path format
Python Problem Overview
Which Python library can I use to extract filenames from paths, no matter what the operating system or path format could be?
For example, I'd like all of these paths to return 'c':
a/b/c/
a/b/c
\a\b\c
\a\b\c\
a\b\c
a/b/../../a/b/c/
a/b/../../a/b/c
Python Solutions
Solution 1 - Python
Actually, there's a function that returns exactly what you want
import os
print(os.path.basename(your_path))
WARNING: When os.path.basename() is used on a POSIX system to get the base name from a Windows-style path (e.g. "C:\\my\\file.txt"), the entire path will be returned.
Example below from an interactive Python shell running on a Linux host:
Python 3.8.2 (default, Mar 13 2020, 10:14:16)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> filepath = "C:\\my\\path\\to\\file.txt" # A Windows style file path.
>>> os.path.basename(filepath)
'C:\\my\\path\\to\\file.txt'
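On any OS, os.path is just an alias for either posixpath or ntpath, and both modules can be imported directly everywhere. This minimal sketch reproduces the behaviour above without depending on the host OS:

```python
import ntpath
import posixpath

windows_path = "C:\\my\\path\\to\\file.txt"  # a Windows-style path

# posixpath only treats "/" as a separator, so the whole string comes back
print(posixpath.basename(windows_path))  # C:\my\path\to\file.txt

# ntpath accepts both "\\" and "/" and handles the drive, on any host OS
print(ntpath.basename(windows_path))     # file.txt
```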
Solution 2 - Python
Using os.path.split or os.path.basename as others suggest won't work in all cases: if you're running the script on Linux and attempt to process a classic Windows-style path, it will fail.
Windows paths can use either backslash or forward slash as the path separator. Therefore, the ntpath module (which is equivalent to os.path when running on Windows) will work for all(1) paths on all platforms.
import ntpath
ntpath.basename("a/b/c")
Of course, if the path ends with a slash, the basename will be empty, so make your own function to deal with it:
def path_leaf(path):
    head, tail = ntpath.split(path)
    return tail or ntpath.basename(head)
Verification:
>>> paths = ['a/b/c/', 'a/b/c', '\\a\\b\\c', '\\a\\b\\c\\', 'a\\b\\c',
...          'a/b/../../a/b/c/', 'a/b/../../a/b/c']
>>> [path_leaf(path) for path in paths]
['c', 'c', 'c', 'c', 'c', 'c', 'c']
(1) There's one caveat: Linux filenames may contain backslashes. So on Linux, r'a/b\c' always refers to the file b\c in the a folder, while on Windows it always refers to the file c in the b subfolder of the a folder. So when both forward and backward slashes are used in a path, you need to know the associated platform to interpret it correctly. In practice it's usually safe to assume it's a Windows path, since backslashes are seldom used in Linux filenames, but keep this in mind when you code so you don't create accidental security holes.
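The caveat can be made concrete by parsing the same mixed-separator string with both rule sets; this is a small sketch using the standard posixpath and ntpath modules directly:

```python
import ntpath
import posixpath

mixed = r"a/b\c"  # uses both a forward slash and a backslash

# Linux rules: the backslash is an ordinary character, so the leaf is "b\c"
print(posixpath.basename(mixed))  # b\c

# Windows rules: both slashes are separators, so the leaf is "c"
print(ntpath.basename(mixed))     # c
```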
Solution 3 - Python
os.path.split is the function you are looking for:
>>> import os
>>> head, tail = os.path.split("/tmp/d/a.dat")
>>> print(tail)
a.dat
>>> print(head)
/tmp/d
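One edge case to watch with os.path.split: when the path ends with a separator, tail is empty. A minimal sketch of that case, with a fallback that mirrors the path_leaf idea from Solution 2:

```python
import os

# A path that ends with a separator has an empty tail...
head, tail = os.path.split("/tmp/d/")
print(repr(head), repr(tail))  # '/tmp/d' ''

# ...so fall back to the last component of head when tail is empty
leaf = tail or os.path.basename(head)
print(leaf)  # d
```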
Solution 4 - Python
In Python 3.4 or later, with pathlib.Path:
>>> from pathlib import Path
>>> Path("/tmp/d/a.dat").name
'a.dat'
The .name property will give the full name of the final child element in the path, regardless of whether it is a file or a folder.
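pathlib also has pure path classes, PurePosixPath and PureWindowsPath, which only parse strings and never touch the filesystem, so either flavour can be used on any OS. A small sketch contrasting them:

```python
from pathlib import PurePosixPath, PureWindowsPath

print(PurePosixPath("/tmp/d/a.dat").name)       # a.dat
print(PurePosixPath("/tmp/d/").name)            # d (the trailing slash is dropped)
print(PureWindowsPath(r"C:\tmp\d\a.dat").name)  # a.dat

# Posix rules do not split on backslashes, so the whole string is the "name"
print(PurePosixPath(r"C:\tmp\d\a.dat").name)    # C:\tmp\d\a.dat
```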
Solution 5 - Python
import os
head, tail = os.path.split('path/to/file.exe')
tail is what you want: the filename.
See the Python os module docs for details.
Solution 6 - Python
import os
file_location = '/srv/volume1/data/eds/eds_report.csv'
file_name = os.path.basename(file_location)  # eds_report.csv
location = os.path.dirname(file_location)    # /srv/volume1/data/eds
Solution 7 - Python
If you want to get the filenames automatically, you can do:
import glob
import os
for f in glob.glob('/your/path/*'):
    print(os.path.split(f)[-1])
Solution 8 - Python
My personal favourite is:
import os
filename = fullname.split(os.sep)[-1]
Solution 9 - Python
fname = r"C:\Windows\paint.exe".split('\\')[-1]
This will return: paint.exe
Change the separator passed to split() to match your path's OS.
Solution 10 - Python
In your example you will also need to strip the slash from the right side to return 'c':
>>> import os
>>> path = 'a/b/c/'
>>> path = path.rstrip(os.sep) # strip the slash from the right side
>>> os.path.basename(path)
'c'
Second level:
>>> os.path.basename(os.path.dirname(path))
'b'
Update: I think lazyr has provided the right answer. My code will not work with Windows-like paths on Unix systems, and vice versa with Unix-like paths on Windows systems.
Solution 11 - Python
This works on both Linux and Windows, using only built-in string methods:
paths = ['a/b/c/', 'a/b/c', '\\a\\b\\c', '\\a\\b\\c\\', 'a\\b\\c', 'a/b/../../a/b/c/', 'a/b/../../a/b/c']
def path_leaf(path):
    return path.strip('/').strip('\\').split('/')[-1].split('\\')[-1]
[path_leaf(path) for path in paths]
Results:
['c', 'c', 'c', 'c', 'c', 'c', 'c']
Solution 12 - Python
If your file path does not end with "/" and its directories are separated by "/", use the following code. Generally, a path doesn't end with "/".
import os
path_str = "/var/www/index.html"
print(os.path.basename(path_str))
But in some cases, such as URLs, the path ends with "/"; then use the following code:
import os
path_str = "/home/some_str/last_str/"
split_path = path_str.rsplit("/",1)
print(os.path.basename(split_path[0]))
But when your path is separated by "\", which you generally find in Windows paths, you can use the following code (note that os.path.basename only splits on backslashes when the script runs on Windows):
import os
path_str = r"c:\var\www\index.html"
print(os.path.basename(path_str))
import os
path_str = "c:\\home\\some_str\\last_str\\"
split_path = path_str.rsplit("\\", 1)
print(os.path.basename(split_path[0]))
You can combine both into one function by checking the OS type and returning the result.
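A minimal sketch of such a combined function, assuming the separator of the running OS (os.sep) is the right default; the name last_component and its sep parameter are hypothetical, not part of the answer above:

```python
import os

def last_component(path_str, sep=os.sep):
    """Hypothetical helper: drop one trailing separator, then return
    whatever follows the last separator."""
    if path_str.endswith(sep):
        path_str = path_str[: -len(sep)]
    return path_str.rsplit(sep, 1)[-1]

# sep defaults to the running OS's separator; pass it explicitly
# to handle the other path flavour
print(last_component("/home/some_str/last_str/", sep="/"))         # last_str
print(last_component("c:\\home\\some_str\\last_str\\", sep="\\"))  # last_str
```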
Solution 13 - Python
Here's a regex-only solution, which seems to work with any OS path on any OS.
No other module is needed, and no preprocessing is needed either:
import re
def extract_basename(path):
    """Extracts basename of a given path. Should work with any OS path on any OS."""
    basename = re.search(r'[^\\/]+(?=[\\/]?$)', path)
    if basename:
        return basename.group(0)
paths = ['a/b/c/', 'a/b/c', '\\a\\b\\c', '\\a\\b\\c\\', 'a\\b\\c',
'a/b/../../a/b/c/', 'a/b/../../a/b/c']
print([extract_basename(path) for path in paths])
# ['c', 'c', 'c', 'c', 'c', 'c', 'c']
extra_paths = ['C:\\', 'alone', '/a/space in filename', 'C:\\multi\nline']
print([extract_basename(path) for path in extra_paths])
# ['C:', 'alone', 'space in filename', 'multi\nline']
Update:
If you only want a potential filename, if present (i.e., /a/b/ is a dir and so is c:\windows\), change the regex to: r'[^\\/]+(?![\\/])$'. For the "regex challenged," this changes the positive forward lookahead for some sort of slash to a negative forward lookahead, causing pathnames that end with said slash to return nothing instead of the last sub-directory in the pathname. Of course there is no guarantee that the potential filename actually refers to a file; for that, os.path.isdir() or os.path.isfile() would need to be employed.
This will match as follows:
/a/b/c/              # nothing, pathname ends with the dir 'c'
c:\windows\          # nothing, pathname ends with the dir 'windows'
c:hello.txt          # matches potential filename 'c:hello.txt' (the drive prefix is kept)
~it_s_me/.bashrc     # matches potential filename '.bashrc'
c:\windows\system32  # matches potential filename 'system32', except
                     # that is obviously a dir. os.path.isdir()
                     # should be used to tell us for sure
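A minimal sketch of the filename-only pattern from the update, wrapped in a hypothetical extract_filename helper that returns None when nothing matches. Because [^\\/]+ also matches ':', a drive prefix such as c: stays in the result:

```python
import re

# A run of non-slash characters, not followed by a slash, at the end of the string
FILENAME_RE = re.compile(r"[^\\/]+(?![\\/])$")

def extract_filename(path):
    """Hypothetical helper: potential filename at the end of path, or None."""
    match = FILENAME_RE.search(path)
    return match.group(0) if match else None

for p in ["/a/b/c/", "c:\\windows\\", "c:hello.txt",
          "~it_s_me/.bashrc", "c:\\windows\\system32"]:
    print(repr(p), "->", extract_filename(p))
```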
Solution 14 - Python
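It works:
os.path.basename(name)
But you can't get the file name from a Windows file path on Linux, and the same limitation applies on Windows to paths that only make sense under Linux rules, because os.path loads a different module on each operating system:
- Linux - posixpath
- Windows - ntpath
So os.path always gives the correct result for paths in the native format of the OS the code runs on.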
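You can check which module os.path actually is on the machine you are running; a tiny sketch:

```python
import os
import ntpath
import posixpath

# os.path is simply one of these two modules, chosen when os is imported
print(os.path.__name__)  # 'posixpath' on Linux/macOS, 'ntpath' on Windows
print(os.path is (ntpath if os.name == "nt" else posixpath))  # True
```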
Solution 15 - Python
Maybe just my all-in-one solution, without importing anything new (tempfile is only used to create a temporary file for the demonstration :D):
import tempfile
abc = tempfile.NamedTemporaryFile(dir='/tmp/')
abc.name
abc.name.replace("/", " ").split()[-1]
The value of abc.name will be a string like this: '/tmp/tmpks5oksk7'
So I replace the / with a space using .replace("/", " ") and then call split(). That returns a list, and I get the last element of the list with [-1].
No need to import any module. Note that this breaks if the file name itself contains whitespace.
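To see where the whitespace split goes wrong, here is a small sketch that deliberately puts a space in the temporary file's prefix (the prefix "my report " is just an illustrative choice) and compares it with os.path.basename:

```python
import os
import tempfile

with tempfile.NamedTemporaryFile(prefix="my report ") as abc:
    name = abc.name
    # Splitting on whitespace keeps only the part after the last space
    by_split = name.replace("/", " ").split()[-1]
    # basename splits only on the path separator, so the full name survives
    by_basename = os.path.basename(name)

print(by_split)
print(by_basename)
```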
Solution 16 - Python
If you have a number of files in a directory and want to store their file names in a list, use the code below:
import os
import glob
path = 'mypath'
file_list = []
for file in glob.glob(os.path.join(path, '*')):
    data_file_list = os.path.basename(file)
    file_list.append(data_file_list)
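With pathlib the same idea can be sketched without glob; file_names is a hypothetical helper, and unlike the glob version it keeps only regular files (not subdirectories):

```python
from pathlib import Path

def file_names(directory):
    """Hypothetical helper: names (not full paths) of the regular files
    directly inside directory, sorted for a stable order."""
    return sorted(p.name for p in Path(directory).iterdir() if p.is_file())

# Usage with the placeholder directory from the snippet above:
# file_list = file_names('mypath')
```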
Solution 17 - Python
I have never seen double-backslashed paths; do they exist? The built-in os module fails for those. All the others work, including the caveat you gave, when combined with os.path.normpath():
import os
paths = ['a/b/c/', 'a/b/c', '\\a\\b\\c', '\\a\\b\\c\\', 'a\\b\\c',
         'a/b/../../a/b/c/', 'a/b/../../a/b/c', 'a/./b/c', 'a\\b/c']
for path in paths:
    print(os.path.basename(os.path.normpath(path)))
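Because os.path.normpath follows the host's rules, the backslashed entries only normalise correctly on Windows. A sketch that uses ntpath directly, so that the same Windows rules apply on any host:

```python
import ntpath

paths = ['a/b/c/', 'a/b/c', '\\a\\b\\c', '\\a\\b\\c\\', 'a\\b\\c',
         'a/b/../../a/b/c/', 'a/b/../../a/b/c', 'a/./b/c', 'a\\b/c']

# ntpath.normpath collapses ".", ".." and trailing separators using
# Windows rules (both "\\" and "/"), independent of the host OS
leaves = [ntpath.basename(ntpath.normpath(p)) for p in paths]
print(leaves)
```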
Solution 18 - Python
File name with extension
import os
filepath = './dir/subdir/filename.ext'
basename = os.path.basename(filepath)
print(basename)
# filename.ext
print(type(basename))
# <class 'str'>
File name without extension
basename_without_ext = os.path.splitext(os.path.basename(filepath))[0]
print(basename_without_ext)
# filename
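The same pieces are available as properties on pathlib paths; a short sketch using PurePath, which only parses the string:

```python
from pathlib import PurePath

filepath = './dir/subdir/filename.ext'
p = PurePath(filepath)

print(p.name)    # filename.ext
print(p.stem)    # filename
print(p.suffix)  # .ext

# Like os.path.splitext, only the last extension is removed
print(PurePath('archive.tar.gz').stem)  # archive.tar
```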
Solution 19 - Python
The Windows separator can appear in a Unix filename or in a Windows path. The Unix separator can only exist in a Unix path. The presence of a Unix separator therefore indicates a non-Windows path.
The following strips the trailing OS-specific separator, then splits on it and returns the rightmost value. It's ugly, but simple, based on the assumption above. If the assumption is incorrect, please update and I will update this response to match the more accurate conditions.
a.rstrip("\\" if a.count("/") == 0 else '/').split("\\" if a.count("/") == 0 else '/')[-1]
sample code:
b = ['a/b/c/', 'a/b/c', '\\a\\b\\c', '\\a\\b\\c\\', 'a\\b\\c', 'a/b/../../a/b/c/', 'a/b/../../a/b/c']
for a in b:
    print(a, a.rstrip("\\" if a.count("/") == 0 else '/').split("\\" if a.count("/") == 0 else '/')[-1])
Solution 20 - Python
For completeness' sake, here is the pathlib solution for Python 3.4+:
>>> from pathlib import PureWindowsPath
>>> paths = ['a/b/c/', 'a/b/c', '\\a\\b\\c', '\\a\\b\\c\\', 'a\\b\\c',
... 'a/b/../../a/b/c/', 'a/b/../../a/b/c']
>>> [PureWindowsPath(path).name for path in paths]
['c', 'c', 'c', 'c', 'c', 'c', 'c']
This works on both Windows and Linux.
Solution 21 - Python
In both Python 2 and 3, using the module pathlib2:
import posixpath # to generate unix paths
from pathlib2 import PurePath, PureWindowsPath, PurePosixPath
def path2unix(path, nojoin=True, fromwinpath=False):
    """From a path given in any format, converts to posix path format
    fromwinpath=True forces the input path to be recognized as a Windows path (useful on Unix machines to unit test Windows paths)"""
    if not path:
        return path
    if fromwinpath:
        pathparts = list(PureWindowsPath(path).parts)
    else:
        pathparts = list(PurePath(path).parts)
    if nojoin:
        return pathparts
    else:
        return posixpath.join(*pathparts)
Usage:
In [9]: path2unix('lala/lolo/haha.dat')
Out[9]: ['lala', 'lolo', 'haha.dat']
In [10]: path2unix(r'C:\lala/lolo/haha.dat')
Out[10]: ['C:\\', 'lala', 'lolo', 'haha.dat']
In [11]: path2unix(r'C:\lala/lolo/haha.dat') # works even with malformatted cases mixing both Windows and Linux path separators
Out[11]: ['C:\\', 'lala', 'lolo', 'haha.dat']
With your testcase:
In [12]: testcase = paths = ['a/b/c/', 'a/b/c', '\\a\\b\\c', '\\a\\b\\c\\', 'a\\b\\c',
    ...:     'a/b/../../a/b/c/', 'a/b/../../a/b/c']
In [14]: for t in testcase:
...: print(path2unix(t)[-1])
...:
...:
c
c
c
c
c
c
c
The idea here is to convert all paths into the unified internal representation of pathlib2, with different decoders depending on the platform. Note that PurePath is not a platform-independent decoder: instantiating it creates a PurePosixPath or a PureWindowsPath depending on the OS the code runs on. So on a Unix machine, Windows-style inputs such as '\\a\\b\\c' are only split correctly with fromwinpath=True; the outputs above are what you get with Windows rules. The input string is split into parts, and the last one is the leaf you are looking for, hence the path2unix(t)[-1].
If the argument nojoin=False, the path will be joined back, so that the output is simply the input string converted to a Unix format, which can be useful to compare subpaths across platforms.
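On Python 3.4+ the standard library pathlib has the same pure classes, so the fromwinpath=True branch can be sketched without pathlib2. windows_parts is a hypothetical name; as_posix() is a real method of pure paths that rejoins the parts with forward slashes:

```python
from pathlib import PureWindowsPath

def windows_parts(path):
    """Hypothetical stdlib counterpart of path2unix(path, fromwinpath=True):
    split with Windows rules, which accept both backslash and slash."""
    return list(PureWindowsPath(path).parts)

print(windows_parts(r'C:\lala/lolo/haha.dat'))               # ['C:\\', 'lala', 'lolo', 'haha.dat']
print(PureWindowsPath(r'C:\lala/lolo/haha.dat').as_posix())  # C:/lala/lolo/haha.dat
```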
Solution 22 - Python
I use this method on Windows and on Ubuntu (WSL), and it works as I expect using only import os: replace() substitutes the right path separator for your current OS platform.
If the path ends with a slash '/', then it's not a file but a directory, so it returns an empty string.
import os
my_fullpath = r"D:\MY_FOLDER\TEST\20201108\20201108_073751.DNG"
print(os.path.basename(my_fullpath.replace('\\', os.sep)))  # 20201108_073751.DNG
my_fullpath = r"/MY_FOLDER/TEST/20201108/20201108_073751.DNG"
print(os.path.basename(my_fullpath.replace('\\', os.sep)))  # 20201108_073751.DNG
my_fullpath = r"/MY_FOLDER/TEST/20201108/"
print(os.path.basename(my_fullpath.replace('\\', os.sep)))  # prints an empty line
my_fullpath = r"/MY_FOLDER/TEST/20201108"
print(os.path.basename(my_fullpath.replace('\\', os.sep)))  # 20201108