Sort filenames in directory in ascending order

Python, Sorting

Python Problem Overview


I have a directory with jpgs and other files in it. The jpgs all have filenames with numbers in them, and some may have additional strings in the filename.

For example:

01.jpg

Or it could be:

Picture 03.jpg

In Python I need a list of all the jpgs in ascending numeric order. Here is the code snippet for this:

import os
import numpy as np

myimages = [] #list of image filenames
dirFiles = os.listdir('.') #list of directory files
dirFiles.sort() #good initial sort but doesnt sort numerically very well
sorted(dirFiles) #sort numerically in ascending order

for files in dirFiles: #filter out all non jpgs
    if '.jpg' in files:
        myimages.append(files)
print len(myimages)
print myimages

What I get is this:

['0.jpg', '1.jpg', '10.jpg', '11.jpg', '12.jpg', '13.jpg', '14.jpg', '15.jpg', '16.jpg', '17.jpg', '18.jpg', '19.jpg', '2.jpg', '20.jpg', '21.jpg', '22.jpg', '23.jpg', '24.jpg', '25.jpg', '26.jpg', '27.jpg', '28.jpg', '29.jpg', '3.jpg', '30.jpg', '31.jpg', '32.jpg', '33.jpg', '34.jpg', '35.jpg', '36.jpg', '37.jpg', '4.jpg', '5.jpg', '6.jpg', '7.jpg', '8.jpg', '9.jpg']

Clearly it sorts blindly, most significant digit first. I tried using sorted(), as you can see, hoping that it would fix it, but it makes no difference.

Python Solutions


Solution 1 - Python

Assuming there's just one number in each file name (this version relies on Python 2, where filter() on a string returns a string):

>>> dirFiles = ['Picture 03.jpg', '02.jpg', '1.jpg']
>>> dirFiles.sort(key=lambda f: int(filter(str.isdigit, f)))
>>> dirFiles
['1.jpg', '02.jpg', 'Picture 03.jpg']

A version that also works in Python 3:

>>> import re
>>> dirFiles.sort(key=lambda f: int(re.sub(r'\D', '', f)))
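For the directory case in the question, the same idea might be sketched in Python 3 roughly as follows. The helper names first_number and sort_by_number are hypothetical, not part of the original answer. The sketch assumes that only the first run of digits in a name matters and that names without any digits should go last.

```python
import re

def first_number(name):
    """Return the first run of digits in name as an int, or None if there is none."""
    match = re.search(r'\d+', name)
    return int(match.group()) if match else None

def sort_by_number(names, extension='.jpg'):
    """Keep names ending in extension and order them by their first number."""
    wanted = [n for n in names if n.lower().endswith(extension)]
    # The key is a tuple: numbered names (False) sort before names without digits (True).
    return sorted(wanted, key=lambda n: (first_number(n) is None, first_number(n) or 0))

names = ['10.jpg', 'Picture 03.jpg', 'notes.txt', '2.jpg', '1.jpg']
print(sort_by_number(names))
```

With a real directory, os.listdir('.') could be passed in place of the sample list. Unlike the re.sub version, this does not raise an error when a filename contains no digits.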

Solution 2 - Python

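There is a third-party module, natsort. Install it with pip install natsort. Note that the example below passes reverse=True, which gives descending order; omit it to get the ascending order asked for in the question.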

>>> import natsort 
>>> ll = ['Picture 13.jpg', 'Picture 14.jpg', 'Picture 15.jpg','Picture 0.jpg', 'Picture 1.jpg', 'Picture 10.jpg', 'Picture 11.jpg', 'Picture 12.jpg',  'Picture 16.jpg', 'Picture 17.jpg', 'Picture 18.jpg', 'Picture 19.jpg', 'Picture 2.jpg', 'Picture 20.jpg', 'Picture 21.jpg', 'Picture 22.jpg', 'Picture 23.jpg', 'Picture 24.jpg', 'Picture 25.jpg', 'Picture 26.jpg', 'Picture 27.jpg', 'Picture 28.jpg', 'Picture 29.jpg', 'Picture 3.jpg', 'Picture 30.jpg', 'Picture 31.jpg', 'Picture 32.jpg', 'Picture 33.jpg', 'Picture 34.jpg', 'Picture 35.jpg', 'Picture 36.jpg', 'Picture 37.jpg']         
>>> print(natsort.natsorted(ll,reverse=True))
['Picture 37.jpg', 'Picture 36.jpg', 'Picture 35.jpg', 'Picture 34.jpg', 'Picture 33.jpg', 'Picture 32.jpg', 'Picture 31.jpg', 'Picture 30.jpg', 'Picture 29.jpg', 'Picture 28.jpg', 'Picture 27.jpg', 'Picture 26.jpg', 'Picture 25.jpg', 'Picture 24.jpg', 'Picture 23.jpg', 'Picture 22.jpg', 'Picture 21.jpg', 'Picture 20.jpg', 'Picture 19.jpg', 'Picture 18.jpg', 'Picture 17.jpg', 'Picture 16.jpg', 'Picture 15.jpg', 'Picture 14.jpg', 'Picture 13.jpg', 'Picture 12.jpg', 'Picture 11.jpg', 'Picture 10.jpg', 'Picture 3.jpg', 'Picture 2.jpg', 'Picture 1.jpg', 'Picture 0.jpg']
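If adding a dependency is not an option, a similar ordering can be approximated with the standard library alone. The following is only a minimal sketch of a natural-sort key, not natsort's actual implementation. The name natural_key is hypothetical, and lower-casing the text parts is an assumption about the desired ordering.

```python
import re

def natural_key(name):
    """Split name into text and digit runs so that digit runs compare as integers."""
    # re.split with a capturing group alternates text, digits, text, ...
    # so the odd-indexed parts are always the digit runs.
    parts = re.split(r'(\d+)', name)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]

files = ['Picture 10.jpg', 'Picture 2.jpg', 'Picture 1.jpg']
print(sorted(files, key=natural_key))
```

Because text and number parts always sit at the same positions in each key, comparing two keys never mixes strings with integers at the same index.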

Solution 3 - Python

> I have a directory with jpgs and other files in it.

[...]

> ['0.jpg', '1.jpg', '10.jpg', '11.jpg', '12.jpg', '13.jpg', '14.jpg', '15.jpg', '16.jpg', '17.jpg', '18.jpg', '19.jpg', '2.jpg', '20.jpg', '21.jpg', '22.jpg', '23.jpg', '24.jpg', '25.jpg', '26.jpg', '27.jpg', '28.jpg', '29.jpg', '3.jpg', '30.jpg', '31.jpg', '32.jpg', '33.jpg', '34.jpg', '35.jpg', '36.jpg', '37.jpg', '4.jpg', '5.jpg', '6.jpg', '7.jpg', '8.jpg', '9.jpg']
>
> Clearly it sorts blindly the most significant number first. I tried using sorted() as you can see hoping that it would fix it but it makes no difference

You can use os.path.splitext to get the part without the extension and convert it to an int for the sorting. If the list is named imgs_list and the sorted list is named lsorted, you can use:

import os
lsorted = sorted(imgs_list, key=lambda x: int(os.path.splitext(x)[0]))

Here imgs_list is the list of image filenames. If you have a directory of images, you can obtain that list with:

imgs_list = os.listdir('/path/to/directory/of/images')

Explanation: os.path.splitext on '10.jpg' returns the tuple ('10', '.jpg'), so taking int() of element zero gives you what you want, as long as the filenames without the extension contain only strings that can be converted to integers with int(). Otherwise you will get a ValueError.
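Since a directory usually also holds files whose names are not plain integers, one way to avoid that error is to skip such files before sorting. The sketch below assumes that skipping them is acceptable. The names numeric_stem and sort_numeric_jpgs are hypothetical.

```python
import os

def numeric_stem(name):
    """Return the filename without its extension as an int, or None if it is not one."""
    stem, _ext = os.path.splitext(name)
    try:
        return int(stem)
    except ValueError:
        return None

def sort_numeric_jpgs(names):
    """Keep only .jpg files whose stem converts with int(), sorted numerically."""
    numbered = [n for n in names
                if os.path.splitext(n)[1].lower() == '.jpg'
                and numeric_stem(n) is not None]
    return sorted(numbered, key=numeric_stem)

names = ['10.jpg', '2.jpg', 'notes.txt', 'Picture 03.jpg', '1.jpg']
print(sort_numeric_jpgs(names))
```

Here 'Picture 03.jpg' is dropped because its stem is not a plain integer; for such names a regex-based key is a better fit.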

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | ChumbiChubaGo | View Question on Stackoverflow
Solution 1 - Python | Stefan Pochmann | View Answer on Stackoverflow
Solution 2 - Python | LetzerWille | View Answer on Stackoverflow
Solution 3 - Python | hft | View Answer on Stackoverflow