Delete multiple files matching a pattern


Python Problem Overview


I have made an online gallery using Python and Django. I've just started to add editing functionality, starting with a rotation. I use sorl.thumbnail to auto-generate thumbnails on demand.

When I edit the original file, I need to clean up all the thumbnails so new ones are generated. There are three or four of them per image (I have different ones for different occasions).

I could hard-code the file variants, but that's messy, and if I change the way I do things, I'll need to revisit the code.

Ideally I'd like to do a regex-delete. In regex terms, all my originals are named like so:

^(?P<photo_id>\d+)\.jpg$

So I want to delete:

^(?P<photo_id>\d+)[^\d].*jpg$

(Where I replace photo_id with the ID I want to clean.)

Python Solutions


Solution 1 - Python

Using the glob module:

import glob, os
for f in glob.glob("P*.jpg"):
    os.remove(f)

Alternatively, using pathlib:

from pathlib import Path
for p in Path(".").glob("P*.jpg"):
    p.unlink()
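For the naming scheme in the question, a wildcard can get close to the regex. The following is a minimal sketch, assuming the thumbnails live in a single directory and share the original's numeric prefix; `thumbnails_for` and `delete_thumbnails_glob` are hypothetical names, not part of any library. `[!0-9]` is the glob syntax for "one character that is not a digit".

```python
import glob
import os

def thumbnails_for(photo_id, directory="."):
    """Return paths such as '<photo_id>_small.jpg', but not '<photo_id>.jpg'."""
    # glob.escape keeps wildcard characters in the directory name literal.
    pattern = os.path.join(glob.escape(directory), f"{photo_id}[!0-9]*.jpg")
    return sorted(glob.glob(pattern))

def delete_thumbnails_glob(photo_id, directory="."):
    """Delete every thumbnail found by thumbnails_for (hypothetical helper)."""
    for path in thumbnails_for(photo_id, directory):
        os.remove(path)
```

Because the pattern requires a non-digit and then a literal `.jpg`, the original `123.jpg` and another photo's `1234_small.jpg` are both left alone when cleaning ID 123.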

Solution 2 - Python

Try something like this:

import os, re

def purge(dir, pattern):
	for f in os.listdir(dir):
		if re.search(pattern, f):
			os.remove(os.path.join(dir, f))

Then you would pass the directory containing the files and the pattern you wish to match.
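As a usage sketch, the call might look like the following. The `thumbnail_pattern` helper and the file names are assumptions made for illustration. Note one deliberate tweak: the pattern uses an escaped dot before `jpg`, because the pattern in the question would also match the original `42.jpg` (the `.` counts as a non-digit).

```python
import os
import re
import tempfile

def purge(directory, pattern):
    # Same logic as the function above, repeated so the example runs on its own.
    for name in os.listdir(directory):
        if re.search(pattern, name):
            os.remove(os.path.join(directory, name))

def thumbnail_pattern(photo_id):
    # Hypothetical helper: the ID, one non-digit, then anything ending in ".jpg".
    return r"^%s\D.*\.jpg$" % re.escape(str(photo_id))

with tempfile.TemporaryDirectory() as tmp:
    for name in ("42.jpg", "42_small.jpg", "42_large.jpg", "421_small.jpg"):
        open(os.path.join(tmp, name), "w").close()
    purge(tmp, thumbnail_pattern(42))
    remaining = sorted(os.listdir(tmp))

print(remaining)  # prints ['42.jpg', '421_small.jpg']
```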

Solution 3 - Python

If you need recursion into several subdirectories, you can use this method:

import os, re, os.path
pattern = r"^(?P<photo_id>\d+)[^\d].*jpg$"  # raw string, so \d is not treated as a string escape
mypath = "Photos"
for root, dirs, files in os.walk(mypath):
    for file in filter(lambda x: re.match(pattern, x), files):
        os.remove(os.path.join(root, file))

You can safely remove subdirectories on the fly from dirs, which contains the list of the subdirectories to visit at each node.
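Pruning `dirs` could look roughly like this. It is a sketch only: `purge_tree`, `skip_dirs`, and the directory name "originals" are made up for the example. It relies on `os.walk` running top-down (the default), where in-place changes to `dirs` control which subdirectories are visited.

```python
import os
import re

def purge_tree(top, pattern, skip_dirs=("originals",)):
    """Delete files whose names match pattern, never entering skip_dirs."""
    regex = re.compile(pattern)
    removed = []
    for root, dirs, files in os.walk(top):  # topdown=True is the default
        # Slice assignment edits the very list os.walk iterates over,
        # so the skipped directories are never descended into.
        dirs[:] = [d for d in dirs if d not in skip_dirs]
        for name in files:
            if regex.match(name):
                path = os.path.join(root, name)
                os.remove(path)
                removed.append(path)
    return removed
```

Rebinding the name (`dirs = [...]`) would not work, since `os.walk` would still hold the original list; the slice assignment is what makes the pruning take effect.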

Note that if you only need a single directory, you can also get the files matching a simple wildcard expression with glob.glob(pattern). Wildcards cannot express "not a digit followed by anything" as precisely as a regex, so you would have to subtract the set of files to keep from the whole set; the code above avoids that extra step.

Solution 4 - Python

How about this?

import glob, os
map(os.remove, glob.glob("P*.jpg"))

Mind you, this does not recurse and uses wildcards (not regex).

UPDATE: In Python 3 the built-in map() function returns a lazy iterator, not a list, so the line above deletes nothing until the iterator is consumed. Laziness is useful when you want to process the items one at a time, and it is usually more memory-efficient.

If, however, you need the calls to run immediately (as you do here), just do this:

...
list(map(os.remove, glob.glob("P*.jpg")))

I agree it's not the most functional way, but it's concise and does the job.
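To make the effect of the built-in map() returning an iterator concrete, here is a small sketch that runs in a temporary directory. The file names are invented for the example.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    for name in ("P1.jpg", "P2.jpg"):
        open(os.path.join(tmp, name), "w").close()
    pattern = os.path.join(glob.escape(tmp), "P*.jpg")

    lazy = map(os.remove, glob.glob(pattern))  # builds an iterator, deletes nothing
    before = len(os.listdir(tmp))

    list(lazy)  # consuming the iterator is what actually calls os.remove
    after = len(os.listdir(tmp))

print(before, after)  # prints 2 0
```

A plain for loop over glob.glob(pattern) has the same effect and avoids building a throwaway list of None values.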

Solution 5 - Python

It's not clear to me that you actually want to do any named-group matching -- in the use you describe, the photoid is an input to the deletion function, and named groups' purpose is "output", i.e., extracting certain substrings from the matched string (and accessing them by name in the match object). So, I would recommend a simpler approach:

import re
import os

def delete_thumbnails(photoid, photodirroot):
  # The ID, one non-digit, then anything ending in "jpg" (the question's pattern).
  matcher = re.compile(r'^%s\D.*jpg$' % photoid)
  numdeleted = 0
  for rootdir, subdirs, filenames in os.walk(photodirroot):
    for name in filenames:
      if not matcher.match(name):
        continue
      path = os.path.join(rootdir, name)
      os.remove(path)
      numdeleted += 1
  return "Deleted %d thumbnails for %r" % (numdeleted, photoid)

You can pass the photoid as a normal string, or as a RE pattern piece if you need to remove several matchable IDs at once (e.g., r'abc[def]' to remove abcd, abce, and abcf in a single call). That's the reason I'm inserting it literally in the RE pattern, rather than inserting the string re.escape(photoid) as would be normal practice. Certain parts, such as counting the number of deletions and returning an informative message at the end, are obviously frills which you should remove if they give you no added value in your use case.

Others, such as the "if not ... // continue" pattern, are highly recommended practice in Python (flat is better than nested: bailing out to the next leg of the loop as soon as you determine there is nothing to do on this one is better than nesting the actions to be done within an if), although of course other arrangements of the code would work too.
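The difference between inserting the ID literally and inserting re.escape(photoid) can be seen in a short sketch. `thumbnail_matcher` and the sample file names are hypothetical and exist only for this illustration; nothing is deleted here.

```python
import re

def thumbnail_matcher(photoid, literal=True):
    # literal=True escapes the ID so regex metacharacters match themselves;
    # literal=False splices it in as a pattern piece, as the answer does.
    piece = re.escape(str(photoid)) if literal else str(photoid)
    return re.compile(r'^%s\D.*jpg$' % piece)

names = ["abcd_s.jpg", "abcf_s.jpg", "abcx_s.jpg", "abc[def]_s.jpg"]

as_pattern = [n for n in names
              if thumbnail_matcher("abc[def]", literal=False).match(n)]
as_literal = [n for n in names
              if thumbnail_matcher("abc[def]").match(n)]

print(as_pattern)  # prints ['abcd_s.jpg', 'abcf_s.jpg']
print(as_literal)  # prints ['abc[def]_s.jpg']
```

If the IDs ever come from user input, the escaped form is the safer default, since an unescaped ID such as `.*` would match every thumbnail.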

Solution 6 - Python

My recommendation:

import os, re

def purge(dir, pattern, inclusive=True):
    regexObj = re.compile(pattern)
    for root, dirs, files in os.walk(dir, topdown=False):
        for name in files:
            path = os.path.join(root, name)
            if bool(regexObj.search(path)) == bool(inclusive):
                os.remove(path)
        for name in dirs:
            path = os.path.join(root, name)
            if len(os.listdir(path)) == 0:
                os.rmdir(path)

This will recursively remove every file whose full path matches the pattern by default, and every file whose path does not match if inclusive is False. Because the walk is bottom-up (topdown=False), it then removes any folders that are left empty.

Solution 7 - Python

import os

def main():
    mypath = "<Path to Root Folder to work within>"
    for root, dirs, files in os.walk(mypath):
        for file in files:
            p = os.path.join(root, file)
            # Delete regular files ending in ".jpg" (or any pattern you want)
            if os.path.isfile(p) and p.endswith(".jpg"):
                os.remove(p)

if __name__ == "__main__":
    main()

Solution 8 - Python

I find Popen(["rm " + file_name + "*.ext"], shell=True, stdout=PIPE).communicate() (with Popen and PIPE imported from subprocess) to be a much simpler solution to this problem. It relies on a Unix shell and is prone to shell injection, so I would only use it where file_name never comes from untrusted input, for example in a program that is used internally.
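If the injection risk is a concern, the same prefix-plus-extension deletion can be done without a shell. This is a sketch rather than a drop-in replacement: `remove_by_prefix` is a hypothetical name, and ".ext" is kept as the placeholder extension from the command above.

```python
import glob
import os

def remove_by_prefix(file_name, ext=".ext"):
    """Delete files named '<file_name>*<ext>' and return the removed paths."""
    removed = []
    # glob.escape makes any wildcard characters in file_name literal,
    # and no shell is involved, so nothing in file_name is executed.
    for path in glob.glob(glob.escape(file_name) + "*" + ext):
        os.remove(path)
        removed.append(path)
    return removed
```

This also works on Windows, where `rm` is normally unavailable.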

Solution 9 - Python

import os, re

def recursive_purge(dir, pattern):
    for f in os.listdir(dir):
        if os.path.isdir(os.path.join(dir, f)):
            recursive_purge(os.path.join(dir, f), pattern)
        elif re.search(pattern, os.path.join(dir, f)):
            os.remove(os.path.join(dir, f))

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Oli | View Question on Stackoverflow
Solution 1 - Python | Sam Bull | View Answer on Stackoverflow
Solution 2 - Python | Andrew Hare | View Answer on Stackoverflow
Solution 3 - Python | RedGlyph | View Answer on Stackoverflow
Solution 4 - Python | Valeriu Paloș | View Answer on Stackoverflow
Solution 5 - Python | Alex Martelli | View Answer on Stackoverflow
Solution 6 - Python | DRayX | View Answer on Stackoverflow
Solution 7 - Python | Charlie | View Answer on Stackoverflow
Solution 8 - Python | Kartos | View Answer on Stackoverflow
Solution 9 - Python | Yanay Manhaim | View Answer on Stackoverflow