How to delete or purge old files on S3?

Amazon S3, Timestamp, Delete File, S3cmd, Purge

Amazon S3 Problem Overview


Are there existing solutions to delete any files older than x days?

Amazon S3 Solutions


Solution 1 - Amazon S3

Amazon has recently introduced Object Expiration.

> Amazon S3 Announces Object Expiration
>
> Amazon S3 announced a new feature, Object Expiration, that allows you to schedule the deletion of your objects after a pre-defined time period. Using Object Expiration to schedule periodic removal of objects eliminates the need for you to identify objects for deletion and submit delete requests to Amazon S3.
>
> You can define Object Expiration rules for a set of objects in your bucket. Each Object Expiration rule allows you to specify a prefix and an expiration period in days. The prefix field (e.g. logs/) identifies the object(s) subject to the expiration rule, and the expiration period specifies the number of days from creation date (i.e. age) after which object(s) should be removed. Once the objects are past their expiration date, they will be queued for deletion. You will not be billed for storage for objects on or after their expiration date.
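
Today this is configured as a bucket lifecycle rule. Here is a minimal boto3 sketch, assuming a hypothetical bucket named my-bucket and the logs/ prefix from the announcement:

import boto3

s3 = boto3.client('s3')

# Expire everything under the logs/ prefix 30 days after creation
s3.put_bucket_lifecycle_configuration(
    Bucket='my-bucket',  # placeholder bucket name
    LifecycleConfiguration={
        'Rules': [
            {
                'ID': 'expire-old-logs',       # illustrative rule name
                'Filter': {'Prefix': 'logs/'},
                'Status': 'Enabled',
                'Expiration': {'Days': 30},
            }
        ]
    }
)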

Solution 2 - Amazon S3

Here is some info on how to do it...

http://docs.amazonwebservices.com/AmazonS3/latest/dev/ObjectExpiration.html

Hope this helps.

Solution 3 - Amazon S3

You can use AWS S3 lifecycle rules to expire files and delete them. All you have to do is select the bucket, click on the "Add lifecycle rules" button, and configure it; AWS will take care of the rest for you.

You can refer to the blog post below from Joe for step-by-step instructions. It's quite simple, actually:

https://www.joe0.com/2017/05/24/amazon-s3-how-to-delete-files-older-than-x-days/
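
If you would rather verify from code the rule you created in the console, here is a small boto3 sketch (the bucket name is a placeholder):

import boto3

s3 = boto3.client('s3')

# Print the lifecycle rules currently attached to the bucket
response = s3.get_bucket_lifecycle_configuration(Bucket='my-bucket')
for rule in response['Rules']:
    print(rule['ID'], rule['Status'], rule.get('Expiration'))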

Hope it helps!

Solution 4 - Amazon S3

Here is how to implement it using a CloudFormation template:

  JenkinsArtifactsBucket:
    Type: "AWS::S3::Bucket"
    Properties:
      BucketName: !Sub "jenkins-artifacts"
      LifecycleConfiguration:
        Rules:
          - Id: "remove-old-artifacts"
            ExpirationInDays: 3                   # delete current objects 3 days after creation
            NoncurrentVersionExpirationInDays: 3  # also expire old versions on versioned buckets
            Status: Enabled

This creates a lifecycle rule as explained by @Ravi Bhatt in Solution 1.

Read more on that: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-s3-bucket-lifecycleconfig-rule.html

How object lifecycle management works: https://docs.aws.amazon.com/AmazonS3/latest/dev/object-lifecycle-mgmt.html
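
If you want to try the template end to end from Python, here is a minimal, hypothetical deployment sketch using boto3 (the stack name is made up, and S3 bucket names must be globally unique, so adjust both):

import boto3

# Wrap the resource snippet above in a complete template
TEMPLATE_BODY = """
AWSTemplateFormatVersion: "2010-09-09"
Resources:
  JenkinsArtifactsBucket:
    Type: "AWS::S3::Bucket"
    Properties:
      BucketName: "jenkins-artifacts"
      LifecycleConfiguration:
        Rules:
          - Id: "remove-old-artifacts"
            ExpirationInDays: 3
            NoncurrentVersionExpirationInDays: 3
            Status: Enabled
"""

cfn = boto3.client('cloudformation')
cfn.create_stack(StackName='jenkins-artifacts-bucket', TemplateBody=TEMPLATE_BODY)

# Block until the stack (and the bucket with its lifecycle rule) is ready
cfn.get_waiter('stack_create_complete').wait(StackName='jenkins-artifacts-bucket')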

Solution 5 - Amazon S3

You can use the following PowerShell script to delete objects older than x days.

[CmdletBinding()]
Param(
  [Parameter(Mandatory=$True)]
  [string]$BUCKET_NAME,             # Name of the bucket

  [Parameter(Mandatory=$True)]
  [string]$OBJ_PATH,                # Key prefix of the S3 objects (directory path)

  [Parameter(Mandatory=$True)]
  [int]$EXPIRY_DAYS                 # Delete objects older than this many days
)

$CURRENT_DATE = Get-Date
$OBJECTS = Get-S3Object -BucketName $BUCKET_NAME -KeyPrefix $OBJ_PATH
foreach ($OBJ in $OBJECTS) {
    # Skip the prefix entry itself, then delete anything past the expiry window
    if ($OBJ.Key -ne $OBJ_PATH) {
        if (($CURRENT_DATE - $OBJ.LastModified).Days -gt $EXPIRY_DAYS) {
            Write-Host "Deleting object:" $OBJ.Key
            Remove-S3Object -BucketName $BUCKET_NAME -Key $OBJ.Key -Force
        }
    }
}
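
Note that Get-S3Object and Remove-S3Object come from the AWS Tools for PowerShell module, so that module must be installed and credentials configured (for example with Set-AWSCredential) before running the script.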

Solution 6 - Amazon S3

Here is a Python script to delete files older than N days:

from boto3 import client
from botocore.exceptions import ClientError
from datetime import datetime, timezone
import argparse

if __name__ == '__main__':

    parser = argparse.ArgumentParser()
    
    parser.add_argument('--access_key_id', required=True)
    parser.add_argument('--secret_access_key', required=True)
    parser.add_argument('--delete_after_retention_days', required=False, default=15)
    parser.add_argument('--bucket', required=True)
    parser.add_argument('--prefix', required=False, default="")
    parser.add_argument('--endpoint', required=True)

    args = parser.parse_args()

    access_key_id = args.access_key_id
    secret_access_key = args.secret_access_key
    delete_after_retention_days = int(args.delete_after_retention_days)
    bucket = args.bucket
    prefix = args.prefix
    endpoint = args.endpoint

    # get current date
    today = datetime.now(timezone.utc)

    try:
        # create a connection to Wasabi
        s3_client = client(
            's3',
            endpoint_url=endpoint,
            aws_access_key_id=access_key_id,
            aws_secret_access_key=secret_access_key)
    except Exception as e:
        raise e

    try:
        # list all the buckets under the account
        list_buckets = s3_client.list_buckets()
    except ClientError:
        # invalid access keys
        raise Exception("Invalid Access or Secret key")

    # create a paginator for all objects.
    object_response_paginator = s3_client.get_paginator('list_object_versions')
    if len(prefix) > 0:
        operation_parameters = {'Bucket': bucket,
                                'Prefix': prefix}
    else:
        operation_parameters = {'Bucket': bucket}

    # instantiate temp variables.
    delete_list = []
    count_current = 0
    count_non_current = 0

    print("$ Paginating bucket " + bucket)
    for object_response_itr in object_response_paginator.paginate(**operation_parameters):
        # some pages may not contain a 'Versions' key (e.g. an empty prefix)
        for version in object_response_itr.get('Versions', []):
            if version["IsLatest"]:
                count_current += 1
            else:
                count_non_current += 1
            if (today - version['LastModified']).days > delete_after_retention_days:
                delete_list.append({'Key': version['Key'], 'VersionId': version['VersionId']})

    # print objects count
    print("-" * 20)
    print("$ Before deleting objects")
    print("$ current objects: " + str(count_current))
    print("$ non-current objects: " + str(count_non_current))
    print("-" * 20)

    # delete objects 1000 at a time
    print("$ Deleting objects from bucket " + bucket)
    for i in range(0, len(delete_list), 1000):
        response = s3_client.delete_objects(
            Bucket=bucket,
            Delete={
                'Objects': delete_list[i:i + 1000],
                'Quiet': True
            }
        )
        print(response)

    # reset counts
    count_current = 0
    count_non_current = 0

    # paginate and recount
    print("$ Paginating bucket " + bucket)
    for object_response_itr in object_response_paginator.paginate(**operation_parameters):  # same prefix filter as the first pass
        if 'Versions' in object_response_itr:
            for version in object_response_itr['Versions']:
                if version["IsLatest"]:
                    count_current += 1
                else:
                    count_non_current += 1

    # print objects count
    print("-" * 20)
    print("$ After deleting objects")
    print("$ current objects: " + str(count_current))
    print("$ non-current objects: " + str(count_non_current))
    print("-" * 20)
    print("$ task complete")

And here is how I run it:

python s3_cleanup.py --access_key_id="access-key" --secret_access_key="secret-key-here" --endpoint="https://s3.us-west-1.wasabisys.com" --bucket="ondemand-downloads" --prefix="" --delete_after_retention_days=5

If you want to delete files only from a specific folder, use the --prefix parameter.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

| Content Type | Original Author | Original Content on Stackoverflow |
| --- | --- | --- |
| Question | Erik | View Question on Stackoverflow |
| Solution 1 - Amazon S3 | Ravi Bhatt | View Answer on Stackoverflow |
| Solution 2 - Amazon S3 | JonLovett | View Answer on Stackoverflow |
| Solution 3 - Amazon S3 | Raghu Chinnannan | View Answer on Stackoverflow |
| Solution 4 - Amazon S3 | sashok_bg | View Answer on Stackoverflow |
| Solution 5 - Amazon S3 | Mithun Biswas | View Answer on Stackoverflow |
| Solution 6 - Amazon S3 | Umair Ayub | View Answer on Stackoverflow |