Quick way to list all files in Amazon S3 bucket?

Amazon S3

Amazon S3 Problem Overview


I have an amazon s3 bucket that has tens of thousands of filenames in it. What's the easiest way to get a text file that lists all the filenames in the bucket?

Amazon S3 Solutions


Solution 1 - Amazon S3

I'd recommend using boto. Then it's a quick couple of lines of python:

from boto.s3.connection import S3Connection

conn = S3Connection('access-key','secret-access-key')
bucket = conn.get_bucket('bucket')
for key in bucket.list():
    print(key.name.encode('utf-8'))

Save this as list.py, open a terminal, and then run:

$ python list.py > results.txt

Solution 2 - Amazon S3

AWS CLI

Documentation for aws s3 ls

AWS has recently released its Command Line Tools, which work much like boto and can be installed using sudo easy_install awscli or sudo pip install awscli.

Once installed, you can then simply run

aws s3 ls

which will show you all of your available buckets:

CreationTime Bucket
------------ ------
2013-07-11 17:08:50 mybucket
2013-07-24 14:55:44 mybucket2

You can then query a specific bucket for files.

Command:

aws s3 ls s3://mybucket

Output:

Bucket: mybucket
Prefix:

      LastWriteTime     Length Name
      -------------     ------ ----
                           PRE somePrefix/
2013-07-25 17:06:27         88 test.txt

This will show you the files and prefixes at the top level of the bucket (add --recursive to list everything).

Solution 3 - Amazon S3

s3cmd (http://s3tools.org/s3cmd) is invaluable for this kind of thing:

$ s3cmd ls -r s3://yourbucket/ | awk '{print $4}' > objects_in_bucket

Solution 4 - Amazon S3

Be careful: Amazon's list operation returns at most 1,000 keys per request. If you want to iterate over all files, you have to paginate the results using markers:

In Ruby, using the aws-s3 gem:

bucket_name = 'yourBucket'
marker = ""

AWS::S3::Base.establish_connection!(
  :access_key_id => 'your_access_key_id',
  :secret_access_key => 'your_secret_access_key'
)

loop do
  objects = AWS::S3::Bucket.objects(bucket_name, :marker => marker, :max_keys => 1000)
  break if objects.size == 0
  marker = objects.last.key

  objects.each do |obj|
    puts obj.key
  end
end

Hope this helps, vincent
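The same marker-style loop can be written in Python with boto3's list_objects_v2, carrying the continuation token forward by hand. A minimal sketch, assuming credentials are already configured and 'your-bucket' is a placeholder:

import boto3

client = boto3.client('s3')
kwargs = {'Bucket': 'your-bucket'}  # placeholder bucket name

# Each call returns at most 1,000 keys, so keep requesting pages
# until the response is no longer truncated.
while True:
    response = client.list_objects_v2(**kwargs)
    for obj in response.get('Contents', []):
        print(obj['Key'])
    if not response.get('IsTruncated'):
        break
    kwargs['ContinuationToken'] = response['NextContinuationToken']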

Solution 5 - Amazon S3

Update 15-02-2019:

This command will give you a list of all buckets in AWS S3:

aws s3 ls

This command will give you a list of all top-level objects inside an AWS S3 bucket:

aws s3 ls bucket-name

This command will give you a list of ALL objects inside an AWS S3 bucket:

aws s3 ls bucket-name --recursive

This command will write a list of ALL objects inside an AWS S3 bucket to a text file in your current directory:

aws s3 ls bucket-name --recursive | cat >> file-name.txt

Solution 6 - Amazon S3

There are a couple of ways you can go about it. Using Python:

import boto3

# fill in your credentials
session = boto3.Session(aws_access_key_id, aws_secret_access_key)

s3 = session.resource('s3')

bucketName = 'testbucket133'
bucket = s3.Bucket(bucketName)

for obj in bucket.objects.all():
    print(obj.key)

Another way is using the AWS CLI:

aws s3 ls s3://{bucketname}
Example: aws s3 ls s3://testbucket133

Solution 7 - Amazon S3

For Scala developers, here is a recursive function to execute a full scan and map the contents of an AmazonS3 bucket using the official AWS SDK for Java:

import com.amazonaws.services.s3.AmazonS3Client
import com.amazonaws.services.s3.model.{S3ObjectSummary, ObjectListing}
import scala.collection.JavaConversions.{collectionAsScalaIterable => asScala}

def map[T](s3: AmazonS3Client, bucket: String, prefix: String)(f: (S3ObjectSummary) => T) = {

  def scan(acc: List[T], listing: ObjectListing): List[T] = {
    val summaries = asScala[S3ObjectSummary](listing.getObjectSummaries())
    val mapped = (for (summary <- summaries) yield f(summary)).toList

    if (!listing.isTruncated) acc ::: mapped
    else scan(acc ::: mapped, s3.listNextBatchOfObjects(listing))
  }

  scan(List(), s3.listObjects(bucket, prefix))
}

To invoke the above curried map() function, simply pass the already constructed (and properly initialized) AmazonS3Client object (refer to the official AWS SDK for Java API Reference), the bucket name and the prefix name in the first parameter list. Also pass the function f() you want to apply to map each object summary in the second parameter list.

For example

val keyOwnerTuples = map(s3, bucket, prefix)(s => (s.getKey, s.getOwner))

will return the full list of (key, owner) tuples in that bucket/prefix

or

map(s3, "bucket", "prefix")(s => println(s))

as you would normally approach collections in functional programming.

Solution 8 - Amazon S3

Following Zach, I would also recommend boto, but I needed to make a slight change to his code:

import boto

conn = boto.connect_s3('access-key', 'secret-key')
bucket = conn.lookup('bucket-name')
for key in bucket:
    print(key.name)

Solution 9 - Amazon S3

aws s3api list-objects --bucket bucket-name

For more details see here - http://docs.aws.amazon.com/cli/latest/reference/s3api/list-objects.html

Solution 10 - Amazon S3

For Python's boto3 after having used aws configure:

import boto3
s3 = boto3.resource('s3')

bucket = s3.Bucket('name')
for obj in bucket.objects.all():
    print(obj.key)

Solution 11 - Amazon S3

First, make sure you are on an instance terminal and that the IAM user or role you are using has full access to S3. For example, I used an EC2 instance.

pip3 install awscli

Then configure the AWS CLI:

aws configure

Then fill out your credentials, for example:

$ aws configure
AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: us-west-2
Default output format [None]: json (or just press enter)

Now, see all buckets:

aws s3 ls

Store all bucket names:

aws s3 ls > output.txt

See the full file structure in a bucket:

aws s3 ls bucket-name --recursive

Store the file structure of a bucket:

aws s3 ls bucket-name --recursive > file_Structure.txt

Hope this helps.

Solution 12 - Amazon S3

The AWS CLI lets you see all the files of an S3 bucket quickly and helps in performing other operations too.

To use the AWS CLI, follow the steps below:

  1. Install AWS CLI.

  2. Configure AWS CLI for using default security credentials and default AWS Region.

  3. To see all files of an S3 bucket, use the command

    aws s3 ls s3://your_bucket_name --recursive

Reference to use AWS cli for different AWS services: https://docs.aws.amazon.com/cli/latest/reference/

Solution 13 - Amazon S3

In Java you can get the keys using ListObjects (see AWS documentation)

FileWriter fileWriter;
BufferedWriter bufferedWriter;
// [...]

AmazonS3 s3client = new AmazonS3Client(new ProfileCredentialsProvider());

ListObjectsRequest listObjectsRequest = new ListObjectsRequest()
        .withBucketName(bucketName)
        .withPrefix("myprefix");
ObjectListing objectListing;

do {
    objectListing = s3client.listObjects(listObjectsRequest);
    for (S3ObjectSummary objectSummary : objectListing.getObjectSummaries()) {
        // write each key on its own line, e.g. with a BufferedWriter
        bufferedWriter.write(objectSummary.getKey());
        bufferedWriter.newLine();
    }
    listObjectsRequest.setMarker(objectListing.getNextMarker());
} while (objectListing.isTruncated());

Solution 14 - Amazon S3

Code in Python using the awesome boto lib. The code returns a list of files in a bucket and also handles exceptions for missing buckets.

import boto

conn = boto.connect_s3(<ACCESS_KEY>, <SECRET_KEY>)
try:
    bucket = conn.get_bucket(<BUCKET_NAME>, validate=True)
except boto.exception.S3ResponseError as e:
    do_something()  # The bucket does not exist; choose how to deal with it or raise the exception

keys = [key.name.encode("utf-8") for key in bucket.list()]

Don't forget to replace the < PLACE_HOLDERS > with your values.
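For the newer boto3, a roughly equivalent sketch (the placeholders are again yours to fill in; this is just one way to handle a missing bucket):

import boto3
from botocore.exceptions import ClientError

s3 = boto3.resource('s3')
bucket = s3.Bucket('<BUCKET_NAME>')  # placeholder

try:
    # head_bucket raises ClientError if the bucket does not exist
    # or you do not have access to it
    s3.meta.client.head_bucket(Bucket=bucket.name)
except ClientError:
    raise  # the bucket does not exist; choose how to deal with it

keys = [obj.key for obj in bucket.objects.all()]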

Solution 15 - Amazon S3

You can use standard s3 api -

aws s3 ls s3://root/folder1/folder2/

Solution 16 - Amazon S3

The below command will get all the file names from your AWS S3 bucket and write them into a text file in your current directory:

aws s3 ls s3://Bucketdirectory/Subdirectory/ | cat >> FileNames.txt

Solution 17 - Amazon S3

I know it's an old topic, but I'd like to contribute too.

With a newer version of boto3 and Python, you can get the files as follows:

import boto3

client = boto3.client('s3')

bucket = client.list_objects(Bucket=BUCKET_NAME)
for content in bucket["Contents"]:
    key = content["Key"]
    print(key)

Keep in mind that this solution does not handle pagination.

For more information: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.list_objects

Edit: Fixed the name for the "Key" from small to capital K
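If you do need every key in a large bucket, one option is boto3's built-in paginator. A minimal sketch, again assuming BUCKET_NAME is a placeholder for your bucket name:

import boto3

client = boto3.client('s3')

# The paginator issues repeated list_objects_v2 calls under the hood,
# so you get every key even when the bucket holds more than 1,000 objects.
paginator = client.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=BUCKET_NAME):
    for content in page.get('Contents', []):
        print(content['Key'])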

Solution 18 - Amazon S3

Here's a way to use the stock AWS CLI to generate a diff-able list of just object names:

aws s3api list-objects --bucket "$BUCKET" --query "Contents[].{Key: Key}" --output text

(based on https://stackoverflow.com/a/54378943/53529)

This gives you the full object name of every object in the bucket, separated by new lines. Useful if you want to diff between the contents of an S3 bucket and a GCS bucket, for example.

Solution 19 - Amazon S3

In PHP, using the standalone S3.php class:

function showUploads(){
	if (!class_exists('S3')) require_once 'S3.php';
	// AWS access info
	if (!defined('awsAccessKey')) define('awsAccessKey', '234567665464tg');
	if (!defined('awsSecretKey')) define('awsSecretKey', 'dfshgfhfghdgfhrt463457');
	$bucketName = 'my_bucket1234';
	$s3 = new S3(awsAccessKey, awsSecretKey);
	$contents = $s3->getBucket($bucketName);
	echo "<hr/>List of Files in bucket : {$bucketName} <hr/>";
	$n = 1;
	foreach ($contents as $p => $v):
		echo $p."<br/>";
		$n++;
	endforeach;
}

Solution 20 - Amazon S3

Alternatively you can use Minio Client, aka mc. It's open source and compatible with AWS S3. It is available for Linux, Windows, Mac, and FreeBSD.

All you have to do is run the mc ls command to list the contents.

$ mc ls s3/kline/
[2016-04-30 13:20:47 IST] 1.1MiB 1.jpg
[2016-04-30 16:03:55 IST] 7.5KiB docker.png
[2016-04-30 15:16:17 IST]  50KiB pi.png
[2016-05-10 14:34:39 IST] 365KiB upton.pdf

Note:

  • s3: Alias for Amazon S3
  • kline: AWS S3 bucket name

Installing Minio Client on Linux: download mc, then run:

$ chmod 755 mc
$ ./mc --help

Setting up AWS credentials with Minio Client

$ mc config host add mys3 https://s3.amazonaws.com BKIKJAA5BMMU2RHO6IBB V7f1CwQqAcwo80UEIJEjc5gVQUSSx5ohQ9GSrr12

Note: Please replace mys3 with the alias you would like for this account, and BKIKJAA5BMMU2RHO6IBB / V7f1CwQqAcwo80UEIJEjc5gVQUSSx5ohQ9GSrr12 with your AWS ACCESS-KEY and SECRET-KEY.

Hope it helps.

Disclaimer: I work for Minio

Solution 21 - Amazon S3

You can list all the files in an AWS S3 bucket using the command

aws s3 ls path/to/file

and to save it in a file, use

aws s3 ls path/to/file >> save_result.txt

if you want to append your results to the file, or:

aws s3 ls path/to/file > save_result.txt

if you want to overwrite what was written before.

It will work on both Windows and Linux.

Solution 22 - Amazon S3

In JavaScript you can use

s3.listObjects(params, function (err, result) {});

to get all the objects inside a bucket. You have to pass the bucket name inside params (Bucket: name).

Solution 23 - Amazon S3

In C#, using the AWS SDK for .NET:

public static Dictionary<string, DateTime> ListBucketsByCreationDate(string AccessKey, string SecretKey)
{  

    return AWSClientFactory.CreateAmazonS3Client(AccessKey,
        SecretKey).ListBuckets().Buckets.ToDictionary(s3Bucket => s3Bucket.BucketName,
        s3Bucket => DateTime.Parse(s3Bucket.CreationDate));

}

Solution 24 - Amazon S3

In PHP you can get the complete list of AWS S3 objects inside a specific bucket using the following call:

$S3 = \Aws\S3\S3Client::factory(array('region' => $region,));
$iterator = $S3->getIterator('ListObjects', array('Bucket' => $bucket));
foreach ($iterator as $obj) {
    echo $obj['Key'];
}

You can redirect the output of the above code into a file to get the list of keys.

Solution 25 - Amazon S3

Simplified and updated version of the Scala answer by Paolo:

import scala.collection.JavaConversions.{collectionAsScalaIterable => asScala}
import com.amazonaws.services.s3.AmazonS3
import com.amazonaws.services.s3.model.{ListObjectsRequest, ObjectListing, S3ObjectSummary}

def buildListing(s3: AmazonS3, request: ListObjectsRequest): List[S3ObjectSummary] = {
  def buildList(listIn: List[S3ObjectSummary], bucketList:ObjectListing): List[S3ObjectSummary] = {
    val latestList: List[S3ObjectSummary] = bucketList.getObjectSummaries.toList

    if (!bucketList.isTruncated) listIn ::: latestList
    else buildList(listIn ::: latestList, s3.listNextBatchOfObjects(bucketList))
  }

  buildList(List(), s3.listObjects(request))
}

Stripping out the generics and using the ListObjectsRequest generated by the SDK builders.

Solution 26 - Amazon S3

# find-like file listing for S3 files
aws s3api --profile <<profile-name>> \
--endpoint-url=<<end-point-url>> list-objects \
--bucket <<bucket-name>> --query 'Contents[].{Key: Key}'

Solution 27 - Amazon S3

Use plumbum to wrap the CLI and you will have a clear syntax:

import plumbum as pb
folders = pb.local['aws']('s3', 'ls')
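The call returns the command's standard output as a single string; as a small, purely illustrative follow-up, you could split it into bucket names like this:

# each line looks like "2013-07-11 17:08:50 mybucket"; keep the last column
bucket_names = [line.split()[-1] for line in folders.splitlines() if line.strip()]
print(bucket_names)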

Solution 28 - Amazon S3

Please try this Bash script. It uses curl and openssl, with no need for any SDK or other heavy dependencies:

bucket=<bucket_name>
region=<region_name>
awsAccess=<access_key>
awsSecret=<secret_key>
awsRegion="${region}"
baseUrl="s3.${awsRegion}.amazonaws.com"

m_sed() {
  if which gsed > /dev/null 2>&1; then
    gsed "$@"
  else
    sed "$@"
  fi
}

awsStringSign4() {
  kSecret="AWS4$1"
  kDate=$(printf         '%s' "$2" | openssl dgst -sha256 -hex -mac HMAC -macopt "key:${kSecret}"     2>/dev/null | m_sed 's/^.* //')
  kRegion=$(printf       '%s' "$3" | openssl dgst -sha256 -hex -mac HMAC -macopt "hexkey:${kDate}"    2>/dev/null | m_sed 's/^.* //')
  kService=$(printf      '%s' "$4" | openssl dgst -sha256 -hex -mac HMAC -macopt "hexkey:${kRegion}"  2>/dev/null | m_sed 's/^.* //')
  kSigning=$(printf 'aws4_request' | openssl dgst -sha256 -hex -mac HMAC -macopt "hexkey:${kService}" 2>/dev/null | m_sed 's/^.* //')
  signedString=$(printf  '%s' "$5" | openssl dgst -sha256 -hex -mac HMAC -macopt "hexkey:${kSigning}" 2>/dev/null | m_sed 's/^.* //')
  printf '%s' "${signedString}"
}

if [ -z "${region}" ]; then
  region="${awsRegion}"
fi


# Initialize helper variables

authType='AWS4-HMAC-SHA256'
service="s3"
dateValueS=$(date -u +'%Y%m%d')
dateValueL=$(date -u +'%Y%m%dT%H%M%SZ')

# 0. Payload hash (the empty-string SHA-256, since a GET request has no body)

# 1. Create canonical request

# NOTE: order significant in ${signedHeaders} and ${canonicalRequest}

signedHeaders='host;x-amz-content-sha256;x-amz-date'

canonicalRequest="\
GET
/

host:${bucket}.s3.amazonaws.com
x-amz-content-sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
x-amz-date:${dateValueL}

${signedHeaders}
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"

# Hash it

canonicalRequestHash=$(printf '%s' "${canonicalRequest}" | openssl dgst -sha256 -hex 2>/dev/null | m_sed 's/^.* //')

# 2. Create string to sign

stringToSign="\
${authType}
${dateValueL}
${dateValueS}/${region}/${service}/aws4_request
${canonicalRequestHash}"

# 3. Sign the string

signature=$(awsStringSign4 "${awsSecret}" "${dateValueS}" "${region}" "${service}" "${stringToSign}")

# Send the signed GET request and list the bucket contents

curl -g -k "https://${bucket}.s3.amazonaws.com/" \
  -H "x-amz-content-sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855" \
  -H "x-amz-date: ${dateValueL}" \
  -H "Authorization: ${authType} Credential=${awsAccess}/${dateValueS}/${region}/${service}/aws4_request,SignedHeaders=${signedHeaders},Signature=${signature}"

Solution 29 - Amazon S3

To get the full links, run

aws s3 ls s3://bucket/ | awk '{print $4}' | xargs -I{} echo "s3://bucket/{}"

Solution 30 - Amazon S3

This is an old question, but the number of responses tells me many people hit this page.

The easiest way I found is to just use the built-in AWS console to create an S3 Inventory report. It's easy to set up, but the first CSV file can take up to 48 hours to show up. After that you can have a daily or weekly listing delivered to a bucket of your choosing.

Solution 31 - Amazon S3

The EASIEST way to get a very usable text file is to download S3 Browser http://s3browser.com/ and use the Web URLs Generator to produce a list of complete link paths. It is very handy and involves about 3 clicks.

  • Browse to Folder
  • Select All
  • Generate Urls

Best of luck to you.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Question: Steve (View Question on Stackoverflow)
Solution 1 - Amazon S3: Zachary Ozer (View Answer on Stackoverflow)
Solution 2 - Amazon S3: Layke (View Answer on Stackoverflow)
Solution 3 - Amazon S3: mat kelcey (View Answer on Stackoverflow)
Solution 4 - Amazon S3: vdaubry (View Answer on Stackoverflow)
Solution 5 - Amazon S3: Khalil Gharbaoui (View Answer on Stackoverflow)
Solution 6 - Amazon S3: Mahesh Mogal (View Answer on Stackoverflow)
Solution 7 - Amazon S3: pangiole (View Answer on Stackoverflow)
Solution 8 - Amazon S3: Datageek (View Answer on Stackoverflow)
Solution 9 - Amazon S3: sysuser (View Answer on Stackoverflow)
Solution 10 - Amazon S3: André (View Answer on Stackoverflow)
Solution 11 - Amazon S3: Hari_pb (View Answer on Stackoverflow)
Solution 12 - Amazon S3: singh30 (View Answer on Stackoverflow)
Solution 13 - Amazon S3: H6. (View Answer on Stackoverflow)
Solution 14 - Amazon S3: Oran (View Answer on Stackoverflow)
Solution 15 - Amazon S3: Nrj (View Answer on Stackoverflow)
Solution 16 - Amazon S3: Praveenkumar Sekar (View Answer on Stackoverflow)
Solution 17 - Amazon S3: Brenno Leal (View Answer on Stackoverflow)
Solution 18 - Amazon S3: Pete Hodgson (View Answer on Stackoverflow)
Solution 19 - Amazon S3: Sandeep Penmetsa (View Answer on Stackoverflow)
Solution 20 - Amazon S3: koolhead17 (View Answer on Stackoverflow)
Solution 21 - Amazon S3: Aklank Jain (View Answer on Stackoverflow)
Solution 22 - Amazon S3: murtaza sanjeliwala (View Answer on Stackoverflow)
Solution 23 - Amazon S3: user1172192 (View Answer on Stackoverflow)
Solution 24 - Amazon S3: Shriganesh Shintre (View Answer on Stackoverflow)
Solution 25 - Amazon S3: wildgooze (View Answer on Stackoverflow)
Solution 26 - Amazon S3: Yordan Georgiev (View Answer on Stackoverflow)
Solution 27 - Amazon S3: JaviOverflow (View Answer on Stackoverflow)
Solution 28 - Amazon S3: Bahram Zaeri (View Answer on Stackoverflow)
Solution 29 - Amazon S3: Łukasz Kidziński (View Answer on Stackoverflow)
Solution 30 - Amazon S3: Craig.Pearce (View Answer on Stackoverflow)
Solution 31 - Amazon S3: Elliot Thornton (View Answer on Stackoverflow)