Check file size on S3 without downloading?
Amazon S3 Problem Overview
I have customer files uploaded to Amazon S3, and I would like to add a feature that totals the size of those files for each customer. Is there a way to "peek" at a file's size without downloading it? I know you can view it in the Amazon control panel, but I need to do it programmatically.
Amazon S3 Solutions
Solution 1 - Amazon S3
Send an HTTP HEAD request to the object. A HEAD request will retrieve the same HTTP headers as a GET request, but it will not retrieve the body of the object (saving you bandwidth). You can then parse out the Content-Length header value from the HTTP response headers.
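As a rough illustration of the HEAD approach, here is a minimal Python sketch that uses only the standard library. It assumes the object URL is publicly readable (or pre-signed for the HEAD method); the function name content_length and the URL in the usage comment are made up for illustration.

```python
import urllib.request


def content_length(url, timeout=10.0):
    """Return the Content-Length reported for `url` by a HEAD request, or None."""
    # method="HEAD" asks the server for headers only, so no body is transferred.
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        value = response.headers.get("Content-Length")
    # Header values are strings; convert to an integer byte count.
    return int(value) if value is not None else None


# Usage (hypothetical URL, replace with an object you are allowed to read):
#   print(content_length("https://example-bucket.s3.amazonaws.com/example.txt"))
```

For private objects, urlopen raises an HTTPError (typically 403), in which case one of the SDK-based solutions is the better fit.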
Solution 2 - Amazon S3
Node.js example:
const AWS = require('aws-sdk');
const s3 = new AWS.S3();
function sizeOf(key, bucket) {
return s3.headObject({ Key: key, Bucket: bucket })
.promise()
.then(res => res.ContentLength);
}
// A test
sizeOf('ahihi.mp4', 'output').then(size => console.log(size));
Doc is here.
Solution 3 - Amazon S3
You can simply use the s3 ls command:
aws s3 ls s3://mybucket --recursive --human-readable --summarize
Outputs
2013-09-02 21:37:53 10 Bytes a.txt
2013-09-02 21:37:53 2.9 MiB foo.zip
2013-09-02 21:32:57 23 Bytes foo/bar/.baz/a
2013-09-02 21:32:58 41 Bytes foo/bar/.baz/b
2013-09-02 21:32:57 281 Bytes foo/bar/.baz/c
2013-09-02 21:32:57 73 Bytes foo/bar/.baz/d
2013-09-02 21:32:57 452 Bytes foo/bar/.baz/e
2013-09-02 21:32:57 896 Bytes foo/bar/.baz/hooks/bar
2013-09-02 21:32:57 189 Bytes foo/bar/.baz/hooks/foo
2013-09-02 21:32:57 398 Bytes z.txt
Total Objects: 10
Total Size: 2.9 MiB
Reference: https://docs.aws.amazon.com/cli/latest/reference/s3/ls.html
Solution 4 - Amazon S3
This is a solution for anyone using Java and the S3 Java library provided by Amazon. If you are using com.amazonaws.services.s3.AmazonS3, you can use a GetObjectMetadataRequest, which allows you to query the object length.
The libraries you have to use are:
<!-- https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-s3 -->
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>aws-java-sdk-s3</artifactId>
<version>1.11.511</version>
</dependency>
Imports:
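import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.*;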
And the code you need to get the content length:
GetObjectMetadataRequest metadataRequest = new GetObjectMetadataRequest(bucketName, fileName);
final ObjectMetadata objectMetadata = s3Client.getObjectMetadata(metadataRequest);
long contentLength = objectMetadata.getContentLength();
Before you can execute the code above, you will need to build the S3 client. Here is some example code for that:
AWSCredentials credentials = new BasicAWSCredentials(
accessKey,
secretKey
);
AmazonS3 s3Client = AmazonS3ClientBuilder.standard()
.withRegion(clientRegion)
.withCredentials(new AWSStaticCredentialsProvider(credentials))
.build();
Solution 5 - Amazon S3
Using Michael's advice, my successful code looked like this:
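require 'net/http'
require 'uri'
file_url = MyObject.first.file.url
url = URI.parse(file_url)
# request_uri keeps the query string, which signed URLs need
req = Net::HTTP::Head.new url.request_uri
# enable TLS when the URL uses https
res = Net::HTTP.start(url.host, url.port, use_ssl: url.scheme == 'https') {|http|
  http.request(req)
}
# header values are strings; convert to an integer byte count
file_length = res["content-length"].to_i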
Solution 6 - Amazon S3
.NET AWS SDK ---- ListObjectsRequest, ListObjectsResponse, S3Object
AmazonS3Client s3 = new AmazonS3Client();
SpaceUsed(s3, "putBucketNameHere");
static void SpaceUsed(AmazonS3Client s3Client, string bucketName)
{
ListObjectsRequest request = new ListObjectsRequest();
request.BucketName = bucketName;
ListObjectsResponse response = s3Client.ListObjects(request);
long totalSize = 0;
foreach (S3Object o in response.S3Objects)
{
totalSize += o.Size;
}
Console.WriteLine("Total Size of bucket " + bucketName + " is " +
Math.Round(totalSize / 1024.0 / 1024.0, 2) + " MB");
}
Solution 7 - Amazon S3
I do something like this in Python to get the cumulative size of all files under a given prefix:
import boto3
bucket = 'your-bucket-name'
prefix = 'some/s3/prefix/'
s3 = boto3.client('s3')
size = 0
result = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
# 'Contents' is missing when nothing matches the prefix
size += sum(x['Size'] for x in result.get('Contents', []))
while result['IsTruncated']:
    result = s3.list_objects_v2(
        Bucket=bucket, Prefix=prefix,
        ContinuationToken=result['NextContinuationToken'])
    size += sum(x['Size'] for x in result.get('Contents', []))
print('Total size in MB: ' + str(size / (1000**2)))
Solution 8 - Amazon S3
There is a better solution.
$info = $s3->getObjectInfo($yourbucketName, $yourfilename);
print $info['size'];
Solution 9 - Amazon S3
You can also do a listing of the contents of the bucket. The metadata in the listing contains the file sizes of all of the objects. This is how it's implemented in the AWS SDK for PHP.
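Since the original question asks for a total per customer, the listing approach can be sketched in Python roughly as follows. This assumes keys are laid out as '<customer-id>/<file path>', which is a hypothetical layout, and the function name size_by_customer is made up for illustration. The client is passed in, so the function works with any object that exposes boto3's get_paginator interface.

```python
from collections import defaultdict


def size_by_customer(s3_client, bucket, prefix=""):
    """Sum object sizes per first key segment found under `prefix`.

    Assumes keys look like '<customer-id>/<file path>' (hypothetical layout).
    """
    totals = defaultdict(int)
    # The paginator follows continuation tokens, so listings with more than
    # 1000 objects are covered as well.
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # 'Contents' is absent when a page has no matching objects.
        for obj in page.get("Contents", []):
            relative_key = obj["Key"][len(prefix):]
            customer = relative_key.split("/", 1)[0]
            totals[customer] += obj["Size"]
    return dict(totals)


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   print(size_by_customer(boto3.client("s3"), "your-bucket-name"))
```

Listing returns sizes for up to 1000 objects per request, so for many files it needs far fewer calls than one HEAD request per object.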
Solution 10 - Amazon S3
Android Solution
Integrate the AWS SDK and you get a fairly straightforward solution:
// ... put this in background thread
List<S3ObjectSummary> s3ObjectSummaries;
s3ObjectSummaries = s3.listObjects(registeredBucket).getObjectSummaries();
for (int i = 0; i < s3ObjectSummaries.size(); i++) {
S3ObjectSummary s3ObjectSummary = s3ObjectSummaries.get(i);
Log.d(TAG, "doInBackground: size " + s3ObjectSummary.getSize());
}
- Here is a link to the official documentation.
- It is very important to run this code in an AsyncTask or some other background thread; otherwise you get an exception for running network operations on the UI thread.
Solution 11 - Amazon S3
The following Python code prints the size of each of the first 1000 objects under a prefix (a single page of results) from S3:
import boto3
bucket = 'bucket_name'
prefix = 'prefix'
s3 = boto3.client('s3')
contents = s3.list_objects_v2(Bucket=bucket, MaxKeys=1000, Prefix=prefix)['Contents']
for c in contents:
    print('Size (KB):', float(c['Size'])/1000)
Solution 12 - Amazon S3
Ruby solution with head_object:
require 'aws-sdk-s3'
s3 = Aws::S3::Client.new(
region: 'us-east-1', #or any other region
access_key_id: AWS_ACCESS_KEY_ID,
secret_access_key: AWS_SECRET_ACCESS_KEY
)
res = s3.head_object(bucket: bucket_name, key: object_key)
file_size = res[:content_length]
Solution 13 - Amazon S3
PHP code to check an S3 object's size (or any other object headers). Note the use of stream_context_set_default to make sure it only sends a HEAD request:
stream_context_set_default(
array(
'http' => array(
'method' => 'HEAD'
)
)
);
$headers = get_headers('http://s3.amazonaws.com/bucketname/filename.jpg', 1);
$headers = array_change_key_case($headers);
$size = trim($headers['content-length'],'"');
Solution 14 - Amazon S3
Golang example, same principle: run a HEAD request against the object in question:
func returnKeySizeInMB(bucketName string, key string) int {
	output, err := svc.HeadObject(
		&s3.HeadObjectInput{
			Bucket: aws.String(bucketName),
			Key:    aws.String(key),
		})
	if err != nil {
		log.Fatalf("Unable to send head request to item %q, %v", key, err)
	}
	// ContentLength is in bytes; integer division truncates to whole MB
	return int(*output.ContentLength / 1024 / 1024)
}
Here, the parameter key means the path to the file. For example, if the URI of the file is s3://my-personal-bucket/folder1/subfolder1/myfile.pdf, then the call would look like:
output, err := svc.HeadObject(
&s3.HeadObjectInput{
Bucket: aws.String("my-personal-bucket"),
Key: aws.String("folder1/subfolder1/myfile.pdf"),
})
Solution 15 - Amazon S3
Aws C++ solution to get file size
//! Step 1: create s3 client
Aws::S3::S3Client s3Client(cred, config); //!Used cred & config,You can use other options.
//! Step 2: Head Object request
Aws::S3::Model::HeadObjectRequest headObj;
headObj.SetBucket(bucket);
headObj.SetKey(key);
//! Step 3: read size from object header metadata
auto object = s3Client.HeadObject(headObj);
if (object.IsSuccess())
{
fileSize = object.GetResultWithOwnership().GetContentLength();
}
else
{
std::cout << "Head Object error: "
<< object.GetError().GetExceptionName() << " - "
<< object.GetError().GetMessage() << std::endl;
}
Note: Do not use GetObject to get the size, because it downloads the object body to obtain that information.
Solution 16 - Amazon S3
If the file is private, we can get the headers through the SDK.
PHP example:
$head = $client->headObject(
[
'Bucket' => $bucket,
'Key' => $key,
]
);
$result = (int) ($head->get('ContentLength') ?? 0);
Solution 17 - Amazon S3
These days you could also use Amazon S3 Inventory which gives you:
> Size – The object size in bytes.
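As a sketch of how an inventory report might be totalled in Python: this assumes a CSV-format inventory file that has already been downloaded and decompressed, and that its column order is Bucket, Key, Size. The real order depends on your inventory configuration and is listed in the fileSchema field of the accompanying manifest.json. The function name total_size_from_inventory is made up for illustration.

```python
import csv
import io


def total_size_from_inventory(csv_text, columns=("Bucket", "Key", "Size")):
    """Sum the Size column of a decompressed CSV inventory file.

    `columns` must match the column order of your own report
    (an assumption here; check fileSchema in manifest.json).
    """
    size_index = columns.index("Size")
    total = 0
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        value = row[size_index]
        # Size can be empty, for example on delete markers in versioned reports.
        if value:
            total += int(value)
    return total
```

Inventory reports are generated on a schedule, so this approach fits periodic usage summaries better than on-demand lookups.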
Solution 18 - Amazon S3
If you are looking to do this with a single file, you can use the AWS CLI head-object command to get only the metadata without downloading the file itself:
$ aws s3api head-object --bucket mybucket --key myfile.csv | jq -r .ContentLength
Explanation
- s3api head-object retrieves the object metadata in JSON format.
- jq -r .ContentLength parses the JSON to get the size of the body in bytes; the -r flag prints raw output, without quotation marks.
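The same single-object lookup can be sketched in Python with boto3's head_object call. The function name object_size is made up for illustration, and the client is passed in so that credentials and region stay outside the function.

```python
def object_size(s3_client, bucket, key):
    """Return the size in bytes of one object, using a HEAD request via the SDK."""
    # head_object returns the object's metadata only; the body is not downloaded.
    response = s3_client.head_object(Bucket=bucket, Key=key)
    return response["ContentLength"]


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   print(object_size(boto3.client("s3"), "mybucket", "myfile.csv"))
```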