AWS S3 copy files and folders between two buckets
Tags: Amazon S3, Copy, Amazon Web-Services
Amazon S3 Problem Overview
I have been on the lookout for a tool to help me copy content of an AWS S3 bucket into a second AWS S3 bucket without downloading the content first to the local file system.
I have tried to use the AWS S3 console copy option but that resulted in some nested files being missing.
I have tried to use the Transmit app (by Panic). Its duplicate command downloads the files to the local system first and then uploads them back to the second bucket, which is quite inefficient.
Amazon S3 Solutions
Solution 1 - Amazon S3
Copy between S3 Buckets
AWS (just recently) released a command line interface for copying between buckets.
$ aws s3 sync s3://mybucket-src s3://mybucket-target --exclude "*.tmp"

This will copy from the source bucket to the target bucket.

See the documentation here: S3 CLI Documentation
Solution 2 - Amazon S3
A simplified example using the aws-sdk gem:
AWS.config(:access_key_id => '...', :secret_access_key => '...')
s3 = AWS::S3.new
s3.buckets['bucket-name'].objects['source-key'].copy_to('target-key')
If you want to perform the copy between different buckets, then specify the target bucket name:
s3.buckets['bucket-name'].objects['source-key'].copy_to('target-key', :bucket_name => 'target-bucket')
Solution 3 - Amazon S3
You can now do it from the S3 admin interface. Just go into one bucket, select all your folders, and choose Actions -> Copy. Then move into your new bucket and choose Actions -> Paste.
Solution 4 - Amazon S3
Copy between buckets in different regions
$ aws s3 cp s3://src_bucket/file s3://dst_bucket/file --source-region eu-west-1 --region ap-northeast-1
The above command copies a file from a bucket in Europe (eu-west-1) to Japan (ap-northeast-1). You can get the code name for your bucket's region with this command:
$ aws s3api get-bucket-location --bucket my_bucket
By the way, using Copy and Paste in the S3 web console is easy, but it seems to download from the source bucket into the browser, and then upload to the destination bucket. Using "aws s3" was much faster for me.
Solution 5 - Amazon S3
It's possible with the recent aws-sdk gem; see the code sample:
require 'aws-sdk'
AWS.config(
:access_key_id => '***',
:secret_access_key => '***',
:max_retries => 10
)
file = 'test_file.rb'
bucket_0 = {:name => 'bucket_from', :endpoint => 's3-eu-west-1.amazonaws.com'}
bucket_1 = {:name => 'bucket_to', :endpoint => 's3.amazonaws.com'}
# interface pointing at the source bucket's region
s3_interface_from = AWS::S3.new(:s3_endpoint => bucket_0[:endpoint])
bucket_from = s3_interface_from.buckets[bucket_0[:name]]
# upload a local test file into the source bucket
bucket_from.objects[file].write(open(file))

# interface pointing at the destination bucket's region
s3_interface_to = AWS::S3.new(:s3_endpoint => bucket_1[:endpoint])
bucket_to = s3_interface_to.buckets[bucket_1[:name]]
# server-side copy from the source bucket into the destination bucket
bucket_to.objects[file].copy_from(file, {:bucket => bucket_from})
more details: https://stackoverflow.com/questions/3459177/how-to-copy-file-across-buckets-using-aws-s3-gem
Solution 6 - Amazon S3
I have created a Docker executable of the s3s3mirror tool, a utility to copy and mirror from one AWS S3 bucket to another.
It is multithreaded, allowing parallel COPY operations, and very memory efficient; it succeeds where s3cmd completely fails.
Usage:
docker run -e AWS_ACCESS_KEY_ID=FOO -e AWS_SECRET_ACCESS_KEY=BAR pmoust/s3s3mirror [OPTIONS] source_bucket[/prefix] dest_bucket[/prefix]
For a full list of options try:
docker run pmoust/s3s3mirror
Solution 7 - Amazon S3
Check out the documentation below; I guess that's what you are looking for: http://docs.amazonwebservices.com/AmazonS3/latest/API/RESTObjectCOPY.html
The RightAws gem's S3Interface has a copy function which does the above.
http://rubydoc.info/gems/right_aws/3.0.0/RightAws/S3Interface#copy-instance_method
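For reference, a minimal sketch of the same server-side COPY operation using boto3 (bucket and key names are placeholders I've assumed):

import boto3

s3 = boto3.client('s3')

# issues a server-side COPY; the object data never passes through your machine
s3.copy_object(
    Bucket='target-bucket',
    Key='target-key',
    CopySource={'Bucket': 'source-bucket', 'Key': 'source-key'}
)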
Solution 8 - Amazon S3
I'd imagine you've probably found a good solution by now, but for others who are encountering this problem (as I was just recently), I've crafted a simple utility specifically for the purpose of mirroring one S3 bucket to another in a highly concurrent, yet CPU and memory efficient manner.
It's on github under an Apache License here: https://github.com/cobbzilla/s3s3mirror
When you have a very large bucket and are looking for maximum performance, it might be worth trying.
If you decide to give it a try please let me know if you have any feedback.
Solution 9 - Amazon S3
If you are in a shell and want to copy multiple files, but not all of them:

s3cmd cp --recursive s3://BUCKET1/OBJECT1 s3://BUCKET2[/OBJECT2]
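If you prefer doing the same from Python, here is a minimal boto3 sketch that copies only the objects under a given prefix, server-side (the bucket and prefix names are placeholders taken from the command above):

import boto3

s3 = boto3.resource('s3')
src = s3.Bucket('BUCKET1')
dst = s3.Bucket('BUCKET2')

# copy only the objects under a given prefix, without downloading them locally
for obj in src.objects.filter(Prefix='OBJECT1/'):
    dst.copy({'Bucket': src.name, 'Key': obj.key}, obj.key)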
Solution 10 - Amazon S3
I wrote a script that backs up an S3 bucket: https://github.com/roseperrone/aws-backup-rake-task
#!/usr/bin/env python
from boto.s3.connection import S3Connection
import re
import datetime
import sys
import time

def main():
    s3_ID = sys.argv[1]
    s3_key = sys.argv[2]
    src_bucket_name = sys.argv[3]
    num_backup_buckets = sys.argv[4]
    connection = S3Connection(s3_ID, s3_key)
    delete_oldest_backup_buckets(connection, num_backup_buckets)
    backup(connection, src_bucket_name)

def delete_oldest_backup_buckets(connection, num_backup_buckets):
    """Deletes the oldest backup buckets such that only the newest NUM_BACKUP_BUCKETS - 1 buckets remain."""
    buckets = connection.get_all_buckets()  # returns a list of bucket objects
    backup_bucket_names = []
    for bucket in buckets:
        if re.search('backup-' + r'\d{4}-\d{2}-\d{2}', bucket.name):
            backup_bucket_names.append(bucket.name)
    backup_bucket_names.sort(key=lambda x: datetime.datetime.strptime(x[len('backup-'):17], '%Y-%m-%d').date())

    # The buckets are sorted earliest to latest, so we want to keep the last NUM_BACKUP_BUCKETS - 1
    delete = len(backup_bucket_names) - (int(num_backup_buckets) - 1)
    if delete <= 0:
        return
    for i in range(0, delete):
        print 'Deleting the backup bucket, ' + backup_bucket_names[i]
        connection.delete_bucket(backup_bucket_names[i])

def backup(connection, src_bucket_name):
    now = datetime.datetime.now()
    # the month and day must be zero-filled
    new_backup_bucket_name = 'backup-' + ('%04d' % now.year) + '-' + ('%02d' % now.month) + '-' + ('%02d' % now.day)
    print "Creating new bucket " + new_backup_bucket_name
    connection.create_bucket(new_backup_bucket_name)
    copy_bucket(src_bucket_name, new_backup_bucket_name, connection)

def copy_bucket(src_bucket_name, dst_bucket_name, connection, maximum_keys=100):
    src_bucket = connection.get_bucket(src_bucket_name)
    dst_bucket = connection.get_bucket(dst_bucket_name)

    result_marker = ''
    while True:
        keys = src_bucket.get_all_keys(max_keys=maximum_keys, marker=result_marker)
        for k in keys:
            print 'Copying ' + k.key + ' from ' + src_bucket_name + ' to ' + dst_bucket_name
            t0 = time.clock()
            # server-side copy; the object data never leaves S3
            dst_bucket.copy_key(k.key, src_bucket_name, k.key)
            print time.clock() - t0, ' seconds'
        if len(keys) < maximum_keys:
            print 'Done backing up.'
            break
        result_marker = keys[maximum_keys - 1].key

if __name__ == '__main__':
    main()
I use this in a rake task (for a Rails app):
desc "Back up a file onto S3"
task :backup do
S3ID = "AKIAJM3NRWC7STXWUWVQ"
S3KEY = "0A5kuzV+E1dkaPjZxHQAezz1GlSddJd0iS5sNpry"
SRCBUCKET = "primary-mzgd"
NUM_BACKUP_BUCKETS = 2
Dir.chdir("#{Rails.root}/lib/tasks")
system "./do_backup.py #{S3ID} #{S3KEY} #{SRCBUCKET} #{NUM_BACKUP_BUCKETS}"
end
Solution 11 - Amazon S3
To copy from one S3 bucket to the same or another S3 bucket without downloading to local storage, it's pretty simple. Use the shell command below.

hdfs dfs -cp -f "s3://AccessKey:SecurityKey@ExternalBucket/SourceFoldername/*.*" "s3://AccessKey:SecurityKey@ExternalBucket/TargetFoldername"

This will copy all the files from the source bucket's SourceFoldername folder to the target bucket's TargetFoldername folder. In the above command, replace AccessKey, SecurityKey and ExternalBucket with your corresponding values.
Solution 12 - Amazon S3
From the AWS CLI (https://aws.amazon.com/cli/) you could do:

aws s3 ls - This will list all the S3 buckets

aws s3 cp --recursive s3://<source bucket> s3://<destination bucket> - This will copy the files from one bucket to another

Note: this is very useful when creating cross-region replication buckets; by doing the above, your files are all tracked and an update to the source-region file will be propagated to the replicated bucket. Everything but the file deletions is synced.
For CRR, make sure you have versioning enabled on the buckets.
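If you want to turn versioning on programmatically rather than in the console, a minimal boto3 sketch (bucket names are placeholders) would be:

import boto3

s3 = boto3.client('s3')

# versioning must be enabled on both buckets before setting up replication
for bucket in ('source-bucket', 'destination-bucket'):
    s3.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={'Status': 'Enabled'}
    )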
Solution 13 - Amazon S3
I hear there's a node module for that if you're into javascript :p
From the knox-copy docs:
knoxCopy = require 'knox-copy'

client = knoxCopy.createClient
  key: '<api-key-here>'
  secret: '<secret-here>'
  bucket: 'backups'

client.copyBucket
  fromBucket: 'uploads'
  fromPrefix: '/nom-nom'
  toPrefix: "/upload_backups/#{new Date().toISOString()}"
  (err, count) ->
    console.log "Copied #{count} files"
Solution 14 - Amazon S3
I was informed that you can also do this using s3distcp on an EMR cluster. It is supposed to be faster for data containing large files. It works well enough on small sets of data - but I would have preferred another solution given the learning curve it took to set up for so little data (I've never worked with EMR before).
Here's a link from the AWS Documentation: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/UsingEMR_s3distcp.html
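If you already have an EMR cluster, the copy can be submitted as a step along these lines (a sketch under assumed names: the cluster ID, region and bucket URIs are placeholders; s3-dist-cp ships with EMR):

import boto3

emr = boto3.client('emr', region_name='us-east-1')

# submit an s3-dist-cp step to an existing EMR cluster (the cluster ID is a placeholder)
emr.add_job_flow_steps(
    JobFlowId='j-XXXXXXXXXXXXX',
    Steps=[{
        'Name': 'Copy bucket contents',
        'ActionOnFailure': 'CONTINUE',
        'HadoopJarStep': {
            'Jar': 'command-runner.jar',
            'Args': ['s3-dist-cp', '--src=s3://source-bucket/', '--dest=s3://destination-bucket/']
        }
    }]
)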
Update: For the same data set, s3s3mirror was much faster than s3distcp or the AWS cli. Much easier to set up, too.
Solution 15 - Amazon S3
As Neel Bhaat has explained in this blog, there are many different tools that can be used for this purpose. Some are provided by AWS, while most are third-party tools. All of these tools require you to save your AWS account key and secret in the tool itself. Be very cautious when using third-party tools, as the credentials you save in them could be misused and cost you dearly.
Therefore, I always recommend using the AWS CLI for this purpose. You can simply install it from this link. Next, run the following command and save your key and secret values in the AWS CLI:
aws configure
Then use the following command to sync your AWS S3 bucket to your local machine (the local machine should have the AWS CLI installed):
aws s3 sync <source> <destination>
Examples:

- From AWS S3 to local storage: aws s3 sync <S3 bucket URI> <local directory path>

- From local storage to AWS S3: aws s3 sync <local directory path> <S3 bucket URI>

- From one AWS S3 bucket to another bucket: aws s3 sync <source S3 bucket URI> <destination S3 bucket URI>
Solution 16 - Amazon S3
How about the aws s3 sync CLI command?
aws s3 sync s3://bucket1/ s3://bucket2/
Solution 17 - Amazon S3
The best way to copy an S3 bucket is using the AWS CLI.

It involves these 3 steps:

- Install the AWS CLI on your server: https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html

- If you are copying buckets between two AWS accounts, you need to attach the correct policy to each bucket (see the sketch after these steps).

- After this, use this command to copy from one bucket to another:

aws s3 sync s3://sourcebucket s3://destinationbucket

The details of step 2 and step 3 are given in this link: https://aws.amazon.com/premiumsupport/knowledge-center/account-transfer-s3/
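For step 2, a rough sketch of granting the destination account read access to the source bucket with boto3 (the account ID and bucket name are placeholders I've assumed; the linked article has the exact policies to use):

import boto3
import json

# bucket policy on the source bucket allowing the destination account to list and read it
# (111122223333 and sourcebucket are placeholders)
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowDestinationAccountRead",
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111122223333:root"},
        "Action": ["s3:ListBucket", "s3:GetObject"],
        "Resource": ["arn:aws:s3:::sourcebucket", "arn:aws:s3:::sourcebucket/*"]
    }]
}

boto3.client('s3').put_bucket_policy(Bucket='sourcebucket', Policy=json.dumps(policy))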
Solution 18 - Amazon S3
You can write a Java app, maybe even a GUI Swing app, that uses the AWS Java APIs to copy objects.
Solution 19 - Amazon S3
Adding Copying objects across AWS accounts using S3 Batch Operations, because it hasn't been mentioned here yet. This is the method I'm currently trying out, because I have about 1 million objects I need to move to a new account, and cp and sync don't work for me due to the expiration of some token; I don't have a way to figure out which token it is, as my general access token is working just fine.
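If you go this route, a rough boto3 sketch of creating such a copy job with the S3 Control API (all account IDs, ARNs, the manifest object and the IAM role are placeholders; the exact setup is described in the AWS documentation for S3 Batch Operations):

import boto3

s3control = boto3.client('s3control', region_name='us-east-1')

# submit a Batch Operations job that copies every object listed in a CSV manifest
# into the destination bucket (all IDs and ARNs below are placeholders)
response = s3control.create_job(
    AccountId='111122223333',
    ConfirmationRequired=False,
    Priority=10,
    RoleArn='arn:aws:iam::111122223333:role/batch-operations-copy-role',
    Operation={
        'S3PutObjectCopy': {
            'TargetResource': 'arn:aws:s3:::destination-bucket'
        }
    },
    Manifest={
        'Spec': {
            'Format': 'S3BatchOperations_CSV_20180820',
            'Fields': ['Bucket', 'Key']
        },
        'Location': {
            'ObjectArn': 'arn:aws:s3:::manifest-bucket/manifest.csv',
            'ETag': 'etag-of-the-manifest-object'
        }
    },
    Report={
        'Bucket': 'arn:aws:s3:::report-bucket',
        'Format': 'Report_CSV_20180820',
        'Enabled': True,
        'Prefix': 'batch-copy-reports',
        'ReportScope': 'AllTasks'
    }
)
print(response['JobId'])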
Solution 20 - Amazon S3
As of 2020, if you are using s3cmd you can copy a folder from bucket1 to bucket2 using the following command:

s3cmd cp --recursive s3://bucket1/folder_name/ s3://bucket2/folder_name/

--recursive is necessary to recursively copy everything in the folder. Also note that you have to specify "/" after the folder name, otherwise it will fail.