Faster S3 bucket duplication
Problem Overview
I have been trying to find a command-line tool that duplicates buckets faster than s3cmd. s3cmd can duplicate buckets without having to download and upload each file. The command I normally run to duplicate buckets with s3cmd is:
s3cmd cp -r --acl-public s3://bucket1 s3://bucket2
This works, but it is very slow because it copies each file one at a time via the API. If s3cmd could run in parallel mode, I'd be very happy.
Are there other command-line tools or libraries that people use to duplicate buckets faster than s3cmd?
Edit: s3cmd-modification looks like exactly what I'm looking for. Too bad it does not work. Are there any other options?
Solutions
Solution 1
AWS CLI seems to do the job perfectly, and has the bonus of being an officially supported tool.
aws s3 sync s3://mybucket s3://backup-mybucket
http://docs.aws.amazon.com/cli/latest/reference/s3/sync.html
Supports concurrent transfers by default. See http://docs.aws.amazon.com/cli/latest/topic/s3-config.html#max-concurrent-requests
To transfer a huge number of small files quickly, run the command from an EC2 instance to decrease latency, and increase max_concurrent_requests to reduce the impact of that latency. E.g.:
aws configure set default.s3.max_concurrent_requests 200
Solution 2
If you don't mind using the AWS console, you can:
- Select all of the files/folders in the first bucket
- Click Actions > Copy
- Create a new bucket and select it
- Click Actions > Paste
It's still fairly slow, but you can leave it alone and let it do its thing.
Solution 3
I have tried cloning buckets using the AWS web console, s3cmd, and the AWS CLI. Although these methods work most of the time, they are painfully slow.
Then I found s3s3mirror: a specialized tool for syncing two S3 buckets. It's multi-threaded and a lot faster than the other approaches I tried. I quickly moved gigabytes of data from one AWS region to another.
Check it out at https://github.com/cobbzilla/s3s3mirror, or download a Docker container from https://registry.hub.docker.com/u/pmoust/s3s3mirror/
Solution 4
For an ad hoc solution, use the AWS CLI to sync between buckets:
aws s3 sync
Sync speed depends on:
- latency of API calls to the S3 endpoint
- number of API calls made concurrently
To increase sync speed:
- run aws s3 sync from an EC2 instance (c3.large on FreeBSD is OK ;-) )
- update ~/.aws/config with:
max_concurrent_requests = 128
max_queue_size = 8096
With this config and instance type, I was able to sync a bucket (309 GB, 72K files, us-east-1) in 474 seconds.
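For reference, these values go in the s3 subsection of a profile in ~/.aws/config. A minimal sketch (the [default] profile name is an assumption; use your own profile if you have one):

```ini
[default]
s3 =
    max_concurrent_requests = 128
    max_queue_size = 8096
```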
For a more general solution, consider AWS Data Pipeline or S3 cross-region replication.
Solution 5
As this is about the first Google hit on this subject, I'm adding extra information.
'Cyno' made a newer version of s3cmd-modification, which now supports parallel bucket-to-bucket syncing. Exactly what I was waiting for as well.
Pull request is at https://github.com/pcorliss/s3cmd-modification/pull/2, his version at https://github.com/pearltrees/s3cmd-modification
Solution 6
I don't know of any other S3 command-line tools, but if nothing comes up here, it might be easiest to write your own.
Pick whatever language and Amazon SDK/toolkit you prefer. Then you just need to list the source bucket's contents and copy each file (in parallel, obviously).
Looking at the source for s3cmd-modification (and I admit I know nothing about Python), it looks like they have not parallelised the bucket-to-bucket code, but perhaps you could use the standard upload/download parallel code as a starting point.
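A minimal sketch of that approach in Python with boto3 (bucket names, the thread count, and the function names are placeholders of mine, not from any existing tool): list the source keys, then issue server-side copies from a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor

def iter_keys(s3, bucket):
    """Yield every object key in `bucket`, following pagination
    (list_objects_v2 returns at most 1000 keys per page)."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            yield obj["Key"]

def mirror_bucket(src_bucket, dst_bucket, workers=32):
    """Copy every object from src_bucket to dst_bucket in parallel."""
    import boto3  # assumed installed; credentials come from the environment
    s3 = boto3.client("s3")

    def copy_one(key):
        # client.copy is a managed server-side copy (it switches to
        # multipart for objects over 5 GB); nothing is downloaded here
        s3.copy({"Bucket": src_bucket, "Key": key}, dst_bucket, key)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # drain the iterator so worker exceptions are raised, not swallowed
        for _ in pool.map(copy_one, list(iter_keys(s3, src_bucket))):
            pass

# Example with placeholder names:
# mirror_bucket("bucket1", "bucket2")
```

Because the copy happens inside S3, the machine running the script only pays for the API round-trips, which is why running it from EC2 and raising the worker count helps.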
Solution 7
A simple
aws s3 cp s3://[original-bucket] s3://[backup-bucket] --recursive
works well (assuming you have the AWS CLI set up).
Solution 8
Extending deadwards' answer: in 2021, copying objects from one bucket to another in the AWS console takes no more than 2 minutes for 1.2 GB of data.
- Create a new bucket: enter the bucket name, choose a region, and optionally copy settings from an existing bucket.
- Once the bucket is created, go to the source bucket you want to copy files from.
- Select all (or choose the desired files and folders), then Actions > Copy.
- At the destination, browse to the bucket the files and folders should be copied to.
- Click the Copy button, and all the files and folders are copied within a minute or two.
Solution 9
If you have AWS console access, use AWS CloudShell and run:
aws s3 sync s3://mybucket s3://backup-mybucket
There is no need to install the AWS CLI or any other tools (the command is taken from the top answer above). CloudShell makes sure your command keeps running smoothly even if you lose your connection, and it's faster too, since the transfer goes straight from AWS to AWS with no local machine in between.