Faster S3 bucket duplication

Tags: Amazon Web Services, Amazon S3

Amazon Web Services Problem Overview


I have been trying to find a command-line tool for duplicating buckets that is better than s3cmd. s3cmd can duplicate buckets without having to download and upload each file. The command I normally run to duplicate buckets using s3cmd is:

s3cmd cp -r --acl-public s3://bucket1 s3://bucket2

This works, but it is very slow as it copies each file via the API one at a time. If s3cmd could run in parallel mode, I'd be very happy.

Are there any other command-line tools or code that people use to duplicate buckets faster than s3cmd?

Edit: Looks like s3cmd-modification is exactly what I'm looking for. Too bad it does not work. Are there any other options?

Amazon Web Services Solutions


Solution 1 - Amazon Web Services

AWS CLI seems to do the job perfectly, and has the bonus of being an officially supported tool.

aws s3 sync s3://mybucket s3://backup-mybucket

http://docs.aws.amazon.com/cli/latest/reference/s3/sync.html

Supports concurrent transfers by default. See http://docs.aws.amazon.com/cli/latest/topic/s3-config.html#max-concurrent-requests

To quickly transfer a huge number of small files, run the command from an EC2 instance to decrease latency, and increase max_concurrent_requests to reduce the impact of that latency. For example:

aws configure set default.s3.max_concurrent_requests 200
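
Since the original question also wants the copies to be publicly readable, a hedged end-to-end sketch might look like the following; the bucket names are placeholders, and the --acl public-read flag is an assumption carried over from the question's --acl-public requirement:

# raise concurrency first, then do the server-side sync
aws configure set default.s3.max_concurrent_requests 200
aws s3 sync s3://mybucket s3://backup-mybucket --acl public-read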

Solution 2 - Amazon Web Services

If you don't mind using the AWS console, you can:

  1. Select all of the files/folders in the first bucket
  2. Click Actions > Copy
  3. Create a new bucket and select it
  4. Click Actions > Paste

It's still fairly slow, but you can leave it alone and let it do its thing.

Solution 3 - Amazon Web Services

I have tried cloning two buckets using the AWS web console, s3cmd, and the AWS CLI. Although these methods work most of the time, they are painfully slow.

Then I found s3s3mirror: a specialized tool for syncing two S3 buckets. It's multi-threaded and a lot faster than the other approaches I tried. I quickly moved gigabytes of data from one AWS region to another.

Check it out at https://github.com/cobbzilla/s3s3mirror, or download a Docker container from https://registry.hub.docker.com/u/pmoust/s3s3mirror/
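
For reference, here is a sketch of what building and running it looks like; the steps are from memory of the project's README (it is a Java/Maven project), so check the README for current options. Bucket names are placeholders:

git clone https://github.com/cobbzilla/s3s3mirror.git
cd s3s3mirror
mvn package
# the wrapper script runs the built jar; usage is s3s3mirror.sh [options] <source> <destination>
./s3s3mirror.sh source-bucket backup-mybucket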

Solution 4 - Amazon Web Services

For an ad hoc solution, use the AWS CLI to sync between buckets.

aws s3 sync speed depends on:

  • latency of an API call to the S3 endpoint
  • the number of API calls made concurrently

To increase sync speed:

  • run aws s3 sync from an EC2 instance (a c3.large on FreeBSD is OK ;-) )
  • update ~/.aws/config as shown below with:
    -- max_concurrent_requests = 128
    -- max_queue_size = 8096
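
The nested s3 settings live under a profile section in ~/.aws/config; a minimal sketch of that file, assuming the default profile:

[default]
s3 =
  max_concurrent_requests = 128
  max_queue_size = 8096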

With the above config and instance type, I was able to sync a bucket (309 GB, 72K files, us-east-1) within 474 seconds.

For a more generic solution, consider AWS Data Pipeline or S3 cross-region replication.
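
Note that cross-region replication only applies to objects written after it is enabled, requires versioning on both buckets, and needs an IAM role that S3 can assume. A hedged sketch of enabling it via the CLI, where the role ARN and bucket names are placeholders and the rule uses the older prefix-based schema:

# both buckets must already have versioning enabled
aws s3api put-bucket-replication --bucket mybucket --replication-configuration '{
  "Role": "arn:aws:iam::123456789012:role/replication-role",
  "Rules": [{"Prefix": "", "Status": "Enabled",
             "Destination": {"Bucket": "arn:aws:s3:::backup-mybucket"}}]
}'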

Solution 5 - Amazon Web Services

As this is just about Google's first hit on this subject, I'm adding some extra information.

'Cyno' made a newer version of s3cmd-modification, which now supports parallel bucket-to-bucket syncing. Exactly what I was waiting for as well.

The pull request is at https://github.com/pcorliss/s3cmd-modification/pull/2, and his version is at https://github.com/pearltrees/s3cmd-modification

Solution 6 - Amazon Web Services

I don't know of any other S3 command-line tools, but if nothing comes up here, it might be easiest to write your own.

Pick whatever language and Amazon SDK/toolkit you prefer. Then you just need to list the source bucket's contents and copy each file (in parallel, obviously), as in the sketch below.
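
A minimal sketch of that idea, using the AWS CLI's low-level s3api commands as a stand-in for an SDK and xargs for the parallelism; the bucket names are placeholders, and it assumes object keys contain no whitespace:

SRC=bucket1
DST=bucket2
# list every key in the source bucket (the CLI paginates automatically),
# then run 16 server-side copy requests at a time
aws s3api list-objects-v2 --bucket "$SRC" --query 'Contents[].Key' --output text |
  tr '\t' '\n' |
  xargs -P16 -I{} aws s3api copy-object --copy-source "$SRC/{}" --bucket "$DST" --key "{}"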

Looking at the source for s3cmd-modification (and I admit I know nothing about Python), it looks like they have not parallelised the bucket-to-bucket code, but perhaps you could use the standard upload/download parallel code as a starting point.

Solution 7 - Amazon Web Services

A simple aws s3 cp s3://[original-bucket] s3://[backup-bucket] --recursive works well (assuming you have the AWS CLI set up).

Solution 8 - Amazon Web Services

Extending deadwards' answer: in 2021, copying objects from one bucket to another takes no more than 2 minutes in the AWS console for 1.2 GB of data.

  1. Create the bucket: enter the bucket name, choose the region, and copy settings from the existing bucket. Create the bucket.
  2. Once the bucket is created, go to the source bucket you want to copy the files from.
  3. Select all (if needed; otherwise choose the desired files and folders), then Actions > Copy.
  4. For the destination, browse to the bucket the files and folders should be copied to.
  5. Once you click the Copy button, all the files and folders are copied within a minute or two.

Solution 9 - Amazon Web Services

If you have AWS console access, use AWS CloudShell and run the command below:

aws s3 sync s3://mybucket s3://backup-mybucket

There is no need to install the AWS CLI or any other tools.

The command is taken from the best answer above. CloudShell keeps your command running even if you lose your connection, and it's faster too, since the transfer is straight AWS-to-AWS with no local machine in between.

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | Sean McCleary | View Question on Stack Overflow
Solution 1 - Amazon Web Services | pythonjsgeo | View Answer on Stack Overflow
Solution 2 - Amazon Web Services | deadwards | View Answer on Stack Overflow
Solution 3 - Amazon Web Services | Ketil | View Answer on Stack Overflow
Solution 4 - Amazon Web Services | Tom Lime | View Answer on Stack Overflow
Solution 5 - Amazon Web Services | Jean-Pierre Deckers | View Answer on Stack Overflow
Solution 6 - Amazon Web Services | Geoff Appleford | View Answer on Stack Overflow
Solution 7 - Amazon Web Services | mdmjsh | View Answer on Stack Overflow
Solution 8 - Amazon Web Services | DroidDev | View Answer on Stack Overflow
Solution 9 - Amazon Web Services | Ashish Nair | View Answer on Stack Overflow