Export data from DynamoDB
Amazon Web-Services Problem Overview
Is it possible to export data from a DynamoDB table in some format?
The concrete use case is that I want to export data from my production DynamoDB database and import it into my local DynamoDB instance, so that my application can work with a local copy of the data instead of production data.
I use link as a local instance of DynamoDB.
Amazon Web-Services Solutions
Solution 1 - Amazon Web-Services
This exports all items as JSON documents:
aws dynamodb scan --table-name TABLE_NAME > export.json
This script reads the full table from the remote DynamoDB and imports it into the local instance, 25 items at a time.
TABLE=YOURTABLE
maxItems=25
index=0
DATA=$(aws dynamodb scan --table-name "$TABLE" --max-items $maxItems)
((index+=1))
echo "$DATA" | jq ".Items | {\"$TABLE\": [{\"PutRequest\": { \"Item\": .[]}}]}" > inserts.jsons
aws dynamodb batch-write-item --request-items file://inserts.jsons --endpoint-url http://localhost:8000
# -r prints the raw token without JSON quotes; "// empty" prints nothing when NextToken is absent
nextToken=$(echo "$DATA" | jq -r '.NextToken // empty')
while [[ -n "${nextToken}" ]]
do
  DATA=$(aws dynamodb scan --table-name "$TABLE" --max-items $maxItems --starting-token "$nextToken")
  ((index+=1))
  echo "$DATA" | jq ".Items | {\"$TABLE\": [{\"PutRequest\": { \"Item\": .[]}}]}" > inserts.jsons
  aws dynamodb batch-write-item --request-items file://inserts.jsons --endpoint-url http://localhost:8000
  nextToken=$(echo "$DATA" | jq -r '.NextToken // empty')
done
Here is a version of the script that uses files to keep the exported data on disk.
TABLE=YOURTABLE
maxItems=25
index=0
DATA=$(aws dynamodb scan --table-name "$TABLE" --max-items $maxItems)
((index+=1))
echo "$DATA" > "$TABLE-$index.json"
# -r prints the raw token without JSON quotes; "// empty" prints nothing when NextToken is absent
nextToken=$(echo "$DATA" | jq -r '.NextToken // empty')
while [[ -n "${nextToken}" ]]
do
  DATA=$(aws dynamodb scan --table-name "$TABLE" --max-items $maxItems --starting-token "$nextToken")
  ((index+=1))
  echo "$DATA" > "$TABLE-$index.json"
  nextToken=$(echo "$DATA" | jq -r '.NextToken // empty')
done
for x in "$TABLE"-*.json; do
  jq ".Items | {\"$TABLE\": [{\"PutRequest\": { \"Item\": .[]}}]}" "$x" > inserts.jsons
  aws dynamodb batch-write-item --request-items file://inserts.jsons --endpoint-url http://localhost:8000
done
Solution 2 - Amazon Web-Services
There is a tool named DynamoDBtoCSV that can be used to export all the data to a CSV file. For the reverse direction, however, you will have to build your own tool. My suggestion is to add this functionality to the tool and contribute it to the Git repository.
Another way is to use AWS Data Pipeline for this task (you will save all the costs of reading the data from outside the AWS infrastructure). The approach is similar:
- Build the pipeline for output
- Download the file.
- Parse it with a custom reader.
Solution 3 - Amazon Web-Services
Here is a way to export some data (often we just want a sample of our prod data locally) from a table using the AWS CLI and jq.
Let's assume we have a prod table called, unsurprisingly, my-prod-table and a local table called my-local-table.
To export the data, run the following:
aws dynamodb scan --table-name my-prod-table \
| jq '{"my-local-table": [.Items[] | {PutRequest: {Item: .}}]}' > data.json
Basically, we scan our prod table, reshape the scan output into the format that batchWriteItem expects, and dump the result into a file.
To import the data into your local table, run:
aws dynamodb batch-write-item \
--request-items file://data.json \
--endpoint-url http://localhost:8000
Note: There are some restrictions on the batch-write-item request. The BatchWriteItem operation can contain up to 25 individual PutItem and DeleteItem requests and can write up to 16 MB of data. (The maximum size of an individual item is 400 KB.)
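Because of the 25-request limit, a scan result with more items has to be split before it is handed to batch-write-item. Here is a minimal Python sketch of that split, assuming `items` is the `Items` array from the scan output. The function name `to_batches` is made up for illustration and is not part of any AWS library.

```python
def to_batches(table_name, items, batch_size=25):
    """Split scan items into BatchWriteItem request bodies of at most batch_size puts."""
    if not 1 <= batch_size <= 25:
        raise ValueError("BatchWriteItem accepts between 1 and 25 requests per call")
    batches = []
    for start in range(0, len(items), batch_size):
        chunk = items[start:start + batch_size]
        # Same shape as the jq output above: {"<table>": [{"PutRequest": {"Item": ...}}]}
        batches.append({table_name: [{"PutRequest": {"Item": item}} for item in chunk]})
    return batches
```

Each returned dictionary can be written to its own JSON file and passed to `aws dynamodb batch-write-item --request-items file://...`. The sketch only handles the 25-request limit; it does not check the 16 MB or 400 KB size limits.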
Solution 4 - Amazon Web-Services
Export it from the DynamoDB interface to S3.
Then convert it to JSON using sed:
sed -e 's/$/}/' -e $'s/\x02/,"/g' -e $'s/\x03/":/g' -e 's/^/{"/' <exported_table> > <exported_table>.json
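If sed is not available, the same four substitutions can be sketched in Python. This assumes the layout the sed command implies: each line holds attribute-name/value pairs, with \x03 between a name and its value and \x02 between pairs. The function names and the sample line in the usage note are illustrative assumptions, not taken from the export documentation.

```python
import json

def legacy_line_to_json(line):
    """Apply the same substitutions as the sed command to one exported line."""
    body = line.rstrip("\n").replace("\x02", ',"').replace("\x03", '":')
    return '{"' + body + "}"

def convert_lines(lines):
    """Convert every non-empty line and parse it, so malformed lines fail loudly."""
    return [json.loads(legacy_line_to_json(line)) for line in lines if line.strip()]
```

For example, the line `id\x03{"s":"1"}\x02name\x03{"s":"a"}` becomes `{"id":{"s":"1"},"name":{"s":"a"}}`.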
Solution 5 - Amazon Web-Services
This extends Valy dia's solution so that the whole export process needs only aws-cli and jq:
aws dynamodb scan --max-items 3 --table-name <TABLE_NAME> \
| jq '{"<TABLE_NAME>": [.Items[] | {PutRequest: {Item: .}}]}' > data.json
aws dynamodb describe-table --table-name <TABLE_NAME> | jq ' .Table | {"TableName": .TableName, "KeySchema": .KeySchema, "AttributeDefinitions": .AttributeDefinitions, "ProvisionedThroughput": {
"ReadCapacityUnits": 5,
"WriteCapacityUnits": 5
}}' > table-definition.json
aws dynamodb create-table --cli-input-json file://table-definition.json --endpoint-url http://localhost:8000 --region us-east-1
aws dynamodb batch-write-item --request-items file://data.json --endpoint-url http://localhost:8000
aws dynamodb scan --table-name <TABLE_NAME> --endpoint-url http://localhost:8000
Solution 6 - Amazon Web-Services
I think my answer is most similar to Ivailo Bardarov's. If you plan to run this from a Linux instance, do the following:
1. Log in to your AWS account and go to IAM to create a user with a limited policy (for security purposes, of course). It should only be allowed to read the DynamoDB table that you would like to back up.
2. Copy the access key and secret, then update the command below and run it on Linux (but make sure your table is not so large that the export fills up the disk of the machine you are running this on):
AWS_ACCESS_KEY_ID='put_your_key' AWS_SECRET_ACCESS_KEY='put_your_secret' aws --region='put_your_region' dynamodb scan --table-name 'your_table_name'>> export_$(date "+%F-%T").json
Note: a similar command can be run on Windows/PowerShell, but I have not tested it, so I'm not adding it here.
Solution 7 - Amazon Web-Services
Try my simple node.js script dynamo-archive. It exports and imports in JSON format.
Solution 8 - Amazon Web-Services
I found the best current tool for simple import/exports (including round-tripping through DynamoDB Local) is this Python script:
https://github.com/bchew/dynamodump
This script supports schema export/import as well as data import/export. It also uses the batch APIs for efficient operations.
I have used it successfully to take data from a DynamoDB table to DynamoDB local for development purposes and it worked pretty well for my needs.
Solution 9 - Amazon Web-Services
Expanding on @Ivailo Bardarov's answer, I wrote the following script to duplicate tables from a remote DynamoDB to a local one:
#!/bin/bash
declare -a arr=("table1" "table2" "table3" "table4")
for i in "${arr[@]}"
do
TABLE=$i
maxItems=25
index=0
echo "Getting table description of $TABLE from remote database..."
aws dynamodb describe-table --table-name $TABLE > table-description.json
echo
echo "Creating table $TABLE in the local database..."
ATTRIBUTE_DEFINITIONS=$(jq .Table.AttributeDefinitions table-description.json)
KEY_SCHEMA=$(jq .Table.KeySchema table-description.json)
BILLING_MODE=$(jq .Table.BillingModeSummary.BillingMode table-description.json)
READ_CAPACITY_UNITS=$(jq .Table.ProvisionedThroughput.ReadCapacityUnits table-description.json)
WRITE_CAPACITY_UNITS=$(jq .Table.ProvisionedThroughput.WriteCapacityUnits table-description.json)
TABLE_DEFINITION=""
if [[ "$READ_CAPACITY_UNITS" -gt 0 && "$WRITE_CAPACITY_UNITS" -gt 0 ]]
then
TABLE_DEFINITION="{\"AttributeDefinitions\":$ATTRIBUTE_DEFINITIONS,\"TableName\":\"$TABLE\",\"KeySchema\":$KEY_SCHEMA,\"ProvisionedThroughput\":{\"ReadCapacityUnits\":$READ_CAPACITY_UNITS,\"WriteCapacityUnits\":$WRITE_CAPACITY_UNITS}}"
else
TABLE_DEFINITION="{\"AttributeDefinitions\":$ATTRIBUTE_DEFINITIONS,\"TableName\":\"$TABLE\",\"KeySchema\":$KEY_SCHEMA,\"BillingMode\":$BILLING_MODE}"
fi
echo $TABLE_DEFINITION > create-table.json
aws dynamodb create-table --cli-input-json file://create-table.json --endpoint-url http://localhost:8000
echo "Querying table $TABLE from remote..."
DATA=$(aws dynamodb scan --table-name "$TABLE" --max-items $maxItems)
((index+=1))
echo "Saving remote table [$TABLE] contents to inserts.json file..."
echo "$DATA" | jq ".Items | {\"$TABLE\": [{\"PutRequest\": { \"Item\": .[]}}]}" > inserts.json
echo "Inserting rows to $TABLE in local database..."
aws dynamodb batch-write-item --request-items file://inserts.json --endpoint-url http://localhost:8000
# -r prints the raw token without the surrounding JSON quotes
nextToken=$(echo "$DATA" | jq -r '.NextToken')
while [[ "$nextToken" != "" && "$nextToken" != "null" ]]
do
echo "Querying table $TABLE from remote..."
DATA=$(aws dynamodb scan --table-name "$TABLE" --max-items $maxItems --starting-token "$nextToken")
((index+=1))
echo "Saving remote table [$TABLE] contents to inserts.json file..."
echo "$DATA" | jq ".Items | {\"$TABLE\": [{\"PutRequest\": { \"Item\": .[]}}]}" > inserts.json
echo "Inserting rows to $TABLE in local database..."
aws dynamodb batch-write-item --request-items file://inserts.json --endpoint-url http://localhost:8000
nextToken=$(echo "$DATA" | jq -r '.NextToken')
done
done
echo "Deleting temporary files..."
rm -f table-description.json
rm -f create-table.json
rm -f inserts.json
echo "Database sync complete!"
This script loops over the string array. For each table name, it first gets the description of the table, builds a create-table JSON file with the minimum required parameters, and creates the table. Then it uses the rest of @Ivailo Bardarov's logic to generate inserts and push them to the created table. Finally, it cleans up the generated JSON files.
Keep in mind that my purpose was just to create a rough duplicate of the tables (hence the minimum required parameters) for development purposes.
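The table-definition step can also be sketched in Python, which avoids building JSON by string concatenation. This is a rough sketch under assumptions: `minimal_table_definition` is a hypothetical helper name, and unlike the script above it keeps only the attribute definitions that the key schema uses, because secondary indexes are not copied.

```python
def minimal_table_definition(description):
    """Build create-table input from describe-table output, keeping only required fields."""
    table = description["Table"]
    key_names = {key["AttributeName"] for key in table["KeySchema"]}
    definition = {
        "TableName": table["TableName"],
        "KeySchema": table["KeySchema"],
        # Indexes are dropped, so keep only attributes that the key schema refers to;
        # DynamoDB rejects definitions for attributes that no key or index uses.
        "AttributeDefinitions": [
            attr for attr in table["AttributeDefinitions"]
            if attr["AttributeName"] in key_names
        ],
    }
    throughput = table.get("ProvisionedThroughput") or {}
    read_units = throughput.get("ReadCapacityUnits") or 0
    write_units = throughput.get("WriteCapacityUnits") or 0
    if read_units > 0 and write_units > 0:
        definition["ProvisionedThroughput"] = {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        }
    else:
        # On-demand tables report zero capacity units in describe-table output.
        definition["BillingMode"] = "PAY_PER_REQUEST"
    return definition
```

The result can be dumped to a file with `json.dump` and passed to `aws dynamodb create-table --cli-input-json file://create-table.json --endpoint-url http://localhost:8000`.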
Solution 10 - Amazon Web-Services
For those of you who would rather do this using Java, there is DynamodbToCSV4j.
JSONObject config = new JSONObject();
config.put("accessKeyId","REPLACE");
config.put("secretAccessKey","REPLACE");
config.put("region","eu-west-1");
config.put("tableName","testtable");
d2csv d = new d2csv(config);
Solution 11 - Amazon Web-Services
I have created a utility class to help developers with export. It can be used if you don't want to use the Data Pipeline feature of AWS. The link to the GitHub repo is here.
Solution 12 - Amazon Web-Services
DynamoDB now has a native Export to S3 feature (in JSON and Amazon Ion formats) https://aws.amazon.com/blogs/aws/new-export-amazon-dynamodb-table-data-to-data-lake-amazon-s3/
Solution 13 - Amazon Web-Services
You can try this code locally. First, run the following command: npm init -y && npm install aws-sdk
const AWS = require('aws-sdk');
AWS.config.update({region:'eu-central-1'});
const fs = require('fs');
const TABLE_NAME = "YOURTABLENAME"
const docClient = new AWS.DynamoDB.DocumentClient({
"sslEnabled": false,
"paramValidation": false,
"convertResponseTypes": false,
"convertEmptyValues": true
});
async function exportDB(){
let params = {
TableName: TABLE_NAME
};
let result = [];
let items;
do {
items = await docClient.scan(params).promise();
items.Items.forEach((item) => result.push(item));
params.ExclusiveStartKey = items.LastEvaluatedKey;
} while(typeof items.LastEvaluatedKey != "undefined");
fs.writeFileSync("exported_data.json", JSON.stringify(result, null, 4));
console.info("Available count size:", result.length);
}
exportDB();
Then run node index.js
I hope it works for you.
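The same pagination pattern (repeat the scan with ExclusiveStartKey until LastEvaluatedKey is missing) can be sketched in Python. To keep the sketch runnable without AWS credentials, the scan call is passed in as a function. `scan_all` and `scan_page` are names made up for this example; with boto3 you could pass `boto3.client("dynamodb").scan`, which returns DynamoDB's typed JSON rather than plain values.

```python
def scan_all(scan_page, **params):
    """Collect every item by following LastEvaluatedKey until it is absent."""
    items = []
    while True:
        page = scan_page(**params)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items
        # Resume the next scan where the previous page stopped.
        params["ExclusiveStartKey"] = last_key
```

Usage would look like `scan_all(client.scan, TableName="YOURTABLENAME")`. Like the script above, this holds the whole table in memory, so it only suits tables of modest size.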
Solution 14 - Amazon Web-Services
Export the DynamoDB data to a JSON file on your local machine using the AWS CLI. Below is an example with a filter. Status is a DynamoDB reserved word, so the projection refers to it through the #st placeholder:
aws dynamodb scan --table-name activities --filter-expression "Flag = :val" --expression-attribute-values "{\":val\": {\"S\": \"F\"}}" --select "SPECIFIC_ATTRIBUTES" --projection-expression "#st" --expression-attribute-names "{\"#st\": \"Status\"}" > activitiesRecords.json
Solution 15 - Amazon Web-Services
DynamoDB now provides a way to export and import data to/from S3: http://aws.amazon.com/about-aws/whats-new/2014/03/06/announcing-dynamodb-cross-region-export-import/
Solution 16 - Amazon Web-Services
If you need to, you can convert DynamoDB data into JSON with this: https://2json.net/dynamo
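If you would rather not paste data into a website, the conversion from DynamoDB's typed JSON (for example {"S": "abc"} or {"N": "42"}) to plain JSON is small enough to sketch in Python. The function names below are illustrative. The sketch assumes binary values (B and BS) can stay as the base64 strings that appear in the export.

```python
def _number(text):
    """DynamoDB sends numbers as strings; prefer int, fall back to float."""
    try:
        return int(text)
    except ValueError:
        return float(text)

def from_dynamo(value):
    """Convert one typed attribute value, e.g. {"S": "a"}, into a plain Python value."""
    (type_tag, data), = value.items()
    if type_tag in ("S", "BOOL", "B"):
        return data
    if type_tag == "N":
        return _number(data)
    if type_tag == "NULL":
        return None
    if type_tag == "L":
        return [from_dynamo(element) for element in data]
    if type_tag == "M":
        return {name: from_dynamo(element) for name, element in data.items()}
    if type_tag in ("SS", "BS"):
        return list(data)
    if type_tag == "NS":
        return [_number(element) for element in data]
    raise ValueError(f"Unknown DynamoDB type tag: {type_tag}")

def item_to_plain(item):
    """Convert a whole item (attribute name -> typed value) into a plain dict."""
    return {name: from_dynamo(value) for name, value in item.items()}
```

Applying `item_to_plain` to each entry of the `Items` array from a scan and then calling `json.dumps` gives ordinary JSON. Very large or very precise numbers may lose precision when they fall back to float.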
Solution 17 - Amazon Web-Services
In a similar use-case, I have used DynamoDB Streams to trigger AWS Lambda which basically wrote to my DW instance. You could probably write your Lambda to write each of the table changes to a table in your non-production account. This way your Devo table would remain quite close to Prod as well.
Solution 18 - Amazon Web-Services
I used the awesome cyberchef site... https://gchq.github.io/CyberChef
With the CSV to JSON tool.
Solution 19 - Amazon Web-Services
For really big datasets, running a continuous (and parallel) scan can be a time-consuming and fragile process (imagine it dying in the middle). Fortunately, AWS recently added the ability to export your DynamoDB table data straight to S3. This is probably the easiest way to achieve what you want, because it is fully managed and does not require you to write any code or run any task or script.
After it's done, you can download the export from S3 and import it into the local DynamoDB instance using logic like foreach record in file: documentClient.putItem, or use some other tooling.
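The "foreach record in file" step might look like the following Python sketch. It assumes the export was made in the DynamoDB JSON format, where each data file is gzip-compressed and holds one JSON object per line with the record under an "Item" key. The function names are made up for illustration.

```python
import gzip
import json

def read_export_items(lines):
    """Yield the Item of each non-empty line of an export data file."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)["Item"]

def read_export_file(path):
    """Open one downloaded .json.gz data file and yield its items."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        yield from read_export_items(handle)
```

Each yielded item is still in DynamoDB's typed JSON, so it can be wrapped in a PutRequest and sent to the local instance with batch-write-item, at most 25 items per request.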
Solution 20 - Amazon Web-Services
In the DynamoDB web console, select your table, then Actions -> Export/Import.