What is the difference between scan and query in dynamodb? When use scan / query?
Amazon Web-ServicesAmazon DynamodbAmazon Web-Services Problem Overview
A query operation as specified in DynamoDb documentation:
> A query operation searches only primary key attribute values and supports a subset of comparison operators on key attribute values to refine the search process.
and the scan operation:
> A scan operation scans the entire table. You can specify filters to apply to the results to refine the values returned to you, after the complete scan.
Which is best based on Performance and Cost Considerations.
Amazon Web-Services Solutions
Solution 1 - Amazon Web-Services
When creating a Dynamodb table select Primary Keys and Local Secondary Indexes (LSIs) so that a Query operation returns the items you want.
Query operations only support an equal operator evaluation of the Primary Key, but conditional (=, <, <=, >, >=, Between, Begin) on the Sort Key.
Scan operations are generally slower and more expensive as the operation has to iterate through each item in your table to get the items you are requesting.
Example:
Table: CustomerId, AccountType, Country, LastPurchase
Primary Key: CustomerId + AccountType
In this example, you can use a Query operation to get:
- A CustomerId with a conditional filter on AccountType
A Scan operation would need to be used to return:
- All Customers with a specific AccountType
- Items based on conditional filters by Country, ie All Customers from USA
- Items based on conditional filters by LastPurchase, ie All Customers that made a purchase in the last month
To avoid scan operations on frequently used operations create a Local Secondary Index (LSI) or Global Secondary Index (GSI).
Example:
Table: CustomerId, AccountType, Country, LastPurchase
Primary Key: CustomerId + AccountType
GSI: AccountType + CustomerId
LSI: CustomerId + LastPurchase
In this example a Query operation can allow you to get:
- A CustomerId with a conditional filter on AccountType
- [GSI] A conditional filter on CustomerIds for a specific AccountType
- [LSI] A CustomerId with a conditional filter on LastPurchase
Solution 2 - Amazon Web-Services
You are having dynamodb table partition key/primary key as customer_country
. If you use query, customer_country
is the mandatory field to make query operation. All the filters can be made only items that belongs to customer_country
.
If you perform table scan the filter will be performed on all partition key/primary key. First it fetched all data and apply filter after fetching from table.
eg:
here customer_country
is the partition key/primary key
and id
is the sort_key
-----------------------------------
customer_country | name | id
-----------------------------------
VV | Tom | 1
VV | Jack | 2
VV | Mary | 4
BB | Nancy | 5
BB | Lom | 6
BB | XX | 7
CC | YY | 8
CC | ZZ | 9
------------------------------------
-
If you perform query operation it applies only on
customer_country
value. The value should only be equal operator (=). -
So only items equal to that partition key/primary key value are fetched.
-
If you perform scan operation it fetches all items in that table and filter out data after it takes that data.
Note: Don't perform scan operation it exceeds your RCU.
Solution 3 - Amazon Web-Services
Its similar as in the relational database.
Get query
you are using a primary key in where
condition, The computation complexity is log(n)
as the most of key structure is binary tree.
while scan
query you have to scan whole table then apply filter on every single row
to find the right result. The performance is O(n)
. Its much slower if your table is big.
In short, Try to use query
if you know primary key. only scan
for only the worst case.
Also, think about the global secondary index to support a different kind of queries on different keys to gain performance objective
Solution 4 - Amazon Web-Services
In terms of performance, I think it's good practice to design your table for applications to use Query
instead of Scan
. Because a scan operation always scan the entire table before it filters out the desired values, which means it takes more time and space to process data operations such as read, write and delete. For more information, please refer to the official document
Solution 5 - Amazon Web-Services
Query is much better than Scan - performence wise. scan, as it's name imply, will scan the whole table. But you must be well aware of the table key, sort key, indexes and and related sort indexes in order to know that you can use the Query. if you filter your query using:
- key
- key & key sort
- index
- index and it's related sort key
use Query! otherwise use scan which is more flexible about which columns you can filter.
you can NOT Query if:
- more that 2 fields in the filter (e.g. key, sort and index)
- sort key only (of primary key or index)
- regular fields (not key, index or sort)
- mixed index and sort (index1 with sort of index2)\
- ...
a good explaination: https://medium.com/@amos.shahar/dynamodb-query-vs-scan-sql-syntax-and-join-tables-part-1-371288a7cb8f