What is the difference between scan and query in dynamodb? When use scan / query?

Amazon Web-ServicesAmazon Dynamodb

Amazon Web-Services Problem Overview


A query operation as specified in DynamoDb documentation:

> A query operation searches only primary key attribute values and supports a subset of comparison operators on key attribute values to refine the search process.

and the scan operation:

> A scan operation scans the entire table. You can specify filters to apply to the results to refine the values returned to you, after the complete scan.

Which is best based on Performance and Cost Considerations.

Amazon Web-Services Solutions


Solution 1 - Amazon Web-Services

When creating a Dynamodb table select Primary Keys and Local Secondary Indexes (LSIs) so that a Query operation returns the items you want.

Query operations only support an equal operator evaluation of the Primary Key, but conditional (=, <, <=, >, >=, Between, Begin) on the Sort Key.

Scan operations are generally slower and more expensive as the operation has to iterate through each item in your table to get the items you are requesting.

Example:

Table: CustomerId, AccountType, Country, LastPurchase

Primary Key: CustomerId + AccountType

In this example, you can use a Query operation to get:

  1. A CustomerId with a conditional filter on AccountType

A Scan operation would need to be used to return:

  1. All Customers with a specific AccountType
  2. Items based on conditional filters by Country, ie All Customers from USA
  3. Items based on conditional filters by LastPurchase, ie All Customers that made a purchase in the last month

To avoid scan operations on frequently used operations create a Local Secondary Index (LSI) or Global Secondary Index (GSI).

Example:

Table: CustomerId, AccountType, Country, LastPurchase

Primary Key: CustomerId + AccountType
GSI: AccountType + CustomerId
LSI: CustomerId + LastPurchase

In this example a Query operation can allow you to get:

  1. A CustomerId with a conditional filter on AccountType
  2. [GSI] A conditional filter on CustomerIds for a specific AccountType
  3. [LSI] A CustomerId with a conditional filter on LastPurchase

Solution 2 - Amazon Web-Services

You are having dynamodb table partition key/primary key as customer_country. If you use query, customer_country is the mandatory field to make query operation. All the filters can be made only items that belongs to customer_country.

If you perform table scan the filter will be performed on all partition key/primary key. First it fetched all data and apply filter after fetching from table.

eg:

here customer_country is the partition key/primary key and id is the sort_key

-----------------------------------

customer_country | name   | id

-----------------------------------
VV               | Tom    | 1

VV               | Jack   | 2

VV               | Mary   | 4

BB               | Nancy  | 5

BB               | Lom    | 6

BB               | XX     | 7

CC               | YY     | 8

CC               | ZZ     | 9

------------------------------------
  • If you perform query operation it applies only on customer_country value. The value should only be equal operator (=).

  • So only items equal to that partition key/primary key value are fetched.

  • If you perform scan operation it fetches all items in that table and filter out data after it takes that data.

Note: Don't perform scan operation it exceeds your RCU.

Solution 3 - Amazon Web-Services

Its similar as in the relational database.

Get query you are using a primary key in where condition, The computation complexity is log(n) as the most of key structure is binary tree.

while scan query you have to scan whole table then apply filter on every single row to find the right result. The performance is O(n). Its much slower if your table is big.

In short, Try to use query if you know primary key. only scan for only the worst case.

Also, think about the global secondary index to support a different kind of queries on different keys to gain performance objective

Solution 4 - Amazon Web-Services

In terms of performance, I think it's good practice to design your table for applications to use Query instead of Scan. Because a scan operation always scan the entire table before it filters out the desired values, which means it takes more time and space to process data operations such as read, write and delete. For more information, please refer to the official document

Solution 5 - Amazon Web-Services

Query is much better than Scan - performence wise. scan, as it's name imply, will scan the whole table. But you must be well aware of the table key, sort key, indexes and and related sort indexes in order to know that you can use the Query. if you filter your query using:

  • key
  • key & key sort
  • index
  • index and it's related sort key

use Query! otherwise use scan which is more flexible about which columns you can filter.

you can NOT Query if:

  • more that 2 fields in the filter (e.g. key, sort and index)
  • sort key only (of primary key or index)
  • regular fields (not key, index or sort)
  • mixed index and sort (index1 with sort of index2)\
  • ...

a good explaination: https://medium.com/@amos.shahar/dynamodb-query-vs-scan-sql-syntax-and-join-tables-part-1-371288a7cb8f

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionsamsonView Question on Stackoverflow
Solution 1 - Amazon Web-ServicesKinmanView Answer on Stackoverflow
Solution 2 - Amazon Web-ServicesOK200View Answer on Stackoverflow
Solution 3 - Amazon Web-ServicesJoey TrangView Answer on Stackoverflow
Solution 4 - Amazon Web-ServicesLiutong ChenView Answer on Stackoverflow
Solution 5 - Amazon Web-ServicesOpen VoipView Answer on Stackoverflow