What is the difference between AWS S3 Select and AWS Athena?


Amazon Web-Services Problem Overview


I am trying to understand the difference between the AWS Athena service and the newly released S3 Select (still in preview).

How do their use cases differ? Both seem to help with selecting partial data from S3.

Amazon Web-Services Solutions


Solution 1 - Amazon Web-Services

It also looks like one major point is missing:

S3 Select operates on only one object, while Athena can run queries across multiple paths, including all files within each path.
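To make the single-object limit concrete, here is a minimal sketch of what an application has to do to "query a path" with S3 Select: list the keys, then send one request per object. It assumes a boto3 S3 client is passed in as `s3` and that the objects are CSV files with a header row; the helper names `select_from_object` and `select_across_prefix` are hypothetical, not part of any AWS API.

```python
def select_from_object(s3, bucket, key, expression):
    """Run one S3 Select request against a single CSV object; return matching bytes."""
    response = s3.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        Expression=expression,
        # Assumption: CSV input with a header row, CSV output.
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
        OutputSerialization={"CSV": {}},
    )
    # The response payload is a stream of events; only "Records" events carry data.
    return b"".join(
        event["Records"]["Payload"]
        for event in response["Payload"]
        if "Records" in event
    )


def select_across_prefix(s3, bucket, prefix, expression):
    """Emulate a multi-object query: list the keys, then issue one request per key."""
    results = {}
    list_kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        page = s3.list_objects_v2(**list_kwargs)
        for obj in page.get("Contents", []):
            key = obj["Key"]
            results[key] = select_from_object(s3, bucket, key, expression)
        if not page.get("IsTruncated"):
            return results
        # Keep listing until every key under the prefix has been visited.
        list_kwargs["ContinuationToken"] = page["NextContinuationToken"]
```

With a real client this would be called as something like `select_across_prefix(boto3.client("s3"), "my-bucket", "logs/", "SELECT * FROM S3Object s")`, where the bucket and prefix are placeholders. In Athena the same thing is a single query over a table whose location is that path.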

Solution 2 - Amazon Web-Services

You can think of AWS S3 Select as a cost-efficient storage optimization that retrieves only the data matching a predicate from S3 and Glacier, also known as push-down filtering.

AWS Athena is a fully managed analytical service that runs arbitrary ANSI SQL compliant queries: GROUP BY, HAVING, window and geo functions, SQL DDL and DML.
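As a rough illustration of push-down filtering, the sketch below sends the predicate to S3 and reads back only the matching rows, along with the scan statistics S3 reports. It assumes a boto3 S3 client is passed in as `s3` and a CSV object with a header row; the function name `select_with_stats` and the columns `name` and `amount` are made up for the example.

```python
# Hypothetical predicate: the column names are assumptions about the CSV header.
EXAMPLE_EXPRESSION = (
    "SELECT s.name, s.amount FROM S3Object s WHERE CAST(s.amount AS INT) > 100"
)


def select_with_stats(s3, bucket, key, expression=EXAMPLE_EXPRESSION):
    """Return (matching rows as text, stats dict) for one S3 Select request."""
    response = s3.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        Expression=expression,
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
        OutputSerialization={"CSV": {}},
    )
    chunks = []
    stats = {}
    for event in response["Payload"]:
        if "Records" in event:
            # Only rows that passed the WHERE clause are sent over the network.
            chunks.append(event["Records"]["Payload"])
        elif "Stats" in event:
            # Reports BytesScanned, BytesProcessed, and BytesReturned.
            stats = event["Stats"]["Details"]
    # Join before decoding so a multi-byte character split across chunks survives.
    return b"".join(chunks).decode("utf-8"), stats
```

Comparing `BytesReturned` with `BytesScanned` in the returned stats gives a feel for how much data the filter kept off the wire.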
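For comparison, here is a minimal sketch of running an aggregate query through Athena from code. It assumes a boto3 Athena client is passed in as `athena`; the function name `run_athena_query`, the table `access_logs`, and the example SQL are hypothetical, and the database and S3 output location are whatever you have set up.

```python
import time

# Hypothetical query: table and column names are assumptions for illustration.
EXAMPLE_SQL = (
    "SELECT status, COUNT(*) AS hits FROM access_logs "
    "GROUP BY status HAVING COUNT(*) > 10"
)


def run_athena_query(athena, sql, database, output_location, poll_seconds=1.0):
    """Start an Athena query, wait for it to finish, return rows as lists of strings."""
    execution_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    # Athena runs queries asynchronously, so poll until a terminal state is reached.
    while True:
        execution = athena.get_query_execution(QueryExecutionId=execution_id)
        status = execution["QueryExecution"]["Status"]
        if status["State"] in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if status["State"] != "SUCCEEDED":
        reason = status.get("StateChangeReason", "")
        raise RuntimeError(f"Athena query {status['State']}: {reason}")

    # Only the first page of results is read here; for SELECT queries the
    # first row holds the column names.
    result = athena.get_query_results(QueryExecutionId=execution_id)
    return [
        [cell.get("VarCharValue") for cell in row["Data"]]
        for row in result["ResultSet"]["Rows"]
    ]
```

Unlike S3 Select, the query here names a table rather than an object, so grouping, joins, and scans across many files are handled by the service.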

Solution 3 - Amazon Web-Services

Athena is (from the little I've used it) more intended as a business reporting or analysis tool backed by S3.

S3 Select appears to use the same sort of technology, but I would guess it's aimed more at direct use by applications to filter or shard their data sets.

Solution 4 - Amazon Web-Services

Amazon Athena: Amazon Athena is a query service that makes it easy to analyze data stored in S3 using standard SQL. Athena is serverless, so there is no infrastructure to set up or manage, and you pay only for the queries you run. It scales automatically, executing queries in parallel, which produces fast results even with large datasets and complex queries.

Use cases: Athena can be used to process logs, perform ad-hoc analysis, and run interactive queries and joins. It runs queries across multiple paths, including all the files under each path.

S3 Select: S3 Select is an S3 feature that works by retrieving a subset of an object's data (using simple SQL expressions) instead of the entire object, which can be up to 5 terabytes in size. S3 Select runs queries on a single object at a time in the S3 bucket.

Conclusion: Athena can be used for complex queries on files and can span multiple folders under an S3 bucket.
S3 Select can be used for simple queries on a single object.

Solution 5 - Amazon Web-Services

S3 Select makes it easy to retrieve specific data from the contents of an object using simple SQL expressions, with no need to retrieve the entire object. It can be used with Lambda to build serverless apps and can be tied in with big data frameworks like Apache Spark and Presto. It can improve performance by up to 400%.

Amazon Athena is a serverless, interactive query service. There is no need to load data into Athena. It is built on Presto, runs standard SQL, and is mainly used to analyze big data.

Solution 6 - Amazon Web-Services

To give an overview, as I understand it:

> Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.

The major advantage of this, as of now, is:

Athena is integrated out of the box with the AWS Glue Data Catalog. You can also use Glue's fully managed ETL capabilities to transform data or convert it into columnar formats to optimize cost and improve performance.

As for S3 Select:

  • At present, there is no charge for using S3 Select while it is in preview, and pricing has not yet been defined. However, you will need to apply at the reference.

  • While in preview, S3 Select supports CSV, JSON, and Parquet files with or without GZIP compression. During the preview, objects that are encrypted at rest are not supported.

  • Because S3 Select is still in preview, AWS doesn't have internal cases to verify how the service is being used. However, I could find a reference from a blog that might interest you.
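The file formats listed above map onto the `InputSerialization` argument of boto3's `select_object_content`. The helper below is a small sketch of how that argument could be built; `input_serialization` is a hypothetical name, and the choices of a CSV header row and line-delimited JSON are assumptions. It reflects my understanding of the current API rather than the preview: whole-object GZIP applies to CSV and JSON, while Parquet relies on its own internal column compression, so the sketch rejects `gzip=True` for Parquet.

```python
def input_serialization(file_format, gzip=False):
    """Build an InputSerialization dict for select_object_content."""
    fmt = file_format.lower()
    if fmt == "parquet":
        if gzip:
            # Assumption: whole-object compression is not accepted for Parquet.
            raise ValueError("whole-object GZIP is not used with Parquet input")
        return {"Parquet": {}}
    if fmt == "csv":
        config = {"CSV": {"FileHeaderInfo": "USE"}}  # assumes a header row
    elif fmt == "json":
        config = {"JSON": {"Type": "LINES"}}  # assumes one JSON document per line
    else:
        raise ValueError(f"unsupported format: {file_format}")
    config["CompressionType"] = "GZIP" if gzip else "NONE"
    return config
```

The returned dict would be passed as `InputSerialization=input_serialization("csv", gzip=True)` alongside the usual `Bucket`, `Key`, `Expression`, `ExpressionType`, and `OutputSerialization` arguments.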

You may also want to watch this Twitch video, which can help a lot.

Solution 7 - Amazon Web-Services

In addition to @abc123's answer, S3 Select supports only the SELECT command:

https://docs.aws.amazon.com/AmazonS3/latest/dev/s3-glacier-select-sql-reference-select.html

> Amazon S3 Select and S3 Glacier Select support only the SELECT SQL command. The following ANSI standard clauses are supported for SELECT:

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | user3444718 | View Question on Stackoverflow
Solution 1 - Amazon Web-Services | abc123 | View Answer on Stackoverflow
Solution 2 - Amazon Web-Services | Sayat Satybald | View Answer on Stackoverflow
Solution 3 - Amazon Web-Services | mcfinnigan | View Answer on Stackoverflow
Solution 4 - Amazon Web-Services | anuj patel | View Answer on Stackoverflow
Solution 5 - Amazon Web-Services | Phoenix | View Answer on Stackoverflow
Solution 6 - Amazon Web-Services | Kush Vyas | View Answer on Stackoverflow
Solution 7 - Amazon Web-Services | Marcello Romani | View Answer on Stackoverflow