How to get input file name as column in AWS Athena external tables

SqlAmazon Web-ServicesAmazon S3PrestoAmazon Athena

Sql Problem Overview


I have external tables created in AWS Athena to query S3 data, however, the location path has 1000+ files. So I need the corresponding filename of the record to be displayed as a column in the table.

select file_name , col1 from table where file_name = "test20170516"

In short, I need to know INPUT__FILE__NAME(hive) equivalent in AWS Athena Presto or any other ways to achieve the same.

Sql Solutions


Solution 1 - Sql

You can do this with the $path pseudo column.

select "$path" from table

Solution 2 - Sql

If you need just the filename, you can extract it with regeexp_extract().

To use it in Athena on the "$path" you can do something like this:

SELECT regexp_extract("$path", '[^/]+$') AS filename from table;

If you need the filename without the extension, you can do:

SELECT regexp_extract("$path", '[ \w-]+?(?=\.)') AS filename_without_extension from table;

Here is the documentation on Presto Regular Expression Functions

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionRajeevView Question on Stackoverflow
Solution 1 - Sqljens walterView Answer on Stackoverflow
Solution 2 - SqlcampetersonView Answer on Stackoverflow