Reading a JSON file from S3 using Python boto3

Python, Json, Amazon Web-Services, Amazon S3, Boto3

Python Problem Overview


I have the following JSON in the S3 bucket 'test':

{
  'Details' : "Something" 
}

I am using the following code to read this JSON and print the key 'Details':

import boto3
import json

s3 = boto3.resource('s3',
                    aws_access_key_id=<access_key>,
                    aws_secret_access_key=<secret_key>
                    )
content_object = s3.Object('test', 'sample_json.txt')
file_content = content_object.get()['Body'].read().decode('utf-8')
json_content = json.loads(repr(file_content))
print(json_content['Details'])

And I am getting the error 'string indices must be integers'. I don't want to download the file from S3 and then read it.

Python Solutions


Solution 1 - Python

As mentioned in the comments above [1], repr has to be removed and the JSON file has to use double quotes for attribute names. Using this file on AWS/S3:

{
  "Details" : "Something"
}

and the following Python code, it works:

import boto3
import json

s3 = boto3.resource('s3')

content_object = s3.Object('test', 'sample_json.txt')
file_content = content_object.get()['Body'].read().decode('utf-8')
json_content = json.loads(file_content)
print(json_content['Details'])
# >> Something

[1]: https://stackoverflow.com/questions/40995251/reading-an-json-file-from-s3-using-python-boto3#comment69199795_40995251 "Alex Hall"
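For the curious, here is a minimal sketch (my addition, not from the original answer) of why the repr call produced the 'string indices must be integers' error, assuming the file was entirely single-quoted as in the question:

import json

# raw text of a single-quoted "JSON" file, as in the question
file_content = "{'Details': 'Something'}"
print(repr(file_content))  # "{'Details': 'Something'}" -- repr adds a layer of quotes

# json.loads now sees a JSON *string literal*, not an object,
# so it returns a plain str instead of a dict
broken = json.loads(repr(file_content))
print(type(broken))  # <class 'str'>
# broken['Details']  # would raise TypeError: string indices must be integers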

Solution 2 - Python

The following worked for me.

# read_s3.py
import json
from boto3 import client

BUCKET = 'MY_S3_BUCKET_NAME'
FILE_TO_READ = 'FOLDER_NAME/my_file.json'
s3_client = client('s3',
                   aws_access_key_id='MY_AWS_KEY_ID',
                   aws_secret_access_key='MY_AWS_SECRET_ACCESS_KEY'
                  )
result = s3_client.get_object(Bucket=BUCKET, Key=FILE_TO_READ)
text = result["Body"].read().decode()  # the body is bytes; decode to a str
json_content = json.loads(text)        # parse the JSON string into a dict
print(json_content['Details'])         # use your desired JSON key for your value

It is not a good idea to hard-code the AWS access key ID and secret access key directly. As a best practice, you can consider either of the following:

(1) Read your AWS credentials from a JSON file (aws_cred.json) stored on your local machine:

from json import load
from boto3 import client
...
with open('local_fold/aws_cred.json') as cred_file:
    credentials = load(cred_file)
s3_client = client('s3',
                   aws_access_key_id=credentials['MY_AWS_KEY_ID'],
                   aws_secret_access_key=credentials['MY_AWS_SECRET_ACCESS_KEY']
                  )
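For completeness (my own addition), the aws_cred.json file this code expects would carry the same two keys the code looks up, with placeholder values here:

{
  "MY_AWS_KEY_ID": "YOUR_AWS_ACCESS_KEY_ID",
  "MY_AWS_SECRET_ACCESS_KEY": "YOUR_AWS_SECRET_ACCESS_KEY"
}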

(2) Read from your environment variables (my preferred option for deployment):

import os
import boto3

client = boto3.client('s3',
                      aws_access_key_id=os.environ['MY_AWS_KEY_ID'],
                      aws_secret_access_key=os.environ['MY_AWS_SECRET_ACCESS_KEY']
                     )

Let's prepare a shell script (set_env.sh) that sets the environment variables and runs our Python script (read_s3.py):

# set_env.sh
export MY_AWS_KEY_ID='YOUR_AWS_ACCESS_KEY_ID'
export MY_AWS_SECRET_ACCESS_KEY='YOUR_AWS_SECRET_ACCESS_KEY'
# run the Python script from above that reads from S3
python read_s3.py

Now execute the shell script in a terminal as follows:

sh set_env.sh
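As a side note (standard boto3 behavior, not part of the original answer): if you export the credentials under the names AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY instead, boto3's default credential chain picks them up automatically and you can drop the explicit keyword arguments:

import boto3

# boto3 reads AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from the
# environment on its own, so no credentials need to appear in the code
client = boto3.client('s3')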

Solution 3 - Python

Wanted to add that botocore.response.StreamingBody works well with json.load:

import json
import boto3

s3 = boto3.resource('s3')

obj = s3.Object(bucket, key)  # your bucket name and object key
data = json.load(obj.get()['Body'])  # StreamingBody is file-like, so json.load reads it directly
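One caveat worth adding (my note, not the answerer's): obj.get() raises botocore.exceptions.ClientError if the key does not exist, so a defensive variant might look like this, with hypothetical bucket and key names:

import json
import boto3
from botocore.exceptions import ClientError

s3 = boto3.resource('s3')

try:
    obj = s3.Object('my-bucket', 'my-key.json')  # hypothetical names
    data = json.load(obj.get()['Body'])
except ClientError as e:
    if e.response['Error']['Code'] == 'NoSuchKey':
        data = None  # handle the missing object as your application requires
    else:
        raise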

Solution 4 - Python

You can use the code below in an AWS Lambda function to read a JSON file from an S3 bucket and process it with Python.

import json
import boto3
import logging

# logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)

VERSION = 1.0

s3 = boto3.client('s3')

def lambda_handler(event, context):
    bucket = 'my_project_bucket'
    key = 'sample_payload.json'

    response = s3.get_object(Bucket=bucket, Key=key)
    content = response['Body']
    jsonObject = json.loads(content.read())
    print(jsonObject)
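One operational note (mine, not from the original answer): for this to work, the Lambda's execution role needs s3:GetObject permission on the bucket. A minimal IAM policy statement, using the bucket name from the code above, looks like this:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::my_project_bucket/*"
    }
  ]
}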

Solution 5 - Python

I was stuck for a bit as the decoding didn't work for me (s3 objects are gzipped).

Found this discussion which helped me: https://stackoverflow.com/questions/1543652/python-gzip-is-there-a-way-to-decompress-from-a-string

import boto3
import zlib

S3_RESOURCE = boto3.resource('s3')

def lambda_handler(event, context):
    key = event["Records"][0]["s3"]["object"]["key"]
    bucket_name = event["Records"][0]["s3"]["bucket"]["name"]

    s3_object = S3_RESOURCE.Object(bucket_name, key).get()['Body'].read()

    # 16 + zlib.MAX_WBITS tells zlib to expect and skip the gzip header
    jsonData = zlib.decompress(s3_object, 16 + zlib.MAX_WBITS)
    return jsonData

If you print jsonData, you'll see your desired JSON file! If you are running the test in AWS itself, be sure to check the CloudWatch logs, as Lambda won't output the full JSON file if it's too long.
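If you want to exercise the handler locally first, you can hand it a dict shaped like a real S3 event notification (the bucket and key below are placeholders, and the call still needs access to a real bucket):

# hypothetical local test for the handler above
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "my-bucket"},
                "object": {"key": "data/payload.json.gz"}
            }
        }
    ]
}
print(lambda_handler(sample_event, None))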

Solution 6 - Python

If your JSON file looks like this:

{
    "test": "test123"
}

You can access it like a dict:

BUCKET="Bucket123"

def get_json_from_s3(key: str):
    """
    Retrieves the json file containing responses from s3. returns a dict

    Args:
        key (str): file path to the json file

    Returns:
        dict: json style dict
    """
    data = client.get_object(Bucket=BUCKET, Key=key)
    json_text_bytes = data["Body"].read().decode("utf-8")
    json_text = json.loads(json_text_bytes)
    return json_text
test_dict = get_json_from_s3(key="test.json")
print(test_dict["test"])

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type        | Original Author | Original Content on Stackoverflow
Question            | Nanju           | View Question on Stackoverflow
Solution 1 - Python | bastelflp       | View Answer on Stackoverflow
Solution 2 - Python | Hafizur Rahman  | View Answer on Stackoverflow
Solution 3 - Python | alukach         | View Answer on Stackoverflow
Solution 4 - Python | Piyush Singhal  | View Answer on Stackoverflow
Solution 5 - Python | Cerberussian    | View Answer on Stackoverflow
Solution 6 - Python | Wesley Cheek    | View Answer on Stackoverflow