Python - Convert a bytes array into JSON format


Python Problem Overview


I want to parse a bytes string containing JSON-like data and convert it into Python objects. This is the source I have:

my_bytes_value = b'[{\'Date\': \'2016-05-21T21:35:40Z\', \'CreationDate\': \'2012-05-05\', \'LogoType\': \'png\', \'Ref\': 164611595, \'Classe\': [\'Email addresses\', \'Passwords\'],\'Link\':\'http://some_link.com\'}]'

And this is the desired outcome I want to have:

[{"Date": "2016-05-21T21:35:40Z", "CreationDate": "2012-05-05", "LogoType": "png", "Ref": 164611595, "Classe": ["Email addresses", "Passwords"], "Link": "http://some_link.com"}]

First, I converted the bytes to string:

my_new_string_value = my_bytes_value.decode("utf-8")

but when I call json.loads to parse it as JSON:

my_json = json.loads(my_new_string_value)

I get this error:

json.decoder.JSONDecodeError: Expecting value: line 1 column 174 (char 173)

Python Solutions


Solution 1 - Python

Your bytes object is almost JSON, but it uses single quotes instead of double quotes, and it needs to be a str rather than bytes. One way to fix it is to decode the bytes to str and replace the quotes; note that this simple replacement also rewrites any apostrophes inside values, so it only suits data like this that contains none. Another option is to use ast.literal_eval; see below for details. If you want to print the result or save it to a file as valid JSON, you can load the JSON into a Python list and then dump it back out. For example:

import json

my_bytes_value = b'[{\'Date\': \'2016-05-21T21:35:40Z\', \'CreationDate\': \'2012-05-05\', \'LogoType\': \'png\', \'Ref\': 164611595, \'Classe\': [\'Email addresses\', \'Passwords\'],\'Link\':\'http://some_link.com\'}]'

# Decode UTF-8 bytes to Unicode, and convert single quotes 
# to double quotes to make it valid JSON
my_json = my_bytes_value.decode('utf8').replace("'", '"')
print(my_json)
print('- ' * 20)

# Load the JSON to a Python list & dump it back out as formatted JSON
data = json.loads(my_json)
s = json.dumps(data, indent=4, sort_keys=True)
print(s)

Output:

[{"Date": "2016-05-21T21:35:40Z", "CreationDate": "2012-05-05", "LogoType": "png", "Ref": 164611595, "Classe": ["Email addresses", "Passwords"],"Link":"http://some_link.com"}]
- - - - - - - - - - - - - - - - - - - - 
[
    {
        "Classe": [
            "Email addresses",
            "Passwords"
        ],
        "CreationDate": "2012-05-05",
        "Date": "2016-05-21T21:35:40Z",
        "Link": "http://some_link.com",
        "LogoType": "png",
        "Ref": 164611595
    }
]


As Antti Haapala mentions in the comments, we can use ast.literal_eval to convert my_bytes_value to a Python list, once we've decoded it to a string.

from ast import literal_eval
import json

my_bytes_value = b'[{\'Date\': \'2016-05-21T21:35:40Z\', \'CreationDate\': \'2012-05-05\', \'LogoType\': \'png\', \'Ref\': 164611595, \'Classe\': [\'Email addresses\', \'Passwords\'],\'Link\':\'http://some_link.com\'}]'

data = literal_eval(my_bytes_value.decode('utf8'))
print(data)
print('- ' * 20)

s = json.dumps(data, indent=4, sort_keys=True)
print(s)
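
The two approaches differ once a value contains an apostrophe. The following is a minimal sketch of that difference; the `original` record and its "O'Brien" value are made up for illustration and are not from the question.

```python
import json
from ast import literal_eval

# Hypothetical record whose value contains an apostrophe
original = [{'Name': "O'Brien", 'Ref': 1}]

# Simulate data that was saved as a Python repr and read back as bytes
sample = repr(original).encode('utf-8')
text = sample.decode('utf-8')

# Quote replacement also rewrites the apostrophe, producing invalid JSON
try:
    json.loads(text.replace("'", '"'))
    replace_worked = True
except json.JSONDecodeError:
    replace_worked = False

# literal_eval parses the text as a Python literal, so the apostrophe survives
data = literal_eval(text)

print(replace_worked)
print(json.dumps(data))
```

literal_eval only evaluates Python literals (strings, numbers, lists, dicts and so on), so it is a reasonable fit for repr-style data like this.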

Generally, this problem arises because someone has saved data by printing its Python repr instead of using the json module to create proper JSON data. If it's possible, it's better to fix that problem so that proper JSON data is created in the first place.
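
As a rough illustration of that last point, the sketch below saves the same data once via str() and once via json.dumps, then tries to parse both. The `records` value is a shortened, assumed stand-in for the question's data.

```python
import json

# Shortened stand-in for the question's data (assumed for illustration)
records = [{'Date': '2016-05-21T21:35:40Z', 'Ref': 164611595,
            'Classe': ['Email addresses', 'Passwords']}]

# What ends up stored when the Python repr is saved
as_repr = str(records).encode('utf-8')

# What ends up stored when the json module is used
as_json = json.dumps(records).encode('utf-8')

# The repr form is not valid JSON
try:
    json.loads(as_repr.decode('utf-8'))
    repr_is_json = True
except json.JSONDecodeError:
    repr_is_json = False

# The json.dumps form round-trips cleanly
round_tripped = json.loads(as_json.decode('utf-8'))

print(repr_is_json)
print(round_tripped == records)
```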

Solution 2 - Python

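If your bytes already hold valid JSON (double-quoted strings), json.loads accepts them directly in Python 3.6 and later, with no decode step. The single-quoted data in the question still needs its quotes fixed first.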

import json

json.loads(my_bytes_value)
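
A small sketch of when passing bytes straight to json.loads works, assuming Python 3.6 or later. Both sample values here are invented for the example.

```python
import json

# Assumed sample: bytes that already contain valid JSON
valid_json_bytes = b'[{"Date": "2016-05-21T21:35:40Z", "Ref": 164611595}]'
data = json.loads(valid_json_bytes)

# Assumed sample: repr-style bytes with single quotes, as in the question
repr_style_bytes = b"[{'Ref': 164611595}]"
try:
    json.loads(repr_style_bytes)
    parsed_repr = True
except json.JSONDecodeError:
    parsed_repr = False

print(data[0]["Ref"])
print(parsed_repr)
```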

Solution 3 - Python

Python 3.6+: use the io module with json.load, which accepts binary input from that version on.

import json
import io

my_bytes_value = b'[{\'Date\': \'2016-05-21T21:35:40Z\', \'CreationDate\': \'2012-05-05\', \'LogoType\': \'png\', \'Ref\': 164611595, \'Classe\': [\'Email addresses\', \'Passwords\'],\'Link\':\'http://some_link.com\'}]'

fix_bytes_value = my_bytes_value.replace(b"'", b'"')

my_json = json.load(io.BytesIO(fix_bytes_value))  

Solution 4 - Python

To convert this bytes object to JSON text, first decode it to a string with decode() (UTF-8 is the default encoding), then change the quotation marks. In this version the string is passed through dumps first, so the last step strips the surrounding double quotes that dumps adds, leaving the JSON text for the list rather than a quoted string.

from json import dumps
my_json = dumps(my_bytes_value.decode()).replace("'", '"')[1:-1]

Solution 5 - Python

If the bytes contain \uXXXX escape sequences, you can decode them with unicode_escape:

import json
# Raw bytes literal: \u is not a valid escape in a plain bytes literal
byte_array_example = rb'{"text": "\u0627\u06CC\u0646 \u06CC\u06A9 \u0645\u062A\u0646 \u062A\u0633\u062A\u06CC \u0641\u0627\u0631\u0633\u06CC \u0627\u0633\u062A."}'
res = json.loads(byte_array_example.decode('unicode_escape'))
print(res)

result:

{'text': 'این یک متن تستی فارسی است.'}

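Decoding with utf-8 leaves the \uXXXX sequences in the string as literal text, while unicode_escape turns them into the actual characters before parsing. Note that json.loads interprets \uXXXX escapes itself, so decoding with utf-8 gives the same result for this input; unicode_escape also treats the bytes as Latin-1, so it can garble input that contains real non-ASCII UTF-8 bytes.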
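
To compare the two decoders, here is a hedged sketch using a shortened three-letter sample (an assumption for brevity, not the answer's full sentence). It checks both escaped input and input that carries the same text as real UTF-8 bytes.

```python
import json

# Three Persian letters, written with str escapes so the source stays ASCII
expected = "\u0627\u06cc\u0646"

# Case 1: bytes holding JSON \uXXXX escapes (pure ASCII bytes)
escaped = rb'{"text": "\u0627\u06cc\u0646"}'
via_utf8 = json.loads(escaped.decode("utf-8"))
via_escape = json.loads(escaped.decode("unicode_escape"))

# Case 2: bytes holding the same text as real UTF-8
raw_utf8 = json.dumps({"text": expected}, ensure_ascii=False).encode("utf-8")
good = json.loads(raw_utf8.decode("utf-8"))
garbled = json.loads(raw_utf8.decode("unicode_escape"))

print(via_utf8 == via_escape)        # both decoders agree on escaped input
print(good["text"] == expected)      # utf-8 handles real UTF-8 bytes
print(garbled["text"] == expected)   # unicode_escape reads them as Latin-1
```

On this sample, utf-8 gives the expected text in both cases, while unicode_escape only does so for the escaped form.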

Solution 6 - Python

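Decode the bytes and serialize the resulting text as a JSON string literal (this does not parse the data into Python objects):

import json
d = json.dumps(byte_str.decode('utf-8'))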
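
A quick way to see what this line produces, as a sketch with a hypothetical sample value for byte_str:

```python
import json

# Hypothetical sample value; any UTF-8 bytes would do
byte_str = b'[1, 2, 3]'

# dumps wraps the decoded text in quotes, giving a JSON string literal
d = json.dumps(byte_str.decode('utf-8'))

# Loading it back returns the text, not a list
text = json.loads(d)

# Parsing that text is what actually yields Python objects
values = json.loads(text)

print(d)
print(type(text).__name__)
print(values)
```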

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | Merouane Benthameur | View Question on Stack Overflow
Solution 1 - Python | PM 2Ring | View Answer on Stack Overflow
Solution 2 - Python | Chaithanya Krishna | View Answer on Stack Overflow
Solution 3 - Python | Novikov | View Answer on Stack Overflow
Solution 4 - Python | Simon | View Answer on Stack Overflow
Solution 5 - Python | EMAI | View Answer on Stack Overflow
Solution 6 - Python | Kwame | View Answer on Stack Overflow