Python load json file with UTF-8 BOM header

PythonJson

Python Problem Overview


I needed to parse files generated by other tool, which unconditionally outputs json file with UTF-8 BOM header (EFBBBF). I soon found that this was the problem, as Python 2.7 module can't seem to parse it:

>>> import json
>>> data = json.load(open('sample.json'))

ValueError: No JSON object could be decoded

Removing BOM, solves it, but I wonder if there is another way of parsing json file with BOM header?

Python Solutions


Solution 1 - Python

You can open with codecs:

import json
import codecs

json.load(codecs.open('sample.json', 'r', 'utf-8-sig'))

or decode with utf-8-sig yourself and pass to loads:

json.loads(open('sample.json').read().decode('utf-8-sig'))

Solution 2 - Python

Simple! You don't even need to import codecs.

with open('sample.json', encoding='utf-8-sig') as f:
	data = json.load(f)

Solution 3 - Python

Since json.load(stream) uses json.loads(stream.read()) under the hood, it won't be that bad to write a small hepler function that lstrips the BOM:

from codecs import BOM_UTF8

def lstrip_bom(str_, bom=BOM_UTF8):
    if str_.startswith(bom):
        return str_[len(bom):]
    else:
        return str_

json.loads(lstrip_bom(open('sample.json').read()))    

In other situations where you need to wrap a stream and fix it somehow you may look at inheriting from codecs.StreamReader.

Solution 4 - Python

you can also do it with keyword with

import codecs
with codecs.open('samples.json', 'r', 'utf-8-sig') as json_file:  
    data = json.load(json_file)

or better:

import io
with io.open('samples.json', 'r', encoding='utf-8-sig') as json_file:  
    data = json.load(json_file)

Solution 5 - Python

If this is a one-off, a very simple super high-tech solution that worked for me...

  • Open the JSON file in your favorite text editor.
  • Select-all
  • Create a new file
  • Paste
  • Save.

BOOM, BOM header gone!

Solution 6 - Python

I removed the BOM manually with Linux command.

First I check if there are efbb bf bytes for the file, with head i_have_BOM | xxd.

Then I run dd bs=1 skip=3 if=i_have_BOM.json of=I_dont_have_BOM.json.

bs=1 process 1 byte each time, skip=3, skip the first 3 bytes.

Solution 7 - Python

I'm using utf-8-sig just with import json

with open('estados.json', encoding='utf-8-sig') as json_file:
data = json.load(json_file)
print(data)

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionthetaView Question on Stackoverflow
Solution 1 - PythonPavel AnossovView Answer on Stackoverflow
Solution 2 - PythonaerinView Answer on Stackoverflow
Solution 3 - PythonnewtoverView Answer on Stackoverflow
Solution 4 - PythonMohamed Ali MimouniView Answer on Stackoverflow
Solution 5 - PythonMike NView Answer on Stackoverflow
Solution 6 - PythonRickView Answer on Stackoverflow
Solution 7 - PythonRodrigo GrossiView Answer on Stackoverflow