Python JSON encoder convert NaNs to null instead
Problem Overview
I'm writing code to receive an arbitrary object (possibly nested) capable of being converted to JSON.

The default behavior of Python's built-in JSON encoder is to serialize NaN as the literal `NaN`, e.g. `json.dumps(np.nan)` results in `NaN`. How can I change this `NaN` value to `null`?
I tried to subclass `JSONEncoder` and override the `default()` method as follows:

```python
from json import JSONEncoder, dumps
import numpy as np

class NanConverter(JSONEncoder):
    def default(self, obj):
        try:
            _ = iter(obj)
        except TypeError:
            if isinstance(obj, float) and np.isnan(obj):
                return "null"
        return JSONEncoder.default(self, obj)
```

```python
>>> d = {'a': 1, 'b': 2, 'c': 3, 'e': np.nan, 'f': [1, np.nan, 3]}
>>> dumps(d, cls=NanConverter)
'{"a": 1, "c": 3, "b": 2, "e": NaN, "f": [1, NaN, 3]}'
```

Expected result: `'{"a": 1, "c": 3, "b": 2, "e": null, "f": [1, null, 3]}'`
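For reference, the standard library's behavior can be confirmed directly; note that `allow_nan=False` raises an exception rather than substituting `null`:

```python
import json

# The stdlib encoder emits the non-standard token NaN by default.
encoded = json.dumps(float('nan'))
print(encoded)  # -> NaN

# allow_nan=False does not substitute null; it raises ValueError instead.
try:
    json.dumps(float('nan'), allow_nan=False)
    raised = False
except ValueError:
    raised = True
print(raised)  # -> True
```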
Python Solutions
Solution 1 - Python
This seems to achieve my objective:

```python
>>> import simplejson
>>> simplejson.dumps(d, ignore_nan=True)
'{"a": 1, "c": 3, "b": 2, "e": null, "f": [1, null, 3]}'
```
Solution 2 - Python
- As @Gerrat points out, your hook `dumps(d, cls=NanConverter)` unfortunately won't work.
- @Alexander's `simplejson.dumps(d, ignore_nan=True)` works, but introduces an additional dependency (`simplejson`).

If we introduce another dependency (pandas):

- Another obvious solution would be `dumps(pd.DataFrame(d).fillna(None))`, but pandas issue 1972 notes that `fillna(None)` will have unpredictable behaviour:

  > Note that `fillna(None)` is equivalent to `fillna()`, which means the value parameter is unused. Instead it uses the method parameter, which is by default forward fill.

- So instead, use `DataFrame.where`:

  ```python
  df = pd.DataFrame(d)
  dumps(df.where(pd.notnull(df), None))
  ```
Solution 3 - Python
Unfortunately, you probably need to use @Bramar's suggestion - you're not going to be able to do this directly. The documentation for Python's JSON encoder states:

> If specified, default is a function that gets called for objects that can't otherwise be serialized.

Your `NanConverter.default` method isn't even being called, since Python's JSON encoder already knows how to serialize `np.nan`. Add some print statements and you'll see your method is never invoked.
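This is easy to verify with a hypothetical spy function passed as `default`: floats are serializable on their own, so the hook never fires.

```python
import json

calls = []

def spy(obj):
    # Only reached for objects json cannot serialize by itself.
    calls.append(obj)
    raise TypeError(f"unserializable: {obj!r}")

out = json.dumps({'x': float('nan')}, default=spy)
print(out)    # NaN is emitted directly...
print(calls)  # ...and spy was never called
```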
Solution 4 - Python
simplejson will do the right work here, but there's one extra flag worth including.

Try using simplejson:

```shell
pip install simplejson
```

Then in the code:

```python
import datetime
import simplejson

response = df.to_dict('records')
simplejson.dumps(response, ignore_nan=True, default=datetime.datetime.isoformat)
```

The `ignore_nan` flag will correctly handle all NaN-to-null conversions, and the `default` flag will allow simplejson to serialize your datetimes correctly.
Solution 5 - Python
Using Pandas
For those using Pandas, the simplest way - no third-party libraries required - is `df.to_json`. This even converts NaNs and other NumPy types in nested structures:

```python
df = pd.DataFrame({
    'words': ['on', 'off'],
    'lists': [
        [[1, 1, 1], [2, 2, 2], [3, 3, 3]],
        [[np.nan], [np.nan], [np.nan]],
    ],
    'dicts': [
        {'S': {'val': 'A'}},
        {'S': {'val': np.nan}},
    ],
})
```

If you convert it to a list of dicts, Pandas retains the native `nan` values, so `json.dumps` still emits invalid `NaN` tokens:

```python
json.dumps(df.to_dict(orient='records'))
```

```
[{"words": "on",
  "lists": [[1, 1, 1], [2, 2, 2], [3, 3, 3]],
  "dicts": {"S": {"val": "A"}}},
 {"words": "off",
  "lists": [[NaN], [NaN], [NaN]],
  "dicts": {"S": {"val": NaN}}}]
```

But if you have Pandas convert it straight to a JSON string, it sorts that out for you:

```python
df.to_json(orient='records')
```

```
[{"words":"on",
  "lists":[[1,1,1],[2,2,2],[3,3,3]],
  "dicts":{"S":{"val":"A"}}},
 {"words":"off",
  "lists":[[null],[null],[null]],
  "dicts":{"S":{"val":null}}}]
```

Note that older pandas versions also accepted the singular `orient='record'` as an abbreviation for `to_dict()`; recent versions require the full `'records'`, which is what `to_json()` expects as well.
Using Standard Library
If you're just working with lists, dicts, and scalar values, you can convert NaNs manually:

```python
import math

def to_none(val):
    # Guard with isinstance: math.isnan raises TypeError on non-numbers.
    if isinstance(val, float) and math.isnan(val):
        return None
    return val
```
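For example, applied value-by-value over a flat dict before dumping (a minimal sketch; nested structures would need a recursive variant):

```python
import json
import math

def to_none(val):
    # Guard with isinstance: math.isnan raises TypeError on non-numbers.
    if isinstance(val, float) and math.isnan(val):
        return None
    return val

d = {'a': 1, 'words': 'on', 'e': float('nan')}
clean = {k: to_none(v) for k, v in d.items()}
print(json.dumps(clean))  # -> {"a": 1, "words": "on", "e": null}
```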
Solution 6 - Python
I use the following workaround:
```python
import json
import typing

json_constant_map = {
    '-Infinity': float('-Infinity'),
    'Infinity': float('Infinity'),
    'NaN': None,
}

def json_nan_to_none(obj: typing.Any, *, default: typing.Optional[typing.Callable] = None) -> typing.Any:
    # We want to convert NaNs to None, and for now we have to use this workaround.
    # We still make an exception for infinity and -infinity, which pass through unchanged.
    # See: https://github.com/python/cpython/pull/13233
    json_string = json.dumps(obj, default=default)
    return json.loads(
        json_string,
        parse_constant=lambda constant: json_constant_map[constant],
    )
```
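The round-trip at the heart of this workaround can also be exercised inline (a minimal sketch of the same `parse_constant` trick):

```python
import json

constants = {'NaN': None, 'Infinity': float('inf'), '-Infinity': float('-inf')}
d = {'e': float('nan'), 'g': float('inf')}

# Dump (emitting NaN/Infinity tokens), then reload with parse_constant
# so NaN becomes None while the infinities survive unchanged.
cleaned = json.loads(json.dumps(d), parse_constant=constants.__getitem__)
print(cleaned)  # -> {'e': None, 'g': inf}
```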
Solution 7 - Python
You could serialize the dictionary to a string, replace "NaN" with "null", then decode it back:

```python
d = json.dumps(d)             # dump to a JSON string
d = d.replace("NaN", "null")  # textual substitution
d = json.loads(d)             # load it back
```

But you must be careful: if, for some reason, "NaN" is part of a string in some key or value inside the dictionary, the replace step would require additional care.
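A quick demonstration of that pitfall, using a made-up value that happens to contain the text "NaN":

```python
import json

d = {'note': 'NaN sighting', 'v': float('nan')}
s = json.dumps(d).replace('NaN', 'null')
print(s)              # the string value gets silently corrupted as well
print(json.loads(s))  # -> {'note': 'null sighting', 'v': None}
```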
Solution 8 - Python
You can use simplejson, but if you want to stick with the standard json module, a textual trick (with the same caveat about "NaN" appearing inside string values) is:

```python
json.dumps(d).replace("NaN", "null")
```
Solution 9 - Python
There is a PR for making this customizable in the Python json standard library, but it has not yet been merged.
Solution 10 - Python
Here is the solution that I use for converting `NaN` to `None`. One level of nested lists also seems to be handled pretty well, and the recursion into nested dicts is handled automatically by `object_hook`:

```python
import json
import numpy as np

def null_convert(obj):
    if isinstance(obj, dict):
        for i in obj:
            if isinstance(obj[i], float) and np.isnan(obj[i]):
                obj[i] = None
            if isinstance(obj[i], list):
                for j, v in enumerate(obj[i]):
                    if isinstance(v, float) and np.isnan(v):
                        obj[i][j] = None
    return obj

json.loads(json_str, object_hook=null_convert)
```
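A self-contained sketch of the same `object_hook` pattern, using `math.isnan` instead of NumPy to avoid the extra dependency (this variant is mine, not the original author's):

```python
import json
import math

def null_convert(obj):
    # object_hook receives every decoded dict, innermost first,
    # so nested dicts are cleaned automatically.
    for k, v in obj.items():
        if isinstance(v, float) and math.isnan(v):
            obj[k] = None
        elif isinstance(v, list):
            obj[k] = [None if isinstance(x, float) and math.isnan(x) else x
                      for x in v]
    return obj

result = json.loads('{"a": NaN, "b": [1, NaN], "c": {"d": NaN}}',
                    object_hook=null_convert)
print(result)  # -> {'a': None, 'b': [1, None], 'c': {'d': None}}
```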
Solution 11 - Python
I ended up overriding the `encode` and `iterencode` methods in the `NanConverter` subclass, preprocessing `obj` and substituting `nan` with `None` (which becomes `null` once serialized).

This seems to be the most straightforward way, given that, as @Gerrat noted, the Python JSONEncoder will not call `default` when it encounters a `nan`. Even when calling `dump`/`dumps` with `allow_nan=False`, it just throws an exception before giving the user the opportunity to "do their own thing".

```python
import math
import numpy as np
from json import JSONEncoder, dumps

def nan2None(obj):
    if isinstance(obj, dict):
        return {k: nan2None(v) for k, v in obj.items()}
    elif isinstance(obj, list):
        return [nan2None(v) for v in obj]
    elif isinstance(obj, float) and math.isnan(obj):
        return None
    return obj

class NanConverter(JSONEncoder):
    def default(self, obj):
        # possible other customizations here
        return super().default(obj)

    def encode(self, obj, *args, **kwargs):
        obj = nan2None(obj)
        return super().encode(obj, *args, **kwargs)

    def iterencode(self, obj, *args, **kwargs):
        obj = nan2None(obj)
        return super().iterencode(obj, *args, **kwargs)
```

```python
>>> d = {'a': 1, 'b': 2, 'c': 3, 'e': math.nan, 'f': [1, np.nan, 3]}
>>> dumps(d, cls=NanConverter)
'{"a": 1, "b": 2, "c": 3, "e": null, "f": [1, null, 3]}'
```
Solution 12 - Python
@Alexander - unfortunately, JSON does not support `np.nan`, `np.NaN`, or `np.inf`; it only supports `null` (refer to the JSON documentation). But Python has `None`, so NaN values can be replaced with `None`.

Another problem when converting a dataframe or list in Python to JSON is that the encoder won't accept NumPy data types, so we need to convert them to JSON-acceptable types. Below is a solution for that:

```python
class CustomJSONizer(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, np.integer):
            return int(obj)
        if isinstance(obj, np.floating):
            return float(obj)
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        if isinstance(obj, np.bool_):
            return bool(obj)
        return super().default(obj)
```

This is a custom encoder class that takes care of the various data types you might come across while writing a JSON file. To use it, pass it to `json.dumps`:

```python
with open('filename.json', 'w', encoding='utf-8') as f:
    f.write(json.dumps(Return_content, cls=CustomJSONizer, ensure_ascii=False))
```