Python JSON encoder convert NaNs to null instead

Python, Json, Numpy, Nan

Python Problem Overview


I'm writing code to receive an arbitrary object (possibly nested) capable of being converted to JSON.

The default behavior of Python's built-in JSON encoder is to emit NaN as the bare literal NaN, e.g. json.dumps(np.NaN) results in 'NaN', which is not valid JSON. How can I change this NaN value to null?

I tried to subclass JSONEncoder and override the default() method as follows:

from json import JSONEncoder, dumps
import numpy as np
    
class NanConverter(JSONEncoder):
    def default(self, obj):
        try:
            _ = iter(obj)
        except TypeError:
            if isinstance(obj, float) and np.isnan(obj):
                return "null"
        return JSONEncoder.default(self, obj)

>>> d = {'a': 1, 'b': 2, 'c': 3, 'e': np.nan, 'f': [1, np.nan, 3]}
>>> dumps(d, cls=NanConverter)
'{"a": 1, "c": 3, "b": 2, "e": NaN, "f": [1, NaN, 3]}'

EXPECTED RESULT: '{"a": 1, "c": 3, "b": 2, "e": null, "f": [1, null, 3]}'

Python Solutions


Solution 1 - Python

This seems to achieve my objective:

import simplejson


>>> simplejson.dumps(d, ignore_nan=True)
'{"a": 1, "c": 3, "b": 2, "e": null, "f": [1, null, 3]}'

Solution 2 - Python

  1. As @Gerrat points out, your hook dumps(d, cls=NanConverter) unfortunately won't work.

  2. @Alexander's simplejson.dumps(d, ignore_nan=True) works but introduces an additional dependency (simplejson).

If we introduce another dependency (pandas):

  1. Another obvious solution would be dumps(pd.DataFrame(d).fillna(None)), but Pandas issue 1972 notes that d.fillna(None) will have unpredictable behaviour:

    > Note that fillna(None) is equivalent to fillna(), which means the value parameter is unused. Instead, it uses the method parameter which is by default forward fill.

  2. So instead, use DataFrame.where (a runnable sketch follows this list):

     df = pd.DataFrame(d)
     dumps(df.where(pd.notnull(df), None))
    
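A runnable sketch of this approach. The frame here is illustrative, and the astype(object) cast is a defensive extra (the behaviour of where with None as the replacement has varied across pandas versions); a DataFrame is not directly JSON-serializable, so it is converted back to a plain dict before dumping:

import numpy as np
import pandas as pd
from json import dumps

# Small illustrative frame; column names are arbitrary.
df = pd.DataFrame({'a': [1.0, 2.0], 'e': [np.nan, 5.0]})

# Cast to object first so the None survives (float columns coerce None back
# to NaN), then replace every null cell with None.
cleaned = df.astype(object).where(pd.notnull(df), None)

print(dumps(cleaned.to_dict(orient='list')))
# '{"a": [1.0, 2.0], "e": [null, 5.0]}'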

Solution 3 - Python

Unfortunately, you probably need to use @Bramar's suggestion. You're not going to be able to use this directly. The documentation for Python's JSON encoder states:

> If specified, default is a function that gets called for objects that can’t otherwise be serialized

Your NanConverter.default method isn't even being called, since Python's JSON encoder already knows how to serialize np.nan. Add some print statements - you'll see your method isn't even being called.
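
A quick way to see this for yourself: the print in default() never fires, because NaN is a float and the encoder already handles floats (the class name here is illustrative):

from json import JSONEncoder, dumps

class LoudEncoder(JSONEncoder):
    def default(self, obj):
        # Only called for objects the encoder cannot already serialize.
        print('default() called with:', repr(obj))
        return super().default(obj)

# NaN is already serializable, so default() is never invoked
# and the bare NaN literal appears in the output anyway.
print(dumps({'x': float('nan')}, cls=LoudEncoder))  # -> '{"x": NaN}'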

Solution 4 - Python

simplejson does the job here, and there's one extra argument worth including:

Try using simplejson:

pip install simplejson

Then in the code:

import datetime
import simplejson

response = df.to_dict('records')
simplejson.dumps(response, ignore_nan=True, default=datetime.datetime.isoformat)

The ignore_nan flag correctly handles all NaN --> null conversions.

The default argument lets simplejson serialize your datetimes correctly.

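A self-contained sketch of the same call, assuming a DataFrame with a datetime column (the column names and values are illustrative):

import datetime
import numpy as np
import pandas as pd
import simplejson

df = pd.DataFrame({
    'when': [datetime.datetime(2021, 1, 1), datetime.datetime(2021, 1, 2)],
    'value': [1.5, np.nan],
})

response = df.to_dict('records')
# ignore_nan=True turns NaN into null; default= handles the datetime objects.
print(simplejson.dumps(response, ignore_nan=True,
                       default=datetime.datetime.isoformat))
# roughly: '[{"when": "2021-01-01T00:00:00", "value": 1.5}, {"when": "2021-01-02T00:00:00", "value": null}]'
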
Solution 5 - Python

Using Pandas

For those already using Pandas, the simplest way requires no third-party libraries: df.to_json(). This even converts NaNs and other Numpy types inside nested structures:

df = pd.DataFrame({
  'words': ['on', 'off'],
  'lists': [
    [[1, 1, 1], [2, 2, 2], [3, 3, 3]],
    [[np.nan], [np.nan], [np.nan]],
  ],
  'dicts': [
    {'S': {'val': 'A'}},
    {'S': {'val': np.nan}},
  ]
})

If you convert it to a list of dicts, Pandas retains the native nan values:

json.dumps(df.to_dict(orient='records'))

> [{
    "words": "on",
    "lists": [[1, 1, 1], [2, 2, 2], [3, 3, 3]],
    "dicts": {"S": {"val": "A"}}
  },
  {
    "words": "off",
    "lists": [[NaN], [NaN], [NaN]],
    "dicts": {"S": {"val": NaN}}
  }]

But if you have Pandas convert it straight to a JSON string, it'll sort that out for you:

df.to_json(orient='records')

> [{
    "words": "on",
    "lists": [[1,1,1],[2,2,2],[3,3,3]],
    "dicts": {"S":{"val":"A"}}
  },
  {
    "words": "off",
    "lists": [[null],[null],[null]],
    "dicts": {"S":{"val":null}}
  }]

Note that current pandas requires orient='records' (plural) for both to_dict() and to_json(); older pandas versions also accepted abbreviated orient values for to_dict().

Using Standard Library

If you're just working with lists and dicts and scalar values, you can convert NaNs manually:

import math

def to_none(val):
    # Only floats can be NaN; guard so non-float values pass through untouched.
    if isinstance(val, float) and math.isnan(val):
        return None
    return val

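For a flat dict, applying it per value is enough, e.g.:

import json

d = {'a': 1.0, 'e': float('nan')}
print(json.dumps({k: to_none(v) for k, v in d.items()}))
# '{"a": 1.0, "e": null}'
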
Solution 6 - Python

I use the following workaround:

import json
import typing

json_constant_map = {
    '-Infinity': float('-Infinity'),
    'Infinity': float('Infinity'),
    'NaN': None,
}

def json_nan_to_none(obj: typing.Any, *, default: typing.Optional[typing.Callable] = None) -> typing.Any:
    # We want to convert NaNs to None, and for now we have to use this workaround.
    # We still want Infinity and -Infinity to pass through unchanged.
    # See: https://github.com/python/cpython/pull/13233
    json_string = json.dumps(obj, default=default)
    return json.loads(
        json_string,
        parse_constant=lambda constant: json_constant_map[constant],
    )

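Usage, assuming the helper above:

>>> json_nan_to_none({'e': float('nan'), 'g': float('inf'), 'f': [1, float('nan')]})
{'e': None, 'g': inf, 'f': [1, None]}
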
Solution 7 - Python

You could try to serialize the dictionary to a string, then replace "NaN" with "null", then parse it back:

    d = json.dumps(d) # json dump string
    d = d.replace("NaN", "null")
    d = json.loads(d) # json load string

But you must be careful. If, for some reason, "NaN" is part of a string in some key or value inside the dictionary, this would require additional care in the replace step.

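For example, a legitimate "NaN" inside a string value would also be rewritten by the blanket replace:

import json

d = {'note': 'NaN means not a number', 'x': float('nan')}
print(json.dumps(d).replace("NaN", "null"))
# '{"note": "null means not a number", "x": null}'
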
Solution 8 - Python

You can use simplejson, but if you want to stick to the standard json module, here is my trick:

json.dumps(d).replace(", NaN,", ", null,")

Solution 9 - Python

There is a PR to make this customizable in Python's json standard library, but it has not yet been merged.

Solution 10 - Python

Here is the solution I use for converting NaN to None when parsing JSON with json.loads. Lists of values are handled by the explicit loop, and recursion into nested dicts is handled automatically by the object_hook mechanism.

import json
import numpy as np

def null_convert(obj):
    # Called by json.loads for every decoded JSON object (object_hook).
    if isinstance(obj, dict):
        for i in obj:
            if isinstance(obj[i], float) and np.isnan(obj[i]):
                obj[i] = None
            if isinstance(obj[i], list):
                for j, v in enumerate(obj[i]):
                    if isinstance(v, float) and np.isnan(v):
                        obj[i][j] = None
    return obj

json.loads(json_str, object_hook=null_convert)

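For example (the input string here is illustrative):

json_str = '{"a": 1, "e": NaN, "f": [1, NaN, 3]}'
cleaned = json.loads(json_str, object_hook=null_convert)
print(cleaned)              # {'a': 1, 'e': None, 'f': [1, None, 3]}
print(json.dumps(cleaned))  # '{"a": 1, "e": null, "f": [1, null, 3]}'
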
Solution 11 - Python

I ended up overriding the encode and iterencode methods in the NanConverter subclass, preprocessing obj and substituting nan with None (which becomes null once serialized).

This seems to be the most straightforward way given that, as @Gerrat noted, the Python JSONEncoder will not call default() when it encounters a nan. Even when calling dump/dumps with allow_nan=False, it just throws an exception before giving the user the opportunity to "do their own thing".

import math
import numpy as np
from json import JSONEncoder, dumps

def nan2None(obj):
    if isinstance(obj, dict):
        return {k:nan2None(v) for k,v in obj.items()}
    elif isinstance(obj, list):
        return [nan2None(v) for v in obj]
    elif isinstance(obj, float) and math.isnan(obj):
        return None
    return obj

class NanConverter(JSONEncoder):
    def default(self, obj):
        # Possible other customizations here; fall back to the base behaviour.
        return super().default(obj)
    def encode(self, obj, *args, **kwargs):
        obj = nan2None(obj)
        return super().encode(obj, *args, **kwargs)
    def iterencode(self, obj, *args, **kwargs):
        obj = nan2None(obj)
        return super().iterencode(obj, *args, **kwargs)

>>> d = {'a': 1, 'b': 2, 'c': 3, 'e': math.nan, 'f': [1, np.nan, 3]}
>>> dumps(d, cls=NanConverter)
'{"a": 1, "b": 2, "c": 3, "e": null, "f": [1, null, 3]}'

Solution 12 - Python

@Alexander Unfortunately, JSON does not support np.nan, np.NaN, or np.inf.

It only supports null; refer to this documentation. But in Python we have the None option, so NaN values can be replaced with None.

Another problem when converting a dataframe or list in Python to JSON is that the encoder does not support numpy data types, so we need to convert them to JSON-acceptable types. Below is a solution for that:

import json
import numpy as np

class CustomJSONizer(json.JSONEncoder):
    def default(self, obj):
        # Convert numpy scalar and array types to native Python equivalents.
        if isinstance(obj, np.integer):
            return int(obj)
        if isinstance(obj, np.floating):
            return float(obj)
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        if isinstance(obj, np.bool_):
            return bool(obj)
        return super().default(obj)

This is a custom class that takes care of the various data types you might come across while working with JSON files. To use it, pass cls=CustomJSONizer to json.dumps:

with open('filename.json', 'w', encoding='utf-8') as f:
    f.write(json.dumps(Return_content, cls=CustomJSONizer, ensure_ascii=False))

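A quick in-memory check of the encoder (the sample data is illustrative):

data = {
    'count': np.int64(3),
    'flag': np.bool_(True),
    'values': np.array([1.5, 2.5]),
}
print(json.dumps(data, cls=CustomJSONizer))
# '{"count": 3, "flag": true, "values": [1.5, 2.5]}'
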
Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Alexander | View Question on Stackoverflow
Solution 1 - Python | Alexander | View Answer on Stackoverflow
Solution 2 - Python | Michael Currie | View Answer on Stackoverflow
Solution 3 - Python | Gerrat | View Answer on Stackoverflow
Solution 4 - Python | eiTan LaVi | View Answer on Stackoverflow
Solution 5 - Python | rodrigo-silveira | View Answer on Stackoverflow
Solution 6 - Python | Mitar | View Answer on Stackoverflow
Solution 7 - Python | Schroeder | View Answer on Stackoverflow
Solution 8 - Python | Saurabh Chandra Patel | View Answer on Stackoverflow
Solution 9 - Python | Mitar | View Answer on Stackoverflow
Solution 10 - Python | numan | View Answer on Stackoverflow
Solution 11 - Python | Hans Bouwmeester | View Answer on Stackoverflow
Solution 12 - Python | yogesh agrawal | View Answer on Stackoverflow