Invalid control character with Python json.loads


Python Problem Overview


I am printing a string with the code below:

jsonString = data.decode("utf-8")

print(jsonString)

And this is the string that was printed to the console:

{"description":"Script to check testtbeat of TEST 1 server.", "script":"#!/bin/bash\nset -e\n\nCOUNT=60   #number of 10 second timeouts in 10 minutes\nSUM_SYNCS=0\nSUM_SYNCS_BEHIND=0\nHOSTNAME=$hostname      \n\nwhile [[ $COUNT -ge \"0\" ]]; do\n\necho $HOSTNAME\n\n#send the request, put response in variable\nDATA=$(wget -O - -q -t 1 http://$HOSTNAME:8080/heartbeat)\n\n#grep $DATA for syncs and syncs_behind\nSYNCS=$(echo $DATA | grep -oE 'num_syncs: [0-9]+' | awk '{print $2}')\nSYNCS_BEHIND=$(echo $DATA | grep -oE 'num_syncs_behind: [0-9]+' | awk '{print $2}')\n\necho $SYNCS\necho $SYNCS_BEHIND\n\n#verify conditionals\nif [[ $SYNCS -gt \"8\" && $SYNCS_BEHIND -eq \"0\" ]]; then exit 0; fi\n\n#decrement the counter\nlet COUNT-=1\n\n#wait another 10 seconds\nsleep 10\n\ndone\n"}

But when I load it with Python's json.loads, as shown below:

jStr = json.loads(jsonString)

I get this error:

ERROR Invalid control character at: line 1 column 202 (char 202)

I looked at char 202, but I have no idea why it is causing a problem. In Notepad++, char 202 appears to be an e, or maybe I am counting it wrong.

Any idea what is wrong? How do I find out which character is causing the problem?

UPDATE:

jsonString = {"description":"Script to check testtbeat of TIER 1 server.", "script":"#!/bin/bash\nset -e\n\nCOUNT=60   #number of 10 second timeouts in 10 minutes\nSUM_SYNCS=0\nSUM_SYNCS_BEHIND=0\nHOSTNAME=$hostname      \n\nwhile [[ $COUNT -ge \"0\" ]]; do\n\necho $HOSTNAME\n\n#send the request, put response in variable\nDATA=$(wget -O - -q -t 1 http://$HOSTNAME:8080/heartbeat)\n\n#grep $DATA for syncs and syncs_behind\nSYNCS=$(echo $DATA | grep -oE 'num_syncs: [0-9]+' | awk '{print $2}')\nSYNCS_BEHIND=$(echo $DATA | grep -oE 'num_syncs_behind: [0-9]+' | awk '{print $2}')\n\necho $SYNCS\necho $SYNCS_BEHIND\n\n#verify conditionals\nif [[ $SYNCS -gt \"8\" && $SYNCS_BEHIND -eq \"0\" ]]; then exit 0; fi\n\n#decrement the counter\nlet COUNT-=1\n\n#wait another 10 seconds\nsleep 10\n\ndone\n"}

print(jsonString[202])

This is the error I got:

KeyError: 202

Python Solutions


Solution 1 - Python

Control characters can be allowed inside strings by passing strict=False:

json_str = json.loads(jsonString, strict=False)

You can find this in the docs for Python 2 or the docs for Python 3:

> If strict is false (True is the default), then control characters will be allowed inside strings. Control characters in this context are those with character codes in the 0–31 range, including '\t' (tab), '\n', '\r' and '\0'.

Solution 2 - Python

There is no error in your JSON text.

You can get the error if you copy-paste the string into your Python source code as a string literal. In that case, \n is interpreted as a single character (a newline). You can fix it by using a raw string literal instead (r'...'). Use triple quotes (r'''...''') to avoid having to escape ' and " inside the literal.
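To make the difference concrete, here is a minimal sketch. The short JSON text is a made-up stand-in for the full script in the question, and the variable names are only for illustration.

```python
import json

# Regular literal: Python turns \n into a real newline (code point 10)
# before json.loads ever sees the text, so strict parsing rejects it.
regular = '{"script": "set -e\nexit 0"}'

# Raw literal: the backslash and the "n" stay as two characters, which is
# the valid JSON escape sequence for a newline.
raw = r'{"script": "set -e\nexit 0"}'

try:
    json.loads(regular)
    regular_ok = True
except ValueError:  # json.JSONDecodeError is a subclass of ValueError
    regular_ok = False

parsed = json.loads(raw)

print(regular_ok)
print(repr(parsed["script"]))
```

The raw literal parses, and the decoded value contains a real newline, because json.loads itself translates the \n escape.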

Solution 3 - Python

Try passing strict=False to json.loads. It will then accept "\n" and other control characters inside strings, as in the following:

import json
  
test_string = ' { "key1" : "1015391654687" , "key2": "value2 \n " } '

res = json.loads(test_string, strict=False)
  
print(res)

Output:

{'key1': '1015391654687', 'key2': 'value2 \n '}

Solution 4 - Python

Escape your newlines.

{"description":"Script to check testtbeat of TEST 1 server.", "script":"#!/bin/bash\\nset -e\\n\\nCOUNT=60   #number of 10 second timeouts in 10 minutes\\nSUM_SYNCS=0\\nSUM_SYNCS_BEHIND=0\\nHOSTNAME=$hostname      #dc1dbx1145.dc1.host.com\\n\\nwhile [[ $COUNT -ge \\"0\\" ]]; do\\n\\necho $HOSTNAME\\n\\n#send the request, put response in variable\\nDATA=$(wget -O - -q -t 1 http://$HOSTNAME:8080/heartbeat)\\n\\n#grep $DATA for syncs and syncs_behind\\nSYNCS=$(echo $DATA | grep -oE 'num_syncs: [0-9]+' | awk '{print $2}')\\nSYNCS_BEHIND=$(echo $DATA | grep -oE 'num_syncs_behind: [0-9]+' | awk '{print $2}')\\n\\necho $SYNCS\\necho $SYNCS_BEHIND\\n\\n#verify conditionals\\nif [[ $SYNCS -gt \\"8\\" && $SYNCS_BEHIND -eq \\"0\\" ]]; then exit 0; fi\\n\\n#decrement the counter\\nlet COUNT-=1\\n\\n#wait another 10 seconds\\nsleep 10\\n\\ndone\\n"}

Works for me.
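If you also control the code that produces the JSON, you may not need to escape anything by hand. A rough sketch, assuming the payload is built in Python (the two-line script here is a hypothetical stand-in for the real one): json.dumps writes each newline as the two-character escape, so the result passes strict parsing.

```python
import json

# Hypothetical short script standing in for the full bash script.
script = "#!/bin/bash\nset -e\n"

# json.dumps emits the newline as a backslash followed by "n",
# so no raw control character ends up inside the JSON string.
encoded = json.dumps({"script": script})

# Strict parsing (the default) accepts the text and restores the newlines.
decoded = json.loads(encoded)

print(encoded)
print(decoded["script"] == script)
```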

Also, if you get an error like this in the future, a debugging technique you can use is to shorten the string to something that works and slowly add data until it doesn't.
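Another way to find the offending character is to ask the exception where parsing stopped. This is a sketch, assuming Python 3.5 or later, where json.JSONDecodeError exposes a pos attribute. The helper name show_parse_failure and the sample text are made up for illustration. Since the reported position may not land exactly on the character you expect, the helper prints a small window around it instead of a single character.

```python
import json

def show_parse_failure(text, width=10):
    """Return (pos, context) for the first strict-parsing failure, or None."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        start = max(0, err.pos - width)
        # repr() makes invisible characters such as a raw newline visible.
        return err.pos, repr(text[start:err.pos + width])
    return None

# Hypothetical sample: the \n in this regular literal is a real newline.
sample = '{"script": "set -e\nexit 0"}'

result = show_parse_failure(sample)
print(result)
print(show_parse_failure('{"a": 1}'))
```

In the window, a raw newline shows up as \n inside the repr, which makes it easy to tell a real control character from the letters around it.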

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | arsenal | View Question on Stackoverflow
Solution 1 - Python | Joe Cheng | View Answer on Stackoverflow
Solution 2 - Python | jfs | View Answer on Stackoverflow
Solution 3 - Python | K.A | View Answer on Stackoverflow
Solution 4 - Python | Pakman | View Answer on Stackoverflow