How to solve JSON_ERROR_UTF8 error in php json_decode?


Php Problem Overview


I am trying this code:

$json = file_get_contents("http://www.google.com/alerts/preview?q=test&t=7&f=1&l=0&e");
print_r(json_decode(utf8_encode($json), true));
    
        //////////////
    
// Define the errors.
$constants = get_defined_constants(true);
$json_errors = array();
foreach ($constants["json"] as $name => $value) {
    if (!strncmp($name, "JSON_ERROR_", 11)) {
        $json_errors[$value] = $name;
    }
}

// Show the errors for different depths.
foreach (range(4, 3, -1) as $depth) {
    var_dump(json_decode($json, true, $depth));
    echo 'Last error: ', $json_errors[json_last_error()], PHP_EOL, PHP_EOL;
}

I've tried a lot of functions (html_entity_decode, utf8_encode and utf8_decode, decoding the hex codes), but I always get the error "JSON_ERROR_UTF8".

How could I solve this?

Php Solutions


Solution 1 - Php

There is a good function to sanitize your arrays.

I suggest you use a json_encode wrapper like this:

function safe_json_encode($value, $options = 0, $depth = 512, $utfErrorFlag = false) {
    $encoded = json_encode($value, $options, $depth);
    switch (json_last_error()) {
        case JSON_ERROR_NONE:
            return $encoded;
        case JSON_ERROR_DEPTH:
            return 'Maximum stack depth exceeded'; // or trigger_error() or throw new Exception()
        case JSON_ERROR_STATE_MISMATCH:
            return 'Underflow or the modes mismatch'; // or trigger_error() or throw new Exception()
        case JSON_ERROR_CTRL_CHAR:
            return 'Unexpected control character found';
        case JSON_ERROR_SYNTAX:
            return 'Syntax error, malformed JSON'; // or trigger_error() or throw new Exception()
        case JSON_ERROR_UTF8:
            // Second failure after the strings were already converted: give up.
            if ($utfErrorFlag) {
                return 'UTF8 encoding error'; // or trigger_error() or throw new Exception()
            }
            // Convert every string to UTF-8 and retry exactly once.
            return safe_json_encode(utf8ize($value), $options, $depth, true);
        default:
            return 'Unknown error'; // or trigger_error() or throw new Exception()
    }
}

// Recursively converts every string in $mixed from ISO-8859-1 to UTF-8.
function utf8ize($mixed) {
    if (is_array($mixed)) {
        foreach ($mixed as $key => $value) {
            $mixed[$key] = utf8ize($value);
        }
    } else if (is_string($mixed)) {
        return utf8_encode($mixed);
    }
    return $mixed;
}

In my application utf8_encode() works better than iconv().
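One caveat with utf8ize(): utf8_encode() treats every string as ISO-8859-1, so a string that is already valid UTF-8 gets double-encoded (é turns into Ã©). Below is a minimal sketch of a variant that only converts strings failing a UTF-8 validity check. It assumes the mbstring extension is loaded and that any non-UTF-8 input really is ISO-8859-1; the name utf8ize_if_needed and the sample data are made up for this illustration.

```php
<?php
// Hypothetical variant of utf8ize(): leaves valid UTF-8 strings alone.
// Assumption: anything that is not valid UTF-8 is ISO-8859-1.
function utf8ize_if_needed($mixed) {
    if (is_array($mixed)) {
        foreach ($mixed as $key => $value) {
            $mixed[$key] = utf8ize_if_needed($value);
        }
        return $mixed;
    }
    if (is_string($mixed) && !mb_check_encoding($mixed, 'UTF-8')) {
        return mb_convert_encoding($mixed, 'UTF-8', 'ISO-8859-1');
    }
    return $mixed;
}

$data = array(
    'latin1' => "caf\xE9",        // ISO-8859-1 bytes, invalid as UTF-8
    'utf8'   => "caf\xC3\xA9",    // already UTF-8, must stay untouched
    'nested' => array("na\xEFve"),
);

$clean = utf8ize_if_needed($data);
$json = json_encode($clean);
echo $json, PHP_EOL;
```

If the data can arrive in more than two encodings, a validity check alone is not enough and the source encoding has to be known.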

Solution 2 - Php

You need one simple line of code:

$input = iconv('UTF-8', 'UTF-8//IGNORE', utf8_encode($input));
$json = json_decode($input);

Credit: Sang Le, my teammate, gave me this code. Yeah!

Solution 3 - Php

The iconv function is pretty worthless unless you can guarantee the input is valid. Use mb_convert_encoding instead.

mb_convert_encoding($value, "UTF-8", "auto");

You can get more explicit than "auto", and even specify a comma-separated list of expected input encodings.

Most importantly, invalid characters will be handled without causing the entire string to be discarded (unlike iconv).
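As a rough illustration of the more explicit form, the sketch below passes a candidate list instead of "auto". It assumes the mbstring extension is loaded and that the input is either UTF-8 or ISO-8859-1; the sample strings are made up.

```php
<?php
// Candidate encodings: mbstring picks the one that fits the input.
$candidates = 'UTF-8, ISO-8859-1';

$latin1 = "caf\xE9 au lait";      // ISO-8859-1 bytes, invalid as UTF-8
$utf8   = "caf\xC3\xA9 au lait";  // already valid UTF-8

$fromLatin1 = mb_convert_encoding($latin1, 'UTF-8', $candidates);
$fromUtf8   = mb_convert_encoding($utf8, 'UTF-8', $candidates);

// Both results should now be valid UTF-8 and safe to pass to json_encode().
var_dump(mb_check_encoding($fromLatin1, 'UTF-8'));
var_dump(mb_check_encoding($fromUtf8, 'UTF-8'));
echo json_encode(array($fromLatin1, $fromUtf8)), PHP_EOL;
```

Detection from a list is a heuristic, so keep the list as short as your data allows.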

Solution 4 - Php

There is no magic bullet which will "solve" encoding problems; you have to understand what encoding you have, and then convert it.

Computers ultimately transmit and store binary data; to make that binary data useful, we devise codes that say "this string of binary represents an 'a', that one represents a 'b', and this other one represents the man-in-business-suit-levitating emoji 🕴️". UTF-8 (simplifying a little bit) is just one of those encodings. Others have names like ASCII, ISO-8859-1, Windows Code Page 1252, and Shift-JIS.

If all you know is that a string is "not UTF-8" you cannot make it into UTF-8 because you don't know if the first character is supposed to be an "a", or a "🕴️".

If you do know what encoding your string is in, you can use any of three functions in PHP; depending on your installation of PHP, some or all might be unavailable, but they are what you want.
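The list of functions did not survive on this page; the later mention of mb_convert_encoding suggests it is one of them. As a sketch only, assuming the other two are iconv and UConverter::transcode (from the intl extension), and assuming the input is known to be ISO-8859-1, the conversion could look like this. Each call is guarded because, as noted, not every installation has every extension.

```php
<?php
// Assumption: the input is known to be ISO-8859-1, where byte 0xE9 is "é".
$input = "caf\xE9";

$results = array();
if (function_exists('iconv')) {
    // iconv(from, to, string)
    $results['iconv'] = iconv('ISO-8859-1', 'UTF-8', $input);
}
if (function_exists('mb_convert_encoding')) {
    // mb_convert_encoding(string, to, from)
    $results['mbstring'] = mb_convert_encoding($input, 'UTF-8', 'ISO-8859-1');
}
if (class_exists('UConverter')) {
    // UConverter::transcode(string, to, from)
    $results['intl'] = UConverter::transcode($input, 'UTF-8', 'ISO-8859-1');
}

foreach ($results as $name => $converted) {
    echo $name, ': ', json_encode($converted), PHP_EOL;
}
```

Note the differing argument order: iconv takes the source encoding first, while the other two take the target encoding first.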

Note that mb_convert_encoding lets you leave out the argument that states the current encoding. This does not automatically work out the correct encoding; it just uses a global setting which you control.

There are two other functions provided in PHP which are badly named: utf8_encode and utf8_decode. These are just extremely limited versions of the three functions above: they can only convert from ISO-8859-1 to UTF-8 and back. If your string is not in that encoding (and you don't want it to be) these functions will not help you. They might make your errors go away, but that's not the same as fixing your data. Both functions are deprecated as of PHP 8.2.

Solution 5 - Php

Decoding JSON in PHP

Decoding JSON is as simple as encoding it. PHP provides you with a handy json_decode function that handles everything for you. If you pass a valid JSON object string into the function, you get an object of type stdClass back. Here’s a short example:

<?php
$string = '{"foo": "bar", "cool": "attr"}';
$result = json_decode($string);

// Result: object(stdClass)#1 (2) { ["foo"]=> string(3) "bar" ["cool"]=> string(4) "attr" }
var_dump($result);

// Prints "bar"
echo $result->foo;

// Prints "attr"
echo $result->cool;
?>

If you want to get an associative array back instead, set the second parameter to true:

<?php
$string = '{"foo": "bar", "cool": "attr"}';
$result = json_decode($string, true);

// Result: array(2) { ["foo"]=> string(3) "bar" ["cool"]=> string(4) "attr" }
var_dump($result);

// Prints "bar"
echo $result['foo'];

// Prints "attr"
echo $result['cool'];
?>

If you expect a very large nested JSON document, you can limit the recursion depth to a certain level. The function stops parsing and returns null if the document is nested deeper than the given depth.

<?php
$string = '{"foo": {"bar": {"cool": "value"}}}';
$result = json_decode($string, true, 2);

// Result: null
var_dump($result);
?>

The last argument is a bitmask of options, as in json_encode. When this was written only one flag was supported: JSON_BIGINT_AS_STRING, which converts big integers to strings and is available from PHP 5.4 upwards. PHP 7.2 and 7.3 added more, such as JSON_INVALID_UTF8_IGNORE and JSON_THROW_ON_ERROR.

We’ve been working with valid JSON strings until now (aside from the null depth error). The next part shows you how to deal with errors.

Error-Handling and Testing

If the JSON value could not be parsed or a nesting level deeper than the given (or default) depth is found, NULL is returned from json_decode. This means that, by default, no exception is raised by json_encode/json_decode directly.

So how can we identify the cause of the error? The json_last_error function helps here. It returns an integer error code that can be one of the following constants:

JSON_ERROR_NONE: No error has occurred.
JSON_ERROR_DEPTH: The maximum stack depth has been exceeded.
JSON_ERROR_STATE_MISMATCH: Invalid or malformed JSON.
JSON_ERROR_CTRL_CHAR: Control character error, possibly incorrectly encoded.
JSON_ERROR_SYNTAX: Syntax error.
JSON_ERROR_UTF8: Malformed UTF-8 characters, possibly incorrectly encoded (since PHP 5.3.3).

With this information at hand, we can write a quick parsing helper method that raises a descriptive exception when an error is found.

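<?php
class JsonHandler {

    protected static $_messages = array(
        JSON_ERROR_NONE => 'No error has occurred',
        JSON_ERROR_DEPTH => 'The maximum stack depth has been exceeded',
        JSON_ERROR_STATE_MISMATCH => 'Invalid or malformed JSON',
        JSON_ERROR_CTRL_CHAR => 'Control character error, possibly incorrectly encoded',
        JSON_ERROR_SYNTAX => 'Syntax error',
        JSON_ERROR_UTF8 => 'Malformed UTF-8 characters, possibly incorrectly encoded'
    );

    public static function encode($value, $options = 0) {
        $result = json_encode($value, $options);

        // Check the error code, not the result: valid output such as "0" is falsy.
        if (json_last_error() === JSON_ERROR_NONE) {
            return $result;
        }

        throw new RuntimeException(static::message(json_last_error()));
    }

    public static function decode($json, $assoc = false) {
        $result = json_decode($json, $assoc);

        // Valid JSON such as "null", "0", "false" or "[]" decodes to a falsy value.
        if (json_last_error() === JSON_ERROR_NONE) {
            return $result;
        }

        throw new RuntimeException(static::message(json_last_error()));
    }

    // Falls back to a generic message for error codes missing from the table.
    protected static function message($code) {
        return isset(static::$_messages[$code]) ? static::$_messages[$code] : 'Unknown error';
    }

}
?>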

We can now use the exception testing function from the last post about exception handling to test if our exception works correctly.

// Returns "Correctly thrown"
assertException("Syntax error", function() {
    $string = '{"foo": {"bar": {"cool": NONUMBER}}}';
    $result = JsonHandler::decode($string);
});
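The assertException function comes from another post and is not reproduced here. The following is only a guess at what such a helper could look like, inferred from the call above; it is a hypothetical stand-in, not the original. To keep the snippet runnable on its own, it uses the built-in JSON_THROW_ON_ERROR flag (assuming PHP 7.3 or later) instead of JsonHandler.

```php
<?php
// Hypothetical helper: runs $callback and reports whether it threw an
// exception carrying the expected message.
function assertException($expectedMessage, callable $callback) {
    try {
        $callback();
    } catch (Exception $e) {
        return $e->getMessage() === $expectedMessage
            ? 'Correctly thrown'
            : 'Unexpected message: ' . $e->getMessage();
    }
    return 'No exception thrown';
}

$outcome = assertException('Syntax error', function () {
    $string = '{"foo": {"bar": {"cool": NONUMBER}}}';
    // With JSON_THROW_ON_ERROR, json_decode throws a JsonException on failure.
    json_decode($string, false, 512, JSON_THROW_ON_ERROR);
});

echo $outcome, PHP_EOL;
```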

Note that since PHP 5.3.3, there is a JSON_ERROR_UTF8 error returned when an invalid UTF-8 character is found in the string. This is a strong indication that a charset other than UTF-8 is used. If the incoming string is not under your control and is ISO-8859-1, you can use the utf8_encode function to convert it to UTF-8 before handing it to json_encode.

<?php echo json_encode(utf8_encode($payload)); ?>

I have used this in the past to convert data loaded from a legacy MSSQL database that didn’t use UTF-8.
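If the source encoding is unknown and losing the offending bytes is acceptable, PHP 7.2 and later offer two json_decode flags that avoid JSON_ERROR_UTF8 without any conversion step. A short sketch, assuming PHP 7.2 or newer; the sample payload is made up.

```php
<?php
// \xE9 is a lone ISO-8859-1 byte, which is invalid as UTF-8.
$json = "{\"name\": \"caf\xE9 au lait\"}";

// Default behaviour: decoding fails with JSON_ERROR_UTF8.
$plain = json_decode($json, true);
$plainError = json_last_error();

// Drop the invalid byte.
$ignored = json_decode($json, true, 512, JSON_INVALID_UTF8_IGNORE);

// Replace the invalid byte with U+FFFD (the Unicode replacement character).
$substituted = json_decode($json, true, 512, JSON_INVALID_UTF8_SUBSTITUTE);

var_dump($plain, $plainError === JSON_ERROR_UTF8);
var_dump($ignored['name'], $substituted['name']);
```

Neither flag recovers the original character, so converting from the real source encoding is still preferable when that encoding is known.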


Solution 6 - Php

I solved it by adding another 'if' to handle objects in the 'utf8ize' function by @Konstantin (I have not used the other function):

function utf8ize($mixed) {
    if (is_array($mixed)) {
        foreach ($mixed as $key => $value) {
            $mixed[$key] = utf8ize($value);
        }
    } else if (is_string ($mixed)) {
        return utf8_encode($mixed);
    } else if (is_object($mixed)) {
        $a = (array)$mixed; // from object to array
        return utf8ize($a);
    }
    return $mixed;
}

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | James Harzs | View Question on Stackoverflow
Solution 1 - Php | Konstantin | View Answer on Stackoverflow
Solution 2 - Php | Andy Truong | View Answer on Stackoverflow
Solution 3 - Php | Rich Remer | View Answer on Stackoverflow
Solution 4 - Php | IMSoP | View Answer on Stackoverflow
Solution 5 - Php | Muhammad Tahir | View Answer on Stackoverflow
Solution 6 - Php | Fil | View Answer on Stackoverflow