Javascript parse error on '\u2028' unicode character

Javascript, Unicode

Javascript Problem Overview


Whenever I use the \u2028 character literal in my JavaScript source with the content type set to "text/html; charset=utf-8", I get a JavaScript parse error.

Example:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
   "http://www.w3.org/TR/html4/strict.dtd">

<html lang="en">
<head>
	<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
	<title>json</title>
	
	<script type="text/javascript" charset="utf-8">
	var string = '
    ';
	</script>
</head>
<body>

</body>
</html>

If the <meta http-equiv> is left out, everything works as expected. I've tested this on Safari and Firefox; both exhibit the same problem.

Any ideas on why this is happening and how to properly fix this (without removing the encoding)?

Edit: After some more research, the specific problem turned out to be that the character was returned via JSONP. The response was then interpreted by the browser, which reads U+2028 as a newline and throws an error about an invalid newline in a string.

Javascript Solutions


Solution 1 - Javascript

Yes, it's a feature of the JavaScript language, documented in the ECMAScript standard (3rd edition section 7.3), that the U+2028 and U+2029 characters count as line endings. Consequently a JavaScript parser will treat any unencoded U+2028/9 character in the same way as a newline. Since you can't put a newline inside a string literal, you get a syntax error.
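The line-terminator behaviour can be observed directly. The following is a small sketch, assuming an engine where `new Function` is available (a browser console or Node.js); the source strings are made up for illustration. Engines that implement ECMAScript 2019 or later accept raw U+2028/9 inside string literals, so the sketch uses a comment and a regular expression instead, where U+2028 is treated as a line ending in any edition.

```javascript
// U+2028 written as an escape, so this example itself contains no raw separator.
const LS = '\u2028';

// A single-line comment ends at the first line terminator. Because U+2028
// counts as one, the `return` after it is parsed as code, not as comment text.
const viaLineSeparator = new Function('// comment' + LS + 'return 42;');
const viaNewline = new Function('// comment\nreturn 42;');

console.log(viaLineSeparator()); // same result as viaNewline()
console.log(viaNewline());

// Regular expressions use the same definition of a line terminator:
// with the m flag, ^ matches right after U+2028.
const startsNewLine = /^b/m.test('a' + LS + 'b');
console.log(startsNewLine);
```

If U+2028 were an ordinary character, the first function body would be one long comment and the call would return undefined.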

This is an unfortunate oversight in the design of JSON: it is not actually a proper subset of JavaScript. Raw U+2028/9 characters are valid in string literals in JSON, and will be accepted by JSON.parse, but not so in JavaScript itself.

Hence it is only safe to generate JavaScript code using a JSON encoder if you're sure it explicitly \u-escapes those characters. Some do, some don't; many \u-escape all non-ASCII characters, which avoids the problem.
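A minimal sketch of that escaping step, assuming the payload is produced with the built-in `JSON.stringify`, which leaves U+2028/9 unescaped. The names `toScriptSafeJson`, `jsonpBody` and `handleData` are hypothetical and not part of any library.

```javascript
// Serialize a value to JSON text that is also safe to embed in JavaScript source.
// Assumes `value` is JSON-serializable (JSON.stringify returns undefined otherwise).
function toScriptSafeJson(value) {
  return JSON.stringify(value)
    .replace(/\u2028/g, '\\u2028')  // raw line separator -> six-character escape
    .replace(/\u2029/g, '\\u2029'); // raw paragraph separator -> six-character escape
}

// Wrap the escaped JSON in a JSONP-style callback invocation.
function jsonpBody(callbackName, value) {
  return callbackName + '(' + toScriptSafeJson(value) + ');';
}

const body = jsonpBody('handleData', { text: 'line one\u2028line two' });
console.log(body);
```

In valid JSON these two characters can only occur inside string literals, so replacing them across the whole serialized text does not change the value a parser reads back.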

Solution 2 - Javascript

Alright, to answer my own question.

Normally a JSON parser strips out or escapes these problem characters. Because I was retrieving JSONP, I wasn't using a JSON parser; instead, the browser tried to parse the JSON itself as soon as the callback was called.

The only way to fix it was to make sure the server never returns these characters when a JSONP resource is requested.

P.S. My question was about U+2028, but according to Douglas Crockford's json2 library, all of the following characters can cause these problems:

'\u0000\u00ad\u0600-\u0604\u070f\u17b4\u17b5\u200c-\u200f\u2028-\u202f\u2060-\u206f\ufeff\ufff0-\uffff'
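To show how such a character list can be applied, here is a sketch that replaces every character in that class with its \uXXXX escape. The names `problemCharacters` and `escapeProblemCharacters` are hypothetical, and the code is an illustration rather than a copy of json2's implementation. It assumes the input is JSON text, where these characters can only appear inside string literals.

```javascript
// Character class taken from the list above, wrapped in a global regular expression.
const problemCharacters = /[\u0000\u00ad\u0600-\u0604\u070f\u17b4\u17b5\u200c-\u200f\u2028-\u202f\u2060-\u206f\ufeff\ufff0-\uffff]/g;

// Replace each matching character with a backslash-u escape of its code unit.
function escapeProblemCharacters(jsonText) {
  return jsonText.replace(problemCharacters, function (ch) {
    return '\\u' + ch.charCodeAt(0).toString(16).padStart(4, '0');
  });
}

const escaped = escapeProblemCharacters(JSON.stringify(['a\u2028b', '\ufeffc']));
console.log(escaped);
```

Parsing the escaped text with `JSON.parse` yields the original strings, so the server can send the escaped form without changing the data.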

Solution 3 - Javascript

Could you just write the escape sequence \u2028 instead of the real character? Because U+2028 is the Unicode line separator, browsers treat it as a real line break character, like \n.

We cannot write

x = "

"

Right? But we can write x = "\n", so it might be the same concept.
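As a quick check of this idea, the sketch below builds a small piece of source text that contains the six-character escape sequence rather than the raw character, and evaluates it. The variable names are made up for illustration, and `new Function` stands in for the browser parsing a script.

```javascript
// Source text containing a backslash, "u" and four hex digits: no raw separator.
const source = "return 'before\\u2028after';";

// Parsing succeeds because the string literal contains only ordinary characters.
const result = new Function(source)();

console.log(source.includes('\u2028')); // false: the source has no raw U+2028
console.log(result.includes('\u2028')); // true: the resulting string does
console.log(result.length);
```

The escape is resolved when the literal is evaluated, so the program still ends up with the real U+2028 character in the string.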

Solution 4 - Javascript

Well, that makes sense, since you are telling the browser that the HTML and script are both using UTF-8, but then you specify a character that is not UTF-8 encoded. When you specify "charset=UTF-8", you are responsible for making sure the bytes transmitted to the browser are actually UTF-8. The web server and browser will not do it for you in this situation.
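One way to check what is actually being transmitted is to compare the bytes on the wire with the UTF-8 encoding of the character. The sketch below assumes an environment that provides the standard `TextEncoder` and `TextDecoder` globals (modern browsers and recent Node.js); the variable names are made up for illustration.

```javascript
// Encode U+2028 as UTF-8 and print the bytes in hexadecimal.
const bytes = Array.from(new TextEncoder().encode('\u2028'));
const hex = bytes.map((b) => b.toString(16).padStart(2, '0')).join(' ');
console.log(hex); // e2 80 a8

// Decode the same bytes strictly; `fatal: true` makes invalid UTF-8 throw
// instead of being silently replaced.
const decoded = new TextDecoder('utf-8', { fatal: true })
  .decode(new Uint8Array([0xe2, 0x80, 0xa8]));
console.log(decoded === '\u2028');
```

If the served file contains the byte sequence e2 80 a8 at that position, the character is UTF-8 encoded as declared.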

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type              Original Author
Question                  klaaspieter
Solution 1 - Javascript   bobince
Solution 2 - Javascript   klaaspieter
Solution 3 - Javascript   YOU
Solution 4 - Javascript   Remy Lebeau