Why does 2+ 40 equal 42?

Tags: JavaScript, Unicode

Javascript Problem Overview


I was baffled when a colleague showed me this line of JavaScript alerting 42.

alert(2+ 40);

It quickly turned out that what looks like a minus sign is actually an arcane Unicode character with clearly different semantics.

This left me wondering why that character doesn't produce a syntax error when the expression is parsed. I'd also like to know if there are more characters behaving like this.

Javascript Solutions


Solution 1 - Javascript

That character is "OGHAM SPACE MARK", which is a space character. So the code is equivalent to alert(2+ 40).
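The equivalence can be checked with a short sketch. The variable names are illustrative, and the character is written as the escape \u1680 so that it stays visible in the source:

```javascript
// U+1680 OGHAM SPACE MARK, written as an escape so it cannot be mistaken for a dash.
const oghamSpace = '\u1680';

// Rebuild the expression from the question and let the engine parse it.
const source = 'return 2+' + oghamSpace + '40;';
const result = new Function(source)();

console.log(oghamSpace.codePointAt(0).toString(16)); // "1680"
console.log(result); // 42
```

Because the parser skips the character as white space, only 2+40 is left to evaluate.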

> I'd also like to know if there are more characters behaving like this.

Any Unicode character in the Zs class is a white space character in JavaScript, but there don't seem to be that many.
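As a rough way to list them, the sketch below scans the Basic Multilingual Plane with a Unicode property escape. It assumes an engine that supports \p{...} in regular expressions (ES2018 or later), and the exact set depends on the Unicode version the engine ships with:

```javascript
// Matches a single character whose Unicode general category is Zs (Space Separator).
const spaceSeparator = /^\p{Zs}$/u;

const found = [];
for (let cp = 0; cp <= 0xffff; cp++) {
  if (spaceSeparator.test(String.fromCharCode(cp))) {
    // Format the code point as U+XXXX.
    found.push('U+' + cp.toString(16).toUpperCase().padStart(4, '0'));
  }
}

console.log(found.join(' '));
```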

However, JavaScript also allows Unicode characters in identifiers, which lets you use interesting variable names like ಠ_ಠ.
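A small sketch of the difference between the two cases: a letter such as ಠ (U+0CA0) can be part of a name, while a space character such as U+1680 cannot. The helper parses is made up for this illustration:

```javascript
// ಠ (U+0CA0, KANNADA LETTER TTHA) is a letter, so it may start an identifier.
const ಠ_ಠ = 'look of disapproval';
console.log(ಠ_ಠ);

// Hypothetical helper: reports whether a piece of source text parses as a function body.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    return false;
  }
}

// U+1680 separates tokens like any other space, so "a" and "b" are two names here.
console.log(parses('var a\u1680b = 1;')); // false
console.log(parses('var ab = 1;')); // true
```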

Solution 2 - Javascript

After reading the other answers, I wrote a simple script to find all Unicode characters in the range U+0000–U+FFFF that behave like white spaces. As it seems, there are 26 or 27 of them depending on the browser, with disagreements about U+0085 and U+FFFE.

Note that most of these characters just look like a regular white space.

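// Returns true if ch can sit between an operator and its operand without changing the result.
function isSpace(ch)
{
    try
    {
        // If ch is skipped by the parser, the generated function body is just "return 2 + 2".
        return Function('return 2 +' + ch + ' 2')() === 4;
    }
    catch(e)
    {
        // A syntax error means ch is not treated as white space.
        return false;
    }
}

// Test every code unit from U+0000 to U+FFFF.
for (var i = 0; i <= 0xffff; ++i)
{
    var ch = String.fromCharCode(i);
    if (isSpace(ch))
    {
        // Print the code point as four hex digits, followed by the character itself.
        document.body.appendChild(document.createElement('DIV')).textContent = 'U+' + ('000' + i.toString(16).toUpperCase()).slice(-4) + '    "' + ch + '"';
    }
}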

div { font-family: monospace; }
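For running the same probe outside a browser, here is a DOM-free sketch that also compares the result with the regular expression class \s. The names behavesLikeSpace, ignored and mismatches are made up for this example, and whether the two tests agree for every character is an assumption that may not hold on every engine:

```javascript
// Same probe as isSpace above, restated so that this sketch is self-contained.
function behavesLikeSpace(ch) {
  try {
    return new Function('return 2 +' + ch + ' 2')() === 4;
  } catch (e) {
    return false;
  }
}

const ignored = [];     // code units the parser skips
const mismatches = [];  // code units where the probe and \s disagree

for (let i = 0; i <= 0xffff; ++i) {
  const ch = String.fromCharCode(i);
  const probe = behavesLikeSpace(ch);
  if (probe) ignored.push(i);
  if (probe !== /\s/.test(ch)) mismatches.push(i);
}

const hex = (n) => 'U+' + n.toString(16).toUpperCase().padStart(4, '0');
console.log('Skipped by the parser:', ignored.map(hex).join(' '));
console.log('Disagreements with \\s:', mismatches.map(hex).join(' ') || 'none');
```

Collecting the code units in arrays instead of writing DIV elements makes it easy to compare the output of different engines.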

Solution 3 - Javascript

The character that you are using is actually longer than the real minus sign (a hyphen).

  (U+1680, what you are using)
- (U+002D, the actual minus sign)

The top is what you are using, the bottom is what the minus sign should be. You do seem to know that already, so now let's see why JavaScript does this.

The character that you use is actually the Ogham space mark, which is a whitespace character. It is interpreted the same way as a regular space, which means that your statement looks like alert(2+ 40) to JavaScript.

There are other characters like this in JavaScript. You can see a full list in Wikipedia's article on whitespace characters.


Something interesting I noticed about this character is the way that Google Chrome (and possibly other browsers) renders it in the top bar of the page.


It is a block with 1680 inside of it. That is the hexadecimal Unicode code point of the Ogham space mark (U+1680). It appears to be just my machine doing this, but it is a strange thing.


I decided to try this out in other languages to see what happens and these are the results that I got.


Languages it doesn't work in:

Python 2 & 3

>> 2+ 40
  File "<stdin>", line 1
    2+ 40
        ^
SyntaxError: invalid character in identifier

Ruby

>> 2+ 40
NameError: undefined local variable or method ` 40' for main:Object
    from (irb):1
    from /home/michaelpri/.rbenv/versions/2.2.2/bin/irb:11:in `<main>'

Java (inside the main method)

>> System.out.println(2+ 40);
Main.java:3: error: illegal character: \5760
            System.out.println(2+?40);
                                 ^
Main.java:3: error: ';' expected
            System.out.println(2+?40);
                                  ^
Main.java:3: error: illegal start of expression
            System.out.println(2+?40);
                                    ^
3 errors

PHP

>> 2+ 40;
Use of undefined constant  40 - assumed ' 40' :1

C

>> 2+ 40
main.c:1:1: error: expected identifier or '(' before numeric constant
 2+ 40
 ^
main.c:1:1: error: stray '\341' in program
main.c:1:1: error: stray '\232' in program
main.c:1:1: error: stray '\200' in program

exit status 1

Go

>> 2+ 40
can't load package: package .: 
main.go:1:1: expected 'package', found 'INT' 2
main.go:1:3: illegal character U+1680

exit status 1

Perl 5

>> perl -e'2+ 40'
Unrecognized character \xE1; marked by <-- HERE after 2+<-- HERE near column 3 at -e line 1.

Languages it does work in:

Scheme

>> (+ 2 40)
=> 42

C# (inside the Main() method)

Console.WriteLine(2+ 40);

Output: 42

Perl 6

>> ./perl6 -e'say 2+ 40' 
42

Solution 4 - Javascript

I guess it has something to do with the fact that, for some strange reason, it is classified as whitespace:

$ unicode  
U+1680 OGHAM SPACE MARK
UTF-8: e1 9a 80  UTF-16BE: 1680  Decimal: &#5760;
  ( )
Uppercase: U+1680
Category: Zs (Separator, Space)
Bidi: WS (Whitespace)
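The numbers in this output match the ones in the compiler errors quoted in Solution 3: Java reports the decimal value (\5760), and C reports the three UTF-8 bytes in octal (\341, \232, \200). A small sketch of the conversions, assuming a runtime that provides TextEncoder (current browsers and Node.js do):

```javascript
const ch = '\u1680'; // OGHAM SPACE MARK

// Code point in hexadecimal and in decimal.
const codePoint = ch.codePointAt(0);
console.log(codePoint.toString(16)); // "1680"
console.log(codePoint);              // 5760

// UTF-8 encoding of the character, shown in hexadecimal and in octal.
const utf8 = Array.from(new TextEncoder().encode(ch));
const utf8Hex = utf8.map((b) => b.toString(16)).join(' ');
const utf8Octal = utf8.map((b) => '\\' + b.toString(8)).join(' ');
console.log(utf8Hex);   // "e1 9a 80"
console.log(utf8Octal); // "\341 \232 \200"
```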

Solution 5 - Javascript

> I'd also like to know if there are more characters behaving like this.

I seem to remember reading a piece a while back about mischievously replacing semicolons (U+003B) in someone's code with U+037E, which is the Greek question mark.

The two look the same (to the extent that I believe the Greeks themselves use U+003B), but the article stated that the Greek question mark would not work in code.

Some more information on this from Wikipedia is here: https://en.wikipedia.org/wiki/Question_mark#Greek_question_mark

And here is a (closed) question on SO itself about using this as a prank, although AFAIR it is not where I originally read it: https://stackoverflow.com/questions/26965331/javascript-prank-joke
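A quick way to see the effect is to feed both variants to the parser. This is only a sketch, with the helper parses made up for the example. The last line relies on U+037E being canonically equivalent to U+003B, so NFC normalization turns one into the other:

```javascript
const greekQuestionMark = '\u037E'; // looks like ";" but is a different character

// Hypothetical helper: reports whether a piece of source text parses as a function body.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    return false;
  }
}

console.log(parses('var a = 1; var b = 2;')); // true
console.log(parses('var a = 1' + greekQuestionMark + ' var b = 2' + greekQuestionMark)); // false

// Normalizing the text replaces the Greek question mark with a real semicolon.
console.log(greekQuestionMark.normalize('NFC') === ';'); // true
```

Unlike the Ogham space mark, this character is punctuation rather than white space, so the parser rejects it instead of skipping it.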

Solution 6 - Javascript

Many languages won't compile this expression, but I was curious what Rust's compiler had to say on the topic. It is notoriously strict but will often give us knowledge and wisdom with loving kindness.

So I asked it to compile this:

fn main() {
	println!("{}", (2+ 40));
}

And the compiler replied:

error: unknown start of token: \u{1680}
  |
  |     println!("{}", (2+ 40));
  |                       ^
  |
help: Unicode character ' ' (Ogham Space mark) looks like ' ' (Space), but it is not


JavaScript, on the other hand, (tested with the latest and most commonly used browser today) seems to be pretty chill about that character and simply ignores it.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | GOTO 0 | View Question on Stackoverflow
Solution 1 - Javascript | Felix Kling | View Answer on Stackoverflow
Solution 2 - Javascript | GOTO 0 | View Answer on Stackoverflow
Solution 3 - Javascript | michaelpri | View Answer on Stackoverflow
Solution 4 - Javascript | PSkocik | View Answer on Stackoverflow
Solution 5 - Javascript | noonand | View Answer on Stackoverflow
Solution 6 - Javascript | at54321 | View Answer on Stackoverflow