Python: Why does ("hello" is "hello") evaluate as True?

Python, Identity, String Comparison, Object Comparison

Python Problem Overview


Why does "hello" is "hello" produce True in Python?

I read the following here:

> If two string literals are equal, they have been put to same memory location. A string is an immutable entity. No harm can be done.

So there is one and only one place in memory for every Python string? Sounds pretty strange. What's going on here?

Python Solutions


Solution 1 - Python

Python (like Java, C, C++ and .NET) uses string pooling, also called interning. The interpreter realises that the two "hello" literals are identical, so as an optimisation it stores a single object in memory and reuses it.
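A minimal sketch of this, assuming CPython 3 (other implementations may behave differently): when the same string literal appears twice in one compiled code object, the compiler keeps a single constant, and both names end up referring to it. The names `first` and `second` are illustrative only.

```python
# Compile a tiny module that uses the literal "hello" twice.
source = 'first = "hello"\nsecond = "hello"\n'
code = compile(source, "<demo>", "exec")

# The compiler stores equal constants only once per code object.
print(code.co_consts.count("hello"))  # 1 in CPython

namespace = {}
exec(code, namespace)

# Both names refer to that single stored constant (a CPython detail).
print(namespace["first"] is namespace["second"])  # True in CPython
```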

Another goodie: "hell" + "o" is "hello" ==> True
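The concatenation case works because CPython folds "hell" + "o" into the single constant "hello" at compile time, before the is check ever runs. A small sketch to observe this, assuming CPython 3; the optimizer is an implementation detail and not guaranteed by the language:

```python
# Compile only the expression; CPython's optimizer replaces "hell" + "o"
# with the folded constant "hello".
code = compile('"hell" + "o"', "<demo>", "eval")
print(code.co_consts)

# Building the same text at runtime skips that compile-time step,
# so identity with the literal is no longer something to count on.
runtime = "".join(["hell", "o"])
print(runtime == "hello")  # True: same characters
```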

Solution 2 - Python

> So there is one and only one place in memory for every Python string?

No, only the ones the interpreter has decided to optimise. That decision follows a policy that isn't part of the language specification and may change between CPython versions.

E.g., on my install (Python 2.6.2, Linux):

>>> 'X'*10 is 'X'*10
True
>>> 'X'*30 is 'X'*30
False

Similarly for ints:

>>> 2**8 is 2**8
True
>>> 2**9 is 2**9
False
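The int results match CPython's cache of small integers: values from -5 to 256 are preallocated and reused, so 2**8 (256) comes back as one shared object while 2**9 (512) does not. A hedged sketch, assuming CPython (the cache range is an implementation detail). The ints are parsed with int() at runtime so that the compiler cannot fold them into one shared constant:

```python
import platform

# int() parses at runtime, so the compiler cannot fold these
# into one shared constant the way it can with 2**8 in source code.
small_a, small_b = int("256"), int("256")
big_a, big_b = int("512"), int("512")

print(small_a == small_b, big_a == big_b)  # True True

if platform.python_implementation() == "CPython":
    # 256 falls inside CPython's small-int cache (-5..256): one shared object.
    print(small_a is small_b)
    # 512 lies outside the cache: each int() call creates a new object.
    print(big_a is big_b)
```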

So don't rely on 'string' is 'string': even within the C implementation alone, it isn't safe.
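In practice, == is the right tool for comparing string contents, and is belongs to identity checks such as x is None. A short sketch, assuming CPython 3; read_word is a hypothetical helper standing in for text that arrives at runtime (from input, a file, and so on):

```python
def read_word() -> str:
    # Hypothetical stand-in for text that is built at runtime.
    return "".join(["X"] * 30)

word = read_word()
literal = "X" * 30

print(word == literal)  # True: compares characters, always reliable
print(word is literal)  # implementation-dependent; do not rely on it

# Identity is appropriate for singletons such as None.
result = None
print(result is None)   # True
```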

Solution 3 - Python

Literal strings are probably grouped based on their hash or something similar. Two identical literal strings are stored in the same memory, and both references point to that one copy.

 Memory        Code
-------
|          myLine = "hello"
|        /
|hello  <
|        \
|          myLine = "hello"
-------
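The "grouped by hash" idea can be illustrated with a toy interning table built on a dict, which is itself a hash table: a lookup hashes the key and then checks equality. The Interner class below is a hypothetical illustration, not a real API and not CPython's actual implementation:

```python
class Interner:
    """Hypothetical toy interning table (illustration only)."""

    def __init__(self) -> None:
        # Maps each distinct string to the first object seen with that value.
        self._table: dict[str, str] = {}

    def intern(self, s: str) -> str:
        # Return the stored object if an equal string was seen before;
        # otherwise store this one and return it.
        return self._table.setdefault(s, s)


pool = Interner()
a = pool.intern("".join(["hel", "lo"]))
b = pool.intern("".join(["he", "llo"]))
print(a == b, a is b)  # True True: both come from the same table entry
```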

Solution 4 - Python

The is operator returns True only if both operands are the same object. Your result is a consequence of this and of the behaviour described in the quote.

String literals are interned, meaning each one is compared against already-known strings. If an identical string is already known, the literal reuses that object instead of creating a new one. Thus both literals become the same object, and the expression is True.
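Python exposes this lookup explicitly as sys.intern, which returns the already-interned string when an equal one exists and otherwise records the new one. A minimal sketch showing two equal strings built separately at runtime becoming one shared object after interning:

```python
import sys

# Two equal strings built separately at runtime.
a = "".join(["hello", " ", "world"])
b = " ".join(["hello", "world"])

print(a == b)  # True: same characters

# After interning, both names refer to one shared object.
a = sys.intern(a)
b = sys.intern(b)
print(a is b)  # True
```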

Solution 5 - Python

The Python interpreter/compiler parses the string literals, i.e. the quoted list of characters. When it does this, it can detect "I've seen this string before", and use the same representation as last time. It can do this since it knows that strings defined in this way cannot be changed.

Solution 6 - Python

Why is it strange? Since strings are immutable, it makes a lot of sense to store each one only once. .NET has the same behavior.
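Immutability is what makes the sharing harmless: an operation that appears to change a string actually builds a new object and rebinds the name, so other references to the shared string are unaffected. A small sketch, using sys.intern to force the sharing:

```python
import sys

a = sys.intern("hello")
b = sys.intern("hello")
shared = a is b   # True: one object, two names

a += " world"     # builds a new string and rebinds only `a`

print(shared)     # True
print(a)          # hello world
print(b)          # hello (the shared object is untouched)
```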

Solution 7 - Python

I think that if any two variables (not just strings) contain the same value, the value is stored only once, not twice, and both variables point to the same location. This saves memory.
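Whether equal values share storage can be checked directly with is. A minimal sketch using lists, which are mutable, so sharing one object between two equal values would let a change through one name show up through the other:

```python
x = [1, 2, 3]
y = [1, 2, 3]

print(x == y)  # True: equal values
print(x is y)  # False: two separate list objects

x.append(4)
print(y)       # [1, 2, 3]: changing x did not affect y
```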

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
--- | --- | ---
Question | Deniz Dogan | View Question on Stackoverflow
Solution 1 - Python | carl | View Answer on Stackoverflow
Solution 2 - Python | bobince | View Answer on Stackoverflow
Solution 3 - Python | Quantumplation | View Answer on Stackoverflow
Solution 4 - Python | SingleNegationElimination | View Answer on Stackoverflow
Solution 5 - Python | unwind | View Answer on Stackoverflow
Solution 6 - Python | Brian Rasmussen | View Answer on Stackoverflow
Solution 7 - Python | user250145 | View Answer on Stackoverflow