str performance in python

PythonStringPerformancePython 3.xPython 2.7

Python Problem Overview


While profiling a piece of python code (python 2.6 up to 3.2), I discovered that the str method to convert an object (in my case an integer) to a string is almost an order of magnitude slower than using string formatting.

Here is the benchmark

>>> from timeit import Timer
>>> Timer('str(100000)').timeit()
0.3145311339386332
>>> Timer('"%s"%100000').timeit()
0.03803517023435887

Does anyone know why this is the case? Am I missing something?

Python Solutions


Solution 1 - Python

'%s' % 100000 is evaluated by the compiler and is equivalent to a constant at run-time.

>>> import dis
>>> dis.dis(lambda: str(100000))
  8           0 LOAD_GLOBAL              0 (str)
              3 LOAD_CONST               1 (100000)
              6 CALL_FUNCTION            1
              9 RETURN_VALUE        
>>> dis.dis(lambda: '%s' % 100000)
  9           0 LOAD_CONST               3 ('100000')
              3 RETURN_VALUE        

% with a run-time expression is not (significantly) faster than str:

>>> Timer('str(x)', 'x=100').timeit()
0.25641703605651855
>>> Timer('"%s" % x', 'x=100').timeit()
0.2169809341430664

Do note that str is still slightly slower, as @DietrichEpp said, this is because str involves lookup and function call operations, while % compiles to a single immediate bytecode:

>>> dis.dis(lambda x: str(x))
  9           0 LOAD_GLOBAL              0 (str)
              3 LOAD_FAST                0 (x)
              6 CALL_FUNCTION            1
              9 RETURN_VALUE        
>>> dis.dis(lambda x: '%s' % x)
 10           0 LOAD_CONST               1 ('%s')
              3 LOAD_FAST                0 (x)
              6 BINARY_MODULO       
              7 RETURN_VALUE        

Of course the above is true for the system I tested on (CPython 2.7); other implementations may differ.

Solution 2 - Python

One reason that comes to mind is the fact that str(100000) involves a global lookup, but "%s"%100000 does not. The str global has to be looked up in the global scope. This does not account for the entire difference:

>>> Timer('str(100000)').timeit()
0.2941889762878418
>>> Timer('x(100000)', 'x=str').timeit()
0.24904918670654297

As noted by thg435,

>>> Timer('"%s"%100000',).timeit()
0.034214019775390625
>>> Timer('"%s"%x','x=100000').timeit()
0.2940788269042969

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionLuca SbardellaView Question on Stackoverflow
Solution 1 - PythongeorgView Answer on Stackoverflow
Solution 2 - PythonDietrich EppView Answer on Stackoverflow