Why is math.sqrt massively slower than exponentiation?

PythonPerformanceMacos CatalinaPython 3.8

Python Problem Overview


I can't believe what I just measured:

python3 -m timeit -s "from math import sqrt" "sqrt(2)"
5000000 loops, best of 5: 42.8 nsec per loop

python3 -m timeit "2 ** 0.5"
50000000 loops, best of 5: 4.93 nsec per loop

This goes against any intuition... it shoud be exactly the opposite!

Python 3.8.3 on macOS Catalina

Python Solutions


Solution 1 - Python

Python 3 is precomputing the value of 2 ** 0.5 at compile time, since both operands are known at that time. The value of sqrt, however, is not known at compile time, so the computation necessarily occurs at run time.

You aren't timing how long it takes to compute 2 ** 0.5, but just the time it takes to load a constant.

A fairer comparison would be

$ python3 -m timeit -s "from math import sqrt" "sqrt(2)"
5000000 loops, best of 5: 50.7 nsec per loop
$ python3 -m timeit -s "x = 2" "x**0.5"
5000000 loops, best of 5: 56.7 nsec per loop


I'm not sure if there is a way to show unoptimized byte code. Python starts by parsing source code into an abstract syntax tree (AST):

>>> ast.dump(ast.parse("2**0.5"))
'Module(body=[Expr(value=BinOp(left=Num(n=2), op=Pow(), right=Num(n=0.5)))])'

Update: This particular optimization is now applied directly to the abstract syntax tree, so the byte code is generated directly from something like

Module(body=Num(n= 1.4142135623730951))

The ast module doesn't appear to apply the optimization.

The compiler takes the AST and generates unoptimized byte code; in this case, I believe it would look (based on the output of dis.dis("2**x") and dis.dis("x**0.5")) like

LOAD_CONST       0  (2)
LOAD_CONST       1  (0.5)
BINARY_POWER
RETURN_VALUE

The raw byte code is then subject to modification by the peephole optimzizer, which can reduce these 4 instructions to 2, as shown by the dis module.

The compiler then generates byte code from the AST.

>>> dis.dis("2**0.5")
  1           0 LOAD_CONST               0 (1.4142135623730951)
              2 RETURN_VALUE

[While the following paragraph was originally written with the idea of optimizing byte code in mind, the reasoning applies to optimizing the AST as well.]

Since nothing at runtime affects how the two LOAD_CONST and following BINARY_POWER instruction are evaluated (for example, there are no name lookups), the peephole optimizer can take this sequence of byte codes, perform the computation of 2**0.5 itself, and replace the first three instructions with a single LOAD_CONST instruction that loads the result immediately.

Solution 2 - Python

To enhance chepner's answer, here's a proof:

Python 3.5.3 (default, Sep 27 2018, 17:25:39) 
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import dis
>>> dis.dis('2 ** 0.5')
  1           0 LOAD_CONST               2 (1.4142135623730951)
              3 RETURN_VALUE

vs.

>>> dis.dis('sqrt(2)')
  1           0 LOAD_NAME                0 (sqrt)
              3 LOAD_CONST               0 (2)
              6 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
              9 RETURN_VALUE

Solution 3 - Python

>>> dis.dis('44442.3123 ** 0.5')
          0 LOAD_CONST               0 (210.81345379268373)
          2 RETURN_VALUE

I do not believe, that 44442.3123 ** 0.5 is precomputed at compile time. We should better check the AST of code.

>>> import ast
>>> import math
>>> code = ast.parse("2**2")
>>> ast.dump(code)
'Module(body=[Expr(value=BinOp(left=Num(n=2), op=Pow(), right=Num(n=2)))])'
>>> code = ast.parse("math.sqrt(3)")
>>> ast.dump(code)
"Module(body=[Expr(value=Call(func=Attribute(value=Name(id='math', ctx=Load()), attr='sqrt', ctx=Load()), args=[Num(n=3)], keywords=[]))])"

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionPeter LeikaufView Question on Stackoverflow
Solution 1 - PythonchepnerView Answer on Stackoverflow
Solution 2 - Pythonmkrieger1View Answer on Stackoverflow
Solution 3 - PythonakostrikovView Answer on Stackoverflow