Python 2.x gotchas and landmines
PythonPython Problem Overview
The purpose of my question is to strengthen my knowledge base with Python and get a better picture of it, which includes knowing its faults and surprises. To keep things specific, I'm only interested in the CPython interpreter.
I'm looking for something similar to what learned from my [PHP landmines][1] [1]: https://stackoverflow.com/questions/512120/php-landmines-in-general question where some of the answers were well known to me but a couple were borderline horrifying.
Update: Apparently one maybe two people are upset that I asked a question that's already partially answered outside of Stack Overflow. As some sort of compromise here's the URL http://www.ferg.org/projects/python_gotchas.html
Note that one or two answers here already are original from what was written on the site referenced above.
Python Solutions
Solution 1 - Python
Expressions in default arguments are calculated when the function is defined, not when it’s called.
Example: consider defaulting an argument to the current time:
>>>import time
>>> def report(when=time.time()):
... print when
...
>>> report()
1210294387.19
>>> time.sleep(5)
>>> report()
1210294387.19
The when
argument doesn't change. It is evaluated when you define the function. It won't change until the application is re-started.
Strategy: you won't trip over this if you default arguments to None
and then do something useful when you see it:
>>> def report(when=None):
... if when is None:
... when = time.time()
... print when
...
>>> report()
1210294762.29
>>> time.sleep(5)
>>> report()
1210294772.23
Exercise: to make sure you've understood: why is this happening?
>>> def spam(eggs=[]):
... eggs.append("spam")
... return eggs
...
>>> spam()
['spam']
>>> spam()
['spam', 'spam']
>>> spam()
['spam', 'spam', 'spam']
>>> spam()
['spam', 'spam', 'spam', 'spam']
Solution 2 - Python
You should be aware of how class variables are handled in Python. Consider the following class hierarchy:
class AAA(object):
x = 1
class BBB(AAA):
pass
class CCC(AAA):
pass
Now, check the output of the following code:
>>> print AAA.x, BBB.x, CCC.x
1 1 1
>>> BBB.x = 2
>>> print AAA.x, BBB.x, CCC.x
1 2 1
>>> AAA.x = 3
>>> print AAA.x, BBB.x, CCC.x
3 2 3
Surprised? You won't be if you remember that class variables are internally handled as dictionaries of a class object. For read operations, if a variable name is not found in the dictionary of current class, the parent classes are searched for it. So, the following code again, but with explanations:
# AAA: {'x': 1}, BBB: {}, CCC: {}
>>> print AAA.x, BBB.x, CCC.x
1 1 1
>>> BBB.x = 2
# AAA: {'x': 1}, BBB: {'x': 2}, CCC: {}
>>> print AAA.x, BBB.x, CCC.x
1 2 1
>>> AAA.x = 3
# AAA: {'x': 3}, BBB: {'x': 2}, CCC: {}
>>> print AAA.x, BBB.x, CCC.x
3 2 3
Same goes for handling class variables in class instances (treat this example as a continuation of the one above):
>>> a = AAA()
# a: {}, AAA: {'x': 3}
>>> print a.x, AAA.x
3 3
>>> a.x = 4
# a: {'x': 4}, AAA: {'x': 3}
>>> print a.x, AAA.x
4 3
Solution 3 - Python
Loops and lambdas (or any closure, really): variables are bound by name
funcs = []
for x in range(5):
funcs.append(lambda: x)
[f() for f in funcs]
# output:
# 4 4 4 4 4
A work around is either creating a separate function or passing the args by name:
funcs = []
for x in range(5):
funcs.append(lambda x=x: x)
[f() for f in funcs]
# output:
# 0 1 2 3 4
Solution 4 - Python
Dynamic binding makes typos in your variable names surprisingly hard to find. It's easy to spend half an hour fixing a trivial bug.
EDIT: an example...
for item in some_list:
... # lots of code
... # more code
for tiem in some_other_list:
process(item) # oops!
Solution 5 - Python
One of the biggest surprises I ever had with Python is this one:
a = ([42],)
a[0] += [43, 44]
This works as one might expect, except for raising a TypeError after updating the first entry of the tuple! So a
will be ([42, 43, 44],)
after executing the +=
statement, but there will be an exception anyway. If you try this on the other hand
a = ([42],)
b = a[0]
b += [43, 44]
you won't get an error.
Solution 6 - Python
try:
int("z")
except IndexError, ValueError:
pass
reason this doesn't work is because IndexError is the type of exception you're catching, and ValueError is the name of the variable you're assigning the exception to.
Correct code to catch multiple exceptions is:
try:
int("z")
except (IndexError, ValueError):
pass
Solution 7 - Python
There was a lot of discussion on hidden language features a while back: hidden-features-of-python. Where some pitfalls were mentioned (and some of the good stuff too).
Also you might want to check out Python Warts.
But for me, integer division's a gotcha:
>>> 5/2
2
You probably wanted:
>>> 5*1.0/2
2.5
If you really want this (C-like) behaviour, you should write:
>>> 5//2
2
As that will work with floats too (and it will work when you eventually go to Python 3):
>>> 5*1.0//2
2.0
GvR explains how integer division came to work how it does on the history of Python.
Solution 8 - Python
Not including an __init__
.py in your packages. That one still gets me sometimes.
Solution 9 - Python
List slicing has caused me a lot of grief. I actually consider the following behavior a bug.
Define a list x
>>> x = [10, 20, 30, 40, 50]
Access index 2:
>>> x[2]
30
As you expect.
Slice the list from index 2 and to the end of the list:
>>> x[2:]
[30, 40, 50]
As you expect.
Access index 7:
>>> x[7]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: list index out of range
Again, as you expect.
However, try to slice the list from index 7 until the end of the list:
>>> x[7:]
[]
???
The remedy is to put a lot of tests when using list slicing. I wish I'd just get an error instead. Much easier to debug.
Solution 10 - Python
The only gotcha/surprise I've dealt with is with CPython's GIL. If for whatever reason you expect python threads in CPython to run concurrently... well they're not and this is pretty well documented by the Python crowd and even Guido himself.
A long but thorough explanation of CPython threading and some of the things going on under the hood and why true concurrency with CPython isn't possible. http://jessenoller.com/2009/02/01/python-threads-and-the-global-interpreter-lock/
Solution 11 - Python
James Dumay eloquently reminded me of another Python gotcha:
Not all of Python's “included batteries” are wonderful.
James’ specific example was the HTTP libraries: httplib
, urllib
, urllib2
, urlparse
, mimetools
, and ftplib
. Some of the functionality is duplicated, and some of the functionality you'd expect is completely absent, e.g. redirect handling. Frankly, it's horrible.
If I ever have to grab something via HTTP these days, I use the urlgrabber module forked from the Yum project.
Solution 12 - Python
Floats are not printed at full precision by default (without repr
):
x = 1.0 / 3
y = 0.333333333333
print x #: 0.333333333333
print y #: 0.333333333333
print x == y #: False
repr
prints too many digits:
print repr(x) #: 0.33333333333333331
print repr(y) #: 0.33333333333300003
print x == 0.3333333333333333 #: True
Solution 13 - Python
Unintentionally mixing oldstyle and newstyle classes can cause seemingly mysterious errors.
Say you have a simple class hierarchy consisting of superclass A and subclass B. When B is instantiated, A's constructor must be called first. The code below correctly does this:
class A(object):
def __init__(self):
self.a = 1
class B(A):
def __init__(self):
super(B, self).__init__()
self.b = 1
b = B()
But if you forget to make A a newstyle class and define it like this:
class A:
def __init__(self):
self.a = 1
you get this traceback:
Traceback (most recent call last):
File "AB.py", line 11, in <module>
b = B()
File "AB.py", line 7, in __init__
super(B, self).__init__()
TypeError: super() argument 1 must be type, not classobj
Two other questions relating to this issue are 489269 and 770134
Solution 14 - Python
def f():
x += 1
x = 42
f()
results in an UnboundLocalError
, because local names are detected statically. A different example would be
def f():
print x
x = 43
x = 42
f()
Solution 15 - Python
You cannot use locals()['x'] = whatever to change local variable values as you might expect.
This works:
>>> x = 1
>>> x
1
>>> locals()['x'] = 2
>>> x
2
BUT:
>>> def test():
... x = 1
... print x
... locals()['x'] = 2
... print x # *** prints 1, not 2 ***
...
>>> test()
1
1
This actually burnt me in an answer here on SO, since I had tested it outside a function and got the change I wanted. Afterwards, I found it mentioned and contrasted to the case of globals() in "Dive Into Python." See example 8.12. (Though it does not note that the change via locals() will work at the top level as I show above.)
Solution 16 - Python
x += [...]
is not the same as x = x + [...]
when x
is a list`
>>> x = y = [1,2,3]
>>> x = x + [4]
>>> x == y
False
>>> x = y = [1,2,3]
>>> x += [4]
>>> x == y
True
One creates a new list while the other modifies in place
Solution 17 - Python
List repetition with nested lists
This caught me out today and wasted an hour of my time debugging:
>>> x = [[]]*5
>>> x[0].append(0)
# Expect x equals [[0], [], [], [], []]
>>> x
[[0], [0], [0], [0], [0]] # Oh dear
Explanation: https://stackoverflow.com/questions/1959744/python-list-problem
Solution 18 - Python
Using class variables when you want instance variables. Most of the time this doesn't cause problems, but if it's a mutable value it causes surprises.
class Foo(object):
x = {}
But:
>>> f1 = Foo()
>>> f2 = Foo()
>>> f1.x['a'] = 'b'
>>> f2.x
{'a': 'b'}
You almost always want instance variables, which require you to assign inside __init__
:
class Foo(object):
def __init__(self):
self.x = {}
Solution 19 - Python
Python 2 has some surprising behaviour with comparisons:
>>> print x
0
>>> print y
1
>>> x < y
False
What's going on? repr()
to the rescue:
>>> print "x: %r, y: %r" % (x, y)
x: '0', y: 1
Solution 20 - Python
If you assign to a variable inside a function, Python assumes that the variable is defined inside that function:
>>> x = 1
>>> def increase_x():
... x += 1
...
>>> increase_x()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 2, in increase_x
UnboundLocalError: local variable 'x' referenced before assignment
Use global x
(or nonlocal x
in Python 3) to declare you want to set a variable defined outside your function.
Solution 21 - Python
The values of range(end_val)
are not only strictly smaller than end_val
, but strictly smaller than int(end_val)
. For a float
argument to range
, this might be an unexpected result:
from future.builtins import range
list(range(2.89))
[0, 1]
Solution 22 - Python
Due to 'truthiness' this makes sense:
>>>bool(1)
True
but you might not expect it to go the other way:
>>>float(True)
1.0
This can be a gotcha if you're converting strings to numeric and your data has True/False values.
Solution 23 - Python
If you create a list of list this way:
arr = [[2]] * 5
print arr
[[2], [2], [2], [2], [2]]
Then this creates an array with all elements pointing to the same object ! This might create a real confusion. Consider this:
arr[0][0] = 5
then if you print arr
print arr
[[5], [5], [5], [5], [5]]
The proper way of initializing the array is for example with a list comprehension:
arr = [[2] for _ in range(5)]
arr[0][0] = 5
print arr
[[5], [2], [2], [2], [2]]