numpy: Efficiently avoid 0s when taking log(matrix)
Numpy Problem Overview
from numpy import *
m = array([[1,0],
[2,3]])
I would like to compute the element-wise log2(m), but only in the places where m is not 0. Where m is 0, I would like to have 0 as the result.
I am now fighting against:
RuntimeWarning: divide by zero encountered in log2
Try 1: using where
res = where(m != 0, log2(m), 0)
which computes the correct result, but still logs a RuntimeWarning: divide by zero encountered in log2. It looks like (and syntactically it is quite obvious) numpy still computes log2(m) on the full matrix, and only afterwards does where pick the values to keep.
I would like to avoid this warning.
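To make the behaviour described in Try 1 concrete, here is a minimal sketch that records the warnings emitted while the where expression runs. The use of warnings.catch_warnings is my own addition for illustration, not part of the original question.

```python
import warnings
import numpy as np

m = np.array([[1, 0], [2, 3]])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning, even repeated ones
    # log2(m) is evaluated on the whole array before where() selects values
    res = np.where(m != 0, np.log2(m), 0)

messages = [str(w.message) for w in caught]
print(res)       # the zero entry of m maps to 0, as intended
print(messages)  # the divide-by-zero warning was still emitted
```

The result is correct, but the warning fires because both arguments of where are fully computed before the selection happens.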
Try 2: using masks
from numpy import ma
res = ma.filled(log2(ma.masked_equal(m, 0)), 0)
Surely masking away the zeros will prevent log2 from being applied to them, won't it? Unfortunately not: we still get RuntimeWarning: divide by zero encountered in log2. Even though the matrix is masked, log2 still seems to be applied to every element.
How can I efficiently compute the element-wise log of a numpy array without getting division-by-zero warnings?
- Of course I could temporarily disable the logging of these warnings using seterr, but that doesn't look like a clean solution.
- And sure, a double for loop would help with treating 0s specially, but it defeats the efficiency of numpy.
Any ideas?
Numpy Solutions
Solution 1 - Numpy
We can use masked arrays for this (shown here with the natural log, ma.log; ma.log2 is the base-2 counterpart):
>>> from numpy import *
>>> m = array([[1,0], [2,3]])
>>> x = ma.log(m)
>>> print(x.filled(0))
[[ 0. 0. ]
[ 0.69314718 1.09861229]]
Solution 2 - Numpy
Another option is to use the where parameter of numpy's ufuncs:
import numpy as np
m = np.array([[1., 0], [2, 3]])
res = np.log2(m, out=np.zeros_like(m), where=(m!=0))
No RuntimeWarning is raised, and zeros are introduced where the log is not computed.
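As a rough sketch of how this could be packaged for reuse, the where/out call can be wrapped in a small helper. The name safe_log2 is hypothetical, and converting the input to float is my assumption so that integer arrays work too. The out array has to be pre-filled, because positions skipped by where keep whatever out already contained.

```python
import numpy as np

def safe_log2(a):
    """Element-wise log2 that leaves 0 wherever the input is 0 (hypothetical helper)."""
    a = np.asarray(a, dtype=float)      # assumption: work in floating point
    out = np.zeros_like(a)              # pre-filled: skipped positions stay 0
    np.log2(a, out=out, where=(a != 0)) # log2 is only evaluated where a != 0
    return out

result = safe_log2([[1, 0], [2, 3]])
print(result)
```

Negative entries still pass the a != 0 test, so they would produce nan and an 'invalid value' warning; use a > 0 as the condition if those should map to 0 as well.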
Solution 3 - Numpy
Simply disable the warning for that computation:
from numpy import errstate, isneginf, array, log2
m = array([[1,0],[2,3]])
with errstate(divide='ignore'):
    res = log2(m)
And then you can postprocess the -inf if you want:
res[isneginf(res)] = 0
EDIT: Here are some comments on the other option posted in the other answer, which is using masked arrays. You should opt for disabling the error for two reasons:
- Using masked arrays is far less efficient than momentarily disabling the error, and you asked for efficiency.
- Disabling the specific 'divide by zero' warning does NOT disable the other problem with calculating the log of a number, which is negative input. Negative input is captured as an 'invalid value' warning, and you will have to deal with it. Masked arrays, on the other hand, treat the two errors as the same, so you will not notice a negative number in the input: it is treated like a zero and gives zero as a result. This is not what you asked for.
- As a last point, and as a personal opinion, disabling the warning is very readable: it is obvious what the code is doing, which makes it more maintainable. In that respect, I find this solution cleaner than using masked arrays.
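The second point can be illustrated with a small sketch. The input array with a negative entry and the warning-recording scaffolding are my own choices for the demonstration.

```python
import warnings
import numpy as np

m = np.array([-1.0, 0.0, 2.0])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    with np.errstate(divide="ignore"):  # silences only the log(0) warning
        res = np.log2(m)

messages = [str(w.message) for w in caught]
print(res)       # nan for the negative entry, -inf for the zero, 1.0 for 2.0
print(messages)  # should still report 'invalid value' for the negative entry

# The masked-array route hides both problems behind the same fill value.
masked = np.ma.log2(m).filled(0)
print(masked)    # the negative entry and the zero both become 0
```

With errstate the negative input remains visible as nan plus a warning, whereas the masked version silently maps it to 0.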
Solution 4 - Numpy
The masked array solution and the solution that disables the warning are both fine. For variety, here's another that uses scipy.special.xlogy. np.sign(m) is given as the x argument, so xlogy returns 0 wherever np.sign(m) is 0. The result is divided by np.log(2) to give the base-2 logarithm.
In [4]: from scipy.special import xlogy
In [5]: m = np.array([[1, 0], [2, 3]])
In [6]: xlogy(np.sign(m), m) / np.log(2)
Out[6]:
array([[ 0. , 0. ],
[ 1. , 1.5849625]])
Solution 5 - Numpy
Problem
For an array containing zeros or negatives we get the respective errors.
y = np.log(x)
# RuntimeWarning: divide by zero encountered in log
# RuntimeWarning: invalid value encountered in log
Solution
markroxor suggests np.clip; in my example this creates a horizontal floor. gg349 and others use np.errstate and np.seterr; I think these are clunky and do not solve the problem. As a note, np.complex doesn't work for zeros. user3315095 uses the index p=0<x, and NumPy's log has this functionality built in through its where/out parameters. mdeff demonstrates this, but replaces the -inf with 0, which for me was insufficient and doesn't handle negatives.
I suggest 0<x and np.nan (or, if needed, np.NINF/-np.inf).
y = np.log(x, where=0<x, out=np.nan*x)
John Zwinck uses the masked-array function np.ma.log; this works but is computationally slower (try App:timeit).
Example
import numpy as np
x = np.linspace(-10, 10, 300)
# y = np.log(x) # Old
y = np.log(x, where=0<x, out=np.nan*x) # New
import matplotlib.pyplot as plt
plt.plot(x, y)
plt.show()
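The same call can also be checked numerically without plotting. This is a small sketch with sample values I picked by hand; np.full is used here instead of np.nan*x to build the pre-filled out array, which is my substitution and should behave the same for float input.

```python
import numpy as np

x = np.array([-2.0, 0.0, 1.0, np.e])

# Positions where 0 < x is False are never computed and keep the nan fill.
y = np.log(x, where=0 < x, out=np.full(x.shape, np.nan))

print(y)  # nan for -2 and 0, then log(1) and log(e)
```

Using nan rather than 0 as the fill keeps the undefined positions distinguishable from a genuine log(1) = 0.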
App:timeit
Time comparison for mask and where
import numpy as np
import time
def timeit(fun, xs):
    t = time.time()
    for i in range(len(xs)):
        fun(xs[i])
    print(time.time() - t)
xs = np.random.randint(-10,+10, (1000,10000))
timeit(lambda x: np.ma.log(x).filled(np.nan), xs)
timeit(lambda x: np.log(x, where=0<x, out=np.nan*x), xs)
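For a comparison that is easier to repeat, here is a hedged variant using the standard-library timeit module on a smaller array. The function names with_mask and with_where, the array size, and the repeat count are all my own choices; absolute timings will depend on the machine and NumPy version, so no numbers are claimed here.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(-10, 10, size=100_000).astype(float)

def with_mask(a):
    # masked-array route: non-positive entries are masked, then filled with nan
    return np.ma.log(a).filled(np.nan)

def with_where(a):
    # ufunc where/out route: non-positive entries keep the nan fill
    return np.log(a, where=0 < a, out=np.full(a.shape, np.nan))

# Both routes should agree before their speed is compared.
same = np.allclose(with_mask(x), with_where(x), equal_nan=True)

t_mask = timeit.timeit(lambda: with_mask(x), number=20)
t_where = timeit.timeit(lambda: with_where(x), number=20)
print(f"results agree: {same}")
print(f"np.ma.log: {t_mask:.4f} s, where/out: {t_where:.4f} s")
```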
Solution 6 - Numpy
What about the following:
from numpy import *
m = array((-1.0, 0.0, 2.0))
p = m > 0.0
print('positive=', p)
print(m[p])
res = zeros_like(m)
res[p] = log(m[p])
print(res)
Solution 7 - Numpy
You can use something like:
m = np.clip(m, 1e-12, None)
to avoid the log(0) error. This will set the lower bound to 1e-12.
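One caveat worth seeing in numbers: clipping changes the value that is logged, so zero entries do not come out as 0. The following is a minimal sketch using the array from the question (converted to float, which is my assumption).

```python
import numpy as np

m = np.array([[1, 0], [2, 3]], dtype=float)

clipped = np.clip(m, 1e-12, None)  # zeros (and negatives) are raised to 1e-12
res = np.log2(clipped)

print(res[0, 1])  # log2(1e-12), roughly -39.86, not 0
```

So this avoids the warning, but it suits cases where a large negative floor is acceptable rather than the 0 result the question asks for.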