Floor or ceiling of a pandas Series in Python?
Python Problem Overview
I have a pandas Series, series. If I want to get the element-wise floor or ceiling, is there a built-in method, or do I have to write the function and use apply? I ask because the data is big, so I appreciate efficiency. Also, this question has not been asked with respect to the pandas package.
Python Solutions
Solution 1 - Python
You can use NumPy's built-in functions to do this: np.ceil(series) or np.floor(series).
Both return a Series object (not an array) so the index information is preserved.
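As a small illustration of the point about the index, here is a minimal sketch. The Series values and the string labels "a", "b", "c" are made up for the example and are not from the original question.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a non-default index, to make index preservation visible
series = pd.Series([1.2, -2.5, 3.0], index=["a", "b", "c"])

floored = np.floor(series)  # element-wise floor, rounds toward -infinity
ceiled = np.ceil(series)    # element-wise ceiling, rounds toward +infinity

print(type(floored))        # a pandas Series, not a bare ndarray
print(floored)
print(ceiled)
```

Note that the result keeps a float dtype; if integers are needed, an explicit astype(int) afterwards is one option.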
Solution 2 - Python
I am the OP, but I tried this and it worked:
np.floor(series)
Solution 3 - Python
> UPDATE: THIS ANSWER IS WRONG, DO NOT DO THIS
>
> Explanation: using Series.apply() with a native vectorized NumPy function makes no sense in most cases, as it will run the NumPy function in a Python loop, leading to much worse performance. You'd be much better off using np.floor(series) directly, as suggested by several other answers.
You could do something like this using NumPy's floor, for instance, with a DataFrame:
floored_data = data.apply(np.floor)
Can't test it right now, but an actual working solution might not be far from it.
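For comparison, a minimal sketch of the direct call that the update recommends, applied to a whole DataFrame. The frame named data and its column names are hypothetical stand-ins for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame standing in for `data` from the answer above
data = pd.DataFrame({"x": [1.7, -0.3, 2.0], "y": [0.5, 4.9, -6.1]})

floored_direct = np.floor(data)       # NumPy ufunc applied to the whole frame
floored_apply = data.apply(np.floor)  # the form shown in the answer

# Both give the same values; the direct call simply skips the apply machinery
print(floored_direct.equals(floored_apply))
print(floored_direct)
```

The direct call also returns a DataFrame, so the index and column labels are kept.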
Solution 4 - Python
With pd.Series.clip, you can set a floor (a lower bound) via clip(lower=x) or a ceiling (an upper bound) via clip(upper=x):
import pandas as pd
s = pd.Series([-1, 0, -5, 3])
print(s.clip(lower=0))
# 0    0
# 1    0
# 2    0
# 3    3
# dtype: int64
print(s.clip(upper=0))
# 0   -1
# 1    0
# 2   -5
# 3    0
# dtype: int64
pd.Series.clip allows generalised functionality, e.g. applying a floor and a ceiling simultaneously with s.clip(-1, 1).
NOTE: This answer originally referred to clip_lower / clip_upper, which were removed in pandas 1.0.0.
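To make the two-sided case concrete, and to show how bounding values differs from rounding them, here is a rough sketch. The second Series, t, is made-up float data added only for the contrast with np.floor.

```python
import numpy as np
import pandas as pd

s = pd.Series([-1, 0, -5, 3])

# Lower and upper bound at once: values outside [-1, 1] are pulled to the edge
bounded = s.clip(-1, 1)
print(bounded.tolist())   # [-1, 0, -1, 1]

# Hypothetical float data to contrast clipping with rounding down
t = pd.Series([-1.5, 0.2, 3.7])
clipped = t.clip(lower=0)  # bounds values from below, leaves fractions alone
floored = np.floor(t)      # rounds every value down to a whole number
print(clipped.tolist())   # [0.0, 0.2, 3.7]
print(floored.tolist())   # [-2.0, 0.0, 3.0]
```

So clip answers "keep values within a range", while np.floor and np.ceil answer "round each value down or up".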
Solution 5 - Python
The pinned answer is already the fastest. Here I provide some alternatives for ceiling and floor using pure pandas and compare them with the NumPy approach.
series = pd.Series(np.random.normal(100,20,1000000))
Floor
%timeit np.floor(series) # 1.65 ms ± 18.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit series.astype(int) # 2.2 ms ± 131 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit (series-0.5).round(0) # 3.1 ms ± 47 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit round(series-0.5,0) # 2.83 ms ± 60.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
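Why does astype(int) work? Because converting a float to an integer truncates toward zero, which is the same as flooring for non-negative values. For negative values it differs from np.floor (e.g. -1.5 becomes -1, not -2).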
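The alternatives above agree with np.floor on the random positive data used for the timings, but not on every input. A minimal sketch with hand-picked values (my own, chosen to hit the edge cases) shows where they diverge:

```python
import numpy as np
import pandas as pd

# Hand-picked values: two negatives, one positive fraction, one exact integer
s = pd.Series([-1.5, -0.2, 1.7, 3.0])

true_floor = np.floor(s)
truncated = s.astype(int)        # truncates toward zero
shifted = (s - 0.5).round(0)     # rounds halves to the nearest even number

print(true_floor.tolist())  # [-2.0, -1.0, 1.0, 3.0]
print(truncated.tolist())   # [-1, 0, 1, 3]   differs for the negative values
print(shifted.tolist())     # [-2.0, -1.0, 1.0, 2.0]   differs for exactly 3.0
```

The last case comes from 3.0 - 0.5 = 2.5 being rounded to the even neighbour 2.0, so the shift-and-round trick can be off by one on whole-number inputs.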
Ceil
%timeit np.ceil(series) # 1.67 ms ± 21 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit (series+0.5).round(0) # 3.15 ms ± 46.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit round(series+0.5,0) # 2.99 ms ± 103 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
So yeah, just use the numpy function.
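The %timeit lines above need IPython or Jupyter. For a plain Python script, a rough equivalent using the standard timeit module might look like the sketch below. The seed, the smaller Series size, and the repeat count are my own choices to keep the run short, and the numbers will vary by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed setup: seeded generator and a smaller Series than in the answer
rng = np.random.default_rng(0)
series = pd.Series(rng.normal(100, 20, 100_000))

candidates = {
    "np.floor(series)": lambda: np.floor(series),
    "series.astype(int)": lambda: series.astype(int),
    "(series - 0.5).round(0)": lambda: (series - 0.5).round(0),
}

repeats = 20
timings = {}
for label, func in candidates.items():
    total_seconds = timeit.timeit(func, number=repeats)
    timings[label] = total_seconds / repeats  # average seconds per call
    print(f"{label:<26} {timings[label] * 1e3:.3f} ms per call")
```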