Access index in pandas.Series.apply


Python Problem Overview


Let's say I have a MultiIndex Series s:

>>> s
     values
a b
1 2  0.1 
3 6  0.3
4 4  0.7

and I want to apply a function which uses the index of the row:

def f(x):
   # conditions or computations using the indexes
   if x.index[0] and ...: 
   other = sum(x.index) + ...
   return something

How can I call s.apply(f) with such a function? What is the recommended way to perform this kind of operation? I expect to obtain a new Series with the same MultiIndex, holding the values that result from applying this function to each row.

Python Solutions


Solution 1 - Python

I don't believe apply has access to the index; it treats each row as a numpy object, not a Series, as you can see:

In [27]: s.apply(lambda x: type(x))
Out[27]: 
a  b
1  2    <type 'numpy.float64'>
3  6    <type 'numpy.float64'>
4  4    <type 'numpy.float64'>

To get around this limitation, promote the indexes to columns, apply your function, and recreate a Series with the original index.

Series(s.reset_index().apply(f, axis=1).values, index=s.index)

Other approaches might use s.index.get_level_values, which often gets a little ugly in my opinion, or iterate over s.items(), which is likely to be slower, though that may depend on exactly what f does.
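
As a rough sketch of the three approaches above, here is the Series from the question with a made-up f that multiplies each value by the sum of its index levels. The function body, the Series name "value", and the variable names are assumptions for illustration only; they are not part of the original answer.

```python
import pandas as pd

# The MultiIndex Series from the question (the name "value" is assumed)
index = pd.MultiIndex.from_tuples([(1, 2), (3, 6), (4, 4)], names=["a", "b"])
s = pd.Series([0.1, 0.3, 0.7], index=index, name="value")

def f(row):
    # Hypothetical computation: value times the sum of the index levels
    return row["value"] * (row["a"] + row["b"])

# 1. Promote the index levels to columns, apply row-wise, restore the index
via_reset = pd.Series(s.reset_index().apply(f, axis=1).values, index=s.index)

# 2. Read the levels straight from the index (vectorized, no apply)
a = s.index.get_level_values("a").to_numpy()
b = s.index.get_level_values("b").to_numpy()
via_levels = s * (a + b)

# 3. Plain Python loop over (index tuple, value) pairs
via_items = pd.Series([val * sum(idx) for idx, val in s.items()], index=s.index)
```

All three should give the same values on this input; which one reads best probably depends on how complicated the real f is.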

Solution 2 - Python

Make it a frame, return scalars if you want (so the result is a series)

Setup

In [11]: s = Series([1,2,3],dtype='float64',index=['a','b','c'])

In [12]: s
Out[12]: 
a    1
b    2
c    3
dtype: float64

Printing function

In [13]: def f(x):
   ....:     print(type(x), x)
   ....:     return x
   ....: 

In [14]: pd.DataFrame(s).apply(f)
<class 'pandas.core.series.Series'> a    1
b    2
c    3
Name: 0, dtype: float64
<class 'pandas.core.series.Series'> a    1
b    2
c    3
Name: 0, dtype: float64
Out[14]: 
   0
a  1
b  2
c  3

Since you can return anything here, just return the scalars (access the index via the name attribute)

In [15]: pd.DataFrame(s).apply(lambda x: 5 if x.name == 'a' else x[0], axis=1)
Out[15]: 
a    5
b    2
c    3
dtype: float64

Solution 3 - Python

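Convert the Series to a DataFrame and apply along the rows (axis=1). Each row x is then a Series holding a single value, and its index label is available as x.name. The trailing [0] below selects the single column when f returns the row itself; if f returns a scalar, the result is already a Series and the [0] should be dropped.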

s.to_frame(0).apply(f, axis=1)[0]
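
For the MultiIndex Series in the question, x.name should be a tuple with one entry per index level. A minimal sketch, assuming a made-up f that returns a scalar and a column label "value" chosen for illustration:

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([(1, 2), (3, 6), (4, 4)], names=["a", "b"])
s = pd.Series([0.1, 0.3, 0.7], index=index)

def f(row):
    # row.name is the index label of this row; for a MultiIndex it is a tuple
    a, b = row.name
    # Hypothetical computation using both the index and the value
    return row["value"] * (a + b)

# f returns a scalar, so the result is a Series that keeps the original index
result = s.to_frame("value").apply(f, axis=1)
```

Because f returns a scalar here, no trailing column selection is needed.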

Solution 4 - Python

You may find it faster to use where rather than apply here:

In [11]: s = pd.Series([1., 2., 3.], index=['a' ,'b', 'c'])

In [12]: s.where(s.index != 'a', 5)
Out[12]: 
a    5
b    2
c    3
dtype: float64

You can also apply numpy-style logic and functions to any of the parts:

In [13]: (2 * s + 1).where((s.index == 'b') | (s.index == 'c'), -s)
Out[13]: 
a   -1
b    5
c    7
dtype: float64

In [14]: (2 * s + 1).where(s.index != 'a', -s)
Out[14]: 
a   -1
b    5
c    7
dtype: float64

I recommend testing for speed, since the efficiency gain over apply depends on the function. That said, I find apply more readable.
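
The same idea carries over to the MultiIndex Series from the question by building the condition from the index levels. The sketch below assumes an arbitrary rule (replace the value with 5.0 wherever the two levels are equal) and compares it with a row-wise apply; the rule and the variable names are illustrative only.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([(1, 2), (3, 6), (4, 4)], names=["a", "b"])
s = pd.Series([0.1, 0.3, 0.7], index=index)

a = s.index.get_level_values("a")
b = s.index.get_level_values("b")

# Keep the value where the levels differ, otherwise substitute 5.0
vectorized = s.where(a != b, 5.0)

# Row-wise equivalent, for comparison
row_wise = s.to_frame("value").apply(
    lambda row: 5.0 if row.name[0] == row.name[1] else row["value"], axis=1
)
```

Timing both versions on your own data (for example with %timeit in IPython) is the safest way to decide between them.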

Solution 5 - Python

You can access the whole row as the argument inside the function if you use DataFrame.apply() instead of Series.apply().

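import numpy as np
import pandas as pd

def f1(row):
    # Flag rows whose 'I' value is at least 0.5
    if row['I'] < 0.5:
        return 0
    else:
        return 1

def f2(row):
    # Invert the flag stored in 'N1'
    if row['N1'] == 1:
        return 0
    else:
        return 1

# Six random values in [0, 1) in a single column 'I'
df4 = pd.DataFrame(np.random.rand(6, 1), columns=list('I'))
df4['N1'] = df4.apply(f1, axis=1)
df4['N2'] = df4.apply(f2, axis=1)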
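
To tie this back to the question, the row passed to DataFrame.apply(..., axis=1) also carries its index label as row.name. A small deterministic sketch, with fixed values and string labels assumed purely for illustration:

```python
import pandas as pd

# Fixed data instead of random numbers so the result is reproducible
df = pd.DataFrame({"I": [0.2, 0.5, 0.9]}, index=["x", "y", "z"])

def f1(row):
    return 0 if row["I"] < 0.5 else 1

df["N1"] = df.apply(f1, axis=1)

# row.name exposes the index label alongside the row's values
labels = df.apply(lambda row: f"{row.name}:{f1(row)}", axis=1)

# A vectorized equivalent of f1 that avoids apply entirely
vectorized = (df["I"] >= 0.5).astype(int)
```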
 

Solution 6 - Python

Use reset_index() to convert the Series to a DataFrame and the index to a column, and then apply your function to the DataFrame.

The tricky part is knowing how reset_index() names the columns, so here are a couple of examples.

With a Singly Indexed Series

s=pd.Series({'idx1': 'val1', 'idx2': 'val2'})

def use_index_and_value(row):
    return 'I made this with index {} and value {}'.format(row['index'], row[0])

s2 = s.reset_index().apply(use_index_and_value, axis=1)

# The new Series has an auto-index;
# You'll want to replace that with the index from the original Series
s2.index = s.index
s2

Output:

idx1    I made this with index idx1 and value val1
idx2    I made this with index idx2 and value val2
dtype: object

With a Multi-Indexed Series

Same concept here, but you'll need to access the index values as row['level_*'] because that's where they're placed by Series.reset_index().

s=pd.Series({
    ('idx(0,0)', 'idx(0,1)'): 'val1',
    ('idx(1,0)', 'idx(1,1)'): 'val2'
})

def use_index_and_value(row):
    return 'made with index: {},{} & value: {}'.format(
        row['level_0'],
        row['level_1'],
        row[0]
    )

s2 = s.reset_index().apply(use_index_and_value, axis=1)

# Replace auto index with the index from the original Series
s2.index = s.index
s2

Output:

idx(0,0)  idx(0,1)    made with index: idx(0,0),idx(0,1) & value: val1
idx(1,0)  idx(1,1)    made with index: idx(1,0),idx(1,1) & value: val2
dtype: object

If your series or indexes have names, you will need to adjust accordingly.
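
As a quick way to check which column names you will get, you can inspect the columns of the reset frame. The sketch below uses made-up names ("a", "b", "value") to show the difference between unnamed and named inputs.

```python
import pandas as pd

tuples = [(1, 2), (3, 6)]

# Unnamed index levels and unnamed Series: default column names
unnamed = pd.Series([0.1, 0.3], index=pd.MultiIndex.from_tuples(tuples))
unnamed_cols = unnamed.reset_index().columns.tolist()

# Named index levels and named Series: the names become the column names
named = pd.Series(
    [0.1, 0.3],
    index=pd.MultiIndex.from_tuples(tuples, names=["a", "b"]),
    name="value",
)
named_cols = named.reset_index().columns.tolist()

# Names can also be supplied at the point of conversion
renamed_cols = (
    unnamed.rename_axis(["a", "b"]).reset_index(name="value").columns.tolist()
)

print(unnamed_cols)
print(named_cols)
print(renamed_cols)
```

Setting the names explicitly before calling apply keeps the row function independent of the default level_* naming.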

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type          Original Author       Original Content on Stackoverflow
Question              elyase                View Question on Stackoverflow
Solution 1 - Python   Dan Allan             View Answer on Stackoverflow
Solution 2 - Python   Jeff                  View Answer on Stackoverflow
Solution 3 - Python   nehz                  View Answer on Stackoverflow
Solution 4 - Python   Andy Hayden           View Answer on Stackoverflow
Solution 5 - Python   Vladimir Leontiev     View Answer on Stackoverflow
Solution 6 - Python   waterproof            View Answer on Stackoverflow