Set MultiIndex of an existing DataFrame in pandas

PythonPandasDataframeIndexingMulti Index

Python Problem Overview


I have a DataFrame that looks like

  Emp1    Empl2           date       Company
0    0        0     2012-05-01         apple
1    0        1     2012-05-29         apple
2    0        1     2013-05-02         apple
3    0        1     2013-11-22         apple
18   1        0     2011-09-09        google
19   1        0     2012-02-02        google
20   1        0     2012-11-26        google
21   1        0     2013-05-11        google

I want to pass the company and date for setting a MultiIndex for this DataFrame. Currently it has a default index. I am using df.set_index(['Company', 'date'], inplace=True)

df = pd.DataFrame()
for c in company_list:
        row = pd.DataFrame([dict(company = '%s' %s, date = datetime.date(2012, 05, 01))])
        df = df.append(row, ignore_index = True)
        for e in emp_list:
            dataset  = pd.read_sql("select company, emp_name, date(date), count(*) from company_table where  = '"+s+"' and emp_name = '"+b+"' group by company, date, name LIMIT 5 ", con)
                if len(dataset) == 0:
                row = pd.DataFrame([dict(sitename='%s' %s, name = '%s' %b, date = datetime.date(2012, 05, 01), count = np.nan)])
                dataset = dataset.append(row, ignore_index=True)
            dataset = dataset.rename(columns = {'count': '%s' %b})
            dataset = dataset.groupby(['company', 'date', 'emp_name'], as_index = False).sum()
            
            dataset = dataset.drop('emp_name', 1)
            df = pd.merge(df, dataset, how = '')
            df = df.sort('date', ascending = True)
            df.fillna(0, inplace = True)

df.set_index(['Company', 'date'], inplace=True)            
print df

But when I print this DataFrame, it prints None. Is this not the correct way of doing it? Also I want to shuffle the positions of the columns company and date so that company becomes the first index, and date becomes the second in Hierarchy. Any ideas on this?

Python Solutions


Solution 1 - Python

When you pass inplace in makes the changes on the original variable and returns None, and the function does not return the modified dataframe, it returns None.

is_none = df.set_index(['Company', 'date'], inplace=True)
df  # the dataframe you want
is_none # has the value None

so when you have a line like:

df = df.set_index(['Company', 'date'], inplace=True)

it first modifies df... but then it sets df to None!

That is, you should just use the line:

df.set_index(['Company', 'date'], inplace=True)

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
Questionuser3527975View Question on Stackoverflow
Solution 1 - PythonAndy HaydenView Answer on Stackoverflow