Update index after sorting data-frame

Python, Pandas

Python Problem Overview


Take the following data-frame:

import numpy as np
import pandas as pd

x = np.tile(np.arange(3),3)
y = np.repeat(np.arange(3),3)
df = pd.DataFrame({"x": x, "y": y})

   x  y
0  0  0
1  1  0
2  2  0
3  0  1
4  1  1
5  2  1
6  0  2
7  1  2
8  2  2

I need to sort it by x first, and only second by y:

df2 = df.sort(["x", "y"])

   x  y
0  0  0
3  0  1
6  0  2
1  1  0
4  1  1
7  1  2
2  2  0
5  2  1
8  2  2

How can I change the index so that it is ascending again? That is, how do I get this:

   x  y
0  0  0
1  0  1
2  0  2
3  1  0
4  1  1
5  1  2
6  2  0
7  2  1
8  2  2

I have tried the following. Unfortunately, it doesn't change the index at all:

df2.reindex(np.arange(len(df2.index)))
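
A likely explanation, sketched below: reindex matches rows to the requested labels rather than renumbering them, and it returns a new object. Asking for labels 0 to n-1 therefore hands back the rows in their original, unsorted order, and df2 itself is left untouched. The frame is rebuilt here with sort_values, since df.sort is not available in current pandas; the name attempt is only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question and sort it by x, then y.
df = pd.DataFrame({"x": np.tile(np.arange(3), 3), "y": np.repeat(np.arange(3), 3)})
df2 = df.sort_values(["x", "y"])

# reindex looks rows up by label, so labels 0..8 come back in the original order.
attempt = df2.reindex(np.arange(len(df2.index)))

print(attempt["x"].tolist())  # [0, 1, 2, 0, 1, 2, 0, 1, 2] (unsorted again)
print(df2.index.tolist())     # [0, 3, 6, 1, 4, 7, 2, 5, 8] (df2 is unchanged)
```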

Python Solutions


Solution 1 - Python

You can reset the index using reset_index to get back a default index of 0, 1, 2, ..., n-1 (and use drop=True to indicate you want to drop the existing index instead of adding it as an additional column to your dataframe):

In [19]: df2 = df2.reset_index(drop=True)

In [20]: df2
Out[20]:
   x  y
0  0  0
1  0  1
2  0  2
3  1  0
4  1  1
5  1  2
6  2  0
7  2  1
8  2  2
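
To make the effect of drop=True concrete, here is a minimal sketch that compares both variants on the frame from the question. The names kept and dropped are only illustrative, and sort_values stands in for the older df.sort call.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question and sort it by x, then y.
df = pd.DataFrame({"x": np.tile(np.arange(3), 3), "y": np.repeat(np.arange(3), 3)})
df2 = df.sort_values(["x", "y"])

# Default behaviour: the old index is kept as an extra column named "index".
kept = df2.reset_index()

# drop=True: the old index is discarded and only the new 0..n-1 index remains.
dropped = df2.reset_index(drop=True)

print(kept.columns.tolist())     # ['index', 'x', 'y']
print(dropped.columns.tolist())  # ['x', 'y']
print(dropped.index.tolist())    # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

Without drop=True the old labels are still available in kept["index"], which can be handy if the original row positions are needed later.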

Solution 2 - Python

Since pandas 1.0.0, df.sort_values has an ignore_index parameter that does exactly what you need:

In [1]: df2 = df.sort_values(by=['x','y'],ignore_index=True)

In [2]: df2
Out[2]:
   x  y
0  0  0
1  0  1
2  0  2
3  1  0
4  1  1
5  1  2
6  2  0
7  2  1
8  2  2
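
As a quick check, the sketch below compares the one-step form with sorting followed by reset_index(drop=True). It assumes pandas 1.0 or newer; the names one_step and two_step are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": np.tile(np.arange(3), 3), "y": np.repeat(np.arange(3), 3)})

# One step: sort and renumber the index in the same call (pandas >= 1.0).
one_step = df.sort_values(by=["x", "y"], ignore_index=True)

# Two steps: sort, then throw away the old index.
two_step = df.sort_values(by=["x", "y"]).reset_index(drop=True)

print(one_step.index.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
print(one_step.equals(two_step))
```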

Solution 3 - Python

df.sort() is deprecated and has been removed from newer pandas versions; use df.sort_values(...) instead: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sort_values.html

Then follow joris' answer (Solution 1) and call reset_index(drop=True) on the sorted data-frame.

Solution 4 - Python

You can set a new index with set_index, which accepts an array of the same length as the data-frame and returns a new data-frame:

df2.set_index(np.arange(len(df2.index)))

Output:

   x  y
0  0  0
1  0  1
2  0  2
3  1  0
4  1  1
5  1  2
6  2  0
7  2  1
8  2  2
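
Note that set_index does not modify df2 unless the result is assigned back. The sketch below shows this, together with one possible alternative that assigns a pd.RangeIndex directly to the index attribute; the name renumbered is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": np.tile(np.arange(3), 3), "y": np.repeat(np.arange(3), 3)})
df2 = df.sort_values(["x", "y"])

# set_index with an array returns a new frame; df2 keeps its shuffled labels.
renumbered = df2.set_index(np.arange(len(df2.index)))
print(df2.index.tolist())         # [0, 3, 6, 1, 4, 7, 2, 5, 8]
print(renumbered.index.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7, 8]

# Alternative: overwrite the index of df2 itself.
df2.index = pd.RangeIndex(len(df2))
print(df2.index.tolist())         # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```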

Solution 5 - Python

The following works!

  1. If you want to change the existing dataframe itself, you may directly use

     df.sort_values(by=['col1'], inplace=True)
     df.reset_index(drop=True, inplace=True)
    
     df
     >>     col1  col2  col3 col4
         0    A     2     0    a
         1    A     1     1    B
         2    B     9     9    c
         3    C     4     3    F
         4    D     7     2    e
         5  NaN     8     4    D
    
  2. Else, if you don't want to change the existing dataframe but want to store the sorted dataframe into another variable separately, you may use:

    df_sorted = df.sort_values(by=['col1']).reset_index(drop=True)
    
    df_sorted
    >>     col1  col2  col3 col4
        0    A     2     0    a
        1    A     1     1    B
        2    B     9     9    c
        3    C     4     3    F
        4    D     7     2    e
        5  NaN     8     4    D
    
    df
    >>       col1  col2  col3 col4
          0    A     2     0    a
          1    A     1     1    B
          2    B     9     9    c
          3  NaN     8     4    D
          4    D     7     2    e
          5    C     4     3    F
    

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type         Original Author   Original Content on Stackoverflow
Question             Lemming           View Question on Stackoverflow
Solution 1 - Python  joris             View Answer on Stackoverflow
Solution 2 - Python  David             View Answer on Stackoverflow
Solution 3 - Python  aaronpenne        View Answer on Stackoverflow
Solution 4 - Python  ilyakhov          View Answer on Stackoverflow
Solution 5 - Python  vagdevi k         View Answer on Stackoverflow