Pandas: Difference between pivot and pivot_table. Why is only pivot_table working?

PythonPandasPivot

Python Problem Overview


I have the following dataframe.

df.head(30)

     struct_id  resNum score_type_name  score_value
0   4294967297       1           omega     0.064840
1   4294967297       1          fa_dun     2.185618
2   4294967297       1      fa_dun_dev     0.000027
3   4294967297       1     fa_dun_semi     2.185591
4   4294967297       1             ref    -1.191180
5   4294967297       2            rama    -0.795161
6   4294967297       2           omega     0.222345
7   4294967297       2          fa_dun     1.378923
8   4294967297       2      fa_dun_dev     0.028560
9   4294967297       2      fa_dun_rot     1.350362
10  4294967297       2         p_aa_pp    -0.442467
11  4294967297       2             ref     0.249477
12  4294967297       3            rama     0.267443
13  4294967297       3           omega     0.005106
14  4294967297       3          fa_dun     0.020352
15  4294967297       3      fa_dun_dev     0.025507
16  4294967297       3      fa_dun_rot    -0.005156
17  4294967297       3         p_aa_pp    -0.096847
18  4294967297       3             ref     0.979644
19  4294967297       4            rama    -1.403292
20  4294967297       4           omega     0.212160
21  4294967297       4          fa_dun     4.218029
22  4294967297       4      fa_dun_dev     0.003712
23  4294967297       4     fa_dun_semi     4.214317
24  4294967297       4         p_aa_pp    -0.462765
25  4294967297       4             ref    -1.960940
26  4294967297       5            rama    -0.600053
27  4294967297       5           omega     0.061867
28  4294967297       5          fa_dun     3.663050
29  4294967297       5      fa_dun_dev     0.004953

According to the pivot documentation, I should be able to reshape this on the score_type_name using the pivot function.

df.pivot(columns='score_type_name',values='score_value',index=['struct_id','resNum'])

But, I get the following.

enter image description here

However, pivot_table function seems to work:

pivoted = df.pivot_table(columns='score_type_name',
                         values='score_value',
                         index=['struct_id','resNum'])

enter image description here

But it does not lend itself, for me atleast, to further analysis. I want it to just have the struct_id, resNum, and score_type_name as columns instead of stacking the score_type_name on top of the other columns. Additionally, I want the struct_id to be for every row, and not aggregate in a joined row like it does for the table.

So can anyone tell me how I can get a nice Dataframe like I want using pivot? Additionally, from the documentation, I can't tell why pivot_table works and pivot doesn't. If I look at the first example of pivot, it looks like exactly what I need.

P.S. I did post a question in reference to this problem, but I did such a poor job of demonstrating the output, I deleted it and tried again using ipython notebook. I apologize in advance if you are seeing this twice.

Here is the notebook for your full reference

EDIT - My desired results would look like this (made in excel):

StructId	resNum	pdb_residue_number	chain_id	name3	fa_dun	fa_dun_dev	fa_dun_rot	fa_dun_semi	omega	p_aa_pp	rama	ref
4294967297	1	99	A	ASN	2.1856	0.0000		2.1856	0.0648			-1.1912
4294967297	2	100	A	MET	1.3789	0.0286	1.3504		0.2223	-0.4425	-0.7952	0.2495
4294967297	3	101	A	VAL	0.0204	0.0255	-0.0052		0.0051	-0.0968	0.2674	0.9796
4294967297	4	102	A	GLU	4.2180	0.0037		4.2143	0.2122	-0.4628	-1.4033	-1.9609
4294967297	5	103	A	GLN	3.6630	0.0050		3.6581	0.0619	-0.2759	-0.6001	-1.5172
4294967297	6	104	A	MET	1.5175	0.2206	1.2968		0.0504	-0.3758	-0.7419	0.2495
4294967297	7	105	A	HIS	3.6987	0.0184		3.6804	0.0547	0.4019	-0.1489	0.3883
4294967297	8	106	A	THR	0.1048	0.0134	0.0914		0.0003	-0.7963	-0.4033	0.2013
4294967297	9	107	A	ASP	2.3626	0.0005		2.3620	0.0521	0.1955	-0.3499	-1.6300
4294967297	10	108	A	ILE	1.8447	0.0270	1.8176		0.0971	0.1676	-0.4071	1.0806
4294967297	11	109	A	ILE	0.1276	0.0092	0.1183		0.0208	-0.4026	-0.0075	1.0806
4294967297	12	110	A	SER	0.2921	0.0342	0.2578		0.0342	-0.2426	-1.3930	0.1654
4294967297	13	111	A	LEU	0.6483	0.0019	0.6464		0.0845	-0.3565	-0.2356	0.7611
4294967297	14	112	A	TRP	2.5965	0.1507		2.4457	0.5143	-0.1370	-0.5373	1.2341
4294967297	15	113	A	ASP	2.6448	0.1593			0.0510		-0.5011	

Python Solutions


Solution 1 - Python

For anyone who is still interested in the difference between pivot and pivot_table, there are mainly two differences:

  • pivot_table is a generalization of pivot that can handle duplicate values for one pivoted index/column pair. Specifically, you can give pivot_table a list of aggregation functions using keyword argument aggfunc. The default aggfunc of pivot_table is numpy.mean.
  • pivot_table also supports using multiple columns for the index and column of the pivoted table. A hierarchical index will be automatically generated for you.

REF: pivot and pivot_table

Solution 2 - Python

Another caveat:

pivot_table will only allow numerical types as "values=", whereas pivot will take string types as "values=".

Solution 3 - Python

I debugged it a little bit.

  • The DataFrame.pivot() and DataFrame.pivot_table() are different.
  • pivot() doesn't accept a list for index.
  • pivot_table() accepts.

Internally, both of them are using reset_index()/stack()/unstack() to do the job.

pivot() is just a short cut for simple usage, I think.

Solution 4 - Python

I'm not sure I understand, but I'll give it a try. I usually use stack/unstack instead of pivot, is this closer to what you want?

df.set_index(['struct_id','resNum','score_type_name']).unstack()

                  score_value                                              
score_type_name        fa_dun fa_dun_dev fa_dun_rot fa_dun_semi     omega   
struct_id  resNum                                                           
4294967297 1         2.185618   0.000027        NaN    2.185591  0.064840   
           2         1.378923   0.028560   1.350362         NaN  0.222345   
           3         0.020352   0.025507  -0.005156         NaN  0.005106   
           4         4.218029   0.003712        NaN    4.214317  0.212160   
           5         3.663050   0.004953        NaN         NaN  0.061867   

                                                 
score_type_name     p_aa_pp      rama       ref  
struct_id  resNum                                
4294967297 1            NaN       NaN -1.191180  
           2      -0.442467 -0.795161  0.249477  
           3      -0.096847  0.267443  0.979644  
           4      -0.462765 -1.403292 -1.960940  
           5            NaN -0.600053       NaN  

I'm not sure why your pivot isn't working (kinda seems to me like it should, but I could be wrong), but it does seem to work (or at least not give an error) if I leave off 'struct_id'. Of course, that's not really a useful solution for the full dataset where you have more than one different values for 'struct_id'.

df.pivot(columns='score_type_name',values='score_value',index='resNum')

score_type_name    fa_dun  fa_dun_dev  fa_dun_rot  fa_dun_semi     omega  
resNum                                                                     
1                2.185618    0.000027         NaN     2.185591  0.064840   
2                1.378923    0.028560    1.350362          NaN  0.222345   
3                0.020352    0.025507   -0.005156          NaN  0.005106   
4                4.218029    0.003712         NaN     4.214317  0.212160   
5                3.663050    0.004953         NaN          NaN  0.061867   

score_type_name   p_aa_pp      rama       ref  
resNum                                         
1                     NaN       NaN -1.191180  
2               -0.442467 -0.795161  0.249477  
3               -0.096847  0.267443  0.979644  
4               -0.462765 -1.403292 -1.960940  
5                     NaN -0.600053       NaN  

Edit to add: reset_index() will convert from a multi-index (hierarchical) to a flatter style. There is still some hierarchy in the column names, sometimes the easiest way to get rid of those is just to do df.columns=['var1','var2',...] although there are more sophisticated ways if you do some searching.

df.set_index(['struct_id','resNum','score_type_name']).unstack().reset_index()

                  struct_id resNum score_value                            
score_type_name                         fa_dun fa_dun_dev fa_dun_rot   
0                4294967297      1    2.185618   0.000027        NaN   
1                4294967297      2    1.378923   0.028560   1.350362   
2                4294967297      3    0.020352   0.025507  -0.005156   
3                4294967297      4    4.218029   0.003712        NaN   
4                4294967297      5    3.663050   0.004953        NaN   
                                                                

Solution 5 - Python

pivot() is used for pivoting without aggregation. Therefore, it can’t deal with duplicate values for one index/column pair.

Since here your index=['struct_id','resNum'] have multiple duplicates, therefore pivot doesn't work.

However, pivot_table will work because it will handle duplicate values by aggregating them.

Solution 6 - Python

To get the dataframe you obtained from the pivot_table call into the format you want:

pivoted.columns.name=None  ## remove the score_type_name
result = pivoted.reset_index()  ## puts index columns back into dataframe body

Solution 7 - Python

Before calling pivot we need to ensure that our data does not have rows with duplicate values for the specified columns.

Pivot with duplicate give

Index contains duplicate entries, cannot reshape

If we can’t ensure this we may have to use the pivot_table method instead.

Please find the link below for a more detailed explanation

https://nikgrozev.com/2015/07/01/reshaping-in-pandas-pivot-pivot-table-stack-and-unstack-explained-with-pictures/

Solution 8 - Python

The given snippet may help you out for further flatten the look of your dataframe

df.set_index(['struct_id','resNum','score_type_name']).unstack().reset_index()
df.loc[:,['struct_id','resNum','fa_dun','fa_dun_dev','fa_dun_rot']]

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
Questionjwillis0720View Question on Stackoverflow
Solution 1 - PythonTanachatView Answer on Stackoverflow
Solution 2 - PythonKevin GlynnView Answer on Stackoverflow
Solution 3 - PythonDay.ong LiView Answer on Stackoverflow
Solution 4 - PythonJohnEView Answer on Stackoverflow
Solution 5 - PythonElaineView Answer on Stackoverflow
Solution 6 - PythontegancpView Answer on Stackoverflow
Solution 7 - PythonDheerajView Answer on Stackoverflow
Solution 8 - PythonAsif KhanView Answer on Stackoverflow