Pandas concat: ValueError: Shape of passed values is blah, indices imply blah2

Python, Pandas

Python Problem Overview


I'm trying to merge a (pandas 0.14.1) dataframe and a series. The series should form a new column, with some NAs (since the index values of the series are a subset of the index values of the dataframe).

This works for a toy example, but not with my data (detailed below).

Example:

import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.random.randn(6, 4), columns=['A', 'B', 'C', 'D'], index=pd.date_range('1/1/2011', periods=6, freq='D'))
df1

A	B	C	D
2011-01-01	-0.487926	0.439190	0.194810	0.333896
2011-01-02	1.708024	0.237587	-0.958100	1.418285
2011-01-03	-1.228805	1.266068	-1.755050	-1.476395
2011-01-04	-0.554705	1.342504	0.245934	0.955521
2011-01-05	-0.351260	-0.798270	0.820535	-0.597322
2011-01-06	0.132924	0.501027	-1.139487	1.107873

s1 = pd.Series(np.random.randn(3), name='foo', index=pd.date_range('1/1/2011', periods=3, freq='2D'))
s1

2011-01-01   -1.660578
2011-01-03   -0.209688
2011-01-05    0.546146
Freq: 2D, Name: foo, dtype: float64

pd.concat([df1, s1],axis=1)

A	B	C	D	foo
2011-01-01	-0.487926	0.439190	0.194810	0.333896	-1.660578
2011-01-02	1.708024	0.237587	-0.958100	1.418285	NaN
2011-01-03	-1.228805	1.266068	-1.755050	-1.476395	-0.209688
2011-01-04	-0.554705	1.342504	0.245934	0.955521	NaN
2011-01-05	-0.351260	-0.798270	0.820535	-0.597322	0.546146
2011-01-06	0.132924	0.501027	-1.139487	1.107873	NaN

The situation with my real data (see below) seems basically identical: concatenating a series whose DatetimeIndex values are a subset of the dataframe's. But it gives the ValueError in the title (blah = (5, 286), blah2 = (5, 276)). Why doesn't it work?

In[187]: df.head()
Out[188]:
high	low	loc_h	loc_l
time				
2014-01-01 17:00:00	1.376235	1.375945	1.376235	1.375945
2014-01-01 17:01:00	1.376005	1.375775	NaN	NaN
2014-01-01 17:02:00	1.375795	1.375445	NaN	1.375445
2014-01-01 17:03:00	1.375625	1.375515	NaN	NaN
2014-01-01 17:04:00	1.375585	1.375585	NaN	NaN
In [186]: df.index
Out[186]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2014-01-01 17:00:00, ..., 2014-01-01 21:30:00]
Length: 271, Freq: None, Timezone: None

In [189]: hl.head()
Out[189]:
2014-01-01 17:00:00    1.376090
2014-01-01 17:02:00    1.375445
2014-01-01 17:05:00    1.376195
2014-01-01 17:10:00    1.375385
2014-01-01 17:12:00    1.376115
dtype: float64

In [187]:hl.index
Out[187]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2014-01-01 17:00:00, ..., 2014-01-01 21:30:00]
Length: 89, Freq: None, Timezone: None

In: pd.concat([df, hl], axis=1)
Out: [stack trace] ValueError: Shape of passed values is (5, 286), indices imply (5, 276)

Python Solutions


Solution 1 - Python

I had a similar problem (join worked, but concat failed).

Check for duplicate index values in df1 and s1 (e.g., df1.index.is_unique returns False if there are any).

Removing duplicate index values (e.g., df = df[~df.index.duplicated()]; note that df.drop_duplicates() compares column values and ignores the index) or using one of the methods here https://stackoverflow.com/a/34297689/7163376 should resolve it.
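The check and the fix described above can be sketched as follows. The data here is made up for illustration (the names df, hl, and deduped and all values are assumptions, not the asker's data); it only shows how a repeated index label can be detected and dropped before concatenating.

```python
import pandas as pd

# Hypothetical data: the label 17:01 appears twice in the frame's index.
idx = pd.to_datetime([
    "2014-01-01 17:00", "2014-01-01 17:01",
    "2014-01-01 17:01", "2014-01-01 17:02",
])
df = pd.DataFrame({"high": [1.0, 2.0, 2.5, 3.0]}, index=idx)
hl = pd.Series(
    [10.0, 30.0],
    index=pd.to_datetime(["2014-01-01 17:00", "2014-01-01 17:02"]),
    name="hl",
)

print(df.index.is_unique)  # False

# List every label that occurs more than once.
print(df.index[df.index.duplicated(keep=False)])

# Keep only the first row for each label, then concatenate column-wise.
deduped = df[~df.index.duplicated(keep="first")]
result = pd.concat([deduped, hl], axis=1)
print(result)
```

Whether keeping the first occurrence is the right choice depends on the data; keep="last" or an aggregation such as groupby(level=0).mean() are alternatives.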

Solution 2 - Python

My problem was that the two indices were different. The following code solved it:

df1.reset_index(drop=True, inplace=True)
df2.reset_index(drop=True, inplace=True)
df = pd.concat([df1, df2], axis=1)
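A small sketch of what resetting the indices changes, using made-up frames (left, right, and their values are assumptions for illustration). With different labels, concat along axis=1 aligns on the union of the labels; after reset_index(drop=True), the rows are paired by position instead, which is only appropriate when row i of one frame really corresponds to row i of the other.

```python
import pandas as pd

# Hypothetical frames whose index labels do not overlap at all.
left = pd.DataFrame({"a": [1, 2, 3]}, index=[10, 20, 30])
right = pd.DataFrame({"b": [4, 5, 6]}, index=[7, 8, 9])

# Aligning on labels: the result has one row per distinct label, padded with NaN.
by_label = pd.concat([left, right], axis=1)
print(by_label.shape)  # (6, 2)

# Aligning on position: both frames get a fresh 0..n-1 index first.
by_position = pd.concat(
    [left.reset_index(drop=True), right.reset_index(drop=True)],
    axis=1,
)
print(by_position.shape)  # (3, 2)
```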

Solution 3 - Python

Aus_lacy's post gave me the idea of trying related methods, of which join does work:

In [196]:

hl.name = 'hl'
Out[196]:
'hl'
In [199]:

df.join(hl).head(4)
Out[199]:
high	low	loc_h	loc_l	hl
2014-01-01 17:00:00	1.376235	1.375945	1.376235	1.375945	1.376090
2014-01-01 17:01:00	1.376005	1.375775	NaN	NaN	NaN
2014-01-01 17:02:00	1.375795	1.375445	NaN	1.375445	1.375445
2014-01-01 17:03:00	1.375625	1.375515	NaN	NaN	NaN

Some insight into why concat works on the example but not this data would be nice though!
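One plausible explanation, consistent with Solution 1 but not confirmed for the asker's data, is a repeated label in the dataframe's index. The sketch below uses made-up data (the names frame and extra and all values are assumptions) to show that join, which behaves like a left merge on the index, tolerates duplicate labels on the left side and simply repeats the matching value. The exception that concat raises in the same situation may differ between pandas versions, so it is not reproduced here.

```python
import pandas as pd

# Hypothetical data: the label 17:01 occurs twice in the frame's index.
idx = pd.to_datetime([
    "2014-01-01 17:00", "2014-01-01 17:01",
    "2014-01-01 17:01", "2014-01-01 17:02",
])
frame = pd.DataFrame({"high": [1.0, 2.0, 2.5, 3.0]}, index=idx)
extra = pd.Series(
    [10.0, 30.0],
    index=pd.to_datetime(["2014-01-01 17:00", "2014-01-01 17:02"]),
    name="hl",
)

# join keeps every row of the left frame, duplicates included.
joined = frame.join(extra)
print(joined)
print(frame.index.is_unique)  # False
```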

Solution 4 - Python

> To drop duplicate indices, use df = df.loc[df.index.drop_duplicates()]. C.f. pandas.pydata.org/pandas-docs/stable/generated/… – BallpointBen Apr 18 at 15:25

This is wrong, but I can't reply directly to BallpointBen's comment due to low reputation. The reason it's wrong is that df.index.drop_duplicates() returns an Index of the unique labels, but when you index back into the dataframe with those labels it still returns all records. This is because selecting by a duplicated label returns every row that carries that label.

Instead, use df.index.duplicated(), which returns a boolean array (add the ~ to select the not-duplicated records):

df = df.loc[~df.index.duplicated()]
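The difference between the two approaches can be checked on a tiny made-up frame (the name frame and its values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical frame in which the label 1 appears twice.
frame = pd.DataFrame({"x": [10, 20, 21, 30]}, index=[0, 1, 1, 2])

# Selecting by the unique labels still returns both rows labelled 1.
by_labels = frame.loc[frame.index.drop_duplicates()]
print(len(by_labels))  # 4

# A boolean mask keeps exactly one row per label (the first occurrence by default).
by_mask = frame.loc[~frame.index.duplicated()]
print(len(by_mask))  # 3
```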

Solution 5 - Python

Your indexes probably contain duplicate values.

import pandas as pd

T1_INDEX = [
    0,
    1,  # <= !!! if I write e.g.: "0" here then it fails
    0.2,
]
T1_COLUMNS = [
    'A', 'B', 'C', 'D'
]
T1 = [
    [1.0, 1.1, 1.2, 1.3],
    [2.0, 2.1, 2.2, 2.3],
    [3.0, 3.1, 3.2, 3.3],
]

T2_INDEX = [
    1.2,
    2.11,
]

T2_COLUMNS = [
    'D', 'E', 'F',
]
T2 = [
    [54.0, 5324.1, 3234.2],
    [55.0, 14.5324, 2324.2],
    # [3.0, 3.1, 3.2],
]
df1 = pd.DataFrame(T1, columns=T1_COLUMNS, index=T1_INDEX)
df2 = pd.DataFrame(T2, columns=T2_COLUMNS, index=T2_INDEX)


print(pd.concat([pd.DataFrame({})] + [df2, df1], axis=1))

Solution 6 - Python

Try sorting the index after concatenating (note that this stacks the frames row-wise, since axis defaults to 0):

result = pd.concat([df1, df2]).sort_index()

Solution 7 - Python

Maybe it is simple: if you have a DataFrame, make sure that both matrices or vectors that you're trying to combine have the same row names/index.

I had the same issue. I changed the row index names to make them match each other. Here is an example where a matrix (principal components) and a vector (target) have the same row indices (I circled them in blue on the left side of the picture).

Before, when it was not working, I had the matrix with the default row indices (0, 1, 2, 3) while the vector had the row indices (ID0, ID1, ID2, ID3). Then I changed the vector's row indices to (0, 1, 2, 3) and it worked for me.
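A minimal sketch of that relabelling, with made-up data standing in for the matrix and the vector (components, target, and their values are assumptions). It assumes the rows of both objects are already in the same order, so that copying the index over pairs the right rows.

```python
import pandas as pd

# Hypothetical matrix with the default row index 0..3.
components = pd.DataFrame({
    "pc1": [0.1, 0.2, 0.3, 0.4],
    "pc2": [1.1, 1.2, 1.3, 1.4],
})

# Hypothetical vector whose rows are labelled ID0..ID3 instead.
target = pd.Series([0, 1, 0, 1], index=["ID0", "ID1", "ID2", "ID3"], name="target")

# Give the vector the same row labels as the matrix, then combine column-wise.
aligned_target = target.copy()
aligned_target.index = components.index
combined = pd.concat([components, aligned_target], axis=1)
print(combined)
```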


Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type	Original Author	Original Content on Stackoverflow
Question	birone	View Question on Stackoverflow
Solution 1 - Python	lmart999	View Answer on Stackoverflow
Solution 2 - Python	flow	View Answer on Stackoverflow
Solution 3 - Python	birone	View Answer on Stackoverflow
Solution 4 - Python	Jeremy Matt	View Answer on Stackoverflow
Solution 5 - Python	kfr	View Answer on Stackoverflow
Solution 6 - Python	jibran abbasi	View Answer on Stackoverflow
Solution 7 - Python	Ahmad	View Answer on Stackoverflow