Is there a way to copy only the structure (not the data) of a Pandas DataFrame?

Python, Pandas, Dataframe

Python Problem Overview


I received a DataFrame from somewhere and want to create another DataFrame with the same number and names of columns and rows (indexes). For example, suppose that the original data frame was created as

import pandas as pd
df1 = pd.DataFrame([[11,12],[21,22]], columns=['c1','c2'], index=['i1','i2'])

I copied the structure by explicitly passing the columns and index:

df2 = pd.DataFrame(columns=df1.columns, index=df1.index)    

I don't want to copy the data; otherwise I could just write df2 = df1.copy(). In other words, once df2 has been created it must contain only NaN elements:

In [1]: df1
Out[1]: 
    c1  c2
i1  11  12
i2  21  22

In [2]: df2
Out[2]: 
     c1   c2
i1  NaN  NaN
i2  NaN  NaN

Is there a more idiomatic way of doing it?

Python Solutions


Solution 1 - Python

That's a job for reindex_like. Start with the original:

df1 = pd.DataFrame([[11, 12], [21, 22]], columns=['c1', 'c2'], index=['i1', 'i2'])

Construct an empty DataFrame and reindex it like df1:

pd.DataFrame().reindex_like(df1)
Out: 
    c1  c2
i1 NaN NaN
i2 NaN NaN   
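As a quick check of the reindex_like approach, the sketch below confirms that the new frame carries the same row and column labels as df1 and holds no data. The variable names same_labels and all_missing are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame([[11, 12], [21, 22]], columns=['c1', 'c2'], index=['i1', 'i2'])

# An empty frame reindexed like df1 picks up df1's row and column labels.
df2 = pd.DataFrame().reindex_like(df1)

# Labels match the original, and every cell is missing.
same_labels = df2.index.equals(df1.index) and df2.columns.equals(df1.columns)
all_missing = bool(df2.isna().all().all())
print(same_labels, all_missing)
```

Note that df1 itself is not modified; reindex_like returns a new object.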

Solution 2 - Python

As of pandas 0.18, the DataFrame constructor has no option for creating a DataFrame shaped like another one with NaN in place of the values.

The code you use, df2 = pd.DataFrame(columns=df1.columns, index=df1.index), is the most logical way. The only way to improve on it is to spell out even more clearly what you are doing by adding data=None, so that other coders see at once that you intentionally left the data out of the new DataFrame.

TLDR: So my suggestion is:

Explicit is better than implicit
df2 = pd.DataFrame(data=None, columns=df1.columns, index=df1.index)

Very much like yours, but more spelled out.

Solution 3 - Python

Not exactly answering this question, but a similar one, for people coming here via a search engine.

My case was creating a copy of the data frame without data and without index. One can achieve this with the following, which maintains the dtypes of the columns:

empty_copy = df.drop(df.index)
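A minimal sketch of this, assuming a small sample frame with one integer and one float column (the sample data is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'num': [1, 2, 3], 'value': [0.5, 1.5, 2.5]})

# Dropping every row label leaves the columns and their dtypes in place.
empty_copy = df.drop(df.index)

print(len(empty_copy))                      # number of rows left
print(empty_copy.dtypes.equals(df.dtypes))  # whether the dtypes were kept
print(len(df))                              # the original is not modified
```

Because drop returns a new DataFrame by default, df keeps all of its rows.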

Solution 4 - Python

Let's start with some sample data

In [1]: import pandas as pd

In [2]: df = pd.DataFrame([[1, 'a'], [2, 'b'], [3, 'c']],
   ...:                   columns=['num', 'char'])

In [3]: df
Out[3]: 
   num char
0    1    a
1    2    b
2    3    c

In [4]: df.dtypes
Out[4]: 
num      int64
char    object
dtype: object

Now let's use a simple DataFrame initialization using the columns of the original DataFrame but providing no data:

In [5]: empty_copy_1 = pd.DataFrame(data=None, columns=df.columns)

In [6]: empty_copy_1
Out[6]: 
Empty DataFrame
Columns: [num, char]
Index: []

In [7]: empty_copy_1.dtypes
Out[7]: 
num     object
char    object
dtype: object

As you can see, the column data types are not the same as in our original DataFrame.

If you want to preserve the column data types, you need to construct the DataFrame one Series at a time (using a dict of Series and items(), since DataFrame.from_items and iteritems were removed in pandas 1.0 and 2.0):

In [8]: empty_copy_2 = pd.DataFrame({
   ...:     name: pd.Series(dtype=series.dtype)
   ...:     for name, series in df.items()})

In [9]: empty_copy_2
Out[9]: 
Empty DataFrame
Columns: [num, char]
Index: []

In [10]: empty_copy_2.dtypes
Out[10]: 
num      int64
char    object
dtype: object

Solution 5 - Python

A simple alternative: first copy the basic structure (the columns, with their dtypes, and the index type) from the original dataframe (df1) into df2:

df2 = df1.iloc[0:0]

Then fill the dataframe with empty rows, one per index label of df1. DataFrame.append was removed in pandas 2.0; reindex adds the all-NaN rows in a single call:

df2 = df2.reindex(df1.index)

Solution 6 - Python

To preserve column type you can use the astype method, like pd.DataFrame(columns=df1.columns).astype(df1.dtypes)

import pandas as pd

df1 = pd.DataFrame(
    [
        [11, 12, 'Alice'],
        [21, 22, 'Bob']
    ],
    columns=['c1', 'c2', 'c3'],
    index=['i1', 'i2']
)

df2 = pd.DataFrame(columns=df1.columns).astype(df1.dtypes)
print(df2.shape)
print(df2.dtypes)

output:

(0, 3)
c1     int64
c2     int64
c3    object
dtype: object


Solution 7 - Python

You can simply mask the values wherever notna() is true, i.e.:

df1 = pd.DataFrame([[11, 12], [21, 22]], columns=['c1', 'c2'], index=['i1', 'i2'])

df2 = df1.mask(df1.notna())

    c1  c2
i1 NaN NaN
i2 NaN NaN
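The sketch below runs the same mask on a frame that already contains one missing value, to illustrate that cells that are already NaN stay NaN while everything else is blanked out. The sample data is an assumption made for the example.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([[11, 12.5], [21, np.nan]],
                   columns=['c1', 'c2'], index=['i1', 'i2'])

# mask() replaces each cell where the condition is True with NaN;
# cells that were already NaN are left as they are.
df2 = df1.mask(df1.notna())

print(df2.isna().all().all())          # every cell is now missing
print(df2.index.equals(df1.index))     # row labels are kept
print(df1.loc['i1', 'c1'])             # the original still holds its data
```

Since NaN cannot be stored in an integer column, the dtype of c1 in df2 will generally differ from the one in df1.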

Solution 8 - Python

A simple way to copy the column structure of df into df2 (column names only, with no rows) is:

df2 = pd.DataFrame(columns=df.columns)

Solution 9 - Python

This has worked for me in pandas 0.22:

df2 = pd.DataFrame(index=df.index.delete(slice(None)), columns=df.columns)

Convert types:

df2 = df2.astype(df.dtypes)

Use delete(slice(None)) in case you do not want to keep the index values.
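Put together, the two steps might look like the sketch below. The sample frame and the name empty_index are assumptions for illustration; the calls are the ones from the answer.

```python
import pandas as pd

df = pd.DataFrame({'num': [1, 2], 'value': [0.5, 1.5]}, index=['i1', 'i2'])

# Deleting every position yields an empty index derived from the original.
empty_index = df.index.delete(slice(None))

# Build the empty frame, then restore the original column dtypes.
df2 = pd.DataFrame(index=empty_index, columns=df.columns)
df2 = df2.astype(df.dtypes)

print(len(df2))
print(df2.dtypes)
```

The astype step succeeds here because the frame has no rows; there are no NaN values that would need converting to an integer type.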

Solution 10 - Python

I know this is an old question, but I thought I would add my two cents.

def df_cols_like(df):
    """
    Returns an empty data frame with the same column names and types as df
    """
    # Series.iteritems() was removed in pandas 2.0; items() is the replacement.
    df2 = pd.DataFrame({name: pd.Series(dtype=dtype)
                        for name, dtype in df.dtypes.items()},
                       columns=df.dtypes.index)
    return df2

This approach centers around the df.dtypes attribute of the input data frame, df, which is a pd.Series. A pd.DataFrame is constructed from a dictionary of empty pd.Series objects named using the input column names with the column order being taken from the input df.
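A short usage sketch, assuming a pandas version where Series.items() is available (iteritems() was removed in pandas 2.0). The sample frame is made up for the example.

```python
import pandas as pd

def df_cols_like(df):
    """Return an empty data frame with the same column names and types as df."""
    return pd.DataFrame({name: pd.Series(dtype=dtype)
                         for name, dtype in df.dtypes.items()},
                        columns=df.dtypes.index)

df = pd.DataFrame([[1, 'a'], [2, 'b']], columns=['num', 'char'])
empty = df_cols_like(df)

print(empty.empty)            # no rows
print(list(empty.columns))    # same column names, same order
print(empty.dtypes)           # same dtypes as df
```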

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | bmello | View Question on Stackoverflow
Solution 1 - Python | ayhan | View Answer on Stackoverflow
Solution 2 - Python | firelynx | View Answer on Stackoverflow
Solution 3 - Python | Martijn Lentink | View Answer on Stackoverflow
Solution 4 - Python | Pedro M Duarte | View Answer on Stackoverflow
Solution 5 - Python | davmarc | View Answer on Stackoverflow
Solution 6 - Python | Lukasz Kuszner | View Answer on Stackoverflow
Solution 7 - Python | Bharath | View Answer on Stackoverflow
Solution 8 - Python | Haddock-san | View Answer on Stackoverflow
Solution 9 - Python | felocru | View Answer on Stackoverflow
Solution 10 - Python | Phil | View Answer on Stackoverflow