How to concatenate multiple column values into a single column in Pandas dataframe

PythonPandasDataframe

Python Problem Overview


This question is same to this posted earlier. I want to concatenate three columns instead of concatenating two columns:

Here is the combining two columns:

df = DataFrame({'foo':['a','b','c'], 'bar':[1, 2, 3], 'new':['apple', 'banana', 'pear']})

df['combined']=df.apply(lambda x:'%s_%s' % (x['foo'],x['bar']),axis=1)

df
    bar	foo	new	combined
0	1	a	apple	a_1
1	2	b	banana	b_2
2	3	c	pear	c_3

I want to combine three columns with this command but it is not working, any idea?

df['combined']=df.apply(lambda x:'%s_%s' % (x['bar'],x['foo'],x['new']),axis=1)

Python Solutions


Solution 1 - Python

Another solution using DataFrame.apply(), with slightly less typing and more scalable when you want to join more columns:

cols = ['foo', 'bar', 'new']
df['combined'] = df[cols].apply(lambda row: '_'.join(row.values.astype(str)), axis=1)

Solution 2 - Python

you can simply do:

In[17]:df['combined']=df['bar'].astype(str)+'_'+df['foo']+'_'+df['new']

In[17]:df
Out[18]: 
   bar foo     new    combined
0    1   a   apple   1_a_apple
1    2   b  banana  2_b_banana
2    3   c    pear    3_c_pear

Solution 3 - Python

If you have even more columns you want to combine, using the Series method str.cat might be handy:

df["combined"] = df["foo"].str.cat(df[["bar", "new"]].astype(str), sep="_")

Basically, you select the first column (if it is not already of type str, you need to append .astype(str)), to which you append the other columns (separated by an optional separator character).

Solution 4 - Python

Just wanted to make a time comparison for both solutions (for 30K rows DF):

In [1]: df = DataFrame({'foo':['a','b','c'], 'bar':[1, 2, 3], 'new':['apple', 'banana', 'pear']})

In [2]: big = pd.concat([df] * 10**4, ignore_index=True)

In [3]: big.shape
Out[3]: (30000, 3)

In [4]: %timeit big.apply(lambda x:'%s_%s_%s' % (x['bar'],x['foo'],x['new']),axis=1)
1 loop, best of 3: 881 ms per loop

In [5]: %timeit big['bar'].astype(str)+'_'+big['foo']+'_'+big['new']
10 loops, best of 3: 44.2 ms per loop

a few more options:

In [6]: %timeit big.ix[:, :-1].astype(str).add('_').sum(axis=1).str.cat(big.new)
10 loops, best of 3: 72.2 ms per loop

In [11]: %timeit big.astype(str).add('_').sum(axis=1).str[:-1]
10 loops, best of 3: 82.3 ms per loop

Solution 5 - Python

The answer given by @allen is reasonably generic but can lack in performance for larger dataframes:

Reduce does a lot better:

from functools import reduce

import pandas as pd

# make data
df = pd.DataFrame(index=range(1_000_000))
df['1'] = 'CO'
df['2'] = 'BOB'
df['3'] = '01'
df['4'] = 'BILL'


def reduce_join(df, columns):
    assert len(columns) > 1
    slist = [df[x].astype(str) for x in columns]
    return reduce(lambda x, y: x + '_' + y, slist[1:], slist[0])


def apply_join(df, columns):
    assert len(columns) > 1
    return df[columns].apply(lambda row:'_'.join(row.values.astype(str)), axis=1)

# ensure outputs are equal
df1 = reduce_join(df, list('1234'))
df2 = apply_join(df, list('1234'))
assert df1.equals(df2)

# profile
%timeit df1 = reduce_join(df, list('1234'))  # 733 ms
%timeit df2 = apply_join(df, list('1234'))   # 8.84 s

Solution 6 - Python

Possibly the fastest solution is to operate in plain Python:

Series(
    map(
        '_'.join,
        df.values.tolist()
        # when non-string columns are present:
        # df.values.astype(str).tolist()
    ),
    index=df.index
)

Comparison against @MaxU answer (using the big data frame which has both numeric and string columns):

%timeit big['bar'].astype(str) + '_' + big['foo'] + '_' + big['new']
# 29.4 ms ± 1.08 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


%timeit Series(map('_'.join, big.values.astype(str).tolist()), index=big.index)
# 27.4 ms ± 2.36 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Comparison against @derchambers answer (using their df data frame where all columns are strings):

from functools import reduce

def reduce_join(df, columns):
    slist = [df[x] for x in columns]
    return reduce(lambda x, y: x + '_' + y, slist[1:], slist[0])

def list_map(df, columns):
    return Series(
        map(
            '_'.join,
            df[columns].values.tolist()
        ),
        index=df.index
    )

%timeit df1 = reduce_join(df, list('1234'))
# 602 ms ± 39 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit df2 = list_map(df, list('1234'))
# 351 ms ± 12.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Solution 7 - Python

I think you are missing one %s

df['combined']=df.apply(lambda x:'%s_%s_%s' % (x['bar'],x['foo'],x['new']),axis=1)

Solution 8 - Python

@derchambers I found one more solution:

import pandas as pd

# make data
df = pd.DataFrame(index=range(1_000_000))
df['1'] = 'CO'
df['2'] = 'BOB'
df['3'] = '01'
df['4'] = 'BILL'

def eval_join(df, columns):

    sum_elements = [f"df['{col}']" for col in columns]
    to_eval = "+ '_' + ".join(sum_elements)

    return eval(to_eval)


#profile
%timeit df3 = eval_join(df, list('1234')) # 504 ms

Solution 9 - Python

If you have a list of columns you want to concatenate and maybe you'd like to use some separator, here's what you can do

def concat_columns(df, cols_to_concat, new_col_name, sep=" "):
    df[new_col_name] = df[cols_to_concat[0]]
    for col in cols_to_concat[1:]:
        df[new_col_name] = df[new_col_name].astype(str) + sep + df[col].astype(str)

This should be faster than apply and takes an arbitrary number of columns to concatenate.

Solution 10 - Python

df = DataFrame({'foo':['a','b','c'], 'bar':[1, 2, 3], 'new':['apple', 'banana', 'pear']})

df['combined'] = df['foo'].astype(str)+'_'+df['bar'].astype(str)

If you concatenate with string('_') please you convert the column to string which you want and after you can concatenate the dataframe.

Solution 11 - Python

df['New_column_name'] = df['Column1'].map(str) + 'X' + df['Steps']

X= x is any delimiter (eg: space) by which you want to separate two merged column.

Solution 12 - Python

You could create a function which would make the implementation neater (esp. if you're using this functionality multiple times throughout an implementation):

def concat_cols(df, cols_to_concat, new_col_name, separator):  
    df[new_col_name] = ''
    for i, col in enumerate(cols_to_concat):
        df[new_col_name] += ('' if i == 0 else separator) + df[col].astype(str)
    return df

Sample usage:

test = pd.DataFrame(data=[[1,2,3], [4,5,6], [7,8,9]], columns=['a', 'b', 'c'])
test = concat_cols(test, ['a', 'b', 'c'], 'concat_col', '_')

Solution 13 - Python

First convert the columns to str. Then use the .T.agg('_'.join) function to concatenate them. More info can be gotten here

# Initialize columns
cols_concat = ['first_name', 'second_name']

# Convert them to type str
df[cols_concat] = df[cols_concat].astype('str')

# Then concatenate them as follows
df['new_col'] = df[cols_concat].T.agg('_'.join)

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionNamAshenaView Question on Stackoverflow
Solution 1 - PythonAllenView Answer on Stackoverflow
Solution 2 - PythonshivsnView Answer on Stackoverflow
Solution 3 - PythoncbrnrView Answer on Stackoverflow
Solution 4 - PythonMaxU - stop genocide of UAView Answer on Stackoverflow
Solution 5 - PythonderchambersView Answer on Stackoverflow
Solution 6 - PythonkrassowskiView Answer on Stackoverflow
Solution 7 - Pythonmilos.aiView Answer on Stackoverflow
Solution 8 - PythonGrzegorzView Answer on Stackoverflow
Solution 9 - PythonDaniil BalabanovView Answer on Stackoverflow
Solution 10 - PythonManivannan MurugavelView Answer on Stackoverflow
Solution 11 - PythonNipun Kumar GoelView Answer on Stackoverflow
Solution 12 - PythonStephenOKView Answer on Stackoverflow
Solution 13 - PythonJane KathambiView Answer on Stackoverflow