Pandas Dataframe: split column into multiple columns, right-align inconsistent cell entries


Python Problem Overview


I have a pandas dataframe with a column named 'City, State, Country'. I want to separate this column into three new columns, 'City', 'State' and 'Country'.

0                 HUN
1                 ESP
2                 GBR
3                 ESP
4                 FRA
5             ID, USA
6             GA, USA
7    Hoboken, NJ, USA
8             NJ, USA
9                 AUS

Splitting the column into three columns is trivial enough:

location_df = df['City, State, Country'].apply(lambda x: pd.Series(x.split(',')))

However, this creates left-aligned data:

         0    1    2
0      HUN  NaN  NaN
1      ESP  NaN  NaN
2      GBR  NaN  NaN
3      ESP  NaN  NaN
4      FRA  NaN  NaN
5       ID  USA  NaN
6       GA  USA  NaN
7  Hoboken   NJ  USA
8       NJ  USA  NaN
9      AUS  NaN  NaN
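For anyone who wants to reproduce this, here is a self-contained sketch of the code above. It assumes the sample values printed earlier and that the separator is a comma followed by a space. Note that splitting on ',' alone leaves a leading space on every piece after the first (for example ' USA'), which is easy to miss in the printed output.

```python
import pandas as pd

# Sample data reconstructed from the column printed above (assumed separator: ", ")
df = pd.DataFrame({
    "City, State, Country": [
        "HUN", "ESP", "GBR", "ESP", "FRA",
        "ID, USA", "GA, USA", "Hoboken, NJ, USA", "NJ, USA", "AUS",
    ]
})

# Same approach as in the question: one pd.Series per row;
# rows with fewer pieces are padded with NaN on the right
location_df = df["City, State, Country"].apply(lambda x: pd.Series(x.split(",")))
print(location_df)
```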

How would one go about creating the new columns with the data right-aligned? Would I need to iterate through every row, count the number of commas and handle the contents individually?

Python Solutions


Solution 1 - Python

I'd do something like the following:

import pandas as pd

# Reverse each split list so the country always lands in column 0
foo = lambda x: pd.Series(x.split(',')[::-1])
rev = df['City, State, Country'].apply(foo)
print(rev)

      0    1        2
0   HUN  NaN      NaN
1   ESP  NaN      NaN
2   GBR  NaN      NaN
3   ESP  NaN      NaN
4   FRA  NaN      NaN
5   USA   ID      NaN
6   USA   GA      NaN
7   USA   NJ  Hoboken
8   USA   NJ      NaN
9   AUS  NaN      NaN
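The per-row lambda can also be replaced with pandas' vectorized string methods. A minimal sketch, assuming the same sample data and a ', ' separator (splitting on ', ' also avoids the leading spaces): str.split returns a list per row, .str[::-1] reverses each list, and building a DataFrame from those lists pads the shorter rows on the right.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed separator: ", ")
df = pd.DataFrame({
    "City, State, Country": [
        "HUN", "ESP", "GBR", "ESP", "FRA",
        "ID, USA", "GA, USA", "Hoboken, NJ, USA", "NJ, USA", "AUS",
    ]
})

# Split each entry into a list and reverse it, so the country comes first
reversed_parts = df["City, State, Country"].str.split(", ").str[::-1]

# Build columns 0, 1, 2 from the lists; shorter rows get missing values on the right
rev = pd.DataFrame(reversed_parts.tolist(), index=df.index)
print(rev)
```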

I think that gets you what you want. If you also want to tidy things up and put the columns in City, State, Country order, you could add the following:

rev.rename(columns={0: 'Country', 1: 'State', 2: 'City'}, inplace=True)
rev = rev[['City', 'State', 'Country']]
print(rev)

     City State Country
0      NaN   NaN     HUN
1      NaN   NaN     ESP
2      NaN   NaN     GBR
3      NaN   NaN     ESP
4      NaN   NaN     FRA
5      NaN    ID     USA
6      NaN    GA     USA
7  Hoboken    NJ     USA
8      NaN    NJ     USA
9      NaN   NaN     AUS
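Instead of reversing and then reordering, a slightly different technique pads each list on the left so the pieces come out right-aligned in City, State, Country order directly. This is a minimal sketch, not part of the original answer; it assumes a ', ' separator and that no entry has more than three parts (a longer entry would produce a row with too many values and the DataFrame constructor would raise an error). The names cols, parts and padded are illustrative.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed separator: ", ")
df = pd.DataFrame({
    "City, State, Country": [
        "HUN", "ESP", "GBR", "ESP", "FRA",
        "ID, USA", "GA, USA", "Hoboken, NJ, USA", "NJ, USA", "AUS",
    ]
})

cols = ["City", "State", "Country"]
parts = df["City, State, Country"].str.split(", ")

# Prepend None until each row has len(cols) entries, pushing the values to the right
padded = [[None] * (len(cols) - len(p)) + p for p in parts]

location_df = pd.DataFrame(padded, index=df.index, columns=cols)
print(location_df)
```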

Solution 2 - Python

Assuming the column is named 'target':

df[["City", "State", "Country"]] = df["target"].str.split(pat=",", expand=True)
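Note that expand=True fills the pieces from the left, so this gives the same left-aligned layout as in the question, and the assignment only works if at least one row splits into three pieces. One vectorized way to right-align the result afterwards is sketched below. It uses NumPy and is not part of the original answer; it assumes the sample data above and a ', ' separator, and the names parts, order, values and right are illustrative. For each row it sorts the positions so the missing cells come first, using a stable sort so the remaining pieces keep their original order.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (assumed separator: ", ")
df = pd.DataFrame({
    "City, State, Country": [
        "HUN", "ESP", "GBR", "ESP", "FRA",
        "ID, USA", "GA, USA", "Hoboken, NJ, USA", "NJ, USA", "AUS",
    ]
})

# Left-aligned split, as in Solution 2 (missing pieces end up on the right)
parts = df["City, State, Country"].str.split(", ", expand=True)

# Per row, order missing cells (False) before present ones (True);
# the stable sort keeps the present pieces in their original order
order = np.argsort(parts.notna().to_numpy(), axis=1, kind="stable")
values = np.take_along_axis(parts.to_numpy(), order, axis=1)

right = pd.DataFrame(values, index=df.index, columns=["City", "State", "Country"])
print(right)
```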

Solution 3 - Python

Since you are dealing with strings, I would suggest amending your current code to convert each value to a string before splitting:

location_df = df['City, State, Country'].apply(lambda x: pd.Series(str(x).split(',')))

This worked when I tested it on one of the columns; give it a try.
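One caveat with wrapping the value in str(): a missing value (NaN) becomes the literal string 'nan', which would then show up as a country. The pandas .str accessor propagates NaN instead. A small sketch with a hypothetical two-row Series illustrating the difference:

```python
import numpy as np
import pandas as pd

# Hypothetical input: one normal entry and one missing value
s = pd.Series(["Hoboken, NJ, USA", np.nan])

# str() turns NaN into the text "nan" before splitting
via_str = s.apply(lambda x: str(x).split(","))

# The .str accessor leaves NaN as a missing value
via_accessor = s.str.split(",")

print(via_str.tolist())
print(via_accessor.tolist())
```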

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | jamesbev | View Question on Stackoverflow
Solution 1 - Python | Karl D. | View Answer on Stackoverflow
Solution 2 - Python | Dolittle Wang | View Answer on Stackoverflow
Solution 3 - Python | Naufal | View Answer on Stackoverflow