Updating values in iterrows for pandas


Python Problem Overview


I am doing some geocoding work in which I used Selenium to screen-scrape the x-y coordinates I need for a location's address. I imported an xls file into a pandas DataFrame and want to use an explicit loop to update the rows that do not have x-y coordinates, like below:

for index, row in rche_df.iterrows():
    if isinstance(row.wgs1984_latitude, float):
        row = row.copy()
        target = row.address_chi        
        dict_temp = geocoding(target)
        row.wgs1984_latitude = dict_temp['lat']
        row.wgs1984_longitude = dict_temp['long']

I have read https://stackoverflow.com/questions/15972264/why-doesnt-this-function-take-after-i-iterrows-over-a-pandas-dataframe and am fully aware that iterrows gives us a copy rather than a view, so editing row does not change the DataFrame. But what if I really need to update the values row by row? Is a lambda feasible?
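The behaviour described above can be reproduced with a tiny frame. This is a minimal sketch: fake_geocoding is a hypothetical offline stand-in for the question's Selenium-based geocoding(), the sample addresses and coordinates are made up, and pd.isna is used instead of the isinstance(..., float) check because a real latitude is also a float.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's geocoding(); returns fixed coordinates.
def fake_geocoding(address):
    return {"lat": 22.3, "long": 114.2}

df = pd.DataFrame({
    "address_chi": ["addr A", "addr B"],
    "wgs1984_latitude": [np.nan, 22.28],
    "wgs1984_longitude": [np.nan, 114.15],
})

for index, row in df.iterrows():
    if pd.isna(row["wgs1984_latitude"]):
        coords = fake_geocoding(row["address_chi"])
        # These assignments only modify the Series `row`, which is a copy here
        # because the frame has mixed dtypes (object and float columns).
        row["wgs1984_latitude"] = coords["lat"]
        row["wgs1984_longitude"] = coords["long"]

print(df)  # the first row's coordinates are still NaN
```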

Python Solutions


Solution 1 - Python

The rows you get back from iterrows are copies that are no longer connected to the original dataframe, so edits to them don't change your dataframe. Thankfully, because each item you get back from iterrows contains the current index, you can use that index to access and edit the relevant row of the dataframe:

for index, row in rche_df.iterrows():
    if isinstance(row.wgs1984_latitude, float):
        target = row.address_chi
        dict_temp = geocoding(target)
        # Write through the original DataFrame using the row's index label;
        # assigning to `row` itself would only modify a copy.
        rche_df.loc[index, 'wgs1984_latitude'] = dict_temp['lat']
        rche_df.loc[index, 'wgs1984_longitude'] = dict_temp['long']

In my experience, this approach tends to be slower than using an approach like apply or map, but as always, it's up to you to decide on the tradeoff between performance and ease of coding.
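Since the question also asks whether a lambda is feasible, here is a minimal sketch of the map style mentioned above. fake_geocoding and the sample data are hypothetical stand-ins for the question's geocoding() and rche_df. It still calls the geocoder once per missing row, so it mainly tidies the code rather than avoiding the per-row work.

```python
import numpy as np
import pandas as pd

# Hypothetical stub for the question's geocoding(); returns fixed coordinates.
def fake_geocoding(address):
    return {"lat": 22.3, "long": 114.2}

df = pd.DataFrame({
    "address_chi": ["addr A", "addr B"],
    "wgs1984_latitude": [np.nan, 22.28],
    "wgs1984_longitude": [np.nan, 114.15],
})

# Select only the rows whose latitude is missing.
missing = df["wgs1984_latitude"].isna()

# Geocode those addresses; the result is a Series of dicts keyed by the original index.
results = df.loc[missing, "address_chi"].map(fake_geocoding)

# Pull each field out with a lambda; assignment aligns on the index.
df.loc[missing, "wgs1984_latitude"] = results.map(lambda d: d["lat"])
df.loc[missing, "wgs1984_longitude"] = results.map(lambda d: d["long"])

print(df)
```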

Solution 2 - Python

Another way based on this question:

for index, row in rche_df.iterrows():
    if isinstance(row.wgs1984_latitude, float):
        target = row.address_chi
        dict_temp = geocoding(target)
        # .at sets a single cell by (row label, column label) in the original DataFrame.
        rche_df.at[index, 'wgs1984_latitude'] = dict_temp['lat']
        rche_df.at[index, 'wgs1984_longitude'] = dict_temp['long']

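This link describes the difference between .loc and .at. In short, .at is faster than .loc when getting or setting a single scalar value, because .at only accepts a single row label and a single column label.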
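A rough way to check the speed claim on your own machine is sketched below. The frame, its size, and the column names lat/long are made up for illustration; the timings printed will vary with hardware and pandas version, so no specific numbers are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

def fill_with(accessor_name, n=500):
    """Fill a NaN frame cell by cell using either df.at or df.loc."""
    df = pd.DataFrame({"lat": np.full(n, np.nan), "long": np.full(n, np.nan)})
    accessor = getattr(df, accessor_name)  # df.at or df.loc, bound to this df
    for i in range(n):
        # Set one scalar cell by (row label, column label).
        accessor[i, "lat"] = 22.3
        accessor[i, "long"] = 114.2
    return df

df_at = fill_with("at")
df_loc = fill_with("loc")

# Both accessors give the same result for scalar assignment.
assert df_at.equals(df_loc)

# Compare run times; .at is usually the quicker of the two for this pattern.
t_at = timeit.timeit(lambda: fill_with("at"), number=3)
t_loc = timeit.timeit(lambda: fill_with("loc"), number=3)
print(f".at: {t_at:.3f}s  .loc: {t_loc:.3f}s")
```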

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | lokheart | View Question on Stackoverflow
Solution 1 - Python | Marius | View Answer on Stackoverflow
Solution 2 - Python | Alireza Mazochi | View Answer on Stackoverflow