How to build and fill pandas dataframe from for loop?

PythonPandas

Python Problem Overview


Here is a simple example of the code I am running, and I would like the results put into a pandas dataframe (unless there is a better option):

for p in game.players.passing():
    print p, p.team, p.passing_att, p.passer_rating()

R.Wilson SEA 29 55.7
J.Ryan SEA 1 158.3
A.Rodgers GB 34 55.8

Using this code:

d = []
for p in game.players.passing():
    d = [{'Player': p, 'Team': p.team, 'Passer Rating':
        p.passer_rating()}]

pd.DataFrame(d)

I can get:

	Passer Rating	Player	    Team
  0	55.8	        A.Rodgers	GB

Which is a 1x3 dataframe, and I understand why it is only one row but I can't figure out how to make it multi-row with the columns in the correct order. Ideally the solution would be able to deal with n number of rows (based on p) and it would be wonderful (although not essential) if the number of columns would be set by the number of stats requested. Any suggestions? Thanks in advance!

Python Solutions


Solution 1 - Python

The simplest answer is what Paul H said:

d = []
for p in game.players.passing():
    d.append(
        {
            'Player': p,
            'Team': p.team,
            'Passer Rating':  p.passer_rating()
        }
    )

pd.DataFrame(d)

But if you really want to "build and fill a dataframe from a loop", (which, btw, I wouldn't recommend), here's how you'd do it.

d = pd.DataFrame()

for p in game.players.passing():
    temp = pd.DataFrame(
        {
            'Player': p,
            'Team': p.team,
            'Passer Rating': p.passer_rating()
        }
    )

    d = pd.concat([d, temp])

Solution 2 - Python

Try this using list comprehension:

import pandas as pd

df = pd.DataFrame(
    [p, p.team, p.passing_att, p.passer_rating()] for p in game.players.passing()
)

Solution 3 - Python

Make a list of tuples with your data and then create a DataFrame with it:

d = []
for p in game.players.passing():
    d.append((p, p.team, p.passer_rating()))

pd.DataFrame(d, columns=('Player', 'Team', 'Passer Rating'))

A list of tuples should have less overhead than a list dictionaries. I tested this below, but please remember to prioritize ease of code understanding over performance in most cases.

Testing functions:

def with_tuples(loop_size=1e5):
    res = []
    
    for x in range(int(loop_size)):
        res.append((x-1, x, x+1))
    
    return pd.DataFrame(res, columns=("a", "b", "c"))

def with_dict(loop_size=1e5):
    res = []
    
    for x in range(int(loop_size)):
        res.append({"a":x-1, "b":x, "c":x+1})
    
    return pd.DataFrame(res)

Results:

%timeit -n 10 with_tuples()
# 10 loops, best of 3: 55.2 ms per loop

%timeit -n 10 with_dict()
# 10 loops, best of 3: 130 ms per loop

Solution 4 - Python

I may be wrong, but I think the accepted answer by @amit has a bug.

from pandas import DataFrame as df
x = [1,2,3]
y = [7,8,9,10]

# this gives me a syntax error at 'for' (Python 3.7)
d1 = df[[a, "A", b, "B"] for a in x for b in y]

# this works
d2 = df([a, "A", b, "B"] for a in x for b in y)

# and if you want to add the column names on the fly
# note the additional parentheses
d3 = df(([a, "A", b, "B"] for a in x for b in y), columns = ("l","m","n","o"))

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
Questionc.j.mcdonnView Question on Stackoverflow
Solution 1 - PythonNick MarinakisView Answer on Stackoverflow
Solution 2 - PythonAmitView Answer on Stackoverflow
Solution 3 - PythonSeanny123View Answer on Stackoverflow
Solution 4 - Pythonbzip2View Answer on Stackoverflow