Getting Google Spreadsheet CSV into A Pandas Dataframe
PythonPandasGoogle SheetsGoogle Drive-ApiGoogle AppsPython Problem Overview
I uploaded a file to Google spreadsheets (to make a publically accessible example IPython Notebook, with data) I was using the file in it's native form could be read into a Pandas Dataframe. So now I use the following code to read the spreadsheet, works fine but just comes in as string,, and I'm not having any luck trying to get it back into a dataframe (you can get the data)
import requests
r = requests.get('https://docs.google.com/spreadsheet/ccc?key=0Ak1ecr7i0wotdGJmTURJRnZLYlV3M2daNTRubTdwTXc&output=csv')
data = r.content
The data ends up looking like: (1st row headers)
',City,region,Res_Comm,mkt_type,Quradate,National_exp,Alabama_exp,Sales_exp,Inventory_exp,Price_exp,Credit_exp\n0,Dothan,South_Central-Montgomery-Auburn-Wiregrass-Dothan,Residential,Rural,1/15/2010,2,2,3,2,3,3\n10,Foley,South_Mobile-Baldwin,Residential,Suburban_Urban,1/15/2010,4,4,4,4,4,3\n12,Birmingham,North_Central-Birmingham-Tuscaloosa-Anniston,Commercial,Suburban_Urban,1/15/2010,2,2,3,2,2,3\n
The native pandas code that brings in the disk resident file looks like:
df = pd.io.parsers.read_csv('/home/tom/Dropbox/Projects/annonallanswerswithmaster1012013.csv',index_col=0,parse_dates=['Quradate'])
A "clean" solution would be helpful to many to provide an easy way to share datasets for Pandas use! I tried a bunch of alternative with no success and I'm pretty sure I'm missing something obvious again.
Just a Update note The new Google spreadsheet has a different URL pattern Just use this in place of the URL in the above example and or the below answer and you should be fine here is an example:
https://docs.google.com/spreadsheets/d/177_dFZ0i-duGxLiyg6tnwNDKruAYE-_Dd8vAQziipJQ/export?format=csv&id
see solution below from @Max Ghenis which just used pd.read_csv, no need for StringIO or requests...
Python Solutions
Solution 1 - Python
Seems to work for me without the StringIO
:
test = pd.read_csv('https://docs.google.com/spreadsheets/d/' +
'0Ak1ecr7i0wotdGJmTURJRnZLYlV3M2daNTRubTdwTXc' +
'/export?gid=0&format=csv',
# Set first column as rownames in data frame
index_col=0,
# Parse column values to datetime
parse_dates=['Quradate']
)
test.head(5) # Same result as @TomAugspurger
BTW, including the ?gid=
enables importing different sheets, find the gid in the URL.
Solution 2 - Python
You can use read_csv()
on a StringIO
object:
from io import BytesIO
import requests
import pandas as pd
r = requests.get('https://docs.google.com/spreadsheet/ccc?key=0Ak1ecr7i0wotdGJmTURJRnZLYlV3M2daNTRubTdwTXc&output=csv')
data = r.content
In [10]: df = pd.read_csv(BytesIO(data), index_col=0,parse_dates=['Quradate'])
In [11]: df.head()
Out[11]:
City region Res_Comm \
0 Dothan South_Central-Montgomery-Auburn-Wiregrass-Dothan Residential
10 Foley South_Mobile-Baldwin Residential
12 Birmingham North_Central-Birmingham-Tuscaloosa-Anniston Commercial
38 Brent North_Central-Birmingham-Tuscaloosa-Anniston Residential
44 Athens North_Huntsville-Decatur-Florence Residential
mkt_type Quradate National_exp Alabama_exp Sales_exp \
0 Rural 2010-01-15 00:00:00 2 2 3
10 Suburban_Urban 2010-01-15 00:00:00 4 4 4
12 Suburban_Urban 2010-01-15 00:00:00 2 2 3
38 Rural 2010-01-15 00:00:00 3 3 3
44 Suburban_Urban 2010-01-15 00:00:00 4 5 4
Inventory_exp Price_exp Credit_exp
0 2 3 3
10 4 4 3
12 2 2 3
38 3 3 2
44 4 4 4
Solution 3 - Python
Open the specific sheet you want in your browser. Make sure it's at least viewable by anyone with the link. Copy and paste the URL. You'll get something like https://docs.google.com/spreadsheets/d/BLAHBLAHBLAH/edit#gid=NUMBER
.
sheet_url = 'https://docs.google.com/spreadsheets/d/BLAHBLAHBLAH/edit#gid=NUMBER'
First we turn that into a CSV export URL, like https://docs.google.com/spreadsheets/d/BLAHBLAHBLAH/export?format=csv&gid=NUMBER
:
csv_export_url = sheet_url.replace('/edit#gid=', '/export?format=csv&gid=')
Then we pass it to pd.read_csv, which can take a URL.
df = pd.read_csv(csv_export_url)
This will break if Google changes its API (it seems undocumented), and may give unhelpful errors if a network failure occurs.
Solution 4 - Python
My approach is a bit different. I just used pandas.Dataframe() but obviously needed to install and import gspread. And it worked fine!
gsheet = gs.open("Name")
Sheet_name ="today"
wsheet = gsheet.worksheet(Sheet_name)
dataframe = pd.DataFrame(wsheet.get_all_records())
Solution 5 - Python
I have been using the following utils and it worked so far:
def load_from_gspreadsheet(sheet_name, key):
url = 'https://docs.google.com/spreadsheets/d/{key}/gviz/tq?tqx=out:csv&sheet={sheet_name}&headers=1'.format(
key=key, sheet_name=sheet_name.replace(' ', '%20'))
log.info('Loading google spreadsheet from {}'.format(url))
df = pd.read_csv(url)
return df.drop([col for col in df.columns if col.startswith('Unnamed')], axis=1)
You must specify the sheet_name and the key. The key is the string you get from the url in the following path: https://docs.google.com/spreadsheets/d/{key}/edit/
.
You can change the value of headers if you have more than one row for the column names but I am not sure if it still work with multi-headers.
It may brake if Google will change their APIs.
Also please bear in mind that your spreadsheet must be public, everyone with the link can read it.
Solution 6 - Python
If the csv file was shared via drive and not via spreadsheet then the below change to the url would work
#Derive the id from the google drive shareable link.
#For the file at hand the link is as below
#<https://drive.google.com/open?id=1-tjNjMP6w0RUV4GhJWw08ql3wYwsNU69>
file_id='1-tjNjMP6w0RUV4GhJWw08ql3wYwsNU69'
link='https://drive.google.com/uc?export=download&id={FILE_ID}'
csv_url=link.format(FILE_ID=file_id)
#The final url would be as below:-
#csv_url='https://drive.google.com/uc?export=download&id=1-tjNjMP6w0RUV4GhJWw08ql3wYwsNU69'
df = pd.read_csv(csv_url)
And the dataframe would be (if you just ran the above code)
a b c d
0 0 1 2 3
1 4 5 6 7
2 8 9 10 11
3 12 13 14 15
See working code here.
Solution 7 - Python
In Google Sheets file go to File > Publish to the web > Select .csv (see screenshot) > Copy link
Code
import pandas as pd
path = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vSvmELTzIjfSmX8GuV3HE2qomN3uRyvPX8RDzpw77JH33DUbj1bjech7H6NYPArvpZFux0DdJ5L5TKy/pub?output=csv'
data = pd.read_csv(path)
print(data)
Solution 8 - Python
First
- Click on File
- Select Publish to the web tab
- Select which sheet you want as CSV(in case of multiple sheet) and also change the format from webpage to Comma-separated values
- Click on publish
- Copy the link eg: https://docs.google.com/spreadsheets/d/e/{}/pub?gid=0&single=true&output=csv
import pandas as pd
pd.read_csv("https://docs.google.com/spreadsheets/d/e/{}/pub?gid=0&single=true&output=csv")
Solution 9 - Python
This works for me.
import pandas as pd
#Create a public URL
#https://docs.google.com/spreadsheets/d/0Ak1ecr7i0wotdGJmTURJRnZLYlV3M2daNTRubTdwTXc/edit?usp=sharing
#get spreadsheets key from url
gsheetkey = "0Ak1ecr7i0wotdGJmTURJRnZLYlV3M2daNTRubTdwTXc"
#sheet name
sheet_name = 'Sheet 1'
url=f'https://docs.google.com/spreadsheet/ccc?key={gsheetkey}&output=xlsx'
df = pd.read_excel(url,sheet_name=sheet_name)
print(df)
Solution 10 - Python
Straight to the point:
- Get your google URL
https://docs.google.com/spreadsheets/d/ this is your sheet ID number/edit#gid=This will be your tab name, it will be a number. Each tab has its own
I like to make a function(not making here) so I separate my variables
- Sheet_id = "Place your sheet ID here"
- Sheet_name = "Place your sheet # here"
the next URL is the tricky party:
- url = f"https://docs.google.com/spreadsheets/d/ {Sheet_id}/export?gid= {Sheet_name}&format=csv"
Then just read it in
df = pd.csv(url)
That's it. If you need to select a different row as a header you can do this
df = pd.csv(url, header=1)