Read a CSV from github into R

R Problem Overview


I am trying to read a CSV from github into R:

latent.growth.data <- read.csv("https://github.com/aronlindberg/latent_growth_classes/blob/master/LGC_data.csv")

However, this gives me:

Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") : unsupported URL scheme

I tried ?read.csv, ?download.file, getURL (which only returned strange HTML), as well as the data import manual, but still cannot understand how to make it work.

What am I doing wrong?

R Solutions


Solution 1 - R

Try this:

library(RCurl)
x <- getURL("https://raw.github.com/aronlindberg/latent_growth_classes/master/LGC_data.csv")
y <- read.csv(text = x)

You have two problems:

  1. You're not linking to the "raw" text file, but to GitHub's display version (visit the URL for https://raw.github.com....csv to see the difference between the raw version and the display version).
  2. https is a problem for R in many cases, so you need to use a package like RCurl to get around it. In some cases (not with GitHub, though) you can simply replace https with http and things work out, so you can always try that first, but I find using RCurl reliable and not too much extra typing.
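
The first point can be sketched in code. The helper below is hypothetical (the name to_raw_url does not come from any package). It assumes a display URL of the form https://github.com/<user>/<repo>/blob/<branch>/<path> and rewrites it to the raw.githubusercontent.com form used in other answers on this page; any URL that does not match that pattern is returned unchanged.

```r
# Hypothetical helper: turn a GitHub "blob" (display) URL into a raw-file URL.
# Assumes the pattern https://github.com/<user>/<repo>/blob/<branch>/<path>.
to_raw_url <- function(url) {
  sub(
    "^https://github\\.com/([^/]+)/([^/]+)/blob/(.+)$",
    "https://raw.githubusercontent.com/\\1/\\2/\\3",
    url
  )
}

display_url <- "https://github.com/aronlindberg/latent_growth_classes/blob/master/LGC_data.csv"
raw_url <- to_raw_url(display_url)
print(raw_url)
```

The resulting string can then be passed to getURL() as in the snippet above.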

Solution 2 - R

From the documentation of url:

> Note that ‘https://’ connections are not supported (with some exceptions on Windows).

So the problem is that R does not allow connections to https URLs.

You can use download.file with curl:

download.file("https://raw.github.com/aronlindberg/latent_growth_classes/master/LGC_data.csv", 
    destfile = "/tmp/test.csv", method = "curl")
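
Note that download.file only saves the file; it still has to be read afterwards. A minimal sketch of both steps together, assuming a temporary file is acceptable as the destination (the function name read_csv_via_download is hypothetical, not part of any package):

```r
# Hypothetical helper: download a CSV to a temporary file, then read it.
read_csv_via_download <- function(url, method = "auto", ...) {
  tmp <- tempfile(fileext = ".csv")
  on.exit(unlink(tmp), add = TRUE)  # remove the temporary file when done
  status <- download.file(url, destfile = tmp, method = method, quiet = TRUE)
  if (status != 0) stop("download failed with status ", status)
  read.csv(tmp, ...)                # extra arguments are passed to read.csv
}

# Usage (needs network access and, for method = "curl", a curl binary):
# dat <- read_csv_via_download(
#   "https://raw.github.com/aronlindberg/latent_growth_classes/master/LGC_data.csv",
#   method = "curl"
# )
```

Using tempfile() instead of a fixed path such as /tmp/test.csv avoids assuming that a particular directory exists.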

Solution 3 - R

I am using R 3.0.2 and this code does the job.

urlfile<-'https://raw.github.com/aronlindberg/latent_growth_classes/master/LGC_data.csv'
dsin<-read.csv(urlfile)

and this as well

urlfile<-'https://raw.github.com/aronlindberg/latent_growth_classes/master/LGC_data.csv'
dsin<-read.csv(url(urlfile))

edit (sessionInfo)

R version 3.0.2 (2013-09-25)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=Polish_Poland.1250  LC_CTYPE=Polish_Poland.1250   
[3] LC_MONETARY=Polish_Poland.1250 LC_NUMERIC=C                  
[5] LC_TIME=Polish_Poland.1250    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] tools_3.0.2

Solution 4 - R

In similar style to akhmed, I thought I would update the answer, since now you can just use Hadley's readr package. Just one thing to note: you'll need the url to be the raw content (see the //raw.git... below). Here's an example:

library(readr)
data <- read_csv("https://raw.githubusercontent.com/RobertMyles/Bayesian-Ideal-Point-IRT-Models/master/Senate_Example.csv")

Voilà!

Solution 5 - R

I realize that the question is very old, but Google still reports it as a top result (at least for me), so I decided to provide an answer for the year 2015.

Folks are now generally migrating to the curl package (which the well-known httr also builds on), as described by r-bloggers. It offers the following very simple solution:

library(curl)

x <- read.csv( curl("https://raw.githubusercontent.com/trinker/dummy/master/data/gcircles.csv") )

Solution 6 - R

This is what I've been helping develop rio for. It's basically a universal data import/export package that supports HTTPS/SSL and infers the file type from its extension, thus allowing you to read basically anything using one import function:

library("rio")

If you grab the "raw" url for your CSV from GitHub, you can load it in one line with import:

import("https://raw.githubusercontent.com/aronlindberg/latent_growth_classes/master/LGC_data.csv")

The result is a data.frame:

     top100_repository_name   month monthly_increase monthly_begin_at monthly_end_with
1                    Bukkit 2012-03                9              431              440
2                    Bukkit 2012-04               19              438              457
3                    Bukkit 2012-05               19              455              474
4                    Bukkit 2012-06               18              475              493
5                    Bukkit 2012-07               15              492              507
6                    Bukkit 2012-08               50              506              556
...

Solution 7 - R

Seems nowadays GitHub wants you to go through their API to fetch content. I used the gh package as follows:

require(gh)

tmp = tempfile()
qurl = 'https://raw.githubusercontent.com/aronlindberg/latent_growth_classes/master/LGC_data.csv'
# download
gh(paste0('GET ', qurl), .destfile = tmp, .overwrite = TRUE)
# read
read.csv(tmp)

The important part is that you provide a personal access token (PAT), either through the gh(.token = ) argument or, as I did, by setting the PAT globally in an ~/.Renviron file [1]. Of course, you first have to create the PAT in your GitHub account.

[1] I guess ~/.Renviron is searched first by all r-lib packages, of which gh is one. The token entry in it should look like this:

GITHUB_PAT = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"

You could also use the usethis package to set up the PAT.
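
To confirm that the current R session can actually see the token, something like the following base R check can help. It is only a sketch: the function name pat_is_set is hypothetical, and it assumes the variable is called GITHUB_PAT as in the ~/.Renviron entry above. It deliberately reports only whether the variable is non-empty and never prints the token.

```r
# Hypothetical helper: TRUE if the environment variable holds a non-empty value.
pat_is_set <- function(var = "GITHUB_PAT") {
  nzchar(Sys.getenv(var))
}

print(pat_is_set())
```

Changes to ~/.Renviron are picked up when R starts, so after editing the file either restart R or call readRenviron("~/.Renviron").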

Solution 8 - R

The curl method might not work on Windows; at least it did not for me.

This is what worked for me on Windows:

download.file("https://raw.github.com/aronlindberg/latent_growth_classes/master/LGC_data.csv", 
    destfile = "/tmp/test.csv", method = "wininet")

On Linux:

download.file("https://raw.github.com/aronlindberg/latent_growth_classes/master/LGC_data.csv", 
    destfile = "/tmp/test.csv", method = "curl")

Solution 9 - R

A rather crude way: copy the data and read it from the clipboard.

x <- read.table(file = "clipboard", sep = "\t", header = TRUE)
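
The "clipboard" file name is platform-dependent, so it may not be available everywhere. As a sketch of the same tab-separated parse that runs anywhere, the copied text can be supplied through the text argument instead; the sample data below is made up purely for illustration.

```r
# Made-up tab-separated sample standing in for text copied from a page.
tsv_text <- "name\tvalue\nA\t1\nB\t2"

# Same parse as the clipboard version, but reading from a string.
x <- read.table(text = tsv_text, sep = "\t", header = TRUE)
print(x)
```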

Solution 10 - R

As mentioned in other answers, just go to the link for the raw file on GitHub.

For example:

x <- read.csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2018/2018-04-23/week4_australian_salary.csv")

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | histelheim | View Question on Stackoverflow
Solution 1 - R | A5C1D2H2I1M1N2O1R2T1 | View Answer on Stackoverflow
Solution 2 - R | Paul Hiemstra | View Answer on Stackoverflow
Solution 3 - R | Maciej | View Answer on Stackoverflow
Solution 4 - R | RobertMyles | View Answer on Stackoverflow
Solution 5 - R | akhmed | View Answer on Stackoverflow
Solution 6 - R | Thomas | View Answer on Stackoverflow
Solution 7 - R | andschar | View Answer on Stackoverflow
Solution 8 - R | akhil vangala | View Answer on Stackoverflow
Solution 9 - R | Lefty | View Answer on Stackoverflow
Solution 10 - R | zeejay | View Answer on Stackoverflow