Access a URL and read Data with R


Url Problem Overview


Is there a way to specify a website URL and read the data behind it (for example a CSV file) into R for analysis?

Url Solutions


Solution 1 - Url

In the simplest case, just do

X <- read.csv(url("http://some.where.net/data/foo.csv"))

plus whichever options read.csv() may need.
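As a rough sketch of what "options" can mean here: header, na.strings and stringsAsFactors below are ordinary read.csv() arguments passed alongside the connection. So that the example runs offline, a temporary file reached through a file:// URL stands in for the remote CSV; that stand-in, the sample rows and the variable names are assumptions for illustration, not part of the original answer.

```r
# Write a small CSV to a temporary file; this stands in for a remote file.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,name,score", "1,alice,3.5", "2,bob,NA", "3,carol,4.0"), tmp)

# Build a file:// URL (assumes a Unix-like absolute path).
# With a real server this would be e.g. "http://some.where.net/data/foo.csv".
csv_url <- paste0("file://", normalizePath(tmp, winslash = "/"))

# Pass the url() connection plus whichever read.csv() options are needed.
X <- read.csv(url(csv_url),
              header = TRUE,             # first line holds the column names
              na.strings = "NA",         # strings to treat as missing values
              stringsAsFactors = FALSE)  # keep text columns as character

str(X)
```

read.csv() opens the connection itself and closes it again when it is done, so no explicit close() is needed in this form.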

Edit in Sep 2020 or 9 years later:

For a few years now R also supports directly passing the URL to read.csv:

X <- read.csv("http://some.where.net/data/foo.csv")

End of 2020 edit. Original post continues.

Long answer: Yes, this can be done, and many packages have used that feature for years. E.g. the tseries package has used exactly this feature to download stock prices from Yahoo! for almost a decade:

R> library(tseries)
Loading required package: quadprog
Loading required package: zoo

    ‘tseries’ version: 0.10-24

    ‘tseries’ is a package for time series analysis and computational finance.

    See ‘library(help="tseries")’ for details.

R> get.hist.quote("IBM")
trying URL 'http://chart.yahoo.com/table.csv?    ## manual linebreak here
  s=IBM&a=0&b=02&c=1991&d=5&e=08&f=2011&g=d&q=q&y=0&z=IBM&x=.csv'
Content type 'text/csv' length unknown
opened URL
.......... .......... .......... .......... ..........
.......... .......... .......... .......... ..........
.......... .......... .......... .......... ..........
.......... .......... .......... .......... ..........
.......... .......... .......... .......... ..........
........
downloaded 258 Kb

             Open   High    Low  Close
1991-01-02 112.87 113.75 112.12 112.12
1991-01-03 112.37 113.87 112.25 112.50
1991-01-04 112.75 113.00 111.87 112.12
1991-01-07 111.37 111.87 110.00 110.25
1991-01-08 110.37 110.37 108.75 109.00
1991-01-09 109.75 110.75 106.75 106.87
[...]

This is all exceedingly well documented in the manual pages for help(connection) and help(url). Also see the manual on 'Data Import/Export' that came with R.
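To illustrate the connection interface those help pages describe, here is a minimal sketch that handles the connection explicitly instead of letting read.csv() manage it. The temporary file behind a file:// URL stands in for a remote server, and the two sample rows reuse values from the output above; both are assumptions made so the example runs offline.

```r
# A temporary file reached via file:// stands in for a remote resource.
tmp <- tempfile(fileext = ".csv")
writeLines(c("Date,Close", "1991-01-02,112.12", "1991-01-03,112.50"), tmp)
csv_url <- paste0("file://", normalizePath(tmp, winslash = "/"))

con <- url(csv_url)               # create the connection (not opened yet)
open(con, "r")                    # open it for reading in text mode
header <- readLines(con, n = 1)   # consume the header line ourselves

# An open connection is read from its current position, so read.csv()
# now sees only the data rows.
prices <- read.csv(con, header = FALSE,
                   col.names = strsplit(header, ",")[[1]])
close(con)                        # we opened it, so we close it

print(prices)
```

Opening the connection by hand is mainly useful when the stream has to be read in several steps; for a plain CSV the one-line forms are simpler.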

Solution 2 - Url

base

read.csv without the url function just works fine. Probably I am missing something if Dirk Eddelbuettel included it in his answer:

ad <- read.csv("http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv")
head(ad)

  X    TV radio newspaper sales
1 1 230.1  37.8      69.2  22.1
2 2  44.5  39.3      45.1  10.4
3 3  17.2  45.9      69.3   9.3
4 4 151.5  41.3      58.5  18.5
5 5 180.8  10.8      58.4  12.9
6 6   8.7  48.9      75.0   7.2

Other options, using two popular packages:

data.table

library(data.table)
ad <- fread("http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv")
head(ad)

   V1    TV radio newspaper sales
1:  1 230.1  37.8      69.2  22.1
2:  2  44.5  39.3      45.1  10.4
3:  3  17.2  45.9      69.3   9.3
4:  4 151.5  41.3      58.5  18.5
5:  5 180.8  10.8      58.4  12.9
6:  6   8.7  48.9      75.0   7.2

readr

library(readr)
ad <- read_csv("http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv")
head(ad)

# A tibble: 6 x 5
     X1    TV radio newspaper sales
  <int> <dbl> <dbl>     <dbl> <dbl>
1     1 230.1  37.8      69.2  22.1
2     2  44.5  39.3      45.1  10.4
3     3  17.2  45.9      69.3   9.3
4     4 151.5  41.3      58.5  18.5
5     5 180.8  10.8      58.4  12.9
6     6   8.7  48.9      75.0   7.2
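All three readers fetch the file again on every call. If the goal is also to end up with a CSV file on disk, as the question puts it, one option that the answers do not mention is download.file() from the utils package, followed by a normal local read. The sketch below assumes a file:// URL as a stand-in for the real address, and the file name local_copy.csv is hypothetical.

```r
# A temporary file reached via file:// stands in for the remote CSV.
tmp <- tempfile(fileext = ".csv")
writeLines(c("TV,radio,sales", "230.1,37.8,22.1", "44.5,39.3,10.4"), tmp)
src_url <- paste0("file://", normalizePath(tmp, winslash = "/"))

# Save a local copy; "local_copy.csv" is a hypothetical file name.
dest <- file.path(tempdir(), "local_copy.csv")
status <- download.file(src_url, destfile = dest, quiet = TRUE)

# download.file() returns 0 on success.
if (status == 0) {
  ad <- read.csv(dest)   # read the saved copy like any local file
  print(head(ad))
}
```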

Solution 3 - Url

Often data on webpages is in the form of an HTML table. You can read an HTML table into R using the package XML.

In this package, the function

readHTMLTable(<url>)

will look through a page for HTML tables and return a list of data frames (one for each table found).
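A small sketch of that behaviour, assuming the XML package is installed. To avoid depending on a live page, the HTML is parsed from an inline string with htmlParse(); the markup, the column names and the two rows are made up for illustration. With a real page, the URL would be passed in place of doc.

```r
library(XML)  # assumes the XML package is installed

# A small inline page stands in for a real URL (sample data only).
html <- '<html><body>
<table>
  <tr><th>city</th><th>value</th></tr>
  <tr><td>Rochester</td><td>19.0</td></tr>
  <tr><td>Syracuse</td><td>17.0</td></tr>
</table>
</body></html>'

doc <- htmlParse(html, asText = TRUE)   # parse the HTML text into a document

# One list element per <table> element found in the document.
tables <- readHTMLTable(doc, header = TRUE, stringsAsFactors = FALSE)

length(tables)
tables[[1]]
```

The result is always a list, even when the page holds a single table, so the data frame of interest still has to be picked out by position or name.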

Solution 4 - Url

Besides read.csv(url("...")), you can also use read.table("http://...").

Example:

> sample <- read.table("http://www.ats.ucla.edu/stat/examples/ara/angell.txt")
> sample
                V1   V2   V3   V4 V5
1        Rochester 19.0 20.6 15.0  E
2         Syracuse 17.0 15.6 20.2  E
...
43         Atlanta  4.2 70.6 32.6  S
> 
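The default V1 ... V5 names in that output can be replaced at read time. The following sketch passes col.names to read.table(); the column names are hypothetical, since the meaning of the columns is not given here, and a temporary file behind a file:// URL stands in for the real address so the example runs offline.

```r
# Whitespace-separated sample rows; a temp file stands in for the remote one.
tmp <- tempfile(fileext = ".txt")
writeLines(c("Rochester 19.0 20.6 15.0 E",
             "Syracuse  17.0 15.6 20.2 E",
             "Atlanta    4.2 70.6 32.6 S"), tmp)
txt_url <- paste0("file://", normalizePath(tmp, winslash = "/"))

# read.table() accepts the URL string directly.
# The column names below are hypothetical placeholders.
sample_df <- read.table(txt_url,
                        header = FALSE,
                        col.names = c("city", "x1", "x2", "x3", "region"),
                        stringsAsFactors = FALSE)

sample_df
```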

Solution 5 - Url

scan can read from a web page automatically; you don't necessarily have to mess with connections.
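A minimal sketch of that, assuming a plain file of whitespace-separated numbers. The temporary file behind a file:// URL is a stand-in for a web address, and the values are sample data.

```r
# A temp file with whitespace-separated numbers stands in for a web resource.
tmp <- tempfile(fileext = ".txt")
writeLines(c("112.12 112.50 112.12", "110.25 109.00 106.87"), tmp)
txt_url <- paste0("file://", normalizePath(tmp, winslash = "/"))

# scan() takes the URL string directly; no explicit connection is needed.
closes <- scan(txt_url, what = numeric(), quiet = TRUE)

closes
```

scan() returns a plain vector rather than a data frame, which suits simple numeric streams better than tabular data.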

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type       Original Author            Original Content on Stack Overflow
Question           user597551                 View Question on Stack Overflow
Solution 1 - Url   Dirk Eddelbuettel          View Answer on Stack Overflow
Solution 2 - Url   mpalanco                   View Answer on Stack Overflow
Solution 3 - Url   DavidC                     View Answer on Stack Overflow
Solution 4 - Url   larkee                     View Answer on Stack Overflow
Solution 5 - Url   Aaron left Stack Overflow  View Answer on Stack Overflow