How do I get a list of built-in data sets in R?


R Problem Overview


Can someone please help me get a list of the built-in data sets and their dependency packages?

R Solutions


Solution 1 - R

There are several ways to find the included datasets in R:

1: Using data() will give you a list of the datasets of all attached packages, i.e. those on the search path (and not only the ones from the datasets package); the datasets are ordered by package

2: Using data(package = .packages(all.available = TRUE)) will give you a list of all datasets in the packages installed on your computer (i.e. also the ones that are not loaded)

3: Using data(package = "packagename") will give you the datasets of that specific package, so data(package = "plyr") will give the datasets in the plyr package
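
Each of the three calls above returns an object of class "packageIQR" whose results component is a character matrix, so the listing can also be inspected programmatically instead of only being read in the pager. A minimal sketch using base R only (the variable names are illustrative, and the suppressWarnings() call is there on the assumption that some installations emit warnings for packages without data):

```r
# data() returns a "packageIQR" object; its $results element is a character
# matrix with the columns Package, LibPath, Item and Title.
attached <- data()$results                       # packages on the search path
one_pkg  <- data(package = "datasets")$results   # a single package

# All installed packages; this can take a moment on large libraries.
installed <- suppressWarnings(
  data(package = .packages(all.available = TRUE))$results
)

# Number of datasets per package among the attached packages
table(attached[, "Package"])

# First few item names and titles from the datasets package
head(one_pkg[, c("Item", "Title")])
```

Working with the matrix directly makes it easy to filter or count entries, which is what the lookup by dataset name relies on.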


If you want to know in which package a dataset is located (e.g. the acme dataset), you can do:

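# Collect the dataset index of every installed package into a data frame
dat <- as.data.frame(data(package = .packages(all.available = TRUE))$results)
# Keep the Package, Item and Title columns (1, 3, 4) for the rows named "acme"
dat[dat$Item == "acme", c(1, 3, 4)]

which gives (the row number depends on which packages are installed):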

    Package Item                  Title
107    boot acme Monthly Excess Returns
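
Once the package is known, the dataset can be loaded without attaching the whole package. A small sketch, assuming the boot package is installed (it usually ships with R as a recommended package, but that is an assumption about your installation); the environment name e is illustrative:

```r
if (requireNamespace("boot", quietly = TRUE)) {
  e <- new.env()
  # Load acme from boot into a scratch environment instead of the workspace
  data("acme", package = "boot", envir = e)
  str(e$acme)
} else {
  message("Package 'boot' is not installed; install it or pick another dataset.")
}
```

Loading into a separate environment keeps the global workspace clean; dropping the envir argument would place acme in the calling environment instead.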

Solution 2 - R

I often also need to know the structure of the available datasets, so I created dataStr in my misc package.

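dataStr <- function(package = "datasets", ...) {
  # Item names from the data() index of the package
  d <- data(package = package, envir = new.env(), ...)$results[, "Item"]
  # Entries such as "beaver1 (beavers)" are reduced to the object name
  d <- sapply(strsplit(d, split = " ", fixed = TRUE), "[", 1)
  d <- d[order(tolower(d))]
  for (x in d) {
    # get() only finds objects of attached packages (datasets is attached by default)
    message(x, ":  ", class(get(x)))
    # str() prints the structure and returns NULL, so message() adds a blank line
    message(str(get(x)))
  }
}
dataStr()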

Note that the output in the console is quite long.

This is the type of output:

[...]

warpbreaks:  data.frame
'data.frame':	54 obs. of  3 variables:
 $ breaks : num  26 30 54 25 70 52 51 26 67 18 ...
 $ wool   : Factor w/ 2 levels "A","B": 1 1 1 1 1 1 1 1 1 1 ...
 $ tension: Factor w/ 3 levels "L","M","H": 1 1 1 1 1 1 1 1 1 2 ...

WorldPhones:  matrix
 num [1:7, 1:7] 45939 60423 64721 68484 71799 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:7] "1951" "1956" "1957" "1958" ...
  ..$ : chr [1:7] "N.Amer" "Europe" "Asia" "S.Amer" ...

WWWusage:  ts
 Time-Series [1:100] from 1 to 100: 88 84 85 85 84 85 83 85 88 89 ...

Edit: To get more informative output and use it for unloaded packages or all the packages on the search path, please use the revised online version with

source("https://raw.githubusercontent.com/brry/berryFunctions/master/R/dataStr.R")
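
For a package that is not attached, get(x) in the function above fails because the objects are not on the search path. One possible workaround, sketched here on the assumption that loading each dataset into a temporary environment is acceptable, is shown below. This is not the berryFunctions implementation, and dataStrPkg is a hypothetical name chosen for illustration.

```r
dataStrPkg <- function(package = "datasets") {
  items <- data(package = package)$results[, "Item"]
  for (item in sort(items)) {
    # Index entries look like "object" or "object (file)";
    # data() has to be called with the file name, not the object name.
    object <- sub(" .*$", "", item)
    file <- if (grepl("(", item, fixed = TRUE)) {
      sub("^.*\\((.*)\\)$", "\\1", item)
    } else {
      object
    }
    e <- new.env()
    data(list = file, package = package, envir = e)
    # Print the name and class, then the structure, then a separating blank line
    cat(object, ": ", paste(class(e[[object]]), collapse = ", "), "\n", sep = "")
    str(e[[object]])
    cat("\n")
  }
  invisible(items)
}
```

Calling dataStrPkg("datasets") prints one block per dataset; any other installed package name can be passed the same way, whether or not the package is attached.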

Solution 3 - R

Here is a comprehensive list of datasets from R packages, maintained by Prof. Vincent Arel-Bundock: https://vincentarelbundock.github.io/Rdatasets/

> Rdatasets is a collection of nearly 1500 datasets that were originally distributed alongside the statistical software environment R and some of its add-on packages. The goal is to make these data more broadly accessible for teaching and statistical software development.

Solution 4 - R

Run

help(package = "datasets")

in the RStudio console and you'll get an index of all datasets in the datasets package in the tidy Help tab on the right.
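
When only the names are needed, or when working outside RStudio, a similar listing can be obtained as a character vector. A brief sketch using base R only, assuming the datasets package is attached (it is by default); the variable name is illustrative:

```r
# Object names exported by the attached datasets package
dataset_names <- ls("package:datasets")
head(dataset_names)
length(dataset_names)
```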

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type     Original Author     Original Content on Stack Overflow
Question         mockash             View Question on Stack Overflow
Solution 1 - R   Jaap                View Answer on Stack Overflow
Solution 2 - R   Berry Boessenkool   View Answer on Stack Overflow
Solution 3 - R   Ayşe Nur            View Answer on Stack Overflow
Solution 4 - R   Igor Micev          View Answer on Stack Overflow