How to deal with hdf5 files in R?

RHdf5

R Problem Overview


I have a file in HDF5 format. I know that it is supposed to contain a matrix, and I want to read that matrix into R so that I can study it. I see that there is an h5r package that is supposed to help with this, but I cannot find a tutorial that is simple to read and understand. Is such a tutorial available online? Specifically, how do you read an HDF5 object with this package, and how do you actually extract the matrix?

UPDATE

I found the rhdf5 package, which is not on CRAN but is part of Bioconductor. The interface is relatively easy to understand, and the documentation and example code are quite clear. I could use it without problems. My problem, it seems, was the input file. The matrix that I wanted to read was actually stored in the HDF5 file as a Python pickle, so every time I tried to open it and access it through R I got a segmentation fault. I figured out how to save the matrix from within Python as a TSV file, and now that problem is solved.
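Once the matrix has been exported as a TSV file, it can be loaded with base R alone. The following is a minimal sketch, assuming a headerless, purely numeric, tab-separated file; the file path and the values are hypothetical stand-ins written to a temporary file so that the example runs on its own.

```r
# Stand-in for the TSV exported from Python (hypothetical path and values)
tsv_path <- tempfile(fileext = ".tsv")
original <- matrix(c(1.5, 2.5, 3.5, 4.5, 5.5, 6.5), nrow = 2)
write.table(original, tsv_path, sep = "\t",
            row.names = FALSE, col.names = FALSE)

# Read the tab-separated values and convert the data frame to a numeric matrix
mat <- as.matrix(read.table(tsv_path, sep = "\t", header = FALSE))
dimnames(mat) <- NULL  # drop the default V1, V2, ... column names

dim(mat)
str(mat)
```

If the real file has a header row or row labels, the `header` and `row.names` arguments of `read.table` would need to be adjusted accordingly.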

R Solutions


Solution 1 - R

The rhdf5 package works really well, although it is not on CRAN. Install it from Bioconductor:

# as of 2020-09-08, these are the updated instructions per
# https://bioconductor.org/install/

if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("rhdf5")

And to use it:

library(rhdf5)

List the objects within the file to find the data group you want to read:

h5ls("path/to/file.h5")

Read the HDF5 data:

mydata <- h5read("path/to/file.h5", "/mygroup/mydata")

And inspect the structure:

str(mydata)

Note that multidimensional arrays may appear transposed. You can also read groups, which are returned as named lists in R.
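The steps above can be tried end to end without an existing file. The sketch below assumes rhdf5 is installed; the temporary file, the group name `mygroup` and the dataset name `mydata` are hypothetical and used only for illustration. It writes a small matrix, lists the file contents, then reads back the whole dataset, a sub-block of it, and the enclosing group.

```r
library(rhdf5)

# Hypothetical file, group and dataset names, used only for illustration
h5_path <- tempfile(fileext = ".h5")
m <- matrix(as.numeric(1:12), nrow = 3)

# Create the file and a group, then write the matrix as a dataset
h5createFile(h5_path)
h5createGroup(h5_path, "mygroup")
h5write(m, h5_path, "mygroup/mydata")

# List the contents to find the full path of the dataset
print(h5ls(h5_path))

# Read the whole dataset, or only rows 1-2 and columns 2-3 via `index`
full  <- h5read(h5_path, "/mygroup/mydata")
block <- h5read(h5_path, "/mygroup/mydata", index = list(1:2, 2:3))

# Reading a group returns a named list of its members
grp <- h5read(h5_path, "/mygroup")
str(grp)

# Release any HDF5 handles that are still open
h5closeAll()
```

A matrix written and read by rhdf5 comes back in its original orientation. The transposition mentioned above typically shows up when the file was written by a row-major tool such as Python; in that case `t()` for a matrix, or `aperm()` for higher-dimensional arrays, should restore the expected layout.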

Solution 2 - R

You could also use h5, a package which I recently published on CRAN. Compared to rhdf5 it has the following features:

  1. S4 object model to directly interact with HDF5 objects like files, groups, datasets and attributes.
  2. Simpler syntax, implemented R-like subsetting operators for datasets supporting commands like
     readdata <- dataset[1:3, 1:3]
     dataset[1:3, 1:3] <- matrix(1:9, nrow = 3)
  3. Supported NA values for all data types
  4. 200+ Test cases with a code coverage of 80%+.

To save a matrix you could use:

library(h5)
testmat <- matrix(rnorm(120), ncol = 3)
# Create HDF5 File
file <- h5file("test.h5")
# Save matrix to file in group 'testgroup' and datasetname 'testmat'
file["testgroup", "testmat"] <- testmat
# Close file
h5close(file)

... and read the entire matrix back into R:

file <- h5file("test.h5")
testmat_in <- file["testgroup", "testmat"][]
h5close(file)

See also h5 on

Solution 3 - R

I used the rgdal package to read HDF5 files. Be aware that the binary version of rgdal probably does not support HDF5. In that case, you need to build GDAL from source with HDF5 support before building rgdal from source.

Alternatively, try converting the files from HDF5 to netCDF. Once they are in netCDF format, you can use the excellent ncdf package to access the data. I think the conversion could be done with the cdo tool.

Solution 4 - R

The ncdf4 package, an interface to netCDF-4, can also be used to read HDF5 files (netCDF-4 is compatible with netCDF-3, but it uses HDF5 as the storage layer).

In the developer's words:

> NetCDF-4 combines the netCDF-3 and HDF5 data models, taking the desirable characteristics of each, while taking advantage of their separate strengths

> The netCDF-4 format implements and expands the netCDF-3 data model by using an enhanced version of HDF5 as the storage layer.

In practice, ncdf4 provides a simple interface, and migrating code from using older hdf5 and ncdf packages to a single ncdf4 package has made our code less buggy and easier to write (some of my trials and workarounds are documented in my previous answer).
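As a rough illustration of that interface, the sketch below assumes ncdf4 is installed against a netCDF library with HDF5 support. The file, dimension and variable names are hypothetical. It writes a small netCDF-4 (HDF5-backed) file and then reads it back; whether an arbitrary HDF5 file opens the same way depends on how that file was written.

```r
library(ncdf4)

# Hypothetical file, dimension and variable names, used only for illustration
nc_path <- tempfile(fileext = ".nc")
vals <- matrix(as.numeric(1:6), nrow = 2)

# Define a 2 x 3 variable and write it as a netCDF-4 file
row_dim <- ncdim_def("row", units = "index", vals = 1:2)
col_dim <- ncdim_def("col", units = "index", vals = 1:3)
var_def <- ncvar_def("mydata", units = "", dim = list(row_dim, col_dim),
                     prec = "double")
nc_out <- nc_create(nc_path, var_def, force_v4 = TRUE)
ncvar_put(nc_out, var_def, vals)
nc_close(nc_out)

# Open the file, list the available variables, and read one as an R array
nc_in <- nc_open(nc_path)
var_names <- names(nc_in$var)
mydata <- ncvar_get(nc_in, "mydata")
nc_close(nc_in)

var_names
dim(mydata)
```

For an existing file, only the second half applies: `nc_open` the path, inspect `names(nc$var)` to find the variable, and pass its name to `ncvar_get`.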

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Sam | View Question on Stackoverflow
Solution 1 - R | Mike T | View Answer on Stackoverflow
Solution 2 - R | user625626 | View Answer on Stackoverflow
Solution 3 - R | Paul Hiemstra | View Answer on Stackoverflow
Solution 4 - R | David LeBauer | View Answer on Stackoverflow