Function not found in R doParallel 'foreach' - Error in { : task 1 failed - "could not find function "raster""


R Problem Overview


I am trying to use a high performance cluster at my institution for the first time and I have hit a problem that I can't resolve.

The following code returns an error:

ptime<-system.time({
  r <- foreach(z = 1:length(files),.combine=cbind) %dopar% {
    raster <- raster(paste(folder,files[1],sep=""))
    data<-getValues(raster)
    clp <- na.omit(data)
    for(i in 1:length(classes)){
      results[i,z]<-length(clp[clp==classes[i]])/length(clp)
      print(z)
    }
  }
})

Error in { : task 1 failed - "could not find function "raster""

I also tried a different foreach loop for another task I have:

r <- foreach (i=1:length(poly)) %dopar% {
  clip<-gIntersection(paths,poly[i,])
  lgth<-gLength(clip)
  vid<-poly@data[i,3]
  path.lgth[i,] <- c(vid,lgth)
  print(i)
}

and this time the gIntersection function isn't found. The packages are all installed and loaded in my session. After reading some forum posts, it seems to have something to do with the environment that the functions execute in.

Can someone please help? I'm not a programmer!

Thank you!

Update:

I have adjusted my code for the solution provided:

results<-matrix(nrow=length(classes),ncol=length(files))
dimnames(results)[[1]]<-classes
dimnames(results)[[2]]<-files

ptime<-system.time({
  foreach(z = 1:length(files), .packages="raster") %dopar% {
    raster <- raster(paste(folder,files[z],sep=""))
    data<-getValues(raster)
    clp <- na.omit(data)
    for(i in 1:length(classes)){
      results[i,z]<-length(clp[clp==classes[i]])/length(clp)
      print(z)
    }
  }
})

But what I get is an output (my results matrix) filled with NAs. As you can see, I create a matrix object called results to fill with the results (which works with for loops), but after reading the documentation for foreach it seems that results are saved differently with this function.

Any advice on what I should choose for the .combine argument?

R Solutions


Solution 1 - R

In the vignette of foreach and the help page of foreach, the argument .packages is pointed out as necessary to provide when using parallel computation with functions that are not loaded by default. So your code in the first example should be:

ptime<-system.time({
  r <- foreach(z = 1:length(files),
               .combine=cbind, 
               .packages='raster') %dopar% {
      # some code
      # and more code
  }
})
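
On the follow-up about the results matrix full of NAs: each iteration runs on a worker that holds its own copy of results, so assigning into results[i,z] inside the loop never reaches the master session. What foreach sends back is the value of the last expression in the loop body, and .combine decides how those values are merged. The sketch below is only an illustration under assumptions: the list `layers` is a made-up stand-in for the raster files, and real code would also pass .packages="raster" and read each file inside the loop.

```r
library(foreach)
library(doParallel)

# Hypothetical stand-ins for the question's data (not from the original post)
classes <- c(1, 2, 3)
layers <- list(a = c(1, 1, 2, NA), b = c(3, 3, 3, 2), c = c(2, NA, 1, 1))

cl <- makeCluster(2)
registerDoParallel(cl)

# Each iteration returns one vector of class proportions;
# .combine = cbind binds those vectors together as columns.
results <- foreach(z = seq_along(layers), .combine = cbind) %dopar% {
  clp <- na.omit(layers[[z]])
  # The value of this last expression is what the worker sends back
  sapply(classes, function(k) mean(clp == k))
}

stopCluster(cl)

# Label rows by class and columns by input, as in the question
dimnames(results) <- list(classes, names(layers))
print(results)
```

With one column per file, cbind is a natural choice for .combine here; rbind would give one row per file instead.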

Some more explanation

The foreach package does a lot of setting up behind the scenes. What happens is the following (in principle, technical details are a tad more complicated):

  • foreach sets up a system of "workers" that you can see as separate R sessions, each committed to a different core in a cluster.

  • The function that needs to be carried out is loaded into each "worker" session, together with the objects needed to carry out the function.

  • Each worker calculates the result for a subset of the data.

  • The results of the calculations on the different workers are put together and reported in the "master" R session.

As the workers can be seen as separate R sessions, packages from the "master" session are not automatically loaded. You have to specify which packages should be loaded in those worker sessions, and that's what the .packages argument of foreach is used for.


Note that when you use other packages (e.g. parallel or snowfall), you'll have to set up these workers explicitly, and also take care of passing objects and loading packages on the worker sessions.
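
To make the "separate R sessions" point concrete, here is a small sketch using only the parallel package that ships with R. It attaches splines (chosen arbitrarily, because it comes with R but is not attached by default) in the master session, asks each worker whether it can see that package, and then loads it on the workers explicitly. The choice of two workers is an assumption for illustration.

```r
library(parallel)  # ships with R
library(splines)   # attached in the master session only

cl <- makeCluster(2)

# Ask each worker whether splines is on its search path
before <- unlist(clusterEvalQ(cl, "package:splines" %in% search()))

# Attach the package on every worker, then ask again
invisible(clusterEvalQ(cl, library(splines)))
after <- unlist(clusterEvalQ(cl, "package:splines" %in% search()))

stopCluster(cl)

print(before)
print(after)
```

On a default setup the first check should come back FALSE for every worker and the second TRUE, which is the same situation the .packages argument handles for you in foreach.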

Solution 2 - R

I dealt with the same problem. My solution was:

  1. Prepare your function in a separate R file.

Function.R

f <- function(parameters...){Body...}

  2. Source your function within the foreach loop.

MainFile.R

library(foreach)
library(doParallel)
cores <- detectCores()
cl <- makeCluster(max(1, cores - 2)) # leave two cores free so the computer is not overloaded, but keep at least one worker
registerDoParallel(cl)

clusterEvalQ(cl, .libPaths("C:/R/win-library/4.0")) # give your R library path
output <- foreach(i=1:5, .combine = rbind) %dopar% {
  source("~/Function.R") # That is the main point. Source your function file here.
  temp <- f(parameters...) # use your custom function after sourcing
  temp
}

stopCluster(cl)
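
A possible variation on this approach, sketched under assumptions: instead of sourcing the file in every iteration, source it once on each worker before the loop with clusterCall from the parallel package. The temporary file and the body of f below are hypothetical stand-ins for Function.R, and the shared path only works here because the workers run on the same machine; on a multi-node cluster the file would have to be visible to every node.

```r
library(foreach)
library(doParallel)

# Hypothetical stand-in for Function.R, written to a temporary file
fn_path <- tempfile(fileext = ".R")
writeLines("f <- function(x) x^2", fn_path)

cl <- makeCluster(2)
registerDoParallel(cl)

# Source the file once on every worker; f ends up in each worker's global environment
invisible(clusterCall(cl, source, fn_path))

output <- foreach(i = 1:5, .combine = c) %dopar% {
  f(i)  # f was defined on the worker by the source() call above
}

stopCluster(cl)
unlink(fn_path)
print(output)
```

This keeps the loop body short and avoids re-reading the file for each task, at the cost of one extra setup call per cluster.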

Solution 3 - R

I had the "could not find function" error using foreach too. Often this question relates to a function from a package not being available to the foreach workers. My use case, however, was different: custom (non-package) functions defined in the same script as the foreach call. Instead of creating a new package and exporting the functions, this is a quick and dirty fix. Here's the scenario:

A single script with:

funA <- function() {
	...
}

funB <- function() {
	...
}

funC <- function() {
    foreach(...) %dopar% {
        funA(); funB()
    }
}

funC()

Two functions, funA and funB, are defined before funC, and then funC is called. It looks like the first two are not made available to the workers when foreach runs inside funC, even though they are defined in the same script, hence the error. An easy solution was to cut and paste funA and funB to the top of funC's definition, like so:

funC <- function() {
    funA <- function() {
	    ...
    }
    funB <- function() {
	    ...
    }
    foreach(...) %dopar% {
        funA(); funB()
    }
}

funC()
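
An alternative that avoids moving the definitions is the .export argument of foreach, which names objects to send to the workers explicitly. The sketch below is a minimal illustration, not the answer author's method: the bodies of funA and funB and the argument n are made up for the example, and it assumes a doParallel cluster with two workers.

```r
library(foreach)
library(doParallel)

# Hypothetical helper functions defined at the top level of the script
funA <- function(x) x + 1
funB <- function(x) x * 2

funC <- function(n) {
  # Name the helpers in .export so they are sent to the workers
  foreach(i = seq_len(n), .combine = c,
          .export = c("funA", "funB")) %dopar% {
    funB(funA(i))
  }
}

cl <- makeCluster(2)
registerDoParallel(cl)
res <- funC(3)
stopCluster(cl)
print(res)
```

The helpers stay where they are in the script, and the list in .export documents which objects the loop body depends on.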

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Karen | View Question on Stackoverflow
Solution 1 - R | Joris Meys | View Answer on Stackoverflow
Solution 2 - R | Mudassar | View Answer on Stackoverflow
Solution 3 - R | Timothy M Pollington | View Answer on Stackoverflow