How do I cache steps in GitHub actions?


Github Problem Overview


Say I have a GitHub Actions workflow with 2 steps.

  1. Download and compile my application's dependencies.
  2. Compile and test my application

My dependencies rarely change and the compiled dependencies can be safely cached until I next change the lock-file that specifies their versions.

Is there a way to save the result of the first step so that future workflow runs can skip over that step?

Github Solutions


Solution 1 - Github

Caching is now natively supported via the cache action. It works across both jobs and workflows within a repository. See also: https://help.github.com/en/actions/automating-your-workflow-with-github-actions/caching-dependencies-to-speed-up-workflows.

Consider the following example:

name: GitHub Actions Workflow with NPM cache

on: [push]

jobs:
  build:

    runs-on: ubuntu-latest

    steps:
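    - uses: actions/checkout@v4

    - name: Cache NPM dependencies
      uses: actions/cache@v4
      with:
        # npm's download cache on Linux and macOS runners
        path: ~/.npm
        # The key changes whenever any package-lock.json changes
        key: ${{ runner.os }}-npm-cache-${{ hashFiles('**/package-lock.json') }}
        # Fall back to the most recent cache that shares this prefix
        restore-keys: |
          ${{ runner.os }}-npm-cache-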

    - name: Install NPM dependencies
      run: npm install

Here, the path and key parameters of the cache action are used to identify the cache.

The optional restore-keys parameter provides a fallback to a partial (prefix) match: if package-lock.json changes, the most recent cache whose key starts with the given prefix is restored instead.

Prefixing the keys with some id (npm-cache in this example) is useful when the restore-keys fallback is used and there are multiple different caches (e.g. for JS packages and for system packages). Otherwise, one cache could fall back to another, unrelated cache. Similarly, an OS prefix is useful when using matrix builds so that caches of different systems don't get mixed up.
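As a sketch of the prefixing advice above, the following fragment keeps two unrelated caches in one job. The pip cache path (the usual Linux location) and the requirements.txt file name are assumptions for illustration and are not part of the original answer.

```yaml
steps:
  - uses: actions/checkout@v4

  # JS packages: every key starts with "<os>-npm-cache-"
  - name: Cache NPM dependencies
    uses: actions/cache@v4
    with:
      path: ~/.npm
      key: ${{ runner.os }}-npm-cache-${{ hashFiles('**/package-lock.json') }}
      restore-keys: |
        ${{ runner.os }}-npm-cache-

  # Python packages (assumed setup): every key starts with "<os>-pip-cache-"
  - name: Cache pip downloads
    uses: actions/cache@v4
    with:
      path: ~/.cache/pip
      key: ${{ runner.os }}-pip-cache-${{ hashFiles('**/requirements.txt') }}
      restore-keys: |
        ${{ runner.os }}-pip-cache-
```

Because the two restore-keys prefixes differ, a miss on the npm key can only fall back to an older npm cache, never to a pip cache.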

You can also build your own reusable caching logic with the @actions/cache package.


Old answer:

Native caching is not currently possible; it is expected to be implemented by mid-November 2019.

You can use artifacts to move directories between jobs (within one workflow), as proposed on the GitHub Community board. This, however, doesn't work across workflows.

Solution 2 - Github

The cache action can only cache the contents of a folder. So if your build produces such a folder, you can save some time by caching it.

For instance, if you use some imaginary package-installer (like Python's pip or virtualenv, or Node.js's npm, or anything else that puts its files into a folder), you can save some time by doing it like this:

    - uses: actions/cache@v4
      id: cache-packages  # give the step an id so later steps can check for a cache hit
      with:
        path: ./packages/  # what we cache: the folder
        key: ${{ runner.os }}-packages-${{ hashFiles('**/packages*.txt') }}
        restore-keys: |
          ${{ runner.os }}-packages-
    - run: package-installer packages.txt
      if: steps.cache-packages.outputs.cache-hit != 'true'

So what's important here:

  1. We give the cache step an id, cache-packages.
  2. Later, we use this id for conditional execution: if: steps.cache-packages.outputs.cache-hit != 'true'
  3. Give the cache action a path to the folder you want to cache: ./packages/
  4. Cache key: something that depends on the hash of your input files. That is, if any packages*.txt file changes, the key changes and a new cache is saved.
  5. The second step, the package installer, only runs when there was no exact cache hit. A partial match through restore-keys restores the old folder but still runs the installer.

For users of virtualenv: each run step starts a new shell, so if you need to activate some shell environment, you have to do it in every step. Like this:

- run: . ./environment/activate && command
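A slightly fuller sketch of the same idea, assuming a virtualenv created in .venv, a requirements.txt file, and pytest as the test runner (all three are illustrative choices, not taken from the answer):

```yaml
steps:
  - uses: actions/checkout@v4

  # Create the environment once
  - run: python -m venv .venv

  # Every later step runs in a fresh shell, so each one activates the environment again
  - run: . .venv/bin/activate && pip install -r requirements.txt
  - run: . .venv/bin/activate && pytest
```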

Solution 3 - Github

> My dependencies rarely change and the compiled dependencies can be safely cached until I next change the lock-file that specifies their versions. Is there a way to save the result of the first step so that future workflow runs can skip over that step?

The first step being:

> Download and compile my application's dependencies.

GitHub Actions themselves will not do this for you. The only advice I can give you is to adhere to Docker best practices, so that if Actions does make use of Docker caching, your image can be reused instead of rebuilt. See: https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#leverage-build-cache

> When building an image, Docker steps through the instructions in your Dockerfile, executing each in the order specified. As each instruction is examined, Docker looks for an existing image in its cache that it can reuse, rather than creating a new (duplicate) image.

This also implies that the underlying system of GitHub Actions can/will leverage the Docker caching.

However, Docker's cache mechanism cannot help with things like compilation, so I suggest you think carefully about whether this is something you really need. The alternative is to download the compiled/processed files from an artifact store (Nexus, NPM, MavenCentral) to skip that step. You do have to weigh the benefits against the complexity this adds to your build.

Solution 4 - Github

If you are using Docker in your workflows, as @peterevans answered, GitHub now supports caching through the cache action, but it has its limitations.

For that reason, you might find this action useful for working around the limitations of GitHub's cache action.

> Disclaimer: I created the action to support cache before GitHub did it officially, and I still use it because of its simplicity and flexibility.

Solution 5 - Github

Solution 6 - Github

I'll summarize the two options:

  1. Caching
  2. Docker

Caching

You can add a command in your workflow to cache directories. When that step is reached, it'll check if the directory that you specified was previously saved. If so, it'll grab it. If not, it won't. Then in further steps you write checks to see if the cached data is present. For example, say you are compiling some dependency that is large and doesn't change much. You could add a cache step at the beginning of your workflow, then a step to build the contents of the directory if they aren't there. The first time that you run it won't find the files but subsequently it will and your workflow will run faster.

Behind the scenes, GitHub uploads a zip of your directory to GitHub's own AWS storage. They purge anything older than a week, and they purge caches if you hit the size limit (2GB when this was written).

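One drawback of this technique is that it saves just directories. So if you installed into /usr/bin, you would have to cache that, which would be awkward. Instead, install into $HOME/.local and add that directory to your PATH. The answer originally suggested echo set-env for this; GitHub has since disabled that workflow command, and appending the directory to the file named by $GITHUB_PATH is the current way to do it.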
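The caching approach described in this section could look roughly like the fragment below. The lock-file name deps.lock and the build script ./scripts/build-deps.sh are hypothetical placeholders for whatever produces your compiled dependency.

```yaml
steps:
  - uses: actions/checkout@v4

  # Restore ~/.local if a cache exists for this exact lock file
  - name: Cache locally installed dependency
    id: cache-local
    uses: actions/cache@v4
    with:
      path: ~/.local
      key: ${{ runner.os }}-local-${{ hashFiles('deps.lock') }}

  # Only pay for the long compilation when nothing was restored
  - name: Build the dependency on a cache miss
    if: steps.cache-local.outputs.cache-hit != 'true'
    run: ./scripts/build-deps.sh --prefix "$HOME/.local"   # hypothetical script

  # Make the cached binaries visible to all later steps
  - name: Add ~/.local/bin to PATH
    run: echo "$HOME/.local/bin" >> "$GITHUB_PATH"
```

The first run misses, builds, and saves the directory when the job finishes. Later runs with an unchanged deps.lock restore it and skip the build step.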

Docker

Docker is a little more complex, and it means that you need a Docker Hub account and now have two things to manage. But it's way more powerful. Instead of saving just a directory, you'll save an entire computer! What you'll do is write a Dockerfile that contains all your dependencies, like apt-get and Python pip lines or even a long compilation. Then you'll build that Docker image and publish it on Docker Hub. Finally, you'll set your tests to run on that new Docker image instead of on, e.g., ubuntu-latest. From then on, instead of installing dependencies, the workflow just downloads the image.

You can automate this further by storing that Dockerfile in the same GitHub repo as the project and writing a job with steps that download the latest Docker image, rebuild just the changed steps if necessary, and then upload the result to Docker Hub. A second job then "needs" that one and uses the image. That way your workflow will both update the Docker image if needed and also use it.
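A minimal sketch of that two-job layout follows. The image name example-user/project-deps, the secret names, and the make test command are all hypothetical. It also assumes the image is public and that the runner's Docker uses BuildKit, in which case the image has to be built with inline cache metadata for --cache-from to reuse its layers.

```yaml
name: Build image and test in it

on: [push]

jobs:
  image:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Log in to Docker Hub
        run: echo "${{ secrets.DOCKERHUB_TOKEN }}" | docker login -u "${{ secrets.DOCKERHUB_USERNAME }}" --password-stdin

      # Reuse layers from the published image and rebuild only what changed
      - name: Build and push the dependency image
        run: |
          docker build \
            --cache-from example-user/project-deps:latest \
            --build-arg BUILDKIT_INLINE_CACHE=1 \
            -t example-user/project-deps:latest .
          docker push example-user/project-deps:latest

  test:
    needs: image                                   # wait until the image is up to date
    runs-on: ubuntu-latest
    container: example-user/project-deps:latest    # run the steps inside the image
    steps:
      - uses: actions/checkout@v4
      - run: make test                             # hypothetical test command
```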

The downsides are that your deps will be in one file, the Dockerfile, and the tests in the workflow, so it's not all together. Also, if the time to download the image is more than the time to build the dependencies, this is a poor choice.


I think that each one has upsides and downsides. Caching is only good for really simple stuff, like compiling into .local. If you need something more extensive, Docker is the most powerful.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | lpil | View Question on Stackoverflow
Solution 1 - Github | thisismydesign | View Answer on Stackoverflow
Solution 2 - Github | kolypto | View Answer on Stackoverflow
Solution 3 - Github | bitoiu | View Answer on Stackoverflow
Solution 4 - Github | whoan | View Answer on Stackoverflow
Solution 5 - Github | bitoiu | View Answer on Stackoverflow
Solution 6 - Github | Eyal | View Answer on Stackoverflow