Docker COPY files using glob pattern?

Tags: Docker, Dockerfile, Yarnpkg

Docker Problem Overview


I have a monorepo managed by Yarn, and I'd like to take advantage of Docker's layer cache to speed up my builds. To do so, I'd like to first copy the package.json and yarn.lock files, run yarn install, and then copy the rest of the files.

This is my repo structure:

packages/one/package.json
packages/one/index.js
packages/two/package.json
packages/two/index.js
package.json
yarn.lock

And this is the relevant part of the Dockerfile:

COPY package.json .
COPY yarn.lock .
COPY packages/**/package.json ./
RUN yarn install --pure-lockfile
COPY . .

The problem is that the third COPY command doesn't copy anything. How can I achieve the expected result?

Docker Solutions


Solution 1 - Docker

There is a solution based on the multi-stage build feature:

FROM node:12.18.2-alpine3.11

WORKDIR /app
# Step 1: Copy the root manifest and the lockfile
COPY ["package.json", "yarn.lock", "./"]
# Step 2: Copy the whole packages directory
COPY packages packages

# Step 3: Find and remove non-package.json files
RUN find packages \! -name "package.json" -mindepth 2 -maxdepth 2 -print | xargs rm -rf

# Step 4: Define second build stage
FROM node:12.18.2-alpine3.11

WORKDIR /app
# Step 5: Copy files from the first build stage.
COPY --from=0 /app .

RUN yarn install --frozen-lockfile

COPY . .

# To restore workspaces symlinks
RUN yarn install --frozen-lockfile

CMD yarn start

At Step 5, the layer cache is reused even when files in the packages directory have changed, as long as the package.json files and yarn.lock are unchanged.
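The effect of Step 3 can be checked outside Docker. What follows is a minimal sketch, assuming a POSIX shell with `mktemp` and a `find` that supports `-mindepth`/`-maxdepth` (GNU and BusyBox both do). The throwaway directory and the `demo` and `remaining` variable names are made up for illustration. The depth options are placed before the tests because GNU find warns otherwise.

```shell
#!/bin/sh
# Recreate the question's repo layout in a throwaway directory (hypothetical path).
demo="$(mktemp -d)"
mkdir -p "$demo/packages/one" "$demo/packages/two"
echo '{}' > "$demo/packages/one/package.json"
echo '{}' > "$demo/packages/two/package.json"
echo '// code' > "$demo/packages/one/index.js"
echo '// code' > "$demo/packages/two/index.js"

# Same idea as Step 3: at depth 2 under packages/, delete every entry
# that is not named package.json.
(
  cd "$demo" &&
  find packages -mindepth 2 -maxdepth 2 ! -name "package.json" -exec rm -rf {} +
)

# Only the manifests should remain.
remaining="$(cd "$demo" && find packages -type f | LC_ALL=C sort)"
echo "$remaining"

rm -rf "$demo"
```

With the layout above, only `packages/one/package.json` and `packages/two/package.json` are listed, which is the content the second build stage receives.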

Solution 2 - Docker

As mentioned in the official Dockerfile reference for COPY <src> <dest>:

>The COPY instruction copies new files or directories from <src> and adds them to the filesystem of the container at the path <dest>.

For your case:

>Each <src> may contain wildcards and matching will be done using Go’s filepath.Match rules.

The filepath.Match rules include this:

> '*' matches any sequence of non-Separator characters

So try to use * instead of ** in your pattern.
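One caveat is worth checking before relying on a wildcard here: when the destination is a directory, COPY places every matched file directly in that directory and does not recreate the `packages/<name>/` parents, so identically named package.json files overwrite one another. The sketch below illustrates this with plain shell tools as a stand-in. It is an analogy, not Docker itself, and shell globbing is not Go's filepath.Match, although both keep `*` within a single path segment. The `ctx`, `matched` and `copied` names are hypothetical.

```shell
#!/bin/sh
# Throwaway layout mirroring the question (hypothetical paths).
ctx="$(mktemp -d)"
mkdir -p "$ctx/packages/one" "$ctx/packages/two" "$ctx/dest"
echo '{"name":"one"}' > "$ctx/packages/one/package.json"
echo '{"name":"two"}' > "$ctx/packages/two/package.json"

# The shell expands * within one path segment, so this matches both manifests.
set -- "$ctx"/packages/*/package.json
matched=$#

# Stand-in for "COPY packages/*/package.json ./": every match is written
# under the same destination name, so later files replace earlier ones.
for f in "$@"; do
  cp "$f" "$ctx/dest/"
done

copied="$(find "$ctx/dest" -type f | wc -l | tr -d ' ')"
echo "matched=$matched copied=$copied"

rm -rf "$ctx"
```

Two files match but only one survives in the destination, which is why the other answers preserve the directory hierarchy by other means.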

Solution 3 - Docker

If you can't technically enumerate all the subdirectories at stake in the Dockerfile (namely, writing COPY packages/one/package.json packages/one/ for each one), but want to copy all the files in two steps and take advantage of Docker's caching feature, you can try the following workaround:

  • Devise a wrapper script (say, in bash) that copies the required package.json files to a separate directory (say, .deps/) built with a similar hierarchy, then call docker build …
  • Adapt the Dockerfile to copy (and rename) the separate directory beforehand, and then call yarn install --pure-lockfile

All things put together, this could lead to the following files:

./build.bash:

#!/bin/bash

tag="image-name:latest"

rm -f -r .deps  # optional, to be sure that there is
# no extraneous "package.json" from a previous build

find . -type d \( -path \*/.deps \) -prune -o \
  -type f \( -name "package.json" \) \
  -exec bash -c 'dest=".deps/$1" && \
    mkdir -p -- "$(dirname "$dest")" && \
    cp -av -- "$1" "$dest"' bash '{}' \;
# instead of mkdir + cp, you may also want to use
# rsync if it is available in your environment...

sudo docker build -t "$tag" .

and

./Dockerfile:

FROM …

WORKDIR /usr/src/app

# COPY package.json .  # subsumed by the following command
COPY .deps .
# and not "COPY .deps .deps", to avoid doing an extra "mv"
COPY yarn.lock .
RUN yarn install --pure-lockfile

COPY . .
# Notice that "COPY . ." will also copy the ".deps" folder; this is
# maybe a minor issue, but it could be avoided by passing more explicit
# paths than just "." (or by adapting the Dockerfile and the script and
# putting them in the parent folder of the Yarn application itself...)

Solution 4 - Docker

Using Docker's new BuildKit executor, it has become possible to bind-mount the Docker build context, from which you can then copy any files as needed.

For example, the snippet below copies all package.json files from the Docker context into the image's /app/ directory (the workdir in the example).

Unfortunately, changing any file in the mount still results in a layer cache miss. This can be worked around using the multi-stage approach presented by @mbelsky, but this time the explicit deletion is no longer needed.

# syntax = docker/dockerfile:1.2
FROM ... AS packages

WORKDIR /app/
RUN --mount=type=bind,target=/docker-context \
    cd /docker-context/; \
    find . -name "package.json" -mindepth 0 -maxdepth 4 -exec cp --parents "{}" /app/ \;

FROM ...

WORKDIR /app/
COPY --from=packages /app/ .

The mindepth/maxdepth arguments are specified to reduce the number of directories to search; they can be adjusted or removed as needed for your use case.

It may be necessary to enable the BuildKit executor using the environment variable DOCKER_BUILDKIT=1, as the traditional executor silently ignores the bind mounts.

More information about BuildKit and bind mounts can be found in the Docker BuildKit documentation.
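The `--parents` option of `cp` used above is not part of POSIX, so it may be unavailable in some base images. As a hedged alternative, the sketch below preserves the hierarchy with `mkdir -p` plus `cp`. The `copy_manifests` function and the temporary directories are hypothetical names for illustration, and paths containing newlines are not handled.

```shell
#!/bin/sh
# Hypothetical source tree standing in for the mounted build context.
src="$(mktemp -d)"
dst="$(mktemp -d)"
mkdir -p "$src/packages/one" "$src/packages/two"
echo '{}' > "$src/package.json"
echo '{}' > "$src/packages/one/package.json"
echo '{}' > "$src/packages/two/package.json"
echo '// code' > "$src/packages/one/index.js"

# copy_manifests SRC DST: copy every package.json found under SRC
# (up to 4 levels deep) into DST, recreating its parent directories.
copy_manifests() {
  ( cd "$1" && find . -maxdepth 4 -type f -name "package.json" ) |
  while IFS= read -r rel; do
    mkdir -p "$2/$(dirname "$rel")" && cp "$1/$rel" "$2/$rel"
  done
}

copy_manifests "$src" "$dst"

result="$(cd "$dst" && find . -type f | LC_ALL=C sort)"
echo "$result"

rm -rf "$src" "$dst"
```

Inside a Dockerfile, the same loop could replace the `find ... -exec cp --parents` line, with `/docker-context` as the source and `/app` as the destination.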

Solution 5 - Docker

Following @Joost's suggestion, I've created a Dockerfile that utilizes the power of BuildKit to achieve the following:

  • Faster npm install by moving npm's cache directory to the build cache
  • Skipping npm install if nothing changed in package.json files since last successful build

Pseudo Code:

  • Get all package.json files from the build context
  • Compare them to the package.json files from the last successful build
  • If changes were found, run npm install and cache the package.json files + node_modules folder
  • Copy the node_modules (fresh or cached) to the desired location in the image

# syntax = docker/dockerfile:1.2
FROM node:14-alpine AS builder

# https://github.com/opencollective/opencollective/issues/1443
RUN apk add --no-cache ncurses

# must run as root
RUN npm config set unsafe-perm true

WORKDIR /app

# get a temporary copy of the package.json files from the build context
RUN --mount=id=website-packages,type=bind,target=/tmp/builder \
    cd /tmp/builder/ && \
    mkdir /tmp/packages && \
    chown 1000:1000 /tmp/packages && \
    find ./ -name "package.json" -mindepth 0 -maxdepth 6 -exec cp --parents "{}" /tmp/packages/ \;

# check if package.json files were changed since the last successful build
RUN --mount=id=website-build-cache,type=cache,target=/tmp/builder,uid=1000 \
    mkdir -p /tmp/builder/packages && \
    cd /tmp/builder/packages && \
    (diff -qr ./ /tmp/packages/ || (touch /tmp/builder/.rebuild && echo "Found an updated package.json"));

USER node

COPY --chown=node:node . /app

# run `npm install` if package.json files were changed, or use the cached node_modules/
RUN --mount=id=website-build-cache,type=cache,target=/tmp/builder,uid=1000 \
    echo "Creating NPM cache folders" && \
    mkdir -p /tmp/builder/.npm && \
    mkdir -p /tmp/builder/modules && \
    echo "Copying latest package.json files to NPM cache folders" && \
    /bin/cp -rf /tmp/packages/* /tmp/builder/modules && \
    cd /tmp/builder/modules && \
    echo "Using NPM cache folders" && \
    npm config set cache /tmp/builder/.npm && \
    if test -f /tmp/builder/.rebuild; then (echo "Installing NPM packages" && npm install --no-fund --no-audit --no-optional --loglevel verbose); fi && \
    echo "copy cached NPM packages" && \
    /bin/cp -rfT /tmp/builder/modules/node_modules /app/node_modules && \
    rm -rf /tmp/builder/packages && \
    mkdir -p /tmp/builder/packages && \
    cd /app && \
    echo "Caching package.json files" && \
    find ./ -name "package.json" -mindepth 0 -maxdepth 6 -exec cp --parents "{}" /tmp/builder/packages/ \; && \
    (rm /tmp/builder/.rebuild 2> /dev/null || true);

Note: I'm only using the node_modules of the root folder because, in my case, all the packages from the inner folders are hoisted to the root.
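The change-detection step in the Dockerfile above (the `diff -qr` check) can be tried on its own. The following is a minimal sketch, assuming a `diff` that supports `-q` and `-r` (GNU, BSD and BusyBox do). The `needs_install` function and the `prev`/`curr` directories are hypothetical stand-ins for the cached and current package.json trees.

```shell
#!/bin/sh
# Hypothetical stand-ins for /tmp/builder/packages (previous build)
# and /tmp/packages (current build context).
work="$(mktemp -d)"
prev="$work/prev"
curr="$work/curr"
mkdir -p "$prev/packages/one" "$curr/packages/one"
echo '{"version":"1.0.0"}' > "$prev/packages/one/package.json"
echo '{"version":"1.0.0"}' > "$curr/packages/one/package.json"

# Succeeds (exit status 0) when the two manifest trees differ or cannot
# be compared, meaning a fresh install would be needed.
needs_install() {
  ! diff -qr "$1" "$2" > /dev/null 2>&1
}

if needs_install "$prev" "$curr"; then first="install"; else first="reuse"; fi

# Simulate a dependency bump in the current context.
echo '{"version":"1.1.0"}' > "$curr/packages/one/package.json"
if needs_install "$prev" "$curr"; then second="install"; else second="reuse"; fi

echo "unchanged: $first, changed: $second"

rm -rf "$work"
```

Treating a failed comparison as "install" is a deliberate choice in this sketch: it falls back to the slower but safe path when the cache is missing.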

Solution 6 - Docker

Just use .dockerignore to filter out the files you don't need; see the .dockerignore documentation for the pattern syntax.

In your case, add this to your .dockerignore:

    *.js
    # plus any other files to skip when copying

I assume your files are located at paths like /home/package.json and that you want to copy them to /dest in the image.

The Dockerfile would look like this:

COPY /home /dest

This will copy all files from /home to /dest, except those listed in .dockerignore.

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | Fez Vrasta | View Question on Stack Overflow
Solution 1 - Docker | mbelsky | View Answer on Stack Overflow
Solution 2 - Docker | v.karbovnichy | View Answer on Stack Overflow
Solution 3 - Docker | ErikMD | View Answer on Stack Overflow
Solution 4 - Docker | Joost | View Answer on Stack Overflow
Solution 5 - Docker | Arik | View Answer on Stack Overflow
Solution 6 - Docker | Darren Ha | View Answer on Stack Overflow