Docker COPY files using glob pattern?
Docker Problem Overview
I have a monorepo managed by Yarn, and I'd like to take advantage of Docker's layer cache to speed up my builds. To do so, I'd like to first copy the package.json and yarn.lock files, run yarn install, and then copy the rest of the files.
This is my repo structure:
packages/one/package.json
packages/one/index.js
packages/two/package.json
packages/two/index.js
package.json
yarn.lock
And this is the relevant part of the Dockerfile:
COPY package.json .
COPY yarn.lock .
COPY packages/**/package.json ./
RUN yarn install --pure-lockfile
COPY . .
The problem is that the third COPY command doesn't copy anything. How can I achieve the expected result?
Docker Solutions
Solution 1 - Docker
There is a solution based on the multi-stage build feature:
FROM node:12.18.2-alpine3.11
WORKDIR /app
# Step 1: Copy the root package.json and yarn.lock
COPY ["package.json", "yarn.lock", "./"]
# Step 2: Copy the whole packages directory
COPY packages packages
# Step 3: Find and remove everything that is not a package.json file
RUN find packages -mindepth 2 -maxdepth 2 \! -name "package.json" -print | xargs rm -rf
# Step 4: Define second build stage
FROM node:12.18.2-alpine3.11
WORKDIR /app
# Step 5: Copy files from the first build stage.
COPY --from=0 /app .
RUN yarn install --frozen-lockfile
COPY . .
# To restore workspaces symlinks
RUN yarn install --frozen-lockfile
CMD yarn start
At Step 5, the layer cache is reused even if files in the packages directory have changed, as long as none of the package.json files or yarn.lock changed.
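To see what Step 3 leaves behind, the same find expression can be tried outside Docker. The following is a minimal sketch on a throwaway directory; the layout (including the extra src folder) is a made-up stand-in for the question's repo, not part of the original answer.

```shell
#!/bin/sh
# Hypothetical scratch layout mirroring the question's repo (not a real project).
set -eu
work="$(mktemp -d)"
cd "$work"
mkdir -p packages/one/src packages/two
echo '{}' > packages/one/package.json
echo '{}' > packages/two/package.json
touch packages/one/index.js packages/one/src/util.js packages/two/index.js

# Same selection as Step 3: entries exactly two levels below "packages"
# that are not named package.json (files and directories alike).
find packages -mindepth 2 -maxdepth 2 \! -name "package.json" -print | xargs rm -rf

# Only the manifests should remain.
remaining="$(find packages -type f | LC_ALL=C sort)"
echo "$remaining"
```

Because the first stage ends up with identical content whenever the manifests are unchanged, the COPY --from=0 layer in the second stage can be served from the cache.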
Solution 2 - Docker
As mentioned in the official Dockerfile reference for COPY <src> <dest>:
> The COPY instruction copies new files or directories from <src> and adds them to the filesystem of the container at the path <dest>.
For your case:
> Each <src> may contain wildcards and matching will be done using Go's filepath.Match rules.
Those rules contain this:
> '*' matches any sequence of non-Separator characters
So try to use * instead of ** in your pattern.
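One caveat when switching to a single *: several matched files copied into the same destination directory keep only their base name, so manifests from different packages end up on the same path. The sketch below is an analogy only, using shell globbing and cp on a scratch directory rather than Docker itself; the directory names are invented for illustration.

```shell
#!/bin/sh
# Analogy only: shell globbing plus cp stand in for Docker's COPY here.
set -eu
work="$(mktemp -d)"
cd "$work"
mkdir -p packages/one packages/two dest
echo '{"name":"one"}' > packages/one/package.json
echo '{"name":"two"}' > packages/two/package.json

# '*' matches exactly one path segment, so both manifests are selected.
matched="$(ls packages/*/package.json)"
echo "$matched"

# Copying both into one directory keeps only the base name, so the
# second file overwrites the first.
for f in packages/*/package.json; do
  cp "$f" dest/
done
copied_count="$(find dest -type f | wc -l | tr -d ' ')"
echo "files in dest: $copied_count"
```

If the per-package hierarchy has to survive, one COPY line per package (COPY packages/one/package.json packages/one/) or one of the other solutions is needed.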
Solution 3 - Docker
If you can't practically enumerate all the subdirectories at stake in the Dockerfile (namely, writing COPY packages/one/package.json packages/one/ for each one), but still want to copy the files in two steps and take advantage of Docker's caching feature, you can try the following workaround:
- Devise a wrapper script (say, in bash) that copies the required package.json files to a separate directory (say, .deps/) built with a similar hierarchy, then calls docker build …
- Adapt the Dockerfile to copy (and rename) the separate directory beforehand, and then call yarn install --pure-lockfile …
All things put together, this could lead to the following files:
./build.bash:
#!/bin/bash
tag="image-name:latest"
rm -f -r .deps # optional, to be sure that there is
# no extraneous "package.json" from a previous build
find . -type d \( -path \*/.deps \) -prune -o \
-type f \( -name "package.json" \) \
-exec bash -c 'dest=".deps/$1" && \
mkdir -p -- "$(dirname "$dest")" && \
cp -av -- "$1" "$dest"' bash '{}' \;
# instead of mkdir + cp, you may also want to use
# rsync if it is available in your environment...
sudo docker build -t "$tag" .
and
./Dockerfile:
FROM …
WORKDIR /usr/src/app
# COPY package.json . # subsumed by the following command
COPY .deps .
# and not "COPY .deps .deps", to avoid doing an extra "mv"
COPY yarn.lock .
RUN yarn install --pure-lockfile
COPY . .
# Notice that "COPY . ." will also copy the ".deps" folder; this is
# maybe a minor issue, but it could be avoided by passing more explicit
# paths than just "." (or by adapting the Dockerfile and the script and
# putting them in the parent folder of the Yarn application itself...)
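The find call in build.bash only prunes .deps, so on a machine where dependencies were already installed locally it would also pick up every package.json under node_modules. A possible variant of the collection step is sketched below in POSIX sh; collect_manifests is a hypothetical helper name, and the sketch assumes the destination is either named .deps or lives outside the source tree.

```shell
#!/bin/sh
set -eu

# collect_manifests SRC DEST (hypothetical helper, not part of the original script)
# Copies every package.json below SRC into DEST, preserving relative paths
# and skipping node_modules and any previous .deps directory.
collect_manifests() {
  src="$1"
  dest="$2"
  rm -rf -- "$dest"            # drop stale manifests from a previous build
  mkdir -p -- "$dest"
  dest_abs="$(cd "$dest" && pwd)"
  (
    cd "$src"
    find . \( -name node_modules -o -name .deps \) -prune -o \
      -type f -name "package.json" \
      -exec sh -c 'dest="$1"; shift
        for f; do
          mkdir -p -- "$dest/$(dirname "$f")" && cp -- "$f" "$dest/$f"
        done' sh "$dest_abs" {} +
  )
}

# Example call from the repository root, before "docker build":
#   collect_manifests . .deps
```

The Dockerfile above can stay as it is, since it only relies on .deps having the same hierarchy as the repository.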
Solution 4 - Docker
Using Docker's BuildKit executor, it is possible to bind-mount the Docker build context into a RUN step, from which you can then copy any files as needed.
For example, the following snippet copies all package.json files from the Docker context into the image's /app/ directory (the workdir in the example below).
Unfortunately, changing any file in the mount still results in a layer cache miss. This can be worked around using the multi-stage approach presented by @mbelsky, but this time the explicit deletion is no longer needed.
# syntax = docker/dockerfile:1.2
FROM ... AS packages
WORKDIR /app/
RUN --mount=type=bind,target=/docker-context \
cd /docker-context/ && \
find . -mindepth 0 -maxdepth 4 -name "package.json" -exec cp --parents "{}" /app/ \;
FROM ...
WORKDIR /app/
COPY --from=packages /app/ .
The mindepth/maxdepth arguments are specified to reduce the number of directories to search; they can be adjusted or removed as needed for your use case.
It may be necessary to enable the BuildKit executor using the environment variable DOCKER_BUILDKIT=1, as the traditional executor silently ignores the bind mounts.
More information about BuildKit and bind mounts can be found here.
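The find/cp step of the first stage can be checked outside Docker before committing to a build. This is a rough sketch on two temporary directories that stand in for /docker-context and /app; it assumes GNU cp, since --parents is a GNU coreutils option, and the file layout is invented for illustration.

```shell
#!/bin/sh
# Scratch reproduction of the find/cp step; assumes GNU cp (for --parents).
set -eu
context="$(mktemp -d)"   # stands in for /docker-context
app="$(mktemp -d)"       # stands in for /app
mkdir -p "$context/packages/one" "$context/packages/two"
echo '{}' > "$context/package.json"
echo '{}' > "$context/packages/one/package.json"
echo '{}' > "$context/packages/two/package.json"
touch "$context/packages/one/index.js" "$context/yarn.lock"

cd "$context"
# --parents recreates each file's relative directory path under the target.
find . -mindepth 0 -maxdepth 4 -name "package.json" -exec cp --parents "{}" "$app/" \;

copied="$(cd "$app" && find . -type f | LC_ALL=C sort)"
echo "$copied"
```

Only the manifests reach the target directory, which is why the second stage's COPY --from=packages layer depends on nothing else.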
Solution 5 - Docker
Following @Joost's suggestion, I've created a Dockerfile that utilizes the power of BuildKit to achieve the following:
- Faster npm install by moving npm's cache directory to the build cache
- Skipping npm install if nothing changed in package.json files since the last successful build
Pseudo Code:
- Get all package.json files from the build context
- Compare them to the package.json files from the last successful build
- If changes were found, run npm install and cache the package.json files + node_modules folder
- Copy the node_modules (fresh or cached) to the desired location in the image
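The comparison step from the list above can be looked at in isolation. The following is a small sketch, not the author's exact code: needs_install is a hypothetical helper name, and it treats a missing cache directory as "changed" so that the first build always installs.

```shell
#!/bin/sh
# needs_install CURRENT CACHED (hypothetical helper used only for this sketch)
# Succeeds (exit 0) when the manifests in CURRENT differ from the copy in
# CACHED, or when CACHED does not exist yet.
needs_install() {
  current="$1"
  cached="$2"
  [ -d "$cached" ] || return 0
  if diff -qr "$current" "$cached" >/dev/null; then
    return 1   # identical: the cached node_modules can be reused
  fi
  return 0     # something changed: npm install has to run again
}

# Example use inside a RUN step, with paths as in the Dockerfile below:
#   if needs_install /tmp/packages /tmp/builder/packages; then npm install; fi
```

The Dockerfile itself records the same decision in a marker file (.rebuild) so that a later RUN step can act on it.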
# syntax = docker/dockerfile:1.2
FROM node:14-alpine AS builder
# https://github.com/opencollective/opencollective/issues/1443
RUN apk add --no-cache ncurses
# must run as root
RUN npm config set unsafe-perm true
WORKDIR /app
# get a temporary copy of the package.json files from the build context
RUN --mount=id=website-packages,type=bind,target=/tmp/builder \
cd /tmp/builder/ && \
mkdir /tmp/packages && \
chown 1000:1000 /tmp/packages && \
find ./ -name "package.json" -mindepth 0 -maxdepth 6 -exec cp --parents "{}" /tmp/packages/ \;
# check if package.json files were changed since the last successful build
RUN --mount=id=website-build-cache,type=cache,target=/tmp/builder,uid=1000 \
mkdir -p /tmp/builder/packages && \
cd /tmp/builder/packages && \
(diff -qr ./ /tmp/packages/ || (touch /tmp/builder/.rebuild && echo "Found an updated package.json"));
USER node
COPY --chown=node:node . /app
# run `npm install` if package.json files were changed, or use the cached node_modules/
RUN --mount=id=website-build-cache,type=cache,target=/tmp/builder,uid=1000 \
echo "Creating NPM cache folders" && \
mkdir -p /tmp/builder/.npm && \
mkdir -p /tmp/builder/modules && \
echo "Copying latest package.json files to NPM cache folders" && \
/bin/cp -rf /tmp/packages/* /tmp/builder/modules && \
cd /tmp/builder/modules && \
echo "Using NPM cache folders" && \
npm config set cache /tmp/builder/.npm && \
if test -f /tmp/builder/.rebuild; then (echo "Installing NPM packages" && npm install --no-fund --no-audit --no-optional --loglevel verbose); fi && \
echo "copy cached NPM packages" && \
/bin/cp -rfT /tmp/builder/modules/node_modules /app/node_modules && \
rm -rf /tmp/builder/packages && \
mkdir -p /tmp/builder/packages && \
cd /app && \
echo "Caching package.json files" && \
find ./ -name "package.json" -mindepth 0 -maxdepth 6 -exec cp --parents "{}" /tmp/builder/packages/ \; && \
(rm /tmp/builder/.rebuild 2> /dev/null || true);
Note: I'm only using the node_modules of the root folder, as in my case, all the packages from inner folders are hoisted to the root.
Solution 6 - Docker
Just use .dockerignore to filter out unneeded files. See the .dockerignore documentation for reference.
In your case, add this to your .dockerignore:
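# a bare *.js only matches files at the context root; **/ matches at any depth
**/*.js
# plus any other files that should be skipped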
I assume your files are located at paths like /home/package.json and you want to copy them to /dest in the image.
The Dockerfile would look like this:
COPY /home /dest
This will copy all files from /home to /dest except those listed in .dockerignore.