docker data volume vs mounted host directory

Docker Problem Overview


We can have a data volume in docker:

$ docker run -v /path/to/data/in/container --name test_container debian
$ docker inspect test_container
...
"Mounts": [
    {
        "Name": "fac362...80535",
        "Source": "/var/lib/docker/volumes/fac362...80535/_data",
        "Destination": "/path/to/data/in/container",
        "Driver": "local",
        "Mode": "",
        "RW": true
    }
]
...

But if the data volume lives in /var/lib/docker/volumes/fac362...80535/_data, is it any different from having the data in a folder mounted using -v /path/to/data/in/container:/home/user/a_good_place_to_have_data?

Docker Solutions


Solution 1 - Docker

Although using volumes and bind mounts feels the same (with the only change being the location of the directory), there are differences in behavior.

Volumes vs Bind Mounts

  • With Bind Mount, a file or directory on the host machine is mounted into a container. The file or directory is referenced by its full or relative path on the host machine.
  • With Volume, a new directory is created within Docker's storage directory on the host machine, and Docker manages that directory's content.
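
In terms of the -v flag, the two cases come down to what appears before the first colon. The following is a minimal sketch in POSIX shell, not Docker's actual parser. The helper name mount_kind is hypothetical, and it assumes Linux-style absolute host paths: it ignores relative paths and Windows drive letters, and it does not validate the spec.

```shell
# mount_kind SPEC
# Prints how `docker run -v SPEC` would treat SPEC, under the simplifying
# assumptions stated above. Hypothetical helper, not part of Docker.
mount_kind() {
    spec=$1
    case "$spec" in
        *:*) src=${spec%%:*} ;;                    # text before the first colon
        *)   echo "anonymous volume"; return 0 ;;  # only a container path was given
    esac
    case "$src" in
        /*) echo "bind mount" ;;    # absolute host path
        *)  echo "named volume" ;;  # anything else is taken as a volume name
    esac
}
```

For example, `mount_kind /path/to/data/in/container` prints "anonymous volume", which matches the inspect output in the question, while `mount_kind /home/user/data:/data` prints "bind mount" and `mount_kind dbdata:/var/lib/postgresql/data` prints "named volume".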

Advantages of volumes over bind mounts:

  • Volumes are easier to back up or migrate than bind mounts.
  • You can manage volumes using Docker CLI commands or the Docker API.
  • Volumes work on both Linux and Windows containers.
  • Volumes can be more safely shared among multiple containers.
  • Volume drivers allow you to store volumes on remote hosts or cloud providers, to encrypt the contents of volumes, or to add other functionality.
  • A new volume’s contents can be pre-populated by a container.

EDIT (9.9.2019):
According to @Sebi2020's comment, bind mounts are much easier to back up. Docker doesn't provide any command to back up volumes; you have to use a temporary container with a bind mount to create backups.
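
The temporary-container approach from the edit above can be sketched as a small shell function. This is an illustration under assumptions, not an official backup command: the function name backup_volume, the /data and /backup mount points inside the container, and the use of the debian image (any image that ships tar would do) are all placeholders chosen for the example.

```shell
# backup_volume VOLUME HOST_DIR
# Archives the contents of a named volume into HOST_DIR/VOLUME.tar on the host.
# HOST_DIR must be an absolute path so that -v treats it as a bind mount.
backup_volume() {
    volume=$1
    host_dir=$2
    # The volume is mounted read-only; the host directory receives the archive.
    docker run --rm \
        -v "$volume":/data:ro \
        -v "$host_dir":/backup \
        debian tar cf "/backup/$volume.tar" -C /data .
}
```

Calling `backup_volume dbdata "$(pwd)"` would start a throwaway container, write dbdata.tar into the current directory, and remove the container when tar exits.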

Volumes

> Created and managed by Docker. You can create a volume explicitly using the docker volume create command, or Docker can create a volume during container or service creation.
>
> When you create a volume, it is stored within a directory on the Docker host. When you mount the volume into a container, this directory is what is mounted into the container. This is similar to the way that bind mounts work, except that volumes are managed by Docker and are isolated from the core functionality of the host machine.
>
> A given volume can be mounted into multiple containers simultaneously. When no running container is using a volume, the volume is still available to Docker and is not removed automatically. You can remove unused volumes using docker volume prune.
>
> When you mount a volume, it may be named or anonymous. Anonymous volumes are not given an explicit name when they are first mounted into a container, so Docker gives them a random name that is guaranteed to be unique within a given Docker host. Besides the name, named and anonymous volumes behave in the same ways.
>
> Volumes also support the use of volume drivers, which allow you to store your data on remote hosts or cloud providers, among other possibilities.
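
The lifecycle described in the quote can be walked through with the Docker CLI. The function below is only a sketch: the name volume_lifecycle, the /data mount point, the greeting file, and the debian image are assumptions made for the example.

```shell
# volume_lifecycle NAME
# Creates a named volume, writes to it from one container, reads it back from
# another, shows where Docker stores it on the host, then removes it.
volume_lifecycle() {
    name=$1
    docker volume create "$name"
    # The first container writes a file into the volume and exits.
    docker run --rm -v "$name":/data debian sh -c 'echo hello > /data/greeting'
    # A second, unrelated container sees the same data.
    docker run --rm -v "$name":/data debian cat /data/greeting
    # Print the host directory that backs the volume.
    docker volume inspect --format '{{ .Mountpoint }}' "$name"
    docker volume rm "$name"
}
```

Running `volume_lifecycle demo` shows that the data outlives both containers and only disappears when the volume itself is removed.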


Bind mounts

> Available since the early days of Docker. Bind mounts have limited functionality compared to volumes. When you use a bind mount, a file or directory on the host machine is mounted into a container. The file or directory is referenced by its full path on the host machine. The file or directory does not need to exist on the Docker host already. It is created on demand if it does not yet exist. Bind mounts are very performant, but they rely on the host machine’s filesystem having a specific directory structure available. If you are developing new Docker applications, consider using named volumes instead. You can’t use Docker CLI commands to directly manage bind mounts.


There are also tmpfs mounts.

tmpfs mounts

> A tmpfs mount is not persisted on disk, either on the Docker host or within a container. It can be used by a container during the lifetime of the container, to store non-persistent state or sensitive information. For instance, internally, swarm services use tmpfs mounts to mount secrets into a service’s containers.
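
As a rough illustration, a tmpfs mount can be requested with the --mount flag. The function name run_with_tmpfs, the debian image, and the use of df to display the mount are assumptions made for this sketch, and it presumes a Linux Docker host.

```shell
# run_with_tmpfs TARGET
# Starts a throwaway container with an in-memory filesystem mounted at TARGET
# and prints the filesystem that backs that path.
run_with_tmpfs() {
    target=$1
    docker run --rm \
        --mount type=tmpfs,destination="$target" \
        debian df "$target"
}
```

With `run_with_tmpfs /scratch`, anything the container wrote under /scratch would live in memory only and vanish when the container stops.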

Reference:
https://docs.docker.com/storage/

Solution 2 - Docker

> is it any different from having the data in a folder mounted using -v /path/to/data/in/container:/home/user/a_good_place_to_have_data?

It is different because, as mentioned in "Mount a host directory as a data volume":

> The host directory is, by its nature, host-dependent. For this reason, you can’t mount a host directory from Dockerfile because built images should be portable. A host directory wouldn’t be available on all potential hosts.
>
> If you have some persistent data that you want to share between containers, or want to use from non-persistent containers, it’s best to create a named Data Volume Container, and then to mount the data from it.

You can combine both approaches:

 docker run --volumes-from dbdata -v $(pwd):/backup ubuntu tar cvf /backup/backup.tar /dbdata

> Here we’ve launched a new container and mounted the volume from the dbdata container. We’ve then mounted a local host directory as /backup. Finally, we’ve passed a command that uses tar to backup the contents of the dbdata volume to a backup.tar file inside our /backup directory. When the command completes and the container stops we’ll be left with a backup of our dbdata volume.

Solution 3 - Docker

Yes, this is quite different from a few perspectives. Like you wrote in the question's title, it is about understanding why we need data volumes vs bind mount to host.

Part 1 - Basic scenarios with examples

Let's take two scenarios.

Case 1: Web server
We want to provide our web server a configuration file that might change frequently.
For example: exposing ports according to the current environment.
We could rebuild the image each time with the relevant setup, or create a separate image for each environment. Neither of these solutions is very efficient.

With bind mounts, Docker mounts the given source file or directory into a location inside the container.
(The original directory / file in the read-only layer inside the union file system will simply be overridden).

For example - binding a dynamic port to nginx:

version: "3.7"
services:
  web:
    image: nginx:alpine
    volumes:
     - type: bind #<-----Notice the type
       source: ./mysite.template
       target: /etc/nginx/conf.d/mysite.template
    ports:
     - "9090:8080"
    environment:
     - PORT=8080
    command: /bin/sh -c "envsubst < /etc/nginx/conf.d/mysite.template > 
        /etc/nginx/conf.d/default.conf && exec nginx -g 'daemon off;'"

(*) Notice that this example could also be solved using Volumes.

Case 2 : Databases
Docker containers do not store persistent data by themselves: anything written to the writable layer of the container's union file system is lost once the container is removed.

But what if we have a database running in a container and the container is removed? Does that mean all the data is lost?

Volumes to the rescue.
Those are named file system trees which are managed for us by Docker.

For example - persisting PostgreSQL data:

services:
  db:
    image: postgres:latest
    volumes:
      # Short syntax, equivalent to the long-syntax entry below:
      # - "dbdata:/var/lib/postgresql/data"
      - type: volume #<-----Notice the type
        source: dbdata
        target: /var/lib/postgresql/data
volumes:
  dbdata:

Notice that in this case, for named volumes, the source is the name of the volume (for anonymous volumes, this field is omitted).

Part 2 - Comparison

Differences in management and isolation on the host

Bind mounts exist on the host file system and are managed by the host maintainer.
Applications / processes outside of Docker can also modify it.

Volumes can also be stored on the host, but Docker manages them for us, and they are not meant to be modified from outside of Docker.

Volumes are a much wider solution

Although both solutions help us to separate the data lifecycle from containers, by using Volumes you gain much more power and flexibility over your system.

With Volumes we can design our data effectively and decouple it from other parts of the system by storing it in dedicated remote locations (e.g., in the cloud) and integrate it with external services like backups, monitoring, encryption and hardware management.

Solution 4 - Docker

The difference between a host directory and a data volume is that Docker manages the latter by placing it in the $DOCKER-DATA-DIR/volumes directory and attaching a reference to it (a name or a randomly generated ID). That is, you get a little bit of convenience.

Both host directories and data volumes are directories on the host. Both are host dependent. You can't reference either of them in a Dockerfile; the VOLUME directive creates a new nameless (with randomly generated id) volume every time you launch a new container and cannot reference an existing volume.

* $DOCKER-DATA-DIR is /var/lib/docker here unless you changed the defaults.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | koddo | View Question on Stackoverflow
Solution 1 - Docker | E235 | View Answer on Stackoverflow
Solution 2 - Docker | VonC | View Answer on Stackoverflow
Solution 3 - Docker | RtmY | View Answer on Stackoverflow
Solution 4 - Docker | golem | View Answer on Stackoverflow