Restarting an unhealthy docker container based on healthcheck

Docker, Health Monitoring

Docker Problem Overview


I am using Docker version 17.09.0-ce, and I see that containers are marked as unhealthy. Is there an option to have the container restarted instead of leaving it marked as unhealthy?

Docker Solutions


Solution 1 - Docker

Restarting unhealthy containers was part of the original PR (https://github.com/moby/moby/pull/22719), but it was removed after discussion and deferred to a later enhancement of RestartPolicy.

For now, you can use this workaround to automatically restart unhealthy containers: https://hub.docker.com/r/willfarrell/autoheal/

Here is a sample compose file:

version: '2'
services:
  autoheal:
    restart: always
    image: willfarrell/autoheal
    environment:
      - AUTOHEAL_CONTAINER_LABEL=all
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock

Simply run docker-compose up -d against this file.

Solution 2 - Docker

You can automatically restart an unhealthy container by combining a smart HEALTHCHECK with a proper restart policy.

The Docker restart policy should be one of always or unless-stopped.

The HEALTHCHECK, in turn, should implement logic that kills the container when it is unhealthy.

In the following example I used curl with its internal retry mechanism and chained it with || to the kill command, so that kill runs only when the check fails (service unhealthy).

HEALTHCHECK --interval=5m --timeout=2m --start-period=45s \
   CMD curl -f --retry 6 --max-time 5 --retry-delay 10 --retry-max-time 60 "http://localhost:8080/health" || bash -c 'kill -s 15 -1 && (sleep 10; kill -s 9 -1)'

The key point is that the retry logic is self-contained in the curl command, so Docker's own health check retry count has no practical effect here: the container is killed on the first failed check. If the curl HTTP request still fails after all of its retries, kill is executed. It first sends SIGTERM to the processes in the container so they can stop gracefully, then after 10 seconds it sends SIGKILL to kill whatever is still running. Note that when PID 1 of a container dies, the container itself exits and the restart policy is invoked.

Gotcha: kill behaves differently in bash than in sh, which is why the example wraps the kill commands in bash -c. In bash you can use -1 to signal all the processes with a PID greater than 1.

Solution 3 - Docker

For standalone containers, Docker has no native way to restart a container when its health check fails, although the same effect can be achieved with Docker events and a script. Health checks are better integrated with Swarm: when a container in a service becomes unhealthy, Swarm automatically shuts it down and starts a new container to maintain the replica count specified for the service.
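The "Docker events and a script" idea could look roughly like the sketch below. It is an illustration under stated assumptions, not the answer author's script: the function names and the DOCKER override variable are made up here, and it assumes the event action for a failing check reads "health_status: unhealthy".

```shell
#!/bin/sh
# DOCKER can be overridden (for example in tests); it defaults to the real CLI.
DOCKER=${DOCKER:-docker}

# Restart the container if the event reports it as unhealthy.
# $1 is the container ID, $2 the action text,
# assumed to look like "health_status: unhealthy".
handle_event() {
    case "$2" in
        *unhealthy*) "$DOCKER" restart "$1" ;;
    esac
}

# Stream health_status events and pass each one to handle_event.
# Each line is "<container-id> <action>"; read puts the rest of the
# line, spaces included, into the last variable.
watch_health() {
    "$DOCKER" events --filter 'event=health_status' \
        --format '{{.Actor.ID}} {{.Action}}' |
    while read -r container_id action; do
        handle_event "$container_id" "$action"
    done
}
```

To use it, call watch_health at the end of the script and keep the script running on the Docker host, for instance under a process supervisor.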

Solution 4 - Docker

You can try putting something like this in your Dockerfile:

HEALTHCHECK --interval=5s --timeout=2s CMD curl --fail http://localhost || kill 1

Don't forget to run the container with the --restart always option.

kill 1 kills the process with PID 1 in the container and forces the container to exit. Usually the process started by CMD or ENTRYPOINT has PID 1.

Unfortunately, this method likely does not change the container's state to unhealthy, so be careful with it.

Solution 5 - Docker

Docker has a couple of ways to get details on container health. You can configure health checks and how often they run. Health checks can also target applications running inside a container, such as an HTTP service (for example with curl and its --fail option). You can watch the health_status event to get details.

For detailed information on an unhealthy container, the inspect command comes in handy: docker inspect --format='{{json .State.Health}}' container-name (see https://blog.newrelic.com/2016/08/24/docker-health-check-instruction/ for more details).
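As a small illustration of the inspect approach, the sketch below wraps two read-only queries in helper functions. The function names and the DOCKER override variable are assumptions for this example. Containers without a configured health check have no Health section, so the first query may fail for them.

```shell
#!/bin/sh
# DOCKER can be overridden (for example in tests); it defaults to the real CLI.
DOCKER=${DOCKER:-docker}

# Print the current health status of one container:
# "starting", "healthy" or "unhealthy".
health_status() {
    "$DOCKER" inspect --format '{{.State.Health.Status}}' "$1"
}

# List the names of all running containers currently marked unhealthy.
unhealthy_containers() {
    "$DOCKER" ps --filter 'health=unhealthy' --format '{{.Names}}'
}
```

For example, health_status container-name prints a single word that is easy to use in scripts, while unhealthy_containers gives a quick overview across the host.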

You should resolve the error condition causing the "unhealthy" tag (anytime the health check command runs and gets an exit code of 1) first. This may or may not require that Docker restart the container, depending on the error. If you are starting/restarting your containers automatically, then either trapping the start errors or logging them and the health check status can help address errors quickly. Check the link if you are interested in auto start.

Solution 6 - Docker

According to https://codeblog.dotsandbrackets.com/docker-health-check/

Create the container with "restart: always".

When using a health check, keep the following in mind:

For standalone containers, Docker has no native way to restart a container when its health check fails, although the same effect can be achieved with Docker events and a script. Health checks are better integrated with Swarm: when a container in a service becomes unhealthy, Swarm automatically shuts it down and starts a new container to maintain the replica count specified for the service.
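For the Swarm case, a service with a health check and a fixed replica count might be created as in the sketch below. The function name, the DOCKER override variable, the health URL, and the interval, timeout, and retry values are all assumptions chosen for illustration; the --health-* flags themselves are standard options of docker service create.

```shell
#!/bin/sh
# DOCKER can be overridden (for example in tests); it defaults to the real CLI.
DOCKER=${DOCKER:-docker}

# Create a Swarm service whose tasks are replaced when they turn unhealthy.
# $1 = service name, $2 = replica count, $3 = image.
# The health endpoint below is a placeholder for your own service.
create_service() {
    "$DOCKER" service create \
        --name "$1" \
        --replicas "$2" \
        --health-cmd 'curl --fail http://localhost:8080/health || exit 1' \
        --health-interval 30s \
        --health-timeout 5s \
        --health-retries 3 \
        "$3"
}
```

A call such as create_service web 3 example/image (with hypothetical name and image) asks Swarm to keep three replicas running and to replace any task whose health check fails three times in a row.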

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Govind Kailas | View Question on Stackoverflow
Solution 1 - Docker | Navrocky | View Answer on Stackoverflow
Solution 2 - Docker | Naramsim | View Answer on Stackoverflow
Solution 3 - Docker | Freax | View Answer on Stackoverflow
Solution 4 - Docker | What | View Answer on Stackoverflow
Solution 5 - Docker | Lea Klein | View Answer on Stackoverflow
Solution 6 - Docker | Freax | View Answer on Stackoverflow