Docker Health Checks: Keep Containers Self-Healing

Add a HEALTHCHECK to your Dockerfile so Docker knows when a container is ready and when it has gone wrong, and so that an orchestrator or restart mechanism can act on that status automatically.

A container can be running without the process inside it actually working. The web server is up but deadlocked. The database started but has not finished initialising. Docker's HEALTHCHECK instruction gives you a way to define what "healthy" means, so orchestrators and Compose can make informed decisions instead of just assuming a running container is a working one.


Add a healthcheck in the Dockerfile

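FROM node:20-alpine
WORKDIR /app
# Alpine-based Node images do not ship curl, so install it for the healthcheck
RUN apk add --no-cache curl
COPY . .
RUN npm ci
EXPOSE 3000
CMD ["node", "server.js"]

# Runs inside the container; the check fails on a timeout or an HTTP error status
HEALTHCHECK --interval=30s --timeout=5s --start-period=15s --retries=3 \
  CMD curl -f http://localhost:3000/health || exit 1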

What each option does:

  • --interval=30s: how often to run the check, measured from the end of the previous check. Default is 30 seconds.
  • --timeout=5s: how long to wait for the check command to return before counting it as a failure. Default is 30 seconds, which is too long for most services.
  • --start-period=15s: grace period after the container starts during which failures do not count against the retry limit. It gives slow-starting services time to initialise without being marked unhealthy. Default is 0 seconds.
  • --retries=3: how many consecutive failures it takes for the status to become unhealthy. Default is 3.
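
To get a feel for how these options combine, here is a rough worked calculation. It assumes the service stops responding right after a passing check, that every failing check hangs for the full timeout, and that each check starts one interval after the previous one finishes. The function name is made up for illustration.

```javascript
// Rough upper bound, in seconds, on how long a container that has stopped
// responding takes to be reported as unhealthy (see assumptions above).
function worstCaseDetectionSeconds({ interval, timeout, retries }) {
  return retries * (interval + timeout);
}

// The values from the HEALTHCHECK line above: 3 * (30 + 5)
console.log(worstCaseDetectionSeconds({ interval: 30, timeout: 5, retries: 3 })); // 105
```

If that delay is too long for your service, a shorter interval shortens it, at the cost of running the check more often.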

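The CMD can be any command that runs inside the container. Exit code 0 means healthy and exit code 1 means unhealthy; exit code 2 is reserved, so do not use it. That is why the example ends in || exit 1: it maps curl's other non-zero exit codes to 1.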


Check container health status

docker ps

The STATUS column shows (healthy), (unhealthy), or (health: starting) alongside the normal uptime.

More detail:

docker inspect --format='{{json .State.Health}}' container_name | jq

This shows the current status, the failing streak, and the most recent check results (Docker keeps the last five) with their exit codes and output, which is useful for debugging why a healthcheck is failing.
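
If you want to process that JSON rather than read it, the sketch below condenses it to a few fields. Status, FailingStreak, and Log (with ExitCode and Output per entry) are the fields Docker reports; the function name and the sample values are made up for illustration.

```javascript
// Condense the object printed by
//   docker inspect --format='{{json .State.Health}}' container_name
function summarizeHealth(health) {
  const log = health.Log ?? [];
  const last = log[log.length - 1];
  return {
    status: health.Status,
    failingStreak: health.FailingStreak,
    failedRuns: log.filter((entry) => entry.ExitCode !== 0).length,
    lastOutput: last ? last.Output.trim() : '',
  };
}

// Illustrative sample, trimmed to the fields used above (values are made up).
const sample = {
  Status: 'unhealthy',
  FailingStreak: 2,
  Log: [
    { ExitCode: 0, Output: '{"status":"ok"}' },
    { ExitCode: 1, Output: 'connection refused\n' },
    { ExitCode: 1, Output: 'connection refused\n' },
  ],
};

console.log(summarizeHealth(sample));
```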


Define a healthcheck in Docker Compose

services:
  api:
    build: .
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 5s
      start_period: 15s
      retries: 3

  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: secret
      POSTGRES_DB: mydb
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U app -d mydb"]
      interval: 5s
      timeout: 3s
      retries: 5

A healthcheck defined in Compose overrides the one in the Dockerfile. Use this to tune intervals per environment without rebuilding the image.


depends_on with health conditions

This is where healthchecks become essential in Compose:

services:
  api:
    build: .
    depends_on:
      db:
        condition: service_healthy

  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: secret
      POSTGRES_DB: mydb
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U app -d mydb"]
      interval: 5s
      retries: 5

condition: service_healthy holds the api container until db is actually accepting connections — not just until it has started. Without this, the API process tries to connect to Postgres before it is ready and crashes on the first database call.
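
As a mental model only (this is not Compose's implementation), the gating behaves like a polling loop over the dependency's health status. In the sketch below, waitForHealthy and getStatus are hypothetical names; getStatus stands in for whatever reports "starting", "healthy", or "unhealthy".

```javascript
// Poll getStatus() until it reports 'healthy'. Give up if the dependency
// turns 'unhealthy' or if maxAttempts polls pass without success.
// Resolves to the number of polls it took.
async function waitForHealthy(getStatus, { intervalMs, maxAttempts }) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await getStatus();
    if (status === 'healthy') return attempt;
    if (status === 'unhealthy') throw new Error('dependency is unhealthy');
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error('timed out waiting for dependency');
}
```

The point of the model is that the dependent service starts only after a check has actually passed, which is why the db healthcheck interval directly affects how long the api waits.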


What to expose in the /health endpoint

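A health endpoint should return an error status such as 503 when the service cannot serve traffic, and a 2xx status otherwise. Note that curl -f only fails for HTTP status codes of 400 and above, so any other response still counts as healthy. What counts as "cannot serve" depends on the application: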

// Express example
app.get('/health', async (req, res) => {
  try {
    await db.query('SELECT 1');  // verify database connectivity
    res.status(200).json({ status: 'ok' });
  } catch (err) {
    res.status(503).json({ status: 'error', detail: err.message });
  }
});

Keep the health endpoint cheap — it runs every 30 seconds. Do not trigger full business logic or external API calls. Check the things that, if broken, mean the container is genuinely unable to handle requests: database connectivity, cache availability, critical config being present.
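
One way to keep the endpoint cheap and bounded is to run a small set of named checks in parallel, each with its own time limit. The sketch below assumes that approach; withTimeout and runChecks are hypothetical helpers, not part of Express or any library.

```javascript
// Reject if the given promise does not settle within ms milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Run every named check in parallel. The result is 'ok' only if all checks
// pass within timeoutMs; otherwise it lists the names of the failed checks.
async function runChecks(checks, timeoutMs) {
  const names = Object.keys(checks);
  const results = await Promise.allSettled(
    names.map((name) => withTimeout(Promise.resolve().then(checks[name]), timeoutMs)),
  );
  const failed = names.filter((_, i) => results[i].status === 'rejected');
  return { status: failed.length === 0 ? 'ok' : 'error', failed };
}

// Possible use inside the handler above (db is the app's own client):
//   const result = await runChecks({ db: () => db.query('SELECT 1') }, 2000);
//   res.status(result.status === 'ok' ? 200 : 503).json(result);
```

Keeping the per-check limit below the healthcheck timeout means a slow dependency produces a clear 503 rather than a timed-out check.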


Restart policy and health

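Pair the healthcheck with a restart policy, but be clear about what each one does:

services:
  api:
    build: .
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 5s
      retries: 3

With restart: unless-stopped, Docker restarts the container when its main process exits (crash recovery). The healthcheck detects degradation (the process is running but broken), but standalone Docker Engine and Compose only report the unhealthy status; the restart policy does not act on it. Docker Swarm does replace unhealthy service tasks. Outside Swarm, something else has to turn the unhealthy status into a restart, for example the application exiting when it detects it is broken, or an external watcher that restarts unhealthy containers.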
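
Since a restart policy reacts to the main process exiting, one way to recover from degradation is for the application to exit once it has been failing its own check for a while. The following is a minimal sketch of that idea; startWatchdog and its option names are hypothetical, and the thresholds in the usage comment are assumed values.

```javascript
// Call check() every intervalMs. After maxFailures consecutive failures,
// stop checking and call onFatal(). A passing check resets the count.
// Returns a function that stops the watchdog.
// Note: checks are not serialised, so keep check() faster than intervalMs.
function startWatchdog({ check, intervalMs, maxFailures, onFatal }) {
  let failures = 0;
  const timer = setInterval(async () => {
    try {
      await check();
      failures = 0;
    } catch {
      failures += 1;
      if (failures >= maxFailures) {
        clearInterval(timer);
        onFatal();
      }
    }
  }, intervalMs);
  return () => clearInterval(timer);
}

// Possible use in server.js (db is the app's own client):
//   startWatchdog({
//     check: () => db.query('SELECT 1'),
//     intervalMs: 10000,
//     maxFailures: 3,
//     onFatal: () => process.exit(1), // the restart policy then restarts the container
//   });
```

This suits failures that a fresh process can fix. If the real problem is a dependency that is down, exiting only produces a restart loop, so choose the check and thresholds with that in mind.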


Disable an inherited healthcheck

If a base image includes a healthcheck and you want to remove it:

HEALTHCHECK NONE

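This is useful when the base image's check does not apply to your use case. If you define the check in Compose instead, the Compose definition already overrides the image's check, and Compose can also switch a check off with disable: true under healthcheck.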