Docker Health Checks: Keep Containers Self-Healing
Add a HEALTHCHECK to your Dockerfile so Docker knows when a container is ready and when it has gone wrong, and so that an orchestrator or a watcher can act on that status automatically.
A container can be running without the process inside it actually working. The web server is up but deadlocked. The database started but has not finished initialising. Docker's HEALTHCHECK instruction gives you a way to define what "healthy" means, so orchestrators and Compose can make informed decisions instead of just assuming a running container is a working one.
Add a healthcheck in the Dockerfile
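FROM node:20-alpine
# node:20-alpine does not ship curl; install it so the healthcheck below can run
RUN apk add --no-cache curl
WORKDIR /app
COPY . .
RUN npm ci
EXPOSE 3000
CMD ["node", "server.js"]
# Mark the check as failed when /health is unreachable or returns an HTTP error (4xx/5xx)
HEALTHCHECK --interval=30s --timeout=5s --start-period=15s --retries=3 \
  CMD curl -f http://localhost:3000/health || exit 1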
What each option does:
--interval=30s: how often to run the check, counted from the end of the previous check. Default is 30 seconds.
--timeout=5s: how long to wait for the check command to return before counting it as a failure. Default is 30 seconds, which is much too long for most services.
--start-period=15s: grace period after the container starts during which failures do not count against the retry limit. Gives slow-starting services time to initialise before they can be marked unhealthy. Default is 0 seconds.
--retries=3: how many consecutive failures before the status becomes unhealthy. Default is 3.
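These options together determine how long a broken service can go unnoticed. As a rough worked example (an approximation that ignores the start period and scheduling jitter, with function names invented for illustration), the detection window for the settings above can be estimated like this:

```javascript
// Upper bound: the failure begins just after a successful check, and every
// failing check hangs until it hits the timeout.
function worstCaseDetectionSeconds({ interval, timeout, retries }) {
  return retries * (interval + timeout);
}

// Lower bound: the failure begins just before a check, and checks fail instantly.
function bestCaseDetectionSeconds({ interval, retries }) {
  return (retries - 1) * interval;
}

const settings = { interval: 30, timeout: 5, retries: 3 };
console.log(worstCaseDetectionSeconds(settings)); // 105
console.log(bestCaseDetectionSeconds(settings)); // 60
```

So with these values the container is marked unhealthy somewhere between one and two minutes after the service stops responding. Shortening the interval narrows that window at the cost of running the check more often.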
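The CMD can be any shell command. Exit code 0 means healthy and exit code 1 means unhealthy; exit code 2 is reserved, so do not use it. That is why the example ends with || exit 1: it maps whatever non-zero code curl returns to 1.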
Check container health status
docker ps
The STATUS column shows (healthy), (unhealthy), or (health: starting) alongside the normal uptime.
More detail:
docker inspect --format='{{json .State.Health}}' container_name | jq
This shows the last few check results, their exit codes, and the output from each run — useful for debugging why a healthcheck is failing.
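If you look at this output often, a few lines of Node can condense it. This is a sketch only: summarizeHealth is a hypothetical helper, and the sample object is made up for illustration, though it uses the field names Docker reports under .State.Health (Status, FailingStreak, Log, ExitCode, Output).

```javascript
// Condense the object printed by:
//   docker inspect --format='{{json .State.Health}}' container_name
function summarizeHealth(health) {
  const log = health.Log || [];
  return {
    status: health.Status,
    failingStreak: health.FailingStreak,
    checks: log.map((entry) => ({
      start: entry.Start,
      exitCode: entry.ExitCode,
      // Trim and truncate so long error output stays readable.
      output: (entry.Output || '').trim().slice(0, 200),
    })),
  };
}

// Invented sample data, not real output from a running container.
const sample = {
  Status: 'unhealthy',
  FailingStreak: 3,
  Log: [
    { Start: '2024-01-01T10:00:00Z', ExitCode: 0, Output: '{"status":"ok"}\n' },
    { Start: '2024-01-01T10:00:30Z', ExitCode: 1, Output: 'connection refused\n' },
    { Start: '2024-01-01T10:01:00Z', ExitCode: 1, Output: 'connection refused\n' },
    { Start: '2024-01-01T10:01:30Z', ExitCode: 1, Output: 'connection refused\n' },
  ],
};

const summary = summarizeHealth(sample);
console.log(summary.status, summary.failingStreak); // unhealthy 3
console.log(summary.checks.map((check) => check.exitCode)); // [ 0, 1, 1, 1 ]
```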
Define a healthcheck in Docker Compose
services:
  api:
    build: .
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 5s
      start_period: 15s
      retries: 3
  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: secret
      POSTGRES_DB: mydb
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U app -d mydb"]
      interval: 5s
      timeout: 3s
      retries: 5
Compose healthchecks override the one in the Dockerfile. Use this to tune intervals per environment without rebuilding the image.
depends_on with health conditions
This is where healthchecks become essential in Compose:
services:
  api:
    build: .
    depends_on:
      db:
        condition: service_healthy
  db:
    image: postgres:16-alpine
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U app -d mydb"]
      interval: 5s
      retries: 5
condition: service_healthy holds the api container until db is actually accepting connections — not just until it has started. Without this, the API process tries to connect to Postgres before it is ready and crashes on the first database call.
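The health condition orders startup only, so it is still worth making the API tolerant of a database that is briefly unavailable later on. One common complement is retrying the connection with exponential backoff. The sketch below is an assumption about how that might look, not part of Compose: connect stands for whatever async function your database client uses to open a connection, and the delay values are arbitrary.

```javascript
// Delay before each retry: base, 2x base, 4x base, ... capped at maxMs.
function backoffDelays(count, baseMs = 500, maxMs = 8000) {
  return Array.from({ length: count }, (_, i) => Math.min(baseMs * 2 ** i, maxMs));
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Call connect() up to `attempts` times, waiting longer between tries.
// connect is a hypothetical async function supplied by the caller.
async function connectWithRetry(connect, attempts = 6) {
  const delays = backoffDelays(attempts - 1); // no wait after the final attempt
  for (let i = 0; ; i++) {
    try {
      return await connect();
    } catch (err) {
      if (i >= delays.length) throw err; // out of retries: surface the last error
      await sleep(delays[i]);
    }
  }
}

console.log(backoffDelays(6)); // [ 500, 1000, 2000, 4000, 8000, 8000 ]
```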
What to expose in the /health endpoint
A health endpoint should return an error status code (typically 503) when the service cannot serve traffic; curl -f only fails on status codes of 400 and above. What counts as "cannot serve" depends on the application:
// Express example; assumes app is an Express app and db is an existing database client
app.get('/health', async (req, res) => {
  try {
    await db.query('SELECT 1'); // verify database connectivity
    res.status(200).json({ status: 'ok' });
  } catch (err) {
    res.status(503).json({ status: 'error', detail: err.message });
  }
});
Keep the health endpoint cheap — it runs every 30 seconds. Do not trigger full business logic or external API calls. Check the things that, if broken, mean the container is genuinely unable to handle requests: database connectivity, cache availability, critical config being present.
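When several dependencies are checked, it helps to bound each probe with its own timeout so the endpoint always answers well inside the healthcheck timeout. A minimal sketch of that idea follows; the helper names and the 2-second default are assumptions, and each probe is a hypothetical cheap check such as SELECT 1 or a cache ping.

```javascript
// Reject if the given promise has not settled within ms milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Turn per-dependency results into an HTTP status code and a small JSON body.
function toResponse(results) {
  const failed = results.filter((r) => !r.ok).map((r) => r.name);
  return failed.length === 0
    ? { code: 200, body: { status: 'ok' } }
    : { code: 503, body: { status: 'error', failed } };
}

// probes maps a dependency name to a function returning a promise.
async function runProbes(probes, timeoutMs = 2000) {
  const results = await Promise.all(
    Object.entries(probes).map(async ([name, probe]) => {
      try {
        await withTimeout(probe(), timeoutMs);
        return { name, ok: true };
      } catch {
        return { name, ok: false };
      }
    })
  );
  return toResponse(results);
}

// Possible use inside an Express handler (db is assumed to exist):
//   const { code, body } = await runProbes({ db: () => db.query('SELECT 1') });
//   res.status(code).json(body);
```

The response lists only the names of failed dependencies rather than raw error messages, which keeps internal details out of an endpoint that may be reachable from outside the container.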
Restart policy and health
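A restart policy recovers a container whose main process exits. It does not, on its own, restart a container that is running but unhealthy:
services:
  api:
    build: .
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 5s
      retries: 3
With restart: unless-stopped, Docker restarts the container whenever its process exits, which covers crash recovery. The healthcheck covers detection of degradation (the process is running but broken), but on a standalone Docker host or in Compose an unhealthy status is only reported, not acted on. Swarm mode replaces unhealthy tasks automatically; elsewhere you need something that watches the health status and restarts the container.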
Disable an inherited healthcheck
If a base image includes a healthcheck and you want to remove it:
HEALTHCHECK NONE
Useful when the base image's check does not apply to your use case, or when you are defining the check in Compose instead.
SysEmperor