fix(health): make /health/ready a strict readiness probe (#133)

The /health/ready endpoint previously hardcoded a 200 response, which
meant orchestrators using it as a readiness probe never saw the API
report unready even when document processing was broken. /health itself
was tolerant by design (docling-serve cold-start takes minutes; failing
liveness during that window caused restart loops), but that left no
endpoint that returned 503 when docling-serve was permanently down, so
ALBs and Kubernetes Services would keep routing traffic to instances
that couldn't do their job.
Now:
- /health remains tolerant. Returns 200 (healthy or degraded) when the
core stores (Redis, S3, queue) are up, and 503 only when a core store
is down. It stays appropriate for liveness probes that should tolerate
transient docling-serve outages.
- /health/ready becomes strict. Returns 200 only when every dependency
the pipeline needs (Redis, S3, queue, docling-serve) is reachable;
503 the moment any of them isn't. Suitable for orchestrator
readiness probes (Kubernetes readinessProbe, ECS ALB target groups).
- The module docstring and both endpoint docstrings explain the
deliberate split so operators know which one to wire up where.
- The stale "circuit breaker handles it at request level" comment is
gone. The retry behaviour is real but only absorbs transient outages,
not permanent ones. Permanent outages are exactly what /health/ready
now surfaces.
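
For illustration, the split described in the list above can be sketched as two
pure functions that map dependency reachability to a status code. This is a
minimal sketch, not the code in this change: the function names, the
dependency keys, and the "unhealthy" / "ready" / "not ready" labels are
assumptions made for the example.

```python
from typing import Mapping, Tuple

# Hypothetical dependency keys; the real module may name them differently.
CORE_STORES = ("redis", "s3", "queue")
ALL_DEPENDENCIES = CORE_STORES + ("docling-serve",)


def health_response(up: Mapping[str, bool]) -> Tuple[int, str]:
    """Tolerant check (/health): 503 only when a core store is down."""
    if not all(up.get(name, False) for name in CORE_STORES):
        return 503, "unhealthy"
    # Core stores are up; a docling-serve outage only degrades the status.
    if not up.get("docling-serve", False):
        return 200, "degraded"
    return 200, "healthy"


def ready_response(up: Mapping[str, bool]) -> Tuple[int, str]:
    """Strict check (/health/ready): 503 when any dependency is down."""
    if all(up.get(name, False) for name in ALL_DEPENDENCIES):
        return 200, "ready"
    return 503, "not ready"
```

With docling-serve down and the core stores up, health_response returns a 200
while ready_response returns a 503, which is the behaviour that lets a
liveness probe stay green while a readiness probe takes the instance out of
rotation.
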
Integration tests cover all four states for each endpoint. Also added
the pytestmark = pytest.mark.integration marker that was missing from
the file; without it, the original tests were silently deselected by
the -m integration filter in make test-integration.
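
As a rough illustration of the marker fix, a module-level pytestmark applies
the mark to every test in the file, so a -m integration run selects them. The
test name below is a placeholder, not one of the tests in this change.

```python
import pytest

# Module-level marker: every test in this file gets the `integration` mark,
# so `pytest -m integration` selects the file's tests instead of skipping them.
pytestmark = pytest.mark.integration


def test_placeholder():
    # Hypothetical stand-in for a real endpoint test.
    assert True
```

Without the assignment, the tests carry no integration mark and a
-m integration run reports them as deselected rather than failed, which is
why the gap was easy to miss.
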
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>