fix(health): make /health/ready a strict readiness probe (#133)

The /health/ready endpoint previously hardcoded a 200 response, which meant orchestrators using it as a readiness probe never saw the api report unready even when document processing was broken. /health itself was tolerant by design (docling-serve cold-start takes minutes; failing liveness during that window caused restart loops), but that left no endpoint that returned 503 when docling-serve was permanently down, so ALBs and k8s pods would keep routing traffic to instances that couldn't do their job.

Now:

- /health remains tolerant. 200 healthy/degraded when core stores (Redis, S3, queue) are up, 503 only when a core store is down. Stays appropriate for liveness probes that should tolerate transient docling-serve outages.
- /health/ready becomes strict. Returns 200 only when every dependency the pipeline needs (Redis, S3, queue, docling-serve) is reachable; 503 the moment any of them isn't. Suitable for orchestrator readiness probes (Kubernetes readinessProbe, ECS ALB target groups).
- Module docstring + both endpoint docstrings explain the deliberate split so operators know which one to wire up where.
- The stale "circuit breaker handles it at request level" comment is gone; the retry behaviour is real but only absorbs transient outages, not permanent ones, which is exactly what /health/ready now surfaces.

Integration tests cover all four states for each endpoint. Added the pytestmark = pytest.mark.integration marker that was missing on the file (the original tests were being silently deselected by make test-integration's -m integration filter).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
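For illustration, the tolerant/strict split described above could be sketched as two pure functions over a map of dependency check results. This is not the code in src/api/health.py: the function names, the dependency keys, and the "unhealthy", "ready" and "not_ready" status strings are assumptions made for the sketch. Only the 200/503 rules and the healthy/degraded labels come from the message above.

```python
from typing import Dict, Tuple

# Hypothetical dependency keys; the real module may name these differently.
CORE_STORES = ("redis", "s3", "queue")
ALL_DEPENDENCIES = CORE_STORES + ("docling_serve",)


def liveness_status(checks: Dict[str, bool]) -> Tuple[int, str]:
    """Tolerant /health rule: only a core store outage yields 503."""
    # A missing key is treated as "not reachable".
    if not all(checks.get(name, False) for name in CORE_STORES):
        return 503, "unhealthy"  # assumed label for the 503 case
    if not checks.get("docling_serve", False):
        # docling-serve may still be cold-starting, so stay at 200.
        return 200, "degraded"
    return 200, "healthy"


def readiness_status(checks: Dict[str, bool]) -> Tuple[int, str]:
    """Strict /health/ready rule: any unreachable dependency yields 503."""
    if all(checks.get(name, False) for name in ALL_DEPENDENCIES):
        return 200, "ready"  # assumed label
    return 503, "not_ready"  # assumed label


if __name__ == "__main__":
    docling_down = {"redis": True, "s3": True, "queue": True, "docling_serve": False}
    print(liveness_status(docling_down))
    print(readiness_status(docling_down))
```

With docling-serve down and the core stores up, the two rules disagree on purpose: liveness stays at 200 so the instance is not restarted, while readiness drops to 503 so traffic is routed elsewhere.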

Blake Bertuccelli-Booth committed on May 15, 2026, 06:51 PM

Showing 2 changed files +152 additions -88 deletions

M src/api/health.py +75 -37

M tests/integration/test_health.py +77 -51