Equalify Reflow is composed of three services that work together: a conversion engine, a WordPress plugin, and a feedback service. This page covers the conversion engine architecture in detail.
┌─────────────────┐
│ WordPress Site │
│ (reflow-wp) │
└────────┬────────┘
│
┌──────────────┼──────────────┐
│ │ │
▼ ▼ ▼
┌──────────────────────────────────────────────────┐
│ Equalify Reflow API │
│ (FastAPI + Uvicorn) │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────────────┐│
│ │ Document │ │ Pipeline │ │ Approval ││
│ │ Endpoints │ │ Viewer │ │ Endpoints ││
│ └─────┬────┘ └─────┬────┘ └────────┬─────────┘│
│ │ │ │ │
│ ┌─────┴─────────────┴────────────────┴─────────┐│
│ │ Service Layer ││
│ │ ┌────────────┐ ┌──────────┐ ┌───────────┐ ││
│ │ │ Processing │ │ Storage │ │ Queue │ ││
│ │ │ Service │ │ Service │ │ Service │ ││
│ │ └──────┬─────┘ └─────┬───┘ └─────┬─────┘ ││
│ └─────────┼──────────────┼─────────────┼───────┘│
└────────────┼──────────────┼─────────────┼────────┘
│ │ │
┌────────┼──────────────┼─────────────┼────────┐
│ ▼ ▼ ▼ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Docling │ │ S3 │ │ Redis │ │
│ │ Serve │ │ │ │ │ │
│ └──────────┘ └──────────┘ └──────────┘ │
│ Infrastructure │
└──────────────────────────────────────────────┘
System component diagram showing the three-layer architecture: a WordPress site connects to the Equalify Reflow API (FastAPI + Uvicorn), which contains Document, Pipeline Viewer, and Approval endpoint groups. These feed into a shared Service Layer (Processing, Storage, and Queue services), which communicates with the infrastructure layer containing Docling Serve (PDF extraction), S3 (document storage), and Redis (job state and queuing).
The core service: a FastAPI application that accepts PDF uploads, runs the five-stage pipeline, and returns accessible markdown. Its key responsibilities map to the services described below.
Integrates the conversion engine with WordPress. Administrators process PDFs from the Media Library; results are stored as WordPress posts and served through a built-in viewer.
See the WordPress Plugin Guide.
A separate FastAPI + SQLite service that collects issue reports and text corrections from viewers. Provides filtering, aggregation, and a Metabase dashboard for analyzing feedback patterns.
1. PDF uploaded → S3 temp bucket
2. PII scan (Microsoft Presidio)
├─ Pass → queue for processing
└─ Fail → hold for human approval
3. Pipeline processing (5 stages)
└─ Each stage: AI agent processes → edits recorded in change ledger
4. Results stored in S3 results bucket
5. Job marked completed → SSE event emitted
6. Client downloads markdown + figures via pre-signed S3 URLs
Data flow diagram showing the six steps of document processing: PDF upload to S3, PII scanning with pass/fail branching, five-stage pipeline processing with change ledger recording, results storage in S3, job completion notification via SSE, and client download of markdown and figures via pre-signed URLs.
Real-time progress is delivered via Server-Sent Events. The architecture is designed so the pipeline runs independently of client connections:
Clients subscribe to a job's event stream by `job_id`; note that the browser `EventSource` API can't send custom headers, so authentication cannot rely on them.

**Processing Service:** Orchestrates the conversion pipeline. Manages the dossier (document context that accumulates through pipeline stages), coordinates AI agents, and records the change ledger.
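A client-side sketch of consuming the progress stream (a minimal SSE line parser in plain Python; the `progress`/`completed` event names shown in the test data are assumptions, not the service's documented event types):

```python
def parse_sse(lines):
    """Yield (event, data) pairs from an iterable of SSE text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\n")
        if line == "":
            # A blank line terminates one event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
    if data:  # stream ended without a trailing blank line
        yield event, "\n".join(data)
```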
**Storage Service:** Wraps S3 operations with circuit breakers and retry logic. Handles upload, download, and pre-signed URL generation for both the temp and results buckets.
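A minimal sketch of the circuit-breaker half of that wrapper (the retry side and the actual S3 calls are omitted; the threshold, cooldown, and class name are illustrative, not the service's real implementation):

```python
import time

class CircuitBreaker:
    """Open the circuit after `threshold` consecutive failures;
    allow a single trial call again after `cooldown` seconds."""

    def __init__(self, threshold: int = 5, cooldown: float = 30.0):
        self.threshold, self.cooldown = threshold, cooldown
        self.failures, self.opened_at = 0, None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown:
                raise RuntimeError("circuit open: S3 calls suspended")
            self.opened_at = None  # half-open: let one trial call through
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()  # trip the breaker
            raise
        self.failures = 0  # success resets the failure count
        return result
```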
**Queue Service:** Redis-based job queuing. Documents are enqueued after the PII scan and dequeued by background workers.
**State management:** Manages job state in Redis using Lua scripts for atomic operations. Tracks status transitions, stores metadata, and publishes state-change events.
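A sketch of the check-and-set that the Lua scripts make atomic (the status names and transition graph are illustrative; in Redis the same logic runs as a Lua script under `EVAL`, with a plain dict standing in here):

```python
# Illustrative status graph; the real service's states are not enumerated here.
ALLOWED = {
    "uploaded": {"queued", "held_for_approval"},
    "queued": {"processing"},
    "processing": {"completed", "failed"},
}

def transition(state_store: dict, job_id: str, new_status: str) -> str:
    """Move a job to new_status only if the transition is legal.
    Running this check-and-set inside one Lua script makes it atomic in Redis."""
    current = state_store.get(job_id, "uploaded")
    if new_status not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {new_status}")
    state_store[job_id] = new_status
    return new_status
```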
**PII scanning:** Scans document text using Microsoft Presidio. Detects email addresses, phone numbers, SSNs, and other PII entity types, with a configurable confidence threshold.
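A toy stand-in showing how a confidence threshold filters detected entity types (the regex patterns and confidence values are illustrative only; the real service uses Presidio's recognizers, not these patterns):

```python
import re

# Illustrative patterns and confidences, not Presidio's.
PATTERNS = {
    "EMAIL_ADDRESS": (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), 0.9),
    "US_SSN": (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), 0.85),
    "PHONE_NUMBER": (re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"), 0.6),
}

def scan(text: str, threshold: float = 0.7) -> list[str]:
    """Return entity types whose confidence meets the threshold and whose
    pattern appears in the text."""
    hits = []
    for entity, (pattern, confidence) in PATTERNS.items():
        if confidence >= threshold and pattern.search(text):
            hits.append(entity)
    return hits
```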
The pipeline uses PydanticAI to define agents with tool-call interfaces; each agent operates on the shared dossier and records its edits in the change ledger.
The pipeline uses Claude Haiku (via AWS Bedrock) for all AI processing steps. Model configuration is managed centrally in `src/agents/model_tiers.py`.
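The registry might look roughly like this (a hypothetical sketch of the shape of `src/agents/model_tiers.py`; the module's actual contents, the tier names, and the exact Bedrock model identifier are all assumptions):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelTier:
    name: str
    model_id: str      # Bedrock model identifier (illustrative value below)
    max_tokens: int

# Hypothetical central registry; every stage currently resolves to Haiku.
TIERS = {
    "default": ModelTier("default", "anthropic.claude-3-haiku-20240307-v1:0", 4096),
}

def model_for(stage: str) -> ModelTier:
    """Resolve a pipeline stage to its configured model tier."""
    return TIERS["default"]
```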
```shell
make dev  # Starts everything via Docker Compose
```
| Service | Port | Purpose |
|---|---|---|
| API Gateway | 8080 | FastAPI application |
| Redis | 6379 | Job state, queues, pub/sub |
| LocalStack | 4566 | S3 emulation |
| Docling Serve | 5001 | PDF extraction sidecar |
| Prometheus | 9090 | Metrics collection |
| Grafana | 3001 | Monitoring dashboards |
| Jaeger | 16686 | Distributed tracing |
Infrastructure is defined under `terraform/`. The `/health` endpoint verifies Redis, S3, and queue connectivity; `/health/ready` serves orchestration readiness probes.
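Such a health endpoint might aggregate its dependency checks like this (a hypothetical helper, not the service's actual handler or response schema):

```python
def health(checks: dict) -> dict:
    """Run named dependency checks (callables returning truthy on success)
    and aggregate them into a health payload. Payload shape is illustrative."""
    results = {name: bool(check()) for name, check in checks.items()}
    return {
        "status": "ok" if all(results.values()) else "degraded",
        "checks": results,
    }
```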