---
title: Contributing
date: 2026-03-23
author: Equalify Tech Team
description: Development setup, testing strategy, project structure, and how to contribute to Equalify Reflow.
---
# Contributing
Equalify Reflow is open source under the [AGPL-3.0 license](https://www.gnu.org/licenses/agpl-3.0.en.html). We welcome contributions: bug fixes, pipeline improvements, documentation, and new integrations.
## Prerequisites
- **Docker** and **Docker Compose**: all services run in containers
- **make**: wraps all common operations
- **AWS credentials** (optional): needed for AI processing via Bedrock. Without them, you can still run the infrastructure and the tests that mock LLM calls
## Development Setup
```bash
# Clone the repo
git clone https://github.com/EqualifyEverything/equalify-reflow.git
cd equalify-reflow
# Copy the example environment file
cp .env.example .env
# Start all services
make dev
```
This starts:
| Service | Port | Description |
|---------|------|-------------|
| API Gateway | `localhost:8080` | FastAPI app with hot reload |
| Redis | `localhost:6379` | Job state, queues, pub/sub |
| LocalStack | `localhost:4566` | S3 emulation |
| Docling Serve | `localhost:5001` | PDF extraction sidecar |
| Prometheus | `localhost:9090` | Metrics |
| Grafana | `localhost:3001` | Dashboards |
| Jaeger | `localhost:16686` | Tracing |
Code is volume-mounted from `src/` into the container. Edits on your host trigger an automatic reload; no rebuild is needed.
### Useful Commands
```bash
make dev # Start all services
make down # Stop all services
make logs # View all logs
make logs-api # View API logs only
make shell # Shell into the API container
make redis-cli # Connect to Redis CLI
make health # Verify infrastructure health
```
**Important:** Do not run `python`, `pytest`, or `uv` directly on your host. Everything runs inside Docker:
```bash
make test-fast # Run unit tests (~30s)
make test-integration # Run integration tests (~2min)
make test-e2e # Run end-to-end tests (~5min)
make coverage # Tests with coverage report
```
## Project Structure
```
src/
├── main.py              # FastAPI app entry point + worker startup
├── config.py            # Settings from environment variables
├── dependencies.py      # Dependency injection (S3, Redis, services)
├── api/                 # REST endpoints
│   ├── documents.py         # Document submission and status
│   ├── pipeline_viewer.py   # Pipeline viewer
│   └── approval.py          # PII approval workflow
├── workers/             # Background task processors
│   ├── pii_worker.py        # PII detection queue consumer
│   └── timeout_worker.py    # Approval timeout checks
├── services/            # Business logic
│   ├── pipeline_viewer.py       # 5-stage pipeline orchestration
│   ├── document_processing.py   # Job lifecycle management
│   ├── storage.py               # S3 with circuit breakers
│   ├── queue.py                 # Redis queue operations
│   ├── job.py                   # Job state (Lua scripts)
│   └── pii_detection.py         # Presidio integration
├── agents/              # AI pipeline
│   ├── orchestrator.py      # Pipeline orchestration + dossier
│   ├── dossier.py           # Document context model
│   ├── shared_prompts.py    # Reusable prompt fragments
│   ├── model_tiers.py       # Model selection (Sonnet/Haiku)
│   ├── worker.py            # Per-page content correction agent
│   ├── paragraph_agent.py   # Sub-agent orchestration
│   ├── recovery.py          # Error recovery agent
│   ├── critic.py            # Verification agent
│   ├── document_worker.py   # Cross-page assembly agent
│   └── prompts/             # Stage-specific prompt modules
│       ├── structure_analysis.py
│       ├── heading_reconciliation.py
│       ├── boundary_fix.py
│       ├── footnote_relocation.py
│       └── revision.py
├── middleware/          # HTTP middleware
│   ├── auth.py              # API key + docs auth
│   ├── rate_limit.py        # Per-IP rate limiting
│   └── metrics.py           # Prometheus instrumentation
├── shared/              # Constants and shared models
└── utils/               # Helpers (retry, circuit breaker, tokens)
```
## Testing
The project uses a three-tier testing strategy:
### Unit Tests (`make test-fast`)
Fast, isolated tests that mock external dependencies (S3, Redis, LLM calls). Run these before every commit.
```bash
make test-fast
```
Key patterns:
- **PydanticAI agents**: mock the `_get_*_agent` or `_get_*_subagent` factories to avoid real LLM calls
- **Circuit breakers**: call `reset_llm_circuit_breaker()` in `autouse=True` fixtures to prevent state leaking between tests
- **Conditional tools**: test that `prepare` functions return a `ToolDefinition` or `None` depending on task type
### Integration Tests (`make test-integration`)
Tests that exercise real service interactions (Redis, S3 via LocalStack) but still mock LLM calls. Run before PRs.
```bash
make test-integration
```
### End-to-End Tests (`make test-e2e`)
Full pipeline tests with real documents. Requires AWS credentials for Bedrock. Run before merges.
```bash
make test-e2e
```
### Test Markers
Tests are tagged with pytest markers:
```python
@pytest.mark.unit # Unit test (mocked dependencies)
@pytest.mark.integration # Needs Redis + S3
@pytest.mark.e2e # Full pipeline, real LLM calls
@pytest.mark.slow # Takes >10 seconds
```
## Development Workflow
1. **Create a feature branch** from `main`
2. **Start services** with `make dev`
3. **Edit code** in `src/`: changes auto-reload in the container
4. **Run unit tests** with `make test-fast` for quick feedback
5. **Test manually** via the viewer at `http://localhost:8080/viewer` or the API at `http://localhost:8080/docs`
6. **Run integration tests** with `make test-integration` before opening a PR
7. **Open a pull request** against `main`
## Adding a Pipeline Stage
Pipeline stages are defined in `src/services/pipeline_viewer.py`. Each stage:
1. Receives the current markdown and document context (the dossier)
2. Makes modifications
3. Returns an updated version
To add a new stage:
1. Create a prompt module in `src/agents/prompts/` defining the agent's instructions
2. Add the step method to `PipelineViewerService` in `src/services/pipeline_viewer.py`
3. Wire it into the pipeline sequence in the `process()` method
4. Add a version bump if the stage produces a new document version
5. Write unit tests mocking the LLM calls
6. Update the viewer stage groupings in `clients/viewer/src/components/pipeline-viewer/StageTabs.tsx`
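As a rough sketch of steps 2 and 3, a new stage method might look like the following. All names here are illustrative assumptions: `step_table_repair`, `_run_agent`, and the `Dossier` fields are not the actual `PipelineViewerService` or dossier API.

```python
import asyncio
from dataclasses import dataclass

@dataclass
class Dossier:
    """Minimal stand-in for the document context model in src/agents/dossier.py."""
    title: str
    page_count: int

class PipelineViewerService:
    async def step_table_repair(self, markdown: str, dossier: Dossier) -> str:
        """New stage: receive the current markdown and dossier, return an updated version."""
        prompt = f"Repair broken tables in '{dossier.title}' ({dossier.page_count} pages)."
        return await self._run_agent(prompt, markdown)

    async def _run_agent(self, prompt: str, markdown: str) -> str:
        # Stand-in for the real LLM call via a prompt module; unit tests mock this.
        return markdown

# process() would then call the new step in sequence, e.g.:
#   markdown = await self.step_table_repair(markdown, dossier)
```

Keeping each stage as a pure `markdown in, markdown out` transformation makes it easy to unit-test with the LLM call mocked out.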
## Code Style
- **Python 3.11+** with type hints
- **Pydantic** models for all data structures
- **Async/await** throughout: the FastAPI app is fully asynchronous
- **No direct host execution**: all code runs in Docker containers
## Getting Help
- Open an issue on [GitHub](https://github.com/EqualifyEverything/equalify-reflow/issues)
- For partner support, contact [Blake Bertuccelli-Booth](mailto:b3b@uic.edu)