---
title: Contributing
date: 2026-03-23
author: Equalify Tech Team
description: Development setup, testing strategy, project structure, and how to contribute to Equalify Reflow.
---

# Contributing

Equalify Reflow is open source under the [AGPL-3.0 license](https://www.gnu.org/licenses/agpl-3.0.en.html). We welcome contributions: bug fixes, pipeline improvements, documentation, and new integrations.

## Prerequisites

- **Docker** and **Docker Compose**: all services run in containers
- **make**: wraps all common operations
- **AWS credentials** (optional): needed for AI processing via Bedrock. Without them, you can still run the infrastructure and the tests that mock LLM calls

## Development Setup

```bash
# Clone the repo
git clone https://github.com/EqualifyEverything/equalify-reflow.git
cd equalify-reflow

# Copy the example environment file
cp .env.example .env

# Start all services
make dev
```

This starts:

| Service | Port | Description |
|---------|------|-------------|
| API Gateway | `localhost:8080` | FastAPI app with hot reload |
| Redis | `localhost:6379` | Job state, queues, pub/sub |
| LocalStack | `localhost:4566` | S3 emulation |
| Docling Serve | `localhost:5001` | PDF extraction sidecar |
| Prometheus | `localhost:9090` | Metrics |
| Grafana | `localhost:3001` | Dashboards |
| Jaeger | `localhost:16686` | Tracing |

Code is volume-mounted from `src/` into the container. Edits on your host trigger an automatic reload, so no rebuild is needed.

### Useful Commands

```bash
make dev              # Start all services
make down             # Stop all services
make logs             # View all logs
make logs-api         # View API logs only
make shell            # Shell into the API container
make redis-cli        # Connect to Redis CLI
make health           # Verify infrastructure health
```

**Important:** Do not run `python`, `pytest`, or `uv` directly on your host. Everything runs inside Docker:

```bash
make test-fast        # Run unit tests (~30s)
make test-integration # Run integration tests (~2min)
make test-e2e         # Run end-to-end tests (~5min)
make coverage         # Tests with coverage report
```

## Project Structure

```
src/
├── main.py                    # FastAPI app entry point + worker startup
├── config.py                  # Settings from environment variables
├── dependencies.py            # Dependency injection (S3, Redis, services)
├── api/                       # REST endpoints
│   ├── documents.py           # Document submission and status
│   ├── pipeline_viewer.py     # Pipeline viewer (dev tool)
│   ├── pipeline_feedback.py   # Feedback and review sessions
│   └── approval.py            # PII approval workflow
├── workers/                   # Background task processors
│   ├── pii_worker.py          # PII detection queue consumer
│   └── timeout_worker.py      # Approval timeout checks
├── services/                  # Business logic
│   ├── pipeline_viewer.py     # 7-stage pipeline orchestration
│   ├── document_processing.py # Job lifecycle management
│   ├── storage.py             # S3 with circuit breakers
│   ├── queue.py               # Redis queue operations
│   ├── job.py                 # Job state (Lua scripts)
│   └── pii_detection.py       # Presidio integration
├── agents/                    # AI pipeline
│   ├── orchestrator.py        # Pipeline orchestration + dossier
│   ├── dossier.py             # Document context model
│   ├── shared_prompts.py      # Reusable prompt fragments
│   ├── model_tiers.py         # Model selection (Sonnet/Haiku)
│   ├── worker.py              # Per-page content correction agent
│   ├── paragraph_agent.py     # Sub-agent orchestration
│   ├── recovery.py            # Error recovery agent
│   ├── critic.py              # Verification agent
│   ├── document_worker.py     # Cross-page assembly agent
│   └── prompts/               # Stage-specific prompt modules
│       ├── structure_analysis.py
│       ├── heading_reconciliation.py
│       ├── boundary_fix.py
│       ├── footnote_relocation.py
│       └── revision.py
├── middleware/                # HTTP middleware
│   ├── auth.py                # API key + docs auth
│   ├── rate_limit.py          # Per-IP rate limiting
│   └── metrics.py             # Prometheus instrumentation
├── shared/                    # Constants and shared models
└── utils/                     # Helpers (retry, circuit breaker, tokens)
```

## Testing

The project uses a three-tier testing strategy:

### Unit Tests (`make test-fast`)

Fast, isolated tests that mock external dependencies (S3, Redis, LLM calls). Run these before every commit.

```bash
make test-fast
```

Key patterns:
- **PydanticAI agents**: mock the `_get_*_agent` or `_get_*_subagent` factories to avoid real LLM calls
- **Circuit breakers**: call `reset_llm_circuit_breaker()` in `autouse=True` fixtures to prevent state from leaking between tests
- **Conditional tools**: test that `prepare` functions return a `ToolDefinition` or `None` depending on the task type
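
The factory-mocking pattern from the first bullet might look roughly like the sketch below. It uses only the standard library so it stays self-contained. `PageCorrector`, `correct_page`, `_get_worker_agent`, and the `.output` attribute on the result are hypothetical stand-ins chosen to match the `_get_*_agent` naming; they are not taken from the codebase.

```python
import asyncio
from types import SimpleNamespace
from unittest.mock import AsyncMock, patch


class PageCorrector:
    """Hypothetical stand-in for a service that calls an LLM-backed agent."""

    def _get_worker_agent(self):
        # Assumption: the real factory would build an agent that talks to Bedrock.
        raise RuntimeError("real LLM call attempted in a unit test")

    async def correct_page(self, markdown: str) -> str:
        agent = self._get_worker_agent()
        result = await agent.run(markdown)
        return result.output


def run_with_mocked_agent(markdown: str) -> str:
    # The fake agent exposes an async run() that returns a canned result.
    fake_result = SimpleNamespace(output=markdown.strip())
    fake_agent = SimpleNamespace(run=AsyncMock(return_value=fake_result))

    # Patch the factory so no real agent (and no LLM client) is ever built.
    with patch.object(PageCorrector, "_get_worker_agent", return_value=fake_agent):
        corrected = asyncio.run(PageCorrector().correct_page(markdown))

    # The agent was awaited exactly once, with the input markdown.
    fake_agent.run.assert_awaited_once_with(markdown)
    return corrected
```

Patching the factory rather than the agent's internals keeps the test independent of how the agent is configured, which is presumably why the factories are the recommended seam.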

### Integration Tests (`make test-integration`)

Tests that exercise real service interactions (Redis, S3 via LocalStack) but still mock LLM calls. Run before PRs.

```bash
make test-integration
```

### End-to-End Tests (`make test-e2e`)

Full pipeline tests with real documents. Requires AWS credentials for Bedrock. Run before merges.

```bash
make test-e2e
```

### Test Markers

Tests are tagged with pytest markers:

```python
@pytest.mark.unit          # Unit test (mocked dependencies)
@pytest.mark.integration   # Needs Redis + S3
@pytest.mark.e2e           # Full pipeline, real LLM calls
@pytest.mark.slow          # Takes >10 seconds
```

## Development Workflow

1. **Create a feature branch** from `main`
2. **Start services** with `make dev`
3. **Edit code** in `src/`; changes auto-reload in the container
4. **Run unit tests** with `make test-fast` for quick feedback
5. **Test manually** via the viewer at `http://localhost:8080/viewer` or the API at `http://localhost:8080/docs`
6. **Run integration tests** with `make test-integration` before opening a PR
7. **Open a pull request** against `main`

## Adding a Pipeline Stage

Pipeline stages are defined in `src/services/pipeline_viewer.py`. Each stage:

1. Receives the current markdown and document context (the dossier)
2. Makes modifications
3. Returns an updated version

To add a new stage:

1. Create a prompt module in `src/agents/prompts/` defining the agent's instructions
2. Add the step method to `PipelineViewerService` in `src/services/pipeline_viewer.py`
3. Wire it into the pipeline sequence in the `process()` method
4. Add a version bump if the stage produces a new document version
5. Write unit tests mocking the LLM calls
6. Update the viewer stage groupings in `clients/viewer/src/components/pipeline-viewer/StageTabs.tsx`
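
As a rough illustration of the stage contract and of steps 2 to 4, the sketch below wires one placeholder stage into a sequence and bumps the version only when the markdown changes. Everything in it is an assumption made for illustration: `Dossier`, `StageResult`, `PipelineSketch`, the `_step_*` naming, and the `process()` signature are hypothetical and do not reflect the real `PipelineViewerService`. A regular expression stands in for the LLM call.

```python
import asyncio
import re
from dataclasses import dataclass, field


@dataclass
class Dossier:
    """Hypothetical, simplified stand-in for the document context."""
    title: str = ""
    notes: list[str] = field(default_factory=list)


@dataclass
class StageResult:
    """Hypothetical return type: the updated markdown plus its version."""
    markdown: str
    version: int


class PipelineSketch:
    """Illustrative only; not the real PipelineViewerService."""

    async def _step_collapse_blank_lines(
        self, markdown: str, dossier: Dossier, version: int
    ) -> StageResult:
        # A real stage would call an agent built from its prompt module.
        # This placeholder applies a deterministic edit instead.
        updated = re.sub(r"\n{3,}", "\n\n", markdown)
        if updated == markdown:
            return StageResult(markdown, version)
        dossier.notes.append("collapsed repeated blank lines")
        # Bump the version only when the stage produced a new document.
        return StageResult(updated, version + 1)

    async def process(self, markdown: str, dossier: Dossier) -> StageResult:
        result = StageResult(markdown, version=1)
        # New stages would be appended to this sequence in pipeline order.
        for step in (self._step_collapse_blank_lines,):
            result = await step(result.markdown, dossier, result.version)
        return result


def run_pipeline(markdown: str) -> StageResult:
    # Synchronous helper so the sketch can be tried without an event loop.
    return asyncio.run(PipelineSketch().process(markdown, Dossier()))
```

Because the stage takes the markdown and dossier as arguments and returns a new result, a unit test can call it directly with a mocked agent and assert on the returned markdown and version.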

## Code Style

- **Python 3.11+** with type hints
- **Pydantic** models for all data structures
- **Async/await** throughout: the FastAPI app is fully asynchronous
- **No direct host execution**: all code runs in Docker containers

## Getting Help

- Open an issue on [GitHub](https://github.com/EqualifyEverything/equalify-reflow/issues)
- For partner support, contact [Blake Bertuccelli-Booth](mailto:b3b@uic.edu)