---
title: Supported Document Types
date: 2026-04-16
author: Equalify Tech Team
description: What Reflow produces, which document types convert well, and which are outside the current scope.
---
# Supported document types
Equalify Reflow is designed for **course materials**: syllabi, academic papers, policy documents, and presentations. This page is a quick lookup for what's in scope, what the pipeline produces, and where quality drops off. For the judgement side (how to evaluate a specific conversion), see [interpret the output](../how-to/interpret-the-output.md).
## Size limits
| Limit | Value | Behaviour when exceeded |
|---|---|---|
| File size | 100 MB | API rejects with `413 Payload Too Large` at submission |
| Page count | 50 pages | Job moves to `failed` status with error: `PDF has N pages, which exceeds the maximum of 50. Please split into smaller documents.` |
Documents close to the 50-page ceiling also incur the most cost (roughly linear in page count; plan ~$0.08-0.10 per page for a Haiku-tier run).
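
The limits and the per-page planning figure above can be checked before submission. The following is a minimal sketch, not part of Reflow itself: the function names are hypothetical, and it assumes that "100 MB" means 100 × 1024 × 1024 bytes and that a document exactly at a limit is still accepted.

```python
# Limits taken from the table above; the byte interpretation of "100 MB"
# and the strict ">" comparison are assumptions, not documented behaviour.
MAX_FILE_BYTES = 100 * 1024 * 1024
MAX_PAGES = 50
COST_PER_PAGE_USD = (0.08, 0.10)  # rough planning range for a Haiku-tier run


def preflight(file_bytes: int, page_count: int) -> list[str]:
    """Return the reasons a document would likely be rejected (empty if none)."""
    problems = []
    if file_bytes > MAX_FILE_BYTES:
        problems.append("file exceeds 100 MB (API would reject with 413)")
    if page_count > MAX_PAGES:
        problems.append(
            f"PDF has {page_count} pages, which exceeds the maximum of {MAX_PAGES}"
        )
    return problems


def estimate_cost_usd(page_count: int) -> tuple[float, float]:
    """Return a (low, high) planning estimate, linear in page count."""
    low, high = COST_PER_PAGE_USD
    return (round(page_count * low, 2), round(page_count * high, 2))
```

For a document at the 50-page ceiling, `estimate_cost_usd(50)` gives a planning range of 4.0 to 5.0 USD, which is only as accurate as the per-page figure quoted above.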
## Quality by document type
| Document type | Typical quality | Common issues |
|---|---|---|
| Syllabi and course materials | High | Occasional heading-level disagreements |
| Policy documents | High | Complex nested numbering schemes |
| Letters and memos | High | Letterhead content may be over-described |
| Academic chapters | Medium | Footnote ordering, reading order in multi-column layouts |
| Presentations (slides) | Medium | Slide boundaries, text embedded in images |
| Infographics and posters | Lower | Spatial relationships lost when linearised |
| Brochures with complex layouts | Lower | Multi-column reading order confusion |
The pipeline emits `warnings` on the job response for document types it handles poorly, visible in both the API response and the viewer.
## Known limitations
The following are outside current scope and will produce lower-quality output:
- **Scanned multi-column academic chapters**: reading order across columns is unreliable for scanned content
- **Heavy infographics**: spatial relationships (flow diagrams, org charts) flatten into linear text
- **Mathematical equations**: complex LaTeX formulas are not fully supported
- **Bilingual scanned documents**: OCR quality degrades with mixed-language scanned content
- **Very long documents**: while the technical limit is 50 pages, quality and cost scale with complexity; documents over ~40 pages may benefit from being split along natural section boundaries
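
One way to plan a split along section boundaries is sketched below. It is an illustration only: `split_by_sections` is a hypothetical helper, the section start pages must come from your own reading of the document, and the 40-page default simply mirrors the rough threshold mentioned above. It groups consecutive sections greedily and never cuts inside a section.

```python
def split_by_sections(
    section_starts: list[int], total_pages: int, max_pages: int = 40
) -> list[tuple[int, int]]:
    """Group consecutive sections into inclusive (first_page, last_page) chunks.

    section_starts: 1-based pages where each section begins, ascending,
    with the first entry equal to 1. A single section longer than
    max_pages is kept whole rather than cut mid-section.
    """
    # Turn the start pages into inclusive (start, end) ranges per section.
    bounds = section_starts + [total_pages + 1]
    sections = [(bounds[i], bounds[i + 1] - 1) for i in range(len(section_starts))]

    chunks = []
    chunk_start, chunk_end = sections[0]
    for start, end in sections[1:]:
        if end - chunk_start + 1 <= max_pages:
            chunk_end = end  # this section still fits in the current chunk
        else:
            chunks.append((chunk_start, chunk_end))
            chunk_start, chunk_end = start, end
    chunks.append((chunk_start, chunk_end))
    return chunks
```

For a 60-page document with sections starting on pages 1, 12, 30, and 45, this yields two chunks: pages 1 to 29 and pages 30 to 60.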
## What the output contains
Every completed job produces:
- **`result.md`**: a single markdown file with the full document content (semantic headings H1-H6, alt text on images, accessible tables with header rows, reconstructed lists, inline hyperlinks, logical reading order)
- **Figures**: individual image files (PNG) for each extracted figure/chart/diagram, each tied to a `figure_id` referenced from the markdown
- **Change ledger**: a JSON record of every edit the pipeline made, with before/after text and a one-sentence reason per edit
- **Bundle**: optional ZIP of the above, downloadable from the `/bundle` endpoint
Decorative images (logos, spacers) are identified and left with empty alt text, following WCAG best practices. Informational images get descriptive alt text generated by the image sub-agent.
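
When reviewing a `result.md`, it can help to list which images were treated as decorative and which as informational. The sketch below assumes that images appear in standard inline markdown syntax (`![alt](target)`); the helper name and the sample file paths are hypothetical, and the exact way Reflow references figures may differ.

```python
import re

# Standard inline markdown image syntax: ![alt text](target "optional title")
IMAGE_PATTERN = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)[^)]*\)")


def classify_images(markdown: str) -> dict[str, list[str]]:
    """Split image targets into decorative (empty alt) and informational."""
    result = {"decorative": [], "informational": []}
    for alt, target in IMAGE_PATTERN.findall(markdown):
        key = "informational" if alt.strip() else "decorative"
        result[key].append(target)
    return result


# Hypothetical excerpt of a converted document.
sample = (
    "# Course syllabus\n"
    "![](figures/logo.png)\n"
    "![Bar chart of enrolment by year](figures/fig-1.png)\n"
)
summary = classify_images(sample)
```

Here `summary["decorative"]` contains `figures/logo.png` and `summary["informational"]` contains `figures/fig-1.png`. A reviewer can then check that nothing in the decorative list actually carries meaning.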