Understanding the Output

Published March 23, 2026 · By Equalify Tech Team

When Equalify Reflow processes a PDF, it produces an accessible markdown document and a set of extracted images. This guide explains what the output includes, how to evaluate its quality, and what the system's current limitations are.

What You Get

Accessible Markdown

The primary output is a single markdown file containing the full document content with:

Semantic headings — a properly nested heading hierarchy (H1 through H6) that reflects the document's logical structure, not just its visual appearance
Alt text for images — descriptive text for informational images, charts, and diagrams. Decorative images (logos, background graphics) are marked as decorative with empty alt text
Accessible tables — tables with header rows identified, preserving the relationship between headers and data cells
Reconstructed lists — bulleted and numbered lists with proper nesting, even when the original PDF layout fragmented them across columns or pages
Hyperlinks — URLs detected in the text are converted to clickable links
Clean reading order — content flows in a logical sequence, free from the column-jumping and page-break artifacts of the PDF layout

Extracted Figures

Images, charts, diagrams, and photos are extracted from the PDF and saved as separate files. Each figure includes:

A unique identifier linking it to its position in the markdown
The page number where it appeared
The original caption, if one existed in the PDF

During the Translation stage, a specialist sub-agent generates alt text for each figure and embeds it directly in the markdown (e.g., ![Description of chart](figures/figure-1.png)). Decorative images like logos are identified and left with empty alt text, following WCAG best practices.
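To audit figures embedded this way, a short script can pull every image reference out of the markdown and separate decorative images (empty alt text) from informational ones. This is a minimal sketch that assumes images use the inline `![alt](path)` form shown in the example above. Reference-style images and alt text containing `]` are not handled.

```python
import re

# Matches inline markdown images: ![alt text](path). Assumes the alt text has no "]"
# and the path has no ")" or whitespace, which holds for paths like figures/figure-1.png.
IMAGE_PATTERN = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")


def classify_images(markdown: str) -> dict[str, list[tuple[str, str]]]:
    """Split image references into decorative (empty alt) and informational."""
    result: dict[str, list[tuple[str, str]]] = {"decorative": [], "informational": []}
    for match in IMAGE_PATTERN.finditer(markdown):
        alt, path = match.group(1).strip(), match.group(2)
        kind = "decorative" if not alt else "informational"
        result[kind].append((path, alt))
    return result


sample = """# Annual Report
![](figures/figure-1.png)
![Bar chart of enrollment by year, rising from 1,200 to 1,850](figures/figure-2.png)
"""
report = classify_images(sample)
print(report["decorative"])          # [('figures/figure-1.png', '')]
print(len(report["informational"]))  # 1
```

The decorative list is worth a quick manual look: an informational chart that ended up with empty alt text is a more serious problem than a logo that received a description.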

The Change Ledger

Every edit the pipeline makes is recorded in a change ledger. Each entry includes:

Action — what type of change was made (add, modify, delete)
Target — what was changed (heading, paragraph, table, figure, etc.)
Before / After — the exact text before and after the edit
Reasoning — why the AI made this change

The ledger is available through the API (GET /api/v1/documents/{job_id}/ledger) and in the pipeline viewer's Changes panel.
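For scripted review, the ledger endpoint can be queried and summarized by type of change. The sketch below is hypothetical in several respects: the base URL (`https://reflow.example.edu`) is a placeholder, bearer-token authentication is assumed, and the response is assumed to be a JSON list of entries with `action`, `target`, `before`, `after`, and `reasoning` keys mirroring the fields listed above. The real host, auth scheme, and response shape may differ.

```python
import json
import urllib.request
from collections import Counter

# Placeholder host; replace with your Equalify Reflow deployment (assumption).
BASE_URL = "https://reflow.example.edu"


def fetch_ledger(job_id: str, token: str) -> list[dict]:
    """GET /api/v1/documents/{job_id}/ledger and decode the JSON body.

    Assumes bearer-token auth and a top-level JSON list of entries.
    """
    request = urllib.request.Request(
        f"{BASE_URL}/api/v1/documents/{job_id}/ledger",
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def summarize_ledger(entries: list[dict]) -> Counter:
    """Count edits per (action, target) pair, e.g. ('modify', 'heading')."""
    return Counter((entry["action"], entry["target"]) for entry in entries)


# Offline example with entries shaped like the assumed response.
sample_entries = [
    {"action": "modify", "target": "heading", "before": "## Intro",
     "after": "# Intro", "reasoning": "Document title should be H1"},
    {"action": "add", "target": "figure", "before": "",
     "after": "![Enrollment chart](figures/figure-1.png)", "reasoning": "Added alt text"},
    {"action": "modify", "target": "heading", "before": "#### Grading",
     "after": "## Grading", "reasoning": "Fix skipped heading level"},
]
print(summarize_ledger(sample_entries).most_common(1))  # [(('modify', 'heading'), 2)]
```

Grouping by action and target makes it easy to spot where the pipeline intervened most, which is a good place to start a manual review.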

Evaluating Quality

What to Check

When reviewing a converted document, focus on these areas:

Structure

Does the heading hierarchy make sense? H1 should be the document title, H2s should be major sections, and so on
Are sections in the correct order?
Do lists have the right nesting?
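The heading-hierarchy check can be partly automated. This sketch scans ATX-style headings (`#` through `######`), skips fenced code blocks, and flags a missing or repeated H1 and any level that jumps by more than one (for example, H2 straight to H4). It assumes headings use `#` syntax. It cannot judge whether sections are in the right order, which still needs a human read.

```python
import re

# ATX heading: 1-6 '#' characters followed by whitespace and the heading text.
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def check_headings(markdown: str) -> list[str]:
    """Return human-readable problems with the heading hierarchy."""
    problems: list[str] = []
    levels: list[int] = []
    in_fence = False
    for line_no, line in enumerate(markdown.splitlines(), start=1):
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # '#' lines inside code blocks are not headings
            continue
        match = HEADING.match(line)
        if in_fence or not match:
            continue
        level = len(match.group(1))
        if levels and level > levels[-1] + 1:
            problems.append(f"line {line_no}: H{levels[-1]} jumps to H{level}")
        levels.append(level)
    h1_count = levels.count(1)
    if h1_count != 1:
        problems.append(f"expected exactly one H1, found {h1_count}")
    if levels and levels[0] != 1:
        problems.append("document does not start with an H1")
    return problems


doc = "# Syllabus\n## Schedule\n#### Week 1\n## Grading\n"
print(check_headings(doc))  # ['line 3: H2 jumps to H4']
```

Jumping back up several levels (H4 to H2) is allowed, because that simply closes nested subsections; only skipping levels on the way down is reported.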

Content Accuracy

Is the text faithful to the original? Look for OCR errors (character substitutions, missing words)
Are numbers, dates, and proper nouns correct?
Are footnotes in the right place and correctly numbered?
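If the converted markdown uses footnote syntax (`[^1]` references with `[^1]:` definition lines, an assumption about the output format), footnote numbering can be checked mechanically. The sketch below reports references without definitions, definitions that are never referenced, and numeric labels that do not first appear in the order 1, 2, 3. Whether each footnote sits at the right place in the text still needs a comparison against the original PDF.

```python
import re

REF = re.compile(r"\[\^([^\]]+)\](?!:)")             # inline reference, e.g. [^3]
DEF = re.compile(r"^\[\^([^\]]+)\]:", re.MULTILINE)  # definition line, e.g. [^3]: Source


def check_footnotes(markdown: str) -> list[str]:
    """Report undefined, unused, and out-of-sequence footnotes."""
    problems: list[str] = []
    refs = list(dict.fromkeys(REF.findall(markdown)))  # unique labels, first-seen order
    defs = set(DEF.findall(markdown))
    for label in refs:
        if label not in defs:
            problems.append(f"reference [^{label}] has no definition")
    for label in sorted(defs - set(refs)):
        problems.append(f"definition [^{label}] is never referenced")
    numeric = [int(label) for label in refs if label.isdigit()]
    if numeric != list(range(1, len(numeric) + 1)):
        problems.append(f"numeric footnotes are not sequential from 1: {numeric}")
    return problems


text = "Retention rose.[^2] Costs fell.[^1]\n\n[^1]: Source A.\n[^2]: Source B.\n"
print(check_footnotes(text))  # ['numeric footnotes are not sequential from 1: [2, 1]']
```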

Tables

Do tables have header rows identified?
Are cells aligned with the correct headers?
For complex tables (merged cells, multi-level headers), verify the structure manually
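For pipe-style markdown tables (an assumption about how tables appear in the output; HTML tables would need a different check), a quick structural test is that the second row is a header delimiter and every body row has as many cells as the header. This catches dropped or shifted cells, but not a header that is present yet wrong, and it does not handle escaped `\|` inside cells.

```python
import re

# Delimiter cell such as -, ---, :---, ---: or :---:
DELIMITER_CELL = re.compile(r"^\s*:?-+:?\s*$")


def split_row(line: str) -> list[str]:
    """Split a pipe-table row into cells, ignoring optional outer pipes."""
    stripped = line.strip()
    if stripped.startswith("|"):
        stripped = stripped[1:]
    if stripped.endswith("|"):
        stripped = stripped[:-1]
    return [cell.strip() for cell in stripped.split("|")]


def check_table(table: str) -> list[str]:
    """Check for a header delimiter row and consistent cell counts."""
    rows = [line for line in table.strip().splitlines() if line.strip()]
    if len(rows) < 2:
        return ["table needs a header row and a delimiter row"]
    header = split_row(rows[0])
    problems: list[str] = []
    if not all(DELIMITER_CELL.match(cell) for cell in split_row(rows[1])):
        problems.append("row 2 is not a header delimiter, so no header row is identified")
    for index, row in enumerate(rows[2:], start=3):
        cells = split_row(row)
        if len(cells) != len(header):
            problems.append(f"row {index}: {len(cells)} cells, header has {len(header)}")
    return problems


table = """
| Week | Topic | Reading |
|------|-------|---------|
| 1 | Introduction | Ch. 1 |
| 2 | Methods |
"""
print(check_table(table))  # ['row 4: 2 cells, header has 3']
```

Tables with merged cells or multi-level headers have no faithful pipe-table form, so they still need the manual verification described above.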

Images

Do informational images have descriptive alt text?
Is the alt text accurate — does it convey the same information as the image?
Are decorative images (logos, dividers) given empty alt text rather than a description?

Formatting

Are hyperlinks clickable (not just plain text URLs)?
Is emphasis (bold, italic) preserved where it carries meaning?
Are code blocks properly identified and formatted?
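Bare URLs are easy to spot with a pattern match. This sketch first removes inline links (`[text](url)`, which also covers images) and autolinks (`<https://example.org>`), then reports any remaining `http://` or `https://` URLs as plain text. The URL pattern is a deliberate simplification: it misses URLs without a scheme, such as `www.example.org`, and strips trailing punctuation that might occasionally belong to the URL.

```python
import re

INLINE_LINK = re.compile(r"\[[^\]]*\]\([^)]*\)")  # [text](url) and ![alt](url)
AUTOLINK = re.compile(r"<https?://[^>\s]+>")       # <https://example.org>
BARE_URL = re.compile(r"https?://[^\s)>\]]+")


def find_bare_urls(markdown: str) -> list[str]:
    """Return http(s) URLs that are not already wrapped as markdown links."""
    without_links = AUTOLINK.sub("", INLINE_LINK.sub("", markdown))
    # Trailing sentence punctuation is usually not part of the URL.
    return [url.rstrip(".,;:") for url in BARE_URL.findall(without_links)]


text = (
    "Submit work via [the portal](https://lms.example.edu/course). "
    "Office hours: <https://cal.example.edu/office>. "
    "Full policy at https://example.edu/policy."
)
print(find_bare_urls(text))  # ['https://example.edu/policy']
```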

Quality by Document Type

Some document types convert better than others:

Document Type	Typical Quality	Common Issues
Syllabi and course materials	High	Occasional heading level disagreements
Policy documents	High	Complex nested numbering schemes
Letters and memos	High	Letterhead content may be over-described
Academic chapters	Medium	Footnote ordering, reading order in multi-column layouts
Presentations (slides)	Medium	Slide boundaries, text embedded in images
Infographics and posters	Lower	Spatial relationships lost when linearized
Brochures with complex layouts	Lower	Multi-column reading order confusion

Known Limitations

The system is designed for course materials — syllabi, academic papers, policy documents, presentations, and similar content. The following document types are outside the current scope and may produce lower-quality results:

Scanned multi-column academic chapters — reading order detection across columns is unreliable for scanned documents
Heavy infographics — spatial relationships (flow diagrams, organizational charts) are lost when linearized to text
Mathematical equations — complex formulas may not be fully or accurately reproduced as LaTeX
Bilingual scanned documents — OCR quality degrades significantly with mixed-language scanned content
Documents over 40 pages — the system processes them, but cost rises and output quality can decline as length and complexity increase

When the pipeline detects a document type it handles poorly, it emits warnings that appear in both the API response and the viewer interface.

Providing Feedback

If you find an issue in a converted document, see the Providing Feedback guide for how to report issues and suggest corrections through the WordPress plugin.