LiteParse
Spatial Parsing in Action

See why layout-preserving extraction beats naive text dumps. Pre-parsed examples show the difference LiteParse makes for AI pipelines.

📄 Document AI · 🔲 Spatial Parsing · 🔍 Built-in OCR · By LlamaIndex

Document Regions

LiteParse returns bounding box coordinates for every text region, so each region can be traced back to its extracted content.

LiteParse handles 50+ file formats. Here are the ones that matter most for AI pipelines.

📕 PDF (.pdf)
Tables, headers, multi-column layouts, embedded fonts. The format LiteParse was built for.
Preserves column alignment

📝 Word (.docx)
Styled text, embedded tables, headers/footers, tracked changes. Extracts structure faithfully.
Keeps table cell positions

📊 Excel (.xlsx)
Cell positions, formulas, sheet names, merged cells. Spatial grid maps directly to spreadsheet layout.
Cell-level coordinates

🖼️ Images (.png, .jpg, .tiff)
Built-in Tesseract.js OCR. Scanned docs, photos of whiteboards, receipts, handwritten notes.
OCR with spatial positions

📽️ PowerPoint (.pptx)
Slide text, speaker notes, text box positions. Each slide parsed as a separate spatial page.
Slide-level bounding boxes

📧 Email (.eml, .msg)
Headers, body text, attachment metadata. Thread structure preserved in extraction order.
Header/body separation

What LiteParse Does Differently

Traditional parsers detect document structure and try to convert it to Markdown. That conversion breaks constantly on multi-column layouts, nested tables, and merged cells. LiteParse skips the structural guessing: it projects text onto a spatial grid, preserves the whitespace, and trusts that LLMs can read a table that looks like a table.

Spatial Grid

Text is placed on a character grid matching its position on the page. Columns stay columnar. Tables stay tabular. Nothing gets flattened into a wall of text.
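
To make the idea concrete, here is a minimal TypeScript sketch of projecting positioned text onto a character grid. It is not LiteParse's implementation or API: the `GridTextItem` shape, the `projectToGrid` function, and the cell sizes (`charWidth`, `lineHeight`) are all assumptions chosen for illustration.

```typescript
// Hypothetical input shape (not a LiteParse type): one text fragment
// with the position of its top-left corner, in page points.
interface GridTextItem {
  text: string;
  x: number; // distance from the left edge of the page
  y: number; // distance from the top edge of the page
}

// Place each fragment on a character grid. charWidth and lineHeight are
// assumed cell sizes in points; real documents need them estimated per page.
function projectToGrid(
  items: GridTextItem[],
  charWidth: number = 6,
  lineHeight: number = 12
): string {
  if (items.length === 0) return "";

  const rows = new Map<number, string[]>();
  let minRow = Infinity;
  let maxRow = -Infinity;

  for (const item of items) {
    const row = Math.max(0, Math.round(item.y / lineHeight));
    const col = Math.max(0, Math.round(item.x / charWidth));
    const cells = rows.get(row) ?? [];

    // Pad with spaces so the fragment starts at its own column.
    while (cells.length < col) cells.push(" ");
    // Write the fragment character by character (later items overwrite).
    for (let i = 0; i < item.text.length; i++) {
      cells[col + i] = item.text.charAt(i);
    }

    rows.set(row, cells);
    minRow = Math.min(minRow, row);
    maxRow = Math.max(maxRow, row);
  }

  // Emit every row between the first and last occupied one, so vertical
  // gaps survive as blank lines.
  const lines: string[] = [];
  for (let r = minRow; r <= maxRow; r++) {
    lines.push((rows.get(r) ?? []).join(""));
  }
  return lines.join("\n");
}

// A two-column table: the second column starts 60 points from the left.
const gridSample: GridTextItem[] = [
  { text: "Item", x: 0, y: 0 },
  { text: "Qty", x: 60, y: 0 },
  { text: "Widget", x: 0, y: 12 },
  { text: "3", x: 60, y: 12 },
];
console.log(projectToGrid(gridSample));
// Item      Qty
// Widget    3
```

Both values in the second column land at character column 10, so the table still reads as a table in plain text.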

Bounding Boxes

Every line comes back with precise coordinates: where it sat on the page and how wide it was. Useful for region-specific extraction and downstream processing.
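
Region-specific extraction can be sketched as a simple containment filter. The `BoxedLine` and `PageRegion` shapes below are hypothetical, not LiteParse's output format: they assume each line carries `x`, `y`, `width`, and `height` in page points with the origin at the top-left corner, and `height` in particular goes beyond what the description above guarantees.

```typescript
// Hypothetical shape for one extracted line (not a LiteParse type).
interface BoxedLine {
  text: string;
  x: number;      // left edge
  y: number;      // top edge, measured from the top of the page
  width: number;
  height: number; // assumed to be available alongside the width
}

// A rectangular area of interest on the page, in the same units.
interface PageRegion {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Keep only the lines whose bounding box lies fully inside the region.
function linesInRegion(lines: BoxedLine[], region: PageRegion): BoxedLine[] {
  const right = region.x + region.width;
  const bottom = region.y + region.height;
  return lines.filter(
    (line) =>
      line.x >= region.x &&
      line.y >= region.y &&
      line.x + line.width <= right &&
      line.y + line.height <= bottom
  );
}

// Illustrative data: a US Letter page (612 x 792 points) with a header
// line near the top and a total near the bottom.
const boxedSample: BoxedLine[] = [
  { text: "ACME Corp Invoice", x: 72, y: 40, width: 200, height: 14 },
  { text: "Total: 120.00", x: 400, y: 700, width: 120, height: 12 },
];

// Treat the top 100 points of the page as the header band.
const headerBand: PageRegion = { x: 0, y: 0, width: 612, height: 100 };
console.log(linesInRegion(boxedSample, headerBand).map((l) => l.text));
// [ 'ACME Corp Invoice' ]
```

Full containment is the strictest choice. A pipeline that needs lines straddling the region boundary would test for overlap instead.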

Zero Cloud Dependency

Runs entirely on your machine via npm. No API keys, no cloud calls, no data leaving your network. Parse in milliseconds, reason immediately.
