feat: add PaddleOCR-VL parser engine #3189

Open
gavin913-lss wants to merge 5 commits into HKUDS:main from gavin913-lss:feat/paddleocr-vl-parser

@gavin913-lss

What does this PR do?

Adds PaddleOCR-VL as a new parser engine for LightRAG, providing an alternative to MinerU and Docling for document parsing.

Fixes #3114

Features

  • New parser engine: PaddleOCR-VL for document parsing
  • Multi-language support: Particularly good for Chinese documents
  • Simple API: Single endpoint for OCR with detection, recognition, and classification
  • Layout analysis: Returns bounding box coordinates for text blocks
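
To make the layout-analysis point concrete, here is a minimal sketch of how text blocks with bounding boxes could be put into reading order. The `TextBlock` shape, the four-corner box format, and the helper names are assumptions for illustration; this description does not show the actual response schema of the PaddleOCR-VL endpoint.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TextBlock:
    """Hypothetical text block: recognized text plus an axis-aligned box."""
    text: str
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1)


def quad_to_bbox(quad: list[tuple[float, float]]) -> tuple[float, float, float, float]:
    """Collapse a four-corner polygon into an axis-aligned bounding box."""
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    return (min(xs), min(ys), max(xs), max(ys))


def reading_order(blocks: list[TextBlock], line_tolerance: float = 10.0) -> list[TextBlock]:
    """Order blocks top-to-bottom, then left-to-right within a line.

    Blocks whose top edges lie within `line_tolerance` of the first block
    in the current line are treated as belonging to that line.
    """
    ordered = sorted(blocks, key=lambda b: (b.bbox[1], b.bbox[0]))
    lines: list[list[TextBlock]] = []
    for block in ordered:
        if lines and abs(block.bbox[1] - lines[-1][0].bbox[1]) <= line_tolerance:
            lines[-1].append(block)
        else:
            lines.append([block])
    return [b for line in lines for b in sorted(line, key=lambda b: b.bbox[0])]


blocks = [
    TextBlock("world", quad_to_bbox([(120, 8), (200, 8), (200, 30), (120, 30)])),
    TextBlock("Next", quad_to_bbox([(10, 50), (90, 50), (90, 70), (10, 70)])),
    TextBlock("Hello", quad_to_bbox([(10, 10), (100, 10), (100, 30), (10, 30)])),
]
print([b.text for b in reading_order(blocks)])
```

The tolerance-based grouping is a simple heuristic; multi-column layouts would need something more careful.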

Files Added

    • Module exports
    • PaddleOCRVLClient for API communication
    • PaddleOCRVLIRBuilder for IR conversion
    • Cache utilities
    • Manifest for parsed results

Configuration

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| PADDLEOCR_ENDPOINT | http://localhost:8000 | API endpoint |
| PADDLEOCR_LANG | ch | Language (ch, en, etc.) |
| PADDLEOCR_DET | true | Enable text detection |
| PADDLEOCR_REC | true | Enable text recognition |
| PADDLEOCR_CLS | true | Enable direction classification |
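
As a rough sketch of how these variables could be read, the snippet below loads them with the defaults listed above. The variable names and defaults come from the table; the `PaddleOCRSettings` class, the `load_settings` helper, and the set of accepted truthy strings are hypothetical and not necessarily what this PR implements.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: which strings count as "true" is a choice made for this sketch.
_TRUTHY = {"1", "true", "yes", "on"}


def _env_bool(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Read a boolean flag, falling back to `default` when unset."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUTHY


@dataclass(frozen=True)
class PaddleOCRSettings:
    """Hypothetical settings container mirroring the table above."""
    endpoint: str
    lang: str
    detect: bool
    recognize: bool
    classify: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> PaddleOCRSettings:
    """Build settings from environment variables, using the documented defaults."""
    env = os.environ if env is None else env
    return PaddleOCRSettings(
        endpoint=env.get("PADDLEOCR_ENDPOINT", "http://localhost:8000").rstrip("/"),
        lang=env.get("PADDLEOCR_LANG", "ch"),
        detect=_env_bool(env, "PADDLEOCR_DET", True),
        recognize=_env_bool(env, "PADDLEOCR_REC", True),
        classify=_env_bool(env, "PADDLEOCR_CLS", True),
    )


# Passing an explicit mapping keeps the example independent of the real environment.
settings = load_settings({"PADDLEOCR_LANG": "en", "PADDLEOCR_CLS": "false"})
print(settings)
```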

Usage

Dependencies

Required

  • httpx>=0.24.0
  • paddleocr>=2.7.0
  • paddlepaddle>=2.5.0

Testing

  1. Start PaddleOCR-VL server:

  2. Run LightRAG with PaddleOCR-VL:

Notes

  • PaddleOCR-VL is particularly good for Chinese documents
  • The API is simpler than MinerU/Docling but may be slower for large documents
  • Supports both detection and recognition in a single API call

@ysys143

ysys143 commented Jun 4, 2026

Before merging this PR, it may be worth discussing the abstraction layer first — see #3197 (RFC: pluggable OCR/VLM parser abstraction). The current approach of adding a new four-file subpackage per engine is the same pattern we've seen for MinerU and Docling, and the list of requested engines is growing (SmolDocling, qwen3-vl, DeepSeek-OCR, GLM-OCR, Mistral OCR, Upstage Document Parse, ...). A shared BaseExternalParser protocol would let PaddleOCR-VL land as a clean implementation rather than another copy-paste subpackage.
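
For illustration only, and not the interface that #3197 will define: a rough Python sketch of what a shared protocol could look like. Apart from the name `BaseExternalParser`, everything here (`ParsedDocument`, the `parse` signature, the registry helpers, `EchoParser`) is hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path
from typing import Protocol, runtime_checkable


@dataclass
class ParsedDocument:
    """Hypothetical engine-agnostic parse result."""
    text: str
    metadata: dict = field(default_factory=dict)


@runtime_checkable
class BaseExternalParser(Protocol):
    """Hypothetical protocol: any object with a matching `parse` method qualifies."""

    def parse(self, path: Path) -> ParsedDocument: ...


_REGISTRY: dict[str, BaseExternalParser] = {}


def register_parser(name: str, parser: BaseExternalParser) -> None:
    """Register an engine under a name, rejecting objects without `parse`."""
    if not isinstance(parser, BaseExternalParser):
        raise TypeError(f"{parser!r} does not implement BaseExternalParser")
    _REGISTRY[name] = parser


def get_parser(name: str) -> BaseExternalParser:
    """Look up a previously registered engine by name."""
    return _REGISTRY[name]


class EchoParser:
    """Stand-in engine used only to show the shape of an implementation."""

    def parse(self, path: Path) -> ParsedDocument:
        return ParsedDocument(text=f"parsed:{path.name}", metadata={"engine": "echo"})


register_parser("echo", EchoParser())
print(get_parser("echo").parse(Path("report.pdf")).text)
```

With something along these lines, each new engine would be a single class registered under a name rather than a new subpackage.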

@gavin913-lss
Author

Good point. I agree that a shared BaseExternalParser protocol makes more sense than copy-pasting the four-file structure per engine.

I'm happy to either:

  1. Refactor #3189 to implement against the BaseExternalParser protocol once #3197 is finalized
  2. Help implement the abstraction layer in #3197 first, then rebase #3189 on top

Which approach do you prefer?

@ysys143

ysys143 commented Jun 5, 2026

Option 1 makes more sense — let's finalize the protocol interface in #3197 first, then your PR can be the first concrete implementation against it.

Waiting for maintainer feedback on the RFC direction before drafting the interface spec. Once that's in, happy to collaborate on the design in #3197 so you have a clear target to implement against.

Development

Successfully merging this pull request may close these issues.

[Feature Request]: Could PaddleOCR-VL be integrated? In testing, PaddleOCR-VL performed somewhat better than MinerU

2 participants