
Release pdfOCR 4.1.0

Release date: Sep 2nd, 2025

pdfOCR is our add-on for iText Core to perform OCR on documents and images.

This release of pdfOCR brings a major change: a new built-in OCR engine. It adds the pdfocr-onnxtr module, which is based on the OnnxTR library and has specific requirements for model predictors and resource management. The new engine significantly improves recognition accuracy for English text and other Latin-based languages.

The Open Neural Network Exchange (ONNX) is an open standard format for machine learning models, enabling interoperability across various frameworks and tools. OnnxTR is a Python OCR library that wraps the popular OCR tool docTR and adds support for ONNX models.

Because it relies on optimized ONNX models rather than heavy frameworks, OnnxTR makes OCR processing faster and more accessible. It is modular, has lightweight dependencies, and supports multiple platforms, so OCR can be integrated into applications with minimal resource consumption and high processing speed. Using the existing pdfOCR API, we have simply added it as another OCR engine alongside the existing pdfocr-tesseract4 module.
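To make the point about model predictors and resource management more concrete, here is a minimal, self-contained Java sketch of the two-stage design that docTR-style OCR uses. A detection predictor finds text regions, and a recognition predictor reads each region. The engine owns both predictors and releases them together. All names here (`Box`, `DetectionPredictor`, `RecognitionPredictor`, `Engine`, `TwoStageOcrSketch`) are hypothetical and are not the pdfocr-onnxtr API. The fake predictors stand in for real ONNX models.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical two-stage OCR pipeline (detection, then recognition). Not the pdfOCR API. */
public class TwoStageOcrSketch {

    /** A detected text region in pixel coordinates (hypothetical type). */
    record Box(int x, int y, int width, int height) {}

    /** Stage 1 (hypothetical): locate text regions in a page image. */
    interface DetectionPredictor extends AutoCloseable {
        List<Box> detect(int[][] image);
        @Override void close(); // narrowed: throws no checked exception
    }

    /** Stage 2 (hypothetical): read the text inside a single region. */
    interface RecognitionPredictor extends AutoCloseable {
        String recognize(int[][] image, Box region);
        @Override void close();
    }

    /** Owns both predictors, so a single try-with-resources releases all model resources. */
    static final class Engine implements AutoCloseable {
        private final DetectionPredictor detector;
        private final RecognitionPredictor recognizer;

        Engine(DetectionPredictor detector, RecognitionPredictor recognizer) {
            this.detector = detector;
            this.recognizer = recognizer;
        }

        /** Runs detection once, then recognition on every detected region. */
        List<String> recognizeAll(int[][] image) {
            List<String> results = new ArrayList<>();
            for (Box box : detector.detect(image)) {
                results.add(recognizer.recognize(image, box));
            }
            return results;
        }

        @Override
        public void close() {
            try {
                detector.close();
            } finally {
                recognizer.close(); // still runs if closing the detector fails
            }
        }
    }

    /** Counts close() calls so the demo can show that both predictors were released. */
    public static int closedCount = 0;

    public static List<String> demo() {
        closedCount = 0;
        DetectionPredictor fakeDetector = new DetectionPredictor() {
            public List<Box> detect(int[][] image) {
                // Pretend two words were found side by side.
                return List.of(new Box(0, 0, 40, 10), new Box(50, 0, 40, 10));
            }
            public void close() { closedCount++; }
        };
        RecognitionPredictor fakeRecognizer = new RecognitionPredictor() {
            public String recognize(int[][] image, Box region) {
                // Derive a label from the box position instead of real model output.
                return "word@" + region.x();
            }
            public void close() { closedCount++; }
        };
        try (Engine engine = new Engine(fakeDetector, fakeRecognizer)) {
            return engine.recognizeAll(new int[10][100]);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo() + " closed=" + closedCount);
    }
}
```

Making the engine `AutoCloseable` lets callers use try-with-resources, which is the usual Java way to guarantee that native resources such as model sessions are freed. For the actual pdfocr-onnxtr classes and their lifecycle, consult the module's own documentation.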

In addition, pdfOCR now accepts PDF files directly as input. This can be a big benefit for OCR workflows, because it removes the need to use iText Core to extract images from scanned PDF documents before running OCR.

Downloads


iText pdfOCR – 4.1.0 (Java)

  • GitHub: link

  • Maven: link (API), link (Tesseract), link (ONNX)

  • NuGet: N/A

  • Artifactory: link (API), link (Tesseract), link (ONNX)

iText pdfOCR – 4.1.0 (.NET)

  • GitHub: link

  • Maven: N/A

  • NuGet: link (API), link (Tesseract), link (ONNX)

  • Artifactory: link (API), link (Tesseract), link (ONNX)

Release Related Examples

Changelog

New features

  • DEVSIX-5151: Support PDF as input

  • DEVSIX-9154: Add pdfocr-onnxtr module

  • DEVSIX-9237: Implement LevenshteinDistance for pdfocr-onnxtr

  • DEVSIX-9254: Support TextPositioning.BY_LINES in pdfocr-onnxtr

  • DEVSIX-9295: Support multilingual model for pdfocr-onnxtr
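DEVSIX-9237 refers to a Levenshtein distance implementation. The changelog does not say how pdfocr-onnxtr uses it. In general, the Levenshtein distance counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another, which makes it a common way to compare OCR output with expected text. The sketch below is the textbook dynamic-programming algorithm using two rolling rows. The class name `LevenshteinSketch` is hypothetical and is unrelated to iText's own implementation.

```java
/** Textbook dynamic-programming Levenshtein distance (hypothetical helper, not iText's class). */
public final class LevenshteinSketch {

    /** Minimum number of single-character insertions, deletions and substitutions turning a into b. */
    public static int distance(String a, String b) {
        // prev[j] = distance between the first (i - 1) chars of a and the first j chars of b.
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(prev[j] + 1,      // delete a[i - 1]
                                 curr[j - 1] + 1), // insert b[j - 1]
                        prev[j - 1] + cost);       // substitute, or match for free
            }
            // Reuse the arrays: the row just computed becomes the previous row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // 3
        System.out.println(distance("OCR", "0CR"));        // 1 (letter O misread as digit 0)
    }
}
```

Keeping only two rows reduces memory from O(|a|·|b|) to O(|b|) while still running in O(|a|·|b|) time, which is usually enough for comparing recognized lines or words.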

Improvements

  • DEVSIX-9193: Improve image reading API in pdfocr-onnxtr

  • DEVSIX-9235: Support .NET Standard 2.0 for pdfocr-api and pdfocr-onnxtr

Bug fixes

  • DEVSIX-9233: Fix bug in correctOrientations (.NET only)

