Release pdfOCR 5.0.0

Release date: 2026-04-01

This release of the pdfOCR add-on for iText Core adds support for PaddleOCR and EasyOCR models, and brings major performance gains and general OCR improvements across the board. These changes are significant enough to warrant a major release, bumping the version number to 5.0.0.

PaddleOCR/EasyOCR Model Support

The ML-based OCR engine is extended to include support for pretrained ONNX PaddleOCR and EasyOCR models, adding to the docTR models already supported. In our ongoing OCR tests, these models perform extremely well over a wide range of use cases and have extensive language support.

We now maintain a HuggingFace repository where you can download many compatible models to get started quickly. You’re free to experiment with alternative models, though some will need to be converted to the ONNX format to work with pdfOCR’s ONNX engine. The PaddleOCR documentation has details on converting PaddleOCR models to ONNX format; EasyOCR, however, does not provide official documentation on converting models.

GPU Acceleration

Optional GPU acceleration is now available for pdfOCR’s ONNX engine. This frees the CPU to handle other tasks and can also result in major performance gains.

ONNX Runtime supports multiple execution providers for hardware acceleration, although not all are ready for production. At present, we have only tested pdfOCR using Nvidia CUDA-enabled GPUs, so you should refer to ONNX Runtime’s official docs on execution providers for other hardware.

General OCR Improvements

We've also significantly improved how pdfOCR positions recognized text boxes for rotated content. pdfOCR can now better match the original orientation and placement of text, including small-angle rotations (previously only 0°, 90°, 180°, and 270° were handled).
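To make the idea of arbitrary rotation concrete, here is a minimal geometric sketch in plain Java. It is not pdfOCR's implementation: the class `RotatedBoxSketch` and its `corners` method are hypothetical names used only for illustration. It computes the four corners of a text box rotated about its center by any angle, which is the kind of placement that axis-aligned (0°, 90°, 180°, 270°) boxes cannot express.

```java
/** Illustrative only: not part of the pdfOCR API. */
final class RotatedBoxSketch {

    /**
     * Returns the four corners {x, y} of a width x height box centered at
     * (cx, cy) and rotated counter-clockwise by angleDegrees.
     * Corner order before rotation: bottom-left, bottom-right, top-right, top-left.
     */
    static double[][] corners(double cx, double cy, double width, double height,
                              double angleDegrees) {
        double theta = Math.toRadians(angleDegrees);
        double cos = Math.cos(theta);
        double sin = Math.sin(theta);
        double hw = width / 2.0;
        double hh = height / 2.0;

        double[][] offsets = {{-hw, -hh}, {hw, -hh}, {hw, hh}, {-hw, hh}};
        double[][] result = new double[4][2];
        for (int i = 0; i < 4; i++) {
            double dx = offsets[i][0];
            double dy = offsets[i][1];
            // Standard 2D rotation of the offset, then translation to the center.
            result[i][0] = cx + dx * cos - dy * sin;
            result[i][1] = cy + dx * sin + dy * cos;
        }
        return result;
    }

    public static void main(String[] args) {
        // A line of text tilted by 3.5 degrees, as on a slightly skewed scan.
        for (double[] p : corners(100, 200, 80, 20, 3.5)) {
            System.out.printf("(%.2f, %.2f)%n", p[0], p[1]);
        }
    }
}
```

With an angle of 0 the result is the familiar axis-aligned rectangle; any other angle yields a tilted quadrilateral that follows the text baseline.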

Additionally, support for retrieving OCR text bounding rectangles in image pixel coordinates rather than PDF coordinate space has been added, removing the need for manual conversion when working at the image level.
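For context, the manual conversion that this feature makes unnecessary typically looks something like the sketch below. It is only an illustration under stated assumptions: the image is assumed to cover the whole page at a uniform resolution, image pixels use a top-left origin with y pointing down, and PDF user space uses a bottom-left origin with 72 points per inch. The class `PixelToPdfSketch` and the method `toPdfPoints` are hypothetical names, not pdfOCR API.

```java
/** Illustrative only: not part of the pdfOCR API. */
final class PixelToPdfSketch {

    /**
     * Converts a rectangle in image pixels (origin top-left, y down) to PDF
     * points (origin bottom-left, y up). Returns {x, y, width, height}, where
     * (x, y) is the lower-left corner in PDF space.
     */
    static double[] toPdfPoints(double px, double py, double pw, double ph,
                                double imageHeightPx, double dpi) {
        double scale = 72.0 / dpi;                       // points per pixel
        double x = px * scale;
        double y = (imageHeightPx - (py + ph)) * scale;  // flip the y axis
        return new double[] {x, y, pw * scale, ph * scale};
    }

    public static void main(String[] args) {
        // A 300 x 60 pixel word box on a 300 DPI scan that is 3300 pixels tall.
        double[] r = toPdfPoints(150, 300, 300, 60, 3300, 300);
        System.out.printf("x=%.2f y=%.2f w=%.2f h=%.2f%n", r[0], r[1], r[2], r[3]);
    }
}
```

Working directly in pixel coordinates avoids this round trip when the rectangles are only needed at the image level, for example to crop or highlight regions of the source image.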

Huge .NET Performance Gains

For .NET, the performance of the ONNX engine is massively improved thanks to some clever optimizations. Most significantly, we created a .NET wrapper for Java’s FloatBuffer class, which avoids copying the huge float data arrays used by models. Additional improvements to general image handling and processing brought further gains, resulting in blazing-fast performance on both Java and .NET.
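The benefit of a buffer wrapper over array copies can be illustrated on the Java side with the standard `java.nio.FloatBuffer` class. This is a minimal sketch, not pdfOCR code, and it does not show the .NET wrapper itself; the class `FloatBufferViewSketch` and the method `channelView` are hypothetical. It shows that a wrapped and sliced buffer is a view over the original array, so reading one channel of a large tensor requires no copy.

```java
import java.nio.FloatBuffer;

/** Illustrative only: not part of the pdfOCR API. */
final class FloatBufferViewSketch {

    /**
     * Returns a view over one channel of a planar float array without copying.
     * The returned buffer shares storage with the data array.
     */
    static FloatBuffer channelView(float[] data, int channel, int channelSize) {
        // wrap(...) positions the buffer at the channel start; slice() makes
        // that position index 0 of a new buffer backed by the same array.
        return FloatBuffer.wrap(data, channel * channelSize, channelSize).slice();
    }

    public static void main(String[] args) {
        int channelSize = 4 * 4;                  // a tiny 4x4 image
        float[] data = new float[3 * channelSize]; // 3 channels, planar layout

        FloatBuffer green = channelView(data, 1, channelSize);

        // Writes to the array are visible through the view, and vice versa,
        // because no data was duplicated.
        data[channelSize] = 0.5f;
        green.put(1, 0.25f);

        System.out.println(green.get(0));          // first value of channel 1
        System.out.println(data[channelSize + 1]); // value written via the view
        System.out.println(green.capacity());      // number of floats in the view
    }
}
```

For model inputs that hold millions of floats, skipping even a single full-array copy per inference call can make a measurable difference.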

Breaking Changes

Since this is a major version release, you can expect some breaking changes. The most important one is that the module for pdfOCR’s ONNX engine has been split up and renamed.

Since we now support more than just docTR ONNX models, the pdf-ocr-onnxtr package has been renamed and split into pdf-ocr-onnx-abstract and pdfocr-onnx-cpu. This change also accommodates GPU acceleration using the onnxruntime_gpu package.

See the breaking changes for details on the differences from previous releases of pdfOCR.

Downloads

	GitHub	Maven	NuGet	Artifactory
iText pdfOCR – 5.0.0 (Java)	link	link (API) link (Tesseract) link (ONNX-abstract) link (ONNX-cpu)	N/A	link (API) link (Tesseract) link (ONNX-abstract) link (ONNX-cpu)
iText pdfOCR – 5.0.0 (.NET)	link	N/A	link (API) link (Tesseract) link (ONNX-abstract) link (ONNX-cpu)	link (API) link (Tesseract) link (ONNX-abstract) link (ONNX-cpu)

Changelog

New features

DEVSIX-9706 – Support EasyOCR and PaddleOCR models for pdfOCR
DEVSIX-9740 – Support PdfOCR-Onnx execution on GPU
DEVSIX-9792 – Allow multiple files for IOcrEngine#doImageOcr input
DEVSIX-9793 – Add the ability to get rectangles in image pixels for TextInfo

Improvements

DEVSIX-9739 – PdfOCR: Improve text boxes taking into account arbitrary rotation
DEVSIX-9458 – Improve BY_LINES TextPositioning mode to handle whitespaces
DEVSIX-9775 – Create .NET wrapper for FloatBuffer to make some classes autoportable
DEVSIX-9723 – PdfOCR: port BufferedImageUtil and image type related logic to .NET
DEVSIX-9724 – PdfOCR: port YamlUtil to .NET
