pdfOCR module - Onnx | iText Knowledge Base

With the release of iText Suite 9.6.0 (pdfOCR 5.0.0), we also released new pdfOCR modules, called pdfocr-onnx-abstract and pdfocr-onnx-cpu, which enable the use of Open Neural Network Exchange (ONNX) compatible models with iText.

The new modules add support for more ONNX models and optional GPU acceleration, along with other improvements. They therefore replace the now-deprecated pdfOCR module - onnxTR from earlier releases.

Getting started is straightforward, especially if you are already familiar with the pdfOCR API (Java/.NET).

If you haven’t installed pdfOCR yet, installation instructions are available for both Java and .NET.

Java

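import java.io.File;
import java.util.Collections;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.pdfocr.OcrPdfCreator;
// Also import OnnxOcrEngine and the predictor types from the pdfOCR ONNX module.

// DETECTION and RECOGNITION refer to the downloaded ONNX model files.
IDetectionPredictor detectionPredictor = OnnxDetectionPredictor.paddleOcr(DETECTION);
IRecognitionPredictor recognitionPredictor = OnnxRecognitionPredictor.paddleOcr(RECOGNITION);

// try-with-resources closes the engine once OCR is finished.
try (OnnxOcrEngine ocrEngine = new OnnxOcrEngine(detectionPredictor, recognitionPredictor)) {
    OcrPdfCreator ocrPdfCreator = new OcrPdfCreator(ocrEngine);
    try (PdfWriter writer = new PdfWriter(PATH_TO_OUTPUT_PDF)) {
        String imagePath = "src/images/image.png";
        // Run OCR on the image and write the resulting PDF to the output path.
        PdfDocument pdf = ocrPdfCreator.createPdf(Collections.singletonList(new File(imagePath)), writer);
        pdf.close();
    }
}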

Note that the OnnxOcrEngine (Java/.NET) constructor takes two arguments:

Detection - the predictor that identifies where text appears in the document.
Recognition - the predictor that identifies what the text is at the location detected by the detection predictor.

Because the engine works with ONNX models, it can support multiple OCR engines (currently docTR, PaddleOCR, and EasyOCR). You need to download the ONNX model(s) you want to use and supply them through the predictors passed to OnnxOcrEngine.

You can find a wide range of compatible PaddleOCR and EasyOCR models in the following Hugging Face repository:

https://huggingface.co/itextresearch

For docTR, we currently recommend the following models:

Felix92/onnxtr-fast-tiny for detection
Felix92/doctr-dummy-torch-crnn-vgg16-bn for recognition

Download the model .onnx files and pass them to OnnxDetectionPredictor.fast() and OnnxRecognitionPredictor.crnnVgg16(), respectively (Java/.NET).
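Because the models are downloaded separately, a wrong or missing file path is an easy mistake to make. The following is a minimal sketch of a pre-flight check in plain Java, assuming the models are stored as local files. The class `OnnxModelFileCheck` and its method `requireOnnxModel` are hypothetical helpers for illustration and are not part of the iText API.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

/** Hypothetical helper, not part of iText: validates a downloaded model file. */
final class OnnxModelFileCheck {
    private OnnxModelFileCheck() {
    }

    /**
     * Returns the absolute path of the model file, or throws if the path does
     * not point to a readable regular file with an .onnx extension.
     */
    static String requireOnnxModel(String modelPath) {
        Path path = Paths.get(modelPath);
        Path fileName = path.getFileName();
        if (fileName == null
                || !fileName.toString().toLowerCase(Locale.ROOT).endsWith(".onnx")) {
            throw new IllegalArgumentException("Not an .onnx file: " + modelPath);
        }
        if (!Files.isRegularFile(path) || !Files.isReadable(path)) {
            throw new IllegalArgumentException(
                    "Model file not found or unreadable: " + modelPath);
        }
        return path.toAbsolutePath().toString();
    }

    public static void main(String[] args) {
        // Pass the detection and recognition model paths as arguments.
        for (String arg : args) {
            System.out.println("OK: " + requireOnnxModel(arg));
        }
    }
}
```

The validated paths can then be handed to the predictor factory methods mentioned above, so a bad path is reported before any model loading starts.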

More examples can be found on our GitHub for Java and .NET:

`PdfOcrOnnxExample`	Performs OCR using the provided `OnnxOcrEngine` for the given list of input images and saves output to a PDF file using the provided path.
`PdfOcrOnnxMultilingualExample`	Performs OCR using the onnxtr-parseq-multilingual-v1.onnx recognition model for the given list of input images containing different Latin-script languages. This example also demonstrates how to make the recognition result visible by using `OcrPdfCreatorProperties` to set the color of the recognized text.
`PdfOcrOnnxPdfAsInputExample`	Performs OCR of all images in an input PDF file and generates a searchable PDF using the provided `OnnxOcrEngine`.
`PdfOcrOnnxTextPositioningExample`	Sets `TextPositioning` in `OnnxEngineProperties` to control whether text is collected from the OCR engine output by lines or by words, then performs OCR on the given images using the provided `OnnxOcrEngine`. Saves output to a PDF file.
`PdfOcrOnnxTxtFileExample`	Performs OCR using the provided `OnnxOcrEngine` for the given list of input images and saves output to a text file using the provided path.
`CustomOnnxRuntimeSessionOptionsExample`	Shows how to provide custom `ai.onnxruntime.OrtSession.SessionOptions` used to construct the `OrtSession`, which wraps an ONNX model and allows inference calls. This lets you specify whether to run OCR on GPU or CPU, the execution mode, the optimization level, and other options. To run models on GPU, add the `pdfocr-onnx-abstract` and `onnxruntime_gpu` dependencies. `com.itextpdf.pdfocr.onnx.DefaultOrtSessionOptionsCreator` supports GPU mode by default, so no additional changes are required unless you want to set custom options.
`PdfOcrOnnxPaddleOcrExample`	Shows how to perform OCR using `OnnxOcrEngine` and `PaddleOCR` ML-models for the given list of input images, and save output to a PDF file using the provided path. PaddleOCR models converted to ONNX format can be found at https://huggingface.co/itextresearch.
`PdfOcrOnnxEasyOcrExample`	Shows how to perform OCR using `OnnxOcrEngine` and `EasyOCR` ML-models for the given list of input images, and save output to a PDF file using the provided path. EasyOCR models converted to ONNX format can be found at https://huggingface.co/itextresearch.
`PdfOcrOnnxDisableArbitraryRotationExample`	Shows how to disable arbitrary rotation in the OCR result for the given list of input images. In this example, only text rotations of `0`, `90`, `180` and `270` degrees are used.
`EasyOcrDisableTextBoxMergerExample`	Shows how to perform OCR using `OnnxOcrEngine` and `EasyOCR` ML-models for the given list of input images, while disabling the text box merging algorithm for EasyOCR’s detection post-processor.
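
The right-angle restriction described for `PdfOcrOnnxDisableArbitraryRotationExample` can be pictured as snapping an angle to the nearest multiple of 90 degrees. The sketch below is a stand-alone illustration in plain Java and is not iText code: the class `RightAngleSnapSketch` and its method are hypothetical, and the library may handle rotation differently internally.

```java
/** Hypothetical helper, not part of iText: snaps an angle to 0, 90, 180 or 270 degrees. */
final class RightAngleSnapSketch {
    private RightAngleSnapSketch() {
    }

    static int snapToRightAngle(double degrees) {
        // Normalize into [0, 360) so negative and large angles are handled.
        double normalized = ((degrees % 360.0) + 360.0) % 360.0;
        // Round to the nearest quarter turn; four quarter turns wrap back to 0.
        int quarterTurns = (int) Math.round(normalized / 90.0) % 4;
        return quarterTurns * 90;
    }

    public static void main(String[] args) {
        System.out.println(snapToRightAngle(3.5));   // prints 0
        System.out.println(snapToRightAngle(92.0));  // prints 90
        System.out.println(snapToRightAngle(-10.0)); // prints 0
    }
}
```

In this sketch, slightly skewed text (for example 3.5 degrees) is treated as upright, which is the practical effect of allowing only the four right-angle rotations.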

For the complete tests that are part of our functional test suite, see the Java and .NET tests in our GitHub repository.