pdfOCR: PaddleOCR model support
pdfOCR 5.0.0 introduced support for pretrained PaddleOCR and EasyOCR models in ONNX format, in addition to the docTR models already supported.
The following code sample shows how to generate a searchable PDF by running OCR with a PaddleOCR model (converted to ONNX format) through pdfOCR’s ONNX-based OCR engine.
After loading the specified input image, the sample builds a detection predictor and a recognition predictor from the PaddleOCR ONNX model files (inference.onnx) and their accompanying configuration files (inference.yml), and then creates an output PDF.
Check the comments in the example for more details.
Compatible PaddleOCR/EasyOCR models already converted to ONNX format are available from our Hugging Face repository.
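Because each predictor is built from a model file (inference.onnx) together with its configuration file (inference.yml), it can help to confirm that both files are present before running OCR. The following is a minimal sketch that uses only the JDK and does not call the pdfOCR API. The class name ModelFilesCheck, the helper missingFiles, and the directory layout (one directory per model, with hypothetical default paths) are assumptions made for illustration.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of pdfOCR: checks that a model directory
// contains the two files the predictors are built from.
final class ModelFilesCheck {
    // File names as described above; one directory per model is an assumption.
    static final String MODEL_FILE = "inference.onnx";
    static final String CONFIG_FILE = "inference.yml";

    /** Returns the names of the expected files that are missing from modelDir. */
    static List<String> missingFiles(Path modelDir) {
        List<String> missing = new ArrayList<>();
        for (String name : new String[] {MODEL_FILE, CONFIG_FILE}) {
            if (!Files.isRegularFile(modelDir.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Hypothetical default locations; pass real directories as arguments.
        Path detectionDir = Path.of(args.length > 0 ? args[0] : "models/detection");
        Path recognitionDir = Path.of(args.length > 1 ? args[1] : "models/recognition");

        for (Path dir : List.of(detectionDir, recognitionDir)) {
            List<String> missing = missingFiles(dir);
            System.out.println(dir + ": " + (missing.isEmpty() ? "ok" : "missing " + missing));
        }
    }
}
```

Run it with the detection and recognition model directories as the two arguments. Any file reported as missing would need to be downloaded before the predictors can be built.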
Java
##GITHUB:https://github.com/itext/itext-publications-examples-java/blob/develop/src/main/java/com/itextpdf/samples/sandbox/pdfocr/onnx/PdfOcrOnnxPaddleOcrExample.java##
C#
##GITHUB:https://github.com/itext/itext-publications-samples-dotnet/blob/develop/itext/itext.samples/itext/samples/sandbox/pdfocr/onnx/PdfOcrOnnxPaddleOcrExample.cs##