Release pdfOCR 2.0.2
Release date: October 25, 2022
For this release of our OCR add-on for iText 7, we have upgraded the underlying tess4j library to version 4.6.0, which uses version 4.1.3 of the Tesseract OCR engine and version 1.82.0 of the Leptonica image processing and analysis library.
A note for users who encounter a MethodTooLargeException when running pdfOCR on JDK 19: there is currently an issue with the Leptonica library and JDK 19. See this issue for more information and a possible solution.
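If you would rather detect this situation at application startup than at OCR time, a minimal Java sketch like the following checks the feature release of the running JVM. The class name `Jdk19Check` and its warning text are hypothetical. The sketch uses only the standard `Runtime.version()` API (Java 10 and later). It does not work around the Leptonica issue itself; it only flags the JDK version named in the note above.

```java
/**
 * Hypothetical startup guard: warns when the running JVM is JDK 19, the
 * release named in the pdfOCR 2.0.2 note about MethodTooLargeException.
 * It only reports the version; it does not fix the underlying issue.
 */
final class Jdk19Check {

    // JDK feature release flagged in the release note.
    static final int FLAGGED_FEATURE_RELEASE = 19;

    /** Returns true if the given feature release is the one flagged in the note. */
    static boolean isFlagged(int featureRelease) {
        return featureRelease == FLAGGED_FEATURE_RELEASE;
    }

    /** Feature release of the running JVM, e.g. 17 or 19 (Runtime.version() needs Java 10+). */
    static int currentFeatureRelease() {
        return Runtime.version().feature();
    }

    public static void main(String[] args) {
        int feature = currentFeatureRelease();
        if (isFlagged(feature)) {
            // Warn early so the failure is not first seen deep inside an OCR run.
            System.err.println("Running on JDK " + feature
                    + ": pdfOCR may fail with MethodTooLargeException (see the linked issue).");
        } else {
            System.out.println("Running on JDK " + feature + ".");
        }
    }
}
```

The check is deliberately an exact match on 19 because the note names only that release. Widen it only if the linked issue says other versions are affected.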
Downloads:
| Product | GitHub | Maven | NuGet | Artifactory |
|---|---|---|---|---|
| iText pdfOCR – 2.0.2 (Java) | link | link | N/A | link |
| iText pdfOCR – 2.0.2 (.NET) | link | N/A | link | link |
Changelog:
Improvements
Updated tess4j:tess4j from 4.5.5 to 4.6.0 (Java), which brings in the following upgrades:
- Tesseract 4.1.3 (f38e7a7)
- Leptonica 1.82.0 (lept4j-1.16.1)
Installation Instructions
Examples (latest)
FAQ (latest)
- Which languages are supported in pdfOCR?
- What does TextPositioning in pdfOCR do?
- Could not find a glyph corresponding to Unicode character
- pdfOCR: If your scanned document has a mixture of sections with paragraphs and tables, what is a recommended strategy here?
- pdfOCR: Is handwriting recognition supported?