The release of iText Suite 9.6 includes a new major version of the pdfOCR add-on, going from version 4.1.2 to 5.0.0.
The primary reasons for this change are support for additional ONNX models and GPU acceleration using OnnxRuntime. However, we’ve also taken this opportunity to refine the API in other areas by adding and renaming artifacts/packages, as well as various classes and methods.
For more details on the new and improved features in pdfOCR 5.0.0, see the release notes.
General Breaking Changes
New and renamed artifacts/packages to accommodate extra model support and optional GPU acceleration:
Renamed Artifacts/Packages
New Artifacts/Packages
Detailed Breaking Changes
| Breaking change | Upgrade path |
|---|---|
Class com.itextpdf.pdfocr.onnx.actions.data.PdfOcrOnnxTrProductData renamed to PdfOcrOnnxProductData | |
Class com.itextpdf.pdfocr.onnx.actions.events.PdfOcrOnnxTrProductEvent renamed to PdfOcrOnnxProductEvent. Constant PROCESS_IMAGE_ONNXTR renamed to PROCESS_IMAGE_ONNX. Static method createProcessImageOnnxTrEvent renamed to createProcessImageOnnxEvent. | |
Class com.itextpdf.pdfocr.onnx.exceptions.PdfOcrOnnxTrExceptionMessageConstant renamed to PdfOcrOnnxExceptionMessageConstant. Constant ONLY_SUPPORT_RGB_IMAGES removed. | |
Class com.itextpdf.pdfocr.onnx.OnnxTrEngineProperties renamed to OnnxEngineProperties. Method setTextPositioning removed. Method getTextPositioningMode removed. Method getTextPositioning now returns com.itextpdf.pdfocr.onnx.text.TextPositioning instead of com.itextpdf.pdfocr.onnx.TextPositioning. | Use setTextPositioning(TextPositioning) instead. Use getTextPositioning instead. |
Class com.itextpdf.pdfocr.onnx.OnnxTrOcrEngine renamed to OnnxOcrEngine | |
Class com.itextpdf.pdfocr.onnx.AbstractOnnxPredictor changed: Constructor AbstractOnnxPredictor(java.lang.String, com.itextpdf.pdfocr.onnx.OnnxInputProperties, long[]) removed | Use AbstractOnnxPredictor(com.itextpdf.pdfocr.onnx.AbstractOnnxPredictorProperties, long[]) instead. |
Enum com.itextpdf.pdfocr.onnx.TextPositioning removed. | Use com.itextpdf.pdfocr.onnx.text.TextPositioning instead. |
Class com.itextpdf.pdfocr.onnx.OnnxInputProperties changed: Constant EXPECTED_CHANNEL_COUNT removed. Constructor OnnxInputProperties(float[], float[], long[], boolean) removed. | Use OnnxInputProperties(com.itextpdf.pdfocr.onnx.ImageResizeOptions, float[], float[], in) instead. |
Class com.itextpdf.pdfocr.tesseract4.exceptions.PdfOcrTesseract4ExceptionMessageConstant changed: Constant CANNOT_WRITE_TO_FILE removed. | |
Class com.itextpdf.pdfocr.tesseract4.logs.Tesseract4LogMessageConstant changed: Constants CANNOT_RETRIEVE_PAGES_FROM_IMAGE and START_OCR_FOR_IMAGES removed. | |
Class com.itextpdf.pdfocr.onnx.FloatBufferMdArray changed: Constructor FloatBufferMdArray(FloatBuffer data, long[] shape) now takes FloatBufferWrapper instead of FloatBuffer. Method getData() now returns FloatBufferWrapper instead of FloatBuffer. | |
Class com.itextpdf.pdfocr.TextInfo changed: Constructor TextInfo(String, Rectangle, TextOrientation) replaced by TextInfo(String, Point[]), which takes Point[] instead of Rectangle and TextOrientation. Point[] is an array of 4 points describing the text bounding box, expressed in points and ordered relative to the text, starting from the lower-left corner (0 - lower-left, 1 - upper-left, 2 - upper-right, 3 - lower-right). Return type of setText(String) changed from void to TextInfo. Methods Rectangle getBboxRect(), void setBboxRect(Rectangle), TextOrientation getOrientation() and void setOrientation(TextOrientation) removed; use Point[] getTextPoints() and TextInfo setTextPoints(Point[]) instead. | To create a TextInfo, instead of
CODE
final TextInfo textInfo = new TextInfo(text, bboxRect, orientation);
use
CODE
Point[] textBox = new Point[]{
        new Point(bboxRect.getLeft(), bboxRect.getBottom()),
        new Point(bboxRect.getLeft(), bboxRect.getTop()),
        new Point(bboxRect.getRight(), bboxRect.getTop()),
        new Point(bboxRect.getRight(), bboxRect.getBottom())
};
Point[] rotatedTextBox;
switch (orientation) {
    case HORIZONTAL_ROTATED_90:
        rotatedTextBox = new Point[]{textBox[3], textBox[0], textBox[1], textBox[2]};
        break;
    case HORIZONTAL_ROTATED_180:
        rotatedTextBox = new Point[]{textBox[2], textBox[3], textBox[0], textBox[1]};
        break;
    case HORIZONTAL_ROTATED_270:
        rotatedTextBox = new Point[]{textBox[1], textBox[2], textBox[3], textBox[0]};
        break;
    case HORIZONTAL:
    default:
        rotatedTextBox = textBox;
        break;
}
final TextInfo textInfo = new TextInfo().setText(text).setTextPoints(rotatedTextBox);
|
Behavioral breaking change: support for arbitrary rotation angles for text chunks has been added. The text rotation angle now depends on the text detection result (also taking into account the orientation prediction result). | To restore the old behavior, which supported only 0, 90, 180 and 270 degree rotations, apply PdfOcrTextBuilder.correctRotationAngle to the doImageOcr output. One way to do that is to override OnnxOcrEngine:
CODE
/**
 * Implementation of the {@link OnnxOcrEngine} supporting only 0, 90, 180 and 270 degrees text rotation.
 */
public static class RotationAgnosticOnnxOcrEngine extends OnnxOcrEngine {
    public RotationAgnosticOnnxOcrEngine(IDetectionPredictor detectionPredictor,
            IOrientationPredictor orientationPredictor,
            IRecognitionPredictor recognitionPredictor,
            OnnxEngineProperties properties) {
        super(detectionPredictor, orientationPredictor, recognitionPredictor, properties);
    }

    @Override
    public Map<Integer, List<TextInfo>> doImageOcr(File input, OcrProcessContext ocrProcessContext) {
        return PdfOcrTextBuilder.correctRotationAngle(super.doImageOcr(input, ocrProcessContext));
    }
}
|
Interface com.itextpdf.pdfocr.IOcrEngine has been changed:
doImageOcr methods taking a list of images as an argument have been added to the interface:
Map<Integer, List<TextInfo>> doImageOcr(List<File> inputs)
Map<Integer, List<TextInfo>> doImageOcr(List<File> inputs, OcrProcessContext ocrProcessContext) | Override the following methods in classes that implement com.itextpdf.pdfocr.IOcrEngine: Map<Integer, List<TextInfo>> doImageOcr(List<File> inputs)
Map<Integer, List<TextInfo>> doImageOcr(List<File> inputs, OcrProcessContext ocrProcessContext)
|
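For custom IOcrEngine implementations that already have a working single-image doImageOcr, one possible way to provide the new list-based methods is to call the existing method per image and merge the results. The sketch below illustrates that idea with plain Java types only. SingleImageOcr and MultiImageOcrSketch are hypothetical stand-ins and not iText classes, and the page numbering (1-based keys per image, renumbered consecutively across images) is an assumption rather than documented pdfOCR behavior, so check it against the actual interface contract before relying on it.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class MultiImageOcrSketch {

    /** Hypothetical stand-in for an engine's existing single-image call; not an iText type. */
    interface SingleImageOcr<T> {
        Map<Integer, List<T>> doImageOcr(File input);
    }

    /**
     * Runs the single-image call for every input and renumbers the resulting pages
     * consecutively. ASSUMPTION: page keys are 1-based per image, and every image
     * occupies at least one page in the merged result, even if no text was found.
     */
    static <T> Map<Integer, List<T>> doImageOcr(List<File> inputs, SingleImageOcr<T> engine) {
        Map<Integer, List<T>> merged = new TreeMap<>();
        int pageOffset = 0;
        for (File input : inputs) {
            Map<Integer, List<T>> perImage = engine.doImageOcr(input);
            int pagesInImage = 1;
            for (Map.Entry<Integer, List<T>> page : perImage.entrySet()) {
                // Shift the per-image page number by the pages already consumed.
                merged.put(pageOffset + page.getKey(), new ArrayList<>(page.getValue()));
                pagesInImage = Math.max(pagesInImage, page.getKey());
            }
            pageOffset += pagesInImage;
        }
        return merged;
    }

    public static void main(String[] args) {
        // Fake engine: files starting with "two-pages" yield two pages, all others yield one.
        SingleImageOcr<String> fake = file -> {
            Map<Integer, List<String>> pages = new TreeMap<>();
            pages.put(1, Arrays.asList(file.getName() + "#1"));
            if (file.getName().startsWith("two-pages")) {
                pages.put(2, Arrays.asList(file.getName() + "#2"));
            }
            return pages;
        };
        List<File> inputs = Arrays.asList(new File("two-pages.tiff"), new File("single.png"));
        System.out.println(doImageOcr(inputs, fake));
    }
}
```

In a real engine, the type parameter would be TextInfo and the merge would sit inside the class implementing IOcrEngine. With the fake engine above, the merged map ends up with page keys 1, 2 and 3.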