By default, pdfOCR merges the recognized text into the image that was just processed, but you may want to keep the two separate. Everything you need to do this is in the OcrPdfCreatorProperties class (Java/.NET).
With it, you can define:
- Whether you want a separate text layer (either of the two options below will trigger the creation of a text layer)
- Whether you want a separate image layer
Here's a quick example with all bells and whistles turned on (all previously listed options being used):
Note: don't forget to specify the path to your Tesseract data in your code (TESS_DATA_FOLDER below). Trained models are always available from the Tesseract tessdata repositories.
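
A minimal Java sketch of such a setup, assuming the pdfOCR Tesseract 4 add-on (pdfocr-tesseract4) and iText Core are on the classpath. The class name SeparateLayersExample, the layer names "text" and "image", and all three paths are placeholders chosen for illustration. Method signatures such as setPathToTessData may differ slightly between pdfOCR versions, so check them against the release you use.

```java
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.pdfocr.OcrPdfCreator;
import com.itextpdf.pdfocr.OcrPdfCreatorProperties;
import com.itextpdf.pdfocr.tesseract4.Tesseract4LibOcrEngine;
import com.itextpdf.pdfocr.tesseract4.Tesseract4OcrEngineProperties;

import java.io.File;
import java.io.IOException;
import java.util.Collections;
import java.util.List;

public class SeparateLayersExample {
    // Placeholder paths (assumptions): replace them with your own locations.
    private static final String TESS_DATA_FOLDER = "path/to/tessdata";
    private static final String INPUT_IMAGE = "path/to/scan.png";
    private static final String OUTPUT_PDF = "path/to/output.pdf";

    public static void main(String[] args) throws IOException {
        // Point the Tesseract engine at the folder holding the trained models.
        Tesseract4OcrEngineProperties engineProperties = new Tesseract4OcrEngineProperties();
        engineProperties.setPathToTessData(new File(TESS_DATA_FOLDER));
        Tesseract4LibOcrEngine ocrEngine = new Tesseract4LibOcrEngine(engineProperties);

        // Naming a layer asks pdfOCR to put that content in its own layer
        // instead of merging text and image together.
        OcrPdfCreatorProperties creatorProperties = new OcrPdfCreatorProperties();
        creatorProperties.setTextLayerName("text");    // illustrative layer name
        creatorProperties.setImageLayerName("image");  // illustrative layer name

        OcrPdfCreator ocrPdfCreator = new OcrPdfCreator(ocrEngine, creatorProperties);
        List<File> images = Collections.singletonList(new File(INPUT_IMAGE));

        // Run OCR on the image and write the layered PDF; closing the
        // returned document flushes it to OUTPUT_PDF.
        try (PdfWriter writer = new PdfWriter(OUTPUT_PDF)) {
            ocrPdfCreator.createPdf(images, writer).close();
        }
    }
}
```

Opening the resulting file in a viewer with a layers panel should let you toggle the text and image layers independently, which is a quick way to confirm that the two were kept apart.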