How do I create a separate OCR layer?

By default, pdfOCR merges the recognized text with the image it was extracted from, but you may want to keep the two separate. Everything you need for this is in the OcrPdfCreatorProperties (Java/.NET) class.

With it, you can define:

- Whether you want a separate text layer (either of the two options below triggers the creation of a text layer):
  - by defining its name (Java/.NET)
  - by defining its color (Java/.NET). Bear in mind that if you do not define this parameter, the text will be transparent.
- Whether you want a separate image layer:
  - by defining its name (Java/.NET)

Here's a quick example with all bells and whistles turned on (all previously listed options being used):

Don't forget to set TESS_DATA_FOLDER in your code below to the path of your Tesseract data folder. You can always find trained models here.
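A minimal Java sketch of such a setup, using all three options, could look like the following. It assumes the pdfOCR and pdfocr-tesseract4 artifacts are on the classpath and that your version exposes `setTextLayerName`, `setTextColor` and `setImageLayerName` on `OcrPdfCreatorProperties`; check the API reference for your release. The class name, file names, layer names and the `TESS_DATA_FOLDER` value are placeholders chosen for illustration, not values from the original article.

```java
import com.itextpdf.kernel.colors.ColorConstants;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.pdfocr.OcrPdfCreator;
import com.itextpdf.pdfocr.OcrPdfCreatorProperties;
import com.itextpdf.pdfocr.tesseract4.Tesseract4LibOcrEngine;
import com.itextpdf.pdfocr.tesseract4.Tesseract4OcrEngineProperties;

import java.io.File;
import java.io.IOException;
import java.util.Collections;
import java.util.List;

public class SeparateOcrLayers {
    // Placeholder: point this at the folder holding your Tesseract trained data.
    private static final String TESS_DATA_FOLDER = "path/to/tessdata";

    // Hypothetical input image and output file, for illustration only.
    private static final List<File> INPUT_IMAGES =
            Collections.singletonList(new File("scan.png"));
    private static final String OUTPUT_PDF = "scan_with_layers.pdf";

    public static void main(String[] args) throws IOException {
        // Configure the Tesseract 4 engine with the trained data location.
        Tesseract4OcrEngineProperties engineProperties = new Tesseract4OcrEngineProperties();
        engineProperties.setPathToTessData(new File(TESS_DATA_FOLDER));
        Tesseract4LibOcrEngine ocrEngine = new Tesseract4LibOcrEngine(engineProperties);

        // Ask for a named text layer, a visible text color, and a named image layer.
        OcrPdfCreatorProperties creatorProperties = new OcrPdfCreatorProperties();
        creatorProperties.setTextLayerName("Text Layer");
        creatorProperties.setTextColor(ColorConstants.RED); // without a color, the text stays transparent
        creatorProperties.setImageLayerName("Image Layer");

        // Run OCR on the images and write the layered PDF.
        OcrPdfCreator ocrPdfCreator = new OcrPdfCreator(ocrEngine, creatorProperties);
        try (PdfWriter writer = new PdfWriter(OUTPUT_PDF)) {
            PdfDocument pdf = ocrPdfCreator.createPdf(INPUT_IMAGES, writer);
            pdf.close();
        }
    }
}
```

If this works as intended, opening the resulting file in a PDF viewer that has a layers panel should let you toggle the text and image layers independently.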