We ship pdfOCR with Liberation Sans Regular (GitHub), as it has pretty good coverage of the glyphs (think letters) to be rendered. That doesn't mean it covers everything out there, especially once you go beyond Latin scripts, so pdfOCR will warn you if it can't find a corresponding glyph.

In such cases, use the PdfOcrFontProvider class (Java/.NET) to specify a font file that contains the glyphs you need to render.

Java:

    final PdfOcrFontProvider fontProvider = new PdfOcrFontProvider();
    fontProvider.addFont("font.ttf");
    properties.setFontProvider(fontProvider);
    final OcrPdfCreator ocrPdfCreator = new OcrPdfCreator(tesseractReader, properties);

.NET (C#):

    var fontProvider = new PdfOcrFontProvider();
    fontProvider.AddFont(@"font.ttf");
    properties.SetFontProvider(fontProvider);
    var ocrPdfCreator = new OcrPdfCreator(tesseractReader, properties);

If you want to check the glyph coverage of a specific font, we think FontDrop! is a pretty neat tool for inspecting any given TTF file (we're 100% unaffiliated).
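If you'd rather check coverage programmatically before handing a font to pdfOCR, the following is a minimal sketch that uses only the standard java.awt.Font class, not any pdfOCR API. The class name GlyphCoverageCheck and the method missingCharacters are hypothetical names chosen for illustration, and AWT's idea of what a font can display is assumed to be a reasonable approximation of what pdfOCR will find in the same file, so treat the result as a hint rather than a guarantee.

```java
import java.awt.Font;
import java.io.File;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.IntPredicate;

// Hypothetical helper, not part of pdfOCR: reports which characters of a
// sample text have no glyph in a given TrueType font.
public class GlyphCoverageCheck {

    // Returns the distinct non-whitespace characters of `text` for which
    // `canDisplay` answers false, in order of first appearance.
    public static Set<String> missingCharacters(String text, IntPredicate canDisplay) {
        Set<String> missing = new LinkedHashSet<>();
        text.codePoints()
            .filter(cp -> !Character.isWhitespace(cp))
            .filter(cp -> !canDisplay.test(cp))
            .forEach(cp -> missing.add(new String(Character.toChars(cp))));
        return missing;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("Usage: GlyphCoverageCheck <font.ttf> <sample text>");
            return;
        }
        // Load the font file with AWT; this is only an approximation of
        // the glyph lookup pdfOCR performs on the same file.
        Font font = Font.createFont(Font.TRUETYPE_FONT, new File(args[0]));
        Set<String> missing = missingCharacters(args[1], font::canDisplay);
        if (missing.isEmpty()) {
            System.out.println("All characters in the sample are covered.");
        } else {
            System.out.println("No glyphs found for: " + missing);
        }
    }
}
```

On Java 11 or later you can run this directly as a single source file, for example `java GlyphCoverageCheck.java font.ttf "sample text"`, using a short sample of the text you expect in your scanned documents. The coverage test is passed in as a predicate so the same helper works with any other glyph lookup you trust more than AWT's.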