How to convert HTML containing Arabic/Hebrew characters to PDF?
This question duplicates the question "Which languages are supported in pdfHTML?". The answer can be found in chapter 6, but the question is asked so frequently that it justifies an extra entry in the FAQ section. It's also an opportunity to provide an additional example.
In the C07E14_SayPeace (Java/.NET) example, we convert the say_peace.html HTML file to PDF.
We see English, Arabic, and Hebrew in this text. We'll use a different font file for each of these languages.
public static final String[] FONTS = {
"src/main/resources/fonts/noto/NotoSans-Regular.ttf",
"src/main/resources/fonts/noto/NotoNaskhArabic-Regular.ttf",
"src/main/resources/fonts/noto/NotoSansHebrew-Regular.ttf"
};
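Why three font files? Each script needs a font that actually contains its glyphs. The sketch below uses only the JDK (`Character.UnicodeScript`) to list the scripts present in a string and map each one to one of the Noto fonts above. The `FONT_FOR_SCRIPT` table and the `ScriptFontCheck` class are hypothetical and for illustration only. pdfHTML's `FontProvider` does not use such a table; it picks fonts by checking which registered font can render the characters.

```java
import java.util.EnumMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class ScriptFontCheck {

    // Hypothetical script-to-font table (illustration only, not used by pdfHTML)
    static final Map<Character.UnicodeScript, String> FONT_FOR_SCRIPT =
            new EnumMap<>(Character.UnicodeScript.class);

    static {
        FONT_FOR_SCRIPT.put(Character.UnicodeScript.LATIN,
                "src/main/resources/fonts/noto/NotoSans-Regular.ttf");
        FONT_FOR_SCRIPT.put(Character.UnicodeScript.ARABIC,
                "src/main/resources/fonts/noto/NotoNaskhArabic-Regular.ttf");
        FONT_FOR_SCRIPT.put(Character.UnicodeScript.HEBREW,
                "src/main/resources/fonts/noto/NotoSansHebrew-Regular.ttf");
    }

    // Returns the scripts found in the text, in order of first appearance.
    // COMMON (spaces, punctuation, digits) and INHERITED (combining marks)
    // are skipped because they don't determine which font is needed.
    static Set<Character.UnicodeScript> scriptsIn(String text) {
        Set<Character.UnicodeScript> scripts = new LinkedHashSet<>();
        text.codePoints().forEach(cp -> {
            Character.UnicodeScript s = Character.UnicodeScript.of(cp);
            if (s != Character.UnicodeScript.COMMON
                    && s != Character.UnicodeScript.INHERITED) {
                scripts.add(s);
            }
        });
        return scripts;
    }

    public static void main(String[] args) {
        // "Peace", Arabic "salam", Hebrew "shalom"
        String sample = "Peace \u0633\u0644\u0627\u0645 \u05E9\u05DC\u05D5\u05DD";
        for (Character.UnicodeScript s : scriptsIn(sample)) {
            System.out.println(s + " -> "
                    + FONT_FOR_SCRIPT.getOrDefault(s, "(no font configured)"));
        }
    }
}
```

For the sample string, this reports LATIN, ARABIC, and HEBREW, one per font file in the `FONTS` array.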
We'll create a FontProvider instance that only uses these font files, and we'll pass this FontProvider to the converter as a converter property.
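import com.itextpdf.html2pdf.ConverterProperties;
import com.itextpdf.html2pdf.HtmlConverter;
import com.itextpdf.html2pdf.resolver.font.DefaultFontProvider;
import com.itextpdf.io.font.FontProgram;
import com.itextpdf.io.font.FontProgramFactory;
import com.itextpdf.layout.font.FontProvider;
import java.io.File;
import java.io.IOException;

public void createPdf(String src, String[] fonts, String dest) throws IOException {
    ConverterProperties properties = new ConverterProperties();
    // No standard PDF fonts, no fonts shipped with pdfHTML, no system fonts
    FontProvider fontProvider = new DefaultFontProvider(false, false, false);
    for (String font : fonts) {
        FontProgram fontProgram = FontProgramFactory.createFont(font);
        fontProvider.addFont(fontProgram);
    }
    properties.setFontProvider(fontProvider);
    HtmlConverter.convertToPdf(new File(src), new File(dest), properties);
}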
The result is a PDF file in which the English, Arabic, and Hebrew text is rendered correctly.
If you use the appropriate fonts but the Hebrew and Arabic text is rendered from left to right instead of from right to left, you have forgotten to add the pdfCalligraph add-on to your CLASSPATH.
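To see which parts of a string need right-to-left handling, you can inspect it with the JDK's `java.text.Bidi` class, which implements the Unicode Bidirectional Algorithm. This is only a diagnostic sketch: `java.text.Bidi` computes embedding levels (odd levels are right-to-left) but does not reorder or shape glyphs in a PDF. That layout work is what the pdfCalligraph add-on provides to pdfHTML. The class name `BidiRuns` and the `describeRuns` helper are hypothetical.

```java
import java.text.Bidi;

public class BidiRuns {

    // Describes each directional run as "LTR [start, limit)" or "RTL [start, limit)".
    // The paragraph direction is taken from the first strong character,
    // defaulting to left-to-right.
    static String describeRuns(String text) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bidi.getRunCount(); i++) {
            // Odd embedding levels are right-to-left, even levels left-to-right
            String dir = (bidi.getRunLevel(i) % 2 == 1) ? "RTL" : "LTR";
            sb.append(dir)
              .append(" [").append(bidi.getRunStart(i))
              .append(", ").append(bidi.getRunLimit(i)).append(")\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "Peace", Arabic "salam", Hebrew "shalom"
        String sample = "Peace \u0633\u0644\u0627\u0645 \u05E9\u05DC\u05D5\u05DD";
        Bidi bidi = new Bidi(sample, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        System.out.println("mixed directions: " + bidi.isMixed());
        System.out.print(describeRuns(sample));
    }
}
```

For the sample, the English word forms a left-to-right run, while the Arabic and Hebrew words (including the space between them) form a single right-to-left run. If that run appears in logical order in the PDF, the text looks reversed.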