Is it possible to convert Hebrew HTML to PDF?
I'm trying to convert an HTML file with Hebrew characters (UTF-8) to PDF using iText, but all the letters come out in reverse order. As far as I understand, I can set RTL only for ColumnText and PdfCell objects. So my question is: is it possible to convert Hebrew HTML to PDF at all? This is my HTML:
[blockcode]
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "https://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="https://www.w3.org/1999/xhtml">
<head>
<title>Title of document</title>
</head>
<body style="font-size:12.0pt; font-family:Arial">
שלום עולם
</body>
</html>
[/blockcode]
When I convert this HTML to PDF using XML Worker, I get "Hello World" in Hebrew written from left to right, with the letters in reverse order. It should be written from right to left.
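Some background on the symptom: the HTML file (like a Java string) stores Hebrew in logical order, first letter first. A renderer that places those characters left to right without applying the Unicode Bidirectional Algorithm shows them in reverse reading order, which matches the result described above. The sketch below converts logical order to visual order with the JDK's java.text.Bidi class. It only illustrates the reordering and is not how iText does it internally (iText 5 has its own bidi code). The class and method names are hypothetical, and the run reversal is simplified: it does not mirror brackets or handle combining marks.

```java
import java.text.Bidi;

class VisualOrder {
    // Converts text from logical order (storage/typing order) to visual order
    // (left-to-right display order) using the JDK's bidi implementation.
    // Simplification: right-to-left runs are reversed character by character.
    static String toVisualOrder(String logical, int baseDirection) {
        Bidi bidi = new Bidi(logical, baseDirection);
        int runCount = bidi.getRunCount();
        byte[] levels = new byte[runCount];
        String[] runs = new String[runCount];
        for (int i = 0; i < runCount; i++) {
            levels[i] = (byte) bidi.getRunLevel(i);
            String run = logical.substring(bidi.getRunStart(i), bidi.getRunLimit(i));
            // Odd embedding levels are right-to-left, so their characters are displayed reversed.
            runs[i] = (levels[i] % 2 == 1) ? new StringBuilder(run).reverse().toString() : run;
        }
        // Reorder whole runs (e.g. a Latin word inside Hebrew text) according to their levels.
        Bidi.reorderVisually(levels, 0, runs, 0, runCount);
        return String.join("", runs);
    }

    public static void main(String[] args) {
        // "Hello World" in Hebrew, written with escapes to avoid source-encoding issues.
        String hello = "\u05E9\u05DC\u05D5\u05DD \u05E2\u05D5\u05DC\u05DD";
        System.out.println(toVisualOrder(hello, Bidi.DIRECTION_RIGHT_TO_LEFT));
    }
}
```

For "שלום עולם" with a right-to-left base direction the whole string is a single right-to-left run, so the visual order is simply the characters reversed. For mixed text such as "abc שלום" with a left-to-right base, only the Hebrew run is reversed.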
Posted on StackOverflow on Jun 15, 2015 by Anatoly
Please take a look at the ParseHtml10 example. In this example, we take the file hebrew.html:
[blockcode]
Hebrew text
שלום עולם
[/blockcode]
And we convert it to PDF using this code:
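[blockcode]
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.XMLWorker;
import com.itextpdf.tool.xml.XMLWorkerFontProvider;
import com.itextpdf.tool.xml.css.StyleAttrCSSResolver;
import com.itextpdf.tool.xml.html.CssAppliers;
import com.itextpdf.tool.xml.html.CssAppliersImpl;
import com.itextpdf.tool.xml.html.Tags;
import com.itextpdf.tool.xml.parser.XMLParser;
import com.itextpdf.tool.xml.pipeline.css.CSSResolver;
import com.itextpdf.tool.xml.pipeline.css.CssResolverPipeline;
import com.itextpdf.tool.xml.pipeline.end.PdfWriterPipeline;
import com.itextpdf.tool.xml.pipeline.html.HtmlPipeline;
import com.itextpdf.tool.xml.pipeline.html.HtmlPipelineContext;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;

// HTML is a constant of the example class holding the path to hebrew.html.
public void createPdf(String file) throws IOException, DocumentException {
    // step 1
    Document document = new Document();
    // step 2
    PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(file));
    // step 3
    document.open();
    // step 4
    // Styles
    CSSResolver cssResolver = new StyleAttrCSSResolver();
    // Don't scan for system fonts; register only the Hebrew-capable Noto font.
    XMLWorkerFontProvider fontProvider = new XMLWorkerFontProvider(XMLWorkerFontProvider.DONTLOOKFORFONTS);
    fontProvider.register("resources/fonts/NotoSansHebrew-Regular.ttf");
    CssAppliers cssAppliers = new CssAppliersImpl(fontProvider);
    HtmlPipelineContext htmlContext = new HtmlPipelineContext(cssAppliers);
    htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());
    // Pipelines
    PdfWriterPipeline pdf = new PdfWriterPipeline(document, writer);
    HtmlPipeline html = new HtmlPipeline(htmlContext, pdf);
    CssResolverPipeline css = new CssResolverPipeline(cssResolver, html);
    // XML Worker
    XMLWorker worker = new XMLWorker(css, true);
    XMLParser p = new XMLParser(worker);
    // Read the HTML as UTF-8 so the Hebrew characters survive parsing.
    p.parse(new FileInputStream(HTML), Charset.forName("UTF-8"));
    // step 5
    document.close();
}
[/blockcode]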
The result looks like hebrew.pdf, with the text running from right to left.
What are the hurdles you need to take?
- You need to wrap your text in an element such as a <div> or a <td>.
- You need to add the attribute dir="rtl" to define the direction.
- You need to make sure that you're using a font that knows how to display Hebrew. I used a Noto font for Hebrew. This is one of the fonts distributed by Google in their program to provide fonts for every possible language.
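If you generate the HTML yourself, these requirements can be applied mechanically. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, the direction is picked from the first strong character of the text (in the spirit of HTML's dir="auto", which XML Worker is not assumed to support, so an explicit value is written), and "Noto Sans Hebrew" is assumed to be the family name under which NotoSansHebrew-Regular.ttf is registered by the font provider.

```java
import java.text.Bidi;

class RtlHtml {
    // Hypothetical helper: "rtl" if the first strong character is right-to-left
    // (e.g. Hebrew), otherwise "ltr".
    static String direction(String text) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        return bidi.baseIsLeftToRight() ? "ltr" : "rtl";
    }

    // Minimal escaping for text placed inside an HTML element.
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Wraps the text in a <div> with an explicit dir attribute and a font-family.
    // Assumption: fontFamily names a registered font that has Hebrew glyphs.
    static String wrap(String text, String fontFamily) {
        return "<div dir=\"" + direction(text) + "\" style=\"font-family:" + fontFamily + "\">"
                + escape(text) + "</div>";
    }

    public static void main(String[] args) {
        String hello = "\u05E9\u05DC\u05D5\u05DD \u05E2\u05D5\u05DC\u05DD"; // "Hello World" in Hebrew
        System.out.println(wrap(hello, "Noto Sans Hebrew"));
    }
}
```

Deciding the direction from the text keeps mixed documents working, since paragraphs that start with Latin text still get dir="ltr".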
Important: this solution requires at least iText and XML Worker 5.5.5, because support for the dir attribute was introduced in 5.5.4 and improved in 5.5.5.
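If an application can end up with older jars on the classpath, a runtime guard may help. A minimal sketch, assuming the version string comes from iText 5's com.itextpdf.text.Version.getInstance().getRelease() (not called here so the block runs without iText); the comparison helper is hypothetical, and it checks only the iText core version, not XML Worker's.

```java
class VersionCheck {
    // Compares dotted version strings numerically, so "5.5.10" counts as newer than "5.5.5".
    // Assumption: any suffix such as "-SNAPSHOT" is ignored.
    static boolean isAtLeast(String version, String minimum) {
        int[] v = parse(version);
        int[] m = parse(minimum);
        for (int i = 0; i < Math.max(v.length, m.length); i++) {
            int a = i < v.length ? v[i] : 0; // missing components count as 0
            int b = i < m.length ? m[i] : 0;
            if (a != b) {
                return a > b;
            }
        }
        return true; // versions are equal
    }

    private static int[] parse(String version) {
        // Keep only the leading "digits and dots" part of the string.
        String numeric = version.split("[^0-9.]", 2)[0];
        String[] parts = numeric.split("\\.");
        int[] result = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            result[i] = parts[i].isEmpty() ? 0 : Integer.parseInt(parts[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        // In a real project the first argument could be Version.getInstance().getRelease().
        System.out.println(isAtLeast("5.5.4", "5.5.5"));
    }
}
```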