Legacy notice!

iText 5 is the previous major version of iText's leading PDF SDK. It has reached end of life (EOL) and is no longer developed. Switch your project to iText 7 to benefit from the latest developments.

Is it possible to convert Hebrew HTML to PDF?

I'm trying to convert an HTML file with Hebrew characters (UTF-8) to PDF using iText, but all the letters come out in reverse order. As far as I understand, I can set RTL only for ColumnText and PdfPCell objects. So my question is: is it possible to convert Hebrew HTML to PDF? This is my HTML:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "https://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="https://www.w3.org/1999/xhtml">
<head>
  <title>Title of document</title>
</head>
<body style="font-size:12.0pt; font-family:Arial">
  שלום עולם
</body>
</html>

When I convert this HTML to PDF using XML Worker, I get this result:

Wrong order

This is "Hello World" in Hebrew, written from left to right. It should be written from right to left.

Posted on StackOverflow on Jun 15, 2015 by Anatoly

Please take a look at the ParseHtml10 example. In this example, we take the file hebrew.html:

[blockcode]
Hebrew text
שלום עולם
[/blockcode]

And we convert it to PDF using this code:

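import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;

import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.XMLWorker;
import com.itextpdf.tool.xml.XMLWorkerFontProvider;
import com.itextpdf.tool.xml.css.StyleAttrCSSResolver;
import com.itextpdf.tool.xml.html.CssAppliers;
import com.itextpdf.tool.xml.html.CssAppliersImpl;
import com.itextpdf.tool.xml.html.Tags;
import com.itextpdf.tool.xml.parser.XMLParser;
import com.itextpdf.tool.xml.pipeline.css.CSSResolver;
import com.itextpdf.tool.xml.pipeline.css.CssResolverPipeline;
import com.itextpdf.tool.xml.pipeline.end.PdfWriterPipeline;
import com.itextpdf.tool.xml.pipeline.html.HtmlPipeline;
import com.itextpdf.tool.xml.pipeline.html.HtmlPipelineContext;

public void createPdf(String file) throws IOException, DocumentException {
    // step 1
    Document document = new Document();
    // step 2
    PdfWriter writer =
        PdfWriter.getInstance(document, new FileOutputStream(file));
    // step 3
    document.open();
    // step 4
    // Styles: resolve style attributes and register a font that contains Hebrew glyphs
    CSSResolver cssResolver = new StyleAttrCSSResolver();
    XMLWorkerFontProvider fontProvider =
        new XMLWorkerFontProvider(XMLWorkerFontProvider.DONTLOOKFORFONTS);
    fontProvider.register("resources/fonts/NotoSansHebrew-Regular.ttf");
    CssAppliers cssAppliers = new CssAppliersImpl(fontProvider);
    HtmlPipelineContext htmlContext = new HtmlPipelineContext(cssAppliers);
    htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());

    // Pipelines: CSS resolution -> HTML processing -> PDF output
    PdfWriterPipeline pdf = new PdfWriterPipeline(document, writer);
    HtmlPipeline html = new HtmlPipeline(htmlContext, pdf);
    CssResolverPipeline css = new CssResolverPipeline(cssResolver, html);

    // XML Worker
    XMLWorker worker = new XMLWorker(css, true);
    XMLParser p = new XMLParser(worker);
    // HTML is a constant of the example class that holds the path to hebrew.html
    p.parse(new FileInputStream(HTML), Charset.forName("UTF-8"));
    // step 5
    document.close();
}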

The result looks like hebrew.pdf:

Text from right to left
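
The createPdf() method passes an explicit UTF-8 charset to the parser. The following is a minimal sketch, using only the JDK, of why the charset matters for Hebrew input. The class name HebrewCharsetSketch and its helper roundTrip are hypothetical and not part of iText; the sketch only illustrates that a charset without Hebrew letters replaces them with question marks.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class HebrewCharsetSketch {
    // "Hello World" in Hebrew, written with Unicode escapes so the source file encoding does not matter
    static final String HELLO = "\u05E9\u05DC\u05D5\u05DD \u05E2\u05D5\u05DC\u05DD";

    // Encodes the text with the given charset and decodes the bytes with the same charset
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // UTF-8 can represent every Hebrew letter, so the text survives unchanged
        System.out.println(roundTrip(HELLO, StandardCharsets.UTF_8).equals(HELLO)); // prints "true"
        // US-ASCII has no Hebrew letters; each one is replaced by '?'
        System.out.println(roundTrip(HELLO, StandardCharsets.US_ASCII)); // prints "???? ????"
    }
}
```

If the text already reaches the parser as question marks, no font or direction setting can restore it, so it is worth checking the encoding first.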


What hurdles do you need to clear?

  • You need to wrap your text in an element such as a <div> or a <td>.
  • You need to add the attribute dir="rtl" to define the direction.
  • You need to make sure that you're using a font that knows how to display Hebrew. I used a Noto font for Hebrew. This is one of the fonts distributed by Google in their program to provide fonts for every possible language.
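
The points in this list can be sketched as a small helper that produces the markup to feed into the parser. This is only an illustration under stated assumptions: the class RtlHtmlSketch and its methods are hypothetical, a div is assumed as the wrapping element, and the font-family name "Noto Sans Hebrew" is assumed to match the font registered with the font provider. The direction check uses the JDK class java.text.Bidi, not an iText API.

```java
import java.text.Bidi;

class RtlHtmlSketch {
    // Assumed family name; it has to match a font registered with the font provider
    static final String FONT_FAMILY = "Noto Sans Hebrew";

    // True as soon as the text contains any right-to-left character
    static boolean needsRtl(String text) {
        char[] chars = text.toCharArray();
        return Bidi.requiresBidi(chars, 0, chars.length);
    }

    // Escapes the characters that are special in HTML text content
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Wraps the text in a div, adding dir="rtl" only when the text needs it
    static String wrapInDiv(String text) {
        String dir = needsRtl(text) ? " dir=\"rtl\"" : "";
        return "<div" + dir + " style=\"font-family: " + FONT_FAMILY + "\">"
            + escape(text) + "</div>";
    }

    public static void main(String[] args) {
        // "Hello World" in Hebrew, written with Unicode escapes
        String hebrew = "\u05E9\u05DC\u05D5\u05DD \u05E2\u05D5\u05DC\u05DD";
        System.out.println(needsRtl(hebrew));         // prints "true"
        System.out.println(needsRtl("Hello World"));  // prints "false"
        System.out.println(wrapInDiv("Hello World")); // a div without a dir attribute
    }
}
```

Deciding the direction per fragment keeps left-to-right text untouched; text that mixes both directions is treated as right-to-left in this sketch, which may not suit every document.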

Important: this solution requires at least iText and XML Worker 5.5.5, because support for the dir attribute was introduced in 5.5.4 and improved in 5.5.5.
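
A project that depends on this behavior could fail fast when an older release is on the classpath. The following is a minimal sketch of such a guard, assuming the release number is available as a plain dotted numeric string such as "5.5.5"; how that string is obtained from the library is left out here, and the class VersionCheckSketch is hypothetical.

```java
class VersionCheckSketch {
    // Minimum release named in the note above
    static final String MINIMUM = "5.5.5";

    // Compares two dotted numeric versions part by part; missing parts count as 0
    static int compare(String a, String b) {
        String[] partsA = a.split("\\.");
        String[] partsB = b.split("\\.");
        int length = Math.max(partsA.length, partsB.length);
        for (int i = 0; i < length; i++) {
            int x = i < partsA.length ? Integer.parseInt(partsA[i]) : 0;
            int y = i < partsB.length ? Integer.parseInt(partsB[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    // True when the given release is at least the minimum
    static boolean meetsMinimum(String release) {
        return compare(release, MINIMUM) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(meetsMinimum("5.5.4"));  // prints "false"
        System.out.println(meetsMinimum("5.5.5"));  // prints "true"
        System.out.println(meetsMinimum("5.5.13")); // prints "true"
    }
}
```

The comparison is numeric rather than lexical, so "5.5.13" correctly ranks above "5.5.5".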