How to do HTML to XML conversion to generate closed tags?

When I try to convert HTML to PDF using iText and XML Worker, I'm asked to provide closing tags for the <hr> and <br> tags. It works if I add them manually, but I don't want to add each closing tag by hand. How can I automate this?


Posted on StackOverflow on Oct 30, 2014 by Kannu Verma

You are experiencing this problem because you are feeding HTML to iText's XML Worker. XML Worker requires XML, so you need to convert your HTML into XHTML.
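
To see why an unclosed <br> or <hr> is rejected, it can help to run the markup through a strict XML parser. The sketch below uses the JDK's own SAX parser as a stand-in; it is not XML Worker's parser, and the class name WellFormedCheck and the method isWellFormed are hypothetical names chosen for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports whether a snippet is well-formed XML.
class WellFormedCheck {

    static boolean isWellFormed(String markup) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // DefaultHandler ignores all content but rethrows fatal parse errors.
            parser.parse(
                new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)),
                new DefaultHandler());
            return true;
        } catch (SAXParseException e) {
            // The parser found a well-formedness error, e.g. an unclosed element.
            return false;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // HTML-style void tag: <br> is never closed, so this is not XML.
        System.out.println(isWellFormed("<p>line one<br>line two</p>"));   // false
        // XHTML-style self-closing tag: well-formed.
        System.out.println(isWellFormed("<p>line one<br />line two</p>")); // true
    }
}
```

Browsers accept the first snippet because HTML treats <br> and <hr> as void elements, but an XML parser expects every element to be closed. That is why the HTML has to be converted into XHTML before it is handed to XML Worker.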

An example of how to convert HTML into XHTML can be found in D00_XHTML:

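// Needs java.io.File, java.io.FileOutputStream, java.io.IOException and org.jsoup.Jsoup.
public static void tidyUp(String path) throws IOException {
    File html = new File(path);
    // Parse the HTML file and serialize the cleaned-up document to bytes.
    byte[] xhtml = Jsoup.parse(html, "US-ASCII").html().getBytes();
    File dir = new File("results/xml");
    try (FileOutputStream fos = new FileOutputStream(new File(dir, html.getName()))) {
        fos.write(xhtml); // write the XHTML to disk (completion assumed from the description below)
    }
}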

In this example, we get a path to an ordinary HTML file (similar to what you have). We then use the jsoup library to parse the HTML and serialize the result as an XHTML byte array. Here, that byte array is written to disk as an XHTML file, but you can also use it directly as input for XML Worker.
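
Using the byte array directly could look like the minimal sketch below. It is an illustration, not the D00_XHTML example: the class name HtmlToXhtml and its methods are hypothetical, and it assumes a jsoup release that offers Document.OutputSettings.Syntax.xml. Setting the XML syntax explicitly is a precaution, because jsoup's serializer may otherwise emit HTML-style <br> and <hr> tags without a closing slash.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Entities;

// Hypothetical helper: converts an HTML string into XHTML bytes in memory.
class HtmlToXhtml {

    static byte[] toXhtml(String html) {
        Document doc = Jsoup.parse(html);
        doc.outputSettings()
           .syntax(Document.OutputSettings.Syntax.xml)  // close void tags such as <br> and <hr>
           .escapeMode(Entities.EscapeMode.xhtml)       // avoid HTML-only named entities
           .charset(StandardCharsets.UTF_8);
        // Encode with an explicit charset instead of the platform default.
        return doc.html().getBytes(StandardCharsets.UTF_8);
    }

    // Wraps the XHTML bytes in a stream, so nothing has to be written to disk.
    static InputStream toXhtmlStream(String html) {
        return new ByteArrayInputStream(toXhtml(html));
    }

    public static void main(String[] args) {
        byte[] xhtml = toXhtml("<p>line one<br>line two</p><hr>");
        System.out.println(new String(xhtml, StandardCharsets.UTF_8));
    }
}
```

The stream returned by toXhtmlStream can then be passed to XML Worker wherever a FileInputStream would otherwise be used, for example as the input argument of XMLWorkerHelper's parseXHtml method in iText 5.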

