Parsing XML and XHTML | iText Knowledge Base

There are a lot of questions about HTMLWorker on StackOverflow. Many of them remain unanswered because HTMLWorker has been abandoned in favor of XML Worker. HTMLWorker was initially meant as a parser for a small selection of HTML tags. People started using it as if it were a full-blown HTML-to-PDF converter and then complained because HTMLWorker doesn't support CSS parsing. The HTMLWorker code grew organically to the point where it was no longer maintainable.

We started another project, called XML Worker. It can be used to convert XHTML to PDF. It's not a URL-to-PDF converter in the sense that it won't "print your web site to PDF". An HTML file can contain content near its end that needs to be added at the start of the document, and one would expect the start of the document to be the first page. That isn't possible with iText: iText flushes finished pages to the OutputStream as soon as possible, and there is no way to return to a previous page to add the extra content.

XML Worker is meant to create simple reports using an easy language such as HTML (and some CSS). It won't resolve ASP pages, nor execute JavaScript. It will only deal with finished XHTML.
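Because XML Worker only deals with finished XHTML, it can help to verify that the markup is well-formed XML (every tag closed) before passing it on. The following is a minimal sketch that uses only the JDK's built-in XML parser, not iText itself; the class name `XhtmlWellFormedCheck` and the method `isWellFormed` are hypothetical names chosen for this illustration. It also assumes that the markup does not rely on entities declared in an external DTD, since DTD lookups are deliberately short-circuited to avoid network access.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of iText: checks well-formedness only.
class XhtmlWellFormedCheck {

    // Returns true if the markup parses as well-formed XML, false otherwise.
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Resolve every external entity (such as the XHTML DTD) to an
            // empty source so the check never goes out to the network.
            builder.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));
            // DefaultHandler ignores warnings and rethrows fatal errors,
            // which also keeps the parser from printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException | IOException e) {
            // Not well-formed (for example, an unclosed <br>) or unreadable.
            return false;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No usable XML parser in this JDK", e);
        }
    }

    public static void main(String[] args) {
        // Closed tags: acceptable as XHTML input.
        System.out.println(isWellFormed("<p>Hello<br/>world</p>"));
        // Typical HTML with an unclosed <br>: rejected.
        System.out.println(isWellFormed("<p>Hello<br>world</p>"));
    }
}
```

Markup that passes this check is only known to be well-formed; whether XML Worker supports every tag and CSS property it contains is a separate question.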

How to add a rich Textbox (HTML) to a table cell?

How to make a particular sub-string Bold when converting HTML to PDF?

How to do HTML to XML conversion to generate closed tags?

How to render certain HTML entities (such as arrows) in PDF?