Creating Well-Tagged PDF Documents with pdfHTML
The PDF Association recently published their new Well-Tagged PDF (WTPDF) specification; which aims to define how to represent reusable and accessible electronic documents in PDF 2.0 files across a wide spectrum of possible use-cases. The concept of Tagged PDF was introduced with the PDF 1.4 specification, and WTPDF builds on the concept by utilizing PDF 2.0-specific features.
WTPDF also forms the basis of the PDF/A-4 and PDF/UA-2 standards for archiving and universal accessibility, and you can find more information on these standards and their relationship to WTPDF and PDF 2.0 in our dedicated article on the iText site.
As participants in the PDF Association and ISO PDF committee working groups which develop these standards, Apryse has contributed an example document to demonstrate the iText PDF SDK’s conformance to the WTPDF specification. You can find this example PDF on the PDF Association’s WTPDF page. As a bonus, our example also conforms to the PDF/A-4 and PDF/UA-2 standards, and was generated from a HTML template using the pdfHTML add-on for iText Core.
To find out how the example document was created, read on.
Why Convert From HTML?
Two of the recent additions to iText Core’s arsenal of supported PDF standards were PDF/A-4 (ISO 19005-4) in version 8.0.2 and PDF/UA-2 (ISO 14289-2) in version 8.0.3. This means you can create such documents from scratch just using iText’s high-level document creation APIs. Indeed, we have up-to-date code examples demonstrating the use of the PdfADocument and PdfUADocument sub-classes of PdfDocument in our Java and .NET Sandbox repositories on GitHub.
However, as noted in Chapter 3 and Chapter 4 of our pdfHTML tutorial, you can make your life easier when creating PDF/A and PDF/UA documents by using HTML/XML templates. This is a great idea because pdfHTML can reuse the semantic and structural information in your HTML/XML and CSS, and map it to the appropriate iText objects and styles - saving potentially heaps of time manually tagging PDF content to achieve conformance with the required standards.
In our article https://itextpdf.com/blog/technical-notes/easier-pdfa-pdfhtml, we explained in detail the advantages of pdfHTML for creating archivable PDF/A-3B documents. Many of the concepts discussed there are applicable to generating documents conforming to other PDF/A conformance levels, Tagged PDF and PDF/UA-1.
For WTPDF and its related PDF/A-4 and PDF/UA-2 standards, achieving conformant output is a little different since these are purely PDF 2.0 standards; meaning we can take advantage of PDF 2.0-specific features and upgrades to Tagged PDF. As noted by the PDF Association in their article, the re-development of the Tagged PDF section was one of the most significant upgrades in PDF 2.0.
Below, you can find the code and associated resources we used to create our WTPDF example document. As mentioned earlier, the generated PDF also conforms to both PDF/A-4 and PDF/UA-2. Feel free to adapt and experiment with the code for your own purposes.
Generate a WTPDF File with pdfHTML
We’ll first need some suitable HTML to use as a template, so what could be more appropriate than the PDF Association’s own article on their efforts in advancing accessibility?
Here’s how if looks on the web:

How the article is rendered in-browser.
Now we’ll need to gather the required resources to create a WTPDF-conformant file. You can find these in the attached zip file under the Resources heading below. However, we recommend checking out the wtpdf-demo repository on our GitHub to ensure you have all the required dependencies.
Example Code
Let’s take a look at the App.java code to understand what it is actually doing:
package com.itextpdf.wtpdfsample;
import com.itextpdf.html2pdf.ConverterProperties;
import com.itextpdf.html2pdf.HtmlConverter;
import com.itextpdf.html2pdf.attach.ITagWorker;
import com.itextpdf.html2pdf.attach.ProcessorContext;
import com.itextpdf.html2pdf.attach.impl.DefaultTagWorkerFactory;
import com.itextpdf.html2pdf.attach.impl.tags.HTagWorker;
import com.itextpdf.html2pdf.resolver.font.DefaultFontProvider;
import com.itextpdf.kernel.pdf.PdfAConformanceLevel;
import com.itextpdf.kernel.pdf.PdfDocumentInfo;
import com.itextpdf.kernel.pdf.PdfOutputIntent;
import com.itextpdf.kernel.pdf.PdfString;
import com.itextpdf.kernel.pdf.PdfVersion;
import com.itextpdf.kernel.pdf.PdfViewerPreferences;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.kernel.pdf.WriterProperties;
import com.itextpdf.kernel.xmp.XMPException;
import com.itextpdf.kernel.xmp.XMPMeta;
import com.itextpdf.kernel.xmp.XMPMetaFactory;
import com.itextpdf.layout.IPropertyContainer;
import com.itextpdf.layout.element.Div;
import com.itextpdf.layout.element.IElement;
import com.itextpdf.layout.element.Paragraph;
import com.itextpdf.pdfa.PdfADocument;
import com.itextpdf.styledxmlparser.node.IElementNode;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
/**
* Hello world!
*/
public class App {
private static final String SOURCE_FOLDER = "./src/main/resources/";
private static final Set<String> H_TAGS = new HashSet<>(Arrays.asList("h1", "h2", "h3", "h4", "h5", "h6", "h7"));
public static void main(String[] args) throws IOException, XMPException {
String outFile = "wtpdf.pdf";
PdfOutputIntent outputIntent = new PdfOutputIntent(
"Custom",
"",
"http://www.color.org",
"sRGB IEC61964-2.1",
Files.newInputStream(Paths.get(SOURCE_FOLDER + "sRGB Color Space Profile.icm")));
WriterProperties writerProperties = new WriterProperties().setPdfVersion(PdfVersion.PDF_2_0);
PdfADocument pdfDocument = new PdfADocument(new PdfWriter(outFile, writerProperties), PdfAConformanceLevel.PDF_A_4, outputIntent);
DefaultTagWorkerFactory factory = new DefaultTagWorkerFactory() {
@Override
public ITagWorker getCustomTagWorker(IElementNode tag, ProcessorContext context) {
if (H_TAGS.contains(tag.name())) {
return new HTagWorker(tag, context) {
@Override
public boolean processTagChild(ITagWorker childTagWorker, ProcessorContext context) {
return super.processTagChild(childTagWorker, context);
}
@Override
public IPropertyContainer getElementResult() {
IPropertyContainer elementResult = super.getElementResult();
if (elementResult instanceof Div) {
for (IElement child : ((Div) elementResult).getChildren()) {
if (child instanceof Paragraph) {
((Paragraph) child).setNeutralRole();
}
}
}
return elementResult;
}
};
}
return super.getCustomTagWorker(tag, context);
}
};
// setup the general requirements for a wtpdf document
byte[] bytes = Files.readAllBytes(Paths.get(SOURCE_FOLDER + "simplePdfUA2.xmp"));
XMPMeta xmpMeta = XMPMetaFactory.parse(new ByteArrayInputStream(bytes));
pdfDocument.setXmpMetadata(xmpMeta);
pdfDocument.setTagged();
pdfDocument.getCatalog().setViewerPreferences(new PdfViewerPreferences().setDisplayDocTitle(true));
pdfDocument.getCatalog().setLang(new PdfString("en-US"));
PdfDocumentInfo info = pdfDocument.getDocumentInfo();
info.setTitle("Well tagged PDF document");
// Use custom font provider as we only want embedded fonts
DefaultFontProvider fontProvider = new DefaultFontProvider(false, false, false);
fontProvider.addFont(SOURCE_FOLDER + "NotoSans-Regular.ttf");
fontProvider.addFont(SOURCE_FOLDER + "NotoEmoji-Regular.ttf");
ConverterProperties converterProperties = new ConverterProperties()
.setBaseUri(SOURCE_FOLDER)
.setTagWorkerFactory(factory)
.setFontProvider(fontProvider);
File file = new File(SOURCE_FOLDER + "article.html");
try (FileInputStream str = new FileInputStream(file)) {
HtmlConverter.convertToPdf(str, pdfDocument, converterProperties);
}
pdfDocument.close();
System.out.println("WTPDF created");
}
}
You’ll notice we select PdfVersion.PDF_2_0 in the WriterProperties, since WTPDF requires PDF 2.0 conformance.
Much like the PDF/A-3B example from the article mentioned earlier, we must supply the fonts to be embedded. You’ll note that since our source document uses an emoji, we need to embed NotoEmoji-Regular as well as the NotoSans-Regular font.
We must also set the output intent and provide an appropriate color profile, as well as the necessary XMP metadata to embed into the file. For more information on XMP, see https://kb.itextpdf.com/itext/how-to-add-metadata-to-a-pdf-using-pdfhtml.
We also set the PdfAConformanceLevel to PDF/A-4, and since we want a Tagged PDF, we also use pdfDocument.setTagged() to add a hierarchical structure tree to the PDF. We also want to conform to PDF/UA-2, and so we must set properties for the document’s title and language, along with viewer preferences.
Validating the Results
And that’s it! However, we should validate our document to make sure it passes the required checks for PDF/A-4 and PDF/UA-2. To verify our output we can run our created document through the veraPDF Conformance Checker, which in the current release (1.26.2) supports both PDF/A-4 and PDF/UA-2 validation:

As you can see, our WTPDF document passes all the checks for PDF/A-4 and PDF/UA-2.