How can I generate a PDF/UA compatible PDF with iText? | iText 5 PDF Development Guide

❓

We have a number of dynamically generated PDFs on our site that were created using iText 2.1.7. However, we also have a large number of users that have disabilities and use screen readers, like JAWS, to render our PDFs. We use the setTagged() method to tag the PDFs, but some elements of the PDF appear out of order. Some even become more jumbled after calling setTagged()!

I read about PDF/UA in a 2013 interview about iText with Bruno Lowagie, and this seems like something that might help with our problem. However, I have not been able to find a good example of how to generate a PDF/UA document. Can you provide an example?

Posted on StackOverflow on Jan 29, 2015 by k-den

Please take a look at the PdfUA example. It explains step by step what is needed to be compliant with PDF/UA. A similar example was presented at the iText Summit in 2014 and at JavaOne. Watch the iText Summit video tutorial.

Java

static public void main(String args[]) throws IOException, DocumentException {
        File file = new File(DEST);
        file.getParentFile().mkdirs();
        new PdfUA().createPdf(DEST);
    }
    
    /**
     * Creates an accessible PDF with images and text.
     * @param dest  the path to the resulting PDF
     * @throws IOException
     * @throws DocumentException 
     */
    public void createPdf(String dest) throws IOException, DocumentException {
        Document document = new Document(PageSize.A4.rotate());
        PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(dest));
        writer.setPdfVersion(PdfWriter.VERSION_1_7);
        //TAGGED PDF
        //Make document tagged
        writer.setTagged();
        //===============
        //PDF/UA
        //Set document metadata
        writer.setViewerPreferences(PdfWriter.DisplayDocTitle);
        document.addLanguage("en-US");
        document.addTitle("English pangram");
        writer.createXmpMetadata();
        //=====================
        document.open();

        Paragraph p = new Paragraph();
        //PDF/UA
        //Embed font
        Font font = FontFactory.getFont(FONT, BaseFont.WINANSI, BaseFont.EMBEDDED, 20);
        p.setFont(font);
        //==================
        Chunk c = new Chunk("The quick brown ");
        p.add(c);
        Image i = Image.getInstance(FOX);
        c = new Chunk(i, 0, -24);
        //PDF/UA
        //Set alt text
        c.setAccessibleAttribute(PdfName.ALT, new PdfString("Fox"));
        //==============
        p.add(c);
        p.add(new Chunk(" jumps over the lazy "));
        i = Image.getInstance(DOG);
        c = new Chunk(i, 0, -24);
        //PDF/UA
        //Set alt text
        c.setAccessibleAttribute(PdfName.ALT, new PdfString("Dog"));
        //==================
        p.add(c);
        document.add(p);

        p = new Paragraph("\n\n\n\n\n\n\n\n\n\n\n\n", font);
        document.add(p);
        List list = new List(true);
        list.add(new ListItem("quick", font));
        list.add(new ListItem("brown", font));
        list.add(new ListItem("fox", font));
        list.add(new ListItem("jumps", font));
        list.add(new ListItem("over", font));
        list.add(new ListItem("the", font));
        list.add(new ListItem("lazy", font));
        list.add(new ListItem("dog", font));
        document.add(list);
        document.close();
    }

You make the document tagged with the setTagged document, but that's not sufficient. You also need to set document data: the document title needs to be displayed and you need to indicate the language used in the document. XMP metadata is mandatory.

Furthermore you need to embed all fonts. When you have images, you need an alternate description. In the example, we replace the words "dog" and "fox" by an image. To make sure that these images are "read out loud" correctly, we need to use the setAccessibleAttribute() method.

At the end of the example, I added a numbered list. In another question, you claim that the list is not read out loud correctly by JAWS. If you check the PDF file created with the above example, more specifically pdfua.pdf, you'll discover that JAWS reads the document as expected, with the numbers and the text in the right order.

The reason why "it doesn't work" when you try this, is simple. You are using a version of iText that is 3 years older than the PDF/UA standard. Also: in the version you are using, you are responsible for creating the tag structure at the lowest PDF level when you use the setTagged() method. In more recent versions, iText takes care of this at a high level. You need the latest iText version to achieve what you want.

Click this link if you want to see how to answer this question in iText 7.