Release iText 5.4.0

Exactly 13 years after the first use of the name iText and the first official iText release, we're releasing iText 5.4.0.

Release Notes:

Thirteen years ago, on February 14th of the magical year 2000, Bruno published version 0.30 of a library he had been writing in his spare time. This library allowed developers to enhance their applications with simple PDF generation functionality without having to know anything about PDF syntax.

Being a fan of Donald Knuth, Bruno was looking for a name that sounded like TeX, but that was different enough for people not to confuse it with TeX. As the first versions of the library could only process text (images weren't supported until the summer of 2000), he experimented with variations on the words TeX and TeXt. At that time, everything was "e-": e-mail, e-marketing, e-Business... His first idea was to call the library "eTeXt", but he didn't like the sound of that word, so he changed it into iText.

Bruno is often asked whether he was inspired by Apple's product line, but he has never been a Mac user, so he didn't think of the iMac (1998), and the other devices that made the "i-" prefix popular came much later: the iPod (2001), the iPhone (2007) and the iPad (2010).

What's new in this release?

We have made a great effort to support PDF/UA. UA stands for Universal Accessibility, meaning that your documents are accessible to blind and visually impaired users. There's still some work to do, but with this release, you'll be able to create a PDF/UA-compliant PDF file "out of the box" by following these instructions:

"One of the main requirements of PDF/UA is that the PDF needs to be Tagged. You can achieve this by calling the PdfWriter.setTagged() method before opening the Document. This method sets a tagged flag that instructs iText to preserve the order of the content of all the high-level objects that are added to the Document. At the same time, iText will also create an appropriate structure tree. In this structure tree, the type of high-level object (Paragraph, PdfPTable, List,...) will determine the "role" of the structure element. You can programmatically change the default value in every implementation of the IAccessibleElement interface by using the setRole() method. As long as you stick to using high-level objects, your document will be tagged correctly."

For the PDF to be compliant with ISO 14289 (the PDF/UA standard), you also need to use the following methods:

- Document.addTitle(): gives the document a title,
- PdfWriter.setViewerPreferences(PdfWriter.DisplayDocTitle): makes sure the document title is shown in the viewer,
- Document.addLanguage(): indicates which language is used in the document,
- PdfWriter.createXmpMetadata(): creates an XMP stream and adds this stream as document-level metadata.

"Now when you run the resulting PDF through an accessibility checker, you'll see that it conforms to the PDF/UA standard."
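
The steps quoted above can be combined into one short program. What follows is a minimal sketch, assuming iText 5.4.0 (com.itextpdf:itextpdf) is on the classpath; the class name TaggedHello, the output file name and the texts are made up for illustration. It also shows how to override the default role of a high-level object with setRole(). Keep in mind that PDF/UA also requires all fonts to be embedded; this sketch keeps iText's default (non-embedded) font for brevity, so an accessibility checker may still flag that.

```java
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfWriter;

import java.io.FileOutputStream;
import java.io.IOException;

public class TaggedHello {

    public static void createPdf(String dest) throws IOException, DocumentException {
        Document document = new Document();
        PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(dest));
        // Tagging must be switched on before the document is opened
        writer.setTagged();
        // Show the document title (not the file name) in the viewer
        writer.setViewerPreferences(PdfWriter.DisplayDocTitle);
        // Title and language are metadata, so they must be added before open()
        document.addTitle("Hello PDF/UA");
        document.addLanguage("en-US");
        // Ask iText to write an XMP stream as document-level metadata
        writer.createXmpMetadata();
        document.open();
        // High-level objects get a default role; override it where needed
        Paragraph heading = new Paragraph("Hello PDF/UA");
        heading.setRole(PdfName.H1);
        document.add(heading);
        document.add(new Paragraph("This paragraph keeps its default role."));
        document.close();
    }

    public static void main(String[] args) throws IOException, DocumentException {
        createPdf(args.length > 0 ? args[0] : "hello_ua.pdf");
    }
}
```

Running main writes hello_ua.pdf (or the path passed as the first argument).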

This is what we call a "minimum implementation" of PDF/UA. In the future, we'll try to improve the API. For instance, in PDF 2.0, metadata will no longer be stored in an Info dictionary. This concept will be deprecated in favor of using an XMP stream. It probably makes sense for us to always create an XMP stream instead of keeping the createXmpMetadata() method optional. We may also choose to set the viewer preference by default for Tagged PDFs, and to throw an exception if you create a Tagged PDF without setting a title or defining a language.

Other PDF/UA functionality includes the manipulation of existing documents: we can already split and merge Tagged PDF documents without breaking the structure. In the near future, we'll take a closer look at filling out PDF/UA forms while maintaining their PDF/UA status. We've also been improving iText's text extraction capabilities. Whereas the 5.3.x series mainly brought new digital signature functionality, the 5.4.x series will bring more functionality for structured PDF. This includes Level A support for PDF/A. Better support for structured/unstructured documents is one of the main goals on our technical roadmap for 2013.

This doesn't mean we've stopped working on digital signatures, which was one of our major goals for 2012 (resulting in a 150-page book about PDF and digital signatures). In iText 5.4.0, we're switching from BouncyCastle 1.47 to BouncyCastle 1.48, we've fixed a problem with the creation of OCSP responses for use in a Document Security Store, we added a missing OID for an RSA algorithm, and so on.

Important: if you're using iText 5.3.5, upgrading to iText 5.4.0 is a must. The major I/O changes in 5.3.5, which introduced a new com.itextpdf.text.io package, caused a problem when using multiple threads: embedding fonts was no longer thread-safe. We've fixed this problem.

Surprisingly, we've also discovered some bugs that have been present in iText for a long time, but that surfaced recently:

- When an AcroForm template was prefilled, Adobe Reader would ask the end user whether they wanted to save the document, even if they hadn't changed anything.
- Prefilled fields were disappearing when filling out an AcroForm template.
- Changing the font in an AcroForm template didn't work when the optional /DR entry was missing from the form.
- PdfSmartCopy caused an OutOfMemoryException when it encountered a PDF with circular references (an object A referring to an object B that refers back to object A).
- A Chunk object was able to change the size of an Image, which caused strange side effects when the Image was used in a different context.
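
To see why a circular reference can make a copier loop forever, and how remembering already-copied objects avoids it, here is a generic Java sketch. It does not reproduce PdfSmartCopy's actual code: the Node class is a hypothetical stand-in for an indirect PDF object, and the technique shown (registering each object's copy in an IdentityHashMap before copying its children) is simply one common way to make a graph copy cycle-safe.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class CycleSafeCopy {

    /** Hypothetical stand-in for an indirect PDF object that may refer to other objects. */
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<Node>();

        Node(String name) {
            this.name = name;
        }
    }

    /** Copies the graph reachable from root; shared and circular references are preserved. */
    static Node copy(Node root) {
        return copy(root, new IdentityHashMap<Node, Node>());
    }

    private static Node copy(Node original, Map<Node, Node> copied) {
        Node existing = copied.get(original);
        if (existing != null) {
            // Already copied (or currently being copied): reuse it instead of recursing forever
            return existing;
        }
        Node clone = new Node(original.name);
        // Register the clone BEFORE visiting the children, so that a
        // reference back to 'original' finds it in the map
        copied.put(original, clone);
        for (Node ref : original.refs) {
            clone.refs.add(copy(ref, copied));
        }
        return clone;
    }

    public static void main(String[] args) {
        Node a = new Node("A");
        Node b = new Node("B");
        a.refs.add(b);
        b.refs.add(a); // A -> B -> A
        Node copyOfA = copy(a);
        Node copyOfB = copyOfA.refs.get(0);
        System.out.println(copyOfB.name + " refers back to " + copyOfB.refs.get(0).name);
        System.out.println(copyOfB.refs.get(0) == copyOfA);
    }
}
```

Running main prints "B refers back to A" followed by "true": the copies of A and B point at each other just like the originals, and the copy terminates instead of exhausting memory.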

The fact that these old bugs (some of which date from almost 10 years ago) are only now surfacing probably means that more and more people are using iText in ways Bruno never imagined. Back in 2000, he had no idea he would be announcing exciting new iText functionality on the library's 13th birthday!

Looking forward: what will the next release bring us? Apart from further PDF/UA improvements, we're also working on better support for ligatures. This would finally allow us to create documents in Hindi and other Indic languages. Rest assured: we won't wait until iText's 14th birthday to release this interesting functionality!

Changelog

iText 5.4.0

Changes made by Paulo Soares
- Make EOF consistent in class RandomAccessFileOrArray.
- Fix PdfStamper: avoid double closing; update Javadoc.
- Fix PdfReader: fixed confusing error message that reported the wrong value.
- Fix CMapAwareDocumentFont: Apply the font mapping first before overriding it with the ToUnicode.
- GlyphList: added support for char names of the format uniXXXX (with XXXX a hexadecimal value).
- Started working on Indic support based on code contributions by Palash Ray.
  This functionality has been disabled for this release, because it needs much more work.
Changes made by Kevin Day
- Refactoring IO: add optional exclusive lock capability
Changes made by Alexander Chingarev
- Added tagged DIV element support
- Fixed "incorrect reading order" problem on some PDF documents
- A fix for a multithreading issue introduced in 5.3.5 that occurred when embedding ttf fonts.
- Bugfix in PdfSmartCopy: circular references in PDFs (constructions where object A refers to object B and object B refers back to object A) could cause endless loops resulting in an OutOfMemoryException.
Changes made by Denis Koleda
- Adding tag attributes for PdfDiv, PdfPTable and lists
Changes made by Pavel Alay
- Fixed margin mirroring functionality.
- Fixed problems when copying/concatenating Tagged PDFs.
- Improved file size by removing unused objects after copying Tagged PDFs.
- Improved the tag structure in case the order of documents and pages is mixed.
- Create a nums tree for incorrectly tagged documents.
- Fixed a bug with PdfStructureTreeRoot.buildTree() for PdfWriter
- Throw an exception when trying to merge Tagged PDFs with an invalid structure
Changes made by Eugene Markovskyi
- Empty line processing in BidiLine: the remaining width of an empty line should be equal to original width of its container (be it ColumnText or PdfDocument).
- Fix layout problems when using consecutive spaces.
Changes made by Raf Hens
- MappedRandomAccessFile: fixed IndexOutOfBoundsException
Changes made by Bruno
- PdfReader: The method eliminateSharedStreams() now has to be called explicitly if you intend to change something in one specific stream.
- Avoiding a NullPointerException when using an ExternalBlankSignatureContainer
- The isRevocationValid() method shouldn't assume SHA-1 as digest algorithm.
- Fix for parsing PDFs in which the same glyph name can correspond to more than one character value.
- Fix: a Chunk shouldn't have the "power" to change the properties of an Image object. If the Image is also used in a different context, you risk unwanted side-effects (getting the image in a different size than you expected).
- If an OCSP response doesn't define a 'next update', we use the date of the OCSP response + 3 minutes.
- Applied suggestion by W Trevor King to add support for UTF-8 to FdfReader.
- EncryptionAlgorithms: Added missing OID for RSA.
- PdfContentByte: introduction of an isTagged() function that checks if the writer object isn't null before invoking writer.isTagged().
- AcroFields: changing the font with setFieldProperty() didn't work if no resources dictionary (/DR) was available.
- AcroFields: the boolean generateAppearances is true by default, which means we need to remove the /NeedAppearances entry. It will be reintroduced if somebody triggers setGenerateAppearances(false); Note that the presence of /NeedAppearances with value true causes recent versions of Adobe Reader to ask the end user if he wants to save the form, even if he didn't change anything.
- AcroFields bugfix: In case a prefilled form was flattened, the prefilled text fields were disappearing because they weren't regenerated.
- LtvVerification: Added a method that gets the issuing certificate of a certificate from a list of available certificates. This method is used when getting an OCSP response for a certificate (which requires the parent certificate).
- BouncyCastle upgrade: we're now using BouncyCastle 1.48 instead of BouncyCastle 1.47.
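
The OCSP entry in this changelog describes a simple fallback rule: no 'next update' means "response date + 3 minutes". As a minimal sketch of that rule only (the class and method names are hypothetical, and plain java.util.Date values stand in for the dates that, in practice, are read from the OCSP response with a library such as BouncyCastle):

```java
import java.util.Date;
import java.util.concurrent.TimeUnit;

public class OcspNextUpdate {

    /** Validity window used when the OCSP response has no 'next update'. */
    static final long FALLBACK_MILLIS = TimeUnit.MINUTES.toMillis(3);

    /**
     * Hypothetical helper: returns nextUpdate if the response defines one,
     * otherwise the response date plus three minutes.
     */
    static Date effectiveNextUpdate(Date responseDate, Date nextUpdate) {
        if (nextUpdate != null) {
            return nextUpdate;
        }
        return new Date(responseDate.getTime() + FALLBACK_MILLIS);
    }

    public static void main(String[] args) {
        Date responseDate = new Date(0L);
        System.out.println(effectiveNextUpdate(responseDate, null).getTime()); // prints 180000
    }
}
```

With a response date of 0 ms and no 'next update', the effective next update is 180000 ms, i.e. three minutes later.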

XML Worker 5.4.0

Changes made by Eugene Markovskyi
- Fixed issues with positioning.
- Fix layout problems when using consecutive spaces.

iText RUPS 5.4.0

Changes made by Bruno
- Adding missing brackets for String values in the content stream
- Solved the problem with missing page table (in some cases)
Changes made by Raf Hens
- Fixed a problem in TextAreaOutputStream
Changes made by Jens Ponnet
- Make it possible to view image streams as real images
- Added error handling: some pages/page dictionaries were throwing NullPointerExceptions due to having no parent node.