Release iText 5.4.4

When we released iText 5.4.3, we promised that we would continue working on PDF/A, more specifically that we would provide all the checks needed to ensure that the file you're creating is either PDF/A-2 or PDF/A-3, just as we did for PDF/A-1.

iText 5.4.4 brings interesting new functionality in the area of accessibility (PDF/UA), more specifically when forms are involved. The next release will again focus on archiving (PDF/A).

Release Notes

Merging accessible files

In iText 5.3.4, we enhanced PdfCopy so that it would preserve the StructTreeRoot. Before that version, all structure information was lost when concatenating different PDF files that were tagged. In plain language: since 5.3.4, you can merge tagged PDF files, and tagged PDF is essential for accessible files (in the US: files that are compliant with Section 508). Granted, the functionality was still experimental in version 5.3.4 and we fixed plenty of bugs in later versions, but one of our customers bumped into a very specific problem: they wanted to concatenate Tagged PDF files containing form fields.

As documented, PdfCopy didn't support merging forms. When dealing with forms, we used to advise the use of PdfCopyFields instead. Unfortunately, PdfCopyFields doesn't support merging Tagged PDFs. Our customer faced a dilemma: either merge accessible files and lose the forms, or merge the forms and lose the accessibility. Neither choice was acceptable for the customer of our customer, so we made their requirement our priority, resulting in iText 5.4.4.

From now on, you can merge forms and preserve the tagged PDF structure when using the addDocument() method in PdfCopy. At the same time, we've deprecated PdfCopyFields.

Flattening accessible files

While one team was working on merging forms, one developer experimented with an alternative solution: what if it were possible to flatten a filled-out form while preserving not only the structure tree root, but also the reading order of the content in the content stream (the latter being overkill according to the specs)?

You could then think of a scenario where you flatten the forms first and merge them afterwards, preserving the Section 508 compatibility throughout the workflow. We ended up with code that works for many forms, but not for all (the pitfalls are documented in the source code). We decided to ship this experimental code in the xtra package, so that those who need it can take a look at it and see if it meets their needs.

Accessibility

PDF/UA, Section 508 and accessibility: we've used those words frequently when announcing our plans for 2013. In this new release, you'll discover that we've fixed a number of issues related to tagged PDF and structure: table borders are now marked as artifacts, images that were tagged incorrectly in some cases are now tagged correctly, links are now added to the structure tree correctly, and so on.

The more customers join our accessibility efforts, the better the accessibility functionality gets. We're almost there!

Performance

Two customers informed us that the performance of the latest versions of iText was worse than the performance of some earlier versions. We discovered that the use of the Java UUID class in combination with some specific JVM implementations on Linux was indeed slowing iText down. We removed the dependency on UUID, fixing the problem completely.
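To illustrate the idea behind this fix, here is a minimal sketch of an identifier that doesn't rely on java.util.UUID. It assumes that identifiers only need to be unique within a single JVM run. The class name ElementId is hypothetical; it is not iText's actual AccessibleElementId. A plausible explanation for the slowdown (an assumption, not something stated in these notes) is that UUID.randomUUID() draws from SecureRandom, which can block on some Linux setups while waiting for entropy. A plain counter avoids that cost.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical stand-in for a lightweight element identifier.
 * Not iText's real AccessibleElementId; names and design are assumptions.
 */
final class ElementId implements Comparable<ElementId> {
    // One shared counter per JVM; incrementing it is cheap and thread-safe.
    private static final AtomicInteger COUNTER = new AtomicInteger();

    private final int id;

    ElementId() {
        // Each new instance takes the next value, so ids are unique per run.
        this.id = COUNTER.incrementAndGet();
    }

    int value() {
        return id;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof ElementId && ((ElementId) o).id == id;
    }

    @Override
    public int hashCode() {
        return id;
    }

    @Override
    public int compareTo(ElementId other) {
        // Ids created later compare as greater.
        return Integer.compare(id, other.id);
    }

    @Override
    public String toString() {
        return Integer.toString(id);
    }
}
```

The trade-off in this sketch is that such identifiers are not globally unique the way UUIDs are; that only matters if ids have to survive beyond a single process.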

Unfortunately, our profiling tests indicated that the performance of iText has indeed decreased in cases where PdfPTable is used. This can't be avoided because the PdfPTable code is now much more accurate than it used to be. The only way to improve the performance in this case is to use PdfPTable in a different way.

Images

iText supports a wide variety of images, but even within one specific image type, there can be a wide variety of flavors. That's an understatement when looking at TIFF. Again we've discovered a strange phenomenon: a specific type of TIFF file appeared as a pink image instead of a white image when it was added to a PDF using iText. That's fixed now.

We also fixed problems that occurred when manipulating existing PDFs that contain JBIG2 images, and existing PDFs in which the /Length parameter of the image stream is one byte off.

Invalid PDFs

The more customers we have, the more weird PDFs are sent to us. PDFs using names with a length greater than 127, PDFs without a root dictionary, PDFs with a page tree that refers to page dictionaries that are null,... In many cases, iText threw a NullPointerException because these PDFs are invalid and there's very little you can do about it.

Now we've started changing these NullPointerExceptions into InvalidPdfExceptions informing the users what is wrong with the PDF file. Note that this isn't always possible: in many cases human eyes are needed to see what is wrong with the file.
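The pattern described here can be sketched as follows: check explicitly for the structures that a valid PDF must have, and throw an exception with a descriptive message instead of letting a null reference surface later. This is only an illustration under stated assumptions. Dictionaries are modelled as plain java.util.Map instances, and the names InvalidPdfSketchException and PageTreeLookup are hypothetical; they are not iText classes and this is not iText's actual parsing code.

```java
import java.io.IOException;
import java.util.Map;

/** Hypothetical stand-in for a "this PDF is invalid" exception. */
class InvalidPdfSketchException extends IOException {
    InvalidPdfSketchException(String message) {
        super(message);
    }
}

/** Hypothetical helper; PDF dictionaries are modelled as plain maps. */
final class PageTreeLookup {
    private PageTreeLookup() {
    }

    /**
     * Returns the /Pages dictionary referenced by the /Root dictionary,
     * or throws an exception that tells the user what is missing.
     */
    static Map<?, ?> findPageTree(Map<String, ?> trailer)
            throws InvalidPdfSketchException {
        Object root = trailer.get("Root");
        if (!(root instanceof Map)) {
            // Without this check, the next lookup would fail with a
            // NullPointerException that says nothing about the cause.
            throw new InvalidPdfSketchException(
                    "The document has no catalog (root dictionary).");
        }
        Object pages = ((Map<?, ?>) root).get("Pages");
        if (!(pages instanceof Map)) {
            throw new InvalidPdfSketchException(
                    "The document has no page tree (/Pages is missing or null).");
        }
        return (Map<?, ?>) pages;
    }
}
```

A caller can then catch the exception and show its message to the user, rather than a stack trace that points into the parser.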

Changelog

iText Core 5.4.4

Changes made by Alexander Chingarev
- Performance improvement: replacing AccessibleElement UUID with AccessibleElementId
- Table borders and backgrounds in tagged PDF are now artifacts.
- Tag image correctly when adding it directly to document (not wrapping to chunk).
- Fixed a problem with incorrect link insertion into structTree.
- Fix: annotation structures are now properly copied when merging documents with PdfCopy/PdfSmartCopy.
- Removed obsolete PdfCopy functionality.
- Fixed issue with mixed tags when merging tagged PDF documents with PdfCopy.
- Deprecating PdfCopyFields
- PdfCopy: introduced a method that allows you to add a document, preserving the form fields.
- Content parsing: fix for color parsing.
Changes made by Pavel Alay
- Fixed bug in GetCOName() in PdfCopy (formerly in PdfCopyFields)
- Improve memory usage in PdfStructureElement.
Changes made by Eugene Markovskyi
- Replaced the iText XMP implementation with Adobe's XMP Core library
- Fixed incorrect line wrapping for Chinese characters. Type0 DocumentFont should use the metrics of DescendentFont(/DW, /W...) instead of the predefined ones in cjkMirror(CJKFont). Unfortunately, the fix looks like a workaround because we've tried to keep backward compatibility. Refactoring of the font functionality is scheduled for 2014.
Changes made by Raf Hens
- Respect the NeedAppearances setting of a PDF that is read, set generateAppearances accordingly.
- PdfCopyFields: Enable NeedAppearances in the output when one of the input documents has it enabled.
- Fix for inline images that have 1 byte more than expected.
- Added support for TIFFs with "new style" JPEG compression and photometric RGB.
Changes made by Michaël Demey
- PdfWriter.getBoxSize now has an overloaded method that returns the intersection of a box (crop, bleed, art, ...) with the given rectangle.
Changes made by Bruno Lowagie
- Fix: Signed attributes aren't always DER encoded.
- Fix: Make sure the correct digest algorithm is used; subfilter adbe.pkcs7.sha1 only supports SHA1 as digest algorithm.
- Make sure you can use OCGRemover in case the /Contents of a page is represented as an array instead of as a stream.
- OCG functionality: Add some checks to avoid a NullPointerException.
- Experimental code to flatten forms in a tagged PDF, preserving the accessibility (Section 508).
- Dealing with PDFs of which the root of the page tree refers to an object with number 0 (invalid PDF syntax), throwing an InvalidPdfException instead of a NullPointerException.
- Dealing with PDFs of which the root of the page tree refers to an object that doesn't exist (no page tree available), throwing an InvalidPdfException instead of a NullPointerException.
- Dealing with PDFs of which the root dictionary is missing, throwing an InvalidPdfException instead of a NullPointerException.
- When an InvalidPdfException is encountered, objects shouldn't just be considered as being "null" (unless in debugmode, for instance when you want to look at the file using RUPS).
- Bugfix: The end-of-line marker may not be taken into account when measuring the length of a stream.
- Bugfix: flattening fields didn't work if a combined field/widget dictionary was present in the Fields array, but not in the page Annots.
- Fixed ArrayIndexOutOfBoundsException reported by Ivan Gregor in case an existing PDF has empty ID values.
- PdfImage: moved the code that deals with transparency outside the "Raw Image" area as proposed by Ivan Gregor.

iText RUPS 5.4.4

Changes made by Michaël Demey
- UX: focus to password field on password protected files
- UX: move cursor to the start of the content when opening a content stream
- Added copy and clear functionality to the console panel.
- Added copy to clipboard functionality to the stream panel. Copying with no selected text copies the entire text.
- Added save to file to stream panel context menu (also allows saving only a selection of the complete stream).
- Added save to file functionality for streams in the PdfTree panel.
Changes made by Bruno Lowagie
- Cleaned up metadata, such as a reference to an old version number as well as to the "working title" for RUPS (my initial idea was to call it "Trapeze").
- Bugfix: password protected files couldn't be opened (not even after providing a correct password).
- Bugfix: Dictionaries weren't rendered correctly when present in a content stream.
- Bugfix: Hexadecimal strings weren't rendered correctly.
- Added a tab that shows the Structure tree of the document.
- Introduced debugmode so that some invalid PDF files (that throw an exception in iText) can be viewed anyway (even if only partially).