Release iText 5.2.0

Release Notes:

IMPORTANT

All 5.2.x versions have been removed from our servers because of a serious flaw that was introduced when dealing with large PDFs!

We announced a new iText release for March, but as 2012 is a leap year, we decided to release on February 29th. Otherwise we'd have to wait for another four years before we have the chance to release on such a special day.

The previous release was iText 5.1.3 (dated 11-11-11, another special day) and we've been working with iText 5.1.4-SNAPSHOT for a long time now, but eventually we decided not to release a version 5.1.4, but to skip to version 5.2.0.

The philosophy behind the version numbers is that you don't have to change any of your existing code when you upgrade from version x.y.z to version x.y.z+1. When we move from version x.y.z to version x.y+1.z, you may need to adapt some of your code.
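As an illustration of this numbering rule, here is a minimal sketch in Java. The class VersionRule and its methods are hypothetical helpers written for this example; they are not part of iText, and they only handle plain x.y.z strings (no suffixes such as -SNAPSHOT).

```java
// Hypothetical helper, not part of iText: applies the x.y.z rule described above.
final class VersionRule {

    /** Parses a plain "x.y.z" version string into three integers. */
    static int[] parse(String version) {
        String[] parts = version.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected x.y.z but got: " + version);
        }
        int[] numbers = new int[3];
        for (int i = 0; i < 3; i++) {
            numbers[i] = Integer.parseInt(parts[i]);
        }
        return numbers;
    }

    /** Returns true when x or y differs, meaning existing code may need changes. */
    static boolean mayRequireCodeChanges(String from, String to) {
        int[] a = parse(from);
        int[] b = parse(to);
        return a[0] != b[0] || a[1] != b[1];
    }

    public static void main(String[] args) {
        System.out.println(mayRequireCodeChanges("5.1.2", "5.1.3")); // false
        System.out.println(mayRequireCodeChanges("5.1.3", "5.2.0")); // true
    }
}
```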

In this case, you will need to replace your old itext-asian.jar with a new one, otherwise your code using CJKFont won't work. You'll find it in extrajars-2.2.zip. You'll also need to apply small changes (nothing more than changing package names) if your application depends on java.awt classes such as PdfGraphics2D. We have been experimenting with iText on Google Android and Google App Engine, and we reduced the dependency of iText on java.awt classes to a minimum.

What else is new?

We focused on two major fields:

iText 5.2.0: better PDF parsing

We received plenty of feedback regarding PDF parsing, and we've taken into account almost all the issues that were reported. This means that PDF to text conversion with iText has now improved dramatically. Soon the Belgian IRS will start using iText to parse thousands of documents looking for a national number on the first page.

We're using different strategies to do this: we parse the text at a specific position if we know it, or we parse the whole page looking for a pattern if the number can be anywhere on the page. We've also improved the parsing of PDF documents in languages such as Chinese, Korean, and Japanese.
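The second strategy (searching the whole page for a pattern) can be sketched with plain Java regular expressions. This is only an illustration under stated assumptions: the string below stands in for text that iText would have extracted from the first page, the class NationalNumberFinder is hypothetical, the number layout YY.MM.DD-XXX.CC is an assumed format, and the sample number is made up.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of iText: looks for a number pattern in page text.
final class NationalNumberFinder {

    // Assumed layout of the number: YY.MM.DD-XXX.CC (an assumption for this sketch).
    private static final Pattern NATIONAL_NUMBER =
            Pattern.compile("\\b\\d{2}\\.\\d{2}\\.\\d{2}-\\d{3}\\.\\d{2}\\b");

    /** Returns the first match found anywhere in the page text, if any. */
    static Optional<String> findFirst(String pageText) {
        Matcher matcher = NATIONAL_NUMBER.matcher(pageText);
        return matcher.find() ? Optional.of(matcher.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        // Stand-in for the text of page 1; in practice this would come from a PDF parser.
        String pageText = "Name: Jane Doe\nNational number: 85.07.30-033.28\nPage 1";
        System.out.println(findFirst(pageText).orElse("not found"));
    }
}
```

When the position of the number is known, the same matcher could be applied to the text of just that region, which reduces the chance of false matches.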

XML Worker 1.1.2: better HTML rendering

We received plenty of feedback regarding HTML parsing, and we've taken into account a lot of the issues that were reported. This means that HTML to PDF conversion with iText has improved dramatically. We still don't offer URL2PDF conversion. For instance, float still isn't supported, but version 1.1.2 of the XML Worker does a much better job at converting flowing HTML to PDF.

Besides the two major areas of interest of this release, we also introduced experimental SVG parsing, we filled some gaps regarding PAdES, we now support PDF files of over 2 GB (up to 10 GB for traditional PDFs and up to 1 TB for PDFs with a cross-reference stream), and we fixed some bugs.
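The two size limits can be related to how byte offsets are stored. In a classic cross-reference table, each entry stores the offset as a 10-digit decimal number, so the largest offset is 9,999,999,999 bytes (roughly 10 GB). In a cross-reference stream, offsets are stored as binary fields of a chosen width; the sketch below assumes that the 1 TB figure corresponds to a 5-byte field (2^40 bytes), which is an assumption on our part and not something stated in these notes. The class XrefLimits is a hypothetical helper written for this example.

```java
import java.util.Locale;

// Hypothetical helper, not part of iText: illustrates cross-reference offset limits.
final class XrefLimits {

    /** Formats an in-use entry of a classic cross-reference table (always 20 bytes). */
    static String classicEntry(long offset, int generation) {
        if (offset < 0 || offset > 9_999_999_999L) {
            throw new IllegalArgumentException("Offset does not fit in 10 digits: " + offset);
        }
        // 10-digit offset, space, 5-digit generation, space, 'n', then a 2-byte end of line.
        return String.format(Locale.ROOT, "%010d %05d n \n", offset, generation);
    }

    /** Largest offset that fits in an unsigned binary field of the given width in bytes. */
    static long maxOffsetForFieldWidth(int bytes) {
        if (bytes < 1 || bytes > 7) {
            throw new IllegalArgumentException("Width must be between 1 and 7 bytes");
        }
        return (1L << (8 * bytes)) - 1;
    }

    public static void main(String[] args) {
        // An offset beyond 2 GB still fits in a classic 10-digit entry.
        System.out.print(classicEntry(3_000_000_000L, 0));
        System.out.println(maxOffsetForFieldWidth(4)); // 4294967295 (just under 4 GiB)
        System.out.println(maxOffsetForFieldWidth(5)); // 1099511627775 (just under 1 TiB)
    }
}
```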

Changelog:

IMPORTANT: READ THIS BEFORE YOU UPGRADE!

If you use CJK fonts in your existing code, you will need to update the itext-asian.jar. You'll find this jar in extrajars-2.2.zip.
If you use AWT classes such as AffineTransform, you should switch to using the classes in package com.itextpdf.awt.geom.
If you use AWT-related classes such as PdfGraphics2D in your existing code, you'll have to make a minor change to your code. This class has moved to another package: com.itextpdf.awt.
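The package changes above usually come down to updating import statements. The following sketch shows the idea as a small rewrite helper. The class ImportMigration is hypothetical and not part of iText. The new package names come from the notes above; the old locations (com.itextpdf.text.pdf for PdfGraphics2D and FontMapper) are assumptions based on how earlier 5.x releases were organized, so please check them against your own code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of iText: rewrites import lines for moved classes.
final class ImportMigration {

    private static final Map<String, String> MOVED = new LinkedHashMap<>();
    static {
        // Old locations are assumptions; new locations are taken from the release notes.
        MOVED.put("com.itextpdf.text.pdf.PdfGraphics2D", "com.itextpdf.awt.PdfGraphics2D");
        MOVED.put("com.itextpdf.text.pdf.FontMapper", "com.itextpdf.awt.FontMapper");
        MOVED.put("java.awt.geom.AffineTransform", "com.itextpdf.awt.geom.AffineTransform");
    }

    /** Returns the updated import line, or the original line if nothing moved. */
    static String migrate(String sourceLine) {
        String trimmed = sourceLine.trim();
        for (Map.Entry<String, String> entry : MOVED.entrySet()) {
            if (trimmed.equals("import " + entry.getKey() + ";")) {
                return "import " + entry.getValue() + ";";
            }
        }
        return sourceLine;
    }

    public static void main(String[] args) {
        System.out.println(migrate("import com.itextpdf.text.pdf.PdfGraphics2D;"));
        System.out.println(migrate("import java.util.List;"));
    }
}
```

Note that the AffineTransform mapping only makes sense for code that passes the transform to iText; AWT code unrelated to iText can keep using java.awt.geom.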

iText 5.2.0

Changes made by Paulo Soares
- Digital signatures: Encapsulation of the basic OCSP response and correction for the CRL inclusion.
- Support for PAdES-LVT timestamp verification.
- Support digests in timestamps other than SHA-1.
- Unification of cmap handling. CJK fonts support all the encodings.
- Support for big PDFs over 2GB; you can now create 10GB PDFs with a classic cross-reference table and PDFs as big as 1TB with a cross-reference stream. (Suggestions by Welman Jordan)
- Added classes to Map in LtvTimestamp (generics).
- Replaced escape-method in SimpleNamedDestination
- PDF Parsing:
  - Made the getFont() method in PdfContentStreamProcessor private
  - Text extraction with CJK encodings such as GBK-EUC-H is now possible.
  - Several fixes when reading documents with fonts using the /ToUnicode entry.
  - Fix for strange numbers such as --234
  - Resource dictionaries may have direct fonts.
Changes made by Kevin Day
- PdfReader and related classes:
  - Better error messages and better handling of zero-sized files and of attempts to read past the end of the file.
  - Removed restriction that using memory mapping requires the file be smaller than ~2GB.
  - Avoid NullPointerException in RandomAccessFileOrArray
- PDF parsing:
  - Made a utility method in PdfContentStreamProcessor private and clarified the stateful nature of the class
  - LocationTextExtractionStrategy: bounds checking on string lengths and refactoring to make code easier to read.
  - Better handling of color space dictionaries in images.
  - Improved handling of quasi improper inline image content.
  - Don't decode inline image streams until we absolutely need them.
  - Avoid NullPointerException if a resource dictionary isn't provided.
Changes made by Eugene Markovskyi
- FontWeight is added to font descriptor of DocumentFont.
- Bugfix PRAcroForm: avoid NullPointerException
- Bugfix ColumnText: Image position should be shifted on descent of previous line.
- Bugfix BidiLine: Taking into account percentage width of LineSeparator.
Changes made by Alexander Chingarev
- PdfName: Added FontFamily tag.
- XfaForm: Fixed bug in XFA forms filling.
Changes made by Bruno
- Making iText more fool-proof: it's forbidden to construct, stroke or fill paths inside a text object.
- AWT-related changes to simplify creating the Android/GAE port of iText:
  - Bugfix by Ivan Farkas: Avoiding a NullPointerException in PdfStamperImp
  - Moved AWT related methods to the bottom of the source code of several class files (PdfContentByte, Barcode, Image, PdfImageObject).
  - Introduce Apache Harmony classes in a package com.itextpdf.awt.geom.
  - Removed several dependencies on AWT classes such as java.awt.Rectangle and java.awt.AffineTransform.
  - Moved PdfGraphics2D, FontMapper, and related classes to package com.itextpdf.awt.
- PDF Parsing: The RegionTextRenderFilter now works with com.itextpdf.text.Rectangle.
- PDF Parsing: It doesn't make sense to take zero length text into account; change made after Adam Read reported a StringIndexOutOfBoundsException on the mailing list (December 5, 2011).
- PdfConcatenate: removing a System.out.println() (originally added for debugging).
- Suggestion by Martin Pallmann to move the IllegalArgumentException out of the try/catch in ICC_Profile.

XML Worker 1.1.2

Changes made by Balder Van Camp
- Fix indentation of ordered lists: lists are set to autoindent if they are ordered; otherwise the numbering would overwrite the list item's text (bug reported by Stephen Bell on the mailing list for the iText C# version; a fix proposed by Keith O was adapted and added).
- Some javadoc fixes.
- Create abstraction for CssAppliers allowing developers to write their own CssAppliers class and in turn write their own CssApplier. The CssAppliers.getInstance() method has been removed in favor of injection into tag processors through the CssAppliersAware interface. The injection is done in the HtmlPipeline, and a custom CssAppliers implementation can be set in the HtmlPipelineContext. If none is set, the default CssAppliersImpl is used. This code change should not affect users but can affect classes that override current implementations.
- Remove quotes from font family names (based on http://itext-general.2136553.n4.nabble.com/XMLWorker-HTML-to-PDF-problem-with-external-css-td4373089.html)
Changes made by Jeroen Nouws
- Removed bug causing XMLWorker to crash when trying to parse Headers inside TableData.
Changes made by Eugene Markovskyi
- Using UNDEFINED value as default for font and color properties. Default leading is NaN.
- Applying font properties to paragraph
- The logic of max leading of paragraph is disabled (iText based logic should use multiplier leading of Paragraph).
- Clean up default margin properties for correct merging of paragraph css styles and para element attributes.
- Separate method for applying of font dependent CSS styles.
- Fixed uppercase/lowercase problems (using equalsIgnoreCase() and introducing the method CssUtils.stripDoubleSpacesTrimAndToLowerCase())
- Fixed RuntimeException: better handling of invalid nested tags
- Improved parsing of processing instructions.
- FontFactory versus FontProvider: introduced XMLWorkerFontProvider
- Introduction of the class LineHeightCalculator.
- Introduction of "valign" and "align" in the HTML class.
- Fixed several alignment, row height and line size issues in table cells.
- Fixed issues with font styles that weren't applied correctly.
- Fixed page break issues.
- ColumnText does not support resizing of image height. If a cell has a fixed height, an image with larger height disappeared from this cell.
- Fixed white space issues.
- Several code optimizations.
Changes made by Bruno Lowagie
- Introduced experimental code to parse SVG to a PdfTemplate. This works for tiger.svg, but it certainly doesn't work for all SVG files yet. This is a code contribution by VVB who wants to remain anonymous. The code was slightly adapted by Bruno.
- Removed dependencies on java.awt.
- When a tag like this is encountered: : XML Worker tries to make an absolute value of the width. This has now been fixed.