Smarter Font Handling in iText pdfOptimizer: Consolidating Font Subsets

Intro

When optimizing PDF documents, fonts are often a major contributor to file size. Even text-heavy PDFs with little or no imagery can contain several font subsets derived from the same parent font, and each of them adds to the size of the output document.

PDF producers often embed subsetted fonts, which contain only the glyphs actually used in the document. This is usually efficient and saves space. But when the same font is subsetted multiple times (e.g., by different tools in the workflow), the document ends up with several copies of overlapping font data, which bloats the produced file unnecessarily.
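As a rough illustration of how such duplicates show up, the PDF specification names a font subset by prefixing the font name with a tag of exactly six uppercase letters and a plus sign. The sketch below groups font names by what remains after that tag is removed. The class `SubsetNames`, its methods, and the font names in `main` are made up for this example and are not part of iText or pdfOptimizer; a shared name is also only a hint that two subsets share a parent, not proof.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the iText or pdfOptimizer API.
final class SubsetNames {
    // A subset tag is exactly six uppercase letters followed by '+'.
    private static final Pattern SUBSET_TAG = Pattern.compile("^[A-Z]{6}\\+");

    static boolean isSubset(String baseFont) {
        return SUBSET_TAG.matcher(baseFont).find();
    }

    static String stripSubsetTag(String baseFont) {
        return SUBSET_TAG.matcher(baseFont).replaceFirst("");
    }

    // Groups subset font names by the name that remains after removing the tag.
    // Names without a subset tag are ignored.
    static Map<String, List<String>> groupByParentName(List<String> baseFonts) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String name : baseFonts) {
            if (isSubset(name)) {
                groups.computeIfAbsent(stripSubsetTag(name), k -> new ArrayList<>()).add(name);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // Made-up font names, as they might appear in /BaseFont entries.
        List<String> names = List.of(
                "ABCDEF+OpenSans-Regular",
                "GHIJKL+OpenSans-Regular",
                "MNOPQR+OpenSans-Bold",
                "Helvetica");
        System.out.println(groupByParentName(names));
    }
}
```

Any group with more than one entry is a candidate for consolidation: the same parent font has probably been subsetted more than once.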

What’s new?

The new optimization handler in pdfOptimizer 4.1.0 recognizes when multiple subsets belong to the same font. Instead of storing each subset separately, it merges them into a single, unified font that the whole document can use.

This works in three main steps:

1. Smart detection: pdfOptimizer carefully analyzes fonts to determine when subsets are really the same, without relying only on font names (which can be misleading).
2. Consolidation: All the glyphs used across the different subsets are collected and merged into one clean font program.
3. Cleanup: The redundant font data is removed, and the document is updated to reference the unified font.

The result is a more size-efficient PDF that avoids duplicate font data, without changing how the text looks or behaves.
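The three steps can be modeled roughly as follows. This is a conceptual sketch only, not pdfOptimizer's actual algorithm: the `Subset` type, the string stand-in for glyph outline data, and the assumption that glyph IDs mean the same thing in every subset are all simplifications introduced for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Conceptual model only: these types are hypothetical, not pdfOptimizer classes.
final class SubsetMergeSketch {

    // A subset modeled as a font name plus a map from glyph ID to outline data.
    static final class Subset {
        final String name;
        final Map<Integer, String> glyphs;

        Subset(String name, Map<Integer, String> glyphs) {
            this.name = name;
            this.glyphs = new TreeMap<>(glyphs);
        }
    }

    // Removes the six-letter subset tag, e.g. "ABCDEF+Demo" -> "Demo".
    static String parentName(String name) {
        return name.replaceFirst("^[A-Z]{6}\\+", "");
    }

    // Detection: same parent name AND identical data for every shared glyph ID,
    // so a matching name alone is not enough.
    static boolean sameParent(Subset a, Subset b) {
        if (!parentName(a.name).equals(parentName(b.name))) {
            return false;
        }
        for (Map.Entry<Integer, String> glyph : a.glyphs.entrySet()) {
            String other = b.glyphs.get(glyph.getKey());
            if (other != null && !other.equals(glyph.getValue())) {
                return false;
            }
        }
        return true;
    }

    // Consolidation and cleanup: returns a map from each original subset name
    // to the unified font that replaces it.
    static Map<String, Subset> merge(List<Subset> subsets) {
        List<Subset> unified = new ArrayList<>();
        Map<String, Subset> redirect = new HashMap<>();
        for (Subset subset : subsets) {
            Subset target = null;
            for (Subset candidate : unified) {
                if (sameParent(candidate, subset)) {
                    target = candidate;
                    break;
                }
            }
            if (target == null) {
                target = new Subset(parentName(subset.name), Map.of());
                unified.add(target);
            }
            target.glyphs.putAll(subset.glyphs);
            redirect.put(subset.name, target);
        }
        return redirect;
    }
}
```

In this model, two subsets named `ABCDEF+Demo` and `GHIJKL+Demo` with consistent glyph data end up pointing at one unified font holding the union of their glyphs, while a subset with the same name but conflicting glyph data is kept apart. The returned map stands in for the cleanup step: every reference to an old subset can be redirected to its unified font, after which the old font data is no longer needed.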

Because this new feature is built on the same optimization handler system already used by pdfOptimizer, it integrates smoothly into existing workflows: adding the FontMergingOptimizer (Java/.NET) to the optimization workflow takes a single line of code.

How it works

To see the impact of this feature in action, we created a small test file, just 36 KB in size. The file contained four different font subsets, all originating from the same base font.

Previously, pdfOptimizer treated these subsets as separate, leaving the redundant font data in place. With the new consolidation feature, pdfOptimizer is able to detect that all four subsets actually belong to the same font, merge them into one, and clean up the duplicates.

The result? The 36 KB file was reduced by roughly 45%, down to 20 KB, simply by consolidating the font subsets. And that’s with a relatively small, text-only PDF.
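To check the savings on your own files, the reduction can be computed from the sizes before and after optimization. The helper below is a minimal sketch; `SizeReport` and its methods are names invented for this example. Note that the sizes quoted above are rounded, so plugging them in gives a slightly lower figure than the one stated.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;

// Hypothetical helper for illustration; not part of pdfOptimizer.
final class SizeReport {

    // Percentage by which the optimized size is smaller than the original size.
    static double reductionPercent(long originalBytes, long optimizedBytes) {
        if (originalBytes <= 0) {
            throw new IllegalArgumentException("originalBytes must be positive");
        }
        return 100.0 * (originalBytes - optimizedBytes) / originalBytes;
    }

    // Same calculation, reading the sizes from two files on disk.
    static double reductionPercent(Path original, Path optimized) throws IOException {
        return reductionPercent(Files.size(original), Files.size(optimized));
    }

    public static void main(String[] args) {
        // Rounded sizes from the test described above: 36 KB before, 20 KB after.
        double percent = reductionPercent(36_000, 20_000);
        System.out.println(String.format(Locale.ROOT, "%.1f%%", percent)); // prints 44.4%
    }
}
```

The `Path` overload can be pointed at the input and output files of an optimization run to get the exact percentage.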

The following code consolidates the subsetted fonts:

Java:

    public void optimize(String inFile, String outFile) throws IOException {
        PdfOptimizer optimizer = new PdfOptimizer();
        // Add the new FontMergingOptimizer handler to the PdfOptimizer workflow
        optimizer.addOptimizationHandler(new FontMergingOptimizer());
        // try-with-resources closes both streams, even if optimization fails
        try (FileInputStream in = new FileInputStream(inFile);
             FileOutputStream out = new FileOutputStream(outFile)) {
            optimizer.optimize(in, out);
        }
    }

C#:

    public void Optimize(string inFile, string outFile)
    {
        PdfOptimizer optimizer = new PdfOptimizer();
        // Add the new FontMergingOptimizer handler to the PdfOptimizer workflow
        optimizer.AddOptimizationHandler(new FontMergingOptimizer());
        // The using statements dispose both streams, even if optimization fails
        using (FileStream input = new FileStream(inFile, FileMode.Open))
        using (FileStream output = new FileStream(outFile, FileMode.Create))
        {
            optimizer.Optimize(input, output);
        }
    }

Sample output

sample.pdf

sample_opt.pdf