How to create a PDF with font information and embed the actual font while merging the files into a single PDF?
I create different PDFs and then concatenate them into a single PDF. My resulting PDF is a lot bigger than I had expected in file size. As it turns out, my PDF has a ton of duplicate fonts, and this is the reason why it's so big. I would like to create PDFs which only embed font information, not the full font. Then when I merge these PDFs into a single document, I want to insert the actual font needed by the PDF.
Posted on StackOverflow on Feb 24, 2014 by pixerce
I've created the MergeAndAddFont example to explain the different options.
We'll create PDFs using this code snippet:
public void createPdf(String filename, String text, boolean embedded, boolean subset)
    throws DocumentException, IOException {
    // step 1
    Document document = new Document();
    // step 2
    PdfWriter.getInstance(document, new FileOutputStream(filename));
    // step 3
    document.open();
    // step 4
    // FONT is a constant holding the path to the font file (Gravitas One in this example)
    BaseFont bf = BaseFont.createFont(FONT, BaseFont.WINANSI, embedded);
    bf.setSubset(subset);
    Font font = new Font(bf, 12);
    document.add(new Paragraph(text, font));
    // step 5
    document.close();
}
We use this code to create three test files (1, 2 and 3), and we do this three times (A, B and C).
The first time, we use the parameters embedded = true and subset = true, resulting in the files testA1.pdf with text "abcdefgh" (3.71 KB), testA2.pdf with text "ijklmnopq" (3.49 KB) and testA3.pdf with text "rstuvwxyz" (3.55 KB). The font is embedded and the file size is relatively low because we only embed a subset of the font.
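To make the idea of a subset concrete, here is a minimal sketch in plain Java (no iText involved) that lists the distinct characters each test text needs. The class and method names are hypothetical, and a real subset also carries extra data such as a fallback glyph and font tables, so treat this only as an illustration of why the three subsets have nothing in common.

```java
import java.util.Set;
import java.util.TreeSet;

class SubsetGlyphs {
    // Distinct characters used by a text; a subsetted font only needs glyphs for these.
    static Set<Character> usedCharacters(String text) {
        Set<Character> used = new TreeSet<>();
        for (char c : text.toCharArray()) {
            used.add(c);
        }
        return used;
    }

    // Characters that both texts need, i.e. glyphs two subsets would have in common.
    static Set<Character> shared(String a, String b) {
        Set<Character> common = usedCharacters(a);
        common.retainAll(usedCharacters(b));
        return common;
    }

    public static void main(String[] args) {
        String[] texts = {"abcdefgh", "ijklmnopq", "rstuvwxyz"};
        for (String text : texts) {
            System.out.println(text + " needs " + usedCharacters(text));
        }
        System.out.println("shared by texts 1 and 2: " + shared(texts[0], texts[1]));
    }
}
```

Because the three texts use disjoint sets of letters, each file ends up with a different font program, even though all three come from the same font file.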
Now we merge these files with the following code; the smart parameter indicates whether we want to use PdfCopy or PdfSmartCopy:
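public void mergeFiles(String[] files, String result, boolean smart)
    throws IOException, DocumentException {
    Document document = new Document();
    PdfCopy copy;
    if (smart)
        copy = new PdfSmartCopy(document, new FileOutputStream(result));
    else
        copy = new PdfCopy(document, new FileOutputStream(result));
    document.open();
    // one reader per input file
    PdfReader[] reader = new PdfReader[files.length];
    // NOTE: the listing is cut off after "for (int i = 0; i" in the source;
    // the rest of this method is an assumed completion.
    for (int i = 0; i < files.length; i++) {
        reader[i] = new PdfReader(files[i]);
        copy.addDocument(reader[i]);
    }
    document.close();
    for (PdfReader r : reader) {
        r.close();
    }
}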
When we merge the document, be it with PdfCopy or PdfSmartCopy, the different subsets of the same font will be copied as separate objects in the resulting PDF testA_merged1.pdf / testA_merged2.pdf (both 9.75 KB).
This is the problem you are experiencing: PdfSmartCopy can detect and reuse identical objects, but the different subsets of the same font aren't identical and iText can't merge different subsets of the same font into one font.
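The following plain-Java sketch illustrates the general idea of reusing identical objects by comparing a digest of their bytes. It is not iText's implementation; the StreamDeduplicator class, its methods and the use of SHA-256 are assumptions made for illustration only.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

class StreamDeduplicator {
    // Maps a content digest to the object number under which that content was stored.
    private final Map<String, Integer> seen = new HashMap<>();
    private int nextObjectNumber = 1;

    // Returns the object number for the content; byte-identical content reuses the existing number.
    int add(byte[] content) {
        String key = digest(content);
        Integer existing = seen.get(key);
        if (existing != null) {
            return existing;
        }
        int number = nextObjectNumber++;
        seen.put(key, number);
        return number;
    }

    // Number of distinct objects that would actually be written.
    int storedObjects() {
        return seen.size();
    }

    private static String digest(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        StreamDeduplicator dedup = new StreamDeduplicator();
        byte[] fullFont = {1, 2, 3, 4, 5, 6};   // stand-in for a full font program
        byte[] subsetA = {1, 2};                // stand-ins for two different subsets
        byte[] subsetB = {3, 4};
        dedup.add(fullFont);
        dedup.add(fullFont.clone());            // same bytes: reused
        dedup.add(subsetA);                     // different bytes: stored separately
        dedup.add(subsetB);
        System.out.println("objects stored: " + dedup.storedObjects());
    }
}
```

A scheme like this can only collapse byte-identical content. Two subsets of the same font contain different glyph data, so they never match, which is consistent with the 9.75 KB result for both merge variants.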
The second time, we use the parameters embedded = true and subset = false, resulting in the files testB1.pdf (21.38 KB), testB2.pdf (21.38 KB) and testB3.pdf (21.38 KB). The font is fully embedded and the file size of a single file is a lot bigger than before because the full font is embedded.
If we merge the files using PdfCopy, the font will be present in the merged document redundantly, resulting in the bloated file testB_merged1.pdf (63.16 KB). This is definitely not what you want!
However, if we use PdfSmartCopy, iText detects an identical font stream and reuses it, resulting in testB_merged2.pdf (21.95 KB), which is much smaller than what we had with PdfCopy. It's still bigger than the document with the subsetted fonts, but if you're concatenating a huge number of files, the result will be better if you embed the complete font.
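As a rough back-of-the-envelope check of that trade-off, the sketch below extrapolates from the sizes quoted in this answer. It assumes, purely for illustration, that merged sizes grow linearly with the number of files and that every file is similar to the test files; real documents will behave differently, and the class name is hypothetical.

```java
class MergeSizeEstimate {
    // Sizes in KB, taken from the three-file measurements quoted in the text.
    static final double SUBSET_MERGED_3 = 9.75;      // testA_merged1.pdf / testA_merged2.pdf
    static final double FULL_SMART_MERGED_3 = 21.95; // testB_merged2.pdf
    static final double NO_FONT_MERGED_3 = 2.6;      // testC_merged1.pdf

    // Assumption: every file contributes its own subset, so size grows linearly.
    static double subsetEstimate(int files) {
        return files * SUBSET_MERGED_3 / 3.0;
    }

    // Assumption: the full font is stored once, plus a per-file cost equal to the font-less page.
    static double fullSmartEstimate(int files) {
        double fontStoredOnce = FULL_SMART_MERGED_3 - NO_FONT_MERGED_3;
        return fontStoredOnce + files * NO_FONT_MERGED_3 / 3.0;
    }

    // Smallest number of files for which the full font with PdfSmartCopy is estimated to be smaller.
    static int breakEven() {
        int files = 1;
        while (subsetEstimate(files) <= fullSmartEstimate(files)) {
            files++;
        }
        return files;
    }

    public static void main(String[] args) {
        System.out.println("estimated break-even at " + breakEven() + " files");
    }
}
```

Under these assumptions the full font starts to win at around nine files of this kind. The exact number matters less than the shape: the subset approach pays a font cost per file, while the full font is paid for once.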
The third time, we use the parameters embedded = false and subset = false, resulting in the files testC1.pdf (2.04 KB), testC2.pdf (2.04 KB) and testC3.pdf (2.04 KB). The font isn't embedded, resulting in an excellent file size, but if you compare with one of the previous results, you'll see that the font looks completely different.
We merge the files using PdfSmartCopy, resulting in testC_merged1.pdf (2.6 KB). Again, we have an excellent file size, but again we have the problem that the font isn't visualized correctly.
To fix this, we need to embed the font:
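private void embedFont(String merged, String fontfile, String result)
    throws IOException, DocumentException {
    // the font file
    RandomAccessFile raf = new RandomAccessFile(fontfile, "r");
    byte[] fontbytes = new byte[(int) raf.length()];
    raf.readFully(fontbytes);
    raf.close();
    // create a new stream for the font file
    PdfStream stream = new PdfStream(fontbytes);
    stream.flateCompress();
    stream.put(PdfName.LENGTH1, new PdfNumber(fontbytes.length));
    // create a reader object
    PdfReader reader = new PdfReader(merged);
    int n = reader.getXrefSize();
    PdfObject object;
    PdfDictionary font;
    PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(result));
    // the PostScript name of the font we want to embed
    PdfName fontname = new PdfName(
        BaseFont.createFont(fontfile, BaseFont.WINANSI, BaseFont.NOT_EMBEDDED)
            .getPostscriptFontName());
    // NOTE: the listing is cut off after "for (int i = 0; i" in the source;
    // everything below is an assumed completion.
    for (int i = 0; i < n; i++) {
        object = reader.getPdfObject(i);
        if (object == null || !object.isDictionary())
            continue;
        font = (PdfDictionary) object;
        // attach the font stream to the font descriptor with the matching name
        if (PdfName.FONTDESCRIPTOR.equals(font.get(PdfName.TYPE))
            && fontname.equals(font.get(PdfName.FONTNAME))) {
            PdfIndirectObject objref = stamper.getWriter().addToBody(stream);
            font.put(PdfName.FONTFILE2, objref.getIndirectReference());
        }
    }
    stamper.close();
    reader.close();
}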
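Now we have the file testC_merged2.pdf (22.03 KB), and that's actually the answer to your question. As you can see, the second option (embedding the full font and merging with PdfSmartCopy, 21.95 KB) is better than this third option: the result is slightly smaller and needs no post-processing.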
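The embedFont method stores the font compressed but records the uncompressed size in the Length1 entry. The plain-Java sketch below shows that relationship using java.util.zip, which produces the same zlib/deflate format that Flate-compressed PDF streams use. The class and method names are hypothetical, and the sample bytes are only a stand-in for a real font file.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class FontStreamLengths {
    // Compresses the raw font bytes, comparable to what happens to the stream content.
    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int written = deflater.deflate(buffer);
            out.write(buffer, 0, written);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompresses the stream; length1 plays the role of the Length1 entry (decoded size).
    static byte[] inflate(byte[] compressed, int length1) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] raw = new byte[length1];
        int offset = 0;
        try {
            while (offset < length1 && !inflater.finished()) {
                int read = inflater.inflate(raw, offset, length1 - offset);
                if (read == 0 && inflater.needsInput()) {
                    break; // compressed data ended early
                }
                offset += read;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("not valid deflate data", e);
        } finally {
            inflater.end();
        }
        if (offset != length1) {
            throw new IllegalStateException("decoded length does not match Length1");
        }
        return raw;
    }

    public static void main(String[] args) {
        byte[] fontbytes = new byte[20000];          // stand-in for the bytes of a font file
        for (int i = 0; i < fontbytes.length; i++) {
            fontbytes[i] = (byte) (i % 13);
        }
        byte[] stored = deflate(fontbytes);
        System.out.println("Length1 (decoded): " + fontbytes.length);
        System.out.println("stored (compressed): " + stored.length);
    }
}
```

This is why the code calls flateCompress() first and then sets Length1 from fontbytes.length rather than from the compressed size: a viewer needs the decoded length of the font program.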
Caveats: This example uses the Gravitas One font as a simple font. As soon as you use the font as a composite font (you tell iText to use it as a composite font by choosing the encoding IDENTITY-H or IDENTITY-V), you can no longer choose whether or not to embed the font, or whether or not to subset it. As defined in ISO-32000-1, iText will always embed composite fonts and will always subset them.
This means that you can't use the above solutions when you need special fonts (Chinese, Japanese, Korean). In that case, you shouldn't embed the fonts, but use so-called CJK fonts. These CJK fonts rely on font packs that can be downloaded by Adobe Reader.