Skip to main content
Skip table of contents

What are the extra characters in the font name of my PDF?

Why does it happen only for a few PDF file?

While extracting font name from PDF, I get some junk characters followed by plus sign and then the font name with font style. I want to remove the junk characters. I get those junk characters only for a few PDF file, for example: MMLPEO+RemingtonNoiseless

 

string curFont = renderInfo.GetFont().PostscriptFontName;

 

Posted on StackOverflow on May 16, 2013 by pdp

The "junk" characters indicate that the font isn't embedded completely. You'll find names such as ABC123+RemingtonNoiseless, XYZ456+RemingtonNoiseless, etc... meaning that there may be different subsets of the same font inside the PDF.

For an explanation have a look at section 9.6.4 Font Subsets of the PDF specification ISO 32000-1:2008:

For a font subset, the PostScript name of the font — the value of the font’s BaseFont entry and the font descriptor’s FontName entry — shall begin with a tag followed by a plus sign (+). The tag shall consist of exactly six uppercase letters; the choice of letters is arbitrary, but different subsets in the same PDF file shall have different tags.

EXAMPLE EOODIA+Poetica is the name of a subset of Poetica®, a Type 1 font.

In other words: these characters aren't merely "junk". If you want to remove them, that's a no-brainer, just use the appropriate string manipulation method, but be aware that removing them throws away information that may be useful in some contexts.

In iText 7 to get the PostScript font name you’ll need:

font.getFontProgram().getFontNames().getFontName();

Where font is a PdfFont instance.

Click What are the extra characters in the font name of my PDF? if you want to see how to answer this question in iText 5.

JavaScript errors detected

Please note, these errors can depend on your browser setup.

If this problem persists, please contact our support.