pdfHTML: Using emojis in iText
Introduction
From their humble beginnings in 1999, emojis have become a staple of digital communication, and most document and communication formats support them in one form or another. The PDF format is no exception.
Emojis may look like small images in the traditional sense, but they are much closer to character or symbol glyphs: you can select, copy and paste them, change their size, and so on. This also means that each emoji is identified by a Unicode code point. For example, the grinning face with big eyes emoji (😃) has the code point U+1F603.
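To make the code-point view concrete, the JDK-only sketch below inspects U+1F603 using standard java.lang.Character and String methods. The class name and the formatCodePoint helper are hypothetical and exist only for illustration. The sketch shows that the emoji is a single code point but occupies two char values in a Java String.

```java
public class EmojiCodePoint {

    // Hypothetical helper: formats a code point in the usual "U+XXXX" notation.
    static String formatCodePoint(int codePoint) {
        return String.format("U+%04X", codePoint);
    }

    public static void main(String[] args) {
        int grinning = 0x1F603; // the emoji from the text, as an int code point

        // U+1F603 is above U+FFFF, so it is a supplementary code point.
        System.out.println(Character.isSupplementaryCodePoint(grinning)); // true

        // Build a String that contains only this emoji.
        String emoji = new String(Character.toChars(grinning));

        System.out.println(emoji.length());                          // 2 (UTF-16 code units)
        System.out.println(emoji.codePointCount(0, emoji.length())); // 1 (one code point)
        System.out.println(formatCodePoint(emoji.codePointAt(0)));   // U+1F603
    }
}
```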
This is convenient, because it means we can add emojis to a document with the escape sequences of our programming language of choice, even when we cannot type the emojis directly into the source code. The only thing to keep in mind is the same as for non-Latin scripts such as Chinese, Greek or Hindi: we need a font program that can draw these characters, because they are not included in the 14 standard PDF fonts.
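On the HTML side, the matching escape is HTML's numeric character reference, such as &#x1F603;, which takes the full code point in hexadecimal. The sketch below generates one from Java. The name toHtmlReference and the surrounding markup are hypothetical. Also, the emoji will still render only if a font that contains it is available during conversion.

```java
public class EmojiHtmlReference {

    // Hypothetical helper: turns a code point into an HTML numeric character
    // reference such as "&#x1F603;". This form accepts any valid code point,
    // including those above U+FFFF.
    static String toHtmlReference(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("Invalid code point: " + codePoint);
        }
        return String.format("&#x%X;", codePoint);
    }

    public static void main(String[] args) {
        String html = "<p>Hello " + toHtmlReference(0x1F603) + "</p>";
        System.out.println(html); // <p>Hello &#x1F603;</p>
    }
}
```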
How it works
When converting HTML files to PDF documents, including emojis is simple and straightforward: we add a font that contains the emoji glyphs to a FontProvider and pass that FontProvider to the HtmlConverter (through its ConverterProperties) so it is used during conversion. When creating PDF documents directly, however, there is an extra step. Java's \uXXXX escape accepts exactly four hexadecimal digits, so it can only express code points up to U+FFFF. Emojis such as U+1F603 lie above that range and are stored in a Java String as two UTF-16 code units (a surrogate pair). We therefore need a helper method that splits the code point into those two escaped characters.
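A helper of this kind can be sketched in plain JDK code. The name toSurrogatePair is hypothetical, and this is not necessarily how the iText sample implements it. The arithmetic is the standard UTF-16 encoding: subtract 0x10000, put the top 10 bits into a high surrogate (0xD800 + bits) and the bottom 10 bits into a low surrogate (0xDC00 + bits). The JDK's Character.toChars(int) performs the same conversion, so the sketch uses it as a cross-check.

```java
public class EmojiSurrogates {

    // Hypothetical helper: splits a supplementary code point (U+10000..U+10FFFF)
    // into its UTF-16 surrogate pair and returns the two chars as a String.
    static String toSurrogatePair(int codePoint) {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException(
                    "Not a supplementary code point: U+" + Integer.toHexString(codePoint).toUpperCase());
        }
        int offset = codePoint - 0x10000;              // 20-bit value
        char high = (char) (0xD800 + (offset >>> 10)); // top 10 bits
        char low = (char) (0xDC00 + (offset & 0x3FF)); // bottom 10 bits
        return new String(new char[] {high, low});
    }

    public static void main(String[] args) {
        String emoji = toSurrogatePair(0x1F603);

        // The same pair written as two Java escapes: high D83D, low DE03.
        System.out.println(emoji.equals("\uD83D\uDE03"));                          // true
        // Cross-check against the JDK's built-in conversion.
        System.out.println(emoji.equals(new String(Character.toChars(0x1F603)))); // true
    }
}
```

The resulting String can be added to iText layout elements like any other text, provided the element uses a font that contains the emoji glyph.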
Below you can find a sample that shows both approaches: the #fromHtml() method converts an HTML file, while the #createEmojiDocument() method creates a PDF document directly and uses one possible implementation of such a helper method to add emojis.
The link below provides more information about Unicode code points and how they are used: