Before we start: Overview of the classes and interfaces
When we talk about iText's basic building blocks, we refer to all classes that implement the IElement interface. iText was originally written in Java and later ported to C#. Because of our experience with both programming languages, we've adopted the convenient habit, typical for C# developers, of starting every interface name with the letter I.
Figure 0.1 shows an overview of the relationship between IElement and some other interfaces.
At the top of the hierarchy, we find the IPropertyContainer interface. This interface defines methods to set, get, and delete properties. It has two direct subinterfaces: IElement and IRenderer. The IElement interface will be implemented by objects such as Text, Paragraph and Table. These are the objects that we'll add to a document, either directly or indirectly. The IRenderer interface will be implemented by objects such as TextRenderer, ParagraphRenderer and TableRenderer. These renderers are used internally by iText, but we can subclass them if we want to tweak the way an object is rendered.
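The hierarchy described above can be modeled in a few lines of plain Java. This is a minimal sketch, not iText's source: the interface names come from the text, but the method signatures, the map-backed PropertyBag helper, the SketchParagraph and SketchParagraphRenderer classes, and the FONT_SIZE key are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-ins, NOT iText's real interfaces: names mirror the text,
// signatures and property keys are assumptions made for illustration.
interface IPropertyContainer {
    void setProperty(int property, Object value);
    <T> T getProperty(int property);
    void deleteProperty(int property);
}

// The two direct subinterfaces mentioned in the text.
interface IElement extends IPropertyContainer { }
interface IRenderer extends IPropertyContainer { }

// Hypothetical helper: map-backed property storage shared by the sketch classes.
abstract class PropertyBag implements IPropertyContainer {
    private final Map<Integer, Object> properties = new HashMap<>();

    @Override
    public void setProperty(int property, Object value) {
        properties.put(property, value);
    }

    @Override
    @SuppressWarnings("unchecked")
    public <T> T getProperty(int property) {
        return (T) properties.get(property);
    }

    @Override
    public void deleteProperty(int property) {
        properties.remove(property);
    }
}

// An element and a renderer both carry properties, but play different roles.
class SketchParagraph extends PropertyBag implements IElement { }
class SketchParagraphRenderer extends PropertyBag implements IRenderer { }

class HierarchySketch {
    static final int FONT_SIZE = 1; // hypothetical property key

    public static void main(String[] args) {
        IElement paragraph = new SketchParagraph();
        paragraph.setProperty(FONT_SIZE, 12f);
        Float size = paragraph.getProperty(FONT_SIZE);
        System.out.println("font size: " + size);
        paragraph.deleteProperty(FONT_SIZE);
        System.out.println("after delete: " + paragraph.getProperty(FONT_SIZE));
    }
}
```

The point of the sketch is that both elements and renderers are property containers, so the same set, get, and delete operations work on either side of the hierarchy.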
The IElement interface has two subinterfaces of its own. The ILeafElement interface will be implemented by building blocks that can't contain any other elements. For instance: you can add a Text or an Image element to a Paragraph object, but you can't add any object to a Text or an Image element. Text and Image implement the ILeafElement interface to reflect this. Finally, there's the ILargeElement interface that allows you to render an object before you've finished adding all the content. It's implemented by the Table class, which means that you can add a table to a document before you've finished adding all the Cell objects. By doing so, you can reduce memory use: all the table content that can be rendered before the table is complete can be flushed from memory.
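To see why flushing early reduces memory use, consider the following stdlib-only model. It is a sketch under stated assumptions, not the iText Table class: LargeTableSketch and its methods are hypothetical names, and "rendering" is reduced to incrementing a counter.

```java
import java.util.ArrayList;
import java.util.List;

// Stdlib-only model of incremental flushing; NOT the iText Table class.
class LargeTableSketch {
    private final List<String> pendingRows = new ArrayList<>();
    private int renderedRows = 0; // rows already written out and released
    private int peakPending = 0;  // largest number of rows held in memory at once

    void addRow(String row) {
        pendingRows.add(row);
        peakPending = Math.max(peakPending, pendingRows.size());
    }

    // Render everything added so far, then release it from memory.
    void flush() {
        renderedRows += pendingRows.size();
        pendingRows.clear();
    }

    // Signal that no more rows will follow; render whatever is left.
    void complete() {
        flush();
    }

    int renderedCount() { return renderedRows; }
    int peakPending() { return peakPending; }
}

class LargeTableDemo {
    public static void main(String[] args) {
        LargeTableSketch table = new LargeTableSketch();
        for (int i = 1; i <= 1000; i++) {
            table.addRow("row " + i);
            if (i % 100 == 0) {
                table.flush(); // render in batches of 100 rows
            }
        }
        table.complete();
        System.out.println("rendered: " + table.renderedCount());
        System.out.println("peak rows in memory: " + table.peakPending());
    }
}
```

With batches of 100, the model never holds more than 100 rows at a time, although 1000 rows are rendered in total. Without the intermediate flush() calls, all 1000 rows would stay in memory until the end.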
The IPropertyContainer interface is implemented by the abstract ElementPropertyContainer class. This class has three subclasses; see figure 0.2.
The Style class is a container for all kinds of style attributes such as margins, paddings and rotation. It inherits style values such as widths, heights, colors, borders and alignments from the abstract ElementPropertyContainer class.
The RootElement class defines methods to add content, using either an add() method or a showTextAligned() method. The Document object will add this content to a page. The Canvas object doesn't know the concept of a page. It acts as a bridge between the high-level layout API and the low-level kernel API.
Figure 0.3 gives us an overview of the AbstractElement implementations.
All classes derived from the AbstractElement class implement the IElement interface. Text, Image, Tab and Link also implement the ILeafElement interface. The ILargeElement interface is only implemented by the Table class. The basic building blocks make it very easy for you to create tagged PDF.
Tagged PDF is a requirement for PDF/A level A, a standard for long-term preservation of documents, and for PDF/UA, an accessibility standard. A properly tagged PDF includes semantic information about all the relevant content.
An ordinary PDF can show a human reader content that is organized as a table. This table is rendered using a bunch of text snippets and lines. To a machine, the table isn't more than that: text positioned at arbitrary places, lines drawn at arbitrary places. A seeing person can detect rows and columns and understand which rows are actually header or footer rows and which rows are body rows. There is no simple way for a machine to do this. When a machine detects a text snippet, it doesn't know if that text snippet is part of a paragraph, part of a title, part of a cell, or part of something else. When a PDF is tagged, it contains a structure tree that allows a machine to understand the structure of the content. Some text will be marked as part of a cell in a header row, other text will be marked as the caption of the table. All real content will be tagged. Other content, such as lines between rows and columns, running headers, page numbers, will be marked as an artifact.
The semantic meaning of all elements that implement the IAccessibleElement interface will be added to the resulting PDF if we define a PdfDocument as a tagged PDF using the setTagged() method. In that case, iText will create a structure tree so that a Table is properly tagged as a table, a List is properly tagged as a list, and so on. This is true for the Text, Link, Image, Paragraph, Div, List, ListItem, Table, Cell, and LineSeparator objects. The Tab and AreaBreak objects don't implement the IAccessibleElement interface because these objects don't have any real content. The whitespace they create doesn't have any semantic meaning.
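The selection rule described here (only accessible elements end up in the structure tree, and only when tagging is switched on) can be sketched in plain Java. The types below are hypothetical stand-ins rather than iText classes, and the role strings are illustrative labels modeled on standard PDF structure types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for the element types; NOT iText classes.
interface SketchElement { }

// Elements with real content expose a role for the structure tree.
interface SketchAccessible extends SketchElement {
    String role();
}

class TableEl implements SketchAccessible {
    @Override public String role() { return "Table"; }
}

class ListEl implements SketchAccessible {
    @Override public String role() { return "L"; }
}

// A tab only produces whitespace, so it carries no role.
class TabEl implements SketchElement { }

class TaggingSketch {
    // Collect the roles that would be recorded for the given content.
    static List<String> structureRoles(List<SketchElement> content, boolean tagged) {
        List<String> roles = new ArrayList<>();
        if (!tagged) {
            return roles; // untagged document: content is rendered, nothing is recorded
        }
        for (SketchElement element : content) {
            if (element instanceof SketchAccessible) {
                roles.add(((SketchAccessible) element).role());
            }
        }
        return roles;
    }

    public static void main(String[] args) {
        List<SketchElement> content =
                Arrays.asList(new TableEl(), new TabEl(), new ListEl());
        System.out.println(structureRoles(content, true));
        System.out.println(structureRoles(content, false));
    }
}
```

In the sketch, the tab is skipped because it doesn't implement the accessible interface, which mirrors the reason given above for Tab and AreaBreak.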
In this tutorial, we won't create tagged PDF; iText will just render the content to the document using the appropriate IRenderer implementation. Figure 0.4 shows an overview of the IRenderer implementations.
When you compare figure 0.4 with figure 0.3, you'll discover that each AbstractElement and each RootElement has its corresponding renderer. We won't discuss figure 0.4 in much detail. The concept of renderers will become clear the moment we start making some examples.
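As a first impression of the idea, here is a stdlib-only sketch of an element that owns a default renderer, plus a subclass of that renderer that tweaks the output. All names (SketchText, SketchTextRenderer, UpperCaseTextRenderer, setRenderer) are hypothetical and chosen for illustration; the real iText renderers do far more work, such as layout and drawing to a canvas.

```java
import java.util.Locale;

// Stdlib-only model; the names below are hypothetical, not iText classes.
interface SketchRenderer {
    String draw();
}

class SketchText {
    final String content;
    private SketchRenderer renderer;

    SketchText(String content) {
        this.content = content;
        this.renderer = new SketchTextRenderer(this); // each element gets a default renderer
    }

    SketchRenderer getRenderer() { return renderer; }

    // Swap in a custom renderer to change how this element is rendered.
    void setRenderer(SketchRenderer renderer) { this.renderer = renderer; }
}

class SketchTextRenderer implements SketchRenderer {
    protected final SketchText text;

    SketchTextRenderer(SketchText text) { this.text = text; }

    @Override
    public String draw() { return text.content; }
}

// Subclassing the default renderer tweaks the output without touching the element.
class UpperCaseTextRenderer extends SketchTextRenderer {
    UpperCaseTextRenderer(SketchText text) { super(text); }

    @Override
    public String draw() { return super.draw().toUpperCase(Locale.ROOT); }
}

class RendererSketch {
    public static void main(String[] args) {
        SketchText text = new SketchText("hello");
        System.out.println(text.getRenderer().draw());
        text.setRenderer(new UpperCaseTextRenderer(text));
        System.out.println(text.getRenderer().draw());
    }
}
```

The element itself stays unchanged; only the object responsible for rendering it is replaced. That separation is what makes it possible to customize rendering by subclassing.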