Installing iText pdfOCR for Java developers

How to install the Java version of pdfOCR

Thank you for your interest in pdfOCR, our OCR add-on. We hope you enjoy using the product and will share your experiences with us and the iText community. This guide walks you through the installation process, from downloading iText pdfOCR to adding the dependency to your Java build tool.

If you need extra help, have a look at our FAQs or the community discussion on Stack Overflow. If you want support from our in-house developers or a license key for commercial iText products, you will need to acquire a commercial license.

Before you install

If you want to use pdfOCR for non-commercial purposes, make sure you have read and agreed to the AGPL license. All open-source downloads we offer come under the AGPL license model.
If you want to use pdfOCR for commercial purposes, make sure you have purchased a commercial license for iText Core. All closed-source downloads we offer come under our commercial license model.
For a closed-source pdfOCR installation, download and install the proper license key library; you can find the installation guide here (you will need at least version 3.1.1 of the library).
Check the compatibility matrix to ensure the version you specify when adding the add-on's dependency matches the version of iText Core you have a license for.
Download the modules (.jar) of iText Core/Community and pdfOCR (ZIP files) from Maven Central or the iText Artifactory Server.
Install iText Core or Community; you can find the installation guide here.
Important remark: the installation guide uses Maven as the build tool for Java.
You will need Tesseract's training data, which you can get here.

Installation

Using the Central Repository

iText pdfOCR is available via Maven from the Central Repository. To use it, add iText pdfOCR as a dependency in your pom.xml; the coordinates are the same as those shown in step 2 below.
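
As a minimal sketch, the dependency entry might look like the following. The groupId and artifactId are the ones used in the Artifactory instructions further down this page. The version refers to an itext.pdfocr.version property that you are assumed to define yourself, set to the release that matches your iText Core version.

```xml
<dependencies>
  <dependency>
    <groupId>com.itextpdf</groupId>
    <artifactId>pdfocr-tesseract4</artifactId>
    <!-- Assumes an itext.pdfocr.version property is defined in <properties> -->
    <version>${itext.pdfocr.version}</version>
  </dependency>
</dependencies>
```

This element goes inside the project element of your pom.xml. No extra repositories entry should be needed in this case, because Maven queries the Central Repository by default.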

Using the iText Artifactory Server

iText pdfOCR is also available on the iText Artifactory server. There you can also find the license key library and the pdfOCR add-on. You need an additional license key if you want to use pdfOCR closed-source (for commercial purposes).

You can add this server as an additional Maven repository in the repositories section of your pom.xml or settings.xml, as described in the Maven documentation. Maven will then automatically query this repository for the add-on .jar files.

You can also browse the iText Artifactory server and download jars manually.

1. Add the repository to the pom.xml project file

XML

<!-- All add-ons and iText Core-->
<repositories>
  <repository>
    <id>itext</id>
    <name>iText Repository - releases</name>
    <url>https://repo.itextsupport.com/releases</url>
  </repository>
</repositories>
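
If you would rather keep the repository out of the project file, the same entry can go in settings.xml instead. Maven only accepts repositories there inside a profile, so the sketch below wraps the entry in one and activates it. The profile id itext-artifactory is an arbitrary name chosen for this illustration, not something iText prescribes.

```xml
<settings>
  <profiles>
    <profile>
      <!-- Hypothetical profile id; any unique name works -->
      <id>itext-artifactory</id>
      <repositories>
        <repository>
          <id>itext</id>
          <name>iText Repository - releases</name>
          <url>https://repo.itextsupport.com/releases</url>
        </repository>
      </repositories>
    </profile>
  </profiles>

  <!-- Activate the profile so the repository is queried on every build -->
  <activeProfiles>
    <activeProfile>itext-artifactory</activeProfile>
  </activeProfiles>
</settings>
```

With this in place, every project built with that settings.xml can resolve the add-on without declaring the repository in its own pom.xml.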

2. Add the pdfOCR dependency to the pom.xml project file

XML

<properties>
  <!-- Placeholder: replace with the pdfOCR release that matches your iText Core version -->
  <itext.pdfocr.version>$release-pdfOCR-variable</itext.pdfocr.version>
</properties>

<dependencies>
  <dependency>
    <groupId>com.itextpdf</groupId>
    <artifactId>pdfocr-tesseract4</artifactId>
    <version>${itext.pdfocr.version}</version>
  </dependency>
</dependencies>

iText pdfOCR Java on GitHub

The source code is available on GitHub.

You can download the iText pdfOCR modules (.jar) as ZIP files from Maven Central.