pdf2Data SDK
The pdf2Data SDK is a native Java (or .NET) application. Its primary function is to extract data from PDF files using predefined extraction rules.
The extracted data is output in either JSON or XML format.
Installation
Java
The preferred way to set up iText pdf2Data in Java is to use a build system such as Maven or Gradle and to download the pdf2Data artifacts from the iText Artifactory at https://repo.itextsupport.com/pdf2data/
The groupId is com.itextpdf.pdf2data, and the artifactId is pdf2data.
In Maven, the configuration would look similar to the example below:
Maven
<repositories>
  <repository>
    <id>pdf2Data</id>
    <name>pdf2Data Maven Repository</name>
    <url>https://repo.itextsupport.com/pdf2data</url>
  </repository>
</repositories>
<dependencies>
  <dependency>
    <groupId>com.itextpdf.pdf2data</groupId>
    <artifactId>pdf2data</artifactId>
    <version>4.0.0</version>
  </dependency>
</dependencies>
.NET
For .NET, iText pdf2Data is distributed as a NuGet package, available from NuGet.org or from the iText Artifactory.
You can browse for the package manually or install it with the NuGet Package Manager command Install-Package itext7.pdf2data
Integrating pdf2Data into your code
As of iText pdf2Data 4.0, the format of extraction templates has changed compared to iText pdf2Data 3.*. See the Migration guide for details.
With the pdf2Data Manager in iText pdf2Data 4.0, you can download templates optimized for use in the pdf2Data SDK, so you can extract data in two lines of code.
Extraction (Java)
Make sure to load the license file before invoking any other pdf2Data code:
LicenseKey.loadLicenseFile(pathToLicenseFile);
A Pdf2DataExtractor instance can now be initialized from a processed template with a single call:
Pdf2DataExtractor extractor = Pdf2DataExtractor.create(new File(P2D_TEMPLATE_PATH));
Parse PDF using the extraction template
Perform the extraction on the input PDF (PDF_PATH below stands for the path to the PDF to parse):
ParsingResult result = extractor.recognize(new File(PDF_PATH));
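The calls above each take a file location: the license file, the processed template, and the PDF to parse. As a minimal sketch, a standard-library check like the one below could report a missing or unreadable file before any pdf2Data call is made. The ExtractionInputs class, its findProblems method, and the placeholder paths are hypothetical and are not part of the pdf2Data API.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical pre-flight helper; uses only the Java standard library. */
final class ExtractionInputs {

    /** Returns one message per input that is missing or unreadable; empty if all are usable. */
    static List<String> findProblems(Map<String, Path> inputs) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, Path> entry : inputs.entrySet()) {
            Path path = entry.getValue();
            if (!Files.isRegularFile(path)) {
                problems.add(entry.getKey() + ": not a file: " + path);
            } else if (!Files.isReadable(path)) {
                problems.add(entry.getKey() + ": not readable: " + path);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Placeholder locations (assumptions); substitute your own paths.
        Map<String, Path> inputs = new LinkedHashMap<>();
        inputs.put("license", Paths.get("path/to/license-file"));
        inputs.put("template", Paths.get("path/to/template-file"));
        inputs.put("input PDF", Paths.get("path/to/input.pdf"));

        List<String> problems = findProblems(inputs);
        if (problems.isEmpty()) {
            System.out.println("All inputs found; safe to load the license and run the extraction.");
        } else {
            problems.forEach(System.out::println);
        }
    }
}
```

Keeping the template and the input PDF as separately named entries also makes it harder to pass the same path to both steps by mistake.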
You can use the extracted values directly from the result or save them in one of two structured formats:
Save to XML
result.saveToXml(new File(RESULT_XML_PATH));
Save to JSON
result.saveToJson(new File(RESULT_JSON_PATH));
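Once a result has been saved to XML, the file can be read back with the XML parser that ships with the JDK. The sketch below lists every leaf element together with its text, without assuming anything about the layout of the saved file. The XmlLeafDump class and the sample XML string are invented for illustration and do not represent the actual pdf2Data output schema.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical helper: dumps leaf elements of any XML document as "path=value" lines. */
final class XmlLeafDump {

    /** Parses the stream and returns one "path=value" line per leaf element, in document order. */
    static List<String> leafValues(InputStream xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Reject DOCTYPE declarations as a precaution against external-entity attacks.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = factory.newDocumentBuilder().parse(xml);
        List<String> lines = new ArrayList<>();
        collect(doc.getDocumentElement(), "", lines);
        return lines;
    }

    /** Recursively visits child elements; an element with no child elements counts as a leaf. */
    private static void collect(Element element, String parentPath, List<String> out) {
        String path = parentPath + "/" + element.getTagName();
        boolean hasChildElements = false;
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element) {
                hasChildElements = true;
                collect((Element) child, path, out);
            }
        }
        if (!hasChildElements) {
            out.add(path + "=" + element.getTextContent().trim());
        }
    }

    public static void main(String[] args) throws Exception {
        // Invented sample document, NOT the real pdf2Data result format.
        String sample = "<result><invoice><number>INV-1</number><total>42.00</total></invoice></result>";
        try (InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8))) {
            leafValues(in).forEach(System.out::println);
        }
    }
}
```

To inspect a real result, the same method could be given a stream opened on the file written by saveToXml, for example via java.nio.file.Files.newInputStream.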