The pdf2Data SDK is a native Java (or .NET) application. Its primary function is to extract data from PDF files using predefined extraction rules.
The extracted data is output in XML format.
Installation
Java
The preferred way to set up pdf2Data in Java is to use a build system like Maven or Gradle and download pdf2Data artifacts from the iText Artifactory located at https://repo.itextsupport.com/pdf2data/
The groupId is com.itextpdf.pdf2data, and the artifactId is pdf2data.
In Maven, the configuration would look similar to the example below:
<repositories>
  <repository>
    <id>pdf2Data</id>
    <name>pdf2Data Maven Repository</name>
    <url>https://repo.itextsupport.com/pdf2data</url>
  </repository>
</repositories>
<dependencies>
  <dependency>
    <groupId>com.itextpdf.pdf2data</groupId>
    <artifactId>pdf2data</artifactId>
    <version>3.1.0</version>
  </dependency>
</dependencies>
.NET
For .NET, pdf2Data is distributed as a NuGet package, available from NuGet.org or from the iText Artifactory.
You can browse for the package manually or install it with the NuGet Package Manager command Install-Package itext7.pdf2data
Integrating pdf2Data into your code
Below are examples of how pdf2Data can be used in code, first in Java and then in C# for .NET:
// Java
// Make sure to load the license file before invoking any code
LicenseKey.loadLicenseFile(pathToLicenseFile);
// Parse template into an object that will be used later on
Template template = Pdf2DataExtractor.parseTemplateFromPDF(pathToPdfTemplate);
// Create an instance of Pdf2DataExtractor for the parsed template
Pdf2DataExtractor extractor = new Pdf2DataExtractor(template);
// Feed file to be parsed against the template. Can be called multiple times for different files
ParsingResult result = extractor.recognize(pathToFileToParse);
// Save result to XML or explore the ParsingResult object to fetch information programmatically
result.saveToXML(pathToOutXmlFile);
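The comment in the sample above notes that recognize can be called multiple times for different files. As a minimal sketch of the surrounding plumbing, the helper below collects the PDF files in a folder using only the Java standard library. The class name PdfInputs and the method listPdfs are hypothetical and are not part of pdf2Data; the sketch deliberately makes no pdf2Data calls, so it runs without the library or a license.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of the pdf2Data API.
public class PdfInputs {

    /**
     * Returns the regular files directly inside dir whose names end in
     * ".pdf" (case-insensitive), sorted by path. Subfolders are not searched.
     */
    public static List<Path> listPdfs(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString()
                            .toLowerCase(Locale.ROOT).endsWith(".pdf"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path pdf : listPdfs(dir)) {
            // With pdf2Data on the classpath, this is where each file would be
            // handed to the extractor created in the sample above.
            System.out.println(pdf);
        }
    }
}
```

Because the template is parsed once and the extractor is reused, a loop like the one in main could pass each path (for example as pdf.toString(), assuming recognize accepts a file path as in the sample) to the same Pdf2DataExtractor instance.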
// .NET (C#)
// Make sure to load the license file before invoking any code
LicenseKey.LoadLicenseFile(pathToLicenseFile);
// Parse template into an object that will be used later on
Template template = Pdf2DataExtractor.ParseTemplateFromPDF(pathToPdfTemplate);
// Create an instance of Pdf2DataExtractor for the parsed template
Pdf2DataExtractor extractor = new Pdf2DataExtractor(template);
// Feed file to be parsed against the template. Can be called multiple times for different files
ParsingResult result = extractor.Recognize(pathToFileToParse);
// Save result to XML or explore the ParsingResult object to fetch information programmatically
result.SaveToXML(pathToOutXmlFile);
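Since the result is saved as XML, the output file can also be read back with the DOM parser that ships with the JDK. The sketch below makes no assumption about the pdf2Data output schema: it simply walks the document and lists every element that has no child elements, together with its text. The class name XmlLeafDump is hypothetical, and this is one possible way to inspect the file, not an official pdf2Data API.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of the pdf2Data API.
public class XmlLeafDump {

    /**
     * Parses the XML and returns "tagName=text" for every element that has
     * no child elements, in document order.
     */
    public static List<String> leaves(InputStream xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(xml);
        List<String> out = new ArrayList<>();
        collect(doc.getDocumentElement(), out);
        return out;
    }

    // Depth-first walk: recurse into child elements, record leaves.
    private static void collect(Element element, List<String> out) {
        boolean hasChildElement = false;
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element) {
                hasChildElement = true;
                collect((Element) child, out);
            }
        }
        if (!hasChildElement) {
            out.add(element.getTagName() + "=" + element.getTextContent().trim());
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the path to the XML file written by the extractor
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            leaves(in).forEach(System.out::println);
        }
    }
}
```

Running it against the file passed to saveToXML prints one line per leaf element, which is a quick way to see which fields were extracted before writing schema-specific code.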