pdf2Data Command Line Interface
The pdf2Data Command Line Interface (CLI) extracts data from PDF files directly from the command line. The extracted data is written as XML.
To start capturing data from PDFs, download the CLI application from the iText Artifactory.
No special environment configuration is needed: as long as Java 8 is installed, you can run the pdf2Data CLI from the command line.
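A quick way to confirm the Java requirement before running the CLI is to inspect the output of java -version. The sketch below is illustrative only: the helper names java_major and detected_java_version are made up for this example and are not part of pdf2Data. It assumes the usual version "x.y.z" line that java -version prints to standard error.

```shell
#!/bin/sh
# Hypothetical helpers (not part of pdf2Data) for checking the installed Java.

# Print the major version for a Java version string.
# Java 8 and older report themselves as 1.x (for example 1.8.0_281),
# while newer releases start with the major number (for example 11.0.2).
java_major() {
    case "$1" in
        1.*) v=${1#1.}; echo "${v%%.*}" ;;
        *)   echo "${1%%[.+-]*}" ;;
    esac
}

# Extract the quoted version string from `java -version`.
# Assumption: the first line looks like: java version "1.8.0_281"
detected_java_version() {
    java -version 2>&1 | sed -n 's/.* version "\([^"]*\)".*/\1/p' | head -n 1
}
```

For example, java_major "$(detected_java_version)" prints 8 on a Java 8 runtime, since that runtime reports its version as 1.8.x.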
The steps mirror the ones you would typically perform in code: first create a template entity, then recognize files against it.
Creating a template entity from a template PDF
java -jar cli.jar preprocess -t template.pdf -x template.xml -l license.json
File recognition
java -jar cli.jar parse -t template.xml -s file_for_parsing.pdf -p recognized.pdf -x recognized.xml -l license.json
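When several files share one template, the parse command above can be wrapped in a loop. The following is a minimal sketch, not an official script: the function name parse_all, the DRY_RUN switch, the directory arguments, and the output naming scheme are all assumptions made for illustration. Only the parse flags (-t, -s, -p, -x, -l) are taken from the command shown above.

```shell
#!/bin/sh
# Sketch: run the "parse" step for every PDF in a directory.
# CLI_JAR, TEMPLATE_XML, LICENSE, DRY_RUN and parse_all are hypothetical
# names chosen for this example; adjust the paths to your setup.
CLI_JAR="${CLI_JAR:-cli.jar}"
TEMPLATE_XML="${TEMPLATE_XML:-template.xml}"
LICENSE="${LICENSE:-license.json}"

# parse_all IN_DIR OUT_DIR
parse_all() {
    in_dir="$1"
    out_dir="$2"
    status=0
    mkdir -p "$out_dir" || return 1
    for pdf in "$in_dir"/*.pdf; do
        [ -e "$pdf" ] || continue          # glob matched nothing
        name=$(basename "$pdf" .pdf)
        # Same flags as the single-file parse command
        set -- java -jar "$CLI_JAR" parse -t "$TEMPLATE_XML" -s "$pdf" \
            -p "$out_dir/$name.recognized.pdf" \
            -x "$out_dir/$name.recognized.xml" -l "$LICENSE"
        if [ -n "${DRY_RUN:-}" ]; then
            echo "$*"                      # only print the command
        else
            "$@" || { echo "parse failed for $pdf" >&2; status=1; }
        fi
    done
    return "$status"
}
```

Calling DRY_RUN=1 parse_all ./incoming ./results prints the commands without running them, which is a cheap way to check the paths before a real run.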
Help information
java -jar cli.jar help preprocess
java -jar cli.jar help parse