pdf2Data 4.0 Command Line Interface
The pdf2Data command line interface (CLI) lets you extract data from PDF files from the command line. The extracted data can be saved as XML or JSON.
To get started, download the CLI application from the iText Artifactory. The examples below refer to the downloaded JAR as cli.jar.
No special environment setup is required: as long as Java 8 is installed, you can run the pdf2Data CLI from the command line.
The workflow mirrors the one you would follow in code: first preprocess the template, then parse PDF files against the preprocessed template.
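Because Java is the only prerequisite, a quick version check before running the CLI can save a confusing error later. The following is a minimal bash sketch, assuming that Java 8 or newer is acceptable (the text above only mentions Java 8). The function names and the JAVA variable, which lets you point at a specific java binary, are hypothetical and not part of pdf2Data.

```shell
#!/usr/bin/env bash
# Minimal sketch: check the installed Java version before running the pdf2Data CLI.
# Assumption: Java 8 or newer is acceptable. JAVA is a hypothetical override
# for the java command (defaults to "java" on the PATH).

# Print the major version for a Java version string.
# Java 8 and earlier use "1.x" (1.8.0_292 -> 8); Java 9+ start with the
# major number (11.0.2 -> 11, 17-ea -> 17).
java_major_version() {
  local version="$1" major
  major="${version%%.*}"
  if [ "$major" = "1" ]; then
    # Legacy scheme: the second field is the major version.
    major="${version#1.}"
    major="${major%%.*}"
  fi
  # Drop any non-numeric suffix such as "-ea".
  echo "${major%%[!0-9]*}"
}

# Read the quoted version from `java -version`, which writes to stderr,
# e.g.: openjdk version "1.8.0_292" 2021-04-20
installed_java_version() {
  "${JAVA:-java}" -version 2>&1 | awk -F '"' '/version/ { print $2; exit }'
}

# Return 0 if Java 8 or newer is available, 1 otherwise.
require_java8() {
  local v major
  v="$(installed_java_version)"
  if [ -z "$v" ]; then
    echo "Java not found; the pdf2Data CLI needs Java 8" >&2
    return 1
  fi
  major="$(java_major_version "$v")"
  if [ -z "$major" ] || [ "$major" -lt 8 ]; then
    echo "Java $v found; the pdf2Data CLI needs Java 8" >&2
    return 1
  fi
  echo "Java $v OK"
}
```

After sourcing the file, a typical call is: require_java8 && java -jar cli.jar -h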
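Preprocess a pdf2Data 4.0 template
The preprocess command reads the template (-s) and writes the preprocessed template (-d) that the parse commands below use: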
java -jar cli.jar preprocess -s template.p2dta -d template.p2d
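PDF to XML parsing
Parse a PDF (-s) against the preprocessed template (-t) using your license file (-l). The extracted data is written to the XML file given with -x, and -p names an additional output PDF (recognized.pdf in this example):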
java -jar cli.jar parse -t template.p2d -s file_for_parsing.pdf -p recognized.pdf -x recognized.xml -l license.json
PDF to JSON parsing
The same command writes JSON when -j is used instead of -x:
java -jar cli.jar parse -t template.p2d -s file_for_parsing.pdf -p recognized.pdf -j recognized.json -l license.json
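When many documents share one template, the preprocess and parse steps above can be scripted. This is a minimal bash sketch, not an official pdf2Data tool: it preprocesses the template once and then parses every PDF in a folder, using only the options shown above. The function name p2d_batch, the output naming scheme (<name>.recognized.pdf plus <name>.xml or <name>.json), and the JAVA and CLI_JAR overrides are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper, not part of pdf2Data): preprocess a
# template once, then parse every PDF in a folder with the pdf2Data CLI.
# Only the CLI options documented above are used. JAVA and CLI_JAR are
# hypothetical overrides for the java binary and the JAR path.

# Usage: p2d_batch <template.p2dta> <input_dir> <output_dir> <license.json> [xml|json]
p2d_batch() {
  local template_src="$1" input_dir="$2" output_dir="$3" license="$4"
  local format="${5:-json}" out_flag
  local java="${JAVA:-java}" jar="${CLI_JAR:-cli.jar}"
  local template="$output_dir/template.p2d"

  # Pick the output option for the requested format.
  case "$format" in
    xml)  out_flag="-x" ;;
    json) out_flag="-j" ;;
    *)    echo "format must be xml or json, got: $format" >&2; return 2 ;;
  esac

  mkdir -p "$output_dir" || return 1

  # Step 1: preprocess the template once for the whole batch.
  "$java" -jar "$jar" preprocess -s "$template_src" -d "$template" || return 1

  # Step 2: parse each PDF against the preprocessed template.
  local pdf name failed=0
  for pdf in "$input_dir"/*.pdf; do
    [ -e "$pdf" ] || continue  # no matches: the glob is left unexpanded
    name="$(basename "$pdf" .pdf)"
    if ! "$java" -jar "$jar" parse -t "$template" -s "$pdf" \
        -p "$output_dir/$name.recognized.pdf" \
        "$out_flag" "$output_dir/$name.$format" \
        -l "$license"; then
      echo "parse failed: $pdf" >&2
      failed=$((failed + 1))
    fi
  done

  # Succeed only if every file was parsed.
  [ "$failed" -eq 0 ]
}
```

For example: p2d_batch template.p2dta invoices results license.json xml. Setting JAVA=echo prints the java commands instead of running them, which is a simple way to check the paths before a real run.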
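Help information
Use -h or --help to list the available commands, or put it after a command name to see that command's options: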
java -jar cli.jar -h
java -jar cli.jar --help
java -jar cli.jar preprocess -h
java -jar cli.jar preprocess --help
java -jar cli.jar parse -h
java -jar cli.jar parse --help