Using pdf2Data CLI with PHP
In addition to the Java and .NET SDKs for pdf2Data, we also provide a CLI version. The pdf2Data CLI offers an interoperable way of extracting data from PDF files using command-line instructions, meaning you can execute the CLI from any programming language that can launch an external program.
One such language is PHP, a widely used general-purpose scripting language. PHP (a recursive initialism for PHP: Hypertext Preprocessor) lets you execute an external program with the exec function, so calling the pdf2Data CLI from it is a pretty simple task.
The following examples show how you can execute pdf2Data CLI using the PHP exec function.
First, we need to generate an XML template (template.p2d) from template.p2dta, the file containing your defined extraction rules.
<?php
/* Generate the XML template file template.p2d from the template file template.p2dta */
exec('java -jar cli.jar preprocess -s template.p2dta -d template.p2d');
?>
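Called this way, exec does not tell you whether the command succeeded. As a minimal sketch, the preprocess call above can use exec's optional second and third parameters to capture the command's output lines and its exit code. The helper name run_command is hypothetical and not part of pdf2Data, and the sketch assumes the CLI follows the usual convention of returning a non-zero exit code on failure.

```php
<?php
/**
 * Run a shell command and return its output lines and exit code.
 * run_command is a hypothetical helper, not part of pdf2Data.
 */
function run_command(string $command): array
{
    $output = [];
    $exitCode = -1;
    // Append 2>&1 so error messages are captured along with standard output.
    exec($command . ' 2>&1', $output, $exitCode);
    return ['output' => $output, 'exit_code' => $exitCode];
}

$result = run_command('java -jar cli.jar preprocess -s template.p2dta -d template.p2d');
if ($result['exit_code'] !== 0) {
    // Assumption: a non-zero exit code signals that preprocessing failed.
    error_log("Preprocessing failed:\n" . implode("\n", $result['output']));
}
```

Logging the captured lines makes problems such as a missing cli.jar or a wrong template path visible instead of failing silently.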
Then you can extract data from the PDF file file_for_parsing.pdf with the help of the template you generated above.
The extracted data can be saved as XML (here, recognized.xml):
<?php
/* Extract data from the PDF file file_for_parsing.pdf and save it to recognized.xml */
exec('java -jar cli.jar parse -t template.p2d -s file_for_parsing.pdf -p recognized.pdf -x recognized.xml -l license.json');
?>
or as JSON (here, recognized.json):
<?php
/* Extract data from the PDF file file_for_parsing.pdf and save it to recognized.json */
exec('java -jar cli.jar parse -t template.p2d -s file_for_parsing.pdf -p recognized.pdf -j recognized.json -l license.json');
?>
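The examples above hard-code every file name. When the paths come from variables or user input, it is safer to quote each one with escapeshellarg before passing the command to exec. The following is only a sketch: build_parse_command is a hypothetical helper, the file name my invoice.pdf is an invented example of a path containing a space, and the flags are simply the ones used in the commands above. It also assumes that an exit code of 0 means recognized.json was written.

```php
<?php
/**
 * Build the parse command from file paths, quoting each one for the shell.
 * build_parse_command is a hypothetical helper; the flags are the ones
 * used in the examples above.
 */
function build_parse_command(
    string $template,
    string $source,
    string $recognizedPdf,
    string $jsonOut,
    string $license
): string {
    return 'java -jar cli.jar parse'
        . ' -t ' . escapeshellarg($template)
        . ' -s ' . escapeshellarg($source)
        . ' -p ' . escapeshellarg($recognizedPdf)
        . ' -j ' . escapeshellarg($jsonOut)
        . ' -l ' . escapeshellarg($license);
}

$command = build_parse_command(
    'template.p2d',
    'my invoice.pdf', // hypothetical input file with a space in its name
    'recognized.pdf',
    'recognized.json',
    'license.json'
);

$output = [];
$exitCode = -1;
exec($command, $output, $exitCode);

// Assumption: exit code 0 means the JSON file was produced.
if ($exitCode === 0 && is_file('recognized.json')) {
    // Decode into an associative array; its structure depends on your template.
    $data = json_decode(file_get_contents('recognized.json'), true);
    var_dump($data);
}
```

Passing true as the second argument to json_decode returns nested arrays rather than objects, which tends to be easier to iterate over when the field names are defined by the template.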