Using pdf2Data CLI with PHP

In addition to the Java and .NET SDKs for pdf2Data, we also provide a CLI version. The pdf2Data CLI offers an interoperable way of extracting data from PDF files using command-line instructions, meaning you can execute the CLI from any programming language that can launch an external program.

One such language is PHP, a widely used general-purpose scripting language. PHP (a recursive initialism for PHP: Hypertext Preprocessor) lets you execute an external program with the exec function, so calling the pdf2Data CLI from PHP is straightforward.

The following examples show how you can execute pdf2Data CLI using the PHP exec function.

First, generate an XML template (template.p2d) from template.p2dta, the file containing your defined extraction rules.

PHP
<?php
/* Generate the XML template file template.p2d from the template file template.p2dta */
exec('java -jar cli.jar preprocess -s template.p2dta -d template.p2d');
?>
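
The call above discards the CLI's output and exit status, so a failed run goes unnoticed. PHP's exec function also accepts an output array and a result code as its second and third parameters. The following is a minimal sketch of how they might be used. The helper names buildPdf2DataCommand and runCommand are hypothetical and not part of pdf2Data, and the sketch assumes the CLI signals failure through a non-zero exit code, which is not confirmed here.

```php
<?php
// Hypothetical helper (not part of pdf2Data): builds a shell-safe command line
// from a subcommand and a map of flag => value pairs.
function buildPdf2DataCommand(string $subcommand, array $options, string $jar = 'cli.jar'): string
{
    $parts = ['java', '-jar', escapeshellarg($jar), escapeshellarg($subcommand)];
    foreach ($options as $flag => $value) {
        $parts[] = $flag;
        // escapeshellarg() quotes the value so spaces or special characters are safe.
        $parts[] = escapeshellarg($value);
    }
    return implode(' ', $parts);
}

// Hypothetical helper: runs a command and returns its output lines.
// Throws if exec() fails or the exit code is non-zero (assumed to mean failure).
function runCommand(string $command): array
{
    $output = [];
    $exitCode = 0;
    // "2>&1" merges stderr into stdout so error messages are captured too.
    $lastLine = exec($command . ' 2>&1', $output, $exitCode);
    if ($lastLine === false || $exitCode !== 0) {
        throw new RuntimeException(
            "Command failed (exit code $exitCode):\n" . implode("\n", $output)
        );
    }
    return $output;
}

// Usage (requires Java and cli.jar in the working directory):
// runCommand(buildPdf2DataCommand('preprocess', [
//     '-s' => 'template.p2dta',
//     '-d' => 'template.p2d',
// ]));
```

Quoting every argument with escapeshellarg matters mainly when file names come from user input or contain spaces.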

Then you can extract data from the PDF file file_for_parsing.pdf with the help of the template you generated above.
The extracted data can be saved as XML:

PHP
<?php
/* Extract data from the PDF file file_for_parsing.pdf and save it to recognized.xml */
exec('java -jar cli.jar parse -t template.p2d -s file_for_parsing.pdf -p recognized.pdf -x recognized.xml -l license.json');
?>

or as JSON:

PHP
<?php
/* Extract data from the PDF file file_for_parsing.pdf and save it to recognized.json */
exec('java -jar cli.jar parse -t template.p2d -s file_for_parsing.pdf -p recognized.pdf -j recognized.json -l license.json');
?>
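
Once the parse command has written recognized.json, the result can be read back into PHP for further processing. The sketch below only decodes the file into an associative array. The helper name loadExtractedJson is hypothetical, and no assumption is made about the structure of the JSON that pdf2Data produces.

```php
<?php
// Hypothetical helper (not part of pdf2Data): loads a JSON result file
// into a PHP associative array.
function loadExtractedJson(string $path): array
{
    if (!is_readable($path)) {
        throw new RuntimeException("Result file not found or not readable: $path");
    }
    $text = file_get_contents($path);
    if ($text === false) {
        throw new RuntimeException("Could not read result file: $path");
    }
    // JSON_THROW_ON_ERROR (PHP 7.3+) raises JsonException on malformed input.
    $data = json_decode($text, true, 512, JSON_THROW_ON_ERROR);
    if (!is_array($data)) {
        throw new RuntimeException("Unexpected JSON content in: $path");
    }
    return $data;
}

// Usage, once the parse command has produced recognized.json:
// $data = loadExtractedJson('recognized.json');
// print_r($data);
```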
For details on the requirements for using pdf2Data CLI, see Requirements and Prerequisites. If you are unfamiliar with how pdf2Data works, see also Getting started with iText pdf2Data.