The grouping selector was introduced in pdf2Data 2.1.9, and its functionality was extended when JSON output support was added in iText pdf2Data 3.1.1.
The grouping selector is a flexible and powerful mechanism that fits many use cases. This article demonstrates one of them.
To get the most out of this article, you should:
- Understand how to build your own template and how to use the pdf2Data Editor UI.
- Have deployed the pdf2Data Editor, or have access to our trial instance.
The grouping selector groups the values of a data field according to their y coordinates relative to the y coordinates of another data field's values.
For now, this rule can only be created in the Expert mode of the pdf2Data template editor, which is not as difficult as it sounds.
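The idea of grouping by y coordinate can be sketched in a few lines of Python. This is only an illustration of the concept, not pdf2Data's actual implementation: the function `group_by_divider`, the coordinates, and the sample values are all hypothetical, and the sketch assumes PDF-style coordinates in which y grows upward, so each value is assigned to the closest divider at or above it.

```python
from bisect import bisect_left


def group_by_divider(dividers, values):
    """Assign each (y, text) value to the nearest divider at or above it.

    Assumption: y grows upward, as in PDF user space, so "above" means
    a larger y. Values that sit above every divider are left ungrouped.
    """
    ordered = sorted(dividers)              # dividers in ascending y order
    ys = [y for y, _ in ordered]
    groups = {name: [] for _, name in ordered}
    for y, text in values:
        i = bisect_left(ys, y)              # first divider with y >= value's y
        if i < len(ys):
            groups[ordered[i][1]].append(text)
    return groups


# Hypothetical positions: two divider occurrences and four extracted values.
dividers = [(700.0, "group-1"), (400.0, "group-2")]
values = [
    (700.0, "INV-A"), (650.0, "100.00"),
    (400.0, "INV-B"), (350.0, "250.00"),
]
groups = group_by_divider(dividers, values)
```

Here each divider occurrence opens a new group, and every value below it (down to the next divider) lands in that group.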
Create Data Fields
As you can see in the screenshot below we have two different invoices:
Our goal here is to extract the invoice numbers and their related total amounts.
We will create two data fields in User mode: InvoiceNumber and TotalAmount.
The output will have the following structure for both JSON and XML:
InvoiceNumber : <ListOfAllInvoiceNumbersExtracted>
TotalAmount : <ListOfAllTotalAmountsExtracted>
To be honest, this structure is not that useful for further processing, because nothing in it ties an invoice number to its total amount.
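A short Python sketch shows why two parallel lists are fragile for a consumer. The dictionary below only mimics the shape of the output described above. The values are hypothetical, and the scenario assumes one total amount failed to extract.

```python
# Hypothetical flat output: three invoice numbers, but only two totals.
flat = {
    "InvoiceNumber": ["INV-A", "INV-B", "INV-C"],
    "TotalAmount": ["100.00", "250.00"],
}

# Pairing by position is the only option with parallel lists. zip() stops
# at the shorter list and cannot tell which invoice is missing its total,
# so the pairs may be silently misaligned.
paired = dict(zip(flat["InvoiceNumber"], flat["TotalAmount"]))
unpaired = flat["InvoiceNumber"][len(paired):]
```

The consumer ends up with `INV-C` unpaired, with no way to check whether that is really the invoice whose total was missing.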
Group Data Fields
A structure like the following, however:
<FirstInvoiceNumberExtracted> : <TotalAmount>,
<SecondInvoiceNumberExtracted> : <TotalAmount>
is much more useful from an integration perspective.
In our file, the word "Invoice" strictly separates the data field values, so we can use it as the value for the grouping selector.
We will create an auxiliary data field, Divider, to group the values. Alternatively, you can use InvoiceNumber as the grouping field, which gives you a slightly different output structure:
For both of our fields, we need to add the Grouping selector and specify "Divider" as the grouping field.
And the output will be as follows: