Release iText pdf2Data 4.0

Introduction

iText pdf2Data is our user-friendly template-based data extraction solution.

It is a crucial part of your digital document workflow. iText pdf2Data helps you to unlock and reuse data from PDF files with perfect accuracy, across many different domains from logistics to finance.

At the same time, it makes it simple to define and manage extraction rules and templates, thanks to user-friendly components which are designed for non-technical users.

We are proud to introduce a new major release of pdf2Data, iText pdf2Data 4.0.

Breaking changes

pdf2Data Manager

In this release, we are introducing a management component for iText pdf2Data – the pdf2Data Manager.

It not only acts as centralized storage for all your templates, but also provides:

  • user access control
  • management of multiple workspaces
  • replacement of reference PDFs
  • easy extraction template creation from blueprints, and
  • parsing adjustments for existing templates

To allow iText pdf2Data to support all this, we are moving to a new, more flexible and reusable format for extraction templates. You won’t need to recreate your existing templates, though: the pdf2Data Manager also includes a converter tool that lets you import your legacy templates and convert them into the new format.

pdf2Data Editor

The new pdf2Data Manager is natively integrated with our existing pdf2Data Editor, which also gets some improvements.

As you might expect, user-friendliness and a great user experience are key for this component. Even so, in previous versions you sometimes needed to use the expert mode, and be familiar with pdf2Data's specific extraction language, to get the most out of it.

That's not the case anymore…

From now on, all extraction functionality is available in the UI. The expert mode still exists, so you can continue to use it if you want, and it now benefits from the new, more convenient syntax.

pdf2Data SDK

The SDK is the key component that handles the job of extracting your data. It is usually hidden from end users, but not from developers.

We’ve been thinking for a while about API improvements that would let developers integrate the SDK into their workflows with less documentation to read. Since this is a major release, we’ve introduced a number of API changes for the pdf2Data SDK. Overall, they make the API clearer and more consistent.

Extraction

Another key part of iText pdf2Data is the SDK’s extraction algorithms. These are custom-built to deal with document elements such as tables, paragraphs, dates, etc. We are working on adding to and improving these all the time, and this release is no exception.

In a nutshell:

  • Table extraction gained improved merging strategies for tables that span multiple pages.
  • Error messages are clearer, and so more useful for debugging.
  • The overall extraction process is more stable, reducing the chance that an exception interrupts extraction.

Of course, the SDK also fully supports the new template format.

For more details, please see the migration guide.


  • iText pdf2Data 4.0 SDK: Java (Artifactory), .NET (Artifactory, NuGet)
  • iText pdf2Data 4.0 Editor: pull image from Docker Hub (instructions)
  • iText pdf2Data 4.0 CLI: Jar file download


Improvements & New Features

pdf2Data Manager

  • Multiple workspaces
  • Authorization
  • Access control based on roles and workspaces
  • Creation of templates from blueprints
  • Search by template name and metadata
  • Replacement of the reference PDF in a template

pdf2Data Editor

  • Grouping selector 
  • Multiline regular expressions
  • Improved result preview
  • Client-side validation of selectors  

pdf2Data SDK

  • Improved extraction of large tables that span multiple pages
  • Standardization of the extraction API
  • Extended support for File and Stream classes in the API
  • Support for pdf2Data 4.0 template formats (.p2d, .p2dta)