
Regular expression selector

The regular expression selector is the most powerful selector in iText pdf2Data's toolbox. Unsurprisingly then, it is also the least user-friendly selector.

It implements standard regular expression search, so using it requires familiarity with regular expression syntax.



This selector has only one mandatory parameter, Pattern, which contains the regular expression to search for in the PDF.
The regular expression may also contain groups, defined within parentheses. In this case, only the string captured by the group is extracted.

Example
Pattern:  Invoice\s+(\d{3}) returns a 3-digit number that appears after the word "Invoice". The number must be separated from "Invoice" by one or more whitespace characters.
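
To see what this pattern captures, it can be tried outside pdf2Data. The following is a minimal sketch using Python's standard re module, not the pdf2Data API: the extract helper and the sample text are hypothetical, and it assumes the selector's regular expression flavor treats \s, \d and groups the same way Python does. It mirrors the rule described above: when the pattern contains a group, only the group's text is returned; otherwise the whole match is returned.

```python
import re
from typing import List


def extract(pattern: str, text: str) -> List[str]:
    """Hypothetical helper: return group 1 of every match if the pattern
    defines a group, otherwise return the full matched text."""
    compiled = re.compile(pattern)
    results = []
    for match in compiled.finditer(text):
        # compiled.groups is the number of capturing groups in the pattern
        results.append(match.group(1) if compiled.groups else match.group(0))
    return results


# Sample text standing in for content read from a PDF (illustrative only)
sample = "Invoice   123 issued on 2020-01-15"

print(extract(r"Invoice\s+(\d{3})", sample))  # with a group: only the digits
print(extract(r"Invoice\s+\d{3}", sample))    # without a group: the whole match
```

Note that \d{3} matches the first three digits even when the number is longer, so "Invoice 12345" would yield "123" with this pattern. If only numbers of exactly three digits are wanted, the pattern needs an additional boundary such as \b after the group.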

Most of the data you require from a PDF can be extracted without this selector; please see the tutorial for example usage. However, if you feel passionate about regular expressions, you don't need anything but the regular expression selector for data extraction.

You can specify a two-line regular expression; however, in the majority of cases this can be replaced by the Paragraph selector.


Output data format: 

lines

List of selectors
