Expert mode

The pdf2Data Editor lets you configure selectors in expert mode, so called because it gives you extra flexibility but requires deeper knowledge to build an extraction pipeline.

Prerequisites

We assume you know how to edit a data field in expert mode.

Expert mode selectors

A few pdf2Data selectors are available only in expert mode.

Table frequency selector

Keyword: tableFreq: selectCell=1;2, selectRow=1:2, selectColumn=2:2

Uses text frequency analysis to detect table cells and might work better than the default Table selector for borderless tables.

The selectCell, selectRow, and selectColumn properties are optional. Use them to specify row and column numbers (or ranges, using start:end syntax) when only part of the table needs to be extracted.
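To make the keyword format and the start:end range syntax more concrete, here is a minimal Python sketch. It is not part of pdf2Data: the function names parse_selector, parse_range, select_rows, and select_columns are hypothetical. It also assumes that row and column numbers are 1-based and that ranges include both ends, which the example value 2:2 suggests but this page does not state. The selectCell value is kept as a raw string, because its exact meaning is not described here.

```python
def parse_selector(line):
    """Split 'name: key=value, key=value' into the name and a dict of raw strings."""
    name, _, rest = line.partition(":")
    params = {}
    for part in rest.split(","):
        part = part.strip()
        if part:
            key, _, value = part.partition("=")
            params[key.strip()] = value.strip()
    return name.strip(), params


def parse_range(value):
    """Parse 'start:end' (or a single number) into a (start, end) pair."""
    start, _, end = value.partition(":")
    start = int(start)
    return (start, int(end) if end else start)


def select_rows(table, row_range):
    """Keep rows in the range; 1-based and inclusive (an assumption of this sketch)."""
    start, end = row_range
    return table[start - 1:end]


def select_columns(table, column_range):
    """Keep columns in the range; 1-based and inclusive (an assumption of this sketch)."""
    start, end = column_range
    return [row[start - 1:end] for row in table]


name, params = parse_selector(
    "tableFreq: selectCell=1;2, selectRow=1:2, selectColumn=2:2"
)
table = [["a", "b"], ["c", "d"], ["e", "f"]]
rows = select_rows(table, parse_range(params["selectRow"]))
cells = select_columns(rows, parse_range(params["selectColumn"]))
```

Under these assumptions, selectRow=1:2 keeps the first two rows and selectColumn=2:2 keeps only the second column.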

Grouping

Grouping is used to structure the XML output by combining the detected data fields into groups.

Keyword: groupByTb: FIELD_NAME
FIELD_NAME is the name of any other field in the template.

This selector places every instance of the current data field inside the FIELD_NAME data field that precedes it vertically (reading top to bottom).

info

Please see the article on grouping to learn more.
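The grouping behavior can be pictured with a small Python sketch. It only illustrates the idea of nesting each field instance under the nearest preceding instance of FIELD_NAME. It does not reproduce the XML that pdf2Data actually writes. The function group_by_preceding, the element names, and the convention that a smaller "top" value means higher on the page are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def group_by_preceding(parents, children, parent_tag, child_tag):
    """Nest each child under the nearest parent located above it on the page.

    Both lists hold (top, text) pairs; a smaller top means higher on the page
    (an assumption of this sketch).
    """
    root = ET.Element("document")  # hypothetical root element name
    parent_nodes = []
    for top, text in sorted(parents):
        node = ET.SubElement(root, parent_tag, {"value": text})
        parent_nodes.append((top, node))
    for top, text in sorted(children):
        # Pick the last parent that starts at or above the child.
        owner = None
        for parent_top, node in parent_nodes:
            if parent_top <= top:
                owner = node
        # Children with no preceding parent are skipped in this sketch.
        if owner is not None:
            ET.SubElement(owner, child_tag).text = text
    return root


orders = [(100, "Order A"), (300, "Order B")]
items = [(150, "item 1"), (200, "item 2"), (350, "item 3")]
root = group_by_preceding(orders, items, "order", "item")
print(ET.tostring(root, encoding="unicode"))
```

Here the two items located below "Order A" and above "Order B" end up inside the first group, and the remaining item ends up inside the second.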

Font size selector (expert)

Keyword: fontSize: minSize=X, maxSize=Y

Unlike the standard Font size selector, it selects all characters with a font size between X and Y. If the minSize and maxSize parameters are present, the font size of the text inside the field region is ignored.
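As a rough illustration of the minSize/maxSize filter, the following Python sketch keeps only the characters whose font size falls within the given bounds. The function select_by_font_size and the (character, size) input format are hypothetical, and treating both bounds as inclusive is an assumption, since this page only says "between X and Y".

```python
def select_by_font_size(chars, min_size, max_size):
    """Keep characters whose font size lies within [min_size, max_size]."""
    return "".join(ch for ch, size in chars if min_size <= size <= max_size)


# Each entry pairs a character with its font size in points (sample data).
chars = [("H", 14.0), ("i", 14.0), (" ", 9.0), ("x", 9.0)]
heading = select_by_font_size(chars, 12, 16)
```

With the sample data above, only the two 14-point characters pass a filter of minSize=12, maxSize=16.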

All pdf2Data selectors can be used in expert mode with special keywords, and some of them also accept additional parameters that affect accuracy. See the page for a particular selector to learn how to use it in expert mode.