Expert mode
The pdf2Data Editor lets you configure selectors in expert mode, so called because it gives you extra flexibility but also requires deeper knowledge to build an extraction pipeline.
Prerequisites
We assume you know how to edit a data field in expert mode.
Expert mode selectors
A few pdf2Data selectors are available exclusively in expert mode.
Table frequency selector
- Keyword:
tableFreq: selectCell=1;2, selectRow=1:2, selectColumn=2:2
Uses text frequency analysis to detect table cells and might work better than the default Table selector for borderless tables.
The properties selectCell, selectRow, and selectColumn are optional. They specify the row and column numbers (or ranges, using start:end syntax) when only part of the table needs to be extracted.
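To make the keyword syntax concrete, here is a minimal Python sketch that splits a selector line of this form into its keyword and its properties. It is not pdf2Data's own parser: the function name parse_selector is hypothetical, and property values are deliberately kept as raw strings because their exact interpretation is defined by pdf2Data, not by this sketch.

```python
def parse_selector(line: str) -> tuple[str, dict[str, str]]:
    """Split 'keyword: key=value, key=value' into (keyword, properties).

    Hypothetical helper for illustration only. Values such as '1;2' or
    '1:2' are returned unchanged as strings.
    """
    keyword, _, rest = line.partition(":")
    properties: dict[str, str] = {}
    for part in rest.split(","):
        part = part.strip()
        if not part:
            continue  # no properties given, or a trailing comma
        name, _, value = part.partition("=")
        properties[name.strip()] = value.strip()
    return keyword.strip(), properties


keyword, props = parse_selector(
    "tableFreq: selectCell=1;2, selectRow=1:2, selectColumn=2:2"
)
print(keyword)
print(props)
```

Because all three properties are optional, the same helper also accepts a bare keyword with no properties and returns an empty dictionary for it.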
Grouping
Grouping is used to structure the XML output by combining the detected data fields into groups.
- Keyword:
groupByTb: FIELD_NAME
FIELD_NAME is the name of any other field in the template.
This selector places every instance of the current data field inside the preceding (vertically, top to bottom) data field FIELD_NAME.
Please see the article about grouping to learn more.
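The grouping rule described in this section can be pictured with a short Python sketch. It only illustrates the idea of attaching each instance to the closest preceding instance of FIELD_NAME when reading the page top to bottom; the Instance class, the group_by_preceding function, the top coordinate, and the field names order and item are all assumptions made for this example and are not part of pdf2Data.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    name: str
    top: float  # assumed distance from the top of the page; smaller is higher
    children: list["Instance"] = field(default_factory=list)


def group_by_preceding(
    instances: list[Instance], child_name: str, parent_name: str
) -> list[Instance]:
    """Attach each child instance to the closest parent instance above it."""
    parents: list[Instance] = []
    current_parent = None
    for inst in sorted(instances, key=lambda i: i.top):
        if inst.name == parent_name:
            current_parent = inst
            parents.append(inst)
        elif inst.name == child_name and current_parent is not None:
            # Children with no parent above them are skipped in this sketch.
            current_parent.children.append(inst)
    return parents


page = [
    Instance("order", top=100),
    Instance("item", top=120),
    Instance("item", top=140),
    Instance("order", top=300),
    Instance("item", top=320),
]
groups = group_by_preceding(page, child_name="item", parent_name="order")
for parent in groups:
    print(parent.name, parent.top, [child.top for child in parent.children])
```

Here the field item plays the role of the current data field and order plays the role of FIELD_NAME, so the first order receives two items and the second receives one.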
Font size selector (expert)
- Keyword:
fontSize: minSize=X, maxSize=Y
Unlike the standard Font size selector, it selects all characters with a font size between X and Y. If the minSize and maxSize parameters are present, the font size of the text inside the field region is ignored.
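As a rough illustration of selecting characters by a font size range, the following Python sketch filters a list of characters by size. The Char type and the select_by_font_size function are hypothetical, and treating both bounds as inclusive is an assumption; the actual behavior at the boundaries is determined by pdf2Data.

```python
from typing import NamedTuple


class Char(NamedTuple):
    text: str
    font_size: float


def select_by_font_size(chars: list[Char], min_size: float, max_size: float) -> str:
    """Keep only characters whose font size lies in [min_size, max_size].

    Inclusive bounds are an assumption made for this sketch.
    """
    return "".join(c.text for c in chars if min_size <= c.font_size <= max_size)


chars = [Char(ch, 18.0) for ch in "Total"] + [Char(ch, 10.0) for ch in " due now"]
selected = select_by_font_size(chars, min_size=16, max_size=20)
print(selected)
```

In this example only the five characters set at size 18 fall inside the range, so the smaller text that follows them is dropped.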
All pdf2Data selectors can be used in expert mode with special keywords, and some of them also accept additional parameters that affect accuracy. See the page for a particular selector to learn how to use it in expert mode.