This question is related to the
TextPositioning property, which you can read about here.
When you have mixed content and it is unclear whether it would be better to use
BY_LINES, we recommend using the
BY_WORDS strategy, as it still allows you to group words into paragraphs without losing the words' boundaries.
In addition, since pdfOCR 1.0.1, you can also use
BY_WORDS_AND_LINES. This is similar to the
BY_WORDS mode, but the top and bottom of the word bounding box are inherited from the line.
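
To make the difference concrete, here is a minimal sketch of the geometry described above. It does not use the pdfOCR API: the `Box` class and the `inheritLineHeight` helper are hypothetical names invented for illustration, and the library's internal implementation may differ. It only shows the idea that a word keeps its own left and right edges while taking its top and bottom from the enclosing line.

```java
public class WordBoxSketch {

    // Hypothetical axis-aligned bounding box; this is NOT a pdfOCR type.
    public static final class Box {
        public final float left;
        public final float bottom;
        public final float right;
        public final float top;

        public Box(float left, float bottom, float right, float top) {
            this.left = left;
            this.bottom = bottom;
            this.right = right;
            this.top = top;
        }

        @Override
        public String toString() {
            return "Box[left=" + left + ", bottom=" + bottom
                    + ", right=" + right + ", top=" + top + "]";
        }
    }

    // Illustrates the BY_WORDS_AND_LINES idea: horizontal extent comes
    // from the word, vertical extent comes from the line it belongs to.
    public static Box inheritLineHeight(Box word, Box line) {
        return new Box(word.left, line.bottom, word.right, line.top);
    }

    public static void main(String[] args) {
        Box line = new Box(10f, 100f, 200f, 120f); // whole text line
        Box word = new Box(50f, 104f, 80f, 115f);  // one word, as in BY_WORDS

        Box adjusted = inheritLineHeight(word, line);
        System.out.println("BY_WORDS-style box:           " + word);
        System.out.println("BY_WORDS_AND_LINES-style box: " + adjusted);
    }
}
```

With boxes shaped like this, all words on the same line share one vertical extent, which tends to make the text layer select more evenly while the word boundaries stay intact.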