I used the following code to get data in PDF from a particular location.

Rectangle rect = new Rectangle(0,0,250,250);
RenderFilter filter = new RegiontextRenderFilter(rect);
fontBasedTextExtractionStrategy strategy = new fontBasedTextExtractionStrategy();
strategy = new FilteredTextRenderListener(new LocationTextExtractionStrategy(), filter); //Throws Error.

I want to get the bold text present in that location. Would creating a new method or class called FontBasedTextExtractionStrategy instead of a simple TextExtractionStrategy help?

Posted on StackOverflow on Jul 1, 2014 by Raka

Please take a look at the ParseCustom example for iText 7. In this example, we create a custom TextRegionEventFilter (not ITextExtractionStrategy):

protected class CustomFontFilter extends TextRegionEventFilter {
        public CustomFontFilter(Rectangle filterRect) {
            super(filterRect);
        }

        @Override
        public boolean accept(IEventData data, EventType type) {
            if (type.equals(EventType.RENDER_TEXT)) {
                TextRenderInfo renderInfo = (TextRenderInfo) data;
                PdfFont font = renderInfo.getFont();
                if (null != font) {
                    String fontName = font.getFontProgram().getFontNames().getFontName();
                    return fontName.endsWith("Bold") || fontName.endsWith("Oblique");
                }
            }

            return false;
        }
    }

This text will filter all text so that only text of which the Postscript font name ends with Bold or Oblique.

This is how you use this filter:

protected void manipulatePdf(String dest) throws IOException {
        PdfDocument pdfDoc = new PdfDocument(new PdfReader(SRC));

        Rectangle rect = new Rectangle(36, 750, 523, 56);
        CustomFontFilter fontFilter = new CustomFontFilter(rect);
        FilteredEventListener listener = new FilteredEventListener();

        // Create a text extraction renderer
        LocationTextExtractionStrategy extractionStrategy = listener
                .attachEventListener(new LocationTextExtractionStrategy(), fontFilter);

        // Note: If you want to re-use the PdfCanvasProcessor, you must call PdfCanvasProcessor.reset()
        PdfCanvasProcessor parser = new PdfCanvasProcessor(listener);
        parser.processPageContent(pdfDoc.getFirstPage());

        // Get the resultant text after applying the custom filter
        String actualText = extractionStrategy.getResultantText();

        pdfDoc.close();

As you can see, we create a LocationTextExtractionStrategy that takes our self-made filter based on the font. To extract text we use processPageContent().

Click this link if you want to see how to answer this question in iText 5.