Skip to main content
Skip table of contents

How to use a text extraction strategy after applying a location extraction strategy?

I used the following code to get data in PDF from a particular location.

Rectangle rect = new Rectangle(0,0,250,250); RenderFilter filter = new RegiontextRenderFilter(rect); fontBasedTextExtractionStrategy strategy = new fontBasedTextExtractionStrategy(); strategy = new FilteredTextRenderListener(new LocationTextExtractionStrategy(), filter); //Throws Error.

I want to get the bold text present in that location. Would creating a new method or class called FontBasedTextExtractionStrategy instead of a simple TextExtractionStrategy help?

Posted on StackOverflow on Jul 1, 2014 by Raka


Please take a look at the ParseCustom example. In this example, we create a custom RenderFilter (not a TextExtractionStrategy):

JAVA
class FontRenderFilter extends RenderFilter {
    public boolean allowText(TextRenderInfo renderInfo) {
        String font = renderInfo.getFont().getPostscriptFontName();
        return font.endsWith("Bold") || font.endsWith("Oblique");
    }
}

This text will filter all text so that only text of which the Postscript font name ends with Bold or Oblique.

This is how you use this filter:

JAVA
public void parse(String filename) throws IOException {
    PdfReader reader = new PdfReader(filename);
    Rectangle rect = new Rectangle(36, 750, 559, 806);
    RenderFilter regionFilter = new RegionTextRenderFilter(rect);
    FontRenderFilter fontFilter = new FontRenderFilter();
    TextExtractionStrategy strategy = new FilteredTextRenderListener(
            new LocationTextExtractionStrategy(), regionFilter, fontFilter);
    System.out.println(PdfTextExtractor.getTextFromPage(reader, 1, strategy));
    reader.close();
}

As you can see, we create a FilteredTextRenderListener that takes two filters, a RegionTextRenderFilter and our self-made filter based on the font.

Click this link if you want to see how to answer this question in iText 7.

JavaScript errors detected

Please note, these errors can depend on your browser setup.

If this problem persists, please contact our support.