How to use a text extraction strategy after applying a location extraction strategy?
I used the following code to extract text from a particular region of a PDF page:
Rectangle rect = new Rectangle(0, 0, 250, 250);
RenderFilter filter = new RegionTextRenderFilter(rect);
FontBasedTextExtractionStrategy strategy = new FontBasedTextExtractionStrategy();
strategy = new FilteredTextRenderListener(new LocationTextExtractionStrategy(), filter); // throws an error
I want to get only the bold text present in that location. Would creating a new class called FontBasedTextExtractionStrategy, instead of using a SimpleTextExtractionStrategy, help?
Posted on StackOverflow on Jul 1, 2014 by Raka
Please take a look at the ParseCustom example. In this example, we create a custom RenderFilter (not a TextExtractionStrategy):
class FontRenderFilter extends RenderFilter {
    // Accept a text chunk only if its font name marks it as bold or oblique.
    @Override
    public boolean allowText(TextRenderInfo renderInfo) {
        String font = renderInfo.getFont().getPostscriptFontName();
        return font.endsWith("Bold") || font.endsWith("Oblique");
    }
}
This filter lets through only text whose PostScript font name ends with Bold or Oblique; all other text is discarded.
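Which font names pass that check can be tried out without a PDF. The sketch below copies the endsWith test into a standalone class and runs it on a few PostScript font names. FontNameCheck and its allow method are hypothetical names for illustration, not part of iText, and the font names are only examples; what a given PDF reports depends on the fonts it uses.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of iText: the same test as
// FontRenderFilter.allowText, applied to a plain string.
class FontNameCheck {

    static boolean allow(String postscriptFontName) {
        return postscriptFontName.endsWith("Bold")
                || postscriptFontName.endsWith("Oblique");
    }

    public static void main(String[] args) {
        // Example names only; a real document may report different ones.
        List<String> names = Arrays.asList(
                "Helvetica", "Helvetica-Bold", "Helvetica-BoldOblique",
                "Times-BoldItalic", "Arial-BoldMT");
        for (String name : names) {
            System.out.println(name + " -> " + allow(name));
        }
    }
}
```

Names such as Times-BoldItalic or Arial-BoldMT contain "Bold" without ending in it, so the check as written rejects them. If your document uses fonts like these, a looser test such as contains("Bold") may be a better fit.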
This is how you use this filter:
public void parse(String filename) throws IOException {
    PdfReader reader = new PdfReader(filename);
    // Region of interest, in user space units: (llx, lly, urx, ury).
    Rectangle rect = new Rectangle(36, 750, 559, 806);
    RenderFilter regionFilter = new RegionTextRenderFilter(rect);
    FontRenderFilter fontFilter = new FontRenderFilter();
    // Wrap the location-based strategy so it only receives text that passes both filters.
    TextExtractionStrategy strategy = new FilteredTextRenderListener(
            new LocationTextExtractionStrategy(), regionFilter, fontFilter);
    System.out.println(PdfTextExtractor.getTextFromPage(reader, 1, strategy));
    reader.close();
}
As you can see, we create a FilteredTextRenderListener that takes two filters: a RegionTextRenderFilter and our self-made filter based on the font.
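To my understanding, the listener passes a text chunk on to the wrapped strategy only if every filter allows it, so the two filters act as a logical AND. The sketch below models that behavior with plain Java, with no iText on the classpath. Chunk, FilterChainSketch, and the helper methods are hypothetical names, and the region test is simplified to a point-in-rectangle check, whereas the real RegionTextRenderFilter works on the text's position on the page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Library-free model of filter chaining; none of these names are iText classes.
class FilterChainSketch {

    // Stand-in for what a TextRenderInfo carries: text, font name, position.
    static final class Chunk {
        final String text;
        final String fontName;
        final float x;
        final float y;

        Chunk(String text, String fontName, float x, float y) {
            this.text = text;
            this.fontName = fontName;
            this.x = x;
            this.y = y;
        }
    }

    // Keeps the text of a chunk only if every filter allows it.
    @SafeVarargs
    static List<String> extract(List<Chunk> chunks, Predicate<Chunk>... filters) {
        List<String> kept = new ArrayList<>();
        for (Chunk chunk : chunks) {
            boolean allowed = true;
            for (Predicate<Chunk> filter : filters) {
                if (!filter.test(chunk)) {
                    allowed = false;
                    break;
                }
            }
            if (allowed) {
                kept.add(chunk.text);
            }
        }
        return kept;
    }

    // Simplified region test: is the chunk's anchor point inside the rectangle?
    static Predicate<Chunk> inRegion(float llx, float lly, float urx, float ury) {
        return c -> c.x >= llx && c.x <= urx && c.y >= lly && c.y <= ury;
    }

    // Same font name test as FontRenderFilter.
    static Predicate<Chunk> boldOrOblique() {
        return c -> c.fontName.endsWith("Bold") || c.fontName.endsWith("Oblique");
    }

    public static void main(String[] args) {
        List<Chunk> page = Arrays.asList(
                new Chunk("Title", "Helvetica-Bold", 40, 780),
                new Chunk("subtitle", "Helvetica", 40, 760),
                new Chunk("Footer", "Helvetica-Bold", 40, 30));
        System.out.println(extract(page, inRegion(36, 750, 559, 806), boldOrOblique()));
    }
}
```

In this model only "Title" survives: "subtitle" lies inside the region but is not bold, and "Footer" is bold but lies outside the region.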