I used the following code to extract text from a particular region of a PDF:

Rectangle rect = new Rectangle(0, 0, 250, 250);
RenderFilter filter = new RegionTextRenderFilter(rect);
FontBasedTextExtractionStrategy strategy = new FontBasedTextExtractionStrategy();
strategy = new FilteredTextRenderListener(new LocationTextExtractionStrategy(), filter); // Throws an error

I only want the bold text inside that region. Would it help to create a new class called FontBasedTextExtractionStrategy instead of using a plain TextExtractionStrategy?

Posted on StackOverflow on Jul 1, 2014 by Raka

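Please take a look at the ParseCustom example for iText 7. Rather than writing a custom ITextExtractionStrategy, it subclasses TextRegionEventFilter, so the filter decides which text reaches a standard extraction strategy. Here is the filter in Java and in C#: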

// Java
protected class CustomFontFilter extends TextRegionEventFilter {
    public CustomFontFilter(Rectangle filterRect) {
        super(filterRect);
    }

    @Override
    public boolean accept(IEventData data, EventType type) {
        // Keep only text events that fall inside the rectangle passed to the constructor
        if (type.equals(EventType.RENDER_TEXT) && super.accept(data, type)) {
            TextRenderInfo renderInfo = (TextRenderInfo) data;
            PdfFont font = renderInfo.getFont();
            if (null != font) {
                // Accept the text only if its PostScript font name marks it as bold or oblique
                String fontName = font.getFontProgram().getFontNames().getFontName();
                return fontName.endsWith("Bold") || fontName.endsWith("Oblique");
            }
        }
        return false;
    }
}

// C#
protected class CustomFontFilter : TextRegionEventFilter
{
    public CustomFontFilter(Rectangle filterRect) : base(filterRect)
    {
    }

    public override bool Accept(IEventData data, EventType type)
    {
        // Keep only text events that fall inside the rectangle passed to the constructor
        if (type.Equals(EventType.RENDER_TEXT) && base.Accept(data, type))
        {
            TextRenderInfo renderInfo = (TextRenderInfo) data;
            PdfFont font = renderInfo.GetFont();
            if (null != font)
            {
                // Accept the text only if its PostScript font name marks it as bold or oblique
                string fontname = font.GetFontProgram().GetFontNames().GetFontName();
                return fontname.EndsWith("Bold") || fontname.EndsWith("Oblique");
            }
        }
        return false;
    }
}

This filter accepts only text whose PostScript font name ends with "Bold" or "Oblique"; text in any other font is discarded.
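The font test is plain string matching on the PostScript name, so its behaviour can be checked without opening a PDF. The sketch below copies that condition into a hypothetical helper, `isBoldOrOblique`, and runs it on a few font names picked for illustration (they are not read from a real file). It also shows the limits of the heuristic: `Times-BoldItalic` and `Arial-BoldMT` are bold fonts, but their names end with `Italic` and `MT`, so this check rejects them.

```java
import java.util.List;

public class FontNameCheck {
    // Hypothetical helper holding the same condition as CustomFontFilter.accept()
    static boolean isBoldOrOblique(String fontName) {
        return fontName.endsWith("Bold") || fontName.endsWith("Oblique");
    }

    public static void main(String[] args) {
        // Example PostScript font names, chosen for illustration
        List<String> names = List.of(
                "Helvetica-Bold",           // accepted: ends with "Bold"
                "ABCDEF+Helvetica-Oblique", // accepted: a prefix does not affect endsWith
                "Helvetica",                // rejected: regular weight
                "Times-BoldItalic",         // rejected: bold, but the name ends with "Italic"
                "Arial-BoldMT");            // rejected: bold, but the name ends with "MT"
        for (String name : names) {
            System.out.println(name + " -> " + isBoldOrOblique(name));
        }
    }
}
```

If bold text is what you are after, a looser test such as `fontName.contains("Bold")` would also catch the last two names; whether that is acceptable depends on the fonts used in your documents.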

This is how you use this filter:

// Java
protected void manipulatePdf(byte[] bytes) throws IOException {
    PdfDocument pdfDoc = new PdfDocument(new PdfReader(new ByteArrayInputStream(bytes)));
    // Region to search: x, y, width, height in PDF user units
    Rectangle rect = new Rectangle(36, 750, 523, 56);
    CustomFontFilter fontFilter = new CustomFontFilter(rect);
    FilteredEventListener listener = new FilteredEventListener();

    // Attach a text extraction strategy that only receives events accepted by the filter
    LocationTextExtractionStrategy extractionStrategy = listener
            .attachEventListener(new LocationTextExtractionStrategy(), fontFilter);

    // Note: If you want to re-use the PdfCanvasProcessor, you must call PdfCanvasProcessor.reset()
    PdfCanvasProcessor parser = new PdfCanvasProcessor(listener);
    parser.processPageContent(pdfDoc.getFirstPage());

    // Get the resultant text after applying the custom filter
    String actualText = extractionStrategy.getResultantText();
    System.out.println(actualText);
    pdfDoc.close();
}

// C#
public void manipulatePdf(byte[] bytes)
{
    PdfDocument pdf = new PdfDocument(new PdfReader(new MemoryStream(bytes)));
    Rectangle rect = new Rectangle(100, 100, 200, 200);
    CustomFontFilter fontFilter = new CustomFontFilter(rect);
    FilteredEventListener listener = new FilteredEventListener();

    // Attach a text extraction strategy that only receives events accepted by the filter
    LocationTextExtractionStrategy strat = listener.AttachEventListener(new LocationTextExtractionStrategy(), fontFilter);

    // Note: If you want to re-use the PdfCanvasProcessor, you must call PdfCanvasProcessor.Reset()
    PdfCanvasProcessor parser = new PdfCanvasProcessor(listener);
    parser.ProcessPageContent(pdf.GetFirstPage());

    // Get the resultant text after applying the custom filter
    string actualText = strat.GetResultantText();
    Console.Out.WriteLine(actualText);
    pdf.Close();
}
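The rectangle follows iText's (x, y, width, height) convention in PDF user space, where the origin is usually at the bottom-left of the page and y grows upward. So `new Rectangle(36, 750, 523, 56)` spans x from 36 to 559 and y from 750 to 806, a strip near the top of an A4 page (842 pt tall). As a rough model of the region test, the sketch below asks whether a text baseline touches that strip, using the standard `java.awt.geom.Rectangle2D.intersectsLine`. Treat it as an approximation: that the region filter works on each chunk's baseline is an assumption here, the helper `baselineTouchesRegion` is hypothetical, and the baseline coordinates are invented. (Java2D draws with y pointing down, but an intersection test only compares numbers, so the orientation does not matter.)

```java
import java.awt.geom.Rectangle2D;

public class RegionCheck {
    // Same numbers as new Rectangle(36, 750, 523, 56): x, y, width, height
    static final Rectangle2D REGION = new Rectangle2D.Float(36, 750, 523, 56);

    // Hypothetical helper: does the baseline from (x1, y1) to (x2, y2) touch the region?
    static boolean baselineTouchesRegion(double x1, double y1, double x2, double y2) {
        return REGION.intersectsLine(x1, y1, x2, y2);
    }

    public static void main(String[] args) {
        System.out.println(baselineTouchesRegion(50, 780, 200, 780)); // inside the strip
        System.out.println(baselineTouchesRegion(50, 400, 200, 400)); // middle of the page
        System.out.println(baselineTouchesRegion(10, 760, 40, 760));  // crosses the left edge at x = 36
    }
}
```

Because a baseline that only crosses the edge still counts, text that starts outside the rectangle can be picked up; shrink the rectangle if you need a stricter cut.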

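We attach a standard LocationTextExtractionStrategy to a FilteredEventListener together with our font-based filter, so the strategy only receives the text events that the filter accepts. PdfCanvasProcessor.processPageContent() then parses the page content, and getResultantText() returns the filtered text.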
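The division of labour described above (a filter decides which text events get through, the strategy only collects what it receives) can be modelled without iText. The sketch below is not iText's API: `TextChunk`, `EventFilter` and `FilteredCollector` are hypothetical stand-ins for `TextRenderInfo`, `IEventFilter` and a `FilteredEventListener` wrapping a `LocationTextExtractionStrategy`, with method names borrowed only for readability. It combines a region test with the font-name test so that only bold text inside the rectangle is kept, which is what the question asks for. It assumes Java 16 or later for the `record`.

```java
import java.awt.geom.Rectangle2D;
import java.util.List;

public class FilterPipeline {
    // Hypothetical stand-in for one text render event: its text, font name and baseline
    record TextChunk(String text, String fontName, double x1, double y1, double x2, double y2) {}

    // Hypothetical stand-in for an event filter
    interface EventFilter {
        boolean accept(TextChunk chunk);
    }

    // Hypothetical listener: passes a chunk on only if every filter accepts it.
    // Unlike LocationTextExtractionStrategy, it simply appends text in arrival order.
    static class FilteredCollector {
        private final List<EventFilter> filters;
        private final StringBuilder text = new StringBuilder();

        FilteredCollector(List<EventFilter> filters) {
            this.filters = filters;
        }

        void eventOccurred(TextChunk chunk) {
            for (EventFilter filter : filters) {
                if (!filter.accept(chunk)) {
                    return; // rejected: the chunk never reaches the collected text
                }
            }
            text.append(chunk.text());
        }

        String getResultantText() {
            return text.toString();
        }
    }

    // Keep only bold/oblique text whose baseline touches the region
    static String extract(List<TextChunk> chunks, Rectangle2D region) {
        EventFilter inRegion = c -> region.intersectsLine(c.x1(), c.y1(), c.x2(), c.y2());
        EventFilter boldFont = c -> c.fontName().endsWith("Bold") || c.fontName().endsWith("Oblique");
        FilteredCollector collector = new FilteredCollector(List.of(inRegion, boldFont));
        for (TextChunk chunk : chunks) {
            collector.eventOccurred(chunk);
        }
        return collector.getResultantText();
    }

    public static void main(String[] args) {
        Rectangle2D region = new Rectangle2D.Float(36, 750, 523, 56);
        List<TextChunk> chunks = List.of(
                new TextChunk("Title", "Helvetica-Bold", 50, 780, 100, 780),  // bold, inside
                new TextChunk("Subtitle", "Helvetica", 110, 780, 200, 780),  // regular, inside
                new TextChunk("Footer", "Helvetica-Bold", 50, 40, 100, 40));  // bold, outside
        System.out.println(extract(chunks, region)); // prints Title
    }
}
```

The collector knows nothing about fonts or regions, so filters can be added or swapped without touching the extraction code. That separation is why a filter fits this problem better than a dedicated FontBasedTextExtractionStrategy.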

An iText 5 version of this answer is also available if you are still using iText 5.