Content parsing, extraction and redaction of text
iText can parse PDFs to extract the content of a page. Because a PDF file can be created in many different ways, and because the text on a page is usually nothing more than a set of characters drawn at specific positions, extracting text correctly is not trivial.
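To illustrate why this is not trivial, here is a minimal sketch in plain Java of the idea behind location-based extraction: fragments may be drawn in any order, so they have to be grouped into lines and sorted by position before they read as text. This is not iText's API. The `TextChunk` class, the `extract` method and the `lineTolerance` parameter are hypothetical names chosen for illustration, and joining fragments with a single space is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class NaiveLocationExtraction {

    /** Hypothetical stand-in for a text fragment reported by a PDF parser. */
    static final class TextChunk {
        final String text;
        final double x; // left edge of the fragment
        final double y; // baseline; the PDF y axis points upwards

        TextChunk(String text, double x, double y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /** Groups chunks into lines (top to bottom), then orders each line left to right. */
    static String extract(List<TextChunk> chunks, double lineTolerance) {
        // Highest baseline first, because y grows towards the top of the page.
        List<TextChunk> byHeight = new ArrayList<>(chunks);
        byHeight.sort(Comparator.comparingDouble((TextChunk c) -> c.y).reversed());

        // Chunks whose baseline is within the tolerance of the line's first chunk share a line.
        List<List<TextChunk>> lines = new ArrayList<>();
        for (TextChunk chunk : byHeight) {
            List<TextChunk> last = lines.isEmpty() ? null : lines.get(lines.size() - 1);
            if (last == null || Math.abs(last.get(0).y - chunk.y) > lineTolerance) {
                last = new ArrayList<>();
                lines.add(last);
            }
            last.add(chunk);
        }

        StringBuilder out = new StringBuilder();
        for (List<TextChunk> line : lines) {
            line.sort(Comparator.comparingDouble((TextChunk c) -> c.x));
            if (out.length() > 0) {
                out.append('\n');
            }
            for (int i = 0; i < line.size(); i++) {
                if (i > 0) {
                    out.append(' '); // assumption: every chunk is a whole word
                }
                out.append(line.get(i).text);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The fragments are deliberately listed out of reading order.
        List<TextChunk> drawn = List.of(
                new TextChunk("World", 80, 700),
                new TextChunk("line", 90, 680),
                new TextChunk("Hello", 36, 700.5),
                new TextChunk("Second", 36, 680));
        System.out.println(extract(drawn, 2.0));
    }
}
```

A real extraction strategy also has to deal with font encodings, rotated text and the width of the gaps between fragments, which is where most of the questions listed below come from.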
How to extract text and anchor information from a PDF?
How to read text from a specific position?
Why is the text I extract from an English PDF page garbled?
Why can't I extract text added using a Type3 font correctly from a PDF?
How to use a text extraction strategy after applying a location extraction strategy?
How to remove text from a PDF?
How to create and apply redactions?
How to get the co-ordinates of an image?
What are the extra characters in the font name of my PDF?