Content parsing, extraction and redaction of text

iText can parse a PDF to extract the content of a page. Because PDF files can be created in many different ways, and because the text on a page is usually nothing more than a set of characters drawn at given positions, extracting text correctly is not trivial.
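To illustrate why position matters, here is a minimal, library-independent sketch in Java. It does not use the iText API: the `Chunk` class, the `extract` method, and the two tolerance constants are hypothetical names chosen for this example. It assumes each piece of text arrives with the position where it was drawn, possibly out of reading order, and rebuilds lines by sorting on position and inserting spaces where the horizontal gap is large enough.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class ChunkOrderingSketch {

    /** Hypothetical text chunk: a string plus the position where it was drawn. */
    static final class Chunk {
        final String text;
        final float x;      // left edge of the chunk
        final float y;      // baseline; in PDF user space y grows upward
        final float width;  // horizontal extent of the chunk

        Chunk(String text, float x, float y, float width) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.width = width;
        }
    }

    // Assumed thresholds for this sketch; real documents need font-aware values.
    static final float LINE_TOLERANCE = 1.0f; // larger baseline jump starts a new line
    static final float SPACE_GAP = 2.0f;      // larger horizontal gap becomes a space

    /** Rebuilds reading order from chunks that may arrive in any drawing order. */
    static String extract(List<Chunk> chunks) {
        List<Chunk> sorted = new ArrayList<>(chunks);
        // Top of the page first (higher y), then left to right.
        // Simplification: chunks on one line are assumed to share the same baseline.
        sorted.sort(Comparator.comparingDouble((Chunk c) -> -c.y)
                .thenComparingDouble(c -> c.x));

        StringBuilder out = new StringBuilder();
        Chunk prev = null;
        for (Chunk c : sorted) {
            if (prev != null) {
                if (Math.abs(prev.y - c.y) > LINE_TOLERANCE) {
                    out.append('\n');
                } else if (c.x - (prev.x + prev.width) > SPACE_GAP) {
                    out.append(' ');
                }
            }
            out.append(c.text);
            prev = c;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Chunks listed in drawing order, which differs from reading order.
        List<Chunk> chunks = Arrays.asList(
                new Chunk("World", 130f, 700f, 30f),
                new Chunk("Hel", 72f, 700f, 18f),
                new Chunk("lo", 90f, 700f, 12f),
                new Chunk("Second line", 72f, 680f, 60f));
        System.out.println(extract(chunks));
    }
}
```

In this sketch, "Hel" and "lo" are joined without a space because they touch, while "World" is separated by a space because of the gap before it. Real PDFs add further complications, such as rotated text, slightly varying baselines, and fonts without a usable character mapping, which the questions below address.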

How to extract text and anchor information from a PDF?

How to read text from a specific position?

Why is the text I extract from an English PDF page garbled?

Why can't I extract text added using a Type3 font correctly from a PDF?

How to use a text extraction strategy after applying a location extraction strategy?

How to remove text from a PDF?

How to create and apply redactions?

How to get the co-ordinates of an image?

What are the extra characters in the font name of my PDF?
