
pdfOffice + pdfSweep Redaction Example

Background:

pdfOffice adds accurate conversion of MS Office documents to PDF to the iText 7 Suite. Because the output is a regular PDF, the converted document can be passed straight to the other iText add-ons, which makes it possible to build a single workflow that starts from an Office file.

In this example, we demonstrate how to combine pdfOffice and pdfSweep: first we convert a .docx file into a PDF, and then we apply redaction to securely remove selected content. This is an especially relevant use case for organizations that handle documents containing personally identifiable information.

Code Snippet:

In the following code snippet, the input is a .docx file containing the first three paragraphs of Lewis Carroll's classic novel Alice's Adventures in Wonderland (also known simply as Alice in Wonderland). After using pdfOffice to convert the Microsoft Word document into a PDF, we use pdfSweep to redact the name Alice throughout the document. The match is case-insensitive, and the redacted areas are filled in pink so that they are easy to spot in the output.

JAVA:

import com.itextpdf.kernel.colors.ColorConstants;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.licensekey.LicenseKey;
import com.itextpdf.pdfcleanup.autosweep.ICleanupStrategy;
import com.itextpdf.pdfcleanup.autosweep.PdfAutoSweep;
import com.itextpdf.pdfcleanup.autosweep.RegexBasedCleanupStrategy;
import com.itextpdf.pdfoffice.OfficeConverter;

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.regex.Pattern;

public class PdfOfficeRedactionExample {

    // Input Word document, intermediate (unredacted) PDF, and final redacted PDF
    public static final String DOC = "src/main/resources/Alice.docx";
    public static final String PDF = "src/main/resources/Alice.pdf";
    public static final String DEST = "output_redacted.pdf";

    public static void main(String[] args) throws IOException {
        // pdfOffice and pdfSweep are licensed add-ons, so load the license first
        LicenseKey.loadLicenseFile("src/main/resources/itext_trial.xml");

        // Step 1: convert the .docx to PDF; the streams are closed before the PDF is read back
        try (InputStream in = new FileInputStream(DOC);
             OutputStream out = new FileOutputStream(PDF)) {
            OfficeConverter.convertOfficeDocumentToPdf(in, out);
        }

        // Step 2: open the converted PDF for reading and write the result to DEST
        PdfDocument pdfDoc = new PdfDocument(new PdfReader(PDF), new PdfWriter(DEST));

        // Step 3: redact every case-insensitive match of "Alice" and fill the redacted area in pink
        ICleanupStrategy cleanupStrategy =
                new RegexBasedCleanupStrategy(Pattern.compile("Alice", Pattern.CASE_INSENSITIVE))
                        .setRedactionColor(ColorConstants.PINK);
        PdfAutoSweep autoSweep = new PdfAutoSweep(cleanupStrategy);
        autoSweep.cleanUp(pdfDoc);

        // Closing the document writes the redacted PDF to disk
        pdfDoc.close();
    }
}
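
The regular expression passed to RegexBasedCleanupStrategy decides what gets redacted, so it is worth checking the pattern on plain text before running it against a PDF. The following is a minimal sketch that uses only java.util.regex; the class name, the sample sentence, and the stricter whole-word variant are illustrative assumptions and are not part of the original example. pdfSweep applies the pattern to text extracted from the page, so the matches in a real PDF may differ from what a plain string produces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RedactionPatternDemo {

    // Same pattern as in the example above: matches "alice" anywhere, ignoring case
    static final Pattern LOOSE = Pattern.compile("Alice", Pattern.CASE_INSENSITIVE);

    // Hypothetical stricter variant: word boundaries keep words such as "malice" intact
    static final Pattern WHOLE_WORD = Pattern.compile("\\bAlice\\b", Pattern.CASE_INSENSITIVE);

    // Collects every substring of the text that the pattern matches, in order
    static List<String> findMatches(Pattern pattern, String text) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Made-up sample sentence, chosen to contain a false positive
        String sample = "Alice was tired. ALICE bore no malice.";

        System.out.println(findMatches(LOOSE, sample));      // prints [Alice, ALICE, alice]
        System.out.println(findMatches(WHOLE_WORD, sample)); // prints [Alice, ALICE]
    }
}
```

The loose pattern also hits the "alice" inside "malice", which would redact part of an unrelated word. If that matters for your documents, a whole-word pattern like the one sketched here can be passed to RegexBasedCleanupStrategy in the same way.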


Resources:

Right-click the link below and select "Save link as..." to download

pdfOffice_pdfSweep.zip

