How to convert colored images to black and white?
I'm trying to compress PDFs using iTextSharp. There are a lot of pages with color images stored as JPEGs (DCTDECODE)... so I'm converting them to black and white PNGs and replacing them in the document (the PNG is much smaller than a JPG for black and white format).
I've tried varieties of COLORSPACEs and BITSPERCOMPONENTs, but always get "Insufficient data for an image", "Out of memory", or "An error exists on this page" upon trying to open the resulting PDF... so I must be doing something wrong.
Posted on StackOverflow on Oct 27, 2014 by Jeff
The Question:
You have a PDF with a colored JPG. For instance: image.pdf
If you look inside this PDF, you'll see that the filter of the image stream is /DCTDecode and the color space is /DeviceRGB.
Now you want to replace the image in the PDF, so that the result looks like this: image_replaced.pdf
In this PDF, the filter is /FlateDecode and the color space has changed to /DeviceGray.
In the conversion process, you want to use the PNG format.
The Example:
I have prepared an example for you that makes this conversion: ReplaceImage
I will explain this example step by step:
Step 1: finding the image
In my example, I know that there's only one image, so I'm retrieving the PdfStream with the image dictionary and the image bytes in a quick and dirty way.
PdfDocument pdfDoc = new PdfDocument(new PdfReader(src), new PdfWriter(dest));
PdfDictionary page = pdfDoc.getFirstPage().getPdfObject();
PdfDictionary resources = page.getAsDictionary(PdfName.Resources);
PdfDictionary xobjects = resources.getAsDictionary(PdfName.XObject);
PdfName imgRef = xobjects.keySet().iterator().next();
PdfStream stream = xobjects.getAsStream(imgRef);
I go to the /XObject dictionary inside the /Resources listed in the page dictionary of page 1. I take the first XObject I encounter, assuming that it is an image, and I get that image as a PdfStream object.
The code you shared is better than mine, but this part of the code isn't relevant to your question and it works in the context of my example, so let's ignore the fact that this won't work for other PDFs. What you really care about are steps 2 and 3.
Step 2: converting the colored JPG into a black and white PNG
Let's write a method that takes a PdfImageXObject and converts it into an Image object whose colors are changed to gray and whose bytes are stored as a PNG:
public static Image makeBlackAndWhitePng(PdfImageXObject image) throws IOException {
    BufferedImage bi = image.getBufferedImage();
    // Gray target image; drawing the color image onto it converts the pixels to gray
    BufferedImage newBi = new BufferedImage(bi.getWidth(), bi.getHeight(), BufferedImage.TYPE_USHORT_GRAY);
    newBi.getGraphics().drawImage(bi, 0, 0, null);
    // Encode the gray image as PNG in memory
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    ImageIO.write(newBi, "png", baos);
    // Wrap the PNG bytes in an iText Image
    return new Image(ImageDataFactory.create(baos.toByteArray()));
}
We convert the original image into a black and white (grayscale) image using standard BufferedImage manipulations: we draw the original image bi to a new image newBi of type TYPE_USHORT_GRAY.
Once this is done, you want the image bytes in the PNG format. This is also done using standard ImageIO functionality: we just write the BufferedImage to a byte array, telling ImageIO that we want "png".
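The two BufferedImage and ImageIO steps can be tried without iText. The following is a minimal sketch using only the JDK; the class name GrayPngSketch, the helper toPng, and the synthetic gradient are illustrative assumptions, not part of the ReplaceImage example. It also tries TYPE_BYTE_GRAY (8-bit gray) and TYPE_BYTE_BINARY (1-bit, strictly black and white) next to TYPE_USHORT_GRAY, so you can compare the PNG sizes for your own images.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

class GrayPngSketch {

    // Redraws src onto a new image of the given BufferedImage type,
    // then encodes the result as PNG and returns the PNG bytes.
    static byte[] toPng(BufferedImage src, int targetType) throws IOException {
        BufferedImage target = new BufferedImage(src.getWidth(), src.getHeight(), targetType);
        Graphics2D g = target.createGraphics();
        try {
            g.drawImage(src, 0, 0, null); // the color conversion happens while drawing
        } finally {
            g.dispose();
        }
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        if (!ImageIO.write(target, "png", baos)) {
            throw new IOException("no PNG writer found");
        }
        return baos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Synthetic color gradient standing in for the decoded JPEG (assumption for the demo).
        BufferedImage color = new BufferedImage(64, 64, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < 64; y++) {
            for (int x = 0; x < 64; x++) {
                color.setRGB(x, y, ((x * 4) << 16) | ((y * 4) << 8) | 0x80);
            }
        }
        System.out.println("16-bit gray: " + toPng(color, BufferedImage.TYPE_USHORT_GRAY).length + " bytes");
        System.out.println("8-bit gray:  " + toPng(color, BufferedImage.TYPE_BYTE_GRAY).length + " bytes");
        System.out.println("1-bit b/w:   " + toPng(color, BufferedImage.TYPE_BYTE_BINARY).length + " bytes");
    }
}
```

Which target type is acceptable depends on the content: scanned text often survives the 1-bit conversion, while photographs usually need a gray type.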
We can use the resulting bytes to create an Image object.
Image img = makeBlackAndWhitePng(new PdfImageXObject(stream));
Now we have an iText Image object, but please note that the image bytes as stored in this Image object are no longer in the PNG format. As already mentioned in the comments, PNG is not supported in PDF. iText will change the image bytes into a format that is supported in PDF.
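To make "a format that is supported in PDF" more concrete: an image stream with /FlateDecode and /DeviceGray at 8 bits per component decodes to one byte per pixel, row by row, so the decoded data must be width × height bytes long. When the /Width, /Height, /BitsPerComponent and /ColorSpace entries do not match the stream data, viewers typically report errors such as "Insufficient data for an image". The sketch below uses only the JDK to show that layout; the class FlateGraySketch and its helpers are hypothetical names, and iText's actual conversion may differ in its details (for instance, it may use predictors).

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.awt.image.DataBufferByte;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

class FlateGraySketch {

    // Returns one byte per pixel, row by row: the sample layout that
    // /DeviceGray with /BitsPerComponent 8 describes.
    static byte[] graySamples(BufferedImage src) {
        BufferedImage gray = new BufferedImage(src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = gray.createGraphics();
        try {
            g.drawImage(src, 0, 0, null);
        } finally {
            g.dispose();
        }
        return ((DataBufferByte) gray.getRaster().getDataBuffer()).getData();
    }

    // zlib-compresses the samples, which is the encoding /FlateDecode undoes.
    static byte[] deflate(byte[] raw) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(baos)) {
            out.write(raw);
        }
        return baos.toByteArray();
    }

    // Reverses deflate(); used here only to check the round trip.
    static byte[] inflate(byte[] compressed) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                baos.write(buf, 0, n);
            }
        }
        return baos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        BufferedImage color = new BufferedImage(100, 40, BufferedImage.TYPE_INT_RGB);
        byte[] raw = graySamples(color);
        byte[] compressed = deflate(raw);
        System.out.println("expected samples: " + (100 * 40) + ", actual: " + raw.length);
        System.out.println("compressed length: " + compressed.length);
        System.out.println("round trip ok: " + java.util.Arrays.equals(raw, inflate(compressed)));
    }
}
```

When you let iText build the image stream, as in this answer, you don't have to produce these bytes or dictionary entries yourself.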
Step 3: replacing the original image stream with the new image stream
We now have an Image object, but what we really need is to replace the original image stream with a new one. We also need to adapt the image dictionary, as /DCTDecode will change into /FlateDecode, /DeviceRGB will change into /DeviceGray, and the value of the /Length will also be different. Let's write down the following method:
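public static void replaceStream(PdfStream orig, PdfStream stream) throws IOException {
    // Drop the entries of the old image dictionary
    orig.clear();
    // Set the new image bytes first
    orig.setData(stream.getBytes());
    // Copy the new image's dictionary entries last, so setData() cannot alter them
    for (PdfName name : stream.keySet()) {
        orig.put(name, stream.get(name));
    }
}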
The order in which you do things here is important: set the data first and copy the dictionary entries afterwards. You don't want the setData() method to tamper with the length and the filter.
Step 4: persisting the document after replacing the stream
I guess it's not hard to figure this part out:
replaceStream(stream, img.getXObject().getPdfObject());
pdfDoc.close();
Click How to convert colored images to black and white? if you want to see how to answer this question in iText 5.