How to get the UserUnit from a PDF file?
I have a bunch of PDF files that I read into a byte array one by one. I then pass these byte arrays to a PdfReader instance. Now I want to know the dimensions of each page in pixels. From what I've read so far, it seems that PDF files work in points, a point being a configurable unit stored in some kind of dictionary in an element called /UserUnit.
Having loaded my PDF file into a PdfReader, what do I need to do to get this user unit for each page (apparently it can vary from page to page), so that I can then get the page dimensions in pixels?
At present I have this code, which grabs the dimensions for each page in "points". I guess I just need the /UserUnit value, and can then multiply these dimensions by that to get pixels or something similar.
PdfReader reader = new iTextSharp.text.pdf.PdfReader(file_content);
// Page numbers are 1-based, so the last page is NumberOfPages.
for (int i = 1; i <= reader.NumberOfPages; i++)
{
    Rectangle dim = reader.GetPageSize(i);
    int[] xy = new int[] { (int)dim.Width, (int)dim.Height };
    page_data[objectid + '-' + i] = xy;
}
Posted on StackOverflow on Jan 29, 2013 by Shawson
Allow me to quote from my book, iText in Action - Second Edition, page 9:
What is the measurement unit in PDF documents?
Most of the measurements in PDFs are expressed in user space units. ISO-32000-1 (section 8.3.2.3) tells us "the default for the size of the unit in default user space (1/72 inch) is approximately the same as a point (pt), a unit widely used in the printing industry. It is not exactly the same; there is no universal definition of a point."
In short, 1 in. = 25.4 mm = 72 user units (which roughly corresponds to 72 pt).
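As a quick illustration of that arithmetic, here is a minimal sketch that converts default user units to inches and millimetres. The class and method names (UserSpace, ToInches, ToMillimetres) are made up for this example and are not part of iTextSharp; it assumes the default user unit of 1/72 inch.

```csharp
using System;

// Hypothetical helper, not an iTextSharp class.
public static class UserSpace
{
    // Default user space: 72 units per inch.
    public const double UnitsPerInch = 72.0;
    public const double MillimetresPerInch = 25.4;

    public static double ToInches(double userUnits)
    {
        return userUnits / UnitsPerInch;
    }

    public static double ToMillimetres(double userUnits)
    {
        return ToInches(userUnits) * MillimetresPerInch;
    }
}

public static class Program
{
    public static void Main()
    {
        // A US Letter page measures 612 x 792 default user units,
        // that is 8.5 x 11 inches.
        Console.WriteLine(UserSpace.ToInches(612));
        Console.WriteLine(UserSpace.ToInches(792));
        Console.WriteLine(UserSpace.ToMillimetres(72)); // one inch in mm
    }
}
```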
On the next page, I explain that it's possible to change the default value of the user unit, and I add an example on how to create a document with pages that have a different user unit.
Now for your question: suppose you have an existing PDF, how do you find which user unit was used? Before we answer this, we need to take a look at ISO-32000-1.
In section 7.7.3.3, entitled "Page Objects", you'll find the description of the /UserUnit entry in Table 30, "Entries in a page object":
(Optional; PDF 1.6) A positive number that shall give the size of default user space units, in multiples of 1/72 inch. The range of supported values shall be implementation-dependent. Default value: 1.0 (user space unit is 1/72 inch).
This key was introduced in PDF 1.6; you won't find it in older files. It's optional, so you won't always find it in every page dictionary. In my book, I also explain that the maximum value of the UserUnit key is 75,000.
Now how to retrieve this value with iTextSharp?
You already have Rectangle dim = reader.GetPageSize(i); which returns the MediaBox. This may not be the size of the visible part of the page. If there's a CropBox defined for the page, viewers will show only the area inside the CropBox, which can be much smaller than what you have in xy (but you probably knew that already).
What you need now is the page dictionary, so that you can retrieve the value of the UserUnit key:
PdfDictionary pageDict = reader.GetPageN(i);
PdfNumber userUnit = pageDict.GetAsNumber(PdfName.USERUNIT);
Most of the time userUnit will be null, in which case the default value of 1.0 applies, but if it isn't, you can use userUnit.FloatValue.
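Putting the pieces together, a rough sketch of the whole loop could look like the code below. Keep in mind that the user unit only tells you how many 1/72 inch steps one unit covers; a PDF has no notion of pixels, so the pixel size depends on the resolution you choose to render at. The PageSizes class, the InPixels method, and the dpi parameter are assumptions made for this example, and the code uses the MediaBox as in the question (use reader.GetCropBox(i) instead if you want the visible area).

```csharp
using System;
using iTextSharp.text;
using iTextSharp.text.pdf;

// Hypothetical helper; only the PdfReader/PdfDictionary calls come from iTextSharp.
public static class PageSizes
{
    // Returns { width, height } in pixels for every page, at the given resolution.
    public static int[][] InPixels(byte[] fileContent, double dpi)
    {
        PdfReader reader = new PdfReader(fileContent);
        try
        {
            int[][] sizes = new int[reader.NumberOfPages][];
            for (int i = 1; i <= reader.NumberOfPages; i++)
            {
                PdfDictionary pageDict = reader.GetPageN(i);
                PdfNumber userUnit = pageDict.GetAsNumber(PdfName.USERUNIT);
                // No /UserUnit entry means the default: 1 unit = 1/72 inch.
                double unit = userUnit == null ? 1.0 : userUnit.FloatValue;

                Rectangle dim = reader.GetPageSize(i); // MediaBox, in user units
                // user units -> inches -> pixels at the chosen resolution
                double widthInches = dim.Width * unit / 72.0;
                double heightInches = dim.Height * unit / 72.0;
                sizes[i - 1] = new int[]
                {
                    (int)Math.Round(widthInches * dpi),
                    (int)Math.Round(heightInches * dpi)
                };
            }
            return sizes;
        }
        finally
        {
            reader.Close();
        }
    }
}
```

For instance, PageSizes.InPixels(file_content, 96) would give the sizes at 96 pixels per inch; a page with a UserUnit of 2.0 comes out twice as wide and twice as high as the same MediaBox with no UserUnit entry.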