How to get the UserUnit from a PDF file?
I have a bunch of PDF files that I read into a byte array one by one. I then pass these byte arrays to a PdfReader instance. Now I want to know the dimensions of each page in pixels. From what I've read so far, it seems PDF files work in points, a point being a configurable unit stored in some kind of dictionary in an element called /UserUnit.
After loading my PDF file into a PdfReader, what do I need to do to get this user unit for each page (apparently it can vary from page to page), so that I can then get the page dimensions in pixels?
At present I have this code, which grabs the dimensions of each page in "points". I guess I just need the /UserUnit value, and can then multiply these dimensions by it to get pixels or something similar.
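PdfReader reader = new iTextSharp.text.pdf.PdfReader(file_content);
// Page numbers start at 1, so loop up to and including NumberOfPages
for (int i = 1; i <= reader.NumberOfPages; i++)
{
    // GetPageSize returns the MediaBox, in user space units
    Rectangle dim = reader.GetPageSize(i);
    int[] xy = new int[] { (int)dim.Width, (int)dim.Height };
    page_data[objectid + '-' + i] = xy;
}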
Allow me to quote from my book, iText in Action - Second Edition, page 9:
What is the measurement unit in PDF documents?
Most of the measurements in PDFs are expressed in user space units. ISO-32000-1 (section 8.3.2.3) tells us "the default for the size of the unit in default user space (1/72 inch) is approximately the same as a point (pt), a unit widely used in the printing industry. It is not exactly the same; there is no universal definition of a point."
In short, 1 in. = 25.4 mm = 72 user units (which roughly corresponds to 72 pt).
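To make that arithmetic concrete, here is a minimal Java sketch (Java because the iText 7 snippet further down is Java) that converts lengths in default user space units to inches and millimetres, using only the 72 units per inch and 25.4 mm per inch quoted above. It assumes the page has no /UserUnit override; the class and method names are hypothetical, not part of iText.

```java
// Sketch: convert lengths in DEFAULT user space units (1/72 inch each)
// to inches and millimetres. Assumes no /UserUnit override on the page.
final class UserSpaceUnits {
    static final double UNITS_PER_INCH = 72.0; // default user space unit = 1/72 inch
    static final double MM_PER_INCH = 25.4;

    static double toInches(double userUnits) {
        return userUnits / UNITS_PER_INCH;
    }

    static double toMillimetres(double userUnits) {
        return toInches(userUnits) * MM_PER_INCH;
    }

    public static void main(String[] args) {
        // 612 x 792 user units is a US Letter page at the default unit size.
        System.out.printf("%.2f x %.2f in%n", toInches(612), toInches(792));
        System.out.printf("%.1f x %.1f mm%n", toMillimetres(612), toMillimetres(792));
    }
}
```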
On the next page, I explain that it's possible to change the default value of the user unit, and I add an example on how to create a document with pages that have a different user unit.
Now for your question: suppose you have an existing PDF, how do you find which user unit was used? Before we answer this, we need to take a look at ISO-32000-1.
In section 7.7.3.3, entitled "Page Objects", you'll find the description of the /UserUnit entry in Table 30, "Entries in a page object":
(Optional; PDF 1.6) A positive number that shall give the size of default user space units, in multiples of 1/72 inch. The range of supported values shall be implementation-dependent. Default value: 1.0 (user space unit is 1/72 inch).
This key was introduced in PDF 1.6; you won't find it in older files. It's optional, so not every page dictionary will have it. In my book, I also explain that the maximum value of the UserUnit key is 75,000.
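Because the key is optional, code that reads it has to fall back to the default. The following Java sketch (hypothetical names, not an iText API) takes the raw value as a nullable Float, returns 1.0 when it's absent, and rejects non-positive values. It deliberately does not enforce the 75,000 upper bound, since the spec leaves the supported range implementation-dependent.

```java
// Sketch: resolve the effective /UserUnit of a page.
// rawUserUnit is null when the optional key is missing (always the case before PDF 1.6).
final class UserUnitResolver {
    static final float DEFAULT_USER_UNIT = 1.0f; // spec default: 1 unit = 1/72 inch

    static float effectiveUserUnit(Float rawUserUnit) {
        if (rawUserUnit == null) {
            return DEFAULT_USER_UNIT;
        }
        if (!(rawUserUnit > 0f)) { // the spec requires a positive number; this also rejects NaN
            throw new IllegalArgumentException("UserUnit must be positive: " + rawUserUnit);
        }
        return rawUserUnit;
    }

    // Size of one user space unit, in inches, for a page with the given raw /UserUnit.
    static double unitSizeInInches(Float rawUserUnit) {
        return effectiveUserUnit(rawUserUnit) / 72.0;
    }

    public static void main(String[] args) {
        System.out.println(effectiveUserUnit(null)); // key absent
        System.out.println(effectiveUserUnit(2.5f)); // key present
    }
}
```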
Now how to retrieve this value with iText for C#?
You already have reader.GetPageSize(i), which returns the MediaBox. This may not be the size of the visible part of the page: if a CropBox is defined for the page, viewers will show a smaller area than what you have in xy (but you probably knew that already).
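If you want the visible size rather than the MediaBox, you need the CropBox, which defaults to the MediaBox when it's missing. As I read ISO 32000-1, a CropBox that extends beyond the MediaBox is effectively reduced to their intersection. The Java sketch below assumes both boxes have already been read as {llx, lly, urx, ury} arrays in user space units; the class and method names are hypothetical.

```java
// Sketch: effective visible box = CropBox clipped to MediaBox.
// Boxes are {llx, lly, urx, ury} in user space units; cropBox may be null.
final class VisibleBox {
    // PDF rectangles may list any two opposite corners, so normalize first.
    static double[] normalize(double[] r) {
        return new double[] {
            Math.min(r[0], r[2]), Math.min(r[1], r[3]),
            Math.max(r[0], r[2]), Math.max(r[1], r[3])
        };
    }

    static double[] effectiveCropBox(double[] mediaBox, double[] cropBox) {
        double[] m = normalize(mediaBox);
        if (cropBox == null) {
            return m; // CropBox defaults to the MediaBox
        }
        double[] c = normalize(cropBox);
        double llx = Math.max(m[0], c[0]);
        double lly = Math.max(m[1], c[1]);
        double urx = Math.min(m[2], c[2]);
        double ury = Math.min(m[3], c[3]);
        if (urx < llx || ury < lly) {
            return new double[] {llx, lly, llx, lly}; // no overlap: empty box
        }
        return new double[] {llx, lly, urx, ury};
    }

    static double width(double[] box) { return box[2] - box[0]; }
    static double height(double[] box) { return box[3] - box[1]; }

    public static void main(String[] args) {
        double[] media = {0, 0, 612, 792};
        double[] crop = {36, 36, 576, 756}; // half-inch margin on every side
        double[] visible = effectiveCropBox(media, crop);
        System.out.println(width(visible) + " x " + height(visible));
    }
}
```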
In iText 7, what you need now is the page dictionary, so that you can retrieve the value of the /UserUnit key. The code is written in Java, but it can easily be converted to C#:
// pdfDoc is an iText 7 PdfDocument; page numbers start at 1
PdfDictionary pageDict = pdfDoc.getPage(i).getPdfObject();
PdfNumber userUnit = pageDict.getAsNumber(PdfName.UserUnit);
Most of the time userUnit will be null, in which case the default value of 1.0 applies; if it isn't null, you can use userUnit.floatValue().
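To get from there to pixels, keep in mind that a PDF doesn't define a pixel size: pixels only exist once you pick a rendering resolution. Since one user space unit is UserUnit/72 inch, a length in pixels is units * UserUnit / 72 * dpi. The Java sketch below (hypothetical names; the DPI is an assumption you choose, not something stored in the file) applies that formula, treating a missing /UserUnit as 1.0.

```java
// Sketch: page dimension in user space units -> pixels at a chosen DPI.
// The DPI is NOT stored in the PDF; it is the resolution you decide to render at.
final class PagePixels {
    static long toPixels(double userSpaceUnits, Float userUnit, double dpi) {
        float unit = (userUnit == null) ? 1.0f : userUnit; // missing key -> default 1.0
        double inches = userSpaceUnits * unit / 72.0;       // one unit = UserUnit/72 inch
        return Math.round(inches * dpi);
    }

    public static void main(String[] args) {
        // A US Letter page (612 x 792 units) without /UserUnit, rendered at 96 DPI.
        long w = toPixels(612, null, 96);
        long h = toPixels(792, null, 96);
        System.out.println(w + " x " + h + " px");
    }
}
```

With the iText 7 snippet above, you would pass userUnit == null ? null : userUnit.floatValue() as the second argument and the page width or height in user space units as the first.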
For the iText 5 version of this answer, see "How to get the UserUnit from a PDF file?".