How to find internal links in a PDF file?
I am using iTextSharp to search for internal links in a PDF file. I have already done this for external links:
//Get the current page
PdfDictionary PageDictionary = R.GetPageN(page);
//Get all of the annotations for the current page
PdfArray Annots = PageDictionary.GetAsArray(PdfName.ANNOTS);
//Make sure we have something
if ((Annots == null) || (Annots.Length == 0))
{
    Console.WriteLine("nothing");
}
//Loop through each annotation
if (Annots != null)
{
    foreach (PdfObject A in Annots.ArrayList)
    {
        //Convert the itext-specific object as a generic PDF object
        PdfDictionary AnnotationDictionary = (PdfDictionary)PdfReader.GetPdfObject(A);
        //Make sure this annotation has a link
        if (!AnnotationDictionary.Get(PdfName.SUBTYPE).Equals(PdfName.LINK))
            continue;
        //Make sure this annotation has an ACTION
        if (AnnotationDictionary.Get(PdfName.A) == null)
            continue;
        //Get the ACTION for the current annotation
        PdfDictionary AnnotationAction = AnnotationDictionary.GetAsDict(PdfName.A);
        // Test if it is a URI action (There are tons of other types of actions,
        // some of which might mimic URI, such as JavaScript,
        // but those need to be handled separately)
        if (AnnotationAction.Get(PdfName.S).Equals(PdfName.URI))
        {
            PdfString Destination = AnnotationAction.GetAsString(PdfName.URI);
            string url1 = Destination.ToString();
        }
    }
}
Posted on StackOverflow on Feb 22, 2014 by Ashwani
You've already done most of the work. Please take a look at the following screenshot:
[Screenshot: Internal view of the PDF]
You see the /Annots array of a page. You are already parsing that array in your code and you skip all annotations that aren't of the /Subtype /Link or don't have an /A key, which is excellent.
Currently you're only looking for values of /S that are of type /URI. You say you're already done with external links, but that's not true: you should also look for entries where /S is /GoToR (remote goto). If you want internal links, you need to look for /S values equal to /GoTo, /GoToE, and (in the future) /GoToDp. Maybe you also want to remove the /JavaScript actions, because they can also be used to jump to a specific page.
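The distinction between these /S values can be sketched as a small lookup. This is only an illustration, not part of iTextSharp: the names LinkKind, LinkActionClassifier and Classify are hypothetical, and the grouping simply follows the description above (/URI and /GoToR as external; /GoTo, /GoToE and /GoToDp as internal; /JavaScript kept apart because a script may or may not jump to a page).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not an iTextSharp type): the buckets described above.
public enum LinkKind { External, Internal, Script, Other }

public static class LinkActionClassifier
{
    // Action types that point outside the current document.
    private static readonly HashSet<string> ExternalTypes =
        new HashSet<string> { "URI", "GoToR" };

    // Action types the answer treats as internal links.
    // GoToDp is the "in the future" type mentioned above.
    private static readonly HashSet<string> InternalTypes =
        new HashSet<string> { "GoTo", "GoToE", "GoToDp" };

    // Accepts the value of /S with or without its leading slash,
    // for example "/GoTo" or "GoTo". PDF names are case-sensitive,
    // so the comparison is case-sensitive too.
    public static LinkKind Classify(string actionType)
    {
        if (string.IsNullOrEmpty(actionType))
            return LinkKind.Other;

        string name = actionType.TrimStart('/');

        if (ExternalTypes.Contains(name)) return LinkKind.External;
        if (InternalTypes.Contains(name)) return LinkKind.Internal;
        if (name == "JavaScript") return LinkKind.Script;

        // Any other action type (Launch, Named, SubmitForm, ...).
        return LinkKind.Other;
    }
}
```

Inside the loop from the question, the single test against PdfName.URI could then be replaced by a call such as LinkActionClassifier.Classify(AnnotationAction.Get(PdfName.S).ToString()), assuming that ToString() on the name object returns the name with its leading slash; please verify that against the iTextSharp version you use.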