How to find internal links in a PDF file?
I am using iTextSharp to search for internal links in a PDF file. I have already done this for external links:
//Get the current page
PdfDictionary PageDictionary = R.GetPageN(page);
//Get all of the annotations for the current page
PdfArray Annots = PageDictionary.GetAsArray(PdfName.ANNOTS);
//Make sure we have something
if ((Annots == null) || (Annots.Length == 0))
{
    Console.WriteLine("nothing");
}
//Loop through each annotation
if (Annots != null)
{
    foreach (PdfObject A in Annots.ArrayList)
    {
        //Convert the itext-specific object as a generic PDF object
        PdfDictionary AnnotationDictionary = (PdfDictionary)PdfReader.GetPdfObject(A);
        //Make sure this annotation has a link
        if (!AnnotationDictionary.Get(PdfName.SUBTYPE).Equals(PdfName.LINK))
            continue;
        //Make sure this annotation has an ACTION
        if (AnnotationDictionary.Get(PdfName.A) == null)
            continue;
        //Get the ACTION for the current annotation
        PdfDictionary AnnotationAction = AnnotationDictionary.GetAsDict(PdfName.A);
        // Test if it is a URI action (There are tons of other types of actions,
        // some of which might mimic URI, such as JavaScript,
        // but those need to be handled separately)
        if (AnnotationAction.Get(PdfName.S).Equals(PdfName.URI))
        {
            PdfString Destination = AnnotationAction.GetAsString(PdfName.URI);
            string url1 = Destination.ToString();
        }
    }
}
Posted on StackOverflow on Feb 22, 2014 by Ashwani
You've already done most of the work.
In iText 7 for Java, your code would look like this:
//Get the current page
PdfPage pdfPage = pdfDoc.getPage(page);
//Get all of the annotations for the current page
List<PdfAnnotation> annots = pdfPage.getAnnotations();
//Make sure we have something
if ((annots == null) || (annots.size() == 0)) {
    System.out.println("nothing");
}
//Loop through each annotation
else {
    for (PdfAnnotation a : annots) {
        //Skip every annotation that is not a link
        if (!a.getSubtype().equals(PdfName.Link))
            continue;
        //Get the ACTION for the current annotation
        //(getAction() is defined on PdfLinkAnnotation; the cast is safe after the /Link check)
        PdfDictionary annotAction = ((PdfLinkAnnotation) a).getAction();
        //Make sure this annotation has an ACTION
        if (annotAction == null)
            continue;
        PdfName actionType = annotAction.getAsName(PdfName.S);
        // Test the action type (There are tons of other types of actions,
        // some of which might mimic URI, such as JavaScript,
        // but those need to be handled separately)
        if (PdfName.URI.equals(actionType)) {
            //do smth with external links to a URL
            PdfString destination = annotAction.getAsString(PdfName.URI);
            String url1 = destination.toString();
        }
        else if (PdfName.GoToR.equals(actionType)) {
            //do smth with external links to another PDF file
            //(a remote go-to action has /F and /D entries, not /URI)
        }
        else if (PdfName.GoTo.equals(actionType) ||
                 PdfName.GoToE.equals(actionType)) {
            //do smth with internal links
        }
    }
}
As you can see, you don't need to fetch the array of annotations yourself and convert each annotation object to a PdfDictionary, as you had to in iText 5. Just use the built-in methods.
Please take a look at the following screenshot:
[Screenshot: internal view of the PDF]
You see the /Annots array of a page. You are already parsing that array in your code, and you skip all annotations whose /Subtype isn't /Link or that don't have an /A key, which is excellent.
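One caveat about that filter: per the PDF specification, a link annotation may specify its target with a /Dest entry instead of an /A action (the two are not allowed together), and such /Dest links are always internal. The sketch below shows the filtering step in plain Java, without iText. The class and method names are hypothetical, and the annotation dictionary is modelled as a simple Map of key names (without the leading slash) to strings; that model is an assumption made only for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class LinkAnnotationFilter {
    // Hypothetical, simplified stand-in for an annotation dictionary:
    // keys are PDF key names without the leading slash.
    static boolean isLinkWithTarget(Map<String, String> annot) {
        // Skip anything whose /Subtype is not /Link.
        if (!"Link".equals(annot.get("Subtype"))) {
            return false;
        }
        // A link's target is either an /A action or a /Dest destination
        // (the PDF specification does not allow /Dest when /A is present).
        return annot.containsKey("A") || annot.containsKey("Dest");
    }

    static List<Map<String, String>> linksWithTarget(List<Map<String, String>> annots) {
        return annots.stream()
                .filter(LinkAnnotationFilter::isLinkWithTarget)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map<String, String>> annots = List.of(
                Map.of("Subtype", "Link", "A", "URI action"),
                Map.of("Subtype", "Link", "Dest", "chapter1"),
                Map.of("Subtype", "Text", "Contents", "sticky note"),
                Map.of("Subtype", "Link"));
        // Prints 2: the /A link and the /Dest link are kept;
        // the text note and the target-less link are skipped.
        System.out.println(linksWithTarget(annots).size());
    }
}
```

In iText 7 terms, the /Dest check would roughly correspond to looking for PdfName.Dest on the annotation dictionary before giving up on an annotation that has no action.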
Currently you're only looking for /S values of type /URI. You say you're already done with external links, but that's not quite true: you should also look for entries where /S is /GoToR (remote go-to). If you want internal links, you need to look for /S values equal to /GoTo, /GoToE, and /GoToDp (introduced in PDF 2.0). You may also want to remove the /JavaScript actions, because they can also be used to jump to a specific page.
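The /S dispatch described above can be kept in one small lookup so that the external and internal action types are listed in a single place. This is a plain-Java sketch with no iText dependency; the class name, the enum, and its constants are hypothetical, and the method expects the action type name without the leading slash.

```java
import java.util.Set;

public class LinkActionClassifier {
    enum LinkKind { EXTERNAL, INTERNAL, JAVASCRIPT, OTHER }

    // /S values discussed above, written without the leading slash.
    private static final Set<String> EXTERNAL_TYPES = Set.of("URI", "GoToR");
    private static final Set<String> INTERNAL_TYPES = Set.of("GoTo", "GoToE", "GoToDp");

    static LinkKind classify(String actionType) {
        // Set.of(...) throws on contains(null), so handle a missing /S first.
        if (actionType == null) {
            return LinkKind.OTHER;
        }
        if (EXTERNAL_TYPES.contains(actionType)) {
            return LinkKind.EXTERNAL;
        }
        if (INTERNAL_TYPES.contains(actionType)) {
            return LinkKind.INTERNAL;
        }
        if (actionType.equals("JavaScript")) {
            return LinkKind.JAVASCRIPT;
        }
        return LinkKind.OTHER;
    }

    public static void main(String[] args) {
        String[] samples = {"URI", "GoToR", "GoTo", "GoToE", "GoToDp", "JavaScript", "Launch"};
        for (String s : samples) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

With iText 7, the input string could presumably be obtained with annotAction.getAsName(PdfName.S).getValue(), assuming getValue() returns the bare name without the slash; /JavaScript actions are reported separately so you can decide whether to drop or inspect them.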