How to find internal links in a PDF file?
I am using iTextSharp to search for internal links in a PDF file. I have already done this for external links:
//Get the current page
PdfDictionary PageDictionary = R.GetPageN(page);
//Get all of the annotations for the current page
PdfArray Annots = PageDictionary.GetAsArray(PdfName.ANNOTS);
//Make sure we have something
if ((Annots == null) || (Annots.Length == 0)) {
    Console.WriteLine("nothing");
}
//Loop through each annotation
if (Annots != null) {
    foreach (PdfObject A in Annots.ArrayList) {
        //Convert the itext-specific object as a generic PDF object
        PdfDictionary AnnotationDictionary = (PdfDictionary)PdfReader.GetPdfObject(A);
        //Make sure this annotation has a link
        if (!AnnotationDictionary.Get(PdfName.SUBTYPE).Equals(PdfName.LINK)) continue;
        //Make sure this annotation has an ACTION
        if (AnnotationDictionary.Get(PdfName.A) == null) continue;
        //Get the ACTION for the current annotation
        PdfDictionary AnnotationAction = AnnotationDictionary.GetAsDict(PdfName.A);
        // Test if it is a URI action (There are tons of other types of actions,
        // some of which might mimic URI, such as JavaScript,
        // but those need to be handled separately)
        if (AnnotationAction.Get(PdfName.S).Equals(PdfName.URI)) {
            PdfString Destination = AnnotationAction.GetAsString(PdfName.URI);
            string url1 = Destination.ToString();
        }
    }
}
Posted on StackOverflow on Feb 22, 2014 by Ashwani
You've already done most of the work.
In iText 7 for Java, your code would look like this:
//Get the current page
PdfPage pdfPage = pdfDoc.getPage(page);
//Get all of the annotations for the current page
List<PdfAnnotation> annots = pdfPage.getAnnotations();
//Make sure we have something
if ((annots == null) || (annots.size() == 0)) {
    System.out.println("nothing");
}
//Loop through each annotation
else {
    for (PdfAnnotation a : annots) {
        //Make sure this annotation has a link (skip everything that is not a /Link)
        if (!a.getSubtype().equals(PdfName.Link))
            continue;
        //Make sure this annotation has an ACTION
        if (a.getAction() != null) {
            //Get the ACTION for the current annotation
            PdfDictionary annotAction = a.getAction();
            // Test if it is a URI action (There are tons of other types of actions,
            // some of which might mimic URI, such as JavaScript,
            // but those need to be handled separately)
            if (annotAction.get(PdfName.S).equals(PdfName.URI) ||
                    annotAction.get(PdfName.S).equals(PdfName.GoToR)) {
                //do smth with external links
                //(a /GoToR action has no /URI entry, so destination can be null)
                PdfString destination = annotAction.getAsString(PdfName.URI);
                if (destination != null) {
                    String url1 = destination.toString();
                }
            }
            else if (annotAction.get(PdfName.S).equals(PdfName.GoTo) ||
                    annotAction.get(PdfName.S).equals(PdfName.GoToE)) {
                //do smth with internal links
            }
        }
    }
}
As you can see, you don't need to fetch the array of annotations yourself and convert each annotation object to a PdfDictionary, as was done in iText 5. Just use the built-in methods.
Please take a look at the following screenshot:

Internal view of the PDF
You see the /Annots array of a page. You are already parsing that array in your code and you skip all annotations that aren't of the /Subtype /Link or don't have an /A key, which is excellent.
Currently you're only looking for values of /S that are of type /URI. You say you're already done with external links, but that's not true: you should also look for entries where /S is /GoToR (remote goto). If you want internal links, you need to look for /S values equal to /GoTo, /GoToE, and (in the future) /GoToDp. Maybe you also want to remove the /JavaScript actions, because they can also be used to jump to a specific page.
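The grouping of /S values described above can be sketched as a small lookup. This is only an illustration in plain Java without any iText dependency: the class name LinkActionClassifier, the Kind enum, and the classify method are hypothetical names introduced here, and the method assumes you pass in the /S name as a string without the leading slash.

```java
/** Hypothetical helper (not part of iText): sorts the /S value of an action into a coarse category. */
final class LinkActionClassifier {

    enum Kind { EXTERNAL, INTERNAL, SCRIPT, OTHER }

    /**
     * @param actionType the /S name of the action without the leading slash,
     *                   for example "GoTo"; null when the action has no /S entry
     */
    static Kind classify(String actionType) {
        if (actionType == null) {
            return Kind.OTHER;
        }
        switch (actionType) {
            case "URI":        // link to a web resource
            case "GoToR":      // remote goto: destination in another file
                return Kind.EXTERNAL;
            case "GoTo":       // destination in the same document
            case "GoToE":      // treated as internal, following the answer above
            case "GoToDp":     // the "in the future" value mentioned above
                return Kind.INTERNAL;
            case "JavaScript": // may jump to a page, needs separate handling
                return Kind.SCRIPT;
            default:
                return Kind.OTHER;
        }
    }

    public static void main(String[] args) {
        String[] samples = {"URI", "GoToR", "GoTo", "GoToE", "GoToDp", "JavaScript", "Launch"};
        for (String s : samples) {
            System.out.println("/" + s + " -> " + classify(s));
        }
    }
}
```

PDF names are case-sensitive, so the comparison is deliberately exact. JavaScript actions get their own category because deciding whether one of them actually navigates requires inspecting the script itself.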