Problem
This is a code I wrote to find all URL within a PDF file and replace the one that matches the ID that I passed as a parameter.
It works as intended, but as I am a beginner on Java, I am sure there are some best practices that I am not following.
I am using the PDFBox library.
Here’s the code:
Main.java
public class Main {
public static void main(String[] args) {
if (args.length < 4) {
System.err.println("Parameter missing from PHP");
} else {
Helper.getURL(args[0], args[1], args[2], args[3]);
}
}
}
Helper.java
public class Helper {
public static void getURL(String oldreportid, String newreportid, String oldpdf, String newpdf) {
PDDocument doc = null;
try {
doc = PDDocument.load(oldpdf);
List allPages = doc.getDocumentCatalog().getAllPages();
for (int i = 0; i < allPages.size(); i++) {
PDPage page = (PDPage) allPages.get(i);
List annotations = page.getAnnotations();
for (int j = 0; j < annotations.size(); j++) {
PDAnnotation annot = (PDAnnotation) annotations.get(j);
if (annot instanceof PDAnnotationLink) {
PDAnnotationLink link = (PDAnnotationLink) annot;
PDAction action = link.getAction();
if (action instanceof PDActionURI) {
PDActionURI uri = (PDActionURI) action;
String oldURL = uri.getURI();
String reportID = oldURL.substring(oldURL.lastIndexOf("=") + 1, oldURL.length());
if (oldreportid.equals(reportID)) {
String newURI = "http://www.test.com/test.php?T=MQ==&F=" + newreportid;
System.out.println("Page " + (i + 1) + ": Replacing " + oldURL + " with " + newURI);
uri.setURI(newURI);
}
}
}
}
}
doc.save(newpdf);
} catch (IOException e) {
e.printStackTrace();
} catch (COSVisitorException e) {
e.printStackTrace();
} finally {
if (doc != null) {
try {
doc.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
}
}
Solution
Naming
-
Helper
, although very common, is too generic as class name. Imagine it in a package with two dozens of other classes, how would you have a hint on what it is supposed to do by browsing the package? If it was named something likePdfAnalyzeHelper
orPdfProcessingHelper
, that would be better. -
Looking at the name of the method
getURL(args)
, I was expecting to find a return value instead ofvoid
, which was confusing. The prefixget*
in method names is often used for getters. Since this method does not return anything and executes some URL processing, that should be called, for example,replaceReportIdInUrls(args)
.
SRP
The single responsibility principle seems to be violated in this method. I see three distinct things done:
- Loading a PDF file.
- Reading the file and replacing values in URLs.
- Saving a new PDF file.
I’d suggest to split it into three distinct methods. It will be easier to maintain and reuse, if this is intended. Or, since the I/O with PDDocument
is very concise, remove it from the method body and change the signature to
public static PDDocument replaceReportIdInUrls(PDDocument doc,
String reportIdOld,
String reportIdNew);
try-with-resources
The I/O with finally { doc.close(); }
is plain old before-Java-7 style. It can be simplified for
try (PDDocument doc = PDDocument.load(oldpdf)) {
// method body
} catch (IOException | COSVisitorException e) {
// multicatch is also helpful!
}
This shortens the method for at least 12 lines!
Generics
The version of PDFBox that you use seems to be a legacy one, PDDocumentCatalog.getAllPages()
somehow returns an unbound List
.
In order to (at least) avoid compilation warnings, type bounds should be added, for example:
List<?> allPages = doc.getDocumentCatalog().getAllPages();
According to the javadoc, the list should contain PDPage
objects, so if you dare rely on it, you can try a cast to List<PDPage>
.
The same comment concerns the next List
of annotations. Strangely, the javadoc describes it as typed, so only <PDAnnotation>
should be added to the declaration.
For
Loops
The j
index of the inner loop is not reused, so it would be simpler to rewrite the loop:
for (PDAnnotation annotation : annotations) { ... }
and this will make useless the call annotations.get(j);
The same is about the outer loop, but since the i
index is reused for the output, it can be kept.
Validation
The line
String reportID = oldURL.substring(oldURL.lastIndexOf("=") + 1, oldURL.length());
is not secured against the case when the URL does not contain the “=” character. oldURL.lastIndexOf("=")
may return -1, in such case a StringIndexOutOfBoundsException will be thrown.
By the way, no validation at all is done for the method input params. What happens if one of them is null
or empty?