> Java > tutorials > javanotes > ExceptionHandling
> Java > tutorials > javanotes > ExceptionHandling
by ZweiBieren Exception Handling in PDFBox Java coffee cup logo
Source Explorer
Java Notes

* This button downloads an installer jar. To install the app, put the jar in a folder and double-click it. Double click the application's icon to run it.

Long ago, discussions about exception handling in PDFBox concentrated on "exposing internal state" and using a canned package to handle exceptions. To me, it seemed the discussion missed a key point: Each Exception is a hint that the code needs more thought.

Some data. For the then current version of PDFBox 2.0:

lines of code: 33352
number of "throw new" statements: 746
ratio: 2.2% of all lines are throw statements
(An additional 38 lines throw Exceptions from variables instead of from new... . These are sterile exceptions; they contain no information describing the error.)

In contrast, the com.javax source code has

646455 lines
2650 "throw new"
0.4% throw statements

and JPedal, another pdf library, has

208265 lines
364 "throw new"
< 0.2% throw statements

PDFBox has a throw-statement-density of more than five times that of javax and ten times that of JPedal. Why the difference?

I suggest that the difference is in the maturity of the code. Exceptions are excellent tools when initially composing a program. The basic flow can be written without burdening it with a plethora of error handling code.

After the code is written and working, among the next steps is to reconsider each Exception to see if the error can be handled more gracefully. Some exceptions will certainly remain, but many can be logged and then ignored, repaired, or proxied. (For logging to be viable, PDFBox must adopt a default log examiner that both programmers and end-users can easily access or ignore without delving into the mysteries of log4j. log4j can remain, as long as there is a simple default configuration file that makes te log accessible to a PDFBox log reader application.)

Lets look at some classes. The number of "throw new" instances in the top ten PDFBox classes

36  pdfbox/pdmodel/edit/PDPageContentStream.java 
32  xmpbox/xml/DomXmpParser.java 
28  pdfbox/pdfparser/BaseParser.java 
27  pdfbox/pdfparser/NonSequentialPDFParser.java 
22  fontbox/afm/AFMParser.java 
20  pdfbox/pdfwriter/COSWriter.java 
20  pdfbox/preflight/content/StubOperator.java 
18  pdfbox/pdmodel/font/PDFontDescriptorAFM.java 
17  fontbox/ttf/CMAPEncodingEntry.java 
17  pdfbox/pdfparser/ConformingPDFParser.java

That several of these are parsers suggests that PDFBox has parsers that abort at the first sign of trouble. If you have ever tried to use such a parser, you know the frustration of making a run and only fixing only one problem, over and over again.

PDPageContentStream.java

Most of the exceptions thrown in PDPageContentStream are for illegal values of inTextMode. For instance, calling drawString() while inTextMode is false or calling setLineDashPattern() while inTextMode is true. These would seem to be programmer errors. However, in many applications, as in LayerUtility, the content comes from some existing document beyond the programmer's control. Instead of an abrupt termination via exception, I would log the error and repair the stream by inserting the beginning or end of a text block as necessary. Instead of

if (inTextMode) throw new Exception ...

I would write

if (inTextMode) { endText(); LOG.warn("inventing an endText ... "); }

Then execution could continue and the client-programmer or end-user could check the log for errors.

DomXmpParser.java

This parser is for meta-data: title, that is, date, author, keywords, .... It seems extra galling to abort processing of a file just because the meta-data are incorrect. Since meta-data are mostly optional, the parser can log and skip the offending sections of the source file. Instead of

if (inner == null) throw new XmpParsingException( ...);
process inner ...

the program could say

if (inner == null) log.warn("...");
else { process inner ... }

BaseParser.java

Several of the exceptions in BaseParser look like this:

throw new IOException("expected='R' actual='"+ r +"' "+pdfSource);

This may be useful to the client-programmer, but what is the poor end-user to make of it? He or she may not have any idea that there are objects, let alone references to them, or indirect references indicated with R. And the information printed for pdfSource is mostly inane. The message needs to be better thought out. Moreover, this aborts the parse in the middle when there may be further progress to be made by repairing or skipping the error. On initial examination it looks like this throw could be replaced with a repair that just inserts the "R":

log.warn(...);  pdfSource.unread(r);

Before doing this, some thought needs to be given to what kinds of tokens can be expected to follow this indirect reference and whether they can begin with the value of r.

BaseParser also twice aborts when a PushBackInputStream overflows. Then the hapless user reads:

Could not push back xxx bytes in order to reparse stream. 
Try increasing push back buffer using system property
org.apache.pdfbox.baseParser.pushBackSize

Besides being incomprehensible to an ordinary user, this abort is not even necessary. PDFBox already overrides PushbackInputStream (note the subtle difference in capitalization). The override method could repair the situation by replacing the pushback buffer with a bigger one.

DateConverter.java

I have recently been involved in replacing the code for DateConverter.java. The old code threw an IOException when its argument string was not a recognizable date:

throw new IOException("Error converting date: " + text);

instead of throwing the exception we returned null, which had formerly been returned only if the argument was the empty string.

By returning a dummy value, the exception was proxied. Proxying--a form of repair--is possible when a method parses some text and returns an object for it. Null is a common proxy value, but methods may also return versions of the object they usually return. Perhaps an instance of the object with an error flag turned on.

Conclusion

Rather than argue over philosophy and tools, each instance of "throw new" should be examined and treated as will best fit the needs of programmer/clients and end-users.

 
Copyright © 2023 ZweiBieren, All rights reserved. Feb 1, 2023 21:35 GMT Page maintained by ZweiBieren