Modularizing a Large Commercial Code Base

Part I: Defining goals, picking features, asking questions

17.12.2019 | Boris Terzic & Nicolai Parlog

In 2017, we (painfully) migrated our entire 1.5 million LOC codebase, consisting of 450 Maven modules, to Java 9. Now we are on 11 and plan to stay on this LTS version for a while. We have benefited from a few small new language features (for example, more seamless try-with-resources blocks and var), new and refined APIs (collection factories, Stream, Optional), and some improvements in the JVM itself (string performance improvements). Still, a few of us wanted to see whether we could also adopt the Java Module System. Our gut feeling was that the stricter accessibility rules offered by the module system would improve our ability to design and maintain our software.

Step 0: Getting third-party dependencies in shape

A sensible prerequisite for doing anything with modules is that your third-party dependencies at least have properly defined module names. If they do not, they get an automatically generated name derived from the JAR file name, which can lead to unpredictable and brittle dependencies.
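To illustrate how such names come about, here is a minimal sketch that asks the JDK's own ModuleFinder which name it assigns to a JAR. The class name, the helper method, and the JAR file name are made up for this example, and the JAR it creates contains nothing but a manifest.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.lang.module.ModuleFinder;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.Attributes;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class AutomaticModuleNames {

    /**
     * Creates a manifest-only JAR with the given file name in a fresh temp
     * directory and returns the module name the JDK assigns to it.
     * Pass null as manifestName to omit the Automatic-Module-Name entry.
     */
    static String nameFor(String jarFileName, String manifestName) {
        Manifest manifest = new Manifest();
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        if (manifestName != null) {
            manifest.getMainAttributes().putValue("Automatic-Module-Name", manifestName);
        }
        try {
            Path jar = Files.createTempDirectory("automatic-names").resolve(jarFileName);
            try (OutputStream out = Files.newOutputStream(jar);
                 JarOutputStream jarOut = new JarOutputStream(out, manifest)) {
                // The constructor already wrote the manifest; no other entries are needed.
            }
            // A JAR without module-info.class is treated as an automatic module.
            return ModuleFinder.of(jar).findAll().iterator().next().descriptor().name();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Name derived from the file name: version stripped, '-' replaced by '.'
        System.out.println(nameFor("commons-lang3-3.9.jar", null)); // commons.lang3
        // Name taken from the manifest entry, independent of the file name
        System.out.println(nameFor("commons-lang3-3.9.jar", "org.apache.commons.lang3")); // org.apache.commons.lang3
    }
}
```

Without the manifest entry, the name depends on the file name alone, so renaming the artifact changes the module name that every dependent module declaration has to spell out.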

An intrepid developer took it upon himself to slowly move all our dependencies to a compatible state. This involved upgrading third-party libraries, badgering maintainers of open source projects, providing patches and pull requests, and sometimes getting rid of dependencies that were no longer maintained.

Step 1: What do we need, what do we want to avoid?

After reaching that baseline, we needed to seriously consider the hurdles and ramifications of pursuing this any further. What did we expect to gain from using Java modules? Does it work with our tooling? Does it work with our deployment? Does it work with CI? We assembled a ragtag team of eager individuals and banged our heads together.

Module system features

Here’s what we thought about the module system’s core benefits.

Reliable configuration

(Having the module system verify that all dependencies are present.)

We have a well-oiled build pipeline and rarely have problems with missing/duplicate dependencies at run time.

Strong encapsulation

(Only being able to access another module’s API if it is exported.)

This is what we looked forward to the most! With it, we can effectively document public vs private API (within our own subprojects) in code and make sure what one developer wrote as a module-internal API is not reused across module boundaries by somebody else without considering the ramifications.

Another aspect is that the accessibility rules prevent inadvertent access to indirect dependencies. Within our own subprojects that is a recurring challenge that we can get rid of this way.
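As a minimal sketch of what this looks like in a module declaration (the module and package names are invented for illustration and are not taken from our code base):

```java
// module-info.java of a hypothetical subproject
module com.example.app.reporting {

    // Direct dependencies have to be declared. Types from modules that this
    // module does not read are inaccessible, even if they happen to be
    // present as dependencies of com.example.app.core.
    requires com.example.app.core;

    // Only this package is accessible to other modules.
    exports com.example.app.reporting.api;

    // com.example.app.reporting.internal is not exported, so even its public
    // classes cannot be used from outside this module.
}
```

With a declaration like this, the compiler rejects code in another module that imports a class from com.example.app.reporting.internal.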

Improved maintainability

A module-info.java in every module as a condensed view of the JAR is easier to review and analyze than package imports or Maven poms.

Scalable systems

(Creating custom JDK runtimes for reduced deployment size.)

Cadenza is pretty big, with about two hundred dependencies, and needs most, maybe even all of the JDK anyway, so we’re not expecting a big gain here.

Module system risks

It’s not all rainbows and unicorns, though. There are quite a few challenges that we expected:

dependencies that misbehave on the module path
complicated execution scripts by having to split module and class path
varying tool support that makes builds in IDEs and build tools behave differently

Conclusion

While discussing this, we quickly came to a consensus that we wanted to reap the benefits of the module system at compile time but saw no pressing need for run-time modules. If possible, we would build and develop using Java modules, and at run time everything would still be on the class path.

We hoped that this would reduce the number of issues we would run into while still giving us all the benefits of being able to better specify our APIs and hide our implementations. We were not sure how well this would work with various tools, but it seemed worth a shot.

Step 2: Usage guidelines

With that settled, we discussed what features we want to use and how:

Module names: We decided on a reverse FQDN notation, similar to package names. This seemed the most consistent and the least likely to cause problems down the road. Where possible, the module name would be a prefix of the package name; in cases where the package names are really old and terrible, we may choose something more fitting.
Plain requires and exports: We obviously need them to make this work at all.
Implied readability with requires transitive: One of the problems we want to fix is the inadvertent use of transitive dependencies. Unfortunately we have a few, err, “poorly modularized” artifacts that many others depend on and that would likely requires transitive many others, meaning hidden dependencies would be back in force. We hence decided against using this feature.
Optional dependencies with requires static: This only differs from requires at run time, so there’s no reason to use it.
Qualified exports with exports to: They allow something in between private and public APIs, which we felt was a mostly unnecessary dilution of the module system’s promises of strong encapsulation. We only allow it as a temporary solution for cases where subprojects that are being modularized are too closely coupled and decoupling would take too much time and hinder the modularization effort. Each use should be explained with a comment.
Reflective access with opens/opens to and services with uses/provides ... with: These are run-time concepts that we can ignore since we only want to use the module system at compile time.
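Guidelines like these could, in principle, be checked mechanically. The following is only an illustration of that idea, not tooling described in this post: a sketch that inspects a java.lang.module.ModuleDescriptor and reports the directives the list above restricts. The class GuidelineCheck, its messages, and all module names are hypothetical. The descriptor is built in code here, on the assumption that real descriptors would be read from compiled module-info.class files.

```java
import java.lang.module.ModuleDescriptor;
import java.lang.module.ModuleDescriptor.Exports;
import java.lang.module.ModuleDescriptor.Requires;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

public class GuidelineCheck {

    /** Returns a sorted list with one message per directive the guidelines restrict. */
    static List<String> findings(ModuleDescriptor module) {
        List<String> findings = new ArrayList<>();
        for (Requires requires : module.requires()) {
            // Guideline: no implied readability
            if (requires.modifiers().contains(Requires.Modifier.TRANSITIVE)) {
                findings.add("requires transitive " + requires.name());
            }
            // Guideline: no optional dependencies
            if (requires.modifiers().contains(Requires.Modifier.STATIC)) {
                findings.add("requires static " + requires.name());
            }
        }
        for (Exports exports : module.exports()) {
            // Guideline: qualified exports only as a documented, temporary measure
            if (exports.isQualified()) {
                findings.add("qualified export of " + exports.source());
            }
        }
        Collections.sort(findings);
        return findings;
    }

    public static void main(String[] args) {
        ModuleDescriptor module = ModuleDescriptor.newModule("com.example.app.reporting")
            .requires("com.example.app.core")
            .requires(Set.of(Requires.Modifier.TRANSITIVE), "com.example.app.legacy")
            .exports("com.example.app.reporting.api")
            .exports("com.example.app.reporting.internal", Set.of("com.example.app.web"))
            .build();
        findings(module).forEach(System.out::println);
    }
}
```

The explanatory comment required for each qualified export is not part of the compiled descriptor, so a check like this can only point a reviewer at the declaration.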

Step 3: More questions that need answers

We also came up with a set of questions we needed to answer before we could proceed:

How does Maven deal with modules? How does it deal with unit tests? Does Maven create synthetic test modules? Does it execute them on the class path?
Since we try to be IDE agnostic and in practice use both IntelliJ and Eclipse, we needed to answer similar questions for those tools.
How many split packages do we have? And how do we deal with them? Split packages are anathema to modules: you cannot have the same package in multiple modules on the module path. This would not be a problem at run time since we only use the class path, but it would bite us at compile time and maybe unit test time.
Will we shoot for a “big bang” migration or do it over time? Related to that: Should modularization leave behind “perfect modules”, which takes longer, or do we prefer a result closer to the current state, where some subprojects are “poorly modularized”?
How deep do we explore? / When do we just start doing it?
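The split-package question in the list above comes down to finding packages that occur in more than one artifact. A minimal sketch of that check follows, assuming the packages per artifact have already been collected (for example from the entries of each JAR). The class name and the artifact and package names are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class SplitPackages {

    /** Maps each package found in more than one artifact to the artifacts that contain it. */
    static Map<String, Set<String>> find(Map<String, Set<String>> packagesByArtifact) {
        Map<String, Set<String>> artifactsByPackage = new TreeMap<>();
        // Invert the mapping: package -> artifacts containing it
        packagesByArtifact.forEach((artifact, packages) ->
            packages.forEach(pkg ->
                artifactsByPackage.computeIfAbsent(pkg, key -> new TreeSet<>()).add(artifact)));
        // Keep only the packages that are split across several artifacts
        artifactsByPackage.values().removeIf(artifacts -> artifacts.size() < 2);
        return artifactsByPackage;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> packagesByArtifact = Map.of(
            "core.jar", Set.of("com.example.core", "com.example.util"),
            "web.jar", Set.of("com.example.web", "com.example.util"));
        System.out.println(find(packagesByArtifact)); // {com.example.util=[core.jar, web.jar]}
    }
}
```

Each entry in the result is a package that would have to be merged into one module or renamed before both artifacts can become modules.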

What answers did we come up with? Did we manage to do the migration? Find out in one of our next installments!

The title image was published by Braden Collum under the Unsplash License.