[JLBP-6]

Rename artifacts and packages together

When a library B in Java depends on another library A through the Maven repository system, library B needs two identifiers to find classes in library A:

  1. The Maven coordinates of library A, following the form group_ID:artifact_ID:version; for example com.google.guava:guava:26.0-jre. The Maven coordinates are used to locate the library’s files in a Maven repository. For each pair of group ID and artifact ID (hereafter referred to as the “Maven ID”), the user’s build system (for example Maven or Gradle) selects exactly one version to put on the classpath. Different build systems use different rules for selecting from multiple versions.

  2. The fully-qualified class names of the classes in library A. These classes generally share a Java package (the package as defined in package statements, for example package com.google.common.collect). The classpath, which is formed from Maven artifacts and possibly non-Maven sources, is searched for each fully-qualified class name at runtime.
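The "exactly one version per Maven ID" rule in item 1 can be sketched with a toy model. This is an illustration under stated assumptions, not how any build system is actually implemented: the class VersionSelection, the com.example:lib-a coordinates, and the single-integer versions are all hypothetical, and the two rules are loosely modeled on Maven's nearest-wins mediation and Gradle's default of picking the highest requested version.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of picking exactly one version per Maven ID (hypothetical, simplified). */
final class VersionSelection {

  /** A requested dependency; depth is its distance from the root of the dependency graph. */
  static final class Request {
    final String mavenId; // groupId:artifactId
    final int version;    // simplified to a single integer for this sketch
    final int depth;

    Request(String mavenId, int version, int depth) {
      this.mavenId = mavenId;
      this.version = version;
      this.depth = depth;
    }
  }

  /** Keeps the request closest to the root; on a tie, the one listed first. */
  static Map<String, Integer> nearestWins(List<Request> requests) {
    Map<String, Request> chosen = new LinkedHashMap<>();
    for (Request r : requests) {
      Request current = chosen.get(r.mavenId);
      if (current == null || r.depth < current.depth) {
        chosen.put(r.mavenId, r);
      }
    }
    return versions(chosen);
  }

  /** Keeps the highest requested version, regardless of depth. */
  static Map<String, Integer> highestWins(List<Request> requests) {
    Map<String, Request> chosen = new LinkedHashMap<>();
    for (Request r : requests) {
      Request current = chosen.get(r.mavenId);
      if (current == null || r.version > current.version) {
        chosen.put(r.mavenId, r);
      }
    }
    return versions(chosen);
  }

  private static Map<String, Integer> versions(Map<String, Request> chosen) {
    Map<String, Integer> result = new LinkedHashMap<>();
    chosen.forEach((id, r) -> result.put(id, r.version));
    return result;
  }

  public static void main(String[] args) {
    List<Request> requests = List.of(
        new Request("com.example:lib-a", 1, 1),  // direct dependency on version 1
        new Request("com.example:lib-a", 2, 2)); // transitive dependency on version 2
    System.out.println(nearestWins(requests)); // {com.example:lib-a=1}
    System.out.println(highestWins(requests)); // {com.example:lib-a=2}
  }
}
```

Either way, only one version of a given Maven ID ends up on the classpath, and every fully-qualified class name from item 2 is then looked up against whatever that one version contains.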

When breaking changes are introduced to a library between major version 1 and major version 2, a choice needs to be made: to rename or not rename? This question applies to both items listed above, the Maven ID (1) and the Java package (2).

Recommendations:

Consider the following renaming scenario:

Given this scenario, here are the possible combinations of renamings:

Given the consequences, maintainers should avoid case 2 (renaming the Maven ID while keeping the Java package the same) and case 3 (renaming the Java package while keeping the Maven ID the same). Among the remaining three cases, the impact of a Maven ID change is minuscule compared to the impact of a Java package rename, so the remaining discussion focuses only on the Java package rename.
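To see why the two mismatched renamings are the ones to avoid, the following sketch models a classpath as a list of artifacts. It is a simplified illustration: RenameCases, the com.example coordinates, and the class names are hypothetical, and the resolver assumes the highest major version is kept for each Maven ID.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy classpath model for the renaming cases (hypothetical names throughout). */
final class RenameCases {

  static final class Artifact {
    final String mavenId;      // groupId:artifactId
    final int major;           // major version only, for simplicity
    final Set<String> classes; // fully-qualified class names in the artifact

    Artifact(String mavenId, int major, String... classes) {
      this.mavenId = mavenId;
      this.major = major;
      this.classes = Set.of(classes);
    }
  }

  /** One artifact per Maven ID; this sketch assumes the highest major version is kept. */
  static List<Artifact> resolve(List<Artifact> requested) {
    Map<String, Artifact> chosen = new LinkedHashMap<>();
    for (Artifact a : requested) {
      chosen.merge(a.mavenId, a, (old, candidate) -> candidate.major > old.major ? candidate : old);
    }
    return new ArrayList<>(chosen.values());
  }

  /** Class names supplied by more than one artifact on the classpath. */
  static Set<String> duplicates(List<Artifact> classpath) {
    Set<String> seen = new TreeSet<>();
    Set<String> dupes = new TreeSet<>();
    for (Artifact a : classpath) {
      for (String c : a.classes) {
        if (!seen.add(c)) {
          dupes.add(c);
        }
      }
    }
    return dupes;
  }

  /** Class names that consumers need but no artifact on the classpath supplies. */
  static Set<String> missing(List<Artifact> classpath, Set<String> needed) {
    Set<String> result = new TreeSet<>(needed);
    for (Artifact a : classpath) {
      result.removeAll(a.classes);
    }
    return result;
  }

  public static void main(String[] args) {
    // Case 2: Maven ID renamed, Java package kept. Both artifacts are selected,
    // so the same class name appears twice on the classpath.
    List<Artifact> case2 = resolve(List.of(
        new Artifact("com.example:lib-a", 1, "com.example.a.Foo"),
        new Artifact("com.example:lib-a-v2", 2, "com.example.a.Foo")));
    System.out.println("case 2 duplicates: " + duplicates(case2));

    // Case 3: Java package renamed, Maven ID kept. Only one version is selected,
    // so consumers of the other package cannot find their classes.
    List<Artifact> case3 = resolve(List.of(
        new Artifact("com.example:lib-a", 1, "com.example.a.Foo"),
        new Artifact("com.example:lib-a", 2, "com.example.a2.Foo")));
    System.out.println("case 3 missing: "
        + missing(case3, Set.of("com.example.a.Foo", "com.example.a2.Foo")));
  }
}
```

In this model, renaming both identifiers together leaves no duplicates and no missing classes, because the two major versions are selected independently and occupy different packages.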

The cost of diamond dependency conflicts due to not renaming has to be weighed against the cost of updating import statements everywhere the library is used. Let’s take examples from two extremes.

  1. A library with 10,000 references throughout 100 packages, in which a single function is deleted between major version 1 and major version 2. That function has only one reference, located in a leaf of the dependency graph.

    In this case, moving 10,000 references to a new package in a large dependency tree would be a very expensive endeavor. In contrast, updating the one call site that references the deleted function to use its replacement is considerably less work and can be rolled out much more quickly. In this scenario, it is clearly superior to keep the same Java package.

  2. A library with 10,000 references throughout 100 packages, and a large refactoring breaks the surface of 5,000 of those references between major version 1 and major version 2.

    In this case, changing consuming code would be a large undertaking, and not all maintainers will feel it’s worth migrating to the new major version. If the library author decides to keep the same Java package, the ecosystem has to bifurcate to handle code paths requiring one major version or the other: some projects keep using the old version while others upgrade to the new one. Any project that depends on libraries from both groups then faces diamond dependency conflicts. In this scenario, it is clearly superior to rename the Java package, and treat the new major version as a new library.

Note that both of these examples involve a library that is referenced in a large number of places. The fewer places a library is referenced from, and the closer those references are to the leaves of the dependency graph, the less the decision matters.
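The bifurcation described in the second extreme can be sketched as follows. This is a minimal model, assuming the Java package is kept so that only one major version can be on the classpath; DiamondConflict, the consumer names lib-b and lib-c, and the Parser methods are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of a diamond dependency conflict (hypothetical names throughout). */
final class DiamondConflict {

  /** Returns the consumers whose required methods are not all present in the selected API. */
  static List<String> brokenConsumers(Set<String> selectedApi, Map<String, Set<String>> required) {
    List<String> broken = new ArrayList<>();
    for (Map.Entry<String, Set<String>> entry : required.entrySet()) {
      if (!selectedApi.containsAll(entry.getValue())) {
        broken.add(entry.getKey());
      }
    }
    return broken;
  }

  public static void main(String[] args) {
    // API surface of each major version, same Java package in both.
    Set<String> v1 = Set.of("Parser.parse", "Parser.parseLegacy");
    Set<String> v2 = Set.of("Parser.parse", "Parser.parseStrict");

    // Two consumers that an application wants to use together.
    Map<String, Set<String>> required = new LinkedHashMap<>();
    required.put("lib-b", Set.of("Parser.parseLegacy")); // still on major version 1
    required.put("lib-c", Set.of("Parser.parseStrict")); // upgraded to major version 2

    // Whichever version the build system selects, one consumer breaks at runtime.
    System.out.println("with v1 selected, broken: " + brokenConsumers(v1, required)); // [lib-c]
    System.out.println("with v2 selected, broken: " + brokenConsumers(v2, required)); // [lib-b]
  }
}
```

With a renamed Java package and Maven ID, both API surfaces could sit on the classpath side by side, and neither consumer would break in this model.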

Examples in open source

Case 1 - Keep Java package name and Maven ID

Case 2 - Keep Java package name, rename Maven ID

Case 4 - Rename both Java package and Maven ID

Case 5 - Bundle old and new packages in the existing Maven ID