Wednesday, February 23, 2011

Understanding Maven Dependency Mediation (Part 1)





One of Maven's most powerful features is certainly its dependency management. As far as I know, the lack of dependency management in other well-established build tools was actually one of the reasons Maven was started in the first place. Maven's dependency management relies on people explicitly declaring all the dependencies their project requires in the POM file. Maven reads this information, decides which jars must be downloaded from the repositories, and so on. Most of you know this well, so I won't repeat the basics here.
However, Maven's dependency management sometimes seems to have a "bad day": unknown jars pop up in your war file, or Maven uses a completely wrong version in your build. When things like this happen, it is likely that a mechanism called dependency mediation has decided to screw up your well-defined list of dependencies. Let's take a look at that beast.

As you know, Maven resolves all jars required for your build transitively from the entries in your POM. For example, if you declare a dependency on commons-logging, and commons-logging itself declares a dependency on log4j, both commons-logging.jar and log4j.jar will be added to your build. More formally, transitivity means that if A->B and B->C, then A->C. Transitive dependencies are very cool: they are what ensures that all classes your project needs actually end up in the classpath, the war file, or wherever else they are needed.
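
Here's a minimal Python sketch of that transitive closure. The graph, the artifact names, and the `transitive_deps` helper are hypothetical and only mirror the commons-logging/log4j example; real Maven reads these edges from each artifact's POM.

```python
from collections import deque

# Hypothetical dependency graph: each artifact maps to the artifacts its POM declares.
GRAPH = {
    "my-project": ["commons-logging"],
    "commons-logging": ["log4j"],
    "log4j": [],
}


def transitive_deps(root, graph):
    """Return every artifact reachable from root (root itself excluded)."""
    seen = set()
    queue = deque(graph.get(root, []))
    while queue:
        artifact = queue.popleft()
        if artifact in seen:
            continue  # already collected via another path
        seen.add(artifact)
        queue.extend(graph.get(artifact, []))  # follow B->C once A->B is known
    return seen


print(sorted(transitive_deps("my-project", GRAPH)))  # ['commons-logging', 'log4j']
```

Our project never mentions log4j, yet it ends up in the result because commons-logging needs it.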

However, transitive dependencies don't come for free. One of the problems Maven faces is conflicts in the dependency tree. The image shows such a conflict. The sample project declares two direct dependencies: one on commons-logging-1.1 and one on log4j-1.2.13. Because Maven transitively loads all dependencies declared by commons-logging-1.1, a second version of log4j (V1.2.12) pops up in the dependency tree. When this happens, the mechanism called dependency mediation kicks in. Its job is to decide which of the two log4j versions is used for the build. In the example shown, log4j-1.2.13 will be selected. But why?
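
The conflict from the example can be written down as a flat list of (artifact, version, depth) entries. The `find_conflicts` helper below is a hypothetical illustration, not part of Maven: it simply reports every artifact that appears in the tree with more than one version, which is exactly the situation dependency mediation has to resolve.

```python
from collections import defaultdict

# Hypothetical flattened tree: (groupId:artifactId, version, depth from the root).
CONFLICT_NODES = [
    ("commons-logging:commons-logging", "1.1", 1),  # declared directly
    ("log4j:log4j", "1.2.12", 2),                   # pulled in by commons-logging 1.1
    ("log4j:log4j", "1.2.13", 1),                   # declared directly
]


def find_conflicts(nodes):
    """Map each artifact seen with several versions to those versions."""
    versions = defaultdict(set)
    for artifact, version, _depth in nodes:
        versions[artifact].add(version)
    # Plain string sort, only used to get a stable order for display.
    return {a: sorted(v) for a, v in versions.items() if len(v) > 1}


print(find_conflicts(CONFLICT_NODES))  # {'log4j:log4j': ['1.2.12', '1.2.13']}
```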

A first, simple explanation would be that V1.2.13 is newer (higher) than 1.2.12, and that's why Maven uses it. To verify this, we can perform a simple test: we change the version of our direct log4j dependency from 1.2.13 to 1.2.11. If our explanation is correct, Maven should now select log4j-1.2.12 for the build. Try it in one of your projects: Maven still uses V1.2.11 of Log4J. Obviously, the actual value of the version has no effect on dependency mediation.
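
To make the rejected hypothesis concrete, here's a minimal sketch of a naive "highest version wins" rule. Both functions are hypothetical and describe what Maven does *not* do; with the candidates from the experiment, the rule predicts 1.2.12, which contradicts what Maven actually selects.

```python
def parse_version(version):
    """Turn "1.2.11" into (1, 2, 11) so versions compare numerically.

    Simplification: assumes purely numeric, dot-separated versions.
    Real Maven version ordering also handles qualifiers such as "-beta".
    """
    return tuple(int(part) for part in version.split("."))


def newest_wins(versions):
    """The naive hypothesis from the text: always pick the highest version."""
    return max(versions, key=parse_version)


# Candidates after changing our direct dependency to 1.2.11.
print(newest_wins(["1.2.11", "1.2.12"]))  # 1.2.12, yet Maven actually uses 1.2.11
```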

In fact, Maven knows several strategies for resolving conflicts in the dependency tree. The example above uses the simplest and most commonly used one. It is triggered if the dependency in your POM looks like this:

<dependency>
    <groupId>log4j</groupId>
    <artifactId>log4j</artifactId>
    <version>1.2.11</version>
</dependency>

The important thing is the version number. It is stated "as is", without any extra additions (I'll come back to this in a follow-up post). If you write version numbers in this syntax, Maven treats them as recommended versions. So what you're actually telling Maven is: "I would prefer V1.2.11 of Log4J, but hey, I can live with any other version, too".
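
The difference between a plain "recommended" version and a bracketed one (used later in this post) can be expressed as a tiny classifier. This is a simplified sketch with a hypothetical `classify` function; it assumes, following Maven's version range syntax, that a value starting with `[` or `(` is a range or hard requirement, and that anything else is a soft recommendation.

```python
def classify(spec):
    """Classify a POM <version> value as a soft or a hard version spec.

    Simplified: only the first character is inspected. "[1.2.12]" or a
    range such as "(,1.0]" counts as hard; a plain "1.2.11" is merely a
    recommendation that dependency mediation may override.
    """
    spec = spec.strip()
    if spec.startswith(("[", "(")):
        return "hard"
    return "soft"


print(classify("1.2.11"))    # soft
print(classify("[1.2.12]"))  # hard
```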

If all conflicting versions in your dependency tree are declared like this, Maven simply chooses the one closest to the root of the tree. In our example, log4j-1.2.11 has a distance of 1 from the root and log4j-1.2.12 has a distance of 2. Consequently, Maven chooses V1.2.11. The actual value of the version doesn't matter at all. This algorithm is as simple as it gets, and it works in almost all cases. This is why Maven behaves exactly as you expect in most project situations: you get the version you declare in your POM, because your own dependencies always have the smallest distance to the root of the dependency tree.
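
The nearest-wins rule can be sketched as a breadth-first walk over the tree, since breadth-first order visits nodes by increasing depth. This is a hypothetical model, not Maven's source: the tree and names are made up to mirror the second experiment, ties at equal depth are assumed to go to the earlier declaration (Maven's documentation describes such a first-declaration tie-break, but this post doesn't cover it), and a losing version's own dependencies are simply not explored.

```python
from collections import deque

# Hypothetical tree for the second experiment: (artifact, version, children).
# Our POM declares log4j 1.2.11 directly; commons-logging 1.1 pulls in
# log4j 1.2.12 one level deeper.
MEDIATION_TREE = ("my-project", "1.0", [
    ("commons-logging:commons-logging", "1.1", [
        ("log4j:log4j", "1.2.12", []),
    ]),
    ("log4j:log4j", "1.2.11", []),
])


def nearest_wins(root):
    """Select one version per artifact: the one nearest to the root.

    Breadth-first traversal visits nodes in order of depth, so the first
    version seen for an artifact is the nearest one. At equal depth the
    earlier declaration is dequeued first, which acts as the tie-break.
    """
    selected = {}
    queue = deque([root])
    while queue:
        artifact, version, children = queue.popleft()
        if artifact in selected:
            continue  # a nearer (or earlier) occurrence already won
        selected[artifact] = version
        queue.extend(children)  # only the winner's dependencies are explored
    return selected


print(nearest_wins(MEDIATION_TREE)["log4j:log4j"])  # 1.2.11
```

Note that the version strings are never compared: only their position in the tree matters, which is exactly the behavior observed in the experiment.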

But beware. Let's say someone in the Apache commons-logging project decides, for whatever reason, to change the POM of commons-logging like this:

<dependency>
    <groupId>log4j</groupId>
    <artifactId>log4j</artifactId>
    <version>[1.2.12]</version>
</dependency>

Now commons-logging no longer declares a recommended version 1.2.12, but a specific, required one. This is done simply by putting square brackets around the version number. If this happens, you will suddenly end up with log4j-1.2.12 in your build, without having changed your own dependency on log4j-1.2.11. Confused? I'll explain this behavior soon in a follow-up post.
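
Until then, here's a simplified model that merely reproduces the outcome described above: take the nearest candidate whose version satisfies every bracketed requirement. The `mediate` function and its (spec, depth) input are hypothetical and not taken from Maven's code; only exact requirements like `[1.2.12]` are modelled, not general ranges.

```python
def mediate(candidates):
    """Nearest soft version wins, unless a hard requirement rules it out.

    candidates: list of (version spec as written in a POM, depth from root).
    Simplification: "[x]" is treated as "exactly x"; real ranges such as
    "[1.2,1.3)" are not handled.
    """
    required = {spec.strip("[]") for spec, _ in candidates if spec.startswith("[")}
    # Walk the candidates from nearest to farthest; the stable sort keeps
    # declaration order for candidates at equal depth.
    for spec, _depth in sorted(candidates, key=lambda c: c[1]):
        version = spec.strip("[]")
        if all(version == req for req in required):
            return version
    raise ValueError("no version satisfies all hard requirements")


# Our soft 1.2.11 at depth 1 versus the changed commons-logging POM at depth 2.
print(mediate([("1.2.11", 1), ("[1.2.12]", 2)]))  # 1.2.12
print(mediate([("1.2.11", 1), ("1.2.12", 2)]))    # 1.2.11
```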

6 comments:

  1. Wasn't distance-based resolution replaced by highest-version resolution (except when the POM being built specifies the version itself) in Maven 2.0.9? Or was it in 3.0.0?

  2. AFAIK, no. I verified the distance-based resolution in the source code of V2.2.1. I didn't check the source of 3.0, but at least it behaves exactly like V2.2.1. Honestly, I can't imagine that this behavior could be changed easily without breaking a great number of builds out there.

  3. Good post, thanks for sharing.

    Waiting for the second part.

  4. Hi there,

    A colleague of mine sent me the link to your blog, saying "We can't even be sure which jar is actually used! So using Maven is frightening to me!"
    In my short experience with Maven I have never had such an issue, so I read your post with a lot of interest and ran the test myself with commons-logging 1.1 and log4j-1.2.11 using Maven 3.0.2.

    When I run mvn compile with -X, I get the following debug output:

    [DEBUG] com.progiweb:test-dep:jar:0.0.1-SNAPSHOT
    [DEBUG] commons-logging:commons-logging:jar:1.1:compile
    [DEBUG] logkit:logkit:jar:1.0.1:compile
    [DEBUG] avalon-framework:avalon-framework:jar:4.1.3:compile
    [DEBUG] javax.servlet:servlet-api:jar:2.3:compile
    [DEBUG] log4j:log4j:jar:1.2.11:compile

    So there is no trace of log4j 1.2.12 here, which the compilation classpath confirms.

    The same test with Maven 2.2.1 shows exactly what you describe: it "selects" the nearest dependency:
    [DEBUG] com.progiweb:test-dep:jar:0.0.1-SNAPSHOT (selected for null)
    [DEBUG] commons-logging:commons-logging:jar:1.1:compile (selected for compile)
    [DEBUG] log4j:log4j:jar:1.2.12:compile (selected for compile)
    [DEBUG] logkit:logkit:jar:1.0.1:compile (selected for compile)
    [DEBUG] avalon-framework:avalon-framework:jar:4.1.3:compile (selected for compile)
    [DEBUG] javax.servlet:servlet-api:jar:2.3:compile (selected for compile)
    [DEBUG] log4j:log4j:jar:1.2.12:compile (removed - nearer found: 1.2.11)
    [DEBUG] log4j:log4j:jar:1.2.11:compile (selected for compile)

    So it seems things are handled differently in Maven 3.0. The rewrite of dependency management is one of the highlighted new features of Maven 3.0, so I'm not that surprised (or they simply fooled me by removing the debug log :)

  5. I'll add that you can also force a version by adding the dependency to your project's <dependencyManagement> section. Here I added log4j 1.2.11 to dependency management and removed the version from the regular dependency:

    <dependencies>
      <dependency>
        <groupId>commons-logging</groupId>
        <artifactId>commons-logging</artifactId>
        <version>1.1</version>
      </dependency>
      <dependency>
        <groupId>log4j</groupId>
        <artifactId>log4j</artifactId>
      </dependency>
    </dependencies>
    <dependencyManagement>
      <dependencies>
        <dependency>
          <groupId>log4j</groupId>
          <artifactId>log4j</artifactId>
          <version>1.2.11</version>
        </dependency>
      </dependencies>
    </dependencyManagement>

    (Some might say this is a duplicate declaration, but usually you declare dependencyManagement only in a top-level parent project. I agree Maven is on the verbose side, though :)

    With Maven 2.2.1 I get the following debug output:

    [DEBUG] com.progiweb:test-dep:jar:0.0.1-SNAPSHOT (selected for null)
    [DEBUG] commons-logging:commons-logging:jar:1.1:compile (selected for compile)
    [DEBUG] log4j:log4j:jar:1.2.12:compile (applying version: 1.2.11)
    [DEBUG] log4j:log4j:jar:1.2.11:compile (selected for compile)
    [DEBUG] logkit:logkit:jar:1.0.1:compile (selected for compile)
    [DEBUG] avalon-framework:avalon-framework:jar:4.1.3:compile (selected for compile)
    [DEBUG] javax.servlet:servlet-api:jar:2.3:compile (selected for compile)
    [DEBUG] log4j:log4j:jar:1.2.11:compile (selected for compile)

    It replaces the version coming from commons-logging with the one from your dependencyManagement.

  6. @Laurent: I'm still pretty sure that Maven 3.0.x and Maven 2.2.x behave exactly the same in terms of dependency mediation. In your example, both choose log4j-1.2.11 for the build; only the DEBUG output differs. In other words, Maven 3.0.x doesn't look at the actual value of the version, but still takes the nearest one (otherwise it would at least have chosen the higher version 1.2.12, which is referenced by commons-logging).

    Things change completely when dependencyManagement is used. I'll cover this topic in a follow-up post soon. If you use dependencyManagement, you have complete control over what's going on.

