Rant about Maven repositories
Every company doing Java development with Maven is probably hosting its own repositories. They’re probably also using a repository manager such as Nexus or JFrog as a proxy to cache third-party artifacts locally. At least we do. But what happens when moving to the cloud?
There are a few reasons to use a local repository manager. The first one is obvious: you need a home for your artifacts. The second is network performance, considering how much data Maven loads in the form of dependencies. And finally, a repository manager with its cache protects you from possible outages on Maven Central (and other repositories you depend on).
MAVEN CENTRAL WAS UNAFFECTED BY THE EC2 OUTAGE BECAUSE IT RUNS ON AN AS400 IN SOMEONE’S GARAGE— Dan Woods (@danveloper) June 9, 2016
But when you want to move to the cloud, as everybody’s doing right now, the cache becomes problematic. If you’re building your artifacts with a cloud service, say Bitbucket Pipelines, you probably don’t want to fetch every artifact from your local repository manager as that’d slow down your build to a crawl.
Just don’t use the mirror on the cloud, you say? Fair enough, but you need to access your local artifacts somehow. So you add your local repository to your pom.xml if you haven’t already done that. Using a mirror with Maven makes adding repositories to pom.xml obsolete so you can easily forget about them, but anyways that sounds simple.
The next problem is that Maven starts fetching your artifacts from the Internet while also checking your local repository for everybody else’s. This again slows down the build as many requests end up with 404 Not Found. Fortunately Maven fetches dependencies concurrently so it’s not as bad as you might think. You may, however, not appreciate Maven revealing your artifact secrets by trying to fetch your local artifacts from Maven Central (or any other 3rd party repository).
There are a couple of possible solutions to the problem. Let’s discuss each of them separately.
Keep using the mirror, but exclude Maven Central
As described in the Guide to Mirror Settings, you can also exclude repositories when configuring a mirror. This means you could keep using your local mirror for everything but Maven Central. Just make sure your mirror does not serve content that is available in Central.
Depending on how much open source software you use, this may give you significant improvements regarding build performance.
<?xml version="1.0" encoding="UTF-8"?>
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">
  <mirrors>
    <mirror>
      <id>nexus</id>
      <url>https://your-private-mirror</url>
      <!-- Mirror every repository except Maven Central -->
      <mirrorOf>*,!central</mirrorOf>
    </mirror>
  </mirrors>
</settings>
Unfortunately, it still does not solve the problem of Maven requesting an artifact from every repository (or mirror) in turn until it finds what it’s looking for.
Move your repository manager to the cloud as well
Moving your repository manager to the cloud might help a little regarding performance but may still slow down the build depending on where you’re running the build and hosting the manager. It does, however, solve the issue of Maven querying the Internet for local dependencies. Both Nexus and JFrog support routing where you can specify patterns for groupIds to look for (or not) in a remote repository.
The downside is that you need to set up the repository manager on the cloud. JFrog is available as a public cloud service, but unfortunately there don’t seem to be many other options. JFrog also charges for data transfers, so using it as a cache has an extra cost attached to it. If you know of alternatives, please let me know!
Also you need to consider whether or not you want to store your artifacts on the cloud. If not, the role of the repository manager is to act just as a mirror. If you do, however, then this might be the best option for you.
Utilize a routing proxy as part of your build
As we are moving ahead with our cloud strategy at my company, we’re moving projects to Bitbucket and migrating from local Bamboo builds to Bitbucket’s awesome CI tool called Bitbucket Pipelines.
The more projects I have pushed into Bitbucket and built with Pipelines the more frustrated I have become with Maven and its poor understanding of the source of the dependencies.
As I thought about various ways to solve the issue, I came up with the idea of setting up a local web server with some Maven know-how as part of the build process, to act as a proxy between Maven and the target repositories. Its sole purpose is to make sure Maven goes immediately to the right repository.
After all, the logic is really simple: in addition to defining the URL for the remote repository, you also define which groupIds you know are available in that repository. Upon receiving a request, the proxy uses that information to pick just the right target for the request.
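To make the idea concrete, here is a minimal sketch of that lookup in Java. It is an illustration rather than the proxy's actual code: the class name GroupIdRouter, its route and resolve methods, and the nexus.example.com URL are all made up for this example. It relies on the standard Maven repository layout, where the groupId com.example becomes the path prefix /com/example/, and it assumes that the longest matching prefix should win.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: picks the target repository for a request path by groupId prefix. */
final class GroupIdRouter {
    // groupId prefix (e.g. "com.example") -> base URL of the repository known to host it
    private final Map<String, String> routes = new LinkedHashMap<>();
    private final String defaultRepository;

    GroupIdRouter(String defaultRepository) {
        this.defaultRepository = stripTrailingSlash(defaultRepository);
    }

    /** Registers a repository for every groupId that equals or starts with the given prefix. */
    GroupIdRouter route(String groupIdPrefix, String repositoryUrl) {
        routes.put(groupIdPrefix, stripTrailingSlash(repositoryUrl));
        return this;
    }

    /** Returns the full URL the request should be sent to. */
    String resolve(String requestPath) {
        String path = requestPath.startsWith("/") ? requestPath : "/" + requestPath;
        String bestPrefix = "";
        String target = defaultRepository;
        for (Map.Entry<String, String> entry : routes.entrySet()) {
            // In the Maven layout the dots of a groupId become directory separators.
            // The trailing slash keeps "com.example" from matching "com.examplecorp".
            String pathPrefix = "/" + entry.getKey().replace('.', '/') + "/";
            if (path.startsWith(pathPrefix) && pathPrefix.length() > bestPrefix.length()) {
                bestPrefix = pathPrefix;
                target = entry.getValue();
            }
        }
        return target + path;
    }

    private static String stripTrailingSlash(String url) {
        return url.endsWith("/") ? url.substring(0, url.length() - 1) : url;
    }
}
```

With a route registered for com.example, a request for /com/example/app/1.0/app-1.0.jar resolves to the private repository, while everything else falls through to the default repository. Matching on the path prefix instead of parsing out the exact groupId also covers metadata requests, which have fewer path segments than artifact requests.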
The web server can be started at the beginning of the build and stopped when the build is completed. Apart from its configuration, it’s fully stateless. You can use services in Bitbucket Pipelines or just start the server in the background like any other process before launching Maven.
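As a rough sketch of how small such a server can be, the following uses the HTTP server bundled with the JDK (com.sun.net.httpserver). The class name RedirectingRepositoryProxy and the resolveTarget function are assumptions for illustration only. Answering with a redirect is also just one possible design: a real proxy may need to fetch the artifact itself, for example when the target repository requires credentials.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.util.function.Function;

/** Hypothetical sketch: a stateless server that redirects each request to the repository chosen for its path. */
final class RedirectingRepositoryProxy {

    /**
     * Starts the server on the loopback interface. resolveTarget maps a request
     * path such as /com/example/app/1.0/app-1.0.pom to the full target URL.
     */
    static HttpServer start(int port, Function<String, String> resolveTarget) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/", exchange -> {
            try {
                String path = exchange.getRequestURI().getPath();
                exchange.getResponseHeaders().set("Location", resolveTarget.apply(path));
                // A length of -1 means headers only, no response body.
                exchange.sendResponseHeaders(302, -1);
            } finally {
                exchange.close();
            }
        });
        server.start();
        return server;
    }
}
```

Passing 0 as the port lets the operating system pick a free one, which server.getAddress().getPort() then reports. Calling server.stop(0) at the end of the build shuts the server down again, and since nothing is stored between requests there is nothing to clean up.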
If you want (or need) to store your artifacts in the cloud, setting up a repository manager close to the build environment may be the best option. It will help you with both bandwidth and performance, and also lets you control lookups for artifacts. If you don’t want to manage the repository yourself, your options are somewhat limited.
On the other hand, if you want to keep your artifacts to yourself, then you might want to consider the last option and set up a local proxy in the build environment to pass requests on to the right remote repository.
Thanks for reading and please leave a comment to let me know what you think!