
Shrinking Elephants

Managing very large projects with Gradle and IntelliJ IDEA-based IDEs

Photo by Lili Koslowski on Unsplash.

So you want IntelliJ IDEA to sync 5 million lines of Kotlin code,1 spread across more than 2000 Gradle projects.

No, you don't.

In one of our largest repos, a Kotlin backend project, syncing it all takes an average of about 8.4 minutes, and that's being generous. Achieving that required a fully up-to-date build, with all dependencies already downloaded—meaning this was 8.4 minutes for what is essentially a no-op. This no-op sync used the full 24 GiB of heap provided to the Gradle daemon, and an additional 12 GiB heap provided to the IDE process, or 36 GiB in total. If this is the best you can offer your developers, do not expect them to be very productive, and do not expect to ship software quickly.

A more realistic scenario is triggering the first sync of the day after a git pull that updated dependencies. In that "cold sync" scenario, the same build described above took 24.7 minutes, more than 7 minutes of which was spent downloading over 26000 files (jars plus metadata) totalling nearly 4 GiB over the wire. This build maxed out the heap very quickly, and it wouldn't be hard to imagine a scenario where the build thrashed gc for an hour before finally OOMing.

While you may want the full project loaded into memory, you almost certainly can't bear the consequences: extremely long sync, sluggish editing, the spinning beachball of death…

Fortunately, our developers never have to do this (sometimes a build engineer might… for analysis purposes). The actual experience we target for our developers is benchmarked at 15 seconds, a 97% improvement over the 8.4-minute baseline outlined above, and uses only 3.5 GiB of heap, a 90% improvement over the 36 GiB baseline. How did we do it?

What is IDE sync?

Gradle Sync is how the IDE loads important information about your build (known as the project model) to enable smart editing features like autocomplete, code navigation, viewing sources for dependencies, etc. To make this work, IntelliJ injects a library into the Gradle daemon containing the model builders that construct the IntelliJ project models. IntelliJ then requests that the Gradle daemon build the project models using the Tooling API. Gradle will configure your build, execute the model builders requested by IntelliJ, then serialize the data back to the IDE. At this point, the project sync is complete and many smart editor features will be available. IDE indexing is a separate process that takes place after sync to enable the rest of the editor features, which this post won’t cover.

The sync process is notoriously slow because of how much work needs to be done. The entire build must be configured (configure-on-demand is not applicable here), project dependencies and their sources need to be downloaded if they aren’t cached locally, and this work is largely performed in serial, leaving your CPU mostly idle. If any error occurs in the process, or any build.gradle(.kts), settings.gradle(.kts) or version catalog file is updated, the sync process must be restarted from scratch.

Benchmarks vs telemetry

Telemetry (that is, real data from real users) is the gold standard for tracking build performance. This is what makes Develocity such an indispensable tool for tracking and understanding our Gradle builds (about 2 million every week). Even with such a firehose of data, however, there are gaps. For purposes of this post, the biggest gap is the wall-clock time for an IDE sync (as described above). IDE-specific overhead contributes anywhere from 10% to over 50% of total sync time, so if we're only tracking the Gradle portion of the sync, we're missing the full picture.

We solved this problem just a couple of months ago with a new combination IDE and Gradle plugin we call build-sync-metrics.2 With this, we now have complete telemetry from our developers on the performance of the IDE in their local environments.

We can, for example, state the following, thanks to our telemetry:

Over the course of Q3, the Backend Build Team was able to reduce mean sync time for our backend developers from over 4 minutes to about 90 seconds, saving 2.5 minutes per sync. With over 100k syncs per year, this translates to 2-3 eng-years saved every year.
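A rough check of that estimate can be sketched as follows. The sync count and per-sync saving are the rounded figures quoted above; the 2,000 working hours per engineer-year is an assumption made here for illustration and does not come from the telemetry.

```python
# Rough check of the "eng-years saved" estimate quoted above.
# ASSUMPTION: one engineer-year is about 2,000 working hours
# (50 weeks x 40 hours). This constant is illustrative only.
HOURS_PER_ENG_YEAR = 2_000


def eng_years_saved(syncs_per_year: int, minutes_saved_per_sync: float) -> float:
    """Convert per-sync savings into engineer-years saved per year."""
    hours_saved = syncs_per_year * minutes_saved_per_sync / 60
    return hours_saved / HOURS_PER_ENG_YEAR


# 100k syncs per year is the lower bound quoted above ("over 100k").
lower_bound = eng_years_saved(100_000, 2.5)
print(f"{lower_bound:.2f} eng-years per year")
```

Under that assumption, 100k syncs works out to a little over 2 engineer-years, so any sync count above that lower bound lands in the quoted 2-3 range.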

While this is incredibly powerful, there are limitations. This population data isn't really amenable to rapid experimentation, or experimentation across a large combination of dimensions, and is, by definition, coming from an uncontrolled environment with an uncountably large number of exogenous inputs. There are, to borrow a phrase, unknown unknowns.

This is where benchmarks really shine. Essentially, they enable us to test possible "interventions" (described below) before we inflict them as A/B tests on our developers. They give us confidence that something might work, which saves time and potential pain. Once we see value from an intervention via a benchmark, we can start rolling it out to our users and see the impacts in real time. We use gradle-profiler for creating these benchmarks.

The rest of this post makes exclusive use of benchmarks to describe what we've done to make use of the IDE a more pleasant experience for our developers. This allows us to make precise statements about the impacts of these interventions. Please note that all of these benchmarks are also backed up by real-world telemetry, which is necessarily fuzzier in nature.

Interventions

The primary cost of syncing a Gradle project is configuring its project model. In fact, sync duration appears to be slightly quadratic over the number of subprojects (or modules) in the project, according to our benchmarks:

A line chart showing sync duration vs project count, along with an interpolated quadratic curve with the equation y = 1.41 + 0.066x + 0.0000672x^2.

This chart was created using data from a real project, not something artificially generated.3 How this was done will be clear in a moment. As to why the data might be quadratic: that's hard to say. Discussions with Gradle engineers suggest it might have to do with the project structure, such as depth of the project graph, number of edges, etc. In other words, while this graph accurately reflects our project, it may not reflect others. It is almost certainly true that there is at least a linear relationship, however.
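To get a feel for the fitted curve, it can be evaluated at a few project counts. This is a minimal sketch that assumes y is in seconds and x is the number of Gradle projects; the chart's axis units are not stated in the text, so treat that reading as an assumption. The project counts chosen are arbitrary sample points.

```python
# Evaluate the fitted curve from the chart:
#   y = 1.41 + 0.066x + 0.0000672x^2
# ASSUMPTION: y is in seconds and x is the number of Gradle projects.


def predicted_sync_seconds(project_count: int) -> float:
    """Predicted sync duration for a given number of projects."""
    x = project_count
    return 1.41 + 0.066 * x + 0.0000672 * x**2


for count in (250, 750, 2000):
    seconds = predicted_sync_seconds(count)
    per_project_ms = seconds / count * 1000
    print(f"{count:>5} projects: {seconds:6.1f} s ({per_project_ms:.0f} ms/project)")
```

Under that reading, the curve gives roughly 89 seconds at 750 projects and roughly 400 seconds at 2000 projects, so the quadratic term accounts for most of the cost at the high end.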

The chart above suggests that anything that can reduce subproject count is likely to improve sync performance. However, there are other design concerns that determine the number of subprojects:

  1. Project architecture is a first-class consideration, and we strongly believe that the project structure should reflect its architecture.4
  2. Sync performance is only one component of developer productivity. We must also consider build performance, which includes both project configuration and build execution. Weighing configuration too heavily will result in extremely slow builds that don't benefit at all from incremental or avoided execution.

We refer to the following practices as "interventions" because they're out-of-band of the normal build process. They are not first-class features of the build system, though they do use public APIs where relevant. They also require the ability to know something a priori about the software-under-development, and this prerequisite is neatly encapsulated by the requirement that the build be fully conventionalized prior to making any further intervention. Conventionalization is out-of-scope of this post, so please read the prior two posts in this series, Herding Elephants and Stampeding Elephants, for more information.

These interventions are listed in order of ease of implementation, more or less:

  1. In IDEA, enable the experimental Parallel Model Fetching feature.5
  2. Use a tool such as Spotlight to trim the project dependency graph to a subset of the full project set.
  3. Use a tool such as Fastsync to reduce the amount of work the Gradle dependency resolution engine has to do (by mangling the runtime classpath) during IDE sync.
  4. Take a hammer to your project graph and replace as many project dependencies with external module dependencies as possible. We call this "artifact-swap."

And finally, included here because of its impact, which is challenging to measure via benchmarks:

  1. Pre-fetch dependencies.

All together, these changes can reduce your IDE sync time by 97%, according to our benchmarks.

A stacked bar chart showing sync duration vs intervention (cumulative, left to right).

Cumulative impact of interventions.

There is a cost

While we firmly believe that these interventions' benefits outweigh their costs, we can't ignore those costs. There is obviously the implementation and maintenance cost: some of these improvements will require a dedicated build team (which you should have if your repos are large enough to benefit from these sorts of interventions). There's also a DX cost, since users have to learn new workflows. These downsides are real, but not prohibitive. Let's continue!

Baseline

In order to contextualize the interventions, we must first describe the baseline scenario, which is quite simply a Kotlin JVM project with over 2000 Gradle subprojects. Syncing this scenario with IntelliJ IDEA 2025.2.3 and Gradle 9.1.0, absent any of the "interventions" described below, took an average of 8.4 min, or about 220 ms / project.

As noted in the introduction, all benchmarks are necessarily "warm" or "no-op" in nature. We run the sync three times as a warm-up, and then another eight times, averaging the result. Therefore these syncs should involve, among other simplifications, minimal network overhead (they might still make HEAD requests to validate the up-to-date-ness of dependencies).
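The averaging procedure described above amounts to the following. This is only a sketch of the bookkeeping, with hypothetical timings; it does not reproduce how gradle-profiler itself records or reports results.

```python
from statistics import mean

WARMUP_RUNS = 3    # discarded, per the procedure described above
MEASURED_RUNS = 8  # averaged


def benchmark_mean(durations_sec: list[float]) -> float:
    """Mean of the measured runs, ignoring the warm-up runs."""
    if len(durations_sec) != WARMUP_RUNS + MEASURED_RUNS:
        raise ValueError("expected 3 warm-up runs plus 8 measured runs")
    return mean(durations_sec[WARMUP_RUNS:])


# HYPOTHETICAL timings in seconds; the first three are warm-ups.
runs = [41.0, 22.5, 17.0, 15.5, 14.5, 15.0, 16.0, 14.0, 15.0, 15.5, 14.5]
print(benchmark_mean(runs))
```

Discarding the warm-ups keeps one-off costs, such as daemon start-up, out of the reported mean.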

The metrics provided below are the result of layering on each intervention cumulatively. We also have independent benchmarks validating that each intervention is successful on its own.

Parallel model fetching

Compared to the baseline scenario, enabling parallel model fetching reduced sync duration by 57%, to about 3.6 min, or about 95 ms / project.

This "intervention" barely qualifies as one, but must be mentioned due to the high impact for near-zero cost. It is an "incubating" feature in IntelliJ IDEA, and is disabled by default. To enable it, navigate to Settings > Build, Execution, Deployment > Gradle and check the "Enable parallel Gradle model fetching" box, as indicated in the screencap below. Note that this checkbox only affects fetching of IntelliJ models; Android model builders are already fully parallelized.

Disclaimer: enabling this can result in random failures, but this is extremely rare and, in our opinion, worth it for the impressive performance benefit conferred, especially since the failures are trivially resolved by another sync. The failures seem to be a result of race conditions due to configuring the Gradle project model in parallel, which it simply isn't designed for (yet). One issue you can track related to this is IDEA-355309.

A screenshot from IntelliJ IDEA showing how to enable the parallel model fetching feature.

Spotlight

Compared to the baseline scenario, adding Spotlight to our build reduced sync duration by 82%, to about 1.5 min, or about 120 ms / project. (This is without parallel model fetching, to be clear.) Combined with parallel model fetching, the full reduction was 90%, or about 48% better than Spotlight alone.

Sometimes referred to as "focus", this intervention is required if you want to manage very large Gradle builds (meaning those with more than a few hundred projects). This intervention works by dramatically reducing the number of active projects Gradle must configure. Spotlight scans the project buildscript files to parse the project dependency tree without configuring any projects, then includes only those projects that are in the dependency tree of the projects the user requested to load in the IDE. For the purposes of this post, we cut the number of projects from more than 2000 to about 750. This cutoff captures 95% of all syncs we observe from our developers. That is, only 5% of all syncs target a project set larger than this. The figures above make use of a Spotlight-like tool to "target" specific subsets of our larger monorepo.
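The core idea, including only the requested projects plus everything they depend on, can be sketched as a graph traversal. This is an illustration over a hypothetical, hand-written project graph, not Spotlight's actual implementation; a real tool would derive the graph by parsing build files.

```python
from collections import deque

# HYPOTHETICAL project graph: each project maps to the projects it depends on.
PROJECT_DEPS = {
    ":app": [":feature-a", ":feature-b"],
    ":feature-a": [":core"],
    ":feature-b": [":core", ":network"],
    ":feature-c": [":network"],
    ":core": [],
    ":network": [],
}


def projects_to_include(targets: list[str], deps: dict[str, list[str]]) -> set[str]:
    """Return the targets plus every project reachable from them."""
    included = set(targets)
    queue = deque(targets)
    while queue:
        project = queue.popleft()
        for dependency in deps.get(project, []):
            if dependency not in included:
                included.add(dependency)
                queue.append(dependency)
    return included


print(sorted(projects_to_include([":feature-a"], PROJECT_DEPS)))
```

Requesting `:feature-a` pulls in only `:core`, so the other four projects in this toy graph would never be configured.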

If you're able to abide by a few constraints, as clearly outlined in the Spotlight documentation, you can benefit from this intervention today. We apply some automated enforcement of these rules to prevent issues from popping up.

Intransitive sync

Compared to the baseline scenario, layering on intransitive-sync reduced sync duration by 94%, to about 28 sec, or about 40 ms / project.

The original inspiration for this idea came from the MDX Android team, which manages the Square-Android Point-of-Sale repo (extremely large at almost 7000 projects). In discussions between this team and the Backend Build Team (which manages the aforementioned very large Kotlin JVM project, among other things), we realized that IDE-based development and code completion only needed access to the compile classpath—not the runtime classpath. We therefore disabled transitive dependency resolution for all runtime classpaths (runtimeClasspath, testRuntimeClasspath, and so on). We rolled this out to users over a period of about one month while ironing out minor kinks in the implementation, before finally enabling it universally. Note that this feature may break debugging and UI previews in Android Studio.
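Why this reduces work can be seen in a toy model of resolution. The sketch below is not Gradle's resolution engine, and the module graph is hypothetical; it only contrasts walking the full graph with stopping at the declared dependencies, which is the effect of turning off transitive resolution for a classpath.

```python
# HYPOTHETICAL external dependency graph: module -> modules it depends on.
MODULE_DEPS = {
    "app-lib": ["http-client", "json"],
    "http-client": ["io-core", "tls"],
    "json": ["io-core"],
    "io-core": [],
    "tls": ["crypto"],
    "crypto": [],
}


def resolve(declared: list[str], transitive: bool) -> set[str]:
    """Modules a classpath would contain for the declared dependencies."""
    resolved: set[str] = set()
    stack = list(declared)
    while stack:
        module = stack.pop()
        if module in resolved:
            continue
        resolved.add(module)
        if transitive:
            # Follow each module's own dependencies as well.
            stack.extend(MODULE_DEPS.get(module, []))
    return resolved


declared = ["app-lib"]
print(len(resolve(declared, transitive=True)))   # walks the whole graph
print(len(resolve(declared, transitive=False)))  # declared modules only
```

In this toy graph the transitive walk visits six modules and the intransitive one visits a single module. The saving on a real runtime classpath depends on how deep its dependency graph is.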

Artifact swap

Compared to the baseline scenario, layering on artifact-swap reduced sync duration by 97%, to about 15 sec, or about 84 ms / project. Note that the cost-per-project has gone up a bit, reflecting the fact that the sync-time vs project-count model is dominated by some constant overhead at low project count.

Spotlight still requires us to load all projects in the dependency tree of our target projects into the IDE. We go even further and swap out any projects that were not requested by a user with a precompiled jar artifact (or aar for Android projects). Doing this lets us minimize the Gradle project list to only what the user has requested to be loaded in the IDE. Inspiration for this came from a Droidcon talk.

If a Gradle project is referenced in a build file, it must also be include-ed in settings and configured by Gradle, which is what we are trying to avoid. We start by ensuring that all project() references in build files are rewritten to the Maven coordinates of a precompiled artifact. In Groovy buildscripts we do this with a project plugin that uses Groovy metaprogramming to override the project() function, replacing its default behavior. This project plugin is applied to every requested project in the build by a settings plugin. Projects using Kotlin buildscripts can inject a DSL override more directly using this One Weird Trick.

This process actually goes too far: it may also rewrite references to projects the user is working on into Maven coordinates. We fix this in a second stage by applying dependency substitution to rewrite those references back to project references. The substitution also applies to published artifacts: any artifact that depends on a project the user currently has loaded resolves to the active Gradle project rather than to that project's published artifact.
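The two stages can be modeled on plain strings to show the order of operations. This is a sketch only: the group, version, and path-to-name scheme are hypothetical, and the real mechanism works through Gradle plugins and dependency substitution rather than string rewriting.

```python
# HYPOTHETICAL naming scheme: project ":feature:login" is published as
# "com.example.monorepo:feature-login:1.0.0". Real coordinates will differ.
GROUP = "com.example.monorepo"
VERSION = "1.0.0"


def to_maven_coordinate(project_path: str) -> str:
    """Map a project path to the coordinate of its precompiled artifact."""
    name = project_path.strip(":").replace(":", "-")
    return f"{GROUP}:{name}:{VERSION}"


def swap_dependencies(declared: list[str], requested: set[str]) -> list[str]:
    """Rewrite every project reference, then restore the requested ones."""
    # Stage 1: every project reference becomes a Maven coordinate.
    coordinates = [to_maven_coordinate(path) for path in declared]
    # Stage 2: coordinates that belong to requested projects are substituted
    # back to project references so they resolve to the live project.
    restore = {to_maven_coordinate(path): path for path in requested}
    return [restore.get(coordinate, coordinate) for coordinate in coordinates]


declared = [":feature:login", ":core:network", ":core:logging"]
requested = {":feature:login"}
for dependency in swap_dependencies(declared, requested):
    print(dependency)
```

Only `:feature:login` survives as a project reference; the two `:core` projects stay as artifact coordinates and never need to be configured.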

This brief overview of artifact swap elides a lot of additional infrastructure used to publish project artifacts, prefetch those artifacts for improved performance (see below), IDE plugin support for navigating the swapped project, and tooling to reliably configure all of this. We hope to release an open source implementation of this in the near future so readers don’t have to draw the rest of the owl themselves.

Pre-fetch dependencies

This intervention is of a somewhat different nature from the others discussed above. It helps not only with syncs but with all builds, by downloading dependencies before the user requests them. Our data suggest that the middle quartile case (p25-75) dropped from nearly 2 minutes to about 20 seconds. That is, Gradle spends about 83% less time downloading dependencies, during IDE sync, post-intervention, according to Develocity.

We can't make statements on this based on benchmarks since those are, by definition and as noted above, fully warm—dependencies have already been fetched. The Develocity telemetry, however, gives us high confidence that this improves the local development experience.

Our pre-fetching works in two ways: via a git post-checkout hook, and also on a periodic schedule. In both cases, the background process gets the list of dependencies from a Gradle task registered by the Dependency Analysis Gradle Plugin:

```shell
./gradlew :computeAllDependencies
```

This generates a file at build/reports/dependency-analysis/allLibs.versions.toml. This version catalog is then used by Gradle in a special background process that is installed on all developer computers by Block’s MDM solution and Salt. This background process uses Gradle to resolve all dependencies (including transitive dependencies) needed by the build.
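As an illustration of the git-hook trigger, a post-checkout hook could decide whether a checkout warrants regenerating the dependency list. This sketch covers only that decision and the task invocation; the function names are hypothetical, and the background resolution process described above is not shown. Git passes three arguments to a post-checkout hook: the previous HEAD, the new HEAD, and a flag that is 1 for a branch checkout and 0 for a file checkout.

```python
import subprocess

# Task name taken from the text above; the rest of this hook is a sketch.
PREFETCH_COMMAND = ["./gradlew", ":computeAllDependencies"]


def should_prefetch(previous_head: str, new_head: str, branch_flag: str) -> bool:
    """Decide from the post-checkout arguments whether to prefetch.

    branch_flag is "1" for a branch checkout and "0" for a file checkout.
    """
    return branch_flag == "1" and previous_head != new_head


def run_hook(argv: list[str]) -> bool:
    """Start the Gradle task in the background if the checkout warrants it."""
    if len(argv) != 3 or not should_prefetch(*argv):
        return False
    # Popen returns immediately, so the checkout itself is not blocked.
    subprocess.Popen(PREFETCH_COMMAND)
    return True


print(should_prefetch("1a2b3c", "4d5e6f", "1"))  # branch switch to a new commit
print(should_prefetch("1a2b3c", "1a2b3c", "1"))  # HEAD did not move
```

A hook file would call `run_hook(sys.argv[1:])`. Skipping checkouts where HEAD did not move avoids starting Gradle when the dependency set cannot have changed.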

Future interventions

We've already achieved fairly amazing results, but as we know, the build engineer's work is never done. What else can we do? Three things come immediately to mind, and are part of our plans for 2026: we want to clean up our dependency graph, we want to clean up our project structure, and we want to make our build fully compliant with Gradle's Isolated Projects feature.

We think that cleaning up our dependency graph (with the help of the Dependency Analysis Gradle Plugin, aka DAGP) would have several important benefits, not least of which is enhanced maintainability and easier debugging of build failures. But we also think it could have a positive impact on performance: eliminating unused external dependencies means less network overhead, and eliminating edges between local projects would enhance the effectiveness of interventions like Spotlight and Artifact Swap (discussed above).

DAGP is also able to identify projects applying the Android Gradle Plugin (AGP) that do not need to do so. AGP is a particularly “heavy” plugin, and reducing its use in the build has yielded performance wins for us previously. We also plan to refine our module structure by combining certain module types to reduce the overall project count in our build. Using Gradle’s test fixtures feature is another way that modules can be combined to reduce the overall count.

Isolated Projects, currently a pre-alpha feature of Gradle, promises to be a game-changer. It would enable fine-grained and incremental caching of the configuration phase of the build, which would make dependency updates (such as might affect only a single Gradle subproject) significantly less painful. We have done quite a bit of work already to be early adopters of this feature, including submitting patches to Android Studio, and Square’s Android build is already compliant with the isolated projects contract for sync. So far Isolated Projects has not delivered us any performance benefits, but Gradle is currently focused on validating the correctness of the implementation before focusing on performance in later 9.x releases.

Tony Robalik is a build engineer on Block's Backend Build team, and a Gradle Fellow. Josh Friend is a build engineer on Block's Android Developer Experience team and helps maintain what might be the largest Gradle build in the world. Thanks to Tim Mellor, Yissachar Radcliffe, and Gábor Pap for reviewing the draft.

Footnotes

  1. Although to be clear, there is no direct relationship between IDE sync and lines of code. This figure is provided only as a reference point to help indicate scale.

  2. Please note that the IDE plugin will not work out of the box for non-Block projects, but the code is straightforward and we hope it provides inspiration to other users. Contributions are also welcome.

  3. Via https://cdsap.github.io/ProjectGenerator/, for example.

  4. Although in practice it is probably more likely to reflect the org chart—for potentially very good reasons.

  5. Note that, for Android Studio users, this is the default behavior.
