Your CI pipeline isn't ready for AI
In developing my AI codegen agent, I’ve been surprised to find that CI/testing systems frustrate me far more than any unreliable LLM API.
I’ll often have a simple code change that’s ready to go, but it takes longer to build, review, and deploy than it took to write in the first place. This isn’t unique to my small startup—a lot of our customers apologize for the brittleness of their build pipelines when our agent starts generating pull requests.
It’s rare that I meet someone who truly loves their workflows, despite CI often being the single largest software expense outside production.
More troublingly, CI performance hasn’t kept pace with developer machines—building our app in CI is usually much slower than building it locally on my M1 laptop.
The fundamental problem is that pipelines are usually doing tons of repetitive work. Between pulling dependencies, rebuilding binaries, and running unchanged tests, most pipelines spend 50%+ of their compute cycles redoing ~identical work. This isn’t just costly and slow—it also causes flakes, since every unnecessary job introduces room for failure.
Of course, many different tools have been developed to solve this problem. Nx, Bazel, Nix, Turborepo, and even Docker offer different takes on the core task of caching old work and reusing it.
I’ve tried most of these solutions at some point (and I’m sure HN will suggest 5 more) but they all ultimately leave me unsatisfied.
They all offer different interfaces/languages—ranging from simple YAML to complex DSLs—but no tool really challenges the fundamental premise of asking you to define a graph of your build tasks. It’s still your job to specify what triggers each task and what invalidates each cache.
Given enough effort, you can eventually define the optimal graph that perfectly reuses caches and dependencies. But this is frustratingly slow to do—especially since the iteration loop typically involves repeatedly rerunning the entire graph. The more powerful and optimized you want your build pipeline to be, the more you typically have to embrace the complex details of your chosen platform—and the further you get from a simple local dev setup. It’s not surprising that many teams end up neglecting their pipelines.
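To make the "define a graph" premise concrete, here’s roughly what that looks like in Turborepo’s `turbo.json` (a sketch; exact key names vary by version, and the `inputs`/`outputs` globs here are made up for illustration). Note that every glob is a manual promise to the tool about what invalidates what:

```json
{
  "pipeline": {
    "build": {
      "dependsOn": ["^build"],
      "inputs": ["src/**", "package.json"],
      "outputs": ["dist/**"]
    },
    "test": {
      "dependsOn": ["build"],
      "inputs": ["src/**", "test/**"]
    }
  }
}
```

Forget to list an input and you get stale cache hits; list too much and you rebuild constantly. Either failure mode is silent until it bites.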
Even more frustratingly, in crafting my build graph I feel like I am often teaching the computer things it already knows.
Take this classic Dockerfile optimization to avoid reinstalling dependencies when they haven’t changed (comments moved to their own lines, since Docker doesn’t support trailing comments):
 WORKDIR /app
-COPY . .
+# Copy only the package management files first
+COPY package.json yarn.lock ./
 RUN yarn install
+# Now copy over the rest of the project files
+COPY . .
 RUN yarn build
I understand why it’s needed with Docker’s layering approach, but in an ideal world this shouldn’t be required—yarn is already smart enough to not do ~anything if I haven’t modified my package requirements. That just isn’t transparently available to Docker.
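The "skip work when inputs haven’t changed" trick that both yarn and Docker’s layer cache rely on is just content hashing. Here’s a minimal sketch of the idea in shell—file names like `.install-hash` are hypothetical, and the install step is simulated with an `echo` rather than a real `yarn install`:

```shell
#!/bin/sh
# Content-addressed skipping: rerun a step only when its inputs change.
set -e
dir=$(mktemp -d)
cd "$dir"
echo '{"dependencies":{}}' > package.json
echo 'lockfile v1' > yarn.lock

install_if_changed() {
  # Hash the declared inputs of the step
  new_hash=$(cat package.json yarn.lock | sha256sum | cut -d' ' -f1)
  if [ -f .install-hash ] && [ "$(cat .install-hash)" = "$new_hash" ]; then
    echo "skip"
  else
    echo "install"   # in reality: yarn install --frozen-lockfile
    echo "$new_hash" > .install-hash
  fi
}

install_if_changed   # no hash recorded yet: installs
install_if_changed   # inputs unchanged: skips
echo '{"dependencies":{"left-pad":"*"}}' > package.json
install_if_changed   # inputs changed: installs again
```

Docker approximates this per-layer by hashing copied files, which is exactly why the `COPY` ordering above matters: the hash of the early layer only covers the manifests, not the whole tree.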
Similarly, compiler authors put tremendous effort into supporting incremental compilation that most build tools seem to throw away. Every time I configure a new pipeline it seems like I’m putting in a lot of effort to replicate the logic that tool authors have already developed—often in a bespoke and brittle DSL.
I’m not sure I’m ready to abandon CI entirely in favor of local development, but it does seem like something needs to change to support a world where developers can move much faster with AI. Otherwise we’re going to have backlogs of hundreds of brilliant AI-generated pull requests all waiting for testing.

