Your CI pipeline isn't ready for AI
In developing my AI codegen agent, I’ve been surprised to find that CI/testing systems frustrate me far more than any unreliable LLM API.
I’ll often have a simple code change that’s ready to go, but it takes longer to build, review, and deploy than it took to write in the first place. This isn’t unique to my small startup—a lot of our customers apologize for the brittleness of their build pipelines when our agent starts generating pull requests.
It’s rare that I meet someone who truly loves their workflows, despite CI often being the single largest software expense outside production.
More troublingly, CI performance hasn’t kept pace with developer machines—building our app in CI is usually much slower than building it locally on my M1 laptop.
The fundamental problem is that pipelines are usually doing tons of repetitive work. Between pulling dependencies, rebuilding binaries, and running unchanged tests, most pipelines spend 50%+ of their compute cycles redoing ~identical work. This isn’t just costly and slow—it also causes flakes, since every unnecessary job introduces room for failure.
Of course, many different tools have been developed to solve this problem. Nx, Bazel, Nix, Turborepo, and even Docker offer different takes on the core task of caching old work and reusing it.
I’ve tried most of these solutions at some point (and I’m sure HN will suggest 5 more) but they all ultimately leave me unsatisfied.
They all offer different interfaces/languages—ranging from simple YAML to complex DSLs—but no tool really challenges the fundamental premise of asking you to define a graph of your build tasks. It’s still your job to specify what triggers each task and what invalidates each cache.
Given enough effort, you can eventually define the optimal graph that perfectly reuses caches and dependencies. But this is frustratingly slow to do—especially since the iteration loop typically involves repeatedly rerunning the entire graph. The more powerful and optimized you want your build pipeline to be, the more you typically have to embrace the complex details of your chosen platform—and the further you get from a simple local dev setup. It’s not surprising that many teams end up neglecting their pipelines.
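To make the "define a graph" premise concrete, here’s roughly what that looks like in Turborepo’s `turbo.json` (a sketch; exact key names vary by version, and the `inputs`/`outputs` globs here are made up for illustration). Note that every glob is a manual promise to the tool about what invalidates what:

```json
{
  "pipeline": {
    "build": {
      "dependsOn": ["^build"],
      "inputs": ["src/**", "package.json"],
      "outputs": ["dist/**"]
    },
    "test": {
      "dependsOn": ["build"],
      "inputs": ["src/**", "test/**"]
    }
  }
}
```

Forget to list an input and you get stale cache hits; list too much and you rebuild constantly. Either failure mode is silent until it bites.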
Even more frustratingly, in crafting my build graph I feel like I am often teaching the computer things it already knows.
Take this classic Dockerfile optimization to avoid reinstalling dependencies when they haven’t changed (comments moved to their own lines, since Docker doesn’t support trailing comments):
 WORKDIR /app
-COPY . .
+# Copy only the package management files first
+COPY package.json yarn.lock ./
 RUN yarn install
+# Now copy over the rest of the project files
+COPY . .
 RUN yarn build
I understand why it’s needed with Docker’s layering approach, but in an ideal world this shouldn’t be required—yarn is already smart enough to not do ~anything if I haven’t modified my package requirements. That just isn’t transparently available to Docker.
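The "skip work when inputs haven’t changed" trick that both yarn and Docker’s layer cache rely on is just content hashing. Here’s a minimal sketch of the idea in shell—file names like `.install-hash` are hypothetical, and the install step is simulated with an `echo` rather than a real `yarn install`:

```shell
#!/bin/sh
# Content-addressed skipping: rerun a step only when its inputs change.
set -e
dir=$(mktemp -d)
cd "$dir"
echo '{"dependencies":{}}' > package.json
echo 'lockfile v1' > yarn.lock

install_if_changed() {
  # Hash the declared inputs of the step
  new_hash=$(cat package.json yarn.lock | sha256sum | cut -d' ' -f1)
  if [ -f .install-hash ] && [ "$(cat .install-hash)" = "$new_hash" ]; then
    echo "skip"
  else
    echo "install"   # in reality: yarn install --frozen-lockfile
    echo "$new_hash" > .install-hash
  fi
}

install_if_changed   # no hash recorded yet: installs
install_if_changed   # inputs unchanged: skips
echo '{"dependencies":{"left-pad":"*"}}' > package.json
install_if_changed   # inputs changed: installs again
```

Docker approximates this per-layer by hashing copied files, which is exactly why the `COPY` ordering above matters: the hash of the early layer only covers the manifests, not the whole tree.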
Similarly, compiler authors put tremendous effort into supporting incremental compilation that most build tools seem to throw away. Every time I configure a new pipeline it seems like I’m putting in a lot of effort to replicate the logic that tool authors have already developed—often in a bespoke and brittle DSL.
I’m not sure I’m ready to abandon CI entirely in favor of local development, but it does seem like something needs to change to support a world where developers can move much faster with AI. Otherwise we’re going to have backlogs of hundreds of brilliant AI-generated pull requests all waiting for testing.

