Platform March 20, 2026 12 min read

Convention Over Guessing: Why Your Codebase Isn’t Ready for AI Agents

The fix isn’t better prompts. It’s the engineering foundations you’ve been deferring.

The codebase is the prompt

Run a swarm of AI coding agents in parallel against a backend with a complete OpenAPI spec, strict TypeScript, shared type libraries, uniform linting, and CI on every push, and the agents don’t need hand-holding. They read the spec, import the types, follow the lint rules, and push code that CI validates automatically.

Point the same agents, the same model, and the same prompts at a codebase without those foundations, and the output is dramatically worse. Agents invent their own types, guess at API response shapes, duplicate each other’s work, and produce code that compiles but breaks at runtime.

The difference isn’t the agents. It’s the codebase they’re working in. The foundations that exist before anyone types a prompt determine whether the agents are productive or just fast at producing tech debt.

Convention over guessing

Convention over configuration was the insight that made Rails productive. You didn’t configure where models lived, how routes mapped to controllers, or how the database connected. You followed conventions, and the framework did the rest.

AI agents need the same thing, but at a different level. They don’t need framework conventions. They need codebase conventions: consistent type systems, explicit API boundaries, uniform lint rules, predictable project structure. When these exist, an agent can read a type definition and write correct code against it. When they don’t, the agent guesses. And guessing at scale means every agent produces the same mistakes independently.

This is the core thesis: the engineering foundations that make human developers productive are the same foundations that make AI agents effective. The difference is that humans can compensate for missing conventions with tribal knowledge. Agents can’t. Every gap in your codebase conventions becomes a category of errors that agents will produce reliably and at volume.

Type safety is the agent’s guardrail

Strict typing is the single highest-leverage investment for AI-assisted development. Not because agents can’t write untyped code, but because types turn runtime surprises into compile-time errors.

When an agent generates a function that returns the wrong shape, strict types catch it before the code runs. When an agent calls an API endpoint and destructures the response incorrectly, the type checker flags it. When two agents working in parallel produce incompatible interfaces, the build fails immediately instead of the mismatch going unnoticed until someone tests the integration path manually.

Shared type libraries compound this further. Instead of each agent (or each developer) defining their own version of a User or Order type, everyone imports from one canonical source. One definition, one place to update, zero drift.

The strongest version of this is generating types directly from your API spec. The frontend and backend share a single source of truth for every data shape. When an agent writes a component that consumes an API response, the types are already there. No guessing, no "I think the user object has an email field." The spec says it does, the type enforces it, the compiler verifies it.
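As a minimal sketch of what that looks like in practice, assume a canonical user type that every consumer imports. The `ApiUser` shape and both functions below are hypothetical; in a real setup the interface would come from a shared package or a spec-to-types generator rather than being written by hand.

```typescript
// Hypothetical canonical type. In practice this would live in a shared
// package or be generated from the API spec rather than written by hand.
interface ApiUser {
  id: string;
  email: string;
  displayName: string | null;
}

// A consumer written against the shared type. The compiler forces the
// null case for displayName to be handled before the value is used.
function formatGreeting(user: ApiUser): string {
  const label = user.displayName ?? user.email;
  return `Hello, ${label}`;
}

function guessedField(user: ApiUser): unknown {
  // A guessed field name is rejected at compile time. The directive below
  // asserts that the next line is a type error.
  // @ts-expect-error 'emailAddress' does not exist on ApiUser
  return user.emailAddress;
}

const sampleUser: ApiUser = { id: "u_1", email: "ada@example.com", displayName: null };
console.log(formatGreeting(sampleUser)); // prints "Hello, ada@example.com"
```

The second function is the interesting one: the guess about `emailAddress` never reaches runtime, because the type checker rejects it where it is written.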

A complete API spec is a contract, not documentation

Most teams treat their API spec as documentation that gets updated when someone remembers. That’s fine for human developers who can read the actual endpoint code when the docs are stale. It’s fatal for AI agents.

A complete OpenAPI spec (or tRPC, which gives you this by construction) is three things in one file: documentation that describes what the API does, a type system that defines every request and response shape, and a test oracle that lets you verify generated code against the contract.

AI agents use all three. They read the spec to understand which endpoints exist. They use the types to generate correctly shaped requests and handle responses. And CI can validate that generated code actually conforms to the spec, catching drift before it reaches production.
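The test-oracle role can be sketched with a hand-rolled check. Everything here is an assumption made for illustration: `userResponseSchema` is a heavily simplified stand-in for one response schema from a spec, and a real pipeline would more likely use an OpenAPI-aware validator than this toy function.

```typescript
// Hypothetical, heavily simplified stand-in for one response schema from an
// OpenAPI document: field name -> expected primitive type.
type FieldType = "string" | "number" | "boolean";
type ResponseSchema = Record<string, FieldType>;

const userResponseSchema: ResponseSchema = {
  id: "string",
  email: "string",
  active: "boolean",
};

function hasOwn(obj: object, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

// Returns a list of human-readable violations. An empty list means the
// payload conforms to the schema.
function findContractViolations(schema: ResponseSchema, payload: unknown): string[] {
  if (typeof payload !== "object" || payload === null) {
    return ["payload is not an object"];
  }
  const record = payload as Record<string, unknown>;
  const violations: string[] = [];
  for (const field of Object.keys(schema)) {
    const expected = schema[field];
    if (!hasOwn(record, field)) {
      violations.push(`missing field "${field}"`);
    } else if (typeof record[field] !== expected) {
      violations.push(`field "${field}" should be ${expected}, got ${typeof record[field]}`);
    }
  }
  for (const field of Object.keys(record)) {
    if (!hasOwn(schema, field)) {
      violations.push(`unexpected field "${field}"`);
    }
  }
  return violations;
}

// A conforming payload yields no violations; a drifted one yields two.
console.log(findContractViolations(userResponseSchema, { id: "u_1", email: "a@example.com", active: true }));
console.log(findContractViolations(userResponseSchema, { id: 42, email: "a@example.com" }));
```

Run in CI against recorded or generated responses, a check like this gives an agent a list of concrete violations to fix rather than a vague "the integration is broken."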

The key word is complete. A partial spec is worse than no spec because agents will treat it as authoritative. If your spec covers 60% of your endpoints, agents will generate correct code for that 60% and hallucinate the rest with high confidence. Either commit to maintaining the spec fully, or use a framework like tRPC that makes the spec a byproduct of writing the code.
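One way to keep "complete" honest is a coverage check in CI. The sketch below assumes you can list the routes your server registers and the operations your spec documents as plain "METHOD /path" strings; how you extract those lists depends on your framework, and the route names are made up.

```typescript
// Hypothetical inputs: routes the server actually registers, and operations
// documented in the spec. Both are "METHOD /path" strings.
function findUndocumentedRoutes(registered: string[], documented: string[]): string[] {
  const documentedSet = new Set(documented);
  return registered.filter((route) => !documentedSet.has(route)).sort();
}

const registeredRoutes = ["GET /users", "POST /users", "GET /orders", "DELETE /orders/{id}"];
const documentedOperations = ["GET /users", "POST /users", "GET /orders"];

const undocumentedRoutes = findUndocumentedRoutes(registeredRoutes, documentedOperations);
if (undocumentedRoutes.length > 0) {
  // In CI, this is where the job would exit with a non-zero status.
  console.error(`Spec is incomplete; undocumented routes: ${undocumentedRoutes.join(", ")}`);
}
```

Failing the build on any undocumented route turns spec completeness from a good intention into something the pipeline enforces.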

Monorepos give agents (and humans) the full picture

The monorepo debate usually centers on developer experience and build tooling. For AI-assisted workflows, the argument is simpler: agents work best when they can see everything.

In a monorepo, an agent working on the frontend can read the backend’s type definitions, API routes, and validation logic without leaving the repository. Shared type libraries live next to the services that consume them. The linter configuration is uniform, so every file in the repo follows the same rules. CI runs against the entire dependency graph, so a change in a shared library triggers tests in every consumer.
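The "change in a shared library triggers tests in every consumer" behavior comes down to a walk over the reverse dependency graph. Here is a minimal sketch, assuming a hypothetical workspace described as package name to dependencies; real monorepo tools derive this graph from package manifests and do far more (caching, task ordering).

```typescript
// Hypothetical workspace graph: package name -> packages it depends on.
type DependencyGraph = Record<string, string[]>;

// Returns the changed package plus every package that depends on it,
// directly or transitively. These are the packages whose tests should run.
function findAffectedPackages(graph: DependencyGraph, changed: string): string[] {
  // Invert the graph: package -> packages that depend on it.
  const dependents = new Map<string, string[]>();
  for (const pkg of Object.keys(graph)) {
    for (const dep of graph[pkg]) {
      const list = dependents.get(dep) ?? [];
      list.push(pkg);
      dependents.set(dep, list);
    }
  }

  // Breadth-first walk over the inverted graph.
  const affected = new Set<string>([changed]);
  const queue: string[] = [changed];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const dependent of dependents.get(current) ?? []) {
      if (!affected.has(dependent)) {
        affected.add(dependent);
        queue.push(dependent);
      }
    }
  }
  return Array.from(affected).sort();
}

const workspace: DependencyGraph = {
  "shared-types": [],
  "api": ["shared-types"],
  "api-client": ["shared-types"],
  "web": ["shared-types", "api-client"],
  "docs": [],
};

console.log(findAffectedPackages(workspace, "shared-types").join(", "));
// prints "api, api-client, shared-types, web"
console.log(findAffectedPackages(workspace, "api-client").join(", "));
// prints "api-client, web"
```

Note that `docs` is never selected: a package with no path to the changed library is skipped, which is what keeps whole-repo CI affordable.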

But it’s not just types and lint. Git history matters. In a monorepo, git log and git blame tell a coherent story across the entire stack. An agent (or a human) can trace a change from the API handler through the shared types to the frontend component that renders the data, all in one commit history. In a polyrepo setup, that same investigation requires jumping between repositories, matching commit timestamps, and hoping the deploy order was correct.

This is the context advantage that compounds. Every new service, every new shared library, every new team member benefits from having the full codebase in one place with one history. AI agents amplify this advantage because they can process the full context in a way humans can’t. An agent given a monorepo with 500 files will find the right type definition in seconds. The same agent given five separate repos will ask you which one to look in.

CI, observability, and preview infra: the feedback loop that makes agents autonomous

Type safety, API specs, and monorepos give agents the information they need to write correct code. But writing code is only half the job. The other half is knowing whether the code works.

This is where CI/CD, observability, and preview infrastructure create a compound effect that’s greater than the sum of its parts.

CI is the agent’s test suite. When an agent opens a PR, CI runs the linter, the type checker, the unit tests, and the integration tests. If anything fails, the agent gets structured feedback: which check failed, on which file, with what error message. A well-configured CI pipeline turns "does this code work?" from a subjective human judgment into a binary signal the agent can act on.
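"Structured feedback" can be as simple as turning type-checker output into records. The sketch below assumes the plain, non-pretty `tsc` error line format (`file(line,column): error TSxxxx: message`); the `CheckFailure` shape and the sample output are made up for illustration.

```typescript
// One structured finding an agent could act on.
interface CheckFailure {
  file: string;
  line: number;
  column: number;
  code: string;
  message: string;
}

// Matches lines such as:
//   src/user.ts(12,5): error TS2339: Property 'x' does not exist on type 'Y'.
const TSC_ERROR_LINE = /^(.+)\((\d+),(\d+)\): error (TS\d+): (.+)$/;

function parseTypeCheckOutput(output: string): CheckFailure[] {
  const failures: CheckFailure[] = [];
  for (const rawLine of output.split(/\r?\n/)) {
    const match = TSC_ERROR_LINE.exec(rawLine.trim());
    if (match === null) continue; // skip blank lines and summary lines
    failures.push({
      file: match[1],
      line: Number(match[2]),
      column: Number(match[3]),
      code: match[4],
      message: match[5],
    });
  }
  return failures;
}

// Illustrative output, not captured from a real run.
const sampleOutput = [
  "src/user.ts(12,5): error TS2339: Property 'emailAddress' does not exist on type 'User'.",
  "src/orders/list.ts(40,18): error TS2345: Argument of type 'string' is not assignable to parameter of type 'number'.",
  "",
  "Found 2 errors in 2 files.",
].join("\n");

for (const failure of parseTypeCheckOutput(sampleOutput)) {
  console.log(`${failure.file}:${failure.line} [${failure.code}] ${failure.message}`);
}
```

Each record names a file, a position, and an error code, which is exactly the kind of signal an agent can act on without a human translating the log.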

Preview environments are the agent’s sandbox. When CI passes, a preview environment deploys the change automatically. Now the agent (or the human reviewing its work) can verify the change against real infrastructure without touching staging or production. Preview environments make it safe to let agents experiment, because every experiment is isolated.

Observability closes the loop. Structured traces and logs from preview environments tell you not just whether the code runs, but how it behaves. Did the new API call add 200ms of latency? Is the database query missing an index? Is the error rate higher than the baseline? These are questions that logs and traces answer automatically, without anyone manually testing every edge case.
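The latency question above can be made mechanical. This is a rough sketch, assuming you can export request durations in milliseconds from both a baseline and a preview environment; the sample numbers are invented, and the 200ms budget simply mirrors the figure in the text.

```typescript
// Nearest-rank percentile over a list of samples.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank - 1, 0)];
}

interface LatencyReport {
  baselineP95Ms: number;
  previewP95Ms: number;
  regressed: boolean;
}

// Flags a regression when the preview p95 exceeds the baseline p95 by more
// than the allowed budget.
function compareLatency(baselineMs: number[], previewMs: number[], maxIncreaseMs: number): LatencyReport {
  const baselineP95Ms = percentile(baselineMs, 95);
  const previewP95Ms = percentile(previewMs, 95);
  return {
    baselineP95Ms,
    previewP95Ms,
    regressed: previewP95Ms - baselineP95Ms > maxIncreaseMs,
  };
}

// Invented samples: the preview build has one slow outlier.
const baselineSamples = [100, 110, 120, 130, 140, 150, 160, 170, 180, 190];
const previewSamples = [105, 115, 125, 135, 145, 155, 165, 175, 185, 420];

const latencyReport = compareLatency(baselineSamples, previewSamples, 200);
console.log(latencyReport.regressed); // prints true (p95 went from 190 to 420)
```

With so few samples the p95 is just the slowest request, so treat this as the shape of the check rather than a statistically meaningful one.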

Each of these alone is valuable. Together, they create a feedback loop where an agent can write code, get it validated by CI, see it deployed to a preview environment, and observe its runtime behavior, all without a human in the loop until the final review. That’s what turns AI agents from "fancy autocomplete" into autonomous contributors.

Stop repeating yourself. Write a rule.

Here’s a pattern we see constantly: a team catches agents (or junior developers) making the same mistake over and over. Using console.log instead of the structured logger. Calling a deprecated API endpoint. Importing from a barrel file that causes circular dependencies. Wrapping a database call without the standard error handler.

The typical response is to add it to the contributing guide, mention it in code review, or write a comment in the codebase. The agent will never read the contributing guide. The comment might be in a different file. The mistake keeps happening.

The better response: write a custom lint rule. Or an ast-grep pattern. Or a code action. Turn the convention into an automated check that runs on every commit. The next time any agent (or any human) makes that mistake, CI catches it with a clear error message explaining what to do instead.
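For the console.log case, a custom ESLint rule might look roughly like the sketch below. It follows the usual ESLint rule shape (a `meta` object plus a `create` function that returns visitors keyed by node type), but `AstNode` and `RuleContext` are minimal local stand-ins for ESLint's own types so the example is self-contained, and `logger.info` is a hypothetical name for the team's structured logger.

```typescript
// Minimal local stand-ins for the ESTree/ESLint types this rule touches, so
// the sketch type-checks without installing ESLint.
interface AstNode {
  type: string;
  name?: string;
  computed?: boolean;
  callee?: AstNode;
  object?: AstNode;
  property?: AstNode;
}

interface RuleContext {
  report(descriptor: { node: AstNode; messageId: string }): void;
}

const noConsoleLog = {
  meta: {
    type: "problem",
    docs: { description: "Disallow console.log in favor of the structured logger" },
    messages: {
      // "logger.info" is a hypothetical name for the team's structured logger.
      useLogger: "Use the structured logger (e.g. logger.info) instead of console.log.",
    },
    schema: [],
  },
  create(context: RuleContext) {
    return {
      // ESLint invokes this visitor for every call expression it traverses.
      CallExpression(node: AstNode): void {
        const callee = node.callee;
        if (
          callee?.type === "MemberExpression" &&
          callee.computed !== true &&
          callee.object?.type === "Identifier" &&
          callee.object.name === "console" &&
          callee.property?.type === "Identifier" &&
          callee.property.name === "log"
        ) {
          context.report({ node, messageId: "useLogger" });
        }
      },
    };
  },
};
```

ESLint already ships a built-in `no-console` rule, so the value of a custom rule here is mostly the error message: it tells the contributor, human or agent, which logger to use instead.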

The compounding effect is real. Every custom rule you add is a convention that never needs to be explained again. Your coding agent can even write the rule for you. Spot a repeated mistake in a PR, describe the pattern, and let the agent produce a lint rule or structural search pattern that catches it going forward. You’ve just turned a code review comment into permanent institutional knowledge.

This is what scales. Not better prompts, not longer context windows, not more detailed contributing docs. Automated, enforceable conventions that apply equally to every contributor, human or otherwise.

The platform investment reframed

Teams often defer these foundations because they feel like overhead. Strict types slow you down. Maintaining an API spec is busywork. Monorepo tooling is complex. CI takes time to set up. Preview environments cost money.

All of this is true in isolation. But the calculus changes completely when you factor in AI-assisted development. Every foundation you invest in today multiplies the effectiveness of every AI tool you adopt tomorrow.

This isn’t a speculative bet on future tooling. The tools exist now: code-generation agents, automated PR review, test generation, and migration assistants. The teams that get outsized value from these tools are the teams with codebases that are typed, linted, spec-driven, and continuously validated. The teams that struggle are the ones asking AI to work in codebases where humans already struggle.

Convention over configuration made individual developers productive. Convention over guessing makes AI agents productive. The platform you build today determines whether AI is a multiplier or a source of noise.