AI Governance & Safety · April 30, 2026 · 9 min read

The Virtue of Laziness: Why AI-Generated Code Is Making Systems Larger, Not Better

Four independent studies confirm AI coding tools increase code churn, bug rates, and system complexity. Bryan Cantrill's argument that LLMs lack the 'virtue of laziness' is not a hot take; it is a testable hypothesis with growing empirical support. Here is the governance framework to manage it.

By Vikas Pratap Singh
#ai-governance #code-quality #ai-agents #software-complexity #engineering-leadership

A Demo Built in One Sitting

A few weeks ago I built a data AI agent with Claude Code in a single sitting. The one-million-token context window felt like license to brainstorm, implement, and ship in one pass. The prototype worked. Then I ran a code review and security audit on the repo using my own custom skills, and the result was uncomfortable: dead code in three modules, abstractions wrapping a single call, patterns repeated where one helper would have done. Nothing was technically broken. The system was just bigger than it needed to be.

I caught it because I went back. If I had handed the repo to a collaborator before that second pass, the first question would have been: “Why is the sloppiness so obvious from a quick audit?” The honest answer: nothing in the loop had pushed back. I asked, the model produced. Asking cost me nothing. Producing cost the model nothing. So we both kept going.

That is, in miniature, what Bryan Cantrill calls “the peril of laziness lost.” To see why the framing matters, it helps to start with a much older idea about what makes a great programmer.

The Programmer’s Strangest Virtue

Larry Wall, the creator of Perl, once named three virtues of a great programmer: laziness, impatience, and hubris. Not as jokes. As genuine design principles.

Laziness, in Wall’s framing, is “the quality that makes you go to great effort to reduce overall energy expenditure.” It is the force that makes a developer write a reusable function instead of copying the same logic four times. It is the pressure that produces clean APIs, thoughtful abstractions, and the relentless question: do we actually need this?

This sounds like a personality quirk. It is actually a constraint. Human time is finite. Because our hours are limited, we are forced to optimize. We compress, abstract, and simplify not because simplicity is an aesthetic preference, but because we literally cannot afford to maintain the alternative. The constraint of limited time produces system quality as a side effect.

What happens when you remove that constraint?

Cantrill’s Thesis

Bryan Cantrill, CTO of Oxide Computer, published an essay on April 12, 2026 titled “The Peril of Laziness Lost”. His core argument:

“LLMs inherently lack the virtue of laziness. Work costs nothing to an LLM.”

And the consequence:

“LLMs will make systems larger, not better: appealing to perverse vanity metrics, perhaps, but at the cost of everything that matters.”

Cantrill’s point is precise. When you ask a human developer to add a feature, the developer weighs the cost of writing and maintaining that feature against the value it provides. When the cost is high (their time is limited), they find ways to reuse existing code, simplify the interface, or push back on unnecessary scope. “The best engineering is always borne of constraints,” Cantrill writes, “and the constraint of our time places limits on the cognitive load of the system.”

When you ask an LLM to add a feature, it adds the feature. It does not weigh the maintenance burden. It does not ask whether the existing module could be restructured to handle this case more elegantly. It does not push back and say “this would be simpler if we changed the API.” Generating code costs the LLM nothing. There is no constraint to produce quality.

This is not a hot take. It is a testable hypothesis. And four independent studies now support it.

The Pattern Is Not New

Before examining the AI evidence, it helps to see the broader pattern. Every time engineering has removed a binding constraint, systems expanded rather than improved.

Cheap storage (2000s). When hard drive costs collapsed from roughly $12/GB in 2000 to pennies by the 2010s, organizations stopped curating data and started hoarding it. The result was not better data management. It was data swamps: sprawling lakes of unstructured, ungoverned, undocumented data that cost more to manage than the storage itself. An entire Data Governance industry exists partly because cheap storage removed the constraint that forced people to think about what to keep.

Cheap compute (2010s). When cloud computing made CPU cycles effectively unlimited, brute-force approaches replaced algorithmic elegance. Microservices architectures proliferated, often creating distributed complexity that exceeded the monolith they replaced. Amazon Prime Video’s widely-discussed 2023 consolidation of a Step Functions-based pipeline into a single service cut infrastructure costs by 90%: a blunt indicator of how much hidden complexity had accrued when orchestration felt free.

Cheap bandwidth (2010s-2020s). When mobile bandwidth became abundant, web pages ballooned. HTTP Archive data shows the median desktop page grew from 669KB in 2012 to 2,312KB by 2022. A 246% increase in ten years. JavaScript bundles expanded to fill the available pipe. The constraint that once forced developers to be selective about what shipped to the browser was gone, and the systems grew accordingly.

The through-line: when a resource becomes cheap, consumption increases faster than the improvement it enables. Economists call this Jevons’ paradox. Engineers call it Tuesday.

Cheap code generation (2024-present). AI coding tools now make writing code nearly free. The pattern predicts exactly what the studies are finding: code volume expanding to fill the reduced cost, with systems getting larger without getting better.

The shape of the pattern, drawn out across four eras:

[Figure: When constraints disappear, systems expand. Cheap storage in the 2000s produced data swamps. Cheap compute in the 2010s produced microservice sprawl, with Amazon Prime Video cutting infrastructure costs 90% by consolidating. Cheap bandwidth in the 2010s-2020s produced pages 246% heavier between 2012 and 2022. Cheap code generation from 2024 onward is the open case: systems that get larger, not better.]

The first three eras are settled history. The fourth is the open case the rest of the article addresses.

Four Studies, One Direction

Study 1: GitClear (211 million lines of code)

GitClear’s 2024 and 2025 analyses are the largest known structured datasets on AI-assisted code quality: 211 million changed lines authored between 2020 and 2024.

The headline finding: code churn (lines changed within two weeks of being written) is projected to double compared to the 2021 pre-AI baseline, tracking closely with rising Copilot adoption rates across the dataset.

More telling than the churn is what replaced it. The percentage of changed lines associated with refactoring (code that was restructured or improved) dropped from 25% in 2021 to less than 10% in 2024. Meanwhile, copy-pasted (cloned) code rose from 8.3% to 12.3%.

What this looks like in practice. More code is being written. Less code is being refactored. More code is being duplicated instead of abstracted. The system is getting larger without getting better. Cantrill’s prediction, measured at scale.
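You do not need GitClear's pipeline to get a first-order read on your own repository. Below is a minimal sketch of a clone-rate estimate: it fingerprints every window of six consecutive non-blank lines and reports the share of windows that appear more than once. The window size, the `.py` suffix filter, and the whitespace normalization are illustrative choices; the number it produces is a rough proxy, not GitClear's duplicated-code metric.

```python
#!/usr/bin/env python3
"""Crude clone-rate estimate: fraction of N-line windows that appear more than
once across a source tree, after stripping whitespace. A rough proxy for
"copy/pasted code", not a reimplementation of GitClear's methodology."""
import hashlib
import pathlib
import sys
from collections import Counter

WINDOW = 6  # consecutive non-blank lines per fingerprint (illustrative choice)

def fingerprints(path: pathlib.Path):
    lines = [l.strip() for l in path.read_text(errors="ignore").splitlines() if l.strip()]
    for i in range(len(lines) - WINDOW + 1):
        yield hashlib.sha1("\n".join(lines[i:i + WINDOW]).encode()).hexdigest()

def clone_rate(root: str, suffix: str = ".py") -> float:
    counts = Counter()
    for path in pathlib.Path(root).rglob(f"*{suffix}"):
        counts.update(fingerprints(path))
    total = sum(counts.values())
    cloned = sum(c for c in counts.values() if c > 1)
    return cloned / total if total else 0.0

if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    print(f"~{clone_rate(root):.1%} of {WINDOW}-line windows are duplicated")
```

It will overcount boilerplate such as import blocks, so treat the trend over time as the signal rather than the absolute number.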

Study 2: Uplevel (800 developers)

The Uplevel study tracked nearly 800 enterprise developers, comparing 351 developers with Copilot access against 434 without.

The result: a 41% increase in bug rate for the Copilot group. PR throughput (the number of pull requests merged) was unchanged. PR cycle time decreased by only 1.7 minutes.

Read that again. More bugs. Same throughput. Nearly identical merge times. The tool made it easier to write code that was wrong.

Study 3: CodeRabbit (470 pull requests)

CodeRabbit’s 2025 analysis compared 320 AI-coauthored PRs against 150 human-only PRs:

| Metric | AI-generated | Human-written | Ratio |
|---|---|---|---|
| Issues per PR | 10.83 | 6.45 | 1.7x |
| Critical issues | 1.4x higher | Baseline | 1.4x |
| Logic errors | 1.75x higher | Baseline | 1.75x |
| XSS vulnerabilities | 2.74x higher | Baseline | 2.74x |

AI-generated PRs were also 1.88x more likely to introduce improper password handling and 1.82x more likely to implement insecure deserialization. CodeRabbit notes a methodological caveat: at scale, they could not guarantee all human-labeled PRs were purely human-authored, which means the actual gap could be wider. The security implications alone justify a governance response.

Study 4: Google DORA (3,000 developers)

Google’s 2024 DORA Report, based on survey responses from roughly 3,000 professionals across industries, found that a 25% increase in AI adoption correlated with:

  • 1.5% decrease in delivery throughput
  • 7.2% reduction in delivery stability

Developers reported feeling more productive. The delivery metrics said otherwise.

For practitioners: The 2025 DORA Report reversed this finding, showing a positive relationship between AI adoption and delivery performance. The correction is not a refutation of the thesis. It is confirmation that the default trajectory (expand, degrade) can be overcome with deliberate effort. But the effort is required. Without it, you get the 2024 results.

The Second Maintainer Problem

Every line of code has two costs: the cost to write it and the cost to maintain it. AI has collapsed the first cost to near zero. It has done nothing to reduce the second.

John Ousterhout’s A Philosophy of Software Design distinguishes between tactical programming and strategic programming. Tactical programming is focused on getting features working quickly. Strategic programming prioritizes the long-term design of the overall system, investing time to produce clean designs and fix problems. Ousterhout recommends spending 10-20% of development time addressing complexity.

AI coding tools, as currently used, are overwhelmingly tactical. They optimize for the current prompt, not for the six-month horizon when a different engineer has to debug the code at 3 AM.

When a human developer writes code, they anticipate the maintenance cost because they know someone (possibly them) will have to live with it. That anticipated pain is the constraint that drives simplicity. An LLM feels no such anticipation. The maintenance cost is externalized to the humans who come after.

This is Cantrill’s “laziness lost.” The human constraint that produced clean systems is absent from the tool, and nothing has replaced it.

A Complexity Governance Framework

If AI coding tools are expanding systems without improving them, organizations need to govern complexity the same way they govern Data Quality or security: with explicit metrics, thresholds, and review processes.

Metrics to track alongside velocity:

| Metric | What it measures | Signal |
|---|---|---|
| Code churn rate | Lines changed within 14 days of writing | High churn means code was written too fast to be right |
| Duplication rate | Percentage of near-duplicate code | Proxy for missing abstractions (GitClear: rose from 8.3% to 12.3%) |
| Cyclomatic complexity trend | Decision branch density over time | Growing complexity without deliberate architectural decisions |
| Dependency count growth | New external libraries per quarter | Each dependency is a maintenance and security liability |
| Review-to-merge ratio | Comments per PR before merge | Declining ratio may signal rubber-stamping of AI output |
| Bug rate per KLOC | Production defects relative to code volume | The outcome metric that matters most |

What this looks like in practice. Start with two metrics: code churn rate and duplication rate. Both are measurable with existing tools (GitClear, SonarQube, Code Climate). Establish a pre-AI baseline, then track the trend after AI tool adoption. If churn doubles and duplication rises while velocity metrics improve, you are experiencing Cantrill’s prediction in real time.
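If you want a quick probe before adopting a commercial tool, churn can be approximated straight from git history. The sketch below asks one question about a single commit: what fraction of the lines it deleted or rewrote were themselves written within the previous 14 days? It assumes a local checkout, skips renames and unparsable hunks, and is a rough stand-in for the churn metric GitClear reports, not a reimplementation of it.

```python
#!/usr/bin/env python3
"""Rough churn probe: for one commit, what share of the lines it deleted or
rewrote were written within the previous 14 days? A minimal sketch, not a
substitute for GitClear or SonarQube. Assumes a local git checkout; renames,
binary files, and merge commits are effectively ignored."""
import re
import subprocess
import sys

CHURN_WINDOW_DAYS = 14

def git(*args: str) -> str:
    return subprocess.run(["git", *args], capture_output=True, text=True, check=True).stdout

def deleted_ranges(commit: str):
    """Yield (path, start_line, count) for every deletion hunk in the commit."""
    diff = git("show", "--unified=0", "--format=", commit)
    path = None
    for line in diff.splitlines():
        if line.startswith("--- a/"):
            path = line[6:]
        m = re.match(r"@@ -(\d+)(?:,(\d+))? \+", line)
        if m and path:
            start, count = int(m.group(1)), int(m.group(2) or "1")
            if count > 0:
                yield path, start, count

def line_ages(commit: str, path: str, start: int, count: int):
    """Author timestamps of the given lines as they existed in commit^."""
    try:
        blame = git("blame", "--line-porcelain", "-L", f"{start},{start + count - 1}",
                    f"{commit}^", "--", path)
    except subprocess.CalledProcessError:
        return []  # renamed or missing path in the parent: skip rather than crash
    return [int(l.split()[1]) for l in blame.splitlines() if l.startswith("author-time ")]

def churn_fraction(commit: str) -> float:
    commit_time = int(git("show", "-s", "--format=%at", commit).strip())
    cutoff = commit_time - CHURN_WINDOW_DAYS * 86400
    young = total = 0
    for path, start, count in deleted_ranges(commit):
        for ts in line_ages(commit, path, start, count):
            total += 1
            young += ts >= cutoff
    return young / total if total else 0.0

if __name__ == "__main__":
    sha = sys.argv[1] if len(sys.argv) > 1 else "HEAD"
    print(f"{churn_fraction(sha):.1%} of lines deleted by {sha} were under 14 days old")
```

Run it over a sample of recent commits, average the result, and compare against a pre-adoption sample: that trend line is your baseline.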

Governance practices:

  1. Complexity budgets. Set a ceiling for cyclomatic complexity per module. AI can generate as much code as it wants, but it must stay within the budget. This reintroduces the constraint: “Is this new code replacing old code, or just piling on?” (A minimal CI sketch of this check follows the list.)

  2. Mandatory simplification cycles. For every N sprints of AI-accelerated feature work, dedicate one sprint to AI-assisted simplification. Use the same tools to reduce what they expanded. Ousterhout’s 10-20% rule applies.

  3. Architecture decision records for AI-generated patterns. When AI introduces a new pattern (a new service, a new abstraction layer, a new dependency), require the same ADR documentation you would for a human-proposed change.

  4. “Second maintainer” reviews. Before merging AI-generated code, apply one test: “Would a new hire understand this in six months?” If the answer is uncertain, simplify before shipping.
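As a concrete illustration of the first practice, here is a minimal CI gate built on the radon library's cc_visit API (assuming `pip install radon`). The package names and budget numbers are hypothetical placeholders, and Python is only the example language; the same budget idea applies to any stack with a complexity analyzer.

```python
#!/usr/bin/env python3
"""Complexity-budget CI gate: exit non-zero when any function or class in the
tree exceeds its module's cyclomatic-complexity ceiling. A minimal sketch using
radon's cc_visit; budgets and package names below are illustrative placeholders."""
import pathlib
import sys

from radon.complexity import cc_visit

# Hypothetical budgets keyed by top-level package under the source root.
BUDGETS = {"billing": 8, "ingest": 10, "default": 12}

def budget_for(rel_path: pathlib.Path) -> int:
    top = rel_path.parts[0] if len(rel_path.parts) > 1 else "default"
    return BUDGETS.get(top, BUDGETS["default"])

def main(root: str = "src") -> int:
    violations = []
    for path in pathlib.Path(root).rglob("*.py"):
        ceiling = budget_for(path.relative_to(root))
        try:
            blocks = cc_visit(path.read_text())
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files radon cannot parse
        for block in blocks:
            if block.complexity > ceiling:
                violations.append(f"{path}:{block.lineno} {block.name} "
                                  f"complexity {block.complexity} exceeds budget {ceiling}")
    for v in violations:
        print(v)
    return 1 if violations else 0  # non-zero exit fails the pipeline

if __name__ == "__main__":
    sys.exit(main(*sys.argv[1:]))
```

Run it as a pipeline step alongside tests; the point is not the specific numbers but that AI-generated code has to clear the same ceiling a human contributor would.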

The Deeper Point

The virtue of laziness is a specific instance of a broader principle: constraints drive quality.

In writing, word limits force clarity. In product design, budget constraints force prioritization. In Data Governance, storage limits forced curation. In architecture, resource limits forced elegance.

When you remove a constraint, you do not automatically get better outcomes. You get more outcomes. And more is not better. It is just more.

The AI industry’s dominant framing of “removing constraints” as progress deserves scrutiny. Sometimes the constraint was the thing producing the quality all along.

This does not mean AI coding tools are harmful. It means they are powerful, and powerful tools require governance. A chainsaw is more productive than a handsaw. It is also more dangerous. You do not respond to the danger by banning chainsaws. You respond by requiring safety training, protective equipment, and operational procedures. Complexity governance is the safety training for AI coding tools.

Do Next

| Priority | Action | Why it matters |
|---|---|---|
| This week | Establish pre-AI baseline metrics for code churn and duplication | You cannot measure degradation without a baseline |
| This month | Run a complexity audit on your three largest AI-assisted codebases | Identifies whether the pattern is already present in your organization |
| This quarter | Implement complexity budgets per module with automated enforcement | Reintroduces the constraint that AI removes by default |
| Ongoing | Dedicate 10-20% of sprint capacity to strategic simplification | Ousterhout’s rule: invest proportionally in complexity reduction |
| Cultural shift | Redefine “productive” to include simplification, not just generation | The metric you optimize is the outcome you get |

Sources & References

  1. Bryan Cantrill, “The Peril of Laziness Lost” (2026)
  2. Larry Wall, Three Virtues of a Great Programmer
  3. “The Impact of AI on Developer Productivity: Evidence from GitHub Copilot” (2023)
  4. GitClear, “Coding on Copilot” (2024)
  5. GitClear, “AI Copilot Code Quality” (2025)
  6. Uplevel Data Labs, “Gen AI for Coding Research Report” (2024)
  7. CodeRabbit, “State of AI vs Human Code Generation Report” (2025)
  8. Google DORA 2024 Report (2024)
  9. Google DORA 2025 Report (2025)
  10. John Ousterhout, “A Philosophy of Software Design” (review by Pragmatic Engineer)
  11. HTTP Archive: Page Weight (2022)
  12. Matt Komorowski, “A History of Storage Cost”
  13. Amazon Prime Video monolith consolidation (The New Stack, 2023)
  14. Simon Willison on Bryan Cantrill’s post (2026)
