And The Timeless Fight Against Entropy Goes On

Image by Zoltan Tasi at Unsplash - https://unsplash.com/@zoltantasi

Way back in the 1970s, Manny Lehman and László Bélády, two computer scientists, formulated what became known as Lehman's Laws (no idea why Bélády missed out on the name check). The laws formalise what most software engineers know in their bones: managing software is, to quote Jerry Maguire, an up-at-dawn, pride-swallowing siege that they never fully tell you about.

Although, to be fair, some engineers never stop going on about it. Your experience may vary.

Entropy

The moment a codebase grows beyond a few hundred lines, external demand for change, adaptation and new features competes for attention against the opposing forces of decay and disorder. The system evolves. Complexity increases. Quality declines.

Entropy is a default in software, and fighting it is deliberate work.

Lehman's Laws say this more formally, but the gist is that a system in use must be continually modified or it will become less useful and that this work will increase its complexity (and complexity tends to make quality hard to maintain) unless active work is done to reduce it.

Fast forward to 2016, and Erik Bernhardsson developed the git-of-theseus project and made the concept visible, showing how codebases both shed old code and grow over time. Note: if you want to try this on a repository of your own, there's an updated clone called Better Git of Theseus.

Who Will Think of the CFO?

Why talk about this at all? Because if you are in a business that uses any developed code, everything about Lehman's Laws means cost. Thousands on engineers, thousands on tool support, thousands on AI products, thousands on hosting, re-architecting, testing, meetings, delays, lack of features, lost customers, downtime, the list goes on. The bigger and older the code, the greater the impact. It never stops and it never will.

In more positive news, in the last 20 years, good tech teams have perfected approaches to keep on top of this entropy. Short delivery cycles, continuous small change scope per release, more automated testing than you can shake a stick at, cloud, clean code practices, red-green refactor. It's hard to prevent all wasted costs, but with discipline, it's possible to make change and maintain order in the same time window. Without it, today's change is a future cost time bomb.

Contrary to what many doomers might say, LLM-generated code isn't always terrible. And it isn't junior quality vs senior. Task completion is pretty good and getting better (though, for balance, this is at the expense of a much higher token burn).

But code reuse in LLM-generated code is not great. A peer-reviewed paper from January this year called "More Code, Less Reuse" showed that AI code is structurally disposed to get harder to maintain, mostly because LLM agents frequently disregard reuse opportunities, which produces higher redundancy than human-developed code. Worse, the paper concludes that the surface-level plausibility of AI code masks this redundancy, which leads to a silent accumulation of technical debt.

Redundancy. Entropy. Tech Debt. Unnecessary cost. Lots of cost.

The code will work. The code may not have bugs. And that's part of the problem. Because that will get a pass in most pipelines. But over time, thousands of tiny bits of redundant code are being smuggled into production.

Again, for balance, let's cross-examine that point. Code will work. Code will have no bugs. Code will pass through most pipelines. So why should anyone care? It was fast and cheap to make. Surely, most businesses would take works-with-no-bugs if the production of the code was cheap.

Well, as Lehman and Bernhardsson pointed out, that code will attract entropy over time. Changes, fixes, improvements, patches. Complexity will naturally increase. Changes will not be applied uniformly to all redundant copies. The context window issue that causes the redundancy applies to changes as much as it does to creation. And now the software will have bugs. Bugs and more tech debt.

GitClear research found that, as AI coding increased, refactored (entropy-fighting) code declined by more than half, duplicate code blocks increased by 8x, and copy/pasted code increased by 48%. That's not to say LLMs caused it, only that one is happening alongside the other. It's also worth noting that GitClear sell code quality tooling. But whilst direct causation cannot be proven, it reads as thorough research, and it's a fairly safe bet that some of that increase in entropy is down to LLM use.

But What About Code Reviews?

Surely, the answer is just to keep a human in the loop and double down on code reviews, as one might do with code from a junior developer.

Well, no. The "More Code, Less Reuse" paper also found that reviewers expressed a more neutral-to-positive sentiment toward AI code than human code (even where code entropy was higher). And, in a document on better ways to review pull requests by Tenki, there's a reasonable suggestion as to why this might happen. With human code, a human has to read a ticket, do some research on the codebase, and then submit their code PR. Another human reads the ticket and the code differences, assuming the submitter understands what they wrote and that they already did the work of reusing and building upon previous code.

With AI-generated code, what's actually happening is that the submitter accepted code that looked right and moved on. The reviewer accepts it on the same basis.

Volume & Permission Fatigue

And humans are also human. We get fatigued. Anyone who's been asked by Claude for permission to run an npm command for the 4th time very quickly starts selecting the "yes, and stop asking" option. Toolmakers are adding deliberate friction; we are overruling it. Anthropic even calls the flag `--dangerously-skip-permissions`.

GitHub reported recently that Copilot code review had processed over 60M reviews (a 10x increase in one year). Agent reviews are becoming the norm because tired humans can't keep up.

Unsupervised Shipping

We are entering a world of unsupervised shipping. Open-source communities have reacted to provenance issues and the sheer review burden by banning LLM-created pull requests or, in some cases, banning outside contributions entirely. In the rush to get more done faster, the closed-source world is moving to agent reviews.

If code was reviewed in a forest of agents and no one was there to see it, was it reviewed?

Good question. Why not simply create agents to review the code? After all, they never get tired, and they also don't care about volume of change.

Three reasons.

Reason One: Agents are probabilistic. There are established ways to find duplicate code (e.g. Rabin-Karp string matching), and these are deterministic: the same input always gives the same answer, and when each hash match is verified against the actual text, exact-duplicate detection produces no false positives (false positives would bring humans back into the loop). An agent alone might find the duplicates, and then again, it might not. Agents in this context are a definite downgrade, a bit like using agents as compilers.

Reason Two: If context limitations lead to code entropy, then the same will apply at review time. It's the same lack of capability being asked to review itself. And there's good evidence, which is fairly intuitive when you think about it, that models find other models' output very plausible. If humans and agents both find it plausible, where is the critical thinking happening?

Reason Three: Who polices the police? If agent-1 writes the code and agent-2 reviews the code, who's reviewing agent-2? Another agent? A human? No, because we already determined there's too much there to review, and agent-2 is generating just as much review fodder as agent-1. When the choice is trust or check, the default at scale will be trust.
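
To make Reason One concrete, here is a minimal sketch of deterministic duplicate detection in the Rabin-Karp style. It is illustrative only: the function names, the whitespace-only normalisation and the window size are assumptions for this example, not anything taken from a particular tool. A rolling hash over consecutive lines narrows down the candidates, and a direct comparison confirms each one, so a hash collision can never be reported as a duplicate.

```python
from collections import defaultdict

BASE = 257
MOD = (1 << 61) - 1  # a large prime modulus for the rolling hash


def normalise(source: str) -> list[str]:
    """Strip indentation and drop blank lines so formatting does not hide copies."""
    return [line.strip() for line in source.splitlines() if line.strip()]


def duplicate_windows(lines: list[str], k: int) -> list[tuple[int, int]]:
    """Return (earlier_index, later_index) pairs where k consecutive lines repeat."""
    if k <= 0 or len(lines) < k:
        return []

    # Give every distinct line a stable integer token (no reliance on hash()).
    ids: dict[str, int] = {}
    tokens = [ids.setdefault(line, len(ids) + 1) for line in lines]

    high = pow(BASE, k - 1, MOD)  # weight of the line that leaves the window
    h = 0
    for t in tokens[:k]:
        h = (h * BASE + t) % MOD

    seen: dict[int, list[int]] = defaultdict(list)
    seen[h].append(0)
    matches = []
    for i in range(1, len(tokens) - k + 1):
        # Roll the window: remove the oldest line, shift, add the newest line.
        h = ((h - tokens[i - 1] * high) * BASE + tokens[i + k - 1]) % MOD
        for j in seen[h]:
            # Verify against the real text so collisions are never reported.
            if lines[j:j + k] == lines[i:i + k]:
                matches.append((j, i))
                break
        seen[h].append(i)
    return matches


sample = """
total = price * qty
total = total * 1.2
return round(total, 2)

log("done")

total = price * qty
total = total * 1.2
return round(total, 2)
"""

print(duplicate_windows(normalise(sample), k=3))  # prints [(0, 4)]
```

Run it twice on the same input and the answer is the same both times, which is the property an agent reviewer cannot promise. A sketch like this only catches exact copies after whitespace is stripped; copies with renamed variables need token-level or syntax-tree normalisation, which is where dedicated tooling earns its keep.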

Right. Where are we then?

Everyone wants more stuff, which means there's always going to be demand for more code. No one sets out to write bad code, and no one enjoys maintaining it. LLMs don't create bad code per se, and they don't create junior-level code. They do, though, create code with hidden entropy. More code keeps coming, and more code means more entropy, which requires more discipline, which LLMs may help with, but they can't solve the very problem they are creating, which means more cost to deal with later.

AI hasn't removed the need for Lehman's anti-entropy effort; it has moved the place where that effort is needed and increased the amount of effort required. And the evidence shows that the effort required is not being applied.

Not to be too dramatic, but for a lot of organisations, this is a time bomb being assembled in plain sight.

Help?!

OK. A few things to start doing immediately.

Register, but ignore, the small vibe apps for now. Someone in marketing made a small app that helps them do their job. It's not without risk. The LinkedIn posts on the algal bloom of Shadow IT are no doubt right around the corner, but that's out of scope for this post. Security is the biggest risk there. No doubt, apps on the public internet hooked into what were once highly protected APIs will keep security awake at night, but entropy is less of an existential risk.

Read the Tenki document on reviewing pull requests and build a review process that implements all of it. This isn't a post about any one tool, but tools are your friend here.

Make sure reviews include looking for new helpers that duplicate existing ones under different names, code that hallucinates correctness (e.g. compiles, passes tests, but misses acceptance criteria in the ticket) and tests that were removed or weakened to get the build to go green.

Embrace Domain-Driven Design. AI-assisted development is improved by small, discrete code modules that encapsulate their own context. Small bounded units fit context windows, reviews, and testing. The aim here is a tidy unit of modularity. And do this manually. Run in-person workshops (e.g. event storming) which help build a collective understanding of how the business works and how the architecture maps to it. Create, then share, then update and maintain these artefacts. Having a reference model of what you are aiming for will help human and AI alike.
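
One of the review checks above, spotting tests that were removed or weakened, can be partly automated. The following is a rough sketch under stated assumptions: it assumes pytest-style tests (functions named `test_*` that use bare `assert` statements), and the function names `assert_counts` and `weakened_tests` are made up for this example. It compares the old and new versions of a test file and reports tests that disappeared or lost assertions.

```python
import ast


def assert_counts(source: str) -> dict[str, int]:
    """Map each test function name to the number of assert statements inside it."""
    counts = {}
    for node in ast.walk(ast.parse(source)):
        is_function = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_function and node.name.startswith("test_"):
            counts[node.name] = sum(
                isinstance(child, ast.Assert) for child in ast.walk(node)
            )
    return counts


def weakened_tests(old_source: str, new_source: str) -> dict[str, str]:
    """Report tests that were removed, or that now contain fewer asserts."""
    old, new = assert_counts(old_source), assert_counts(new_source)
    report = {}
    for name, old_count in old.items():
        if name not in new:
            report[name] = "removed"
        elif new[name] < old_count:
            report[name] = f"asserts dropped from {old_count} to {new[name]}"
    return report


old_tests = """
def test_total():
    assert total(2, 3) == 6
    assert total(0, 3) == 0

def test_tax():
    assert tax(100) == 20
"""

new_tests = """
def test_total():
    assert total(2, 3) == 6
"""

print(weakened_tests(old_tests, new_tests))
# prints {'test_total': 'asserts dropped from 2 to 1', 'test_tax': 'removed'}
```

This is a tripwire, not a verdict. It will not notice an assertion that was loosened rather than deleted, and it ignores `unittest`-style `self.assert...` calls, but it turns "did anyone quietly delete a test?" into a question with a repeatable answer that a human reviewer can then follow up on.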

Our summary: this is the most interesting era software development has ever seen. It's also the riskiest. Everything has changed, and yet everything is still the same. There are no magic products, and be wary of anyone who wants to take your money to tell you there are. Building what customers want will always be hard. That's why it's so rewarding.

If you'd like to hear more about this or any other topics related to implementing AI with discipline, just drop us a line.