What do we consider good architecture in the LLM era?

We have developed principles, methodologies, and tools for architecture and code design at every level, because making the right choices makes a codebase more maintainable, less error-prone, and easier to understand and extend. But for whom: for humans, or for agents?

If you’ve come across my reviews, discussions, or PRs in OpenHands over the past year and a half, you’ve seen me say, again and again, that “X is good for humans, and it’s good for LLMs.” X could be anything from variable naming to code design choices at any level of abstraction.

It’s not always true, though, that what is good for humans is good for LLMs. I think a lot about the code design choices that are better for LLMs. Locality of behavior, for one.

Locality of behavior traces back to Richard Gabriel’s Patterns of Software, where he describes locality simply:

“The primary feature for easy maintenance is locality: Locality is that characteristic of source code that enables a programmer to understand that source by looking at only a small portion of it.”

Gabriel was writing about what makes code maintainable — the ability to look at a small piece and understand it without chasing references across the codebase.

Prioritizing locality of behavior comes at the expense of other code design principles, like DRY. You are effectively saying: repeat yourself here and there if that helps confine a behavior to a single page in your editor, so that when you look at one behavior, the other pieces of code that do a bit of the same thing are also right in front of your eyes.

This does apply to humans: there’s a point past which you don’t want to DRY, because the pieces of a behavior become so spread out that their flow is no longer easy to read and follow.
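
To make the trade-off concrete, here’s a minimal sketch of the same two handlers written both ways (the names and the validation logic are hypothetical, not from OpenHands):

```python
# DRY-first: the validation lives in a shared helper, so each handler is
# short, but its full behavior is spread across two places.

def _validate_payload(payload: dict, required: list[str]) -> None:
    missing = [key for key in required if key not in payload]
    if missing:
        raise ValueError(f"missing fields: {missing}")

def create_user(payload: dict) -> dict:
    _validate_payload(payload, ["name", "email"])
    return {"name": payload["name"], "email": payload["email"]}

def create_project(payload: dict) -> dict:
    _validate_payload(payload, ["name", "owner"])
    return {"name": payload["name"], "owner": payload["owner"]}


# Locality-first: each handler repeats a couple of lines of validation,
# and in exchange everything it does fits on one screen.

def create_user_local(payload: dict) -> dict:
    if "name" not in payload or "email" not in payload:
        raise ValueError("create_user needs 'name' and 'email'")
    return {"name": payload["name"], "email": payload["email"]}

def create_project_local(payload: dict) -> dict:
    if "name" not in payload or "owner" not in payload:
        raise ValueError("create_project needs 'name' and 'owner'")
    return {"name": payload["name"], "owner": payload["owner"]}
```

The DRY version is shorter per handler, but understanding `create_user` requires a jump to `_validate_payload`, which in a real codebase may live in another module. The locality-first version repeats two lines per handler, and in exchange each function is fully understandable at a glance.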

But I think it’s doubly true for LLMs. Agents can follow execution paths, but past some depth or complexity they get lost, and where that threshold sits depends on the model’s capabilities.

I was star-struck when I saw this capability increase beyond average human levels, about this time last year, with Gemini 2.0 Flash-Thinking. It was capable of amazing code exploration depth! It helped me through the OpenHands codebase, which at the time ran to some 5 million tokens. A small model, yes, a Flash, and yet, it got it. It got it every single time that I can remember. Sonnet, at least until 4.5, did not: it kept rushing and guessing badly.

That seems to be changing, though. Gemini 2.5 Pro had that ability too, and later GPT-5.0 and its siblings could work for hours and still keep track.

So what does this mean for architecture? I’m not sure. When models couldn’t follow deep execution paths, locality mattered more — keep things close, even if you repeat yourself in some ways. Now that the best models can follow the path, maybe the old trade-offs apply again.

Or maybe not. There’s something to be said for code that’s easy to read in one screen, regardless of who — or what — is doing the reading. Humans still review. Humans still debug at 2 a.m. And context overload is real for both humans and LLMs.

I don’t have an answer. But I think about it a lot: the code design that’s good for humans, good for LLMs, and good for the humans who work with LLMs. They’re not always the same thing.