LLMs are reshaping source code
Coding with LLMs is like using a forklift in a warehouse. It lets you do things in bulk, but you have to adapt the environment for it. To use a forklift, you have to palletize everything, remove barriers, keep the floors free of garbage, and leave wide aisles between racks. LLMs can tolerate some amount of mess, because they are smart, but they still benefit greatly from LLM-friendly code organization.
I am personally observing the following changes in my source code. Sorry for the long laundry list, but I couldn't come up with any sensible grouping. I have at least sorted the list by how profound the changes are.
Standardization
Frontier LLMs are big and they know a lot, but they still perform better with popular languages and libraries. Niche libraries and internal utility code are best used only where necessary.
Small projects
The project does not have to fit in the LLM's context window, and most projects don't. But LLM performance does drop as project size increases. The smaller you can make your project, the better LLMs will perform. If your project grows so large that LLMs struggle, because they can only see a small fraction of it at a time, it might be more productive to spin off some of the code into an independent library.
Modular code
Even if the project is small, it is unlikely to fit entirely in the context window. Only part of the project will be loaded in the context at a time. The LLM's success rate then depends on how much information it is missing. If the loaded subset is sufficiently self-contained, the LLM will be able to do its job well even with this partial context. Modularity is advantageous even if it costs some code duplication.
Duplication
Human software developers are creatively lazy. We write utilities for everything. The downside is that the code then looks like it was subjected to fractal compression. It's short but complicated and densely interlinked. LLMs on the other hand prefer simple, shallow, and isolated code even at the cost of duplication. This balloons code, but every part of the code becomes simpler.
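A hypothetical illustration of the trade-off. The function names and formats below are invented for the example; the point is the shape of the code, not the specifics:

```python
# Dense utility style: one parameterized helper reused everywhere.
# Every caller must understand all the parameters to use it correctly.
def format_id(prefix, value, width=6, sep="-"):
    return f"{prefix}{sep}{str(value).zfill(width)}"

# LLM-friendly style: two shallow functions that duplicate the logic
# but can each be read and modified in complete isolation.
def format_order_id(order_number):
    return f"ORD-{str(order_number).zfill(6)}"

def format_invoice_id(invoice_number):
    return f"INV-{str(invoice_number).zfill(6)}"
```

The duplicated version is longer, but changing how invoice IDs look no longer risks breaking order IDs, and an LLM editing one function never needs to load the other.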
Tests
Tests are very cheap to write with LLMs. There aren't many excuses left to leave code untested. Tests give you confidence in product quality and they serve as a safety net for LLM mistakes.
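For instance, a trivial function and its test, both of the kind an LLM produces in seconds (the `slugify` function here is a hypothetical example):

```python
def slugify(title):
    """Convert a title to a lowercase, hyphen-separated URL slug."""
    return "-".join(title.lower().split())

def test_slugify():
    assert slugify("Hello World") == "hello-world"
    assert slugify("  Extra   spaces ") == "extra-spaces"

test_slugify()
```

The test costs almost nothing to write, yet it will catch a regression if a later LLM edit quietly changes the function's behavior.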
Documentation
LLMs don't have the context and empathy to write user-facing documentation, but they can write decent internal documentation and the first draft of user documentation. Documentation is now so cheap that it is reasonable to expect it everywhere. Inline documentation dilutes context, but the improved comprehension is worth it. Documentation allows the LLM to understand what a given internal API does without seeing all functions in its call graph, so there's less pressure to fit everything in the context.
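As a sketch of what this looks like in practice, here is a hypothetical internal function whose docstring makes it usable without loading the retry loop that calls it:

```python
def retry_delay(attempt):
    """Return the delay in seconds before the given retry attempt.

    Uses exponential backoff starting at 1 second and capped at
    60 seconds: attempt 0 waits 1s, attempt 1 waits 2s, and so on.
    Callers do not need to see the retry loop to use this correctly.
    """
    return min(2 ** attempt, 60)
```

An LLM that loads only this file can call `retry_delay` correctly; without the docstring, it would have to pull the callers into the context to infer the units and the cap.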
Overviews
I have developed a neat technique for dealing with context window limits. I instruct the LLM to create and maintain directory overview files that summarize what's in the files and subdirectories of the current directory. In the root directory of the project, there are project overview files that summarize the whole project from several angles. Overview files allow the LLM to understand the parts of the project it cannot see. While overview files could be generated dynamically by agentic tools, I find it more practical to commit them into the repository.
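As a hypothetical example, an overview file for a `parsers/` directory might look like this (the file name and contents are illustrative, not a prescribed format):

```markdown
# parsers/ overview

Parsers that turn the supported input formats into internal records.

- json_parser.py — parses JSON input into the internal record type
- csv_parser.py — same for CSV; deliberately shares no code with json_parser
- tests/ — one test file per parser
```

A few lines like these let the LLM decide which files to load without opening any of them.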
Clean code
Did you notice how clean the modern factories are? The more automation there is, the cleaner the factory. It turns out that mess is expensive. It slows everyone down, causes accidents, damages equipment, and contaminates products. A similar phenomenon can be seen in code. Messy code causes LLMs to make mistakes, increases API costs, slows down and confuses reasoning, and encourages the LLM to reproduce the mess in new code. LLMs perform better in clean code. Fortunately, they can also do most of the cleanup.
Low code density
This is somewhat controversial. Dense code lets you fit more of it in the context. But low-density code is easier to write and understand, and it tends to have cleaner diffs and easier reviews. Easy reviews and higher LLM reliability are currently more important for productivity, so low-density code wins. You can always use a tool to generate a terse symbol map or source code map and include it in the context.
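A contrived Python example of the same logic written both ways. Both functions behave identically; the names and data shape are invented for illustration:

```python
# Dense: one expression does everything. Any change rewrites the
# whole line, so every diff touches the entire logic.
def active_emails_dense(users):
    return [u["email"] for u in users if u.get("active") and "@" in u["email"]]

# Low density: each step sits on its own line, so a change to one
# condition produces a one-line diff that is trivial to review.
def active_emails(users):
    result = []
    for user in users:
        if not user.get("active"):
            continue
        email = user["email"]
        if "@" not in email:
            continue
        result.append(email)
    return result
```

The second version costs more context tokens but yields smaller, clearer diffs when either filter changes.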
Conventions
Hallucinations in LLMs are treated as a flaw, but you can turn them into a feature. If your code is highly predictable, LLMs will be able to produce correct code even when they cannot see the called code. LLMs also tend to imitate existing code. If you establish clear conventions, LLMs will be more likely to generate correct and clean code that you find acceptable.
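A minimal sketch of such a convention, with hypothetical class and method names. Suppose every repository class in the project exposes the same three methods:

```python
# Convention: every *Repository class has get_by_id, save, delete.
class UserRepository:
    def __init__(self):
        self._items = {}

    def get_by_id(self, item_id):
        return self._items.get(item_id)

    def save(self, item_id, item):
        self._items[item_id] = item

    def delete(self, item_id):
        self._items.pop(item_id, None)

# An LLM that has only seen UserRepository can "hallucinate" a call
# like orders.get_by_id(7) against this class and be correct, because
# the convention makes the API predictable.
class OrderRepository:
    def __init__(self):
        self._items = {}

    def get_by_id(self, item_id):
        return self._items.get(item_id)

    def save(self, item_id, item):
        self._items[item_id] = item

    def delete(self, item_id):
        self._items.pop(item_id, None)
```

The guessed call is correct not because the model saw the file, but because the convention made the file predictable.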
Short files
There is only so much space in the LLM's context window. You don't want to waste it by loading a huge file when only a small part of it is relevant. You don't want RAG-style shredding into chunks, nor similarly destructive agentic line-range loads either. You want the LLM to see the whole file as a logical unit of information. I therefore try to keep files short, ideally about 100 lines (1,000 tokens). Of course, some files end up being 10 lines long while the largest ones are over 500 lines long, but as long as the median tends towards 100 lines, the LLM will handle the project well.
Interfaces
I find myself introducing interfaces even if they have only one implementation. Interfaces are not suitable everywhere, but wherever applicable, they let you separate concepts from their implementations. LLMs then have to look only at the interface file to make good use of that part of the project.
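In Python this can be sketched with `typing.Protocol`; the `Storage` interface and its single implementation below are hypothetical names, and in a real project each would live in its own short file:

```python
from typing import Protocol

class Storage(Protocol):
    """The interface file: this is all an LLM needs to load
    to write code that uses storage."""
    def read(self, key: str) -> str: ...
    def write(self, key: str, value: str) -> None: ...

class MemoryStorage:
    """The only implementation today, kept in a separate file."""
    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def read(self, key: str) -> str:
        return self._data[key]

    def write(self, key: str, value: str) -> None:
        self._data[key] = value
```

Code that depends on `Storage` can be written, reviewed, and even tested with only the few-line interface in the context; the implementation file never needs to be loaded.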
Concrete imports
While wildcard imports are convenient, they make it hard for LLMs to guess which file a given symbol comes from. Concrete imports take up more space in the context, but they let LLMs determine qualified symbol names without having to see the whole project.
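A small Python illustration of the difference (the `rms` function is a made-up example):

```python
# Wildcard style: with `from math import *` and `from statistics
# import *`, the LLM cannot tell which module sqrt or mean came
# from without reading both modules.

# Concrete imports: the origin of every symbol is stated in this file.
from math import sqrt
from statistics import mean

def rms(values):
    """Root mean square of a sequence of numbers."""
    return sqrt(mean(v * v for v in values))
```

Reading only this file, an LLM knows the fully qualified names `math.sqrt` and `statistics.mean` and can reason about their semantics without loading anything else.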
It's not just about speed
I find these changes beneficial to the overall quality of the project. Many of them are actually old software engineering practices. LLMs really push you to improve the code to releasable quality.