Tasks for LLMs should be broad and shallow
When discussing the limited intelligence of contemporary LLMs, I briefly mentioned that LLMs benefit from broad and shallow tasks. Here I expand on why I think so and what that means in practice, especially in the context of software development.
Just to be clear:
- Broad means making many individual changes that together touch many lines of code, easily over 100.
- Shallow means that every individual change is small and simple.
Why broad and shallow?
- Diffs: Shallow tasks result in relatively clean, readable diffs.
- Reliability: Shallow tasks put little pressure on LLM intelligence. Results are more reliable, and code review can be reduced to skimming.
- Cost: Broad tasks essentially pack a large number of subtasks into a single LLM request, which makes the per-subtask cost very low. The savings let us afford larger models with longer context.
- Overhead: Broad tasks spread constant overhead (issuing the request, LLM latency, ...) over more subtasks.
- Comfort: Shallow tasks are cognitively undemanding for the reviewer (me), which makes the work less tiring and lets me keep up a faster pace.
- Consistency: LLMs tend to handle all parts of a task consistently, so any stylistic or logical issue can be corrected once for the whole task.
- Autonomy: Broad tasks give the LLM enough autonomy to show its strengths, accept responsibility, and contribute to productivity.
- Scalability: Broad and shallow tasks scale to better LLMs, which can tolerate less detailed specifications and need less thorough review.
What about tasks with uneven difficulty?
I am talking about software development, where a single task often mixes trivial parts with difficult ones. That's not ideal, because LLMs get overwhelmed in the most complex parts of the code, and that's where most of the bugs are going to be.
To deal with this, I level the task description. I add more detail to the specification where the task has too much depth. Conversely, I let the LLM guess defaults where the task is shallow. During code review, I pay more attention to code that is difficult and only skim over trivial changes.
Difficult tasks in programming cannot be avoided entirely. Sometimes you have to do parts of the work manually. In the end, programming always benefits from a larger, smarter model.
Examples
Here are a few examples of broad and shallow programming tasks:
- Write unit tests.
- Write basic API documentation.
- Transform the code in ways that go beyond what IDE refactoring can do but that are still within reach of LLMs.
- Write an implementation of an existing interface according to clear and simple requirements.
- Imitate existing code to implement a similar or related feature.
More complicated tasks usually have a specification with 5-20 bullet points, each containing a clear command or constraint. The system prompt is structured similarly. Formulating the specification as a list also lets me attach smaller, unrelated tasks to the main task without having to issue another LLM request.