Why context stuffing?
The two most popular ways to populate the context window of LLMs are RAG and agents. I am a big fan of a third approach though: context stuffing, which tries to preload as much information as possible, ideally everything there is, into the context window before text generation starts.
Context stuffing has several advantages:
- Cost: Context stuffing uses only cheap input tokens and processes them only once, whereas agentic retrieval consumes expensive output tokens, especially when reasoning, and reprocesses its ever-growing input at every turn (see the cost sketch after this list).
- Context size: Context window limits are nowadays more economic than technical, so by keeping token costs low, context stuffing makes larger contexts practical.
- Prompt caching: The cost advantage is even greater when the prompt cache can be employed effectively. Context stuffing can produce a context with a highly stable prefix, which is amenable to prompt caching as long as the cache's TTL is long enough (a caching sketch follows the list).
- Latency: Contemporary LLMs are very inefficient at producing output tokens, which are generated one at a time, whereas input tokens are processed in parallel. Heavy output generation not only increases cost, it also tremendously slows the LLM down. Context stuffing produces pure context composed of input tokens only, which LLMs can process in seconds.
- Context health: Agents tend to poison their context with incorrect assumptions and decisions, which damages decision-making in subsequent steps. Since context stuffing uses only pristine data, it cannot possibly poison the context.
- Reliability: If you can just stuff (nearly) everything into the context, there is zero chance that something relevant fails to be retrieved. Even when context stuffing can load only 50% of the required information, that's still a 50% drop in retrieval failures.
- In-context learning: Keeping similar project files in the context gives the model hints about how to write new content. In a well-maintained project, imitation of existing content improves output quality.
- Awareness: Agentic retrieval requires the LLM to know upfront which files will be needed, but the relevance of a particular file sometimes becomes apparent only after reading it. Even when a file is not directly relevant, it can still subtly influence the output, for example by promoting consistency.
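
To put rough numbers on the cost argument, here is a back-of-the-envelope sketch. All prices and token counts are hypothetical placeholders, not real provider rates; only the shape of the comparison matters.

```python
# Back-of-the-envelope cost comparison: context stuffing vs. agentic retrieval.
# All prices and token counts below are hypothetical placeholders.

INPUT_PRICE = 3.00 / 1_000_000    # $ per input token (hypothetical)
OUTPUT_PRICE = 15.00 / 1_000_000  # $ per output token (hypothetical)

# Context stuffing: pay for the big context once, plus the final answer.
stuffed_context = 100_000  # tokens preloaded into the context
answer = 1_000             # output tokens
stuffing_cost = stuffed_context * INPUT_PRICE + answer * OUTPUT_PRICE

# Agentic retrieval: every turn reprocesses the growing context as input
# and produces reasoning/tool-call output tokens along the way.
context = 10_000            # initial prompt and instructions
retrieved_per_turn = 5_000  # tokens fetched by each tool call
output_per_turn = 2_000     # reasoning + tool-call output per turn
turns = 10

agentic_cost = 0.0
for _ in range(turns):
    agentic_cost += context * INPUT_PRICE  # reprocess everything so far
    agentic_cost += output_per_turn * OUTPUT_PRICE
    context += retrieved_per_turn + output_per_turn

print(f"context stuffing:  ${stuffing_cost:.3f}")
print(f"agentic retrieval: ${agentic_cost:.3f}")
```

With these made-up numbers, the agentic run costs several times more, mostly because the same growing context is billed again on every turn.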
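The caching point, sketched below, assumes the Anthropic Messages API, where a cache_control marker on a stable system block enables prompt caching; other providers cache long stable prefixes automatically. The file-loading helper and the model choice are my own assumptions.

```python
from pathlib import Path

import anthropic

def stuffed_project_context(root: str = ".") -> str:
    # Deterministic file order keeps the stuffed prefix byte-identical
    # across requests, which is what makes cache hits possible.
    parts = [f"=== {p} ===\n{p.read_text()}"
             for p in sorted(Path(root).rglob("*.py"))]
    return "\n\n".join(parts)

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-5",  # any current model with prompt caching
    max_tokens=2048,
    system=[
        {
            "type": "text",
            "text": stuffed_project_context(),
            # Mark the stable prefix as cacheable: the first request pays
            # full input price, subsequent ones read the prefix from cache.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Add a --verbose flag to the CLI."}],
)
print(response.content[0].text)
```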
Some people fear that overloading the context will make the LLM distracted, but reasoning LLMs can always restate the relevant bits of context at the end, where the model's attention is strongest. If I were to point to a real weakness of context stuffing, it would be large projects. Rule-based retrieval (recent, mentioned, core, and related files) becomes ineffective when the context can hold only a small fraction of the total project. For large projects, agentic retrieval is more effective, though some amount of context stuffing still helps.
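
To make the rule-based retrieval above concrete, here is a minimal sketch. The rules mirror the four categories mentioned (recent, mentioned, core, related), but the weights, the CORE_FILES set, and the character-based budget are illustrative assumptions, not a prescribed algorithm.

```python
from pathlib import Path

# Illustrative rule-based file selection for context stuffing. Files are
# scored as recent, mentioned, core, or related, then stuffed in score
# order until the budget runs out. Weights and rules are assumptions.

CORE_FILES = {"README.md", "pyproject.toml"}  # assumed "core" set

def score(path: Path, prompt: str,
          recent: set[Path], related: set[Path]) -> float:
    s = 0.0
    if path in recent:           # recently edited files
        s += 3.0
    if path.name in prompt:      # explicitly mentioned in the prompt
        s += 4.0
    if path.name in CORE_FILES:  # project-wide core files
        s += 2.0
    if path in related:          # e.g., imports a mentioned file
        s += 1.0
    return s

def stuff(files: list[Path], prompt: str, recent: set[Path],
          related: set[Path], budget_chars: int) -> str:
    ranked = sorted(files, key=lambda p: score(p, prompt, recent, related),
                    reverse=True)
    parts, used = [], 0
    for path in ranked:
        text = path.read_text()
        if used + len(text) > budget_chars:
            continue  # skip files that would blow the budget
        parts.append(f"=== {path} ===\n{text}")
        used += len(text)
    return "\n\n".join(parts)
```

In a small project the budget is rarely hit and the rules barely matter; in a large one, the ranking decides what gets dropped, which is exactly where rules start to lose out to an agent that can look around.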