Robert Važan

Hallucinations are a misunderstanding

What's commonly referred to as "hallucinations" in generative machine learning models, particularly language models, really reflects a misunderstanding of how these models work. When the model hallucinates, it's trying to tell you that "there's supposed to be something here, but I am not quite sure what, so let's put an example here to illustrate what I mean." The model is not lying, nor is it broken. It's just brainstorming ideas.

Image generators do not paint using a brush; they directly imagine the picture. They cannot pause to contemplate the next stroke. The process is immediate and unfiltered. It is far more reminiscent of human imagination than of the act of painting. It's like having a window into someone's mind. It's less obvious with language models, but here too, language models do not speak; they think. Many people will tell you that language models do not think, but thinking is actually all they do. What they cannot do is speak, because that would require the ability to remain silent and to hide thoughts, whereas the output of language models is involuntary and unfiltered, like an endless lucid dream. Thinking is a better metaphor for what language models are doing. Now let me ask you: how many of your thoughts are realistic, relevant, and correct?

What's more, language models can only think once. They lack the iterative refinement that comes from numerous consecutive thoughts. How often is your first thought correct?

At a very high level of abstraction, knowledge work consists of gathering verified facts and then using these facts to formulate new hypotheses. Every output of a language model is such a hypothesis synthesized from available information. This is how language models differ from search engines, which merely enumerate existing knowledge related to the query. There are attempts to allow language models to present reliable quotes, but that just embeds the search engine in the language model without addressing the core issue: no matter how well you research your hypothesis, you will inevitably have to test it if it is to be promoted into a new verified fact. All hypotheses are inherently unreliable, and so are those produced by language models.

Language models are brainstorming tools. They produce promising ideas that need more iteration to mature. Of course, if you ask an easy question, you can get a correct answer right away, but anything hard will need refinement. Nothing stops you from using the language model itself to evaluate and improve ideas, but you have to keep in mind that it's an iterative process. You cannot ever hope to get 100% correct answers to hard questions on the first try.

Iteration is actually how serious business applications of language models deal with hallucinations. The same model is asked to look at the task from multiple angles, gather information, evaluate and improve the solution, and finally vet the result using a multitude of checks. Completing a single task can involve dozens of model runs in a complex workflow. The purpose of the workflow is to implement a robust AI system on top of an unreliable language model, so that the AI system can be left running unattended.
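To make the idea concrete, here is a minimal Python sketch of such a workflow: a draft is critiqued, revised, and finally vetted before it is accepted. The `call_model` function, the prompts, and the number of rounds are hypothetical placeholders for whatever model API and checks a real system would use.

```python
# Minimal sketch of an iterative workflow built on an unreliable model.
# call_model() is a hypothetical wrapper around whatever LLM API you use;
# the prompts and the number of rounds are illustrative, not prescriptive.

def call_model(prompt: str) -> str:
    """Send a single prompt to the underlying language model."""
    raise NotImplementedError("plug in your LLM client here")

def solve(task: str, rounds: int = 3) -> str:
    # First pass: the model's initial hypothesis.
    draft = call_model(f"Propose a solution to the following task:\n{task}")
    for _ in range(rounds):
        # Second pass: the same model critiques its own output.
        critique = call_model(
            f"Task:\n{task}\n\nProposed solution:\n{draft}\n\n"
            "List concrete errors, unsupported claims, and missing steps."
        )
        if "no issues" in critique.lower():
            break
        # Third pass: revise the draft to address the critique.
        draft = call_model(
            f"Task:\n{task}\n\nSolution:\n{draft}\n\nCritique:\n{critique}\n\n"
            "Revise the solution to address every point in the critique."
        )
    # Final vetting before the result is allowed to leave the workflow.
    verdict = call_model(
        f"Task:\n{task}\n\nFinal solution:\n{draft}\n\n"
        "Answer PASS or FAIL: does the solution satisfy the task?"
    )
    if not verdict.strip().upper().startswith("PASS"):
        raise ValueError("solution failed final vetting; escalate to a human")
    return draft
```

A production system would add retrieval, tool use, and domain-specific checks, but the shape is the same: several cheap model runs wrapped in control logic that refuses to hand over unvetted output.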

Such an iterative process could conceivably be implemented in interactive chatbots too, but it would be very costly and it would be challenging to make it respond fast enough for interactive use. It's also hard to devise a workflow that would be right for every use case users can come up with. It's much easier to just expose the bare language model and let the user iterate in a chat session.

Instead of expecting chatbots to produce perfect output, which is actually detrimental to creative use cases, it would be better to communicate to the user which parts of the output are unreliable. Language models (and probably all generative models) have an internal sense of confidence in their output. Aside from internal states, confidence manifests as a narrow output distribution, whereas uncertainty results in a wide one. However it is represented, this confidence signal is not currently exposed at the user interface level in any way.
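As a rough illustration of what "narrow versus wide output distribution" means, the sketch below scores a single generation step by the entropy of its token distribution. The probabilities and the scoring function are illustrative assumptions, not the output of any particular model or API.

```python
import math

# Illustrative only: token_probs would come from the model's per-step output
# distribution (e.g. probabilities of the top candidate tokens). Neither the
# numbers nor the scoring scheme is taken from any particular model.

def step_confidence(token_probs: list[float]) -> float:
    """Map the entropy of one output distribution to a 0..1 confidence score."""
    entropy = -sum(p * math.log2(p) for p in token_probs if p > 0)
    max_entropy = math.log2(len(token_probs)) or 1.0
    # 1.0 = narrow distribution (confident), 0.0 = wide distribution (uncertain)
    return 1.0 - entropy / max_entropy

print(step_confidence([0.97, 0.01, 0.01, 0.01]))  # high: one token dominates
print(step_confidence([0.25, 0.25, 0.25, 0.25]))  # 0.0: maximally uncertain
```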

You could conceivably train the model to use cautious language when it is not sure, but that would result in impractical, verbose output full of defensive language. I find it more appealing to use highlighting to draw attention to passages where the model is uncertain. That is, however, not so easy to implement, because token-level confidence usually dips only on the first few tokens of a hallucination. You would need a mechanism to propagate the confidence signal to dependent downstream tokens. Another issue is that choosing between two synonyms is not as important as choosing between yes and no.
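A toy sketch of how such highlighting might work follows. The token confidences, the threshold, and the fixed propagation window are made-up stand-ins; a real implementation would need the model's actual per-token probabilities and a smarter way to decide which downstream tokens depend on an uncertain choice.

```python
# Rough sketch of the highlighting idea. Tokens and confidences are invented;
# the fixed carry-over window is a crude stand-in for a real mechanism that
# tracks which later tokens depend on an uncertain earlier choice.

def mark_uncertain(tokens, confidences, threshold=0.5, window=5):
    """Flag low-confidence tokens and carry the flag over the next few tokens."""
    flags = [False] * len(tokens)
    carry = 0
    for i, conf in enumerate(confidences):
        if conf < threshold:
            carry = window  # confidence dips at the start of a hallucination...
        if carry > 0:
            flags[i] = True  # ...so keep highlighting the tokens that follow
            carry -= 1
    return flags

def render(tokens, flags):
    """Wrap uncertain spans in [[ ]] so they stand out in plain text."""
    return "".join(f"[[{t}]]" if f else t for t, f in zip(tokens, flags))

# Made-up example: the dip on "Atlantis" spreads to the tokens that depend on it.
tokens = ["The ", "capital ", "of ", "Atlantis ", "is ", "Poseidonia", "."]
confidences = [0.9, 0.9, 0.9, 0.4, 0.8, 0.7, 0.9]
print(render(tokens, mark_uncertain(tokens, confidences)))
# The capital of [[Atlantis ]][[is ]][[Poseidonia]][[.]]
```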

I hope I have cast some light on the issue of hallucinations. Instead of being a flaw looking for a fix, hallucinations are better understood as an inherent property of generative models, which have to come up with a reasonable hypothesis by thinking about the matter once. Such a process cannot ever be reliable. Instead of looking for a mythical hallucination-free model, learn to live with hallucinations. In interactive chat, treat the conversation as informal brainstorming and iterate on proposed ideas. To enable automation, implement a robust workflow that encourages the model to critically evaluate and improve proposed output. In the future, we will hopefully have user interfaces that draw attention to unreliable parts of the output.