It’s almost as if these large language models are extremely unsuited to the primary application being imagined for them: knowledge retrieval.
The choice of the term “hallucination” to describe what is happening with these language models is fundamentally misleading, and feels calculated to obscure their probabilistic nature.
Characterizing the fabricated output as “hallucination” implies that correct output is “knowledge”: that the incorrect results are aberrations rather than products of the exact same blind process that sometimes happens to produce factually sound ones.
This comment ignores some linguistic history. Historically, letting a language model “hallucinate” meant letting it generate text (rather than, say, assign a probability to text that comes from another source). So in the technical sense of the term, every generation from an LM is a hallucination, not just the ones one dislikes.
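The distinction is easy to demonstrate in code. Here is a minimal sketch of the two uses, assuming the Hugging Face transformers library and the small gpt2 checkpoint (my choices for illustration, not anything named in the original comment): sampling new text from the model versus scoring text that comes from somewhere else.

```python
# Minimal sketch: the two classical uses of a language model.
# Assumes `pip install torch transformers` and the public "gpt2" checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

# Use 1: "hallucinate" in the technical sense -- sample new text from the model.
prompt = tok("The capital of France is", return_tensors="pt")
sample = model.generate(
    **prompt, max_new_tokens=10, do_sample=True, pad_token_id=tok.eos_token_id
)
print(tok.decode(sample[0]))

# Use 2: score text from another source -- assign it a probability
# instead of generating anything.
text = tok("The capital of France is Paris.", return_tensors="pt")
with torch.no_grad():
    out = model(text.input_ids, labels=text.input_ids)
# out.loss is the mean negative log-likelihood per token; lower means the
# model considers the text more probable.
print(out.loss.item())
```

Note that both cases run the exact same forward pass over next-token probabilities; "generation" just samples from that distribution instead of evaluating it against given text.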
You could argue that "hallucination" is actually the more accurate description: these systems literally have no mechanism to separate facts from lies; they have no intent to lie or to tell the truth, and cannot represent those concepts at all.
Humans recognize hallucinations as wrong because they have systems in the brain that say "that can't have been real".
LLMs can't recognize lies because they don't have referents for "real".