Nordia News

AI Hallucinations: A Business Guide

By Kjell Steffner
Published: 02.07.2026 | Posted in Insights

Last updated 2 July 2026 by attorney-at-law Kjell Steffner

In short

  • AI hallucinations are false statements that an AI tool presents as fact, delivered in the same fluent, confident tone as its correct answers.
  • Peer-reviewed research found general chatbots gave wrong answers to 58 to 88 per cent of specific legal questions, and even specialised legal AI tools erred 17 to 33 per cent of the time.
  • The danger is not that AI makes mistakes. It is that the mistakes look exactly like the correct answers, and the tools rarely warn you when they are guessing.
  • Businesses have already paid for this, through court sanctions, a tribunal ruling that held a company liable for its chatbot's misinformation, and a partial refund of a six-figure consulting fee.
  • The risk is manageable with a verification routine and basic AI governance, and the companies that build one keep the productivity gains without inheriting the errors.

What is an AI hallucination, and why does it happen?

AI hallucinations are outputs that a generative AI tool states as fact but that are false or have no basis in any source. In business settings this means invented sources, fabricated case law, misquoted contract clauses, non-existent policies and statistics that were never published. The word suggests a malfunction. It is not. A large language model works by predicting the most plausible next word based on patterns in its training data, and it runs exactly the same process whether the result happens to be true or false. When the model lacks the facts, it fills the gap with something that reads correctly.

This mechanism explains the defining feature of the problem. A hallucinated answer is not hesitant or garbled. It arrives polished, specific and well structured, with the same authoritative tone the tool uses when it is right. There is no built-in signal that separates knowledge from invention, which is why the errors slip past intelligent, careful readers.

How often do AI hallucinations occur? The numbers

The most rigorous evidence comes from two large empirical studies by researchers at Stanford. The first, Large Legal Fictions, published in the Journal of Legal Analysis in 2024, asked general-purpose models over 800,000 direct, verifiable questions about real court cases. The models answered falsely between 58 per cent of the time (GPT-4) and 88 per cent of the time (Llama 2). The second, Hallucination-Free?, published in the Journal of Empirical Legal Studies in 2025, tested professional legal research tools whose vendors had marketed them as eliminating hallucinations. The specialised tools still hallucinated between 17 and 33 per cent of the time. They were markedly better than the general chatbots, but still wrong far too often to be relied on without checking.

Study | What was tested | Error rate found
Large Legal Fictions (2024) | General chatbots (GPT-4, GPT-3.5, PaLM 2, Llama 2) on verifiable questions about real court cases | 58% to 88%
Hallucination-Free? (2025) | Specialised legal AI research tools built on retrieval technology (Lexis+ AI, Westlaw AI-Assisted Research, Ask Practical Law AI) | 17% to 33%
Hallucination-Free? (2025) | GPT-4 on the same professional research queries, for comparison | 43%

Two caveats belong next to these figures. First, the tests were run on 2023 and 2024 model versions, and newer models have generally improved with each generation, so the exact percentages will keep moving. The direction of the finding has not moved. Every credible study to date lands on the same conclusion: no current tool is reliable enough to skip verification in high-stakes work. Second, the figures come from legal questions because law offers objectively checkable answers, but the underlying mechanism is identical whether the tool is summarising a contract, a market report or a technical standard.

Why the confidence is the real danger

If AI tools flagged their own uncertainty, hallucinations would be a minor nuisance. They do not. The Stanford researchers behind Large Legal Fictions documented two behaviours that should shape how every business uses these tools. First, the models could not reliably predict when their own answers were fabricated. Second, when the researchers asked questions built on a false premise, the models tended to accept the premise and elaborate on it rather than correct it. An AI assistant will often agree with your mistaken assumption and then build a fluent, detailed answer on top of it.

The most famous illustration comes from the first major court case on the issue. In Mata v. Avianca, decided in New York in 2023, a lawyer used ChatGPT for research and received six judicial opinions that did not exist, complete with plausible citations and reasoning. Before filing, he asked the chatbot whether the cases were real. It assured him they were. The court imposed a sanction of 5,000 US dollars and required the lawyers to notify every judge whose name appeared in the fabricated opinions. The lesson generalises well beyond law firms. Asking an AI tool to verify its own output is not verification; it is asking the same faulty process to grade itself.

Three human tendencies make this worse in practice. Polished, structured output triggers our instinct to trust it, an effect that grows the better the writing gets. Confirmation bias means an answer that supports what we already hoped is examined less critically. And verification carries a real cost in time, which quietly disappears from the business case that justified the tool. Any company measuring AI productivity gains without measuring verification effort is overstating the gains.

What have AI hallucinations already cost businesses?

The costs are no longer hypothetical, and they are not confined to one industry or one jurisdiction.

Courts worldwide are sanctioning unverified AI output. Since Mata v. Avianca, a continuously updated research database maintained by the legal researcher Damien Charlotin has logged well over a thousand court decisions across the United States, Canada, the United Kingdom, Australia and many other jurisdictions in which fabricated AI-generated material became an issue. The pattern in the sanction decisions is consistent. Courts rarely punish the use of AI itself. They punish the failure to verify, and the lack of candour once the fabrication is discovered.

Companies are liable for what their customer-facing AI says. In Moffatt v. Air Canada, a Canadian tribunal held the airline liable in 2024 after its website chatbot invented a refund policy that did not exist. A customer relied on it, was refused the refund, and sued. The airline argued that the chatbot was a separate entity responsible for its own statements. The tribunal rejected that outright and treated the chatbot as part of the company’s website, no different from any other page it publishes. The damages were small. The principle was not. A company answers for its AI’s statements as its own.

Professional deliverables are being clawed back. In 2025, Deloitte agreed to a partial refund on a report produced for the Australian government at a cost of roughly 290,000 US dollars, after a researcher identified AI-generated hallucinations in it, including references that did not hold up. For any business that sells analysis, advice or documentation, this is an uncomfortable commercial precedent. Hallucinated content in a paid deliverable is a quality defect the customer can put a price on.

Which business areas are most exposed?

The exposure is highest wherever AI output feeds a decision or a commitment without an independent check in between. Five areas deserve particular attention.

  • Contract review and drafting. A tool can miss the one clause that matters, flag harmless clauses as risks, or summarise an obligation in a way that quietly reverses its meaning. The output reads like diligent analysis either way.
  • Compliance and regulatory questions. Models invent rules that do not exist and omit exceptions that do, and they are weakest precisely where the law is newest or most local, because that is where their training data is thinnest.
  • Customer-facing chatbots. After Moffatt v. Air Canada, an invented policy or price communicated by your bot is, in practice, your representation.
  • Board papers and decision material. A fabricated statistic or misattributed market claim that reaches a board deck acquires institutional authority that is very hard to unwind.
  • Procurement and vendor due diligence. AI-assisted summaries of supplier documentation can state certifications, terms or security postures the underlying documents do not support.

How should a company verify AI-assisted work?

The answer is not to ban the tools. Used well, they deliver real productivity gains, and in most organisations they are already embedded in everyday software whether anyone decided or not. The answer is a verification discipline proportionate to the stakes, anchored in a few rules that anyone can follow.

  • Treat every factual claim in AI output as unverified until a human has checked it against the source, and make the source itself the standard, not the AI's summary of it.
  • Never accept the tool's own confirmation of its accuracy. The Avianca case shows exactly what that assurance is worth.
  • Match the level of checking to the consequences. An internal brainstorm needs none, while a contract, a filing or a customer commitment needs full verification.
  • Assign ownership. A document with an AI-assisted section still has exactly one accountable author, and that author is human.
  • Keep a record of what was AI-assisted and who verified it. This is rapidly becoming the norm that courts and regulators expect.

These rules only hold if they are anchored in something the organisation has actually adopted. That is the role of AI governance: a short policy, an inventory of approved tools, clear user guidelines and training. We have set out the six building blocks in our companion piece, a practical AI governance checklist for companies.

Where does this leave AI in business?

The confident error is a property of the technology, not a passing defect, and it will remain part of the picture even as the error rates fall. That is not an argument against using AI. It is an argument for using it the way businesses use every other powerful tool, with controls that match the risk. The companies that get this right will capture the speed and coverage the tools genuinely offer, while their competitors alternate between naive trust and panicked prohibition.

For the broader question of whether AI will replace legal professionals altogether, see our colleague’s article Why AI will not replace an attorney. This piece asked where AI quietly fails, what those failures already cost, and how to keep them out of your decisions. If your organisation is adopting AI tools, reviewing AI-assisted deliverables or wondering what its chatbot might be promising customers, Nordia Law can help you map the legal, contractual and governance risks before they become incidents.

About the author

Kjell Steffner  ·  Attorney

Partner, Nordia Law Oslo. Technology and IT law, data protection and commercial contracts.

Kjell Steffner advises Norwegian and international businesses on technology and IT contracts, data protection and the practical governance of AI. He is Nordia Law’s AI-responsible partner in Oslo and leads the firm’s work on safe and compliant AI adoption.

Kjell Steffner
Partner, Oslo kjs@nordialaw.com +47 905 11 901