AI in legal practice is broken
What general counsel and managing partners need to know before the next hallucinated citation costs your firm a client

79% of legal professionals now use AI in their work.[1] General-purpose large language models hallucinate on legal queries between 69% and 88% of the time.[2] A global database now tracks over 486 documented cases of AI-fabricated citations in court filings, with 128 lawyers implicated and counting.[3] The firms buying AI tools and hoping for the best are generating malpractice exposure at scale. The firms getting results treat AI as an information infrastructure problem, not a productivity shortcut.
Key Points
AI adoption among US law firms nearly tripled from 11% in 2023 to 30% in 2024, with firms of 500+ attorneys adopting at 47.8%.[4]
Lessons Learned
Audit every AI-generated citation before filing. No tool, including RAG-based legal research platforms, is hallucination-free.
What happens when law firms treat ChatGPT like Westlaw?
The legal profession has a word for fabricated authority that looks real: ghost citations. These are AI-generated case names, docket numbers, and holdings that follow the exact formatting conventions of legitimate case law but refer to decisions that never happened. They often include real judge names assigned to fabricated opinions, real court identifiers attached to nonexistent dockets, and plausible-sounding holdings that support whatever argument the prompt requested.
The danger is not that AI is bad at law. The danger is that AI is good enough at mimicking legal language to fool the people who should know better. A fabricated citation in a brief is not a typo. It is an assertion to a court that binding authority exists. When that authority turns out to be fictional, the attorney who signed the filing has violated Rule 11 of the Federal Rules of Civil Procedure, their ethical duty of candor, and their client's trust.
How do you know your firm has a legal AI problem?
Stanford's RegLab published two peer-reviewed studies that put numbers on this. The first, in January 2024, tested general-purpose models (GPT-4, Claude, Llama) on legal queries and found hallucination rates between 69% and 88%.[2] The models performed worst on lower court case law, the most recent decisions, and the oldest still-applicable precedent. Most did no better than random guessing on precedential relationships between cases.
The second study, published in May 2024 and updated later that year, tested the legal-specific RAG tools that Westlaw and LexisNexis market as hallucination-free alternatives. The results were better, but nowhere close to the claims. Lexis+ AI produced incorrect information more than 17% of the time. Westlaw's AI-Assisted Research hallucinated more than 34% of the time in the updated study.[5] These are the premium products, built on proprietary legal databases, marketed to firms that bill $1,000 an hour. A mid-sized litigation team running 50 queries a day through a tool with a 17% hallucination rate generates roughly 8-9 unreliable results daily. Over a month of working days, that is 170-190 answers that either require manual verification or enter the firm's work product unchecked.
Why is the AI hallucination problem in law getting worse?
The Charlotin database, maintained by a researcher at HEC Paris, tracked the progression from a handful of cases in 2023 to more than 486 documented instances worldwide by late 2025.[3] An estimated 712 judicial decisions worldwide now address AI hallucinations; 90% of them were issued in 2025 alone.[6] Three factors drive the acceleration.
First, adoption outpaces education. The ABA's 2024 survey found that AI adoption nearly tripled year-over-year, from 11% to 30%.[4] At the same time, 53% of firms have no AI policy or their attorneys are unaware of one.[1] Attorneys are experimenting with tools they do not understand in matters where the stakes are measured in careers and client outcomes.
Second, general-purpose tools are replacing legal-specific ones. Clio's 2025 Legal Trends Report found that only 40% of legal professionals now use legal-specific AI, down from 58% in 2024.[1] The shift toward free or low-cost general-purpose models (ChatGPT, Gemini, Perplexity) puts attorneys further from the verified legal databases that reduce hallucination risk.
Third, the AI tools are confident even when they are wrong. Stanford's research found that LLMs "often lack self-awareness about their errors and tend to reinforce incorrect legal assumptions and beliefs."[2] When attorney Steven Schwartz asked ChatGPT if the cases it generated were real, ChatGPT doubled down: "Yes, Varghese v. China Southern Airlines is a real case" and "can be found on legal research databases such as Westlaw and LexisNexis."[8] The tool does not know what it does not know.
What do lawyers and judges say about AI in legal practice?
The ABA's 2024 survey of 512 attorneys found that 74.7% identified accuracy as their most pressing concern about AI implementation. Reliability followed at 56.3%, and data privacy at 47.2%.[4] Only 17.4% said they did not know enough about AI to answer the survey's question about perceived benefits, down from the prior year, suggesting the profession is aware of the risk even as it adopts the tools.
Bloomberg Law's 2025 State of Practice Survey documented a stark expectations gap. In 2024, nearly 39% of law firm attorneys expected AI to accelerate adoption of alternative fee arrangements. By 2025, only 9% reported actually seeing that shift.[9] Across every category surveyed, the percentage of lawyers observing AI-driven changes was smaller than the percentage who had predicted them the year prior. The hype arrived ahead of the results.
From the bench, patience is running out. A California Court of Appeal published its opinion in Noland v. Land of the Free as "a warning," stating that no filing submitted to any court should contain citations the responsible attorney has not personally read and verified.[10] Judges have moved from surprise to exasperation to formal sanctions regimes in under three years.
What does it cost when AI gets it wrong in court?
The consequences have escalated from warnings to career-altering penalties. Each case that follows shares a common pattern: an attorney used AI without a verification pipeline, and the absence of that infrastructure converted a time-saving tool into a professional liability.
Mata v. Avianca: the case that started it all
In June 2023, Judge P. Kevin Castel of the Southern District of New York sanctioned attorneys Steven Schwartz and Peter LoDuca of Levidow, Levidow & Oberman $5,000 for submitting a brief containing six entirely fabricated cases generated by ChatGPT. The fabricated cases included fictitious airlines, invented docket numbers, and what Judge Castel described as legal analysis that was "gibberish."[8] The memorable detail: when Schwartz asked ChatGPT to confirm the cases existed, it assured him they were real and could be found on Westlaw and LexisNexis. He asked a second time. ChatGPT apologized for any confusion and doubled down. Schwartz testified he was "operating under the false perception that this website could not possibly be fabricating cases on its own."[8] The attorneys were also ordered to write letters to every judge whose name appeared on a fake opinion, attaching the sanctions order and transcripts.[11]
ByoPlanet v. Johansson: the $86,000 ceiling
In August 2025, the Southern District of Florida imposed nearly $86,000 in sanctions against plaintiffs' counsel in ByoPlanet International v. Johansson and Gilstrap, the largest AI-related sanction in legal history at the time of issuance.[6] The attorney admitted to using ChatGPT and other AI tools to draft complaints, motions, and appellate briefs across eight related cases. The court cited "repeated, systemic and bad-faith misuse of generative AI, despite multiple warnings."[6] This was not a one-off accident. It was a practice pattern. The sanctions were large enough to trigger insurance scrutiny, partner-level exposure, and reputational damage that extends well beyond the dollar figure.
Chicago Housing Authority: ghost citations in a multi-million-dollar verdict
In the summer of 2025, attorneys for the Chicago Housing Authority cited the Illinois Supreme Court case Mack v. Anderson in a post-trial motion to reconsider a multi-million-dollar verdict. Mack v. Anderson does not exist.[12] At a special hearing, the responsible attorney said she did not think ChatGPT was capable of creating false precedent. She was removed from her position: at the time, using AI violated her employer's policy.
What do the patterns tell us?
The Charlotin database documents 324 hallucination cases in US courts alone, with 128 lawyers and 2 judges implicated.[3] In the first two weeks of August 2025, three separate federal courts sanctioned lawyers for AI hallucinations.[13] One involved an attorney who used a well-known legal research database that still produced fabricated citations.[13]
The common thread across every documented failure: the attorney did not have a system for verifying AI output before it reached the court. The tool was not the variable. The absence of a verification pipeline was.
What happens when every attorney in the firm is using unverified AI?
The individual sanctions cases are visible. The systemic costs are not. When a firm deploys AI without data infrastructure, the damage compounds silently across the entire practice.
What do practitioners actually report?
Embroker's 2024 survey of over 200 American lawyers found that 41% reported concerns about data privacy related to AI adoption.[14] The concern is not abstract. Publicly available LLMs process prompts on external servers. Feeding client-privileged documents into a general-purpose model is not a configuration issue. It is a potential breach of attorney-client privilege with no technical remediation.
Larger firms have tried to solve this by subscribing to providers like Harvey or Luminance, or by building in-house models. But the gap between large firms and small firms is widening. The Federal Bar Association found that firms with 51+ attorneys use AI at roughly double the rate of smaller firms.[15] The price of firm-ready AI systems is the barrier. For the firms that cannot afford enterprise tools, the alternative is general-purpose models with sensitive data redacted, a process that adds enough friction to negate the efficiency gains AI was supposed to deliver.
How often does the error actually occur?
Stanford's first study found that general-purpose LLMs hallucinate on legal queries between 69% and 88% of the time.[2] Models performed worst on questions about a court's core holding, hallucinating at least 75% of the time.[2] On precedential relationships between cases, "most LLMs do no better than random guessing."[2]
A 17% error rate from a legal-specific RAG tool across a team of 20 attorneys running 10 queries each per day means roughly 34 unreliable results per day. Over a 250-day work year, that is 8,500 answers that are wrong, misgrounded, or incomplete. Some of those answers will make it into briefs, memos, and client advisories. The ones caught internally cost billable hours. The ones caught by opposing counsel cost reputations. The ones caught by judges cost careers.
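These projections, like the monthly figures earlier, are simple expected-value arithmetic. A minimal sketch in Python, assuming (as a simplification) that every query fails independently at a fixed published error rate:

```python
# Back-of-envelope model of unreliable AI answers at team scale.
# Simplifying assumption: each query fails independently at a fixed
# rate; real hallucination rates vary by query type and complexity.

def expected_unreliable(queries_per_day: int, error_rate: float, days: int = 1) -> float:
    """Expected count of unreliable answers over a period."""
    return queries_per_day * error_rate * days

# Mid-sized team: 50 queries/day at the 17% Lexis+ AI figure
print(expected_unreliable(50, 0.17))         # 8.5 per day
print(expected_unreliable(50, 0.17, 22))     # ~187 per working month

# 20 attorneys x 10 queries each, over a 250-day work year
print(expected_unreliable(200, 0.17))        # 34.0 per day
print(expected_unreliable(200, 0.17, 250))   # 8500.0 per year
```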
What happens when nobody fixes the data pipeline?
The compounding problem in legal AI is not that errors occur. It is that errors feed forward. A hallucinated case enters a brief. The brief informs an internal memo. The memo shapes a client advisory. The client advisory influences a business decision. At each step, the fabricated authority gains a layer of institutional credibility that makes it harder to detect and more damaging when it surfaces.
What does the data say about AI failure rates at enterprise scale?
Bloomberg Law's 2025 analysis found that across every operational category surveyed, the vast majority of law firm respondents reported "no change" from AI implementation.[9] The anticipated productivity gains, billing practice shifts, and workload improvements predicted in 2024 have not materialized for most firms. The gap between expectations and outcomes is the clearest signal that the problem is not a tool problem. It is an infrastructure problem.
Adoption keeps climbing regardless. A global Thomson Reuters survey found that the share of legal organizations actively integrating generative AI rose from 14% in 2024 to 26% in 2025, with 45% of firms using it or planning to make it central to their workflow within one year.[14] Firms are doubling down on tools that have not demonstrated enterprise-level reliability. The investment thesis is hope, not evidence.
How does the problem compound over time?
Consider the feedback loop. An attorney uses AI to draft a research memo. The memo contains a misgrounded citation, a real case cited for a proposition it does not actually support. This is the type of hallucination that even RAG tools produce 17% of the time.[5] The memo enters the firm's knowledge management system. Six months later, another attorney retrieves that memo for a related matter and relies on the citation without re-checking the original source. The misgrounded authority now appears in two matters. If it enters a client-facing document, the error propagates outside the firm entirely.
This is the hidden cost. Not the $5,000 sanction. Not the nearly $86,000 penalty. The silent accumulation of unreliable work product across the firm's institutional memory. Without a pipeline that verifies citations, grounds AI output in authoritative sources, and flags degraded or superseded authority, every AI interaction is a potential seed of compounding error.
Why is the real problem your knowledge infrastructure, not your AI model?
The standard narrative in the legal AI market frames the problem as a model quality issue. Use a better model. Use a legal-specific model. Use a model with RAG. Stanford's research disproves this framing. Their evaluation of legal-specific RAG tools found that even the most sophisticated retrieval-augmented systems still hallucinate at rates between 17% and 34%.[5] The same underlying model technology, when combined with different data infrastructure, produces wildly different outcomes.
Harvey AI illustrates the infrastructure argument from the other direction. The company, valued at $8 billion as of December 2025, counts 50 of the AmLaw 100 firms as customers and has surpassed $100 million in annual recurring revenue.[16] Harvey builds on the same foundation models (OpenAI, Anthropic) that generate hallucinations when used without infrastructure. The difference is the data layer: firm-specific document collections, proprietary knowledge bases, and citation-grounding pipelines that tether AI output to verifiable legal authority.
When Allen & Overy (now A&O Shearman) first trialed Harvey in November 2022, 3,500 lawyers used it for around 40,000 queries. David Wakeling, head of the firm's Markets Innovation Group, was clear about the constraint: "You must validate everything coming out of the system. You have to check everything."[17] The firm treated the tool as infrastructure that required a verification layer, not as an oracle that replaced attorney judgment.
The contrast is instructive. The sanctioned attorneys in every documented case used AI without infrastructure. They asked a general-purpose model to do legal research, received fabricated output, and filed it without verification. The firms succeeding with AI built or purchased pipelines that retrieve from verified legal databases, ground citations in primary authority, and flag output that cannot be traced to a source. Same models. Different data architecture. Opposite outcomes.
What does a successful legal AI implementation look like?
Harvey's approach breaks down into visible components. The platform customizes models on firm-specific data by ingesting proprietary documents that remain private to each firm.[18] It provides source attribution so attorneys can trace any AI-generated statement back to the document it came from.[17]
Gina Lynch, Paul Weiss's chief knowledge and innovation officer, noted that the firm was not using hard metrics like time saved to assess productivity gains, because "the time and effort needed to carefully review the output" made simple efficiency claims misleading.[17] That framing is the correct one. The value is not that AI makes lawyers faster. The value is that a properly built pipeline makes AI output trustworthy enough to use.
How much does broken legal AI cost?
The visible costs are documented and growing. Sanctions have escalated from $5,000 in 2023[8] to nearly $86,000 in 2025.[6] A California court imposed a $10,000 sanction on an attorney and also declined to award attorneys' fees to opposing counsel who failed to detect the fabricated citations, establishing that detection is becoming an expected professional competency.[10] A Colorado attorney received a 90-day suspension.[3]
The invisible costs are larger. Every fabricated citation caught before filing required attorney time to detect and correct. Every AI-generated draft that requires paragraph-by-paragraph verification against primary sources negates the time savings the tool was supposed to provide. The ABA found that firms cite "saving time/increasing efficiency" as the primary perceived benefit of AI at 54.4%.[4] If verification overhead consumes the efficiency gain, the tool becomes an expensive liability.
Malpractice insurance adds another dimension. Professional liability policies do not always cover losses from AI tool failures.[7] Insurers argue that blindly relying on AI output does not constitute a "professional service" under policy definitions, which means the claim falls outside coverage. Munich Re has begun offering AI-specific coverage, but with due diligence requirements on the AI systems used.[7] In the broader insurance market, some property policies have begun capping AI-related losses at $500,000 even on $10 million face-amount policies, a trend that is extending to professional liability.[7] For a firm handling complex litigation, a single uncaught hallucination in a high-stakes matter could generate losses that exceed available coverage.
How do you build a legal AI pipeline that actually works?
The pattern across every failure case and every success story points to the same conclusion: the differentiator is not the model. It is the data pipeline. The firms getting results from AI invested in three things that most implementations skip.
1. Verified retrieval, not open-ended generation
Most legal AI failures occur because an attorney asked a general-purpose model to generate research without connecting it to a verified legal database. The model draws from training data, which includes outdated, incomplete, or misremembered legal authority. A proper pipeline retrieves from curated, current legal sources before the model generates anything. At Tricky Wombat, we build retrieval pipelines that ground every response in primary authority. The system does not generate a legal proposition unless it can point to the source document that supports it. When the source cannot be identified, the system says so rather than fabricating one.
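Reduced to its skeleton, the control flow is: retrieve first, generate only from retrieved sources, refuse when nothing verifiable comes back. The sketch below is illustrative, not production code; `search_verified_corpus` and `generate_grounded_answer` are hypothetical stand-ins for a curated retrieval index and a grounded LLM call, stubbed so the flow is runnable.

```python
# Minimal sketch of retrieve-then-generate with an explicit refusal path.
# `search_verified_corpus` and `generate_grounded_answer` are hypothetical
# stand-ins for a firm's retrieval index and constrained LLM call.

from dataclasses import dataclass

@dataclass
class SourceDoc:
    citation: str      # e.g., a reporter citation for a real opinion
    text: str          # the passage retrieved from the verified corpus
    relevance: float   # retrieval score assigned by the index

def search_verified_corpus(query: str) -> list[SourceDoc]:
    """Stub: in production, queries a curated, current legal index."""
    return []

def generate_grounded_answer(query: str, context: list[SourceDoc]) -> str:
    """Stub: in production, an LLM call constrained to cite `context`."""
    return f"[answer grounded in {len(context)} verified sources]"

def answer_with_grounding(query: str, min_relevance: float = 0.75) -> str:
    sources = search_verified_corpus(query)
    usable = [s for s in sources if s.relevance >= min_relevance]
    if not usable:
        # No verifiable authority: say so instead of letting the model improvise.
        return "No supporting primary authority found for this query."
    return generate_grounded_answer(query, context=usable)
```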
2. Citation verification as a system requirement, not a human afterthought
The verification step in most legal AI workflows is "the attorney checks it." That is not a system. It is a hope. Human review degrades under volume and time pressure, the exact conditions that drive AI adoption in the first place. A pipeline-level solution runs automated citation verification before output reaches the attorney. Every case name, docket number, and holding is checked against authoritative databases. Citations that cannot be verified are flagged, not passed through. Tricky Wombat's pipeline treats citation verification as infrastructure. Every reference is traced to its source, checked for currency, and validated against the cited proposition before delivery.
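A minimal version of that gate might look like the following. The regex is deliberately crude and the `lookup_citation` check is a hypothetical stand-in for a citator query; production citation extraction is far more involved.

```python
# Sketch of a pipeline-level citation gate: extract citations from a
# draft, check each against an authoritative index, and flag anything
# unverifiable before the draft reaches the attorney.

import re

# Crude pattern for reporter citations like "678 F.Supp.3d 443".
CITATION_RE = re.compile(r"\d+\s+[A-Za-z][A-Za-z0-9.]*\s+\d+")

def lookup_citation(cite: str) -> bool:
    """Stub: in production, queries a citator or verified case database."""
    verified_index = {"678 F.Supp.3d 443"}  # placeholder index
    return cite in verified_index

def unverified_citations(draft: str) -> list[str]:
    """Return every citation that could not be verified; an empty list
    is a precondition for the draft leaving the pipeline."""
    return [c for c in CITATION_RE.findall(draft)
            if not lookup_citation(c)]

print(unverified_citations(
    "See Mata v. Avianca, 678 F.Supp.3d 443; but cf. 123 Fake.Rptr.2d 456."
))  # -> ['123 Fake.Rptr.2d 456']
```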
3. Domain-specific knowledge architecture, not generic embeddings
General-purpose RAG systems retrieve documents by semantic similarity. Legal reasoning requires more: hierarchical authority (binding vs. persuasive), jurisdictional relevance, temporal currency (good law vs. overruled), and procedural context. A document that is semantically similar to a query could be from the wrong jurisdiction, superseded by a later decision, or relevant to a different procedural posture. Tricky Wombat builds legal knowledge architectures that encode these distinctions into the retrieval layer. The system understands that a 2024 Ninth Circuit opinion on the same topic does not override a 2019 Supreme Court holding, and it structures its retrieval accordingly.
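A sketch of what encoding those distinctions can look like at the retrieval layer; the metadata fields are illustrative assumptions about what a system attaches to each document at ingestion, not a description of any particular product.

```python
# Sketch of authority-aware filtering layered on top of semantic
# retrieval. The point: legal metadata, not just embedding similarity,
# decides what the model is allowed to see.

from dataclasses import dataclass
from datetime import date

@dataclass
class Authority:
    citation: str
    court: str          # e.g., "U.S. Supreme Court", "9th Cir."
    jurisdiction: str   # controlling jurisdiction for the matter
    decided: date
    binding: bool       # binding vs. merely persuasive here
    good_law: bool      # False once overruled or superseded

def rank_authorities(hits: list[Authority], jurisdiction: str) -> list[Authority]:
    """Drop bad or out-of-jurisdiction law, then rank binding authority
    first and more recent decisions before older ones."""
    usable = [a for a in hits if a.good_law and a.jurisdiction == jurisdiction]
    return sorted(usable, key=lambda a: (a.binding, a.decided), reverse=True)
```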
The operational principle across all three steps: the pipeline improves continuously. New case law enters the knowledge base as it is published. Citations are re-verified against current authority. Retrieval performance is measured against known-correct answers. The system that launched six months ago is less accurate than the system running today, because the infrastructure learns from every query.
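One way to make "measured against known-correct answers" concrete is a standing regression suite of query-to-citation pairs. A minimal recall check, assuming a gold set curated by the firm's knowledge management team (the entries below are placeholders, not real evaluation data):

```python
# Sketch of a retrieval regression test: for each query in a curated
# gold set, check whether the known-correct citations appear in the
# top-k retrieved results. Run on every index or pipeline change.

def recall_at_k(retrieved: list[str], relevant: set[str], k: int = 5) -> float:
    """Fraction of known-correct citations found in the top-k results."""
    found = sum(1 for cite in retrieved[:k] if cite in relevant)
    return found / len(relevant) if relevant else 1.0

# Placeholder gold set: queries mapped to citations a correct system
# must surface.
gold_set = {
    "montreal convention personal injury preemption": {"CITE-A", "CITE-B"},
}

for query, relevant in gold_set.items():
    retrieved = ["CITE-A", "CITE-X", "CITE-B"]  # stand-in for a live retrieval call
    print(query, recall_at_k(retrieved, relevant, k=5))  # -> 1.0
```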
The bottom line
Every AI failure in a courtroom follows the same pattern. An attorney treated a language model as a research tool. It is not one. A language model is a text generation system that produces plausible-sounding output based on statistical patterns. When connected to verified legal databases through a properly built pipeline, it becomes useful. When disconnected from that infrastructure, it becomes a liability generator.
The legal profession is entering a period where not using AI will become as risky as using it carelessly. Thomson Reuters' chief product officer has said that "in a few years, it will be almost malpractice" for lawyers not to use AI.[19] LegalWeek 2026, held this month in New York, is asking the question directly in its session titled "If You're Not Using AI, Are You Committing Malpractice?"[20] Harvey AI CEO Winston Weinberg is on the agenda arguing that the bar has permanently moved for client expectations around AI-enabled legal services.[20]
The firms that win in this environment are the ones that understood, before the sanctions hit, that AI is not a tool you buy. It is an infrastructure you build.
References
[1] Clio, "2025 Legal Trends Report," October 2025. https://www.2civility.org/2025-clio-legal-trends-report/
[2] Dahl, M., Magesh, V., Suzgun, M., and Ho, D.E., "Large Legal Fictions: Profiling Legal Hallucinations in Large Language Models," Stanford RegLab and HAI, January 2024. https://law.stanford.edu/2024/01/11/hallucinating-law-legal-mistakes-with-large-language-models-are-pervasive/
[3] Charlotin, D., AI Hallucination Cases Database, HEC Paris, October 2025. Referenced in Cronkite News, "As more lawyers fall for AI hallucinations, ChatGPT says: Check my work," October 28, 2025. https://cronkitenews.azpbs.org/2025/10/28/lawyers-ai-hallucinations-chatgpt/
[4] American Bar Association, "2024 Legal Technology Survey Report," March 2025. https://www.americanbar.org/groups/law_practice/resources/tech-report/2024/2024-artificial-intelligence-techreport/
[5] Magesh, V., Surani, F., Dahl, M., Suzgun, M., Manning, C.D., and Ho, D.E., "Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools," Stanford RegLab, Journal of Empirical Legal Studies, 2025. https://reglab.stanford.edu/publications/hallucination-free-assessing-the-reliability-of-leading-ai-legal-research-tools/
[6] VinciWorks, "When AI hallucinates and lawyers pay: The $86K legal wake-up call," January 6, 2026. https://vinciworks.com/blog/when-ai-hallucinates-and-lawyers-pay-the-86k-legal-wake-up-call/
[7] Braff, D., "Does your professional liability insurance cover AI mistakes? Don't be so sure," ABA Journal, February-March 2025. https://www.abajournal.com/magazine/article/does-your-professional-liability-insurance-cover-ai-mistakes-dont-be-so-sure
[8] Mata v. Avianca, Inc., 678 F.Supp.3d 443 (S.D.N.Y. 2023). https://en.wikipedia.org/wiki/Mata_v._Avianca,_Inc.
[9] Bloomberg Law, "AI in Law Firms: 2024 Predictions; 2025 Perceptions," August 2025. https://news.bloomberglaw.com/bloomberg-law-analysis/analysis-ai-in-law-firms-2024-predictions-2025-perceptions
[10] Noland v. Land of the Free, L.P., California Court of Appeal, 2nd District, September 2025. https://www.lawnext.com/2025/09/a-new-wrinkle-in-ai-hallucination-cases-lawyers-dinged-for-failing-to-detect-opponents-fake-citations.html
[11] Seyfarth Shaw LLP, "Update on the ChatGPT Case: Counsel Who Submitted Fake Cases Are Sanctioned," June 2023. https://www.seyfarth.com/news-insights/update-on-the-chatgpt-case-counsel-who-submitted-fake-cases-are-sanctioned.html
[12] HeplerBroom, "Practical Guidelines for How Attorneys Can Use AI," 2025. https://www.heplerbroom.com/blog/an-illinois-court-responds-to-hallucinated-cases
[13] Jones Walker LLP, "From Enhancement to Dependency: What the Epidemic of AI Failures in Law Means for Professionals," August 2025. https://www.joneswalker.com/en/insights/blogs/ai-law-blog/from-enhancement-to-dependency-what-the-epidemic-of-ai-failures-in-law-means-for.html
[14] Best Law Firms / Thomson Reuters, "What's Really Stopping Law Firms From Going All in on AI," December 2025. https://www.bestlawfirms.com/articles/the-ai-adoption-curve-in-law/7196
[15] Federal Bar Association / AffiniPay, "The Legal Industry Report 2025," April 2025. https://www.fedbar.org/blog/the-legal-industry-report-2025/
[16] TechCrunch, "Legal AI startup Harvey confirms $8B valuation," December 4, 2025. https://techcrunch.com/2025/12/04/legal-ai-startup-harvey-confirms-8b-valuation/
[17] "Harvey (software)," Wikipedia. https://en.wikipedia.org/wiki/Harvey_(software)
[18] Fortune, "Harvey raises $300 million at $5 billion valuation," June 23, 2025. https://fortune.com/2025/06/23/harvey-raises-300-million-at-5-billion-valuation-to-be-legal-ai-for-lawyers-worldwide/
[19] ABA Journal, "Will generative AI ever fix its hallucination problem?", 2024. https://www.americanbar.org/groups/journal/articles/2024/will-generative-ai-ever-fix-its-hallucination-problem/
[20] LegalWeek New York 2026 Agenda, March 9-12, 2026. https://www.event.law.com/legalweek/2026-agenda
By Tricky Wombat
Last Updated: Mar 29, 2026