Your AI support bot isn't stupid
How the companies winning at AI customer support invest in their knowledge pipeline, not a bigger model

A man asks a car dealership's AI chatbot to sell him a $58,195 Chevy Tahoe for one dollar. The chatbot agrees. "That's a deal," it writes back. "No takesies backsies." The screenshot racks up twenty million views. The dealership's chatbot had no product catalog, no pricing rules, no knowledge base of any kind. It was a language model with a car dealer's logo and nothing underneath.[1]

That failure was not just a prank gone wrong. It was the most visible symptom of a structural problem now repeating across every industry deploying AI without knowledge discipline. Seventy-eight percent of all stored data is unstructured, growing at 16% annually toward 10.5 zettabytes.[2] Knowledge workers lose 35% of their productive time (21% searching for information, 14% recreating knowledge that already exists somewhere), costing a Fortune 500 company with $9 billion in revenue an estimated $2.4 billion per year in lost enterprise value.[3] Sixty-four percent of customers say they'd prefer companies didn't use AI for customer service at all.[4] The AI model isn't the problem. The knowledge pipeline feeding it is.
Key Points
- The same AI model (Gemini 1.0 Pro) produced 50% accuracy with basic retrieval and 87% accuracy with an optimized knowledge pipeline on 30 postoperative clinical queries — a 37-percentage-point swing from pipeline design alone.[5]
Lessons Learned
- Audit your knowledge base for contradictions before connecting any AI system to it. Synthesia discovered during beta that conflicting articles were the primary cause of bad answers — not the model.[9]
What is the AI customer support knowledge pipeline problem?
Every major AI customer support failure in the last two years shares a root cause that has nothing to do with model intelligence. Air Canada's chatbot had access to the correct bereavement fare policy and still told a grieving customer the opposite.[15] New York City's MyCity chatbot ingested 2,000+ official government pages and told business owners to break the law — for two years.[16] A Chevrolet dealership's chatbot, connected to no product knowledge at all, agreed to sell a $58,195 SUV for one dollar.[1]
The pattern has a name in the AI engineering literature: knowledge pipeline failure. It is the gap between what an organization knows and what its AI can reliably retrieve, validate, and deliver. Researchers presenting at the IEEE/ACM CAIN conference identified seven distinct failure points in retrieval-augmented generation (RAG) systems — the architecture behind nearly every AI support product on the market — and found that four of those seven originate in the retrieval and knowledge layer, not in the language model itself.[17]
This is the AI equivalent of building a brilliant reference librarian, then locking them in a room full of outdated, contradictory, and misfiled documents. The librarian isn't the problem. The library is.
How do you measure whether your AI knowledge pipeline is broken?
The clearest signal comes from a 2025 peer-reviewed study in JMIR that held the AI model constant and varied only the knowledge retrieval approach.[8] RAG systems connected to curated, domain-specific knowledge bases produced 0% hallucination rates using GPT-4 and 6% using GPT-3.5. The same models using Google Search as a retrieval layer hallucinated at 6% and 10%, respectively. Conventional chatbots without structured retrieval hallucinated approximately 40% of the time.[8]
The Vectara Hallucination Evaluation Model leaderboard, which benchmarks major LLMs on factual consistency, shows best-case hallucination rates between 0.7% and 1.5% — and worst-case rates above 9.2%.[18] A 1% hallucination rate sounds negligible. Multiply it by 1,000 employees asking 10 questions a day and you get 100 fabricated answers every single day circulating as institutional knowledge.
A second peer-reviewed study, published in 2025 in PMC/NIH, tested four chunking configurations with the same model (Gemini 1.0 Pro) on 30 postoperative clinical queries.[5] The basic retrieval configuration produced 50% accuracy. The optimized pipeline — same model, better retrieval architecture — produced 87% accuracy (p = 0.001) and achieved the highest relevance at 93%.[5] The model was the constant. The pipeline was the variable.
Why is the knowledge pipeline problem getting worse?
Three forces are compounding simultaneously.
First, data volume. IDC projects that stored data will grow from 5.5 zettabytes to 10.5 zettabytes by 2028 at a 16% compound annual growth rate, with 78% of it unstructured.[2] Every month your knowledge base grows, the ratio of useful signal to noise gets worse — unless you have active governance.
Second, deployment speed. Eighty-five percent of customer service leaders plan to explore or pilot customer-facing conversational generative AI in 2025.[10] But 61% of those same leaders report a backlog of knowledge articles awaiting edits, and one-third have no formal process for revising outdated content.[10] They are deploying AI on top of knowledge debt.
Third, institutional knowledge loss. Forty-two percent of workplace knowledge is unique to individual employees, not captured in any system.[19] When those employees leave, the knowledge leaves with them — and the AI has no way to know what it doesn't know. Panopto estimates this costs large businesses $47 million per year in lost productivity, based on a survey of 1,001 U.S. workers.[19] That figure is from 2018; the problem has only scaled with AI-driven workflows that amplify whatever knowledge (or lack of it) exists.
What do customers and agents actually say about AI support?
The customer verdict is unambiguous. Sixty-four percent of customers would prefer companies didn't use AI for customer service, and 53% say they'd switch to a competitor over it.[4] In a 2025 Kinsta/Propeller Insights survey of 1,011 U.S. consumers, 93.4% said it was important to reach a human agent, 84% said humans are more accurate than AI, and 49.6% said they'd cancel a service entirely because of AI-driven support.[20]
The agent verdict is equally clear but points in a different direction. In the seventh edition of Salesforce's State of Service report, which surveyed 6,500 professionals across 40 countries, 71% of agents with AI tools said AI creates growth opportunities, and AI now handles 30% of all service interactions, a share projected to hit 50% by 2027.[21] Agents aren't opposed to AI. They're drowning in the consequences of bad implementations. In the sixth edition, 58% of underperforming agents reported toggling between multiple screens to find information, 77% reported increasing workloads, and 69% said they couldn't balance speed with quality.[22]
The Zendesk CX Trends 2026 report, surveying 11,297 people across 22 countries, found a credibility gap that captures the entire problem in two numbers: 95% of consumers say it's important that AI explain its reasoning, but only 37% of companies provide explanations.[23] Customers don't trust AI because AI hasn't earned trust. It hasn't earned trust because the knowledge pipeline feeding it doesn't produce trustworthy answers.
---
What happens when AI customer support gets the knowledge pipeline wrong?
The failure cases are not hypotheticals. They are tribunal rulings, viral humiliations, and six-figure government write-offs. Each maps to a distinct failure mode in the knowledge pipeline, and together they form a taxonomy of what goes wrong when organizations treat AI deployment as a model problem rather than a knowledge problem.
Air Canada: the chatbot that contradicted its own source
In November 2022, Jake Moffatt's grandmother died. He went to the Air Canada website and asked the chatbot about bereavement fares. The chatbot linked him directly to Air Canada's bereavement fare policy page — then told him he could book a full-price ticket and apply for a retroactive bereavement discount within 90 days. The actual policy, on the page the chatbot linked to, said no retroactive claims were accepted.[15]
Moffatt booked the ticket. Applied for the discount. Was denied. When he took the case to the British Columbia Civil Resolution Tribunal, Air Canada argued that its chatbot was "a separate legal entity" responsible for its own accuracy. The tribunal rejected this outright. In a February 2024 ruling, the tribunal held Air Canada responsible and ordered it to pay Moffatt C$812.02, stating that "it makes no difference whether the information comes from a static page or a chatbot."[15]
The knowledge existed. The retrieval worked. The synthesis failed. This is the most dangerous pipeline failure mode because it looks correct — it even cites its source — while delivering the opposite of what the source says.
New York City's MyCity: two years of illegal advice at taxpayer expense
In October 2023, New York City launched MyCity, an AI chatbot built on over 2,000 pages of official city guidance, designed to help small business owners navigate regulations.[16] By March 2024, The Markup had documented the chatbot advising business owners that they could take a cut of employee tips (illegal under federal and state law), refuse to accept Section 8 housing vouchers (a violation of the NYC Human Rights Law), accept cash exclusively to avoid audit trails (tax evasion), and change locks on tenants without notice (illegal eviction).[16]
The chatbot ran for over two years before Mayor Mamdani shut it down in February 2026, calling it "functionally unusable."[16] The project cost approximately $600,000.[16] The failure mode was different from Air Canada's: MyCity ingested the correct documents but lacked validation logic to check whether its synthesized advice contradicted the law it was supposed to explain.
Chevrolet of Watsonville: the $1 Tahoe
In December 2023, Chris Bakke posted a screenshot to social media showing a conversation with the Chevrolet of Watsonville dealership chatbot.[1] He had asked the chatbot to agree to sell him a 2024 Chevy Tahoe — base MSRP approximately $58,195 — for one dollar. The chatbot agreed. "That's a deal, and that's legally binding," it replied, adding, "no takesies backsies."[1]
The chatbot was built by Fullpath, whose CEO Aharon Horwitz publicly acknowledged the incident.[1] The post was viewed over 20 million times. No lawsuit was filed — unlike Air Canada, there was no tribunal ruling — but the reputational damage was immediate and viral. The failure mode here was the simplest of all: the chatbot had no product knowledge, no pricing guardrails, and no grounding in any source material. It was a language model with a car dealership's logo and nothing else.
DPD: the chatbot that wrote poetry about how terrible DPD is
In January 2024, customer Ashley Beauchamp contacted DPD's chatbot about a missing parcel. A system update had severed the chatbot's connection to its knowledge base. Without grounding, the chatbot swore, wrote a poem about how terrible DPD was, and criticized the company at length.[24] Beauchamp posted the exchange to X, where it accumulated over 1.6 million views.[24] DPD disabled the AI element immediately.[24]
This is the pipeline failure mode that most people laugh about — until they calculate the customer lifetime value of every person who saw it.
What do the numbers say about the pattern?
These are not four unlucky companies. They are the visible tip of a systemic pattern. McKinsey surveyed 440 customer care leaders and found that companies leading with AI saw 40% improvement in customer experience — but laggards saw only 12%.[25] The difference was not model selection. It was foundational AI maturity: 67% of leaders had invested in foundational data and AI infrastructure versus 16% of laggards.[25]
The Forrester 2024 U.S. Customer Experience Index reported that customer experience quality declined for a third consecutive year, with 39% of brands declining.[26] Qualtrics calculated that bad customer experiences put $3.8 trillion in global sales at risk — a $119 billion increase year over year — and that 53% of consumers cut spending after a single bad experience.[27]
The race to deploy AI in customer support is producing worse outcomes at industrial scale.
---
How does the AI knowledge pipeline problem compound across an enterprise?
The damage from a broken knowledge pipeline doesn't stay in the support queue. It metastasizes.
When a chatbot gives a wrong answer, the customer who received it doesn't flag it in your knowledge base — they just leave. The wrong answer stays in the system. The next customer gets the same wrong answer. If an agent corrects the answer manually in a ticket, that correction lives in the ticketing system, not in the knowledge base the AI reads. The knowledge base remains wrong. The AI keeps serving the wrong answer. The agent corrects it again. And again. This is the feedback loop that turns a single knowledge error into thousands of hours of wasted agent time.
What do users actually complain about?
The complaints are remarkably consistent. The Salesforce State of Service reports, across two editions surveying a combined 12,000+ professionals, surface the same theme: agents cannot find what they need.[21][22] Fifty-eight percent of underperforming agents toggle between multiple screens looking for answers.[22] The Zendesk CX Trends 2025 report, surveying 5,100 consumers and 5,400 business leaders across 22 countries, found a 70% gap between AI deployment ambition and customer-facing AI readiness.[28] The gap is not technical capability — the models are powerful enough. The gap is knowledge infrastructure.
How often does the technical failure actually occur?
More often than the headline cases suggest. The Vectara leaderboard shows that even the best-performing LLMs hallucinate at 0.7%–1.5% baseline rates, with models under production pressure reaching 9.2%.[18] RAG reduces hallucination by approximately 71% compared to ungrounded generation,[18] but the remaining error rate still compounds. A company handling 5,000 AI-assisted interactions daily at a 2% error rate produces 100 incorrect answers per day. Over a quarter, that's 9,000 wrong answers — some of them now embedded in customer expectations, agent workarounds, and downstream documentation.
---
What happens when nobody fixes the knowledge pipeline?
The industry-level numbers are stark. MIT Media Lab's Project NANDA, analyzing 300 public AI deployments alongside interviews and surveys, found that 95% of generative AI pilots fail to deliver measurable profit-and-loss impact.[6] Only 5% of custom enterprise AI tools reach production.[6] Gartner predicted in July 2024 that 30% of generative AI projects would be abandoned after proof of concept by end of 2025, citing a survey of 822 business leaders in which only 15.8% reported revenue increases and only 15.2% reported cost savings from their AI initiatives.[29]
The primary cause, across every major analyst report, is not model capability. It is data.
What does the data say about AI project failure rates?
Gartner's May 2024 survey found that only 48% of AI projects reach production, with the average prototype-to-production timeline stretching to eight months.[30] Informatica's CDO Insights 2025 report identified the top obstacles: 43% cite data quality, 43% cite data governance, and 35% cite lack of AI-ready data.[31] These are not three separate problems. They are the same problem: the knowledge pipeline is not engineered for the demands AI places on it.
Data quality failures cost enterprises an average of $12.9 million per year, according to Gartner's survey of enterprises already purchasing data quality software — meaning the actual figure for enterprises that haven't invested in data quality is likely higher.[32] And the damage isn't static. Redundant, obsolete, and trivial (ROT) data — estimated at 70–90% of enterprise data stores[31] — dilutes retrieval accuracy. Every document you add to a knowledge base without governance makes every future retrieval slightly worse.
How does the knowledge pipeline problem compound?
The compounding loop works like this. An outdated help article stays in the knowledge base. The AI retrieves it and gives a customer wrong information. The customer contacts a human agent, who corrects the answer verbally but doesn't update the source article. The AI continues serving the outdated answer. A second agent notices the pattern and writes a workaround in the internal wiki — which the AI can't access because it's in a different system. A third agent, overwhelmed by volume, doesn't check the AI's answer and passes it through. Now the wrong answer has been confirmed by a human agent, making it harder to identify as wrong in any future audit.
This is the 1-10-100 rule in action, formalized by Labovitz and Chang and applied to data quality by Thomas Redman: it costs $1 to fix a knowledge error at entry, $10 to fix it downstream in a business process, and $100 to fix it after it has caused a failure.[33] Air Canada, New York City, and DPD are textbook $100-tier outcomes. The $1 fix — maintaining accurate, structured, non-contradictory knowledge — was never implemented.
---
Is the AI model the real problem, or is it the data?
The AI model was never the bottleneck. The knowledge pipeline is.
Andrew Ng, in his NeurIPS 2021 keynote and a subsequent 2022 IEEE Spectrum interview, argued that "if 80 percent of our work is data preparation, then ensuring data quality is the important work of a machine-learning team."[34] His research showed that systematic data-centric approaches could triple model performance with no model changes at all.[34]
The evidence since then has only strengthened his argument. Barnett et al., in a peer-reviewed paper at IEEE/ACM's CAIN 2024 conference, identified seven distinct failure points in RAG systems and concluded that "validation of a RAG system is only feasible during operation" — meaning you can't test your way to quality in a lab.[17] Four of those seven failure points originate in the knowledge retrieval layer, not the generation layer.[17]
IBM's Institute for Business Value, in its June 2025 report "From AI Projects to Profits," found that only 16% of organizations have scaled AI across the enterprise.[35] The organizations that succeed treat data governance as a prerequisite, not an afterthought. The organizations that fail — the 84% — treat it as someone else's problem.
The reframe is simple. Every company evaluating AI customer support asks: "Which model should we use?" The evidence says the right question is: "What is the state of the knowledge we're feeding it?"
What does a successful AI support implementation look like?
The same RAG technology class that produced the Air Canada, NYC, DPD, and Chevrolet failures also produced these outcomes — when paired with disciplined knowledge pipeline engineering.
Synthesia experienced a 690% spike in support contacts (from 40,000 to 316,000) after rapid growth.[9] Rather than hiring proportionally, they deployed Intercom's Fin AI agent. During beta, they discovered their knowledge base "was clearly contradicting itself" — articles gave different answers to the same question depending on which one the AI retrieved. They rebuilt the knowledge base from the ground up before full deployment. The result: 55% AI resolution rate, 96% reduction in resolution time, 98.3% self-serve rate, no additional headcount.[9]
Breathe started at a 56% AI resolution rate and spent nine months improving it — not by changing models, but by restructuring knowledge.[36] They removed GIFs and videos that confused the AI parser. They used a snippet-based feedback loop: every wrong answer was traced back to the source article, which was then revised. Resolution climbed from 56% to 82% to 88%. Customer satisfaction held at 85–90%.[36]
Databox moved from a 30% to 55% AI resolution rate with zero model changes — pure knowledge pipeline optimization — and attributed a 40% revenue increase to the freed capacity.[37]
These are not different technologies from the failure cases. They are the same technology with different knowledge discipline.
Erik Brynjolfsson, Danielle Li, and Lindsey Raymond published a landmark study in the Quarterly Journal of Economics in May 2025, analyzing 5,172 customer support agents.[38] The average productivity improvement from AI assistance was 15%. But the improvement for the least experienced agents was 30%.[38] The most experienced agents saw smaller gains — and in some cases, slight quality declines.[38] The interpretation reinforces the thesis: the AI's knowledge pipeline surfaced answers that were better than a novice's intuition but worse than an expert's institutional knowledge. Pipeline quality was the ceiling.
---
What does broken AI customer support actually cost?
The direct cost comparison is dramatic. Human-handled support tickets cost $2.70–$60 each, depending on channel and complexity. AI-resolved tickets cost $0.50–$2.37 — roughly a 12× reduction.[7] For a company handling 10,000 tickets per month at an average human cost of $15, shifting 50% to AI at $1.50 each saves $67,500 monthly, or $810,000 annually.
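The arithmetic behind that estimate, as a minimal sketch. The function and parameter names are illustrative, and a real model would also subtract the cost of escalations and wrong answers, which is exactly the caveat below:

```python
def monthly_savings(tickets: int, ai_share: float,
                    human_cost: float, ai_cost: float) -> float:
    """Gross monthly savings from deflecting ai_share of tickets to AI.
    Deliberately ignores the cost of wrong answers."""
    deflected = tickets * ai_share
    return deflected * (human_cost - ai_cost)

# Figures from the text: 10,000 tickets/month, 50% deflection,
# $15 average human cost, $1.50 AI cost.
print(monthly_savings(10_000, 0.5, 15.0, 1.5))  # 67500.0 -> $810,000/year
```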
But the savings evaporate when the AI gives wrong answers. Each wrong answer generates at minimum one additional human interaction to correct it — and often more, as the customer's trust resets to zero and the subsequent conversation takes longer than a first-contact resolution would have.
Gartner projected in August 2022 that conversational AI would reduce contact center labor costs by $80 billion by 2026, and in March 2025 it added a prediction that agentic AI will autonomously resolve 80% of common customer service issues by 2029.[39] These projections assume knowledge pipelines that work. The 95% failure rate for enterprise AI pilots[6] suggests most organizations will not capture this value.
Forrester's Total Economic Impact study of a correctly implemented AI support platform (Sprinklr Service) documented 210% ROI over three years with under six months to payback, $2.1 million in savings, an improvement in response rate from 50% to 98%, and $336,000 saved by retiring legacy tools.[40] Those returns are real — for the organizations that get the knowledge pipeline right. For the ones that don't, the investment becomes another line item in the 30% of generative AI projects abandoned after proof of concept.[29]
The hidden cost is the one that doesn't appear on any dashboard: the cost of bad data propagating through an AI system. Thomas Redman estimated in Harvard Business Review that bad data costs the U.S. economy $3.1 trillion per year, and that the average knowledge worker spends 50% of their time dealing with data quality issues.[33] That was 2016. The introduction of AI into every knowledge workflow has not reduced that figure. It has amplified it.
Capital One and Forrester's 2024 Enterprise Data Leader Survey of 500 data leaders found that 73% cited data quality as the primary barrier to AI success.[31] Not model capability. Not compute cost. Not talent. Data quality.
---
How do you fix the AI customer support knowledge pipeline?
The fix is not a better model. It is not a bigger context window. It is not fine-tuning. NVIDIA's research team built enterprise chatbots and reported that fine-tuning "didn't help much" — pipeline engineering dominated every improvement cycle.[14] The fix is a knowledge ingestion pipeline that treats every document, article, and past ticket as a source that must be cleaned, structured, validated, and continuously maintained.
Gartner's Kim Hedlin, in a December 2024 survey of 187 customer service leaders, put it directly: "Service and support leaders cannot ignore existing issues with knowledge management."[10] That survey found that 61% have a backlog of knowledge articles to edit and one-third have no formal process for revising outdated articles.[10]
At Tricky Wombat, we build the knowledge ingestion pipeline that sits between your raw content and your AI system. The pipeline is where accuracy is won or lost. Here's what it must get right.
1. Source ingestion that cleans and structures, not just imports
Most AI support systems ingest knowledge bases as flat document dumps — every article, every FAQ, every legacy help page poured into a vector store without distinction. Emil Sorensen, studying 100+ production RAG deployments at kapa.ai, found that teams that "dump their entire knowledge base" consistently underperform teams that curate primary sources.[13] The reason is straightforward: vector similarity search retrieves the most textually similar chunk, which is often an outdated or redundant version of the answer, not the authoritative one.
Tricky Wombat's ingestion pipeline identifies contradictions, duplicates, and outdated content before anything enters the retrieval layer. Source documents are decomposed using sentence-boundary chunking — which a 2025 study by Horváth et al. found outperforms more complex semantic chunking methods[11] — and tagged with metadata including recency, source authority, and topic scope. Content that contradicts other content is flagged for human review, not silently ingested.
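As a minimal sketch of what that looks like in practice, the following chunker splits on sentence boundaries and tags every chunk with the metadata the retrieval layer needs. The `Chunk` structure, field names, and size threshold are illustrative assumptions, not the implementation from the cited study or Tricky Wombat's production pipeline:

```python
import re
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)

def sentence_chunks(doc: str, source: str, updated: str,
                    max_chars: int = 600) -> list[Chunk]:
    """Pack whole sentences into chunks of at most max_chars,
    never cutting mid-sentence, and tag each chunk with the
    metadata used later for authority and recency scoring."""
    # Naive boundary detection; a production pipeline would use a
    # real sentence tokenizer (spaCy, NLTK punkt, or similar).
    sentences = re.split(r"(?<=[.!?])\s+", doc.strip())
    chunks, current = [], ""
    for sent in sentences:
        if current and len(current) + len(sent) + 1 > max_chars:
            chunks.append(Chunk(current, {"source": source, "updated": updated}))
            current = sent
        else:
            current = f"{current} {sent}".strip()
    if current:
        chunks.append(Chunk(current, {"source": source, "updated": updated}))
    return chunks
```

Duplicate and contradiction detection then runs over these tagged chunks before anything is indexed, so conflicting content reaches a human reviewer rather than the vector store.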
2. Retrieval architecture that validates, not just fetches
The retrieval step is where Air Canada's failure occurred. The system found the right document. It synthesized the wrong answer. A retrieval pipeline that only fetches similar content without validating consistency between the retrieved chunks and the generated answer will reproduce this failure mode at scale.
NVIDIA's FACTS framework identifies 15 control points in a RAG pipeline, most of which are retrieval and validation steps, not model parameters.[14] The PMC/NIH clinical study demonstrated that adding re-ranking and relevance validation to the same model pushed accuracy from 50% to 87%.[5] Tricky Wombat implements hybrid retrieval — combining sparse keyword search with dense vector search and a re-ranking layer — because research shows this architecture outperforms either approach alone. A re-ranker as small as 4.3 million parameters can outperform models 100× its size when the retrieval pipeline is correctly architected.[11]
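A deliberately simplified sketch of that hybrid shape is below. Token overlap stands in for a real BM25 index, embeddings are assumed precomputed, and the re-ranking stage is reduced to a comment marking where it runs; none of this is NVIDIA's or any vendor's actual code:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def keyword_score(query, text):
    """Poor-man's sparse score via token overlap; real systems use BM25."""
    q, t = Counter(query.lower().split()), Counter(text.lower().split())
    return sum(min(q[w], t[w]) for w in q)

def hybrid_retrieve(query, query_vec, docs, k=20, alpha=0.5):
    """docs: list of (text, embedding) pairs. Blend normalized sparse
    and dense scores, then hand the top-k to a re-ranker."""
    scored = [(text, keyword_score(query, text), cosine(query_vec, vec))
              for text, vec in docs]
    max_sparse = max((s for _, s, _ in scored), default=0) or 1
    blended = sorted(
        ((alpha * s / max_sparse + (1 - alpha) * d, text)
         for text, s, d in scored),
        reverse=True,
    )
    # A cross-encoder re-ranker would rescore (query, candidate) pairs
    # here; the research cited above attributes the largest accuracy
    # gains to that stage.
    return [text for _, text in blended[:k]]
```

The design point is the blend: sparse scoring catches exact product names and error codes that embeddings blur together, while dense scoring catches paraphrases that keyword search misses.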
3. Continuous monitoring that catches drift before customers do
Knowledge pipelines decay. Products change. Policies update. A help article that was accurate in January is misleading by April. Research on temporal knowledge in RAG systems shows that recency-aware retrieval configurations dramatically outperform static embeddings on time-sensitive queries.[11] Without active re-indexing, retrieval accuracy degrades as the knowledge base grows stale.
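One common way to implement recency awareness is an exponential decay on document age, multiplied into the retrieval score. This is a generic sketch under assumed parameters (the 180-day half-life is arbitrary), not the configuration from the cited research:

```python
from datetime import datetime, timezone

def recency_weight(updated_iso: str, half_life_days: float = 180.0) -> float:
    """Exponential decay on chunk age: content last updated one
    half-life ago contributes half as much to the blended score."""
    updated = datetime.fromisoformat(updated_iso).replace(tzinfo=timezone.utc)
    age_days = (datetime.now(timezone.utc) - updated).days
    return 0.5 ** (age_days / half_life_days)

# Applied at query time, using the "updated" metadata attached at ingestion:
#   final_score = relevance_score * recency_weight(chunk.metadata["updated"])
```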
The RAGAS evaluation framework provides three core metrics for monitoring RAG system quality: faithfulness (does the answer match the source?), answer relevance (does the answer address the question?), and context relevance (did the system retrieve the right sources?).[12] Faithfulness is the metric that catches Air Canada–type failures, where the answer contradicts the cited source. Tricky Wombat runs continuous evaluation against these metrics, re-processes source material on a defined schedule, and verifies citations in every generated response. The system doesn't just answer — it proves its answer from the source material, and flags when it can't.
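In production this check would use the RAGAS library or an LLM judge to verify each claim against the retrieved sources. As a dependency-free illustration of where the gate sits in the pipeline, here is a crude lexical proxy for faithfulness; the scoring heuristic and thresholds are illustrative assumptions:

```python
import re

def support_ratio(answer: str, context: str) -> float:
    """Fraction of answer sentences with lexical support in the retrieved
    context: a rough stand-in for RAGAS faithfulness, which in practice
    uses an LLM judge to verify each individual claim."""
    ctx_words = set(context.lower().split())
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    if not sentences:
        return 0.0
    supported = 0
    for sent in sentences:
        words = [w for w in sent.lower().split() if len(w) > 3]
        overlap = sum(1 for w in words if w in ctx_words)
        if words and overlap / len(words) >= 0.6:
            supported += 1
    return supported / len(sentences)

def faithfulness_gate(answer: str, context: str, threshold: float = 0.8) -> dict:
    """Escalate any answer the pipeline cannot ground in its own sources:
    the Air Canada failure mode this metric exists to catch."""
    score = support_ratio(answer, context)
    action = "send" if score >= threshold else "escalate_to_human"
    return {"action": action, "faithfulness": round(score, 2)}
```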
The result is a knowledge pipeline that improves over time rather than degrading — because every customer interaction becomes a signal about knowledge quality, not just a ticket to be closed.
---
The bottom line
Air Canada had the right document. NYC MyCity had 2,000 pages of official guidance. Chevrolet of Watsonville had nothing at all. DPD had a system update that severed the pipeline overnight. Four companies, four distinct failure modes, one shared root cause: no engineering discipline around the knowledge that the AI depends on.
The organizations getting this right — Synthesia, Breathe, Databox — are not using different AI. They are using the same model architectures with fundamentally different knowledge pipelines: curated, structured, continuously maintained, and measured against retrieval-specific metrics. The 37-percentage-point accuracy swing between a basic and optimized pipeline on the same LLM[5] is not a marginal improvement. It is the difference between a system that resolves 55% of tickets autonomously and one that agrees to sell a $58,195 SUV for a dollar.
Gartner's multi-agent system inquiries surged 1,445% from Q1 2024 to Q2 2025.[41] The AI agents market is projected to grow from $7.84 billion to $52.62 billion at a 46.3% CAGR.[42] Agentic AI is projected to resolve 80% of common customer service issues by 2029.[39] Every one of those projections assumes a knowledge pipeline that works. The organizations that build that pipeline will capture an $80 billion cost reduction.[39] The ones that don't will join the 95% of enterprise AI pilots that delivered nothing — and their chatbots will keep writing poetry about how terrible the company is.
---
FAQ
- What is a knowledge pipeline in AI customer support?
A knowledge pipeline is the end-to-end system that ingests, cleans, structures, indexes, retrieves, and validates the information an AI support agent uses to answer questions. It includes document parsing, chunking strategy, embedding generation, vector storage, retrieval logic, re-ranking, and faithfulness validation. Barnett et al. identified seven failure points in this pipeline, with four originating in the retrieval and knowledge layer rather than the AI model itself.[17]
- Why does AI customer support hallucinate wrong answers?
AI hallucination in customer support occurs when the language model generates plausible-sounding information that isn't grounded in the source knowledge base. A JMIR peer-reviewed study found that RAG systems with curated knowledge bases reduced hallucination to 0% (GPT-4), while conventional chatbots without structured retrieval hallucinated approximately 40% of the time.[8] The primary cause is retrieval failure — the system either fetches the wrong source, fetches no source, or synthesizes an answer that contradicts its own source material.
- How much does AI customer support cost per ticket compared to human agents?
AI-resolved support tickets cost $0.50–$2.37 each, compared to $2.70–$60 for human-handled tickets, representing roughly a 12× cost reduction.[7] However, these savings depend on answer accuracy. Incorrect AI answers generate additional human contacts that can exceed the original cost. Forrester documented 210% ROI over three years for a correctly implemented AI support platform.[40]
- What percentage of enterprise AI projects fail?
MIT Media Lab's Project NANDA found that 95% of enterprise generative AI pilots fail to deliver measurable P&L impact.[6] Separately, Gartner predicted in July 2024 that 30% of generative AI projects would be abandoned after proof of concept by end of 2025, based on a survey of 822 business leaders.[29] The primary cited obstacles are data quality (43%), data governance (43%), and lack of AI-ready data (35%), per Informatica CDO Insights 2025.[31]
- Do customers trust AI customer support?
No — not yet. Gartner's July 2024 survey of 5,728 customers found that 64% would prefer companies didn't use AI for customer service, and 53% said they'd switch to a competitor over it.[4] A Kinsta/Propeller Insights survey of 1,011 U.S. consumers found that 84% believe humans are more accurate than AI, and 49.6% would cancel a service entirely because of AI-driven support.[20] Trust rebuilds when AI can explain its reasoning — but only 37% of companies currently provide that capability.[23]
- What is retrieval-augmented generation (RAG) and why does it matter for customer support?
RAG is an AI architecture that retrieves relevant documents from a knowledge base and uses them to ground the language model's response. It reduces hallucination by approximately 71% compared to ungrounded generation, according to Vectara's benchmarks.[18] Gartner included RAG in its 2024 Hype Cycle for Generative AI, and NVIDIA's FACTS framework identifies 15 control points in a RAG pipeline where accuracy can be improved — most of which are in the retrieval layer.[14]
- How do you measure whether an AI knowledge pipeline is working?
The RAGAS evaluation framework, published at EACL 2024, provides three core metrics: faithfulness (does the answer match the source?), answer relevance (does the answer address the question?), and context relevance (did the system retrieve the right sources?).[12] Faithfulness is the metric that catches cases like Air Canada, where the AI cited the correct policy page but stated the opposite of what it contained.
- What is the Gartner prediction for AI in customer service by 2029?
Gartner predicts that agentic AI will autonomously resolve 80% of common customer service issues without human intervention by 2029, per a March 5, 2025 press release.[39] The same analysis projects a 30% reduction in customer service operational costs. These projections assume functional knowledge pipelines — a condition most organizations do not currently meet: 61% of customer service leaders report a backlog of knowledge articles awaiting edits.[10]
- How does knowledge base quality affect AI accuracy?
Directly and dramatically. A 2025 PMC/NIH study showed that the same LLM (Gemini 1.0 Pro) swung from 50% to 87% accuracy based solely on retrieval pipeline design.[5] Synthesia discovered during AI deployment that their knowledge base "was clearly contradicting itself" and had to rebuild it entirely before achieving a 55% resolution rate.[9] Andrew Ng's research found that data-centric approaches can triple model performance without any model changes.[34]
- What is the market size for AI agents in customer service?
MarketsandMarkets projects the AI agents market will grow from $7.84 billion to $52.62 billion by 2030 at a 46.3% CAGR.[42] The broader AI for customer service market is projected to reach $47.82 billion by 2030 at a 25.8% CAGR, from $12.06 billion in 2025.[43] Gartner reported a 1,445% surge in multi-agent system inquiries from Q1 2024 to Q2 2025.[41]

---
References
1. Futurism, "Car Dealership AI Chatbot Agrees to Sell Tahoe for $1," December 2023. https://futurism.com/the-byte/car-dealership-ai
2. IDC, "StorageSphere Forecast, 2024," 2024. https://my.idc.com/getdoc.jsp?containerId=US52554924
3. Bloomfire, "The Value of Enterprise Intelligence 2025" (sponsored content published in HBR), April 24, 2025. https://hbr.org/sponsored/2025/04/how-knowledge-mismanagement-is-costing-your-company-millions
4. Gartner, "Gartner Survey Finds 64 Percent of Customers Would Prefer That Companies Didn't Use AI for Customer Service," July 9, 2024. https://www.gartner.com/en/newsroom/press-releases/2024-07-09-gartner-survey-finds-64-percent-of-customers-would-prefer-that-companies-didnt-use-ai-for-customer-service
5. PMC/NIH, "Comparative Evaluation of Advanced Chunking for Retrieval-Augmented Generation in Large Language Models for Clinical Decision Support," 2025. https://pmc.ncbi.nlm.nih.gov/articles/PMC12649634/
6. MIT Media Lab Project NANDA, "The GenAI Divide," July 2025. Covered by Fortune, August 18, 2025. https://fortune.com/2025/08/18/mit-report-95-percent-generative-ai-pilots-at-companies-failing-cfo/
7. Lorikeet, citing LiveChatAI analysis, "Customer Service Cost Per Ticket," 2025. https://www.lorikeetcx.ai/articles/customer-service-cost-per-ticket
8. Nishisako, Higashi, Wakao, "RAG Hallucination Rates in Clinical Settings," JMIR, September 11, 2025. https://cancer.jmir.org/2025/1/e70176
9. Intercom, "Synthesia Case Study," 2024. https://fin.ai/customers/synthesia
10. Gartner, "85 Percent of Customer Service Leaders Will Explore or Pilot Customer-Facing Conversational GenAI in 2025," December 9, 2024. https://www.gartner.com/en/newsroom/press-releases/2024-12-09-gartner-survey-reveals-85-percent-of-customer-service-leaders-will-explore-or-pilot-customer-facing-conversational-genai-in-2025
11. Horváth et al., "Evaluation of RAG Retrieval Chunking Methods," Superlinked VectorHub, 2025. https://superlinked.com/vectorhub/articles/evaluation-rag-retrieval-chunking-methods; Grofsky, "Solving Freshness in RAG," arXiv:2509.19376, September 2025. https://arxiv.org/abs/2509.19376
12. Es et al., "RAGAS: Automated Evaluation of Retrieval Augmented Generation," EACL 2024. https://arxiv.org/abs/2309.15217
13. Emil Sorensen, "RAG Best Practices," kapa.ai blog, November 2024. https://www.kapa.ai/blog/rag-best-practices
14. Akkiraju et al., "FACTS: A RAG Framework for Enterprise," arXiv:2407.07858, NVIDIA, July 10, 2024. https://arxiv.org/abs/2407.07858
15. CBC News, "Air Canada Chatbot Lawsuit — Moffatt v. Air Canada (2024 BCCRT 149)," February 2024. https://www.cbc.ca/news/canada/british-columbia/air-canada-chatbot-lawsuit-1.7116416
16. The Markup, "NYC AI Chatbot Telling Businesses to Break the Law," January 30, 2026. https://themarkup.org/artificial-intelligence/2026/01/30/mamdani-to-kill-the-nyc-ai-chatbot-we-caught-telling-businesses-to-break-the-law
17. Barnett et al., "Seven Failure Points When Engineering a Retrieval Augmented Generation System," IEEE/ACM CAIN 2024. https://arxiv.org/abs/2401.05856
18. Vectara, "Hallucination Evaluation Model Leaderboard," continuously updated. https://github.com/vectara/hallucination-leaderboard
19. Panopto, "Workplace Knowledge and Productivity Report," July 2018. https://www.panopto.com/company/news/inefficient-knowledge-sharing-costs-large-businesses-47-million-per-year/
20. Kinsta / Propeller Insights, "AI vs Human Customer Service Survey," April 2025. https://kinsta.com/blog/ai-vs-human-customer-service/
21. Salesforce, "State of Service, 7th Edition," November 13, 2025. https://www.salesforce.com/news/stories/state-of-service-report-announcement-2025/
22. Salesforce, "State of Service, 6th Edition," 2024. https://www.salesforce.com/resources/articles/state-of-service-inside-customer-service-trends/
23. Zendesk, "CX Trends 2026 Report," November 2025. https://www.zendesk.com/newsroom/press-releases/contextual-intelligence-becomes-the-new-standard-for-exceptional-customer-experience-in-2026/
24. Time, "AI Chatbot DPD Curses, Criticizes Company," January 18, 2024. https://time.com/6564726/ai-chatbot-dpd-curses-criticizes-company/
25. McKinsey, "Building Trust: How Customer Care Leaders Pull Ahead with AI," February 23, 2026. https://www.mckinsey.com/capabilities/operations/our-insights/building-trust-how-customer-care-leaders-pull-ahead-with-ai
26. Forrester, "2024 US Customer Experience Index," June 17, 2024. https://www.forrester.com/press-newsroom/forrester-2024-us-customer-experience-index/
27. Qualtrics XM Institute, "2025 Consumer Trends," November 19, 2024. https://www.qualtrics.com/news/bad-customer-experiences-put-nearly-4-trillion-at-risk-in-global-sales/
28. Zendesk, "CX Trends 2025 Report," November 20, 2024. https://www.zendesk.com/newsroom/articles/2025-cx-trends-report/
29. Gartner, "Gartner Predicts 30 Percent of Generative AI Projects Will Be Abandoned After Proof of Concept by End of 2025," July 29, 2024. https://www.gartner.com/en/newsroom/press-releases/2024-07-29-gartner-predicts-30-percent-of-generative-ai-projects-will-be-abandoned-after-proof-of-concept-by-end-of-2025
30. Gartner, "Generative AI Is Now the Most Frequently Deployed AI Solution in Organizations," May 7, 2024. https://www.gartner.com/en/newsroom/press-releases/2024-05-07-gartner-survey-finds-generative-ai-is-now-the-most-frequently-deployed-ai-solution-in-organizations
31. Informatica, "CDO Insights 2025," 2025.
32. Gartner, "Data Quality Topics," 2020 survey. https://www.gartner.com/en/data-analytics/topics/data-quality
33. Thomas Redman, "Bad Data Costs the U.S. $3 Trillion Per Year," Harvard Business Review, September 2016. https://hbr.org/2016/09/bad-data-costs-the-u-s-3-trillion-per-year
34. Andrew Ng, NeurIPS 2021 Data-Centric AI Workshop; IEEE Spectrum interview, 2022. https://neurips.cc/virtual/2021/workshop/21860
35. IBM Institute for Business Value, "From AI Projects to Profits," June 9, 2025. https://www.ibm.com/thought-leadership/institute-business-value/en-us/report/agentic-ai-profits
36. Intercom, "Breathe Case Study," 2024. https://www.intercom.com/customers/breathe
37. Intercom, "Databox Case Study," 2024. https://fin.ai/customers/databox
38. Brynjolfsson, Li, Raymond, "Generative AI at Work," Quarterly Journal of Economics, Vol. 140, Issue 2, May 2025. https://academic.oup.com/qje/article/140/2/889/7990658
39. Gartner, "Conversational AI Will Reduce Contact Center Agent Labor Costs by $80 Billion by 2026," August 31, 2022; "Agentic AI Will Autonomously Resolve 80 Percent of Common Customer Service Issues by 2029," March 5, 2025. https://www.gartner.com/en/newsroom/press-releases/2025-03-05-gartner-predicts-agentic-ai-will-autonomously-resolve-80-percent-of-common-customer-service-issues-without-human-intervention-by-20290
40. Forrester, "Total Economic Impact of Sprinklr Service," May 2024. https://www.sprinklr.com/blog/total-economic-impact-customer-service/
41. Gartner, "Multi-Agent Systems," 2025. https://www.gartner.com/en/articles/multiagent-systems
42. MarketsandMarkets, "AI Agents Market," 2025. https://www.marketsandmarkets.com/Market-Reports/ai-agents-market-15761548.html
43. MarketsandMarkets, "AI for Customer Service Market," February 2025. https://www.marketsandmarkets.com/Market-Reports/ai-for-customer-service-market-244430169.html
By Tricky Wombat
Last Updated: Mar 29, 2026