RAG as a service is the rational default for enterprise AI
How managed retrieval platforms reach production faster than in-house builds, without a second engineering team

Enterprises put $37 billion into generative AI in 2025, yet only 17% of organizations attribute more than 5% of EBIT to it.[1][2] The technology works. Retrieval-augmented generation has crossed from experiment to production baseline, a $1.94 billion market in 2025 growing at 38.4% a year.[3] The gap between what companies spend and what they earn back is not a model problem. It sits one layer down, in the retrieval infrastructure that decides whether the same language model returns a cited answer or a confident fabrication. That layer is expensive, specialized, and unforgiving to build in house. For most enterprises, RAG as a service is the rational choice.
What does RAG as a service actually decide for an enterprise?
Retrieval-augmented generation gives a language model access to your documents at query time. The model retrieves relevant passages from a knowledge base, then generates an answer grounded in what it found. RAG as a service is the managed version: a provider operates the ingestion, retrieval, access control, and evaluation infrastructure, and you connect your data and configure the workflow. The alternative is building and running that infrastructure yourself.
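The retrieve-then-generate flow described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not a production design: the three-passage knowledge base, the function names, and the bag-of-words cosine scoring are all hypothetical stand-ins (real systems typically use learned embeddings and a vector index), and the generation step is left as a prompt that would be passed to whatever model you call.

```python
import math
import re
from collections import Counter

# Hypothetical three-passage knowledge base, for illustration only.
KNOWLEDGE_BASE = {
    "policy-001": "Expense reports must be filed within 30 days of travel.",
    "policy-002": "Remote employees may claim a home office stipend once a year.",
    "policy-003": "All travel bookings require manager approval in advance.",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, k=2):
    """Return up to k (doc_id, text) pairs with non-zero similarity, best first."""
    q = Counter(tokenize(query))
    scored = [
        (cosine(q, Counter(tokenize(text))), doc_id, text)
        for doc_id, text in KNOWLEDGE_BASE.items()
    ]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [(doc_id, text) for score, doc_id, text in scored[:k] if score > 0]

def build_prompt(query, passages):
    """Assemble a grounded prompt; the model call itself is out of scope here."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in passages)
    return (
        "Answer using only the passages below and cite passage ids.\n"
        f"{context}\n"
        f"Question: {query}"
    )

question = "When must expense reports be filed?"
print(build_prompt(question, retrieve(question)))
```

Everything a managed service operates (ingestion, access control, evaluation) sits around these two functions; the sketch only shows the core loop.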
The category stopped being experimental in 2025. Practitioner consensus marked the year RAG became a production baseline, with the field converging on a blunt phrase: naive RAG fails in production.[4] The market reflects the shift. RAG reached $1.94 billion in 2025 and is projected to hit $9.86 billion by 2030 at a 38.4% compound annual growth rate, with cloud-based deployments capturing the majority of market share.[3] This is infrastructure spending, not pilot money.
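The growth figures above can be sanity-checked with a short calculation. The sketch below assumes five compounding periods between 2025 and 2030, which is an assumption about how the forecast was constructed rather than something the report excerpt states; the function names are illustrative.

```python
def implied_cagr(start, end, years):
    """Compound annual growth rate implied by a start value, end value, and period count."""
    return (end / start) ** (1 / years) - 1

def project(start, rate, years):
    """Value after compounding `start` at `rate` for `years` periods."""
    return start * (1 + rate) ** years

# Market size in billions of US dollars, as quoted in the text.
cagr = implied_cagr(1.94, 9.86, 5)
projected = project(1.94, 0.384, 5)

print(f"Implied CAGR: {cagr:.1%}")
print(f"Projected 2030 market at 38.4%: ${projected:.2f}B")
```

Under that assumption the two quoted endpoints and the quoted growth rate agree with each other to within rounding.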

Tricky Wombat made with Google Gemini 3.1 Flash Image, Jun 2026
What does the data show about whether RAG works?
It works, and it pays back, when the infrastructure underneath is sound. Worker access to AI rose 50% in 2025, and 66% of organizations reported productivity and efficiency gains.[5] Buyers are converting at rates that traditional software never saw: 47% of enterprise AI deals reach production, against 25% for conventional SaaS.[1] These are serious deployments, not science projects.
The persistent gap is quality, and it shows up most clearly where accuracy is non-negotiable. A 2025 Stanford Law School study ran legal research queries through leading RAG-powered legal AI tools and documented hallucination rates of 17% to 33%, including fabricated cases with plausible-sounding names, dates, and reasoning.[6] These tools had a knowledge base. They retrieved from it. They still invented law in roughly one of every six to one of every three queries. Adding documents to a model does not, by itself, produce a reliable answer. The retrieval and verification layer is what does.
What are practitioners reporting?
Developers are using these tools and trusting them less. The 2025 Stack Overflow survey of roughly 49,000 respondents across 177 countries found 84% using or planning to use AI tools and 52% reporting a positive productivity effect.[7] Trust moved the other way. Confidence in AI accuracy fell from 40% to 29% year over year, and the top frustration, cited by 66%, was AI solutions that are almost right but not quite.[7] Almost right is the signature of a retrieval layer that surfaces plausible but wrong context.
A 2025 interview study of enterprise RAG practitioners ranked their priorities, and the order is telling: data protection, answer quality, and security topped the list, while fairness and bias concerns ranked last.[^7-arxiv] Practitioners who run these systems in production worry about reliability and access control, not the model. That hierarchy is the whole argument in miniature.
What does production RAG look like in the real world?
The cases that matter are not demos. They are organizations with large document estates, regulated data, and real stakes, that either unlocked latent value or recovered measurable time. Three of them show what solved retrieval infrastructure delivers, and the common thread is that none of them won on model choice.
Morgan Stanley: a 350,000-document library that sat 80% invisible
Morgan Stanley's wealth management division held a research library of 350,000 documents, and for years most of it was effectively unreachable. Document retrieval efficiency sat at 20%, which meant that for most queries the knowledge base might as well not have existed, and a question requiring deep synthesis could consume 30 minutes. The firm partnered with OpenAI and deployed a RAG-based assistant on secure proprietary infrastructure rather than building the retrieval stack from scratch. Retrieval efficiency rose from 20% to 80%, accessible document capacity expanded from 7,000 to more than 100,000, and adoption reached 98% of wealth management advisors.[9][10][8] The advantage was never the model. The proprietary knowledge base already existed. A purpose-built, maintained retrieval platform is what made its value accessible, and the firm got there without standing up a data-science department to own the pipeline.

A major European bank: ROI in two months
A major European bank ran an audit and compliance function that consumed heavy manual labor, creating backlogs and high headcount cost. Rather than build retrieval and provenance tooling internally, the bank deployed a managed RAG platform with full citation and provenance trails, configured to its audit workflows. It saved over EUR 20 million across three years and freed the workload equivalent of 36 full-time employees, and it reached positive ROI two months after deployment.[11] Two months to payback is faster than many enterprise IT projects clear procurement. That speed is the structural advantage of a managed platform: the work was workflow configuration, not foundational engineering. The bank's name is not disclosed because the case is vendor-published, so read the figures as reported rather than independently audited, but the shape of the result is consistent across the managed deployments in this category.
NHS England: 43 minutes a day across 30,000 workers
NHS England faced administrative burden eating hours of staff time each week and identified automation as its highest-leverage path to a 2% annual productivity target. NHS trusts do not have machine-learning engineering departments, so building a custom retrieval stack was never realistic. The trial deployed Microsoft 365 Copilot across 90 organizations and more than 30,000 workers, working against the knowledge base already living in Microsoft 365 and within NHS data governance rules. Staff recovered an average of 43 minutes per working day, the equivalent of five weeks a year, and acute-trust productivity rose 2.7% between April 2024 and March 2025, beating the 2% target with AI cited as a key enabler.[12] The arithmetic is the memorable part. 43 minutes across 30,000 workers is more than 21,000 staff-hours recovered every working day, before the rollout reached half its target scale.
What pattern emerges across these cases?
Three different industries, three different data estates, one shared variable. None of these organizations won by selecting a better model. Morgan Stanley's edge was a proprietary library that retrieval made reachable. The European bank's edge was provenance-tracked workflow automation it configured rather than coded. NHS England's edge was meeting staff where their documents already lived. In every case the model was a commodity available to any competitor through an API, and the result came from the retrieval and governance layer in between. That layer is exactly what a managed service operates on your behalf. The market read the same pattern: 76% of enterprise AI use cases are now purchased rather than built, up from 53% in a single year.[1]

What happens at the organizational level when RAG scales?
Zoom out from individual wins and the dominant pattern is a gap between usage and value. Enterprise AI spending hit $37 billion in 2025, and 88% of organizations now use AI in at least one function, yet only 17% attribute more than 5% of EBIT to generative AI.[1][2] Wide adoption, thin financial return. The organizations closing that gap are not the ones with privileged model access, because there is no such thing. They are the ones that built or bought production-grade retrieval infrastructure and redesigned the work around it.

What do practitioners consistently report breaks first?
The same answer keeps surfacing across practitioner forums, vendor guides, and the academic interview study: systems fail below the model. The interview study of enterprise RAG practitioners ranked data protection, answer quality, and security as the top requirements.[^7-arxiv] The 2025 year-end practitioner review captured the lived experience as organizations that feel they cannot live without RAG yet remain unsatisfied, because stable results on complex multi-part queries demand extensive tuning most internal teams cannot sustain.[4] Deloitte's 2026 survey names the constraint directly: insufficient worker skills is the single biggest barrier to integrating AI into workflows, which is why 53% of organizations launched workforce AI education programs and 36% went looking for specialized talent.[5] The skills gap is a build-versus-buy signal. Teams that cannot hire the expertise to run retrieval infrastructure are the teams that should not be operating it themselves.
What drives the gap between strong and weak outcomes?
The cost structure of an in-house build explains most of the failures. Data cleaning and preprocessing is the largest single line item in a RAG project, a reported 30% to 50% of total cost in 2026 consultancy analyses, and it is the category teams most consistently underestimate. The hardest engineering task is not the language model. It is the access-control layer, role-based document filtering synchronized from every source system, so that a user only ever retrieves what they are cleared to see. Underestimate that layer and you ship a system with security gaps or spend months re-engineering it after launch, at a reported three to five times the cost of building it in upfront.
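A minimal sketch of the access-control idea, assuming each document carries the set of roles allowed to read it. The `Document` class, the role names, and the keyword-overlap scoring are hypothetical; the point it illustrates is only the ordering of operations: filter by permission first, rank second, so an unauthorized document never enters the candidate set. Synchronizing `allowed_roles` from each source system, which the text identifies as the hard part, is not shown.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    allowed_roles: frozenset  # roles cleared to read this document

def authorized(doc, user_roles):
    """True if the user holds at least one role the document allows."""
    return bool(doc.allowed_roles & user_roles)

def retrieve_for_user(query, user_roles, index, k=3):
    """Filter by permission first, then rank the visible documents by keyword overlap."""
    terms = set(query.lower().split())
    visible = [d for d in index if authorized(d, user_roles)]
    scored = [(len(terms & set(d.text.lower().split())), d) for d in visible]
    scored = [s for s in scored if s[0] > 0]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [d.doc_id for _, d in scored[:k]]

# Hypothetical index with per-document role lists.
INDEX = [
    Document("hr-1", "salary bands for engineering staff", frozenset({"hr"})),
    Document("eng-1", "engineering onboarding guide for new staff", frozenset({"engineering", "hr"})),
    Document("pub-1", "office opening hours", frozenset({"everyone"})),
]

query = "engineering staff salary"
print(retrieve_for_user(query, {"engineering", "everyone"}, INDEX))
print(retrieve_for_user(query, {"hr", "everyone"}, INDEX))
```

The same query returns different results for the two users, and the salary document is invisible to the engineer regardless of how well it matches.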
The compounding works both ways. Get retrieval and governance right and payback arrives in months, as the European bank and NHS cases show. Get it wrong and the project joins the abandonment statistics: McKinsey found 42% of organizations abandoned most of their AI initiatives in 2025, up from 17% the year before, and more than 80% report no material EBIT effect from generative AI despite near-universal adoption.[13] An abandonment rate that more than doubled in a single year is not a model-quality story. The models got better that year. The infrastructure discipline did not keep pace.
Why is the model not the variable that decides RAG outcomes?
Here is the reframe, and it changes how the build-versus-buy question should be asked. Every buyer can call the same top-tier models through an API. The model is a commodity. What separates a deployment that pays back in two months from one abandoned within a year is the retrieval infrastructure underneath it: how documents are parsed, chunked, and indexed, how access control is synchronized across source systems, how provenance is tracked so every answer carries its source, and how outputs are evaluated on every release. That is a data-engineering discipline, and it is the single layer that managed RAG platforms exist to maintain.
The same technology produces systematically different results depending on the quality of the retrieval infrastructure, not the choice of model.
Gartner makes the same point from the data side. It attributes its predicted 60% project abandonment through 2026 specifically to the absence of AI-ready data, defined as data that is actively governed at the asset level, supported by automated pipelines with quality gates, managed through live metadata, and continuously quality-assured.[14] Those are properties of data engineering, not properties of a model. McKinsey reaches the conclusion from the value side: the dominant predictor of whether a generative AI deployment delivers measurable ROI is whether the organization redesigned its workflow before deploying. Only 17% of respondents attribute 5%-plus EBIT impact to AI, and they are distinguished by how they embedded it into redesigned operations rather than by which model they bought.[13] Two of the most cited analyst houses in enterprise technology, looking at different ends of the problem, land on the same below-the-model variable.

What does a successful implementation reveal about the reframe?
LinkedIn proves the mechanism inside a single system. Its customer-service engineering team could not reliably retrieve relevant historical tickets, because traditional search treated each ticket as flat text and lost the tree structure and the relationships between issues. The team built a knowledge-graph-enhanced RAG system: historical tickets were parsed into graph structures, and at query time the system combined embedding-based similarity search with graph-traversal queries. Retrieval precision, measured as mean reciprocal rank, improved 77.6%, from 0.522 to 0.927. Recall at the top three results rose from 64% to 100%, which means that before the rebuild the correct answer was missing from the first three results more than a third of the time. Median issue-resolution time fell 28.6%, from seven hours to five.[15] The model did not change. The team restructured how knowledge was represented and retrieved, and every gain followed from that.
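For readers less familiar with the two retrieval metrics quoted above, here is a small sketch of how mean reciprocal rank and recall at k are commonly computed, plus a check of the quoted percentage changes. It assumes one known-relevant ticket per query, and the ticket ids in the toy result set are invented for illustration; this is not LinkedIn's evaluation code.

```python
def reciprocal_rank(ranked_ids, relevant_id):
    """1/position of the relevant item in the ranking, or 0.0 if it is absent."""
    for position, doc_id in enumerate(ranked_ids, start=1):
        if doc_id == relevant_id:
            return 1.0 / position
    return 0.0

def mean_reciprocal_rank(results):
    """Average reciprocal rank over (ranked_ids, relevant_id) pairs."""
    return sum(reciprocal_rank(r, rel) for r, rel in results) / len(results)

def recall_at_k(results, k):
    """Share of queries whose relevant item appears in the top k results."""
    hits = sum(1 for ranked, rel in results if rel in ranked[:k])
    return hits / len(results)

def relative_change(before, after):
    """Fractional change from `before` to `after`."""
    return (after - before) / before

# Invented toy results: (ranking returned by the retriever, the one relevant ticket).
toy_results = [
    (["t1", "t2", "t3"], "t1"),          # found at rank 1
    (["t4", "t5", "t6"], "t5"),          # found at rank 2
    (["t7", "t8", "t9", "t10"], "t10"),  # found at rank 4, outside the top 3
    (["t11", "t12"], "t99"),             # not retrieved at all
]

print(mean_reciprocal_rank(toy_results), recall_at_k(toy_results, 3))
print(f"MRR change: {relative_change(0.522, 0.927):.1%}")
print(f"Resolution-time change: {relative_change(7, 5):.1%}")
```

Both metrics depend only on where the right ticket lands in the ranking, which is why they can move sharply while the language model stays the same.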
Grab shows the same logic in a different domain. Its analytics function faced rising query volumes, with analysts rebuilding the same SQL repeatedly and fraud investigators spending hours tracing patterns across complex data. The team packaged the analytical queries analysts already trusted into a retrievable layer and built a RAG-driven bot to select and run the right query for an investigation. Automated reports saved an estimated three to four hours each, and fraud investigations dropped from hours to minutes.[16] The value was not a more powerful model. It was making existing institutional knowledge retrievable. Both LinkedIn and Grab are elite in-house engineering builds, which is the point worth naming plainly: they demonstrate that infrastructure decides outcomes, and they also demonstrate the level of specialized engineering it takes to build that infrastructure yourself. Most enterprises do not have those teams. That is the case for buying.

What does building RAG in-house actually cost?
Both sides of the ledger point the same direction. On the build side, a production-grade enterprise RAG system with multiple source integrations, role-based access control, single sign-on, audit logging, and compliance support costs a reported $40,000 to $80,000 to build, and that build is a one-time charge. The operating cost is the part that runs forever. At enterprise scale, a system spanning 100,000-plus documents costs a reported $8,100 to $19,500 a month to operate across vector database, model API calls, embedding API, infrastructure, and monitoring. Add allocated engineering time and the all-in total cost of ownership reaches a reported $14,000 to $27,000 a month. In-house builds typically need three to five engineers for three to six months of initial development, plus 20% to 40% of that team's ongoing time for operations, and a specialized RAG engineer bills at a reported $700 to $1,400 a day. None of those figures includes the second build almost everyone forgets: the evaluation suite and CI/CD quality gates that keep the system from quietly regressing into fabrication, which is itself a separate engineering workstream.
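To make the build-versus-operate split concrete, the sketch below combines the reported build range with the reported all-in monthly range over a three-year horizon. The 36-month horizon, the flat monthly cost, and the pairing of low with low and high with high are simplifying assumptions made here, not figures from the sources.

```python
# Reported ranges from the text, in US dollars (low, high).
BUILD_COST = (40_000, 80_000)
MONTHLY_ALL_IN = (14_000, 27_000)

def total_cost(build, monthly, months):
    """One-time build cost plus a flat recurring cost over `months`."""
    return build + monthly * months

def build_share(build, monthly, months):
    """Fraction of total cost of ownership attributable to the initial build."""
    return build / total_cost(build, monthly, months)

MONTHS = 36  # assumed three-year horizon

for label, build, monthly in zip(("low", "high"), BUILD_COST, MONTHLY_ALL_IN):
    tco = total_cost(build, monthly, MONTHS)
    share = build_share(build, monthly, MONTHS)
    print(f"{label}: three-year TCO ${tco:,}, build share {share:.1%}")
```

Under these assumptions the three-year total runs from $544,000 to $1,052,000, and the initial build accounts for roughly 7% to 8% of it. The remainder is recurring cost.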
The return side is just as concrete when the infrastructure holds. The European bank cleared its investment in two months and freed 36 full-time-equivalents over three years.[11] NHS England recovered 43 minutes per worker per day across more than 30,000 staff.[12] Worked models of production deployments report sub-year payback and three-year ROI above 200%, consistent with those publicly reported cases, even where the underlying consultancy data is not independently verified. The economics connect straight back to the reframe. Managed platforms amortize the access-control and evaluation complexity across their entire customer base. An in-house team pays that complexity in full, every time, and then pays to maintain it indefinitely. That is why the recurring cost, not the build cost, is where most build-versus-buy business cases quietly break.

How do you fix the retrieval infrastructure problem?
The thesis restated: RAG outcomes are decided by retrieval infrastructure, not model selection, and that infrastructure is costly and specialized enough that most enterprises should buy it rather than build it. Tricky Wombat operates that pipeline as a service. The point is not a better model, because the model is the same one a competitor can call. The point is getting the pipeline right, on three requirements where in-house builds most often fail.
1. Ingestion that treats preprocessing as the main event
Most systems treat ingestion as a quick first step and spend their engineering attention on prompts and model selection, which inverts the actual cost structure where preprocessing is the largest line item. We run parse-transform-index as a maintained pipeline: documents from every source are parsed, normalized, chunked with structure preserved, and enriched with metadata before they ever reach the retriever. When source data is clean, classified, and governed, retrieval performs near its ceiling. When it is not, no amount of model tuning recovers it.
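As a generic illustration of structure-preserving chunking, and not a description of any vendor's pipeline, the sketch below normalizes whitespace, splits a document at heading lines, caps chunk size, and attaches metadata to every chunk. The `# ` heading convention, the field names, and the word-count limit are assumptions chosen for brevity.

```python
def chunk_document(doc_id, source, text, max_words=40):
    """Split text into chunks that never cross a heading and carry their own metadata."""
    chunks = []
    heading = "(no heading)"
    buffer = []

    def flush():
        # Emit the buffered words as one chunk tagged with the current heading.
        if buffer:
            chunks.append({
                "doc_id": doc_id,
                "source": source,
                "heading": heading,
                "position": len(chunks),
                "text": " ".join(buffer),
            })
            buffer.clear()

    for line in text.splitlines():
        line = " ".join(line.split())  # normalize runs of whitespace
        if not line:
            continue
        if line.startswith("# "):
            flush()  # close the previous section before switching heading
            heading = line[2:]
            continue
        for word in line.split():
            buffer.append(word)
            if len(buffer) >= max_words:
                flush()
    flush()
    return chunks

sample = "# Refunds\nRefunds are issued   within 14 days.\n\n# Shipping\nOrders ship in 2 days."
for chunk in chunk_document("faq-1", "helpdesk", sample):
    print(chunk["heading"], "|", chunk["text"])
```

Keeping the heading and source on each chunk is what later lets a retriever filter by section or system and lets an answer cite where a passage came from.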
2. Access control synchronized from every source system
Most systems bolt on document filtering after launch, which is how security gaps ship and how teams end up re-engineering governance at three to five times the cost. We synchronize role-based access control from each source system into the retrieval layer, so a query only ever returns documents the user is cleared to see, and permission changes propagate without a re-index. This is the hardest task in enterprise RAG, and it is the one we maintain rather than hand to your team.
3. Provenance and evaluation on every answer
Most systems generate an answer and stop, leaving no trail of where it came from and no check on whether it is grounded, which is how a deployment passes its demo and then fabricates in production. Every answer we return carries its source citations, and every release runs against an evaluation suite that measures faithfulness, relevancy, and hallucination rate before it reaches users. That evaluation build is the second engineering workstream most in-house projects omit from their estimates entirely.
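A release gate of the kind described above can be sketched as follows. This is a deliberately crude stand-in: it checks that each answer cites a passage that exists and that most of the answer's words appear in that passage. Real faithfulness and hallucination metrics are considerably more sophisticated, and the thresholds, field names, and test cases here are hypothetical.

```python
import re

def words(text):
    """Set of lowercase alphanumeric tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def support_score(answer, passage):
    """Share of the answer's words found in the cited passage (lexical proxy only)."""
    a = words(answer)
    return len(a & words(passage)) / len(a) if a else 0.0

def grounded_rate(cases, knowledge_base, min_support=0.6):
    """Fraction of cases that cite an existing passage and clear the support threshold."""
    grounded = 0
    for case in cases:
        passage = knowledge_base.get(case["cited_id"])
        if passage is not None and support_score(case["answer"], passage) >= min_support:
            grounded += 1
    return grounded / len(cases)

def release_gate(cases, knowledge_base, min_grounded_rate=0.9):
    """Return (passed, rate); a CI job would block the release when passed is False."""
    rate = grounded_rate(cases, knowledge_base)
    return rate >= min_grounded_rate, rate

# Hypothetical evaluation set.
KB = {"p1": "Refunds are issued within 14 days of purchase."}
CASES = [
    {"answer": "Refunds are issued within 14 days.", "cited_id": "p1"},
    {"answer": "Refunds take 90 days and require a notarized form.", "cited_id": "p1"},
    {"answer": "Refunds are issued within 14 days.", "cited_id": "p9"},
]

print(release_gate(CASES, KB))
```

Only the first case passes: the second contradicts its source and the third cites a passage that does not exist, so this release would be blocked.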
The pipeline does not stop at deployment. We monitor retrieval quality continuously, re-process documents as sources change so answers stay current, and verify citations on an ongoing basis rather than at launch only. Because the access-control and evaluation complexity is amortized across every customer we serve, the system improves with each deployment, and yours benefits from work it never had to fund.
The bottom line
Across every case here, the organizations that earned a return did not win on the model. Morgan Stanley unlocked a library it already owned, the European bank configured provenance into a workflow it already ran, NHS England met staff where their documents already lived, and LinkedIn and Grab restructured knowledge that already existed. The model was a commodity in all five. The retrieval infrastructure underneath was the variable.
That is the principle worth carrying into any AI initiative: outcomes follow information infrastructure, not model selection. The companies treating RAG as a model decision will keep landing in the 80% reporting no EBIT effect and the 42% abandoning projects within a year.[13] The companies treating it as a retrieval-infrastructure discipline will keep clearing payback in months. Agentic and multimodal retrieval are the next tier, and both require production-grade retrieval as their foundation. Build the foundation badly and there is nothing to stack on it. The organizations that understand where the value actually lives will own the next wave. The ones still shopping for a better model will be explaining, a year from now, why the pilot never scaled.
References
[1] Menlo Ventures, "2025: The State of Generative AI in the Enterprise," November 2025. https://menlovc.com/perspective/2025-the-state-of-generative-ai-in-the-enterprise/
[2] McKinsey & Company, "The State of AI," 2025. https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai
[3] MarketsandMarkets, "Retrieval-Augmented Generation (RAG) Market Report 2025–2030," 2025. https://www.marketsandmarkets.com/PressReleases/retrieval-augmented-generation-rag.asp
[4] RAGFlow, "From RAG to Context: A 2025 Year-End Review of RAG," December 2025. https://ragflow.io/blog/rag-review-2025-from-rag-to-context
[5] Deloitte, "State of AI in the Enterprise 2026," August–September 2025. https://www.deloitte.com/us/en/what-we-do/capabilities/applied-artificial-intelligence/content/state-of-ai-in-the-enterprise.html
[6] Stanford Law School, "Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools," Journal of Empirical Legal Studies, 2025. https://law.stanford.edu/publications/hallucination-free-assessing-the-reliability-of-leading-ai-legal-research-tools/
[7] Stack Overflow, "2025 Developer Survey: AI," 2025. https://survey.stackoverflow.co/2025/ai/
[8] AI Expert Network, "Case Study: AI at Morgan Stanley – Reshaping the Future of Financial Services," 2025. https://aiexpert.network/ai-at-morgan-stanley-2025/
[9] Reruption, "Morgan Stanley's AI Debrief: 98% Advisor Adoption Boost," 2025. https://reruption.com/en/knowledge/industry-cases/morgan-stanleys-ai-debrief-98-advisor-adoption-boost
[10] OpenAI, "Morgan Stanley uses AI evals to shape the future of financial services," 2025. https://openai.com/index/morgan-stanley/
[11] Squirro, "Success Story: Unlocking ROI in Months with Squirro," 2026. https://squirro.com/unlocking-roi-in-months-with-squirro
[12] UK Government / NHS England, "Major NHS AI Trial Delivers Unprecedented Time and Cost Savings," GOV.UK, October 21, 2025. https://www.gov.uk/government/news/major-nhs-ai-trial-delivers-unprecedented-time-and-cost-savings
[13] McKinsey & Company, "The State of AI: How Organizations Are Rewiring to Capture Value," 2025. https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai-how-organizations-are-rewiring-to-capture-value
[14] Gartner, "Lack of AI-Ready Data Puts AI Projects at Risk," February 26, 2025. https://www.gartner.com/en/newsroom/press-releases/2025-02-26-lack-of-ai-ready-data-puts-ai-projects-at-risk
[15] arXiv, "Retrieval-Augmented Generation with Knowledge Graphs for Customer Service Question Answering," SIGIR 2024. https://arxiv.org/abs/2404.17723
[16] Grab Engineering Blog, "Leveraging RAG-powered LLMs for analytical tasks," 2024. https://engineering.grab.com/transforming-the-analytics-landscape-with-RAG-powered-LLM
By Tricky Wombat
Last Updated: Jun 27, 2026