Solving the Enterprise AI Build-Buy Dilemma

A build vs buy enterprise AI framework that shows the true cost before you commit capital you cannot reverse

In one year, enterprise AI use cases flipped from 47% built in-house to 24%, while purchased use cases climbed from 53% to 76% [1]. That reversal happened while enterprise generative AI spending more than tripled to $37 billion in 2025 [1] and worldwide AI spending headed toward a projected $2.52 trillion in 2026 [2]. The builders did not stop because buying got cheaper. They stopped because the operational demands of running AI in production, especially the retrieval pipelines that ground AI in proprietary data, exposed a cost structure they had not modeled. The build vs buy enterprise AI decision is now a daily capital-allocation problem, and for most organizations buying the infrastructure layer wins. The variable that decides whether either path returns anything is the data foundation underneath.

Key Points

  • Enterprise AI use cases shifted from 47% built / 53% bought in 2024 to 24% built / 76% bought in 2025, the central market signal that in-house build strategies stalled under production demands [1].

Lessons Learned

  • Build only where your proprietary data creates advantage no vendor can replicate. Buy the model and infrastructure layer everywhere else.

What is driving enterprises to stop building their own AI?

The build vs buy enterprise AI decision used to be a once-a-year architecture choice. It is now an operational question that recurs across SaaS, generative AI, and agentic systems, because each new use case forces the same call: assemble the pipeline yourself, or rent it from a vendor who has already built it. The decision turns on six concerns that production systems cannot skip: provenance, concurrency, error handling, performance, security, and data integrity. Each one is cheap to demo and expensive to operate at scale.

Buying means licensing a vendor platform that absorbs the foundation model, the infrastructure, and the lifecycle maintenance. Building means owning all of it: the model orchestration, the retrieval layer, the monitoring, the compliance controls, and the people who keep the system from drifting. Retrieval-Augmented Generation, or RAG, is the juncture where the two paths diverge most sharply. RAG grounds a language model in your own documents and data so it answers from your knowledge rather than its training set. The moment AI has to know your data, you are building or buying a specialized infrastructure layer with its own embedding pipelines, vector databases, and retrieval orchestration. That layer is where build cost stops scaling linearly and starts compounding.

Paired charts showing enterprise AI use cases shifting from 53% bought in 2024 to 76% bought in 2025.
Enterprise AI use cases reversed from majority-built to majority-bought in a single year, even as spending more than tripled.

Tricky Wombat made with Google Gemini 3.1 Flash Image, Jun 2026

What does the data show about build versus buy outcomes?

The measurable gap is wide and consistent. Vendor-led AI implementations reach approximately 67% production success, against approximately 33% for pure in-house builds [3]. Forrester predicted independently that 75% of do-it-yourself agentic AI architecture builds would fail [4]. Two different methodologies, the same direction: building in-house roughly doubles your odds of never reaching production.

The cost data explains why. Post-deployment lifecycle work (maintenance, data governance, model drift management, compliance monitoring, and regression testing) accounts for about two-thirds of total AI system cost over a 3 to 5 year horizon, and total spend often lands at 2 to 3x the initial development cost [5]. That is the structural reason build decisions underperform in long-horizon TCO analysis. The cost the build team estimates is the small cost. The cost that arrives after launch is the one that compounds, and it is the one most initial business cases leave out.
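
The arithmetic behind that split can be sketched in a few lines. This is an illustration, not a costing model: `lifecycle_tco` and its parameters are hypothetical names, and the only inputs taken from the text are the two-thirds post-deployment share and the 2 to 3x range. Note that a two-thirds share corresponds to the 3x end of the range, while a one-half share corresponds to 2x.

```python
def lifecycle_tco(initial_build_cost: float, post_deploy_share: float = 2 / 3) -> dict:
    """Estimate total cost when post-deployment work is a fixed share of the total.

    If post-deployment work is `post_deploy_share` of total cost, the initial
    build is the remaining share, so total = build / (1 - post_deploy_share).
    """
    if initial_build_cost <= 0:
        raise ValueError("initial_build_cost must be positive")
    if not 0 <= post_deploy_share < 1:
        raise ValueError("post_deploy_share must be in [0, 1)")
    total = initial_build_cost / (1 - post_deploy_share)
    return {
        "initial_build": initial_build_cost,
        "post_deployment": total - initial_build_cost,
        "total": total,
        "multiple_of_build": total / initial_build_cost,
    }


# Illustrative input: a $750,000 initial build with two-thirds of cost after launch.
estimate = lifecycle_tco(750_000)
print(f"total: ${estimate['total']:,.0f} "
      f"({estimate['multiple_of_build']:.1f}x the build cost)")
```

Under these assumptions the build figure is only the visible third of the commitment, which is why a business case that stops at launch understates the total by a wide margin.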

Data readiness sits underneath all of it. 63% of organizations do not have, or are unsure whether they have, the data management practices AI requires [8]. That single gap predicts more failure than any model or vendor choice, which is the thread this article keeps pulling.

What are practitioners reporting from inside these systems?

The people building AI are getting less confident, not more. In the 2025 Stack Overflow Developer Survey of 48,904 professional developers, 84% report using or planning to use AI tools, up from 76% the prior year [10]. Trust moved the other way. Only 29% say they trust AI accuracy, down from 40%, and more developers actively distrust AI outputs than trust them [10]. The top frustration, cited by 66%, is AI that is almost right but not quite, the failure mode that is hardest to catch and most expensive to ship [10]. Organizations attempting to build AI in-house are doing it with engineering teams that are themselves skeptical of the outputs.

Leadership confidence is no higher. Only 25% of enterprises have moved 40% or more of their AI pilots into production, and only 21% report mature governance models for autonomous agents [11]. Across technology leaders, just 39% are confident their current AI investments will have a positive impact on financial performance [7]. High adoption, low conviction. That gap is the lived experience behind the market reversal.

What does the build versus buy decision look like in real organizations?

The aggregate numbers favor buying, but the instructive cases are the ones that cut against the average. The most analytically interesting build in enterprise AI is one that worked, because it shows exactly what conditions building requires, and why almost no one else has them.

JPMorgan Chase: the build that worked because the moat was real

JPMorgan Chase, the largest U.S. bank by assets, runs an $18 billion annual technology budget. Its day-to-day operations carried two expensive bottlenecks: investment banking analysts spending hours or days producing pitch books and market research, and legal operations consuming 360,000 lawyer-hours a year reviewing commercial credit agreements [12]. Rather than buy, the bank built proprietary AI in-house, deploying more than 450 active agents into production by 2025, including a contract intelligence system to parse credit agreements and presentation agents that generate client materials from structured inputs [12]. Investment banking presentations are now generated in roughly 30 seconds, and the contract system cut legal document review errors by 80% while eliminating attorney review of routine agreements [12]. The 360,000 reclaimed lawyer-hours equal about 173 attorney-years at 2,080 working hours per year, which is the memorable detail and also the point. JPMorgan could build because it held three things at once that almost no other enterprise has together: an enormous technology budget, proprietary financial data no vendor can replicate, and use cases that genuinely differentiate the business. Build only where your data is the moat. That principle is the spine of every case that follows.

Morgan Stanley: the hybrid path at voluntary-adoption scale

Morgan Stanley, a global financial services firm, faced a different version of the same squeeze. Its technology organization was burning developer capacity translating legacy code, and its wealth advisors were losing one to two hours a day to post-call documentation and CRM entry [12]. Instead of building foundation models, the firm built proprietary AI layers on top of vendor models, an approach that lets the vendor absorb the model cost while the firm owns the workflow logic. A code system reclaimed 280,000 developer hours across more than 9 million lines of code reviewed, and a wealth management assistant reached 98% voluntary adoption across advisor teams [12]. Voluntary adoption is the detail that matters. Mandated tools get used and resented. A tool that 98% of advisors choose to use, with no mandate, is one that earns its keep. The hybrid model captured most of the value of building without paying the full cost of owning the foundation.

General Mills: the bought platform that paid back in hard dollars

General Mills, a Fortune 200 food company, was evaluating more than 5,000 daily shipment routing and vendor selection decisions by hand [12]. It deployed an autonomous demand and logistics optimization agent on a vendor platform, able to assess all 5,000-plus daily shipments and execute routing and sourcing without human review [12]. The result was more than $20 million in supply chain savings since fiscal year 2024 [12]. No foundation model team, no in-house retrieval infrastructure, a hard-dollar return. This is the case that the 76% buy rate is made of: the proprietary advantage was the supply chain data and the operational context, not the AI plumbing, so General Mills bought the plumbing and kept its capital pointed at the part that was actually scarce.

What pattern emerges across these cases?

One pattern, repeated. In every case but one, the vendor absorbed the model and the infrastructure, and the organization invested its own effort in proprietary data and workflow. JPMorgan is the apparent exception that proves the rule, because it built the infrastructure only after meeting conditions (budget, unique data, differentiating use cases) that the survey data says most enterprises cannot meet. The market agrees with the pattern: AI software deals convert to production at 47% versus 25% for traditional SaaS, nearly double, so buying clears the deployment hurdle faster [1]. Yet only 6% of enterprises become AI high performers with measurable financial impact [6]. Faster deployment is not the same as better returns. Buying gets you live. It does not get you paid.

Comparison showing 67% vendor-led production success versus 33% in-house, with a 75% DIY failure reference line.
Vendor-led implementations roughly double the production success rate of pure in-house builds, with Forrester's DIY failure prediction as an independent reference.

Tricky Wombat made with Google Gemini 3.1 Flash Image, Jun 2026

What happens at the organizational level that individual projects miss?

Zoom out from the project to the organization and a cleaner signal appears. Organizations with the highest maturity of AI-ready data and analytics achieve up to 65% better business outcomes, including revenue growth and cost optimization, than low-maturity peers [7]. Those same organizations invest up to four times more of their revenue in foundational data quality, governance, and AI-ready infrastructure [7]. The separator is not which vendor they picked or which model they ran. It is how much they put into the layer beneath the model.

What do users and practitioners consistently report?

The consistent theme across surveys is a deployment gap that money alone does not close. 42% of companies scrapped the majority of their AI initiatives in 2025, up from 17% the prior year, a 2.5x jump, and organizations abandoned an average of 46% of proof-of-concept projects before production [13]. Two-thirds of AI-adopting organizations remain in experiment or pilot mode despite widespread tool deployment [6]. The pattern holds across the McKinsey, Deloitte, and S&P Global surveys cited here: tools get adopted, pilots get funded, and production stays out of reach for most. The organizations that cross the gap are not the ones that spent the most on models. They are the ones that prepared their data.

What drives the gap between strong and weak outcomes?

Do the math on the build path and the gap explains itself. A custom enterprise AI platform runs $300,000 to $1.5 million-plus in year-one build cost [14]. Layer on the post-deployment reality (two-thirds of total cost arriving after launch, and total spend reaching 2 to 3x the initial development figure) and a $750,000 build becomes a multi-year commitment of $1.5 million to $2.25 million before it returns anything [5]. Now compound the failure rate. If only one in three in-house builds reaches production, two of every three build budgets produce no production system at all [3]. The enterprise that builds three systems to ship one has paid for three and shipped one. The enterprise that buys three ships two, at a fraction of the per-system cost, and redirects the difference. Over a portfolio of AI initiatives, that compounding is the difference between joining the 6% and abandoning 46% of your pilots [6][13].
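
The portfolio effect above can be worked through as a small sketch. The function name `spend_per_shipped_system` is hypothetical, and the model assumes every attempt costs the same and succeeds independently at the quoted rate, which is a simplification. The success rates (33% build, 67% buy) and the $750,000 build figure come from the text. No purchase price is assumed, so the buy path is expressed only as a multiple of whatever one attempt costs.

```python
def spend_per_shipped_system(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend per system that reaches production.

    Each attempt costs `cost_per_attempt` and ships with probability
    `success_rate`, so on average 1 / success_rate attempts are funded
    for every system that ships.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


BUILD_SUCCESS = 0.33  # approximate in-house production success rate [3]
BUY_SUCCESS = 0.67    # approximate vendor-led production success rate [3]

# With a unit cost of 1.0 the result reads as "attempt budgets per shipped system".
build_multiplier = spend_per_shipped_system(1.0, BUILD_SUCCESS)
buy_multiplier = spend_per_shipped_system(1.0, BUY_SUCCESS)

# Year-one build cost only; lifecycle cost would sit on top of this.
build_spend = spend_per_shipped_system(750_000, BUILD_SUCCESS)

print(f"build: {build_multiplier:.2f} attempt budgets per shipped system")
print(f"buy:   {buy_multiplier:.2f} attempt budgets per shipped system")
print(f"a $750,000 build costs about ${build_spend:,.0f} per shipped system")
```

Even before any difference in per-attempt price, the failure rate alone roughly doubles the effective cost of each system the build path actually ships.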

A cost split showing one-third upfront build and two-thirds post-deployment lifecycle cost.
The cost the build team estimates is the visible third. Two-thirds arrives after deployment.

Tricky Wombat made with Google Gemini 3.1 Flash Image, Jun 2026

Why does the same AI model produce different results in different companies?

Here is the reframe the evidence keeps pointing at. The same foundation model, deployed in two enterprises, produces returns in one and abandoned pilots in the other. The variable is not the model. It is not the vendor. It is not even the build-versus-buy choice itself. It is the data infrastructure underneath the AI.

Gartner measured outcomes against investment decisions and found that data foundation maturity, not technology selection, was the isolable differentiator [7]. No equivalent correlation between vendor, model, or architecture choice and outcomes showed up in the same analysis. The organizations that win invest up to four times more in data foundations and earn up to 65% better outcomes [7]. The causal chain is explicit in the failure data: Gartner predicts that through 2026 organizations will abandon 60% of AI projects that are not supported by AI-ready data, a failure it ties to the data rather than to model quality, contract terms, or engineering execution [8]. When an agent fails, it usually fails because the data pipeline beneath it cannot sustain accurate retrieval at the rate, scale, and specificity the agent needs.

The model is effectively a constant in outcome variance. The data pipeline is the variable. This is why buying is necessary but not sufficient. Buying solves the deployment problem and buys back the engineering hours you would have spent maintaining pipelines. What you do with those hours, whether you pour them into the data layer that decides accuracy and trust, determines whether you join the 6% of high performers or the organizations that abandon 46% of their proofs of concept [6][13].

Paired charts contrasting a 10-20-70 AI investment allocation against a typical model-heavy enterprise budget.
Top performers spend only 10% of AI effort on models. Most enterprise budgets invert the ratio.

Tricky Wombat made with Google Gemini 3.1 Flash Image, Jun 2026

What does a successful bought implementation look like?

A major European financial institution shows the reframe in action. Its audit and compliance operations needed systematic risk detection across large volumes of regulatory filings, internal policies, and transaction records. Rather than build, it deployed a vendor RAG platform combining knowledge-graph-augmented retrieval, real-time API connections to live compliance data, granular access control at the retrieval layer, and an LLM-agnostic architecture that lets the model be swapped without rebuilding the pipeline. The institution reported over EUR 20 million saved in three years, ROI within two months of deployment, and capacity equal to 36 full-time employees freed for higher-value work [15]. The detail that matters is the two-month payback. The vendor absorbed the model and the retrieval infrastructure, so the institution spent its effort encoding its own compliance knowledge as graph relationships rather than building embedding pipelines from scratch.

Physics Wallah, a major Indian edtech company, ran the same play at consumer scale. It had built a proprietary content library of over a million Q&As and ten million solved student problems, but needed to make it conversationally accessible across multiple regional languages [16]. It bought the AI foundation rather than building it, deploying its study companion on a vendor cloud and managed vector platform, and invested its own engineering in the RAG pipeline connecting the model to its proprietary content and in multilingual support [16]. The platform now serves roughly 2 million students daily [16]. The contrast with JPMorgan is the lesson: JPMorgan built the infrastructure because it had the budget and a regulated-data moat, while Physics Wallah bought the infrastructure and kept its scarce engineering pointed at the content data that was actually its advantage. Same principle, opposite build-buy choice, because the moat sat in a different place.

What are the real economics of building versus buying enterprise AI?

Both sides of the ledger matter. On the cost-of-getting-it-wrong side, 42% of companies scrapped most of their AI initiatives in 2025 [13], and the build path concentrates risk: a $300,000 to $1.5 million-plus year-one platform [14] that, with lifecycle cost, reaches 2 to 3x its build figure over a 3 to 5 year horizon [5]. Data preparation alone consumes 30 to 50% of the total AI project budget, the largest single cost category, ahead of cloud infrastructure and AI talent [17]. That is the cost most build cases never put on the slide.

RAG is where the hidden costs cluster. The most consistently underestimated variable is embedding drift, the slow degradation of retrieval quality as content and models change, which forces periodic regeneration of embeddings. One manufacturing company spent $400,000 deploying a RAG system, then discovered true operating costs of $18,000 per month, more than double its projection, driven by compounding costs across embedding generation, reranking infrastructure, and the persistent compute needed for sub-500-millisecond retrieval latency [9]. Three silos, each with its own bill: the embedding pipeline, the vector database tier, and the query orchestration layer. A vendor platform carries those as standard line items. A build team discovers them after launch.
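
To see how quickly the run rate overtakes the deployment figure, the manufacturing example can be projected forward. This is a sketch under one loud assumption: the $18,000 monthly cost stays flat, which the source does not claim. The function name `cumulative_rag_cost` is hypothetical.

```python
def cumulative_rag_cost(deployment_cost: float, monthly_operating_cost: float, months: int) -> float:
    """Deployment cost plus a flat monthly operating cost over `months` months."""
    if months < 0:
        raise ValueError("months must be non-negative")
    return deployment_cost + monthly_operating_cost * months


DEPLOYMENT_COST = 400_000   # one-time deployment spend [9]
MONTHLY_OPERATING = 18_000  # observed monthly operating cost [9], ASSUMED constant

for months in (12, 24, 36):
    total = cumulative_rag_cost(DEPLOYMENT_COST, MONTHLY_OPERATING, months)
    print(f"after {months} months: ${total:,.0f}")

# Months until cumulative operating spend equals the original deployment cost.
months_to_match_deployment = DEPLOYMENT_COST / MONTHLY_OPERATING
print(f"operating spend matches deployment cost after ~{months_to_match_deployment:.1f} months")
```

On a flat run rate, operating spend alone matches the original deployment cost in under two years, and the three-year total passes $1 million.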

On the return side, the buy path clears deployment fast, but the return is conditional on the same variable as everything else. Vendor platforms convert to production at 47% and can pay back in months when the data foundation is ready, as the European institution's two-month ROI shows [15][1]. The economics do not actually argue for buying over building in the abstract. They argue for spending capital where it compounds: on the data layer, not the model, and on owning only the pieces your proprietary data makes worth owning.

A layered diagram of RAG cost silos: embedding pipeline, vector database, query orchestration, and drift maintenance.
RAG cost compounds across three silos plus drift management. A $400,000 deployment carried $18,000 per month after launch.

Tricky Wombat made with Google Gemini 3.1 Flash Image, Jun 2026

How do you fix the build versus buy enterprise AI problem?

The decision is not build or buy as a slogan. It is buy the infrastructure layer, build only where your proprietary data is the moat, and invest the freed capital in the data foundation that actually decides outcomes. That is where Tricky Wombat operates. We build the retrieval pipeline so your team does not have to own embedding drift, vector storage, and query orchestration as a permanent engineering tax. The pipeline, not the model, is what determines whether the answers are right. Here is what it has to get right.

1. Grounding and provenance

Most systems retrieve a passage, hand it to the model, and present whatever comes back as fact, with no way to trace the answer to its source. That is how the almost-right-but-not-quite failure mode ships, the one 66% of developers name as their top frustration [10]. Our pipeline attaches provenance to every retrieved fact and verifies that generated claims map to retrieved sources before they reach the user. An answer the system cannot ground, it does not assert.
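
A grounding check of this kind can be sketched as follows. This is not the pipeline described above, whose internals the article does not specify. It is a toy illustration in which token overlap plus an exact match on numbers stands in for a real entailment or citation-verification model. `Passage`, `Claim`, `is_grounded`, and the `min_overlap` threshold are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    source_id: str  # hypothetical provenance key, e.g. a document path plus page
    text: str


@dataclass(frozen=True)
class Claim:
    text: str
    cited_source_ids: tuple[str, ...]


def _tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def is_grounded(claim: Claim, passages: list[Passage], min_overlap: float = 0.6) -> bool:
    """True if a cited, actually retrieved passage supports the claim.

    Toy rule: every number in the claim must appear in the passage, and at
    least `min_overlap` of the claim's tokens must appear there too.
    """
    claim_tokens = _tokens(claim.text)
    if not claim_tokens or not claim.cited_source_ids:
        return False  # an uncited claim is never asserted
    claim_numbers = {t for t in claim_tokens if t.isdigit()}
    by_id = {p.source_id: p for p in passages}
    for source_id in claim.cited_source_ids:
        passage = by_id.get(source_id)
        if passage is None:
            continue  # citation points at something that was never retrieved
        passage_tokens = _tokens(passage.text)
        overlap = len(claim_tokens & passage_tokens) / len(claim_tokens)
        if claim_numbers <= passage_tokens and overlap >= min_overlap:
            return True
    return False


passages = [Passage("policy.pdf#p4", "Refunds are issued within 14 days of a return request.")]
claims = [
    Claim("Refunds are issued within 14 days.", ("policy.pdf#p4",)),
    Claim("Refunds are issued within 30 days.", ("policy.pdf#p4",)),  # almost right
    Claim("Shipping is free.", ()),                                   # no citation
]
grounded = [c for c in claims if is_grounded(c, passages)]
print([c.text for c in grounded])
```

The second claim is the almost-right case: it shares nearly every word with the source, and only the number check catches it. A production system would need something stronger than token matching, but the control flow (verify each claim against its cited source before showing it) is the same.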

2. Retrieval quality over model choice

Most build teams tune the model and treat retrieval as solved by a single vector search. Pure vector search misses exact-match and keyword-critical queries. Our pipeline combines dense semantic retrieval with sparse keyword methods and reranks the results, so the passage that reaches the model is the right one. Because the architecture is model-agnostic, you swap the underlying model without rebuilding the pipeline, avoiding the retrofit cost that sinks in-house builds wired to a single model.
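
One common way to merge a dense ranking with a sparse one is reciprocal rank fusion (RRF). The article does not say which fusion method the pipeline uses, so treat this as an illustrative stand-in. The document IDs are invented, and `k = 60` is a conventional default rather than a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in
    (rank starts at 1); documents are returned by descending total score.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Ties are broken by document ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for a query containing an exact part number.
dense_ranking = ["doc_overview", "doc_pricing", "doc_sku_4471"]   # semantic search
sparse_ranking = ["doc_sku_4471", "doc_pricing", "doc_faq"]       # keyword search

fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
print(fused)
```

In this example the exact-match document that dense search ranked last rises to the top once the keyword ranking is taken into account, which is the failure of pure vector search the paragraph describes. A reranker would then reorder the fused shortlist before it reaches the model.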

3. Drift and integrity management

Most systems treat embeddings as a one-time job, then degrade silently as content and models change. That embedding drift is one of the hidden costs behind the $400,000 manufacturing deployment that turned into an $18,000-per-month surprise [9]. Our pipeline monitors retrieval quality continuously, re-processes content when sources or models change, and encodes stable institutional knowledge as structured relationships rather than embeddings that need constant regeneration. Integrity is maintained as a running operation, not a launch milestone.
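
The re-processing trigger can be illustrated with a minimal staleness check. This is a sketch, not the pipeline's actual mechanism: `IndexedChunk`, `chunks_to_reembed`, and the model names are hypothetical. A chunk is flagged when its text has changed, when it was embedded with a different model than the active one, or when it has never been indexed.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedChunk:
    chunk_id: str
    content_hash: str     # hash of the text that was embedded
    embedding_model: str  # model version that produced the stored vector


def content_hash(text: str) -> str:
    """Stable fingerprint of a chunk's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def chunks_to_reembed(
    current_texts: dict[str, str],
    index: dict[str, IndexedChunk],
    active_model: str,
) -> list[str]:
    """Return IDs of chunks whose stored embedding no longer matches reality."""
    stale = []
    for chunk_id, text in current_texts.items():
        entry = index.get(chunk_id)
        if (
            entry is None                                # never indexed
            or entry.content_hash != content_hash(text)  # source text changed
            or entry.embedding_model != active_model     # embedding model changed
        ):
            stale.append(chunk_id)
    return sorted(stale)


index = {
    "a": IndexedChunk("a", content_hash("Net terms are 30 days."), "embed-v2"),
    "b": IndexedChunk("b", content_hash("Old refund policy."), "embed-v2"),
    "c": IndexedChunk("c", content_hash("Escalation contacts."), "embed-v1"),
}
current_texts = {
    "a": "Net terms are 30 days.",  # unchanged
    "b": "New refund policy.",      # text changed
    "c": "Escalation contacts.",    # embedded with an older model
    "d": "Brand new document.",     # not yet indexed
}
stale_ids = chunks_to_reembed(current_texts, index, active_model="embed-v2")
print(stale_ids)
```

Checking hashes and model versions only identifies what to re-embed. It does not measure retrieval quality itself, which would need a separate evaluation against known query and answer pairs.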

The pipeline runs continuously: monitoring retrieval quality, re-processing changed content, and verifying citations on every answer. Because the data layer is instrumented rather than assumed, the system gets more accurate as your knowledge base grows, not less.

The bottom line

Across every case in this article, the winner spent its scarce capital on the layer that was actually scarce. JPMorgan built where its regulated-data moat justified it. Physics Wallah, General Mills, and the European institution bought the infrastructure and invested in their own data. The build-versus-buy question is real and worth modeling carefully, but it is the surface of a deeper one: whether your data foundation can sustain accurate retrieval at the scale your AI demands.

The organizations that abandon 46% of their proofs of concept are not there because they picked the wrong model or the wrong vendor [13]. They are there because they treated the data layer as plumbing and the model as the product, when the evidence says it is the other way around. Buying clears the deployment hurdle. The data foundation clears the return.

The enterprises that internalize this will spend the next budget cycle buying what compounds against them when built, and building only what compounds in their favor. The ones that do not will keep paying three times to ship once, and keep wondering why the model everyone else is using works for everyone else.

References (17)
  1. Menlo Ventures, "2025: The State of Generative AI in the Enterprise," December 9, 2025. https://menlovc.com/perspective/2025-the-state-of-generative-ai-in-the-enterprise/
  2. Gartner, "Gartner Says Worldwide AI Spending Will Total $2.5 Trillion in 2026," January 15, 2026. https://www.gartner.com/en/newsroom/press-releases/2026-1-15-gartner-says-worldwide-ai-spending-will-total-2-point-5-trillion-dollars-in-2026
  3. MIT Project NANDA, "The GenAI Divide: State of AI in Business 2025," July 2025. https://mlq.ai/media/quarterly_decks/v0.1_State_of_AI_in_Business_2025_Report.pdf
  4. Forrester Research, "The State of AI 2025" and "2026 Technology Predictions," 2025–2026. https://www.forrester.com/predictions/
  5. Keyhole Software, "AI Software Development Costs 2026: Enterprise Spending, TCO, and ROI Analysis," March 2026. https://keyholesoftware.com/ai-software-development-cost-2026/
  6. McKinsey & Company, "The State of AI in 2025: Agents, Innovation, and Transformation," November 2025. https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai
  7. Gartner, "Gartner Says Organizations with Successful AI Initiatives Invest Up to Four Times More in Data and Analytics Foundations," April 16, 2026. https://www.gartner.com/en/newsroom/press-releases/2026-04-16-gartner-says-organizations-with-successful-ai-initiatives-invest-up-to-four-times-more-in-data-and-analytics-foundations
  8. Gartner, "Lack of AI-Ready Data Puts AI Projects at Risk," February 26, 2025. https://www.gartner.com/en/newsroom/press-releases/2025-02-26-lack-of-ai-ready-data-puts-ai-projects-at-risk
  9. RAGaboutit, "The Hidden Cost Architecture: Why Enterprise RAG ROI Calculations Are Missing Critical Variables," December 2025. https://ragaboutit.com/the-hidden-cost-architecture-why-enterprise-rag-roi-calculations-are-missing-critical-variables/
  10. Stack Overflow, "2025 Developer Survey," December 2025. https://survey.stackoverflow.co/2025/ai/
  11. Deloitte, "The State of AI in the Enterprise 2026," 2026. https://www.deloitte.com/us/en/about/press-room/state-of-ai-report-2026.html
  12. AI Monk, "12 Agentic AI Examples With Measurable ROI: Enterprise Case Studies From 2025-2026," 2025–2026. https://aimonk.com/agentic-ai-examples-enterprise-roi-case-studies/
  13. S&P Global Market Intelligence, "Generative AI shows rapid growth but yields mixed results," October 2025. https://www.spglobal.com/market-intelligence/en/news-insights/research/2025/10/generative-ai-shows-rapid-growth-but-yields-mixed-results
  14. Octopus Builds, "Build vs Buy Enterprise AI in 2026: Costs, Success Rates, and Hybrid Strategies," April 2026. https://octopusbuilds.com/blog/build-vs-buy-enterprise-ai-costs-success-rates
  15. Squirro, "Unlocking ROI in Months with Squirro," 2026. https://squirro.com/unlocking-roi-in-months-with-squirro
  16. Microsoft, "Physics Wallah: Affordable, Hyperpersonalized Learning," Microsoft AI First Movers (India), 2025. https://www.microsoft.com/en-in/aifirstmovers/physicswallah
  17. Presenc AI, "Enterprise AI Budget Allocation 2026," May 2026, drawing on BCG, Deloitte, and Tredence CIO survey data. https://www.bcg.com/capabilities/artificial-intelligence

By Tricky Wombat

Last Updated: Jun 27, 2026