Proven AI Case Studies: Where the Money Actually Moved

Brandon Sneider · January 2026

Boring ML delivers the biggest returns. JPMorgan's fraud detection, UPS's route optimization, and Walmart's demand forecasting run on traditional ML (gradient boosting, random forests, neural networks) that has been compounding value for years, not months. Combined, these systems account for billions of dollars a year in savings and prevented losses, with measured, sustained ROI.

Customer service AI is a cautionary tale, not a success story. Klarna replaced 700 full-time agents with a chatbot, projected $40M in annual profit improvement, then reversed course when quality collapsed. Bank of America's Erica works because it augments humans rather than replacing them: 3.2 billion interactions, 98% resolution without escalation, 19% revenue lift.

Reuters 2025; UPS SEC filings; Klarna/CNBC 2025; BofA press releases 2018-2025

The Scorecard

Company	Domain	Measured Outcome	Sustained?	Credibility
UPS	Route optimization	$300-400M annual savings; 100M fewer miles	Yes (10+ years)	HIGH
JPMorgan	Fraud detection	$1.5B in prevented losses; 95% fewer AML false positives	Yes (multi-year)	MEDIUM-HIGH
Stripe	Fraud detection	$6B in recovered false declines (2024); 80% carding reduction	Yes (10+ years of ML)	MEDIUM
Bank of America	Customer service	3.2B interactions; 98% resolution; 19% revenue lift	Yes (7 years)	MEDIUM-HIGH
Morgan Stanley	Enterprise RAG	98% advisor adoption; $64B net new assets (Q3 2024)	Yes (2+ years)	MEDIUM-HIGH
Walmart	Demand forecasting	$55M from single system; 90% inventory accuracy	Yes (multi-year)	MEDIUM
Insilico Medicine	Drug discovery	Phase IIa positive results; 30-month target-to-Phase-I	Too early	HIGH
Klarna	Customer service	$0.32 to $0.19 per transaction, then reversed course	No (reversed 2025)	MEDIUM

Klarna: The Full Arc

Klarna is the most cited AI customer service deployment in the world. It is also the most instructive failure.

In February 2024, Klarna's OpenAI-powered chatbot handled 2.3 million conversations in its first month, replacing 700 full-time agents. Cost per transaction fell 40%. The company projected $40 million in annual profit improvement.

By 2025, CEO Sebastian Siemiatkowski publicly admitted "we went too far." Customer satisfaction declined. Complex issues received generic, repetitive responses. Klarna began rehiring human agents, piloting a hybrid workforce model. More than 55% of companies that made AI-driven customer service layoffs now report regretting the decision.

Klarna optimized for cost, not customer experience. The per-transaction savings looked compelling in a spreadsheet, but customer experience degradation erodes revenue in ways that take 6-12 months to surface. Bank of America's Erica resolves 98% of inquiries without escalation because it was designed for the inquiries that do not need a human, not for the ones that do.
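The spreadsheet logic, and how little revenue erosion it takes to undo it, can be sketched in a few lines. The per-transaction costs and the $40 million projection are the figures reported above. The annual revenue number is a hypothetical placeholder, not a Klarna figure, and the break-even treats lost revenue as lost profit, which is a simplification.

```python
# Figures reported in this article.
COST_BEFORE = 0.32         # USD per transaction before the chatbot
COST_AFTER = 0.19          # USD per transaction after the chatbot
PROJECTED_SAVINGS = 40e6   # USD per year, Klarna's projection

# ASSUMPTION: illustrative revenue only, not a reported figure.
HYPOTHETICAL_REVENUE = 2e9  # USD per year


def pct_reduction(before: float, after: float) -> float:
    """Percentage drop from `before` to `after`."""
    return (before - after) / before * 100


def breakeven_revenue_loss(savings: float, annual_revenue: float) -> float:
    """Percent of revenue whose loss would cancel out the savings."""
    return savings / annual_revenue * 100


reduction = pct_reduction(COST_BEFORE, COST_AFTER)
breakeven = breakeven_revenue_loss(PROJECTED_SAVINGS, HYPOTHETICAL_REVENUE)

print(f"Cost per transaction fell {reduction:.1f}%")
print(f"Savings are wiped out by a {breakeven:.1f}% revenue decline")
```

Under that placeholder revenue, a low single-digit decline in revenue is enough to erase the projected gain, which is why the lag before degradation shows up matters.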

Klarna International, Feb 2024; CNBC, May 2025; BofA press releases, 2024-2025

UPS ORION: A Decade of Compounding Returns

UPS's On Road Integrated Optimization and Navigation system launched in 2013 and reached full deployment by 2016. It remains one of the best-documented ML deployments in any industry.

100 million fewer driving miles per year
$300-400 million in annual cost savings
10 million gallons of fuel saved annually
55,000 U.S. drivers on ORION-optimized routes in 2025
$250 million investment, recouped within two years
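A quick consistency check on these figures, as a minimal sketch. It assumes the full annual savings applied from the first year, which they did not, since deployment ran from 2013 to 2016. All inputs are the numbers listed above; nothing else is taken from UPS disclosures.

```python
# Figures listed above.
INVESTMENT = 250e6                # USD
ANNUAL_SAVINGS = (300e6, 400e6)   # USD per year, low and high estimates
MILES_AVOIDED = 100e6             # miles per year
GALLONS_SAVED = 10e6              # gallons per year


def payback_years(investment: float, annual_savings: float) -> float:
    """Years to recover the investment at a constant savings rate."""
    return investment / annual_savings


def savings_per_mile(annual_savings: float, miles: float) -> float:
    """Dollars saved per mile of driving avoided."""
    return annual_savings / miles


payback = [payback_years(INVESTMENT, s) for s in ANNUAL_SAVINGS]
per_mile = [savings_per_mile(s, MILES_AVOIDED) for s in ANNUAL_SAVINGS]
miles_per_gallon_saved = MILES_AVOIDED / GALLONS_SAVED

print(f"Payback at full run rate: {payback[1]:.2f} to {payback[0]:.2f} years")
print(f"Savings per avoided mile: ${per_mile[0]:.2f} to ${per_mile[1]:.2f}")
print(f"Avoided miles per gallon saved: {miles_per_gallon_saved:.0f}")
```

At the full run rate the investment pays back in under a year, so the two-year figure above is plausibly explained by savings ramping up during the phased rollout. The figures also imply $3 to $4 saved per avoided mile, more than fuel alone would account for.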

UPS has disclosed ORION metrics in SEC filings, investor presentations, and sustainability reports consistently since 2015. This is the gold standard for AI ROI documentation.

UPS SEC filings; Supply Chain Dive; UPS investor presentations, 2015-2025

Traditional ML Still Beats LLMs on Structured Data

The largest share of ML-driven business value in production today comes from classical machine learning, not large language models. On tabular data (fraud, pricing, recommendations, anomaly detection), gradient-boosting libraries such as XGBoost reach 99%+ AUC in published benchmarks while running 100x cheaper and 1,000x faster than LLM-based approaches.

Three constraints keep LLMs out of these production workloads:

Latency. Fraud detection requires sub-100ms decisions. LLM inference takes seconds. Gradient boosting scores a transaction in single-digit milliseconds.
Cost. Running an LLM on every transaction at JPMorgan's scale (billions of transactions) would cost orders of magnitude more than traditional ML.
Explainability. Regulators require that credit decisions and fraud flags be explainable. Traditional ML produces feature importance scores. LLMs produce prose, which does not satisfy regulatory requirements.
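To make the latency and explainability points concrete, here is a toy sketch of gradient boosting in pure Python: squared-error boosting over one-split trees (stumps), trained on synthetic data. It is not XGBoost and not any company's system. The feature names, the fraud rule, and the hyperparameters are all assumptions chosen for illustration, and the timing it prints describes this toy only.

```python
import random
import time

FEATURES = ["amount", "hour", "is_foreign"]  # hypothetical feature names


def make_data(n, seed=0):
    """Synthetic transactions; toy rule: fraud = large foreign purchase."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        row = [rng.uniform(0, 1000), rng.randrange(24), rng.randrange(2)]
        rows.append(row)
        labels.append(1.0 if row[0] > 700 and row[2] == 1 else 0.0)
    return rows, labels


def fit_stump(rows, residuals):
    """Return the one-split tree that most reduces squared error."""
    n, total = len(rows), sum(residuals)
    best = None  # (gain, feature, threshold, left_mean, right_mean)
    for j in range(len(rows[0])):
        order = sorted(range(n), key=lambda i: rows[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += residuals[order[k - 1]]
            lo, hi = rows[order[k - 1]][j], rows[order[k]][j]
            if lo == hi:
                continue  # no threshold separates equal values
            right_sum = total - left_sum
            # Reduction in squared error from splitting between lo and hi.
            gain = left_sum**2 / k + right_sum**2 / (n - k) - total**2 / n
            if best is None or gain > best[0]:
                best = (gain, j, (lo + hi) / 2, left_sum / k, right_sum / (n - k))
    return best


def fit(rows, labels, rounds=20, lr=0.3):
    """Boosting loop: each stump fits the residuals of the current model."""
    base = sum(labels) / len(labels)
    preds = [base] * len(labels)
    stumps, importance = [], [0.0] * len(rows[0])
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(labels, preds)]
        stump = fit_stump(rows, residuals)
        if stump is None:
            break
        gain, j, thr, left, right = stump
        importance[j] += gain  # explainability: total gain per feature
        stumps.append((j, thr, lr * left, lr * right))
        preds = [p + (lr * left if r[j] <= thr else lr * right)
                 for p, r in zip(preds, rows)]
    return base, stumps, importance


def score(model, row):
    """Score one transaction: a handful of comparisons and additions."""
    base, stumps, _ = model
    return base + sum(l if row[j] <= thr else r for j, thr, l, r in stumps)


rows, labels = make_data(500)
model = fit(rows, labels)
importance = dict(zip(FEATURES, model[2]))

start = time.perf_counter()
risk = score(model, [900.0, 12, 1])  # large foreign purchase at noon
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"risk score {risk:.2f} computed in {elapsed_ms:.3f} ms")
for name, gain in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{name}: total gain {gain:.1f}")
```

Scoring is a fixed number of threshold comparisons, which is why this family of models fits a tight latency budget, and the per-feature gain totals are the kind of artifact that can be handed to a reviewer. Production libraries add regularization, deeper trees, and per-decision attribution, none of which is shown here.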

Neural Computing and Applications benchmark, 2025; R Consortium; Kaggle competition data

What This Means

Start with boring ML, not GenAI. The largest, most sustained returns in this research come from traditional machine learning applied to structured business problems: fraud, forecasting, routing, pricing, anomaly detection. If your organization has not deployed gradient boosting on transaction data, you are leaving money on the table while debating which LLM to buy.

Treat customer service AI as augmentation, not replacement. The organizations seeing sustained value — Bank of America, Morgan Stanley, Delta — use AI to make human workers more effective, not to eliminate them.

Deploy enterprise RAG with human verification, not vendor trust. Stanford's empirical study found legal RAG tools hallucinate 17-34% of the time. Morgan Stanley succeeded because advisors remain in the decision loop. Any deployment in high-stakes domains requires verification workflows that assume the system will be wrong at a comparable rate.
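What an error rate in that range means for a working day can be sketched with basic probability. The 17% and 34% bounds are the figures cited above. Treating each answer as an independent trial is an assumption made for simplicity (real errors likely cluster by topic), and the answer counts are hypothetical.

```python
LOW_RATE, HIGH_RATE = 0.17, 0.34  # hallucination range cited above


def expected_errors(error_rate: float, n_answers: int) -> float:
    """Expected number of wrong answers out of n_answers."""
    return error_rate * n_answers


def prob_at_least_one_error(error_rate: float, n_answers: int) -> float:
    """Chance that at least one of n independent answers is wrong."""
    return 1 - (1 - error_rate) ** n_answers


# ASSUMPTION: hypothetical numbers of answers relied on per day.
for n in (1, 5, 20):
    low = prob_at_least_one_error(LOW_RATE, n)
    high = prob_at_least_one_error(HIGH_RATE, n)
    print(f"{n:>2} answers: {low:.0%} to {high:.0%} chance of at least one error")
```

Even at the low end of the range, someone relying on five unverified answers is more likely than not to have acted on at least one wrong one, which is the case for building verification into the workflow rather than adding it after an incident.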

If you are evaluating which AI use cases in your organization have genuine evidence behind them versus vendor-driven hype, that distinction is worth a focused conversation — brandon@brandonsneider.com.

Sources

UPS ORION — SEC filings, investor presentations, sustainability reports, 2015-2025
JPMorgan — Reuters, May 2025; Constellation Research; investor presentations
Stripe Radar — Stripe annual review, 2024
Bank of America Erica — BofA press releases, 2024-2025
Morgan Stanley — OpenAI case study; Q3 2024 SEC filing
Klarna — Company press release, Feb 2024; CNBC reversal coverage, May 2025
Walmart — Supply Chain Dive; CIO Dive; corporate communications
Insilico Medicine — Nature Medicine, June 2025 (peer-reviewed)
Stanford Legal RAG Study — Magesh et al., Journal of Empirical Legal Studies, 2025
Siemens Senseye — Product materials and customer case studies, 2024
GE Vernova SmartSignal — Product documentation; Gartner Market Guide, 2025
Rolls-Royce TotalCare — Press releases; IBM partner materials
Tabular data ML benchmarks — Neural Computing and Applications, 2025
Delta Air Lines — CES 2025; investor communications