Proven AI Case Studies: Where the Money Actually Moved

The boring ML delivers the biggest returns. JPMorgan's fraud detection, UPS's route optimization, and Walmart's demand forecasting run on traditional ML — gradient boosting, random forests, neural networks — that has been compounding value for years, not months. Combined, these systems drive billions in annual savings with measured, sustained ROI.

Customer service AI is a cautionary tale, not a success story. Klarna handed the work of 700 agents to a chatbot, projected $40M in annual profit improvement, then reversed course when quality collapsed. Bank of America's Erica works because it augments humans rather than replacing them: 3.2 billion interactions, 98% resolution without escalation, 19% revenue lift.

Sources: Reuters 2025; UPS SEC filings; Klarna/CNBC 2025; BofA press releases 2018-2025

The Scorecard

| Company | Domain | Measured Outcome | Sustained? | Credibility |
| --- | --- | --- | --- | --- |
| UPS | Route optimization | $300-400M annual savings; 100M fewer miles | Yes (10+ years) | HIGH |
| JPMorgan | Fraud detection | $1.5B in prevented losses; 95% fewer AML false positives | Yes (multi-year) | MEDIUM-HIGH |
| Stripe | Fraud detection | $6B in recovered false declines (2024); 80% carding reduction | Yes (10+ years of ML) | MEDIUM |
| Bank of America | Customer service | 3.2B interactions; 98% resolution; 19% revenue lift | Yes (7 years) | MEDIUM-HIGH |
| Morgan Stanley | Enterprise RAG | 98% advisor adoption; $64B net new assets (Q3 2024) | Yes (2+ years) | MEDIUM-HIGH |
| Walmart | Demand forecasting | $55M from single system; 90% inventory accuracy | Yes (multi-year) | MEDIUM |
| Insilico Medicine | Drug discovery | Phase IIa positive results; 30-month target-to-Phase-I | Too early | HIGH |
| Klarna | Customer service | $0.32 to $0.19 per transaction, then reversed course | No (reversed 2025) | MEDIUM |

Klarna: The Full Arc

Klarna is the most cited AI customer service deployment in the world. It is also the most instructive failure.

In February 2024, Klarna's OpenAI-powered chatbot handled 2.3 million conversations in its first month, work the company equated to that of 700 full-time agents. Cost per transaction fell 40%. The company projected $40 million in annual profit improvement.
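The 40% figure can be checked against the per-transaction costs listed in the scorecard. The sketch below is only that arithmetic; the function name `percent_drop` is an illustrative choice, not anything from Klarna's reporting.

```python
def percent_drop(before: float, after: float) -> float:
    """Size of the decrease from `before` to `after`, as a percentage of `before`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


# Per-transaction cost figures as listed in the scorecard (USD).
cost_before = 0.32
cost_after = 0.19

drop_pct = percent_drop(cost_before, cost_after)
print(f"cost per transaction fell {drop_pct:.1f}%")
```

The two scorecard figures work out to a drop of roughly 40.6%, consistent with the rounded 40% claim. Note what the calculation leaves out: it measures cost only and says nothing about what happened to satisfaction or revenue.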

By 2025, CEO Sebastian Siemiatkowski publicly admitted "we went too far." Customer satisfaction declined. Complex issues received generic, repetitive responses. Klarna began rehiring human agents, piloting a hybrid workforce model. More than 55% of companies that made AI-driven customer service layoffs now report regretting the decision.

Klarna optimized for cost, not customer experience. The per-transaction savings looked compelling in a spreadsheet, but customer experience degradation erodes revenue in ways that take 6-12 months to surface. Bank of America's Erica resolves 98% of inquiries without escalation because it was designed for the inquiries that do not need a human, not for the ones that do.

Sources: Klarna International, Feb 2024; CNBC, May 2025; BofA press releases, 2024-2025

UPS ORION: A Decade of Compounding Returns

UPS's On Road Integrated Optimization and Navigation system launched in 2013 and reached full deployment by 2016. It remains one of the best-documented ML deployments in any industry.

UPS has disclosed ORION metrics in SEC filings, investor presentations, and sustainability reports consistently since 2015. This is the gold standard for AI ROI documentation.

Sources: UPS SEC filings; Supply Chain Dive; UPS investor presentations, 2015-2025

Traditional ML Still Beats LLMs on Structured Data

The largest share of ML-driven business value in production today comes from classical machine learning, not large language models. On tabular data (fraud, pricing, recommendations, anomaly detection), gradient-boosting models such as XGBoost can reach 99%+ AUC while running roughly 100x cheaper and 1,000x faster than LLM-based approaches.
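For readers who have not seen the technique, here is a minimal sketch of gradient boosting on tabular data, scored with AUC. It is a from-scratch toy in plain Python, not XGBoost and not any company's production system: the trees are single-split stumps, the loss is squared error, and the transactions, feature names, and hyperparameters are all invented for illustration.

```python
from itertools import product


def fit_stump(rows, residuals):
    """Return (feature, threshold, left_value, right_value) for the
    one-split tree that best fits the residuals under squared error."""
    best = None
    for j in range(len(rows[0])):
        for t in sorted({row[j] for row in rows}):
            left = [r for row, r in zip(rows, residuals) if row[j] <= t]
            right = [r for row, r in zip(rows, residuals) if row[j] > t]
            if not left or not right:
                continue  # a split must put rows on both sides
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, t, lv, rv)
    return best[1:]


def fit_boosted(rows, labels, rounds=20, learning_rate=0.3):
    """Gradient boosting for squared loss: each stump fits the current residuals."""
    base = sum(labels) / len(labels)
    preds = [base] * len(labels)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(labels, preds)]
        j, t, lv, rv = fit_stump(rows, residuals)
        stumps.append((j, t, lv, rv))
        preds = [p + learning_rate * (lv if row[j] <= t else rv)
                 for row, p in zip(rows, preds)]
    return base, learning_rate, stumps


def score(model, row):
    """Model output for one row: the base rate plus every stump's scaled vote."""
    base, learning_rate, stumps = model
    return base + sum(learning_rate * (lv if row[j] <= t else rv)
                      for j, t, lv, rv in stumps)


def roc_auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


# Hypothetical transactions: [amount_usd, hour_of_day]; label 1 = fraud.
train_rows = [[20, 14], [35, 9], [60, 20], [120, 11],
              [640, 3], [710, 2], [880, 23], [950, 4]]
train_labels = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit_boosted(train_rows, train_labels)

test_rows = [[40, 14], [900, 3], [75, 2], [500, 22]]
test_labels = [0, 1, 0, 1]
auc = roc_auc(test_labels, [score(model, r) for r in test_rows])
print(f"held-out AUC on toy data: {auc:.2f}")
```

The toy data is separable on amount alone, so the perfect score here says nothing about real-world performance. The point is the shape of the workload: a few arithmetic comparisons per prediction, which is why this class of model is cheap and fast to serve. In practice a team would likely reach for an established library rather than hand-rolled stumps.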

Three constraints keep LLMs out of these production workloads:

Sources: Neural Computing and Applications benchmark, 2025; R Consortium; Kaggle competition data

What This Means

Start with boring ML, not GenAI. The largest, most sustained returns in this research come from traditional machine learning applied to structured business problems: fraud, forecasting, routing, pricing, anomaly detection. If your organization has not deployed gradient boosting on transaction data, you are leaving money on the table while debating which LLM to buy.

Treat customer service AI as augmentation, not replacement. The organizations seeing sustained value — Bank of America, Morgan Stanley, Delta — use AI to make human workers more effective, not to eliminate them.

Deploy enterprise RAG with human verification, not vendor trust. Stanford's empirical study found legal RAG tools hallucinate 17-34% of the time. Morgan Stanley succeeded because advisors remain in the decision loop. Any deployment in high-stakes domains requires verification workflows that assume the system will be wrong at roughly that rate.
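One way to picture a verification workflow is as a routing rule that sits between the RAG system and the person who acts on its answer. The sketch below is an assumption-laden illustration, not Morgan Stanley's design: `DraftAnswer`, `route`, `expected_wrong`, and the three routing outcomes are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class DraftAnswer:
    question: str
    text: str
    citations: list[str] = field(default_factory=list)
    high_stakes: bool = True  # assumption: default to the cautious path


def route(answer: DraftAnswer) -> str:
    """Decide who sees the draft next. Nothing high-stakes ships unreviewed."""
    if not answer.citations:
        return "reject"        # no sources to check, so nothing to verify
    if answer.high_stakes:
        return "human_review"  # a person confirms the sources before release
    return "release"


def expected_wrong(n_answers: int, error_rate_pct: int) -> float:
    """Expected number of wrong answers at a given error rate (whole percent)."""
    return n_answers * error_rate_pct / 100


# Hypothetical volume: 1,000 answers at the low and high ends of the cited range.
low, high = expected_wrong(1000, 17), expected_wrong(1000, 34)
print(f"expect between {low:.0f} and {high:.0f} wrong answers per 1,000")
```

At the error rates cited above, a thousand answers would contain somewhere between 170 and 340 wrong ones. That is why the cautious default matters: review is the normal path for high-stakes answers, and unreviewed release is the exception that has to be justified.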

If you are evaluating which AI use cases in your organization have genuine evidence behind them versus vendor-driven hype, that distinction is worth a focused conversation — brandon@brandonsneider.com.
