Proven AI Case Studies: Where the Money Actually Moved
The boring ML delivers the biggest returns. JPMorgan's fraud detection, UPS's route optimization, and Walmart's demand forecasting run on traditional ML — gradient boosting, random forests, neural networks — that has been compounding value for years, not months. Combined, these systems drive billions in annual savings with measured, sustained ROI.
Customer service AI is a cautionary tale, not a success story. Klarna said its chatbot was doing the work of 700 full-time agents, projected $40M in profit improvement, then reversed course when quality collapsed. Bank of America's Erica works because it augments humans rather than replacing them: 3.2 billion interactions, 98% resolution without escalation, 19% revenue lift.
Reuters 2025; UPS SEC filings; Klarna/CNBC 2025; BofA press releases 2018-2025
The Scorecard
| Company | Domain | Measured Outcome | Sustained? | Credibility |
|---|---|---|---|---|
| UPS | Route optimization | $300-400M annual savings; 100M fewer miles | Yes (10+ years) | HIGH |
| JPMorgan | Fraud detection | $1.5B in prevented losses; 95% fewer AML false positives | Yes (multi-year) | MEDIUM-HIGH |
| Stripe | Fraud detection | $6B in recovered false declines (2024); 80% carding reduction | Yes (10+ years of ML) | MEDIUM |
| Bank of America | Customer service | 3.2B interactions; 98% resolution; 19% revenue lift | Yes (7 years) | MEDIUM-HIGH |
| Morgan Stanley | Enterprise RAG | 98% advisor adoption; $64B net new assets (Q3 2024) | Yes (2+ years) | MEDIUM-HIGH |
| Walmart | Demand forecasting | $55M from single system; 90% inventory accuracy | Yes (multi-year) | MEDIUM |
| Insilico Medicine | Drug discovery | Phase IIa positive results; 30-month target-to-Phase-I | Too early | HIGH |
| Klarna | Customer service | $0.32 to $0.19 per transaction, then reversed course | No (reversed 2025) | MEDIUM |
Klarna: The Full Arc
Klarna is the most cited AI customer service deployment in the world. It is also the most instructive failure.
In February 2024, Klarna reported that its OpenAI-powered chatbot had handled 2.3 million conversations in its first month, work the company equated to that of 700 full-time agents. Cost per transaction fell 40%. The company projected $40 million in annual profit improvement.
By 2025, CEO Sebastian Siemiatkowski publicly admitted "we went too far." Customer satisfaction declined. Complex issues received generic, repetitive responses. Klarna began rehiring human agents, piloting a hybrid workforce model. More than 55% of companies that made AI-driven customer service layoffs now report regretting the decision.
Klarna optimized for cost, not customer experience. The per-transaction savings looked compelling in a spreadsheet, but customer experience degradation erodes revenue in ways that take 6-12 months to surface. Bank of America's Erica resolves 98% of inquiries without escalation because it was designed for the inquiries that do not need a human, not for the ones that do.
Klarna International, Feb 2024; CNBC, May 2025; BofA press releases, 2024-2025
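The spreadsheet view is easy to reproduce, and so is the question it leaves out. The sketch below is illustrative only: the per-transaction costs and the $40 million projection come from the figures above, while the contribution margin is a hypothetical input chosen for the example, not a Klarna number, and the function names are made up here.

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative cost reduction, as a percentage of the starting cost."""
    return (before - after) / before * 100


def break_even_lost_revenue(savings: float, contribution_margin: float) -> float:
    """Revenue loss at which the lost margin equals the savings."""
    if not 0 < contribution_margin <= 1:
        raise ValueError("contribution_margin must be in (0, 1]")
    return savings / contribution_margin


cost_before = 0.32               # USD per transaction, from the scorecard
cost_after = 0.19                # USD per transaction, from the scorecard
projected_savings = 40_000_000   # USD, Klarna's projected profit improvement
assumed_margin = 0.5             # HYPOTHETICAL contribution margin, for illustration

reduction = percent_reduction(cost_before, cost_after)
break_even = break_even_lost_revenue(projected_savings, assumed_margin)

print(f"Cost per transaction down {reduction:.1f}%")
print(f"Savings are wiped out by ${break_even / 1e6:.0f}M of lost revenue "
      f"at a {assumed_margin:.0%} margin")
```

The first number is what shows up in a quarterly review. The second depends on an input that only becomes measurable once churn appears, which is why the savings can look safe for months before they are not.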
UPS ORION: A Decade of Compounding Returns
UPS's On Road Integrated Optimization and Navigation system launched in 2013 and reached full deployment by 2016. It remains one of the best-documented ML deployments in any industry.
- 100 million fewer driving miles per year
- $300-400 million in annual cost savings
- 10 million gallons of fuel saved annually
- 55,000 U.S. drivers on ORION-optimized routes in 2025
- $250 million investment — ROI achieved within two years
UPS has disclosed ORION metrics in SEC filings, investor presentations, and sustainability reports consistently since 2015. This is the gold standard for AI ROI documentation.
UPS SEC filings; Supply Chain Dive; UPS investor presentations, 2015-2025
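As a rough sanity check on those figures, a simple payback calculation can be sketched as follows. It assumes steady-state savings from day one, which was not the case during the 2013-2016 rollout, so it gives a lower bound on payback time rather than a reconstruction of UPS's own accounting. The function name is illustrative.

```python
def payback_years(investment: float, annual_savings: float) -> float:
    """Simple payback: years of steady-state savings needed to recover the investment."""
    if annual_savings <= 0:
        raise ValueError("annual_savings must be positive")
    return investment / annual_savings


investment = 250e6  # USD, reported ORION investment

# Low and high ends of the reported annual savings range.
for annual_savings in (300e6, 400e6):
    months = payback_years(investment, annual_savings) * 12
    print(f"${annual_savings / 1e6:.0f}M per year -> payback in {months:.1f} months")
```

At full run-rate the investment pays back in under a year at either end of the range. A phased rollout, with savings ramping as more drivers came onto the system, is consistent with the reported two-year figure.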
Traditional ML Still Beats LLMs on Structured Data
The largest share of ML-driven business value in production today comes from classical machine learning, not large language models. On tabular data (fraud, pricing, recommendations, anomaly detection), gradient-boosted trees such as XGBoost reach 99%+ AUC in the cited benchmarks while running roughly 100x cheaper and 1,000x faster than LLM-based approaches.
Three constraints keep LLMs out of these production workloads:
- Latency. Fraud detection requires sub-100ms decisions. LLM inference takes seconds. Gradient boosting scores a transaction in single-digit milliseconds.
- Cost. Running an LLM on every transaction at JPMorgan's scale (billions of transactions) would cost orders of magnitude more than traditional ML.
- Explainability. Regulators require that credit decisions and fraud flags be explainable. Traditional ML produces feature importance scores. LLMs produce prose, which does not satisfy regulatory requirements.
Neural Computing and Applications benchmark, 2025; R Consortium; Kaggle competition data
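The latency and explainability points can be illustrated with a toy model. The sketch below hand-rolls a small ensemble of decision stumps, the simplest building block of gradient boosting, in pure Python rather than calling XGBoost. Every feature name, threshold, and weight is invented for the example and does not come from any production fraud system.

```python
import math
import time
from collections import defaultdict

# Each stump: (feature, threshold, value_if_at_or_below, value_if_above).
# HYPOTHETICAL values; a trained model would learn many deeper trees.
STUMPS = [
    ("amount_usd", 500.0, -0.8, 1.2),
    ("txns_last_hour", 3.0, -0.5, 1.5),
    ("amount_usd", 2000.0, 0.0, 0.9),
    ("account_age_days", 30.0, 1.1, -0.6),
]
BIAS = -2.0  # hypothetical base log-odds


def score(txn: dict) -> tuple:
    """Return (fraud probability, per-feature log-odds contributions)."""
    contributions = defaultdict(float)
    for feature, threshold, low, high in STUMPS:
        contributions[feature] += low if txn[feature] <= threshold else high
    log_odds = BIAS + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-log_odds))
    return probability, dict(contributions)


txn = {"amount_usd": 2500.0, "txns_last_hour": 5.0, "account_age_days": 10.0}

start = time.perf_counter()
probability, reasons = score(txn)
elapsed_ms = (time.perf_counter() - start) * 1000

print(f"fraud probability: {probability:.3f} (scored in {elapsed_ms:.3f} ms)")
for feature, value in sorted(reasons.items(), key=lambda kv: -abs(kv[1])):
    print(f"  {feature}: {value:+.1f} log-odds")
```

Because each stump reads a single feature, the score decomposes exactly into per-feature contributions that can be shown to a reviewer. Real boosted models use deeper trees, where attribution typically relies on methods such as SHAP, but the output has the same shape: a number per feature rather than a paragraph of prose.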
What This Means
Start with boring ML, not GenAI. The largest, most sustained returns in this research come from traditional machine learning applied to structured business problems: fraud, forecasting, routing, pricing, anomaly detection. If your organization has not deployed gradient boosting on transaction data, you are leaving money on the table while debating which LLM to buy.
Treat customer service AI as augmentation, not replacement. The organizations seeing sustained value — Bank of America, Morgan Stanley, Delta — use AI to make human workers more effective, not to eliminate them.
Deploy enterprise RAG with human verification, not vendor trust. Stanford's empirical study found legal RAG tools hallucinate 17-34% of the time. Morgan Stanley succeeded because advisors remain in the decision loop. Any deployment in high-stakes domains requires verification workflows that assume the system will be wrong at a comparable rate.
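One way to picture such a workflow is a gate that runs cheap automated checks on every answer and attaches the failures for a human reviewer. This is a minimal sketch under stated assumptions, not a description of Morgan Stanley's system: the `Answer` structure, the `corpus` dictionary, and both function names are hypothetical, and passing the checks does not make an answer correct.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    text: str
    quotes: list       # verbatim snippets the system claims support the answer
    source_ids: list   # IDs of the documents it cites


def review_flags(answer: Answer, corpus: dict) -> list:
    """Return reasons to prioritize human review; an empty list means checks passed."""
    flags = []
    if not answer.source_ids:
        flags.append("no sources cited")
    for source_id in answer.source_ids:
        if source_id not in corpus:
            flags.append(f"unknown source: {source_id}")
    cited_text = " ".join(corpus.get(s, "") for s in answer.source_ids).lower()
    for quote in answer.quotes:
        if quote.lower() not in cited_text:
            flags.append(f"quote not found in cited sources: {quote!r}")
    return flags


def expected_errors(n_answers: int, error_rate: float) -> int:
    """Expected number of wrong answers at a given hallucination rate."""
    return round(n_answers * error_rate)


corpus = {"doc-1": "The fund charges a 0.25% annual management fee."}
good = Answer("The fee is 0.25% per year.", ["0.25% annual management fee"], ["doc-1"])
bad = Answer("The fee is waived for new clients.", ["fee is waived"], ["doc-9"])

print(review_flags(good, corpus))
print(review_flags(bad, corpus))
print(expected_errors(1000, 0.17), "to", expected_errors(1000, 0.34),
      "wrong answers per 1,000 at the study's rates")
```

The flags only order the review queue. At the rates the Stanford study reports, a team handling 1,000 answers should plan reviewer capacity for a few hundred errors, including ones that pass every automated check.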
If you are evaluating which AI use cases in your organization have genuine evidence behind them versus vendor-driven hype, that distinction is worth a focused conversation — brandon@brandonsneider.com.
Sources
- UPS ORION — SEC filings, investor presentations, sustainability reports, 2015-2025
- JPMorgan — Reuters, May 2025; Constellation Research; investor presentations
- Stripe Radar — Stripe annual review, 2024
- Bank of America Erica — BofA press releases, 2024-2025
- Morgan Stanley — OpenAI case study; Q3 2024 SEC filing
- Klarna — Company press release, Feb 2024; CNBC reversal coverage, May 2025
- Walmart — Supply Chain Dive; CIO Dive; corporate communications
- Insilico Medicine — Nature Medicine, June 2025 (peer-reviewed)
- Stanford Legal RAG Study — Magesh et al., Journal of Empirical Legal Studies, 2025
- Siemens Senseye — Product materials and customer case studies, 2024
- GE Vernova SmartSignal — Product documentation; Gartner Market Guide, 2025
- Rolls-Royce TotalCare — Press releases; IBM partner materials
- Tabular data ML benchmarks — Neural Computing and Applications, 2025
- Delta Air Lines — CES 2025; investor communications