What Actually Works: The Honest Guide to AI in Engineering

Brandon Sneider · February 2026

AI delivers genuine, measurable value on specific tasks: boilerplate generation, unit testing (83% coverage vs. 54% traditional), documentation, and code autocomplete produce 25-35% speed gains on routine work. The gap between individual gains and organizational gains is a solvable problem, but solving it requires acknowledging where the bottleneck actually moved.

Developers complete 21% more tasks and merge 98% more PRs (Faros AI, 10,000+ developers across 1,255 teams). That speed shifts the bottleneck to code review, where review time grows 91%. Organizations that address the new bottleneck capture the gains. Those that do not address it watch the speed evaporate.

Faros AI, 10,000+ developers across 1,255 teams, 2025; AlterSquare, 20+ client projects, 2026

What the Controlled Studies Show

Study	N	Finding	Credibility
METR RCT (2025)	16 devs, 246 tasks	19% SLOWER with AI; devs believed they were 20% faster	HIGH
GitHub/Microsoft (2023)	95 devs	55.8% faster on specific task	MEDIUM
Google RCT (2025)	96 devs	21% productivity gain	MEDIUM
DX/Faros (2025)	135K+ devs	3.6 hrs/week saved per dev, BUT no org-level improvement	HIGH
AlterSquare (2026)	20+ client projects	46% more PRs, but 91% more review time	HIGH
CodeRabbit (2025)	Large corpus	AI-generated code contains 1.7x more issues than human-written code	HIGH

For C-Suite: "Your developers will tell you AI makes them faster. The data says it depends entirely on the task. For boilerplate and tests, they are right. For complex work, they are wrong. And at the organizational level, nobody has proven it helps yet."

Where AI Delivers (Task-Level Truth)

Tier 1: Proven Value — Deploy Now

Task	Effectiveness	Effort
Code autocomplete	HIGH — 25-35% faster for routine coding	Install extension, assign license
Boilerplate generation	HIGH — eliminates tedious scaffolding	Works out of the box
Unit test generation	HIGH — 83% coverage vs. 54% traditional	Configure test framework preferences
Documentation generation	HIGH — auto-generates from code	Point at codebase
Code explanation	HIGH — onboarding and knowledge transfer	Ask questions in chat

100% of Tier 1 value comes from configuration. Zero custom development required.

Tier 3: Risky and Overhyped

Task	Effectiveness	Risk
Architecture decisions	LOW	AI lacks context of organizational constraints
Complex business logic	LOW	Subtle errors surface under load (47 bugs from one ChatGPT session)
Security-critical code	DANGEROUS	2.74x higher vulnerability rate in AI co-authored PRs
Autonomous AI agents	VERY EARLY	One AutoGen agent ran an infinite loop: $2,400 overnight API bill

AlterSquare, 2026; TDS/OWASP/Checkmarx vulnerability data; Gartner defect predictions

The Bottleneck Problem

The Faros AI research applies Amdahl's Law to AI-assisted development: speeding up one stage improves the whole system only in proportion to that stage's share of total time, so the slowest remaining stage sets the pace. Before AI, coding consumed roughly 40% of delivery time. After AI, coding drops to 15%, but code review balloons from 30% to 55% of the total cycle.

PRs per developer: +98% (great)
PR review time: +91% (terrible)
PR size: +154% (makes review harder)
Bugs per developer: +9% (more code = more bugs)
Org-level delivery improvement: 0% (it cancels out)
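
The arithmetic behind that bottleneck shift can be sketched in a few lines of Python. This is a minimal illustration, not Faros AI's own model: it treats the delivery cycle as 100 time units, reads the 40%/30% and 15%/55% shares above as units on that same scale, and assumes the remaining 30 units (labeled `other` here) are untouched by AI. All function and variable names are hypothetical.

```python
def amdahl_speedup(fraction: float, factor: float) -> float:
    """Overall speedup when `fraction` of total time runs `factor` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# Assumed 100-unit cycle; "other" is the remainder the text does not break down.
before = {"coding": 40.0, "review": 30.0, "other": 30.0}
after = {"coding": 15.0, "review": 55.0, "other": 30.0}

# Coding alone gets roughly 2.67x faster (40 units down to 15).
coding_factor = before["coding"] / after["coding"]

# Best case: coding speeds up and every other stage stays the same.
best_case = amdahl_speedup(0.40, coding_factor)

# Hard ceiling: even instant coding leaves the other 60% of the cycle.
ceiling = 1.0 / (1.0 - 0.40)

# Net effect once review grows: compare total cycle lengths.
net_speedup = sum(before.values()) / sum(after.values())

print(f"best case: {best_case:.2f}x, ceiling: {ceiling:.2f}x, net: {net_speedup:.2f}x")
# prints "best case: 1.33x, ceiling: 1.67x, net: 1.00x"
```

Under these assumptions the faster coding stage could have bought a 1.33x cycle speedup at most, and the growth in review absorbs all of it, which is consistent with the 0% org-level figure above.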

The fix is not better AI tools. It is workflow redesign: AI-assisted code review (CodeRabbit, Qodo), smaller PRs by policy, automated quality gates in CI before human review, review load balancing, and tiered review where AI-generated boilerplate gets lighter scrutiny than novel logic.
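
One way to picture the policy pieces working together is a small routing function that runs before any human looks at a PR. This is a hedged sketch, not a feature of CodeRabbit, Qodo, or any CI product: the `PullRequest` fields, the 400-line limit, the tier names, and the choice to send security-sensitive changes to full review are all assumptions made for illustration.

```python
from dataclasses import dataclass

MAX_LINES_CHANGED = 400  # assumed size limit; tune per team


@dataclass
class PullRequest:
    lines_changed: int
    ci_passed: bool                  # automated quality gates ran and passed
    ai_generated_boilerplate: bool   # e.g. scaffolding or generated tests
    touches_security_paths: bool     # e.g. auth or payment code (assumed flag)


def review_tier(pr: PullRequest) -> str:
    """Return the review route for a PR under the assumed policy."""
    if not pr.ci_passed:
        return "blocked: fix CI before requesting review"
    if pr.lines_changed > MAX_LINES_CHANGED:
        return "blocked: split into smaller PRs"
    if pr.touches_security_paths:
        return "full review: two human reviewers"
    if pr.ai_generated_boilerplate:
        return "light review: AI reviewer plus one human skim"
    return "standard review: AI reviewer plus one human"


if __name__ == "__main__":
    scaffold = PullRequest(120, True, True, False)
    print(review_tier(scaffold))
```

The ordering matters in this sketch: the cheap automated checks (CI status, size) reject a PR before it consumes reviewer time, and the security check runs before the boilerplate check so that generated code in a sensitive path never gets the lighter tier.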

The Real Cost

Cost Category	Annual Cost (10-person team)	% of Total
Direct AI tool subscriptions	$8,400	4.4%
Debugging AI-generated errors	$46,800	24.3%
Increased code review time (+91%)	$78,000	40.5%
Integration, training, governance	$59,466	30.8%
Total real cost	$192,666	100%

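For every $1 in AI licenses, expect $22 in surrounding costs. A CFO looking only at license fees (the table's $8,400 works out to $70 per seat per month, well above a single "$19/seat/month for Copilot" line item) is seeing at most 4.4% of the real cost.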

AlterSquare cost analysis across 20+ client projects, 2026
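
The ratios quoted above follow directly from the table, and a short Python sketch makes the arithmetic easy to rerun with your own numbers. The dollar figures are copied from the table; the variable names and the 10-person `TEAM_SIZE` constant are simply labels chosen for this example.

```python
# Annual costs for a 10-person team, copied from the table above.
costs = {
    "Direct AI tool subscriptions": 8_400,
    "Debugging AI-generated errors": 46_800,
    "Increased code review time (+91%)": 78_000,
    "Integration, training, governance": 59_466,
}
TEAM_SIZE = 10

total = sum(costs.values())
licenses = costs["Direct AI tool subscriptions"]

license_share = licenses / total                     # licenses as a share of total
surrounding_per_dollar = (total - licenses) / licenses  # non-license cost per $1 of license
total_multiplier = total / licenses                  # total cost relative to licenses
license_per_seat_month = licenses / TEAM_SIZE / 12   # implied license spend per seat

print(f"total: ${total:,}")
print(f"license share: {license_share:.1%}")
print(f"surrounding cost per $1 of license: ${surrounding_per_dollar:.0f}")
print(f"total-to-license multiplier: {total_multiplier:.0f}x")
print(f"license spend per seat per month: ${license_per_seat_month:.0f}")
```

The two headline ratios describe the same table from different angles: $22 is the non-license cost per license dollar, while 23x compares the full total to the license line.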

What This Means

The data confirms what you likely already sense: AI delivers real productivity gains for engineering teams on the right tasks. The question is how to capture those gains at the organizational level, not just the individual level.

The answer is specific and actionable. Tier 1 tasks (autocomplete, test generation, documentation, boilerplate) are where the studies above show consistent gains. Getting these working well across an engineering team is the highest-return starting point, and it can be done in weeks through configuration alone. No custom development required.

On cost, the full picture is about $193K/year for a 10-person team, not $8.4K in license fees. That ratio sounds daunting, but it is the key to building an honest business case. Organizations that model the full cost upfront and prove ROI against it are the ones whose pilots survive the budget review. The 23x ratio of total cost to license fees is not a reason to hesitate. It is a reason to plan accurately.

If you are building that business case and want to pressure-test the assumptions against what the data actually shows, that is a conversation worth having early in the process — brandon@brandonsneider.com.

Sources

METR — "AI Makes Experienced Developers 19% Slower" (RCT, n=16, 246 tasks, 2025)
Faros AI — "The AI Productivity Paradox" (10,000+ developers across 1,255 teams, 2025)
AlterSquare — AI Tools Across 20+ Client Projects (2026)
CodeRabbit — State of AI vs. Human Code Generation Report (2025)
Stack Overflow — "Are Bugs Inevitable with AI Agents?" (2026)
GitHub — "Quantifying Copilot's Impact on Developer Productivity" (2023)
Google — Internal RCT, 96 developers (2025)
TDS — "Vibe Coding and the Security Debt Crisis" (2025)
OWASP/Checkmarx — AI code vulnerability analysis
Gartner — Software defect predictions for uncontrolled AI adoption (2028 forecast)