What Actually Works: The Honest Guide to AI in Engineering

AI delivers genuine, measurable value on specific tasks: boilerplate generation, unit test generation (83% coverage vs. 54% with traditional approaches), documentation, and code autocomplete show 25-35% speed gains that hold up in controlled studies of routine work. The gap between individual gains and organizational gains is a solvable problem, but solving it requires acknowledging where the bottleneck actually moved.

Developers complete 21% more tasks and merge 98% more PRs (Faros AI, 10,000+ developers across 1,255 teams). That speed shifts the bottleneck to code review, whose time grows 91%. Organizations that address the new bottleneck capture the gains; those that ignore it watch the speed evaporate.

Faros AI, 10,000+ developers across 1,255 teams, 2025; AlterSquare, 20+ client projects, 2026

What the Controlled Studies Show

| Study | N | Finding | Credibility |
|---|---|---|---|
| METR RCT (2025) | 16 devs, 246 tasks | 19% slower with AI; devs believed they were 20% faster | HIGH |
| GitHub/Microsoft (2023) | 95 devs | 55.8% faster on a specific task | MEDIUM |
| Google RCT (2025) | 96 devs | 21% productivity gain | MEDIUM |
| DX/Faros (2025) | 135K+ devs | 3.6 hrs/week saved per dev, but no org-level improvement | HIGH |
| AlterSquare (2026) | 20+ client projects | 46% more PRs, but 91% more review time | HIGH |
| CodeRabbit (2025) | Large corpus | AI code creates 1.7x more problems | HIGH |

For C-Suite: "Your developers will tell you AI makes them faster. The data says it depends entirely on the task. For boilerplate and tests, they are right. For complex work, they are wrong. And at the organizational level, nobody has proven it helps yet."

Where AI Delivers (Task-Level Truth)

Tier 1: Proven Value — Deploy Now

| Task | Effectiveness | Effort |
|---|---|---|
| Code autocomplete | HIGH: 25-35% faster for routine coding | Install extension, assign license |
| Boilerplate generation | HIGH: eliminates tedious scaffolding | Works out of the box |
| Unit test generation | HIGH: 83% coverage vs. 54% traditional | Configure test framework preferences |
| Documentation generation | HIGH: auto-generates from code | Point at codebase |
| Code explanation | HIGH: onboarding and knowledge transfer | Ask questions in chat |

100% of Tier 1 value comes from configuration. Zero custom development required.

Tier 3: Risky and Overhyped

| Task | Effectiveness | Risk |
|---|---|---|
| Architecture decisions | LOW | AI lacks context on organizational constraints |
| Complex business logic | LOW | Subtle errors surface under load (47 bugs from one ChatGPT session) |
| Security-critical code | DANGEROUS | 2.74x higher vulnerability rate in AI co-authored PRs |
| Autonomous AI agents | VERY EARLY | One AutoGen agent ran an infinite loop: $2,400 overnight API bill |
AlterSquare, 2026; TDS/OWASP/Checkmarx vulnerability data; Gartner defect predictions
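The runaway-agent failure mode above (an infinite loop burning API spend overnight) is preventable with hard caps on iterations and estimated cost. The sketch below is generic, not AutoGen's API; the default limits, the per-call cost, and the `step` callable are all illustrative assumptions.

```python
# Generic guard against runaway agent loops: hard caps on iteration count
# and estimated spend. Not AutoGen's API; all limits here are assumptions.

class BudgetExceeded(RuntimeError):
    pass

def run_agent(step, max_iterations=50, max_cost_usd=25.0, cost_per_call=0.05):
    """Call `step()` until it returns a non-None result, or a cap trips."""
    spent = 0.0
    for i in range(max_iterations):
        if spent + cost_per_call > max_cost_usd:
            raise BudgetExceeded(f"spent ${spent:.2f} after {i} calls")
        spent += cost_per_call
        result = step()
        if result is not None:
            return result, spent
    raise BudgetExceeded(f"no result after {max_iterations} calls")

# A step that never finishes is stopped at the iteration cap, not at $2,400.
try:
    run_agent(lambda: None, max_iterations=10)
except BudgetExceeded as e:
    print("stopped:", e)
```

The exact numbers matter less than the shape: an agent loop without an externally enforced ceiling has an unbounded worst case.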

The Bottleneck Problem

The Faros AI research applies Amdahl's Law to AI-assisted development: a pipeline's overall speedup is limited by the stages that are not accelerated. Before AI, coding consumed roughly 40% of delivery time. After AI, coding drops to 15% of the cycle, but code review balloons from 30% to 55% of the total.
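The arithmetic behind that shift can be sketched directly. The baseline phase shares (coding 40%, review 30%, other 30%) follow the text; the 3x coding speedup is an illustrative assumption chosen so the resulting shares roughly match the quoted 15% / 55%, and the 91% review growth is the figure cited above.

```python
# Amdahl's Law applied to the delivery cycle described above.
# Baseline shares (40/30/30) are from the text; the 3x coding speedup is
# an illustrative assumption, and 91% review growth is the cited figure.

def cycle_time(coding, review, other, coding_speedup=1.0, review_growth=0.0):
    """Total cycle time after scaling the coding and review phases."""
    return coding / coding_speedup + review * (1 + review_growth) + other

before = cycle_time(40, 30, 30)  # baseline: 100 time units
after = cycle_time(40, 30, 30, coding_speedup=3.0, review_growth=0.91)

print(f"before: {before:.1f}, after: {after:.1f}")
print(f"net speedup: {before / after:.2f}x")  # ~0.99x: coding gain erased
```

Under these assumptions a 3x coding speedup yields no net cycle-time gain at all, because the review phase grows faster than the coding phase shrinks. That is the bottleneck problem in one calculation.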

The fix is not better AI tools. It is workflow redesign: AI-assisted code review (CodeRabbit, Qodo), smaller PRs by policy, automated quality gates in CI before human review, review load balancing, and tiered review where AI-generated boilerplate gets lighter scrutiny than novel logic.
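One way to encode the tiered-review policy is a router that assigns lighter scrutiny to small AI-generated boilerplate and full human review to anything security-sensitive. The tier names, size thresholds, and `PullRequest` fields below are illustrative assumptions, not recommendations from the studies cited.

```python
# Illustrative tiered-review router: AI-generated boilerplate gets lighter
# scrutiny, security-sensitive changes always get full human review.
# Thresholds, tier names, and fields are assumptions for illustration.

from dataclasses import dataclass

@dataclass
class PullRequest:
    lines_changed: int
    ai_generated: bool
    touches_security: bool

def review_tier(pr: PullRequest) -> str:
    if pr.touches_security:
        return "full-human-review"          # Tier 3 risk: never fast-track
    if pr.ai_generated and pr.lines_changed <= 200:
        return "ai-review-plus-spot-check"  # boilerplate: lighter scrutiny
    if pr.lines_changed <= 400:
        return "standard-review"
    return "split-required"                 # enforce smaller PRs by policy

print(review_tier(PullRequest(120, ai_generated=True, touches_security=False)))
```

The ordering of the checks is the policy: the security gate runs first so no AI-generated change can bypass human review by being small.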

The Real Cost

| Cost Category | Annual Cost (10-person team) | % of Total |
|---|---|---|
| Direct AI tool subscriptions | $8,400 | 4.4% |
| Debugging AI-generated errors | $46,800 | 24.3% |
| Increased code review time (+91%) | $78,000 | 40.5% |
| Integration, training, governance | $59,466 | 30.8% |
| Total real cost | $192,666 | 100% |

For every $1 in AI licenses, expect $22 in surrounding costs. A CFO looking at "$19/seat/month for Copilot" is seeing 4.4% of the real cost.
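The multiplier is straightforward to reproduce from the line items in the cost table above (the AlterSquare figures); nothing below is new data.

```python
# Reproduce the cost table above: license fees are a small slice of total.
costs = {
    "licenses": 8_400,
    "debugging_ai_errors": 46_800,
    "extra_review_time": 78_000,
    "integration_training_governance": 59_466,
}
total = sum(costs.values())
multiplier = total / costs["licenses"]

print(f"total: ${total:,}")                               # $192,666
print(f"license share: {costs['licenses'] / total:.1%}")  # 4.4%
print(f"per $1 of licenses: ${multiplier - 1:.0f} in surrounding costs")
```

Swapping in your own line items keeps the business case honest: the point is not the specific multiplier but that licenses are the smallest term in the sum.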

AlterSquare cost analysis across 20+ client projects, 2026

What This Means

The data confirms what you likely already sense: AI delivers real productivity gains for engineering teams on the right tasks. The question is how to capture those gains at the organizational level, not just the individual level.

The answer is specific and actionable. Tier 1 tasks (autocomplete, test generation, documentation, boilerplate) produce measurable ROI in the controlled studies that examine routine work. Getting these working well across an engineering team is the highest-return starting point, and it can be done in weeks through configuration alone, with no custom development required.

On cost, the full picture is $192K/year for a 10-person team, not $8.4K in license fees. That ratio sounds daunting, but it is the key to building an honest business case. Organizations that model the full cost upfront and prove ROI against it are the ones whose pilots survive the budget review. The 23x multiplier is not a reason to hesitate. It is a reason to plan accurately.

If you are building that business case and want to pressure-test the assumptions against what the data actually shows, that is a conversation worth having early in the process — brandon@brandonsneider.com.

Sources

I publish research on AI strategy and security for executives. Data, not hype.