What Actually Works: The Honest Guide to AI in Engineering
AI delivers genuine, measurable value on specific tasks: boilerplate generation, unit testing (83% coverage vs. 54% traditional), documentation, and code autocomplete produce 25-35% speed gains on routine work. Results for complex work are mixed, as the study table below shows. The gap between individual gains and organizational gains is a solvable problem, but solving it requires acknowledging where the bottleneck actually moved.
Developers complete 21% more tasks and produce 98% more PRs (Faros AI, 10,000+ developers across 1,255 teams). That speed shifts the bottleneck to code review, where review time grows 91%. Organizations that address the new bottleneck capture the gains. Those that do not will see the speed evaporate.
Faros AI, 10,000+ developers across 1,255 teams, 2025; AlterSquare, 20+ client projects, 2026
What the Controlled Studies Show
| Study | N | Finding | Credibility |
|---|---|---|---|
| METR RCT (2025) | 16 devs, 246 tasks | 19% SLOWER with AI; devs believed they were 20% faster | HIGH |
| GitHub/Microsoft (2023) | 95 devs | 55.8% faster on specific task | MEDIUM |
| Google RCT (2025) | 96 devs | 21% productivity gain | MEDIUM |
| DX/Faros (2025) | 135K+ devs | 3.6 hrs/week saved per dev, BUT no org-level improvement | HIGH |
| AlterSquare (2026) | 20+ client projects | 46% more PRs, but 91% more review time | HIGH |
| CodeRabbit (2025) | Large corpus | AI code creates 1.7x more problems | HIGH |
For C-Suite: "Your developers will tell you AI makes them faster. The data says it depends entirely on the task. For boilerplate and tests, they are right. For complex work, they are wrong. And at the organizational level, nobody has proven it helps yet."
Where AI Delivers (Task-Level Truth)
Tier 1: Proven Value — Deploy Now
| Task | Effectiveness | Effort |
|---|---|---|
| Code autocomplete | HIGH — 25-35% faster for routine coding | Install extension, assign license |
| Boilerplate generation | HIGH — eliminates tedious scaffolding | Works out of the box |
| Unit test generation | HIGH — 83% coverage vs. 54% traditional | Configure test framework preferences |
| Documentation generation | HIGH — auto-generates from code | Point at codebase |
| Code explanation | HIGH — onboarding and knowledge transfer | Ask questions in chat |
100% of Tier 1 value comes from configuration. Zero custom development required.
Tier 3: Risky and Overhyped
| Task | Effectiveness | Risk |
|---|---|---|
| Architecture decisions | LOW | AI lacks context of organizational constraints |
| Complex business logic | LOW | Subtle errors surface under load (47 bugs from one ChatGPT session) |
| Security-critical code | DANGEROUS | 2.74x higher vulnerability rate in AI co-authored PRs |
| Autonomous AI agents | VERY EARLY | One AutoGen agent ran an infinite loop: $2,400 overnight API bill |
AlterSquare, 2026; TDS/OWASP/Checkmarx vulnerability data; Gartner defect predictions
The Bottleneck Problem
The Faros AI research applies Amdahl's Law to AI-assisted development: speeding up one stage improves the whole only in proportion to that stage's share of total time, so the stages that were not accelerated end up setting the pace. Before AI, coding consumed roughly 40% of delivery time. After AI, coding drops to 15%, but code review balloons from 30% to 55% of the total cycle.
- PRs per developer: +98% (great)
- PR review time: +91% (terrible)
- PR size: +154% (makes review harder)
- Bugs per developer: +9% (more code = more bugs)
- Org-level delivery improvement: 0% (it cancels out)
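The arithmetic behind that list can be sketched in a few lines of Python. This is an illustrative model, not the Faros AI methodology. It takes the 40% coding and 30% review shares quoted above, infers that the remaining 30% of the cycle is untouched by AI, and reads "coding drops to 15%" as 15% of the original cycle. The function names and the assumption that stages run one after another are made up for the example.

```python
"""Back-of-envelope bottleneck model (illustrative; not the Faros AI methodology)."""


def amdahl_speedup(accelerated_share: float, local_speedup: float) -> float:
    """Overall speedup when only `accelerated_share` of the work gets faster."""
    return 1.0 / ((1.0 - accelerated_share) + accelerated_share / local_speedup)


def cycle_after_ai(coding_after: float, review_before: float,
                   review_growth: float, other: float) -> float:
    """New cycle length in units of the old cycle (old cycle = 1.0).

    Assumes stages run sequentially and `other` work is unchanged.
    """
    return coding_after + review_before * (1.0 + review_growth) + other


# Shares quoted in the text: coding 40%, review 30%. The last 30% is inferred.
CODING, REVIEW, OTHER = 0.40, 0.30, 0.30

# Ceiling on the gain if coding became instantaneous and nothing else changed.
best_case = amdahl_speedup(CODING, float("inf"))

# Coding shrinks to 0.15 of the old cycle while review time grows 91%.
new_cycle = cycle_after_ai(coding_after=0.15, review_before=REVIEW,
                           review_growth=0.91, other=OTHER)

print(f"best-case overall speedup: {best_case:.2f}x")
print(f"cycle length after AI:     {new_cycle:.3f} of the original")
```

Under these assumptions the new cycle comes out at roughly 1.02 of its original length: the coding time saved is absorbed by the extra review time, which is consistent with the flat org-level figure in the list.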
The fix is not better AI tools. It is workflow redesign: AI-assisted code review (CodeRabbit, Qodo), smaller PRs by policy, automated quality gates in CI before human review, review load balancing, and tiered review where AI-generated boilerplate gets lighter scrutiny than novel logic.
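As a rough sketch of what size limits, CI gates, and tiered review could look like in practice, the Python below routes a pull request to a review tier. Everything specific in it is a hypothetical placeholder rather than a recommendation from the cited research: the 400-line limit, the sensitive path prefixes, the boilerplate suffixes, and the tier names would all need to be set per organization.

```python
from dataclasses import dataclass

# Hypothetical policy values; tune these to your own codebase.
MAX_LINES_CHANGED = 400
SENSITIVE_PREFIXES = ("auth/", "payments/", "crypto/")
BOILERPLATE_SUFFIXES = ("_test.py", ".md", ".generated.py")


@dataclass(frozen=True)
class PullRequest:
    lines_changed: int
    paths: tuple  # file paths touched by the PR
    ci_passed: bool  # automated quality gates ran and passed


def review_tier(pr: PullRequest) -> str:
    """Return the review tier for a PR under the hypothetical policy above."""
    if not pr.ci_passed:
        return "blocked"   # quality gates run before any human review
    if pr.lines_changed > MAX_LINES_CHANGED:
        return "split"     # smaller PRs by policy
    if any(path.startswith(SENSITIVE_PREFIXES) for path in pr.paths):
        return "full"      # security-critical code gets the heaviest review
    if pr.paths and all(path.endswith(BOILERPLATE_SUFFIXES) for path in pr.paths):
        return "light"     # boilerplate, tests, and docs get lighter scrutiny
    return "standard"


if __name__ == "__main__":
    print(review_tier(PullRequest(120, ("docs/setup.md",), True)))
    print(review_tier(PullRequest(120, ("auth/session.py",), True)))
```

The order of the checks is the design choice that matters here: the CI gate and the size limit run first so that human reviewers only see PRs that are already small and passing, and the sensitive-path check runs before the boilerplate check so that a test file under a sensitive path still gets full review.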
The Real Cost
| Cost Category | Annual Cost (10-person team) | % of Total |
|---|---|---|
| Direct AI tool subscriptions | $8,400 | 4.4% |
| Debugging AI-generated errors | $46,800 | 24.3% |
| Increased code review time (+91%) | $78,000 | 40.5% |
| Integration, training, governance | $59,466 | 30.8% |
| Total real cost | $192,666 | 100% |
For every $1 in AI licenses, expect $22 in surrounding costs. A CFO looking at "$19/seat/month for Copilot" is seeing 4.4% of the real cost.
AlterSquare cost analysis across 20+ client projects, 2026
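The ratios quoted above follow directly from the table, and a short Python check makes the two different multipliers explicit. The figures are the ones in the table; the variable names are only for illustration.

```python
# Annual cost figures for a 10-person team, copied from the table above.
costs = {
    "Direct AI tool subscriptions": 8_400,
    "Debugging AI-generated errors": 46_800,
    "Increased code review time": 78_000,
    "Integration, training, governance": 59_466,
}

total = sum(costs.values())
licenses = costs["Direct AI tool subscriptions"]

# Surrounding cost per license dollar: everything except the licenses.
surrounding_per_license_dollar = (total - licenses) / licenses

# Total cost as a multiple of the license line item.
total_multiplier = total / licenses

# Share of the real cost that the license line item represents.
license_share_pct = 100 * licenses / total

print(f"total:                     ${total:,}")
print(f"surrounding per $1 licence: ${surrounding_per_license_dollar:.2f}")
print(f"total / licences:           {total_multiplier:.2f}x")
print(f"licence share of total:     {license_share_pct:.1f}%")
```

The two ratios differ by exactly one: about $22 of surrounding cost per license dollar, and about 23x when the license dollar itself is included in the total.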
What This Means
The data confirms what you likely already sense: AI delivers real productivity gains for engineering teams on the right tasks. The question is how to capture those gains at the organizational level, not just the individual level.
The answer is specific and actionable. Tier 1 tasks (autocomplete, test generation, documentation, boilerplate) are where the measured gains are strongest and most consistent. Getting these working well across an engineering team is the highest-return starting point, and it can be done in weeks through configuration alone. No custom development required.
On cost, the full picture is $192,666 per year for a 10-person team, not $8,400 in license fees. That ratio sounds daunting, but it is the key to building an honest business case. Organizations that model the full cost upfront and prove ROI against it are the ones whose pilots survive the budget review. The 23x multiplier (total cost relative to license fees, which is the same as $22 of surrounding cost per license dollar) is not a reason to hesitate. It is a reason to plan accurately.
If you are building that business case and want to pressure-test the assumptions against what the data actually shows, that is a conversation worth having early in the process — brandon@brandonsneider.com.
Sources
- METR — "AI Makes Experienced Developers 19% Slower" (RCT, n=16, 246 tasks, 2025)
- Faros AI — "The AI Productivity Paradox" (10,000+ developers across 1,255 teams, 2025)
- AlterSquare — AI Tools Across 20+ Client Projects (2026)
- CodeRabbit — State of AI vs. Human Code Generation Report (2025)
- Stack Overflow — "Are Bugs Inevitable with AI Agents?" (2026)
- GitHub — "Quantifying Copilot's Impact on Developer Productivity" (2023)
- Google — Internal RCT, 96 developers (2025)
- TDS — "Vibe Coding and the Security Debt Crisis" (2025)
- OWASP/Checkmarx — AI code vulnerability analysis
- Gartner — Software defect predictions for uncontrolled AI adoption (2028 forecast)