"Does AI make teams more productive?" is one of those questions where almost everyone has an opinion and almost no one has read the studies. The honest answer in 2026 is yes, for most teams, on the right tasks, by margins that range from solid to spectacular. But the same body of evidence carries a warning the marketing copy never mentions: the gains are unevenly distributed, they are largest for the least experienced workers, and in at least one rigorous trial AI actually made expert developers slower while convincing them they were faster. This is an evidence-based walk through what the strongest research shows, where the nuance bites, and how teams across the USA, UK, Canada, Europe and Australia can capture the real gains rather than the imagined ones.
The Strongest Single Study: Customer Support at Scale
The most quoted piece of evidence is also one of the best designed. Economists Erik Brynjolfsson, Danielle Li and Lindsey Raymond, in NBER Working Paper 31161, studied customer support agents at a large software firm who were given access to a generative AI assistant. Agents using the tool raised their productivity by 14% on average. They spent roughly 9% less time per chat and handled about 14% more chats per hour, and crucially there was no significant change in customer satisfaction, so the speed did not come at the expense of quality.
The detail that matters most for managers is the distribution of the gain. The least-experienced agents improved by 35%, while the most experienced, top-performing agents saw little measurable effect. The AI was, in effect, packaging the tacit knowledge of the best agents and handing it to everyone else. For a support organisation in any market, that is a profound finding: AI did not replace the agents, it compressed the experience gap between a new hire and a veteran.
Knowledge Work: The BCG Consultant Experiment
Support work is structured and repetitive, so a skeptic might dismiss the gains as narrow. The next study tackled exactly that objection. A 2023 experiment run by researchers from Harvard, MIT and BCG gave management consultants access to GPT-4 on realistic professional tasks. Consultants using the AI completed 12.2% more tasks, finished them 25.1% faster, and produced work rated 40% higher in quality than the group working without it.
Those are large numbers for a profession that lives on analysis and judgment, and they reinforce a theme: AI does not only speed up the mechanical parts of a job, it can raise the quality of the output on genuinely cognitive tasks. The same study also found a "jagged frontier" effect, where AI helped enormously on tasks inside its capability but hurt on tasks just outside it, where confident-but-wrong output led people astray. That nuance becomes important later.
Software Development: Big Gains, With an Asterisk
Software is where the productivity debate is loudest, partly because the headline numbers are enormous. A 2023 GitHub controlled experiment found developers using GitHub Copilot completed a defined coding task 55% faster than the control group. A separate six-week experiment at Australia's ANZ Bank found the Copilot group was 42.36% faster overall, with the familiar pattern by experience level: beginners improved 52.27%, intermediates 41.6% and advanced developers 40.48%. Even at the top of the experience range, the gain stayed above 40%.
If the story ended there, it would read as an unqualified win. It does not. In July 2025, the research group METR published a randomized controlled trial with 16 experienced open-source developers across 246 real tasks on large codebases those developers knew intimately. The result was the opposite of the headline: with AI tools, the developers took 19% longer. What should give every leader pause is what the developers believed. Going in, they expected AI to speed them up by 24%, and even after finishing slower, they still estimated they had been 20% faster. The gap between perceived and measured productivity was almost 40 percentage points. We dig deeper into this terrain in our piece on whether AI is replacing software developers.
Why Novices Gain Most and Experts Sometimes Lose
Read the studies side by side and a consistent pattern emerges. AI tends to lift the floor faster than it lifts the ceiling. The least-experienced support agents gained 35% against near-zero for top performers; the weakest consultants improved most; beginner developers at ANZ improved more than advanced ones. The intuition is simple. AI output approximates competent, best-practice work, which is a large upgrade for someone still learning and a smaller one for an expert who already operates at that level or above.
The METR finding is the extreme case of the same pattern. Highly experienced developers working in codebases they have built over years already have most of the relevant knowledge in their heads. For them, stopping to prompt, read, and verify AI suggestions can add friction rather than remove it, especially when the suggestions are subtly wrong and must be checked line by line. The lesson is not that "AI is fake"; it is that the productivity question has no single answer. It depends on the task's complexity, how much the worker already knows, and how easy it is to verify the AI's output. We explore the broader version of this in our analysis of AI augmentation versus replacement and what the data shows.
Perceived Versus Measured Productivity
The most actionable insight in the whole literature is the cheapest to ignore: how fast AI feels is not how fast it is. The METR developers were not careless or inexperienced. They were skilled people, on their own projects, who were confidently wrong about their own speed by tens of percentage points. If experts can misjudge their productivity that badly on work they know best, every other team should assume their gut feel is unreliable too.
This is why we are wary of any AI rollout justified purely by enthusiasm. The feeling of momentum that comes from an AI drafting code, copy, or analysis in seconds is real and seductive, and it is not the same thing as shipping more value per week. Teams that win with AI are the ones that instrument the work, comparing cycle time, throughput and quality before and after on the same kind of task. Macro surveys point the same way: organisations consistently report time savings from generative AI, but the firms seeing durable returns are those that redesign workflows and measure outcomes, not those that simply hand out licences and hope.
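As a rough illustration of what instrumenting the work can look like, the sketch below compares median cycle time on the same kind of task before and after a rollout, then sets the measured change beside the team's own estimate. The function names and the sample durations are hypothetical and chosen only for illustration; the one grounded input is the pair of METR figures quoted above, expressed as change in completion time.

```python
from statistics import median

def measured_change_pct(before_hours, after_hours):
    """Percent change in median cycle time; negative means tasks got faster."""
    base = median(before_hours)
    return (median(after_hours) - base) / base * 100

def perception_gap_points(perceived_change_pct, measured_change_pct_value):
    """Percentage-point gap between measured and perceived change in time."""
    return measured_change_pct_value - perceived_change_pct

# Hypothetical cycle times (hours) for comparable tasks, before and after rollout.
before = [4.0, 5.0, 6.0, 5.0, 4.5]
after = [5.5, 6.0, 6.5, 5.0, 7.0]

measured = measured_change_pct(before, after)   # positive: tasks now take longer
perceived = -20.0                               # assumed survey answer: "20% less time"
gap = perception_gap_points(perceived, measured)

# METR figures from the text: believed 20% less time, measured 19% more time.
metr_gap = perception_gap_points(-20.0, 19.0)   # 39 percentage points
```

Median is used here instead of the mean so that one unusually long task does not dominate the comparison; that is a design choice for the sketch, not something the studies prescribe.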
How to Actually Capture the Gains
The practical playbook follows directly from the evidence. First, start where the studies are strongest: high-volume, structured, verifiable tasks such as customer support, first-draft content, data transformation and well-scoped coding work. These are the kinds of task on which support agents gained 14% and developers finished a defined coding task 55% faster, and they are the safest places to see a return early. This is exactly how we scope business automation engagements, sequencing the easy, measurable wins before the ambiguous ones.
Second, target your rollout by experience level. Because novices and average performers gain most, the fastest organisation-wide lift often comes from putting AI in the hands of newer staff doing high-volume work, while letting senior experts opt in where it genuinely helps rather than mandating it everywhere. The ANZ and NBER results both support concentrating early effort on the people with the most room to grow.
Third, measure on real work, not demos. Pick a metric that reflects shipped value, run a before-and-after or a controlled comparison, and let the numbers decide where to expand. The cost of skipping this step is the METR trap: a team that feels 20% faster while quietly running 19% slower. For leaders weighing the bigger financial picture, our look at the economics of replacing staff with AI and the wider AI job displacement statistics for 2026 set the productivity question in context.
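One way to let the numbers decide is to check whether a difference between a control group and an AI-assisted group is larger than chance alone would produce. The sketch below uses a simple permutation test on mean cycle time. The function name, the sample data and the choice of test are assumptions made for illustration, not the method used in any of the studies cited here.

```python
import random

def _mean(values):
    return sum(values) / len(values)

def permutation_p_value(control, treated, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the difference in mean cycle time.

    Returns the share of random relabelings whose absolute difference in
    means is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(_mean(treated) - _mean(control))
    pooled = list(control) + list(treated)
    n_control = len(control)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(_mean(pooled[n_control:]) - _mean(pooled[:n_control]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p of 0.
    return (hits + 1) / (n_resamples + 1)

# Hypothetical cycle times (hours) for comparable tasks in each group.
control_hours = [10, 11, 12, 10, 11, 12, 10, 11]
ai_hours = [5, 6, 5, 6, 5, 6, 5, 6]

p_value = permutation_p_value(control_hours, ai_hours)
```

A small p-value suggests the gap is unlikely to be noise, but it says nothing about quality, so it belongs next to a quality metric, not in place of one.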
So, does AI make teams more productive? On the weight of the 2026 evidence, yes, and sometimes dramatically, but only when it is pointed at the right tasks, given disproportionately to the people with the most to gain, and validated against measured output rather than the feeling of speed. Treated that way, AI is one of the best-documented productivity tools in modern business. Treated as magic, it is one of the easiest ways to feel busy while moving slower.
Frequently Asked Questions
Does AI actually make teams more productive?
For most teams, yes, when AI is deployed on the right tasks. Controlled studies show real gains: support agents using a generative AI assistant raised productivity 14% on average, BCG consultants completed 12.2% more tasks and finished them 25.1% faster, and developers finished a defined coding task 55% faster with GitHub Copilot. The size of the gain depends heavily on the task and the worker's experience level.
How much do support agents improve with AI?
In the Brynjolfsson, Li and Raymond NBER study, customer support agents using a generative AI assistant raised productivity 14% on average, handling about 14% more chats per hour and spending roughly 9% less time per chat with no significant change in customer satisfaction. The least-experienced agents gained 35% while top performers saw little effect.
Do developers really code faster with AI?
Often, yes. A GitHub controlled experiment found developers completed a defined coding task 55% faster with Copilot, and a six-week ANZ Bank study found the Copilot group 42.36% faster overall, with beginners improving 52.27%. But a 2025 METR randomized trial of experienced open-source developers found they took 19% longer with AI tools on their own large codebases, so context matters enormously.
Why do novices gain more from AI than experts?
AI tends to encode something close to best-practice output, which lifts less-experienced workers toward the level of stronger colleagues. In the NBER support study the least-experienced agents gained 35% while top performers saw little effect, and the ANZ developer trial showed beginners improving 52.27% versus 40.48% for advanced engineers. AI raises the floor faster than it raises the ceiling.
Can AI ever make experienced workers slower?
Yes, in specific conditions. A 2025 METR randomized controlled trial of 16 experienced open-source developers across 246 tasks found they took 19% longer with AI tools on mature codebases they knew deeply. Strikingly, those developers expected a 24% speedup and still believed they were 20% faster afterward, showing perceived productivity can diverge sharply from measured productivity.
How do we capture real productivity gains from AI?
Measure before and after on real work rather than trusting how fast it feels, deploy AI on high-volume repeatable tasks first, focus early rollouts where novices and average performers do high-volume work, and keep experienced staff in control of complex judgment calls. Pilot at production volume and let measured outcomes, not enthusiasm, decide where to expand.