How AI Bottlenecks Shift

I map how AI bottlenecks shift across coding, agents, and enterprise data. By examining transitions from tool-calling to reliability and context windows to evaluation, I highlight why yesterday’s impossible tasks are today’s standard features.

I wrote about my changing AI opinions. At least some of this is because the industry is moving so fast that the bottlenecks keep shifting.

Here are four examples of something AI couldn’t do (the bottleneck) that then became possible, so the bottleneck shifted and changed the way we work.

It’s good to keep this in mind when thinking about AI.

Coding

“It can’t write useful code. We can’t get real help.”
- But in Sep 2022: GitHub finds Copilot developers are 55% faster.
“It writes code but doesn’t know our codebase. We can’t let it touch real projects.”
- But in Feb 2024: Gemini 1.5 Pro has a 1M-token context (~30K LOC). Cursor indexes code.
“It understands the repo but can’t ship a fix on its own. We can’t hand it a whole issue.”
- But in Mar 2024: Devin solves 14% of SWE-bench, up from 2%. SWE-bench Verified is now 70%+.
“It ships fixes, but we can’t review them fast enough or trust they’re stable.”
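
The "1M tokens ~ 30K LOC" figure above implies roughly 33 tokens per line. A rough sketch of that arithmetic follows. The ratio is only what those two numbers imply, and `fits_in_context` is a hypothetical helper; real tokenizers and codebases will vary.

```python
# Figures quoted above: a 1M-token context holds about 30K lines of code.
CONTEXT_TOKENS = 1_000_000
LINES_OF_CODE = 30_000

# Implied average, an assumption derived from the two figures (about 33).
TOKENS_PER_LINE = CONTEXT_TOKENS / LINES_OF_CODE


def fits_in_context(repo_lines: int, context_tokens: int = CONTEXT_TOKENS) -> bool:
    """Estimate whether a repo of `repo_lines` lines fits in one context window."""
    return repo_lines * TOKENS_PER_LINE <= context_tokens


print(round(TOKENS_PER_LINE))    # 33
print(fits_in_context(25_000))   # True
print(fits_in_context(200_000))  # False
```

By this estimate, a 200K-line codebase still needs indexing or retrieval rather than a single prompt.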

Agents

“It does one step. We can’t chain actions.”
- But Jun 2023: OpenAI function calling lets models invoke tools and return structured calls.
“Every integration is bespoke. We can’t connect it to all our systems.”
- But Nov 2024: Anthropic open-sources MCP, standardizing tool and data access.
“It can act and connect, but over a long task its errors compound. We can’t trust a 20-step run.”
- Now: Mar 2025: METR finds autonomous task horizon doubling ~every 7 months. Reliability is a challenge.
- But Claude Mythos, with ~16 hours of reliable execution, might fix this.
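
To see why a 20-step run is hard to trust, here is a minimal sketch of how per-step errors compound. It assumes every step succeeds independently with the same probability, which is a simplification. The per-step rates are illustrative, not measurements.

```python
def chain_success(per_step: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    return per_step ** steps


# Illustrative per-step success rates (assumed, not measured).
for p in (0.90, 0.95, 0.99):
    print(f"per-step {p:.0%} -> 20-step run {chain_success(p, 20):.0%}")
# per-step 90% -> 20-step run 12%
# per-step 95% -> 20-step run 36%
# per-step 99% -> 20-step run 82%
```

Even at 95% per step, roughly two out of three 20-step runs fail somewhere, which is why reliability becomes the bottleneck once chaining and integration work.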

Enterprise knowledge work

“It only knows the public internet. We can’t use it on our own documents.”
- But Sep 2023: Morgan Stanley’s assistant uses ~100K internal documents.
“It reads our documents but can’t fit enough of them. We can’t ask across the whole corpus.”
- But May 2023: Claude’s 100K-token context and Feb 2024: Gemini 1.5’s 1M tokens reduce chunking needs.
“It runs on our data, but we can’t trust it without a way to measure when it’s silently wrong.”
- Now: the Morgan Stanley deployment relies on an eval framework - evals are the bottleneck.
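
To make "a way to measure" concrete, here is a minimal sketch of an eval loop. It is not Morgan Stanley's framework. The golden set, the substring check, and `stub_assistant` are all hypothetical stand-ins for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical golden set: (question, text the answer must contain).
GOLDEN: List[Tuple[str, str]] = [
    ("What is the notice period in the policy?", "30 days"),
    ("Which form covers expense claims?", "Form E-12"),
]


def run_eval(answer: Callable[[str], str], cases: List[Tuple[str, str]]) -> float:
    """Return the share of cases whose answer contains the expected text."""
    hits = sum(expected.lower() in answer(q).lower() for q, expected in cases)
    return hits / len(cases)


def stub_assistant(question: str) -> str:
    # Stand-in for a real model call; always returns the same answer.
    return "The notice period is 30 days."


print(f"accuracy: {run_eval(stub_assistant, GOLDEN):.0%}")  # accuracy: 50%
```

The stub answers the second question confidently and wrongly, which is the "silently wrong" case. Only a fixed set of expected answers exposes it.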

Document processing

“It needs thousands of labeled samples. We can’t stand up new doc types quickly.”
- But Sep 2023: Google Document AI extracts with limited-to-no ML training.
“It learns fast but reads only text. We can’t handle scans, charts, and tables.”
- But Sep 2023: GPT-4V vision model and May 2024: GPT-4o native multimodal solved this.
“It sees the page but can’t understand long, layout-heavy documents. We can’t trust it on real multi-page files.”
- Now: NeurIPS 2024: on MMLongBench-Doc, GPT-4o scored under ~50 on multi-page chart/table documents.
- But Gemini 3.5 Flash, GPT 5.5, Claude 4.8 Opus, etc. have excellent vision and need to be tested.
