In Feb 2024, Claude 3 Opus was the best model, at $15/MTok.
In Jul 2024, GPT 4o Mini reached that quality at 10% of the price.
In Dec 2024, DeepSeek v3 reached that quality at 1% of the price.
If the price keeps falling 10x every 6 months or so (and it has been), that's roughly 100x a year. Even at a conservative 10x a year, a Claude 4.6 Opus-like model will cost 1/10th of the $5/MTok today in a year, and 1/100th of that in 2 years.
(We’ll be using better models, of course.)
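The extrapolation is just compound decay. A quick sketch, with the rate and today's $5/MTok price as assumptions you can swap out:

```python
# Hypothetical price extrapolation: cost of a fixed capability level,
# assuming the price falls by a constant factor each year.
def future_price(price_now, drop_per_year, years):
    """Price after `years` if it falls `drop_per_year`x every year."""
    return price_now / (drop_per_year ** years)

# Conservative 10x/year from today's assumed $5/MTok:
print(future_price(5.0, 10, 1))  # 0.5  ($/MTok in 1 year)
print(future_price(5.0, 10, 2))  # 0.05 ($/MTok in 2 years)
```

At the observed ~10x-per-6-months rate the numbers fall even faster, which is the point: the exact constant matters less than the trend.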
But 2 years isn’t far away. If Opus 4.6 were 100x cheaper, I could do 100x more with it than I can today. What would we do with it?
If we assume that they’ll become 100x faster as well (and that’s an important assumption), and the reliability will continue to improve, then:
- LLM LSPs. Language servers could be LLMs. Hover over a squiggly line to understand a bug it spotted, right click and fix. Move on.
- LLM pre-commit hooks. Write docs, write and run tests, refactor - automatically before you commit.
- Continuous refactoring. LLMs auto-refactor the code, run tests, and commit better code.
- Auto-fix from logs. Log analysis -> Test case -> Fix -> Deployment can be automated.
- Pick best option. LLMs generate 30 diverse options for each task, test all, and pick the best. E.g. What’s the best language / framework for this? What’s the better visual design? What should I build?
- Live docs. LLMs auto-update docs every commit.
- Adversarial workflows. LLMs continuously run adversarial test cases to break the code, then fix it.
- Build & discard, don’t buy. Most tools are easier to create than purchase. They’re also easier to throw away. To hell with code quality!
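The "pick best option" workflow is just a best-of-N loop. A minimal sketch, where `generate_option` and `score` are hypothetical stand-ins for an LLM call and a test harness:

```python
# Best-of-N: generate N diverse candidates for a task, score each,
# keep the winner. `generate_option` and `score` are placeholders for
# an LLM API call and a real test/eval harness.
import random

def generate_option(task, seed):
    # Placeholder: a real version would prompt an LLM with a varied seed/temperature.
    rng = random.Random(seed)
    return {"task": task, "seed": seed, "quality": rng.random()}

def score(option):
    # Placeholder: a real version would run tests, benchmarks, or an LLM judge.
    return option["quality"]

def best_of_n(task, n=30):
    candidates = [generate_option(task, seed) for seed in range(n)]
    return max(candidates, key=score)

best = best_of_n("pick a framework for the dashboard", n=30)
```

Cheap, fast models make the brute-force shape viable: 30 candidates at 1/100th the price costs less than 1 candidate does today.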