Based on several (i.e. two) recommendations, I subscribed to MiniMax. At $10/month, you get 1,500 requests every 5 hours and 15,000 every week. That’s a LOT!
Using the same prompt, I had Claude Code generate two data stories, one with each model.
Here's my comparison of the two. It's partly based on Claude Opus 4.6's comparison, but I felt the same way.
| Dimension | Sonnet 4.6 | MiniMax M2.7 |
|---|---|---|
| Narrative quality | Immersive | |
| Content coverage | Comprehensive | |
| Visual design | More varied, ambitious bands, no errors | |
| CSS | Better use of CSS variables | |
| Tooltips | Richer, comprehensive, data-tip | |
| Modals/popups | Richer, more types, more details | |
| Animated SVGs | Richer, visually distinctive, sophisticated | |
| Slides | Larger readable grid | |
| Code samples | XML vs JSON-LD side-by-side | |
| External references | Far more authoritative links | |
| Accessibility | ARIA, keyboard, alt text | |
| Generation quality | Clean, no Chinese character artifacts | |
In other words, Sonnet 4.6 is a clear winner on nearly every dimension.
But the cost factor is too big a difference to ignore; it feels like 10x (MiniMax M2.7 at 30c/MTok vs roughly $3/MTok input for Sonnet-class models). So the question probably is: what can I do with a reasonably good model that generates 10x the quantity at the same price?
(To be fair, GPT 5.4 Mini at 75c/MTok and Gemini 3 Flash at 50c/MTok are not far from MiniMax M2.7 at 30c/MTok, but their code quality seems lower. I generated a Codex (GPT 5.4 Mini) version, and while it has fewer errors, it has even less visual style and narrative quality.)
Computer use feels like a candidate. I used Rodney to research what drives my LinkedIn reach & engagement, and update my SKILL.md.
I could try experimenting with sub-agents: bulk analysis (e.g. of code, transcripts, images), data discovery, etc. The crux of these is parallelization, something I have not explored much; see the sketch below.
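Here's a minimal sketch of that kind of fan-out, assuming an OpenAI-compatible chat endpoint; the base URL, model name, and `MINIMAX_API_KEY` variable are placeholder assumptions, not taken from MiniMax's docs:

```python
# Fan out small analysis tasks to a cheap model in parallel.
# Placeholder assumptions: endpoint URL, model name, and API key env var.
import os
from concurrent.futures import ThreadPoolExecutor

from openai import OpenAI

client = OpenAI(
    base_url="https://api.minimax.io/v1",  # placeholder endpoint
    api_key=os.environ["MINIMAX_API_KEY"],  # placeholder env var
)

def analyze(text: str) -> str:
    """One sub-agent call: summarize a single document."""
    response = client.chat.completions.create(
        model="MiniMax-M2.7",  # placeholder model name
        messages=[
            {"role": "system", "content": "Summarize the key points in 3 bullets."},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content

documents = ["...", "..."]  # e.g. code files, transcripts, image captions
# Threads suffice: the work is I/O-bound, and the per-window request quota
# (1,500 every 5 hours), not compute, is the real limit.
with ThreadPoolExecutor(max_workers=20) as pool:
    summaries = list(pool.map(analyze, documents))
```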
It looks like we're entering an era with two kinds of use cases: high-quality work for the best models, large-scale work for the cheap models. The question is: how do I make the most of both?
UPDATE: Cheap models (or at least MiniMax M2.7) may be far less useful than I thought. I used MiniMax M2.7 with Claude Code for:
- 24 Mar 2026: Email analysis. I had it review my 15-year Gramener email archive to find key events for a book. But it fetched too few results, so I switched to Codex (GPT 5.4 xhigh).
- 25 Mar 2026: Capture The Flag. But it couldn’t solve problems, so I switched to Codex (GPT 5.4 xhigh).
- 25 Mar 2026: Songs download. I had it find popular Tamil songs and download them from YouTube. But the metadata was poor, so I switched to my own song collection.
- 26 Mar 2026: LEAN proofs. It started making too many basic mistakes (spelling errors in code!), so I switched to Copilot (GPT 5.4 xhigh).
- 29 Mar 2026: Calvin & Hobbes image analysis. It couldn’t even read the images and confidently saw “Hobbes stuck to a baseball bat with Mom & Dad” in a strip that only featured Calvin & Susie.
The main problems are:
- It errs confidently. It doesn't do ROT13 well. It can't see images. It misunderstands error messages. It assigned my earlier company NGIMAGE's incorporation date to Gramener. It made Vijay Sethupathi a lyricist. When a process failed with just 12% coverage, it simply continued, reporting what was done rather than flagging what was missing.
- It's a slow learner. For picoCTF, it had the pieces but couldn't assemble them. Claude Code resets the cwd, but it never switched to absolute paths. It mixed `uv run` with `python3`. It rewrites, resets, or waits instead of diagnosing.
It's best for simple, single-step tasks, not ones where knowledge, accuracy, or research matter. When using it, keep tasks small and verify both correctness and completeness.
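As a concrete version of that advice, here's a minimal sketch of a verify-before-accept loop; `ask_cheap_model`, the JSON schema, and the retry count are hypothetical illustrations, not part of any workflow above:

```python
# Keep tasks small and verify: request structured output, then check it
# mechanically before accepting. Nothing here is accepted on trust.
import json

def ask_cheap_model(prompt: str) -> str:
    """Hypothetical stand-in for one small, single-step model call."""
    return '{"title": "...", "singer": "..."}'  # imagine an incomplete answer

def verified(raw: str, required: set) -> dict | None:
    """Accept the answer only if it parses and covers every required field."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # it errs confidently, so unparseable output is a failure
    missing = required - data.keys()
    if missing:
        print(f"Missing fields: {missing}")  # surface what's missing, not just what's done
        return None
    return data

result = None
for attempt in range(3):  # bounded retries keep the task small
    raw = ask_cheap_model("Return the song's title, singer, lyricist as JSON.")
    result = verified(raw, {"title", "singer", "lyricist"})
    if result:
        break
# If result is still None, escalate to a stronger model, as with Codex above.
```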