Based on several (i.e. two) recommendations, I subscribed to MiniMax. At $10/month, you get 1,500 requests every 5 hours and 15,000 every week. That’s a LOT!
Using the same prompt, I had Claude Code generate two data stories, one with each model.
Here's my comparison of the two. It's partly based on Claude Opus 4.6's comparison, but I felt the same way.
| Dimension | Sonnet 4.6 | MiniMax M2.7 |
|---|---|---|
| Narrative quality | Immersive | |
| Content coverage | Comprehensive | |
| Visual design | More varied, ambitious bands, no errors | |
| CSS | Better use of CSS variables | |
| Tooltips | Richer, comprehensive, data-tip | |
| Modals/popups | Richer, more types, more details | |
| Animated SVGs | Richer, visually distinctive, sophisticated | |
| Slides | Larger readable grid | |
| Code samples | XML vs JSON-LD side-by-side | |
| External references | Far more authoritative links | |
| Accessibility | ARIA, keyboard, alt text | |
| Generation quality | Clean, no Chinese character artifacts | |
In other words, Sonnet 4.6 is a clear winner on nearly every dimension.
But the cost factor is too big a difference to ignore; it feels like 10x (MiniMax M2.7 at 30c/MTok vs roughly $3/MTok input for Sonnet-class models). So the question probably is: what can I do with a reasonably good model that generates 10x the quantity at the same price?
(To be fair, GPT 5.4 Mini at 75c/MTok and Gemini 3 Flash at 50c/MTok are not far from MiniMax M2.7 at 30c/MTok, but their code quality seems lower. I generated a Codex (GPT 5.4 Mini) version, and while it has fewer errors, it has even less visual style and narrative quality.)
Computer use feels like a candidate. I used Rodney to research what drives my LinkedIn reach & engagement, and update my SKILL.md.
I could try experimenting with sub-agents: bulk analysis (e.g. of code, transcripts, images), data discovery, etc. The crux of these is parallelization, something I have not explored much; see the sketch below.
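Here's a minimal sketch of that kind of fan-out, assuming an OpenAI-compatible chat endpoint; the base URL, model name, and `MINIMAX_API_KEY` variable are placeholder assumptions, not taken from MiniMax's docs:

```python
# Fan out small analysis tasks to a cheap model in parallel.
# Placeholder assumptions: endpoint URL, model name, and API key env var.
import os
from concurrent.futures import ThreadPoolExecutor

from openai import OpenAI

client = OpenAI(
    base_url="https://api.minimax.io/v1",  # placeholder endpoint
    api_key=os.environ["MINIMAX_API_KEY"],  # placeholder env var
)

def analyze(text: str) -> str:
    """One sub-agent call: summarize a single document."""
    response = client.chat.completions.create(
        model="MiniMax-M2.7",  # placeholder model name
        messages=[
            {"role": "system", "content": "Summarize the key points in 3 bullets."},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content

documents = ["...", "..."]  # e.g. code files, transcripts, image captions
# Threads suffice: the work is I/O-bound, and the per-window request quota
# (1,500 every 5 hours), not compute, is the real limit.
with ThreadPoolExecutor(max_workers=20) as pool:
    summaries = list(pool.map(analyze, documents))
```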
It looks like we're entering an era with two kinds of use cases: high-quality work for the best models, large-scale work for the cheap models. The question is: how do I make the most of both?
UPDATE: Cheap models (or at least MiniMax M2.7) may be far less useful than I thought. I used MiniMax M2.7 with Claude Code for:
- 24 Mar 2026: Email analysis. I had it review my 15-year Gramener email archive to find key events for a book. But it fetched too few results, so I switched to Codex (GPT 5.4 xhigh).
- 25 Mar 2026: Capture The Flag. But it couldn’t solve problems, so I switched to Codex (GPT 5.4 xhigh).
- 25 Mar 2026: Songs download. I had it find popular Tamil songs and download them from YouTube. But the metadata was poor, so I switched to my own song collection.
- 26 Mar 2026: LEAN proofs. It started making too many basic mistakes (spelling errors in code!), so I switched to Copilot (GPT 5.4 xhigh).
- 29 Mar 2026: Calvin & Hobbes image analysis. It couldn’t even read the images and confidently saw “Hobbes stuck to a baseball bat with Mom & Dad” in a strip that only featured Calvin & Susie.
The main problems are:
- It errs confidently. It doesn't do ROT13 well. It can't see images. It misunderstands error messages. It assigned my earlier company NGIMAGE's incorporation date to Gramener. It made Vijay Sethupathi a lyricist. When a process failed with just 12% coverage, it simply continued, reporting what was done rather than flagging what was missing.
- It's a slow learner. For picoCTF, it had the pieces but couldn't assemble them. Claude Code resets the cwd, but it never switched to absolute paths. It mixed `uv run` with `python3`. It rewrites, resets, or waits instead of diagnosing.
It's best for simple, single-step tasks, not ones where knowledge, accuracy, or research matter. When using it, keep tasks small and verify both correctness and completeness.
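As a concrete version of that advice, here's a minimal sketch of a verify-before-accept loop; `ask_cheap_model`, the JSON schema, and the retry count are hypothetical illustrations, not part of any workflow above:

```python
# Keep tasks small and verify: request structured output, then check it
# mechanically before accepting. Nothing here is accepted on trust.
import json

def ask_cheap_model(prompt: str) -> str:
    """Hypothetical stand-in for one small, single-step model call."""
    return '{"title": "...", "singer": "..."}'  # imagine an incomplete answer

def verified(raw: str, required: set) -> dict | None:
    """Accept the answer only if it parses and covers every required field."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # it errs confidently, so unparseable output is a failure
    missing = required - data.keys()
    if missing:
        print(f"Missing fields: {missing}")  # surface what's missing, not just what's done
        return None
    return data

result = None
for attempt in range(3):  # bounded retries keep the task small
    raw = ask_cheap_model("Return the song's title, singer, lyricist as JSON.")
    result = verified(raw, {"title", "singer", "lyricist"})
    if result:
        break
# If result is still None, escalate to a stronger model, as with Codex above.
```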