LLMs | S Anand

Bounty hunting agent ecosystem 2

Yesterday, I wrote about @syu-toutousai, the bounty-hunting agent ecosystem. That led me to OpenAgents. OpenAgents has plenty of bounty issues:

- Fix JWT auth middleware accepts algorithm none - $8k
- Fix rate limiter doesn’t differentiate authenticated vs anonymous limits - $2.2k
- Add structured error responses with error codes - $8.6k
- Fix Math.random used for nonce generation - $8k
- Fix ABI encoding BigInt overflow - $9k

Most issues also include a trick requirement. For example, #100 asks contributors to add a @generated-by block with: ...

Bounty-Hunting Agent Ecosystem

Yesterday, I submitted a Codex co-authored PR to fix an issue I raised (using ChatGPT and Z3 - so yeah, I used AI to raise the bug and squash the bug!) A few hours later, @syu-toutousai submitted another PR to solve the same issue. @syu-toutousai seems interesting. The user account description says “Autonomous Technical Contributor & AI-Driven Developer” - a bot account. The PR itself was simple and had a few improvements I can think of: ...

Arvind Satyanarayan talk at VizChitra 2026

On Sat 4 July at Bangalore, Arvind Satyanarayan is speaking at VizChitra 2026 - a talk I’m keenly looking forward to. I’ve been following Arvind’s work since Vega-Lite. It’s a grammar of graphics - something that makes data visualizations (charts) more structured. I tried switching to it as our default at Gramener - but most felt it was too much to learn (they already knew Excel/Power BI) or too limiting (D3 can do more). ...

Proving Code Works with Z3

At the PyCon SG Education Summit today, Melvin’s lightning talk on “Writing Proofs in Python” began with a subtle bug in this mid-point calculation (often used in binary search or sort) in languages like Java, C/C++, Go, etc. low = ... high = ... mid = (low + high) / 2 Since the integers are fixed-width, this triggers an overflow when low + high exceeds the maximum integer value. Even popular libraries like Pandas had this bug until 2019. In fact, even Python’s native list.sort() had this sort of bug until 2015! Read the details. ...
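The overflow described above can be reproduced in plain Python by simulating fixed-width arithmetic. This is only a sketch: Python integers never overflow, so the helper `to_int32` (a hypothetical name, not from the talk) wraps values to signed 32 bits the way Java or C would, and `mid_safe` shows the commonly used rewrite `low + (high - low) / 2`. It does not use Z3 and is not the proof from the talk.

```python
def to_int32(x: int) -> int:
    """Wrap a Python int to a signed 32-bit value (two's complement)."""
    x &= 0xFFFFFFFF
    return x - 0x1_0000_0000 if x >= 0x8000_0000 else x


def mid_naive(low: int, high: int) -> int:
    """(low + high) / 2 as a 32-bit language would compute it."""
    total = to_int32(low + high)      # the sum may wrap around here
    half = abs(total) // 2            # integer division truncating toward zero
    return -half if total < 0 else half


def mid_safe(low: int, high: int) -> int:
    """low + (high - low) / 2: no intermediate exceeds high when 0 <= low <= high."""
    return to_int32(low + to_int32(high - low) // 2)


if __name__ == "__main__":
    low, high = 2_000_000_000, 2_100_000_000   # both fit in 32 bits, their sum does not
    print("naive:", mid_naive(low, high))
    print("safe: ", mid_safe(low, high))
```

With these two indices the naive version returns a negative midpoint, while the safe version stays between `low` and `high`.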

How IMF mis-forecasts GDP growth

The IMF forecasts GDP growth every year. Their forecasts for the current year are slightly low. Their forecasts for the next year are slightly high. After that, it remains high. Some forecasts, like China, Singapore, UAE, Equatorial Guinea are consistently low. Other forecasts, like Japan, Congo, Mexico, Pakistan are consistently high. The interesting meta-pattern is how this sort of past-forecast analysis can be done for any topic. This emerged from an Ethan Mollick post and then I asked: ...

IIM Alumni AI Workflows Workshop

The theme of yesterday’s workshop for the IIM Alumni at Singapore was Tools and Workflows: Agents are getting smarter, so they know what to do. Tools agents can use are growing and are more powerful. This combinatorial explosion creates explosive possibilities. This workshop covered the following six workflows:

- Leverage transcripts. Use Google AI Studio to transcribe non-sensitive recordings with a reusable “don’t miss anything” prompt. AI Studio’s record button is a ready-to-use transcriber.
- Simplify dense text as a comic, an infographic, a story. Image generation is now a tool call an agent runs for you. Then compress it as AVIF on Squoosh before you email it to a thousand people.
- Verify - cheaply. Paste one suffix: “Break this into key claims, mark certainty, flag the five highest-risk ones, and tell me how to verify or falsify each.” Convert to a skill to automate. Cross-checking with multiple models took error from 14% to 0.7%.
- Skills are assets. A skill tells the agent “here’s how I do stuff.” Build them slowly, edit them weekly, and they compound for years. No skills support in your tool? Keep them as copy-pasteable prompts.
- Brainstorm by forcing range. Ban the five obvious ideas; borrow from unrelated domains; smash two random concepts together with the Ideator. Hallucination is a feature when you’re being creative.
- Schedule tasks. Weekly regulatory scans, daily meeting prep, market briefings - and even an “unreasonable gesture” nudge. As AI hides the tech, human relationships gain value.

Here’s the talk video and full story + transcript. ...

AI on flights

I love that I get uninterrupted 4-16 hours on flights, which I mostly use to write future prompts and read past AI responses. I do miss AI on flights. But after installing Google Edge Gallery with Gemma-4-E2B-it (2.5GB) that runs on my mobile, I’ve solved a few practical problems. For example: I took a picture of a dish they served and asked: “Is this vegetarian?” (It was.) I asked, “Comics have text in panels, often written at the top in a box. Not the speech bubbles. It’s like a narrator or voice over. What are they called?” (Caption boxes.) “Summarize The Unbearable Lightness of Being. Why is it famous?” (Thoughtful, well-written novel on the choice vs commitment tradeoff.) It’s not a very smart model. It’s a bit slow. Transcription is average. It doesn’t run in the background. Only one chat at a time. No internet search, etc. ...

Let AI take your exams

At 2 pm IST today (Fri 12 Jun 2026), I conducted a workshop at Paradox, IITM - at DOMS 101. My core message is: “AI can solve exams and help you learn. Delegate what AI can do. Learn what AI can’t do instead.” My talks page for “Let AI take your exams” includes: The full story + transcript + audio How Codex solved a real exam, live My collection of AI-learning techniques - which was not covered in the workshop, but is a useful reference Here are the takeaways from the workshop: ...

Data Stories with AI Workshop

On Sat 13 Jun 2026 at 3 pm, I conducted an online workshop on Data Stories with AI. Registration link: https://forms.gle/dNkUxtJ2PVqNMNcE9 In this workshop, the audience used ChatGPT and Claude, mostly, to: Find data Analyze it Extract insights Visualize as stories It’s a data visualization using AI workshop for journalists - but you don’t need to know data, visualization, journalism, or even technology. But this is a practical workshop. You’ll be doing stuff and sharing your results. ...

Editing images with code and AI

Andreessen Horowitz published an interesting article titled The Next Frontier of Visual AI Is Code. Here’s the summary. A lot of our work is visual: ads, slides, dashboards, logos, videos, architecture, etc. We can generate visual output either as:

- Pixels (like Nano Banana generating a photo), or as
- Code (like Claude generating an SVG).

Code is more powerful: AI can inspect the output and improve fast in a loop: Code > Render > Inspect > Revise. ...
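A toy version of the Code > Render > Inspect > Revise loop might look like the sketch below. Everything here is an assumption for illustration: the function names, the 200x100 canvas, and the bar-chart task are made up, and parsing the SVG markup stands in for actually rendering and looking at it. A real agent would render to an image and have a model inspect the result.

```python
import xml.etree.ElementTree as ET

WIDTH, HEIGHT = 200, 100                 # hypothetical canvas size
SVG_NS = "http://www.w3.org/2000/svg"


def generate_svg(values, scale):
    """Code step: emit a bar chart as SVG markup."""
    bar_w = WIDTH / len(values)
    bars = "".join(
        f'<rect x="{i * bar_w:.1f}" y="{HEIGHT - v * scale:.1f}" '
        f'width="{bar_w - 2:.1f}" height="{v * scale:.1f}"/>'
        for i, v in enumerate(values)
    )
    return f'<svg xmlns="{SVG_NS}" viewBox="0 0 {WIDTH} {HEIGHT}">{bars}</svg>'


def inspect_svg(svg):
    """Inspect step: parse the markup and return the tallest bar's height."""
    root = ET.fromstring(svg)
    return max(float(r.get("height")) for r in root.iter(f"{{{SVG_NS}}}rect"))


def revise_until_fits(values, scale=10.0, max_rounds=10):
    """Loop: regenerate with a smaller scale until every bar fits the canvas."""
    for _ in range(max_rounds):
        svg = generate_svg(values, scale)
        tallest = inspect_svg(svg)
        if tallest <= HEIGHT:
            return svg, scale
        scale *= HEIGHT / tallest        # Revise step: shrink proportionally
    raise RuntimeError("could not fit the chart within max_rounds")


if __name__ == "__main__":
    svg, scale = revise_until_fits([3, 8, 25])
    print(scale)
    print(svg)
```

The point of the sketch is that the output is text the program can read back and correct, which is not possible in the same way with a grid of pixels.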

When the prompt is longer than the code

I used pi to create a compact home page for media.s-anand.net using these prompts: Create index.html - a simple, elegant page that says that this page (media.s-anand.net) serves large media files for Anand - that’s where they should look instead. … followed by: Skip the part that says “Please visit …” … then: Shorten index.html to just 2-3 elegant rules of CSS. I want it MUCH smaller and simpler. … and finally: Center vertically and horizontally. ...

How AI bottlenecks shift

I wrote about my changing AI opinions. At least some of this is because the industry is moving so fast that the bottlenecks keep shifting. Here are four examples of how AI couldn’t do something (the bottleneck), but that became possible, and the bottleneck shifted - changing the way we work. It’s good to keep this in mind when thinking about AI.

Coding:

- “It can’t write useful code. We can’t get real help.” But in Sep 2022: GitHub finds Copilot developers are 55% faster.
- “It writes code but doesn’t know our codebase. We can’t let it touch real projects.” But in Feb 2024: Gemini 1.5 Pro has 1M-token context ~ 30K LOC. Cursor indexes code.
- “It understands the repo but can’t ship a fix on its own. We can’t hand it a whole issue.” But in Mar 2024: Devin solves 14% of SWE-bench - up from 2%. Verified SWE-Bench is now 70%+.
- “It ships fixes, but we can’t review them fast enough or trust they’re stable.” Oct 2024: DORA 2024 finds AI hurt both throughput and stability. Now: Sep 2025: DORA 2025 finds throughput is positive but stability stayed negative. Jul 2025: METR’s RCT finds experienced devs 19% slower.

Agents ...

My changing AI opinions

I asked Claude about my AI opinions. Based on my transcripts and blog posts, find the three claims I make most consistently, the three I’ve quietly reversed, and the one assumption I’ve never questioned but everything depends on. Here are things I’ve changed my opinion on:

- THEN: One frontier model will win - not specialization. NOW: Gemini for media, Claude for strategy/style, GPT for rigor. SLMs as tools.
- THEN: Carefully curate my course content. NOW: Give students prompts directly.
- THEN: Web apps are differentiated artifacts. NOW: HTML is easier to generate than PPT - a signal of slop, not craft.
- THEN: Human in the loop. NOW: Human NOT in the loop, bottlenecking it. On-the-loop, etc. is fine.
- THEN: Minimal single-agent loop, avoid sub-agents. NOW: Multi-agent, sub-agent, and agent teams.
- THEN: Avoid MCP, prefer SKILLS.md. NOW: Use MCP because integrating with Claude / ChatGPT / … is easy.

Here are the top contradictions in my opinions. ...

My most memorable anniversary

At 9:30 pm, I checked my calendar for tomorrow’s appointments, alt-tabbed frantically into ChatGPT, and started typing: Tomorrow is my 24th anniversary. It’s a bit late for me to buy anything (except maybe an online service) or prepare something. This has become a habit – leaving things to the last minute and asking ChatGPT to save my day. I did give it good context, though. You remember the OCBC expenses treemap you created by analyzing my transactions? That will give you a good guessable idea of the kinds of things she spends on and hopefully, therefore, what she likes. ...

It's who you know

Dharmendra Singh shared how they built an app with AI. That’s normal. I’m just thrilled they used client transcripts as the source. Basically, they converted the “voice of the client” to working software. To quote them: “A strong spoken business narrative can be converted into a usable product brief quickly when the capture step is disciplined.” You know what this means? Interviewing is a skill to hire for. Better questions = better answers = better apps. ...

AI Coding Agent Subscription ROI

I ran npx -y ccusage monthly --compact to get the following break-up of my AI coding agent costs.

| Month | Codex | Claude |
|---|---|---|
| 2025-09 | $37.47 | $2.29 |
| 2025-10 | $106.79 | $9.13 |
| 2025-11 | $100.35 | $14.24 |
| 2025-12 | $240.69 | $24.88 |
| 2026-01 | $100.89 | $20.28 |
| 2026-02 | $323.21 | $29.46 |
| 2026-03 | $1996.32 | $134.87 |
| 2026-04 | $401.36 | $47.07 |
| 2026-05 | $378.20 | $45.13 |

This shows the ROI of my $20 subscriptions to each. I get ~$35 worth of API calls for my $20 Claude Pro subscription and ~$400 of API calls for my $20 ChatGPT Plus subscription (on top of my ChatGPT chats.) ...
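The ~$35 and ~$400 figures are consistent with a simple average of the monthly numbers above. The sketch below assumes that is how they were derived (the post does not say), and the names `USAGE` and `average_monthly_value` are made up for illustration.

```python
# Monthly API-equivalent usage in USD as (Codex, Claude), copied from the figures above.
USAGE = {
    "2025-09": (37.47, 2.29),
    "2025-10": (106.79, 9.13),
    "2025-11": (100.35, 14.24),
    "2025-12": (240.69, 24.88),
    "2026-01": (100.89, 20.28),
    "2026-02": (323.21, 29.46),
    "2026-03": (1996.32, 134.87),
    "2026-04": (401.36, 47.07),
    "2026-05": (378.20, 45.13),
}
SUBSCRIPTION_USD = 20.0  # price of each plan, as stated in the post


def average_monthly_value(usage):
    """Return the mean monthly API-equivalent value for (Codex, Claude)."""
    months = len(usage)
    codex = sum(c for c, _ in usage.values()) / months
    claude = sum(k for _, k in usage.values()) / months
    return codex, claude


codex_avg, claude_avg = average_monthly_value(USAGE)

if __name__ == "__main__":
    for name, avg in (("Codex", codex_avg), ("Claude", claude_avg)):
        ratio = avg / SUBSCRIPTION_USD
        print(f"{name}: ${avg:.2f}/month, {ratio:.1f}x the $20 subscription")
```

The averages come out near $409 for Codex and $36 for Claude. Note that the mean is pulled up by 2026-03, so a median would give a more conservative estimate.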

Retire the Verify Button

My post “Add a Verify Button” has a problem. When Rohit requested hyperlocal news for every PIN code in Mumbai, we’d need a “verify” button on every Statoistics card - hundreds of PIN codes, every day. Verifying every output introduces a new bottleneck: a person inspecting every unit. That’s 100% inspection - which you do when you don’t yet trust the process. Manufacturing solved this a century ago. At Western Electric’s Hawthorne Works (famous for the Hawthorne Effect), quality control meant inspecting finished products and pulling the defective ones. Walter Shewhart sent his boss a one-page memo; about a third of it was a control chart. ...
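To make the control-chart idea concrete, here is a minimal sketch of a p-chart, one common kind of Shewhart chart: sample a fixed number of outputs per day, count the defective ones, and flag only the days whose defect rate falls outside three standard deviations of the average. The sample size and defect counts are hypothetical, not data from the post, and the post may go on to use a different kind of chart.

```python
from math import sqrt


def p_chart_limits(defects, sample_size):
    """Return (p_bar, lower, upper) 3-sigma limits for a p-chart."""
    p_bar = sum(defects) / (len(defects) * sample_size)
    sigma = sqrt(p_bar * (1 - p_bar) / sample_size)
    lower = max(0.0, p_bar - 3 * sigma)   # a proportion cannot go below 0
    upper = min(1.0, p_bar + 3 * sigma)   # or above 1
    return p_bar, lower, upper


def out_of_control(defects, sample_size):
    """Indices of samples whose defect rate falls outside the control limits."""
    _, lower, upper = p_chart_limits(defects, sample_size)
    return [
        day for day, count in enumerate(defects)
        if not lower <= count / sample_size <= upper
    ]


if __name__ == "__main__":
    SAMPLE_SIZE = 50                                 # hypothetical: cards checked per day
    DEFECTS = [2, 1, 3, 2, 0, 2, 1, 12, 2, 1]        # hypothetical: cards failing verification
    print(p_chart_limits(DEFECTS, SAMPLE_SIZE))
    print("investigate days:", out_of_control(DEFECTS, SAMPLE_SIZE))
```

Only the day with 12 failures gets flagged. The other days vary, but within what the process normally produces, so nobody needs to inspect them one by one.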

Add a Verify Button

Rohit Saran looked at the Statoistics cards my AI agents are generating for The Times of India, and asked about a small button under each one. In the list of Statoistics that you had put, I saw there’s a button called ‘Verify.’ What was that meant to be or will do in future? That verify button explains the claim, mentions the sources, and shows how to check the claim. One card said “9 in 10 Indians want a family doctor and barely 1 in 35 has one”. The button breaks that down: ...

ChatGPT is about FIDE 1600

I asked ChatGPT to play chess with Stockfish. Stockfish is a “strong open-source chess engine”. It has 8 levels of difficulty, which roughly map to these FIDE levels:

| Stockfish | FIDE | Player Level & Description |
|---|---|---|
| Level 1 | ~1000 | Beginner: Constantly blunders, hangs pieces deliberately. |
| Level 2 | ~1100 | Advanced Beginner: Fewer obvious tactical mistakes, plays completely aimlessly. |
| Level 3 | ~1200 | Early Intermediate: Punishes very basic errors but regularly drops pieces. |
| Level 4 | ~1350 | Intermediate: Plays standard opening moves; requires solid, blunder-free play to beat. |
| Level 5 | ~1450 | Advanced Intermediate: Rarely hangs single pieces; you need positional advantages. |
| Level 6 | ~1650 | Strong Club Player: Highly tactical. Aggressively exploits your mistakes. |
| Level 7 | ~1950 | Expert: Exceptionally strong. Requires precise positional mastery and deep calculation. |
| Level 8 | ~2400 | Grandmaster: Invincible for most humans. Plays with ruthless perfection. |
| Full Engine | ~3600 | Out of human reach completely, “like a smart ant trying to debate physics with a human.” |

In the first iteration, here were the results: ...
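For a rough sense of what a rating like 1600 means against each level, the standard Elo expected-score formula can be applied to the approximate ratings in the table above. This is a sketch, not the method used in the post: the level ratings are the approximations quoted above, and `expected_score` and `STOCKFISH_LEVELS` are names invented here.

```python
# Approximate FIDE-equivalent rating per Stockfish level, from the table above.
STOCKFISH_LEVELS = {
    1: 1000, 2: 1100, 3: 1200, 4: 1350,
    5: 1450, 6: 1650, 7: 1950, 8: 2400,
}


def expected_score(player: float, opponent: float) -> float:
    """Elo expected score (win = 1, draw = 0.5, loss = 0) for `player`."""
    return 1 / (1 + 10 ** ((opponent - player) / 400))


def expected_by_level(player_rating: float) -> dict:
    """Expected score of a player with `player_rating` against each level."""
    return {
        level: expected_score(player_rating, rating)
        for level, rating in STOCKFISH_LEVELS.items()
    }


if __name__ == "__main__":
    for level, score in expected_by_level(1600).items():
        print(f"Level {level}: expected score {score:.2f}")
```

A player rated about 1600 would be expected to score above 50% against Level 5 and slightly below 50% against Level 6, which is the kind of crossover a "~1600" estimate implies.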

Wikipedia Citation Impact

Imagine you’re an information anarchist. You undermine Wikipedia pages by nuking references. A genie has granted you a wish: you can nuke one entire domain. Just one. As a data-driven decision maker (who is also an information anarchist 🤷), which would you pick? A common choice is The Internet Archive. 2.9 million Wikipedia pages reference it. But, you’re sneakier than that. A page isn’t undermined just because some references are gone. It’s undermined when all the references are gone. ...
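The difference between "pages that cite a domain" and "pages that lose every reference if the domain disappears" can be sketched as follows. The page names and URLs are hypothetical placeholders, not Wikipedia data, and `impact_by_domain` is an illustrative name rather than the analysis used in the post.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Host name of a URL, without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def impact_by_domain(page_refs):
    """Count, per domain, the pages citing it and the pages citing nothing else."""
    cited = Counter()     # pages with at least one reference on the domain
    orphaned = Counter()  # pages whose every reference is on the domain
    for urls in page_refs.values():
        domains = {domain_of(u) for u in urls}
        cited.update(domains)
        if len(domains) == 1:
            orphaned.update(domains)
    return cited, orphaned


if __name__ == "__main__":
    # Hypothetical pages and references, for illustration only.
    PAGES = {
        "Page A": ["https://web.archive.org/x", "https://example.org/1"],
        "Page B": ["https://example.com/a", "https://www.example.com/b"],
        "Page C": ["https://web.archive.org/y"],
    }
    cited, orphaned = impact_by_domain(PAGES)
    print("cited:   ", cited.most_common())
    print("orphaned:", orphaned.most_common())
```

Ranking by `orphaned` instead of `cited` is what separates the sneaky choice from the obvious one: a domain that appears on fewer pages can still be the sole source for more of them.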