If a bot passes your exam, what are you teaching?

It’s incredible how far coding agents have come. They can now solve complete exams. That changes what we should measure. My Tools in Data Science course has a Remote Online Exam. It was so difficult that, in 2023, it sparked threads titled “What is the purpose of an impossible ROE?” Today, despite making the test harder, students solve it easily with Claude, ChatGPT, etc. Here’s today’s score distribution: ...

Is all AI content slop?

Is all AI content slop? I asked Claude to: Analyze this thread. Then explain it like a Malcolm Gladwell New Yorker article. https://news.ycombinator.com/item?id=45820872 It gave me a beautiful, engaging and insightful essay about a 300+ message debate about AI vs humans on routine tasks. https://claude.ai/share/60c5810f-5c81-4970-8026-a24bf89c3392 Is this slop? One phrase stood out: There’s an irony here that the commenter doesn’t quite state but implies beautifully: we’ve spent so long celebrating automation because humans are imperfect that we’ve forgotten we also value humans because they’re imperfect. ...

OpenAI TTS cost

The OpenAI text-to-speech cost documentation is confusing. As of 2 Nov 2025: GPT-4o mini TTS costs $0.60 / MTok text input and $12.00 / MTok audio output according to the model page and the pricing page. They also estimate this at ~1.5c per minute, covering both input and output. It supports up to 2,000 input tokens. TTS-1 costs $15 / MTok of generated speech according to the model page, but the pricing page says $15 / MChars. No per-minute estimate is provided. It supports up to 4,096 input characters. TTS-1 HD is twice as expensive as TTS-1. I wanted to find the approximate total cost for a typical text input, measured per character and per token. ...
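A back-of-envelope sketch of what these prices imply per 1,000 characters. The conversion factors (~4 characters per text token, ~750 spoken characters per minute) are my assumptions, not OpenAI figures:

```python
# Back-of-envelope TTS cost per 1,000 characters of input text.
# Assumed conversion factors (not from OpenAI docs):
CHARS_PER_TOKEN = 4     # rough average for English text
CHARS_PER_MINUTE = 750  # ~150 words/min * ~5 chars/word of speech

# TTS-1: $15 per million characters (per the pricing page).
tts1_per_1k_chars = 15 / 1_000_000 * 1000  # $0.015

# GPT-4o mini TTS: $0.60 / MTok text input, plus OpenAI's ~1.5c/min
# estimate for the audio side.
mini_input_per_1k_chars = 0.60 / 1_000_000 * (1000 / CHARS_PER_TOKEN)
mini_audio_per_1k_chars = 0.015 * (1000 / CHARS_PER_MINUTE)

print(f"TTS-1:           ${tts1_per_1k_chars:.4f} / 1K chars")
print(f"GPT-4o mini TTS: ${mini_input_per_1k_chars + mini_audio_per_1k_chars:.4f} / 1K chars")
```

Under these assumptions, GPT-4o mini TTS lands at roughly $0.02 per 1,000 characters, with the audio side dominating the text-input cost by two orders of magnitude.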

Tamil AI

I was testing LLMs’ sense of Tamil humor with this prompt: Extend this post with more funny Tamil words that end with .ai - mentioning why they’re funny. Chenn.ai is the artificial intelligence capital of India. Kadal.ai Kad.ai Dos.ai Vad.ai Ad.ai Thal.ai Mallig.ai Aratt.ai And finally Podad.ai All spoken in namma bash.ai 😅 The Chinese models didn’t fare well. DeepSeek made up words. Mood.ai - An AI that perfectly captures your mood. Sokk.ai - The AI for when you’re bored. Thanni.ai - A hydration assistant. Qwen too. ...

How to create a data-driven exam strategy

Can ChatGPT give teachers data-driven heuristics on student grades? I uploaded last term’s scores from about 1,700 students in my Tools in Data Science course and asked ChatGPT: This sheet contains the scores of students … (and explained the columns). I want to find out what are the best predictors of the total plus bonus… (and explained how scores are calculated). I am looking for simple statements with 80%+ correctness along the lines of: ...

The Non-Obvious Impact of Reasoning Defaults

Yesterday, I discovered how much reasoning improves model quality. My Tools in Data Science assignment asks students to draft an llms.txt file for ipify and auto-checks with GPT-5 Nano - a fast, cheap reasoning model. I set reasoning_effort to minimal and ran this checklist: 1. Starts with "# ipify" and explains ipify. 2. Markdown sections on API access, support (e.g. GitHub, libraries). 3. Covers API endpoints (IPv4, IPv6, universal) and formats (text, JSON, JSONP). 4. Mentions free, no-auth usage, availability, open-source, safeguards. 5. Has maintenance metadata (e.g. "Last updated: <Month YYYY>"). 6. Mentions robots.txt alignment. Stay concise (no filler, <= ~15 links). If even one checklist item is missing or wrong, fail it. Respond with EXACTLY one line: PASS - <brief justification> or FAIL - <brief explanation of the first failed item>. With a perfect llms.txt, it claimed “Metadata section is missing” and “JSONP not mentioned” – though both were present. ...
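A sketch of how such a check can be wired up and its one-line verdict parsed. The request dict mirrors the parameters described above, but treat the exact SDK call (left as a comment) as an assumption about the OpenAI Python client; the parser is a hypothetical helper for the PASS/FAIL protocol in the prompt:

```python
import re

# Request parameters for the checker. With the OpenAI Python SDK this would be
# passed as client.chat.completions.create(**params) -- call shape assumed here.
params = {
    "model": "gpt-5-nano",
    "reasoning_effort": "minimal",  # the default under test in the post
    "messages": [
        {"role": "system", "content": "Respond with EXACTLY one line: PASS - ... or FAIL - ..."},
        {"role": "user", "content": "<checklist + the student's llms.txt>"},
    ],
}

def parse_verdict(line: str) -> tuple[bool, str]:
    """Parse the 'PASS - reason' / 'FAIL - reason' one-line protocol."""
    m = re.match(r"^(PASS|FAIL)\s*-\s*(.*)$", line.strip())
    if not m:
        raise ValueError(f"Malformed verdict: {line!r}")
    return m.group(1) == "PASS", m.group(2)

ok, reason = parse_verdict("FAIL - Metadata section is missing")
print(ok, reason)
```

Keeping the verdict machine-parseable is what lets a single `reasoning_effort` knob be A/B tested across hundreds of submissions.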

Tools in Data Science Sep 2025 edition is live: https://tds.s-anand.net/. Major update: a new AI-Coding section and fresh projects. I teach TDS at the Indian Institute of Technology, Madras as part of the BS in Data Science. Anyone can audit. The course is public. You can read the content and practice assessments. I fed the May 2025 term student feedback into The Sales Mind and asked: What are the top non-intuitive / surprising inferences? What are interesting observations? What are high impact actions? Full analysis: https://lnkd.in/gVWVqaxN: summary, outliers, and action ideas. ...

Vibe-Scraping: Write outcomes, not scrapers

There hasn’t been a box-office explosion like Dangal in the history of Bollywood. CPI inflation-adjusted to 2024, it is the only film in the ₹3,000 Cr club. 3 Idiots (2009) was the first member of the ₹1,000 Cr club (2024-inflation-adjusted). The hot streak was 2013-2017: each year, a film crossed that bar: Dhoom 3, PK, Bajrangi Bhaijaan, Dangal, Secret Superstar. Since then, only 2023 (Jawan, Pathan) has seen such a release. ...

Vibe Shopping

I’ve started vibe shopping, i.e. using ChatGPT to shop for small, daily items and buying without verifying. For example: “A metal rack for the floor: at least 2 ft * 1 ft * 2 ft, small gaps, popular options on Amazon.in.” https://chatgpt.com/share/68d61d68-7040-800c-936b-354749539308 “An optical wired mouse that’s smaller than usual, 4*+, popular, Prime-eligible for Chennai by the weekend on Amazon.in.” https://chatgpt.com/share/68d61e0d-420c-800c-bc71-821b9f9296a9 The best use is when I don’t know the right terms. In this case, the terms were wire rack and mini mouse. ...


Voice coding is the new live coding

In Feb 2025 at PyConf Hyderabad, I tried a new slide format: command-line slideshows in bash. I’ve used this format in more talks since then: LLMs in the CLI, PyCon Singapore, Jun 2025 Agents in the CLI, Singapore Python User Group, Jul 2025 DuckDB is the new Pandas, PyCon India, Sep 2025 It’s my favorite format. I can demo code without breaking the presentation flow. It also draws interest. My setup was the top question in my PyCon talk. ...

AfterSlides: Write Slides After Talks

25 years ago, Mr. Krishnan (IAS) amused us with anecdotes of bureaucrats writing meeting minutes before the meeting. This week, I flipped that. I wrote slides after the talk. I call them AfterSlides. Why. I ran a couple of Ask-Me-Anything (AMA) sessions where the audience set the agenda. I learned their interests. They got answers. No slides prepared. How. I okayed recording with the organizers, recorded on my phone, transcribed with Gemini, and asked ChatGPT to generate the AfterSlides. ...

Turning Generic Gifts Into Joy with AI

In 2001, I received a campus interview invitation from BCG. It opened like this: Dear Anand, We’d like to invite you to an interview on … We were impressed by your … … and went on to share 2-3 phrases about what they liked about my CV. A dozen of us got similar letters – each personalized! That was cool. Two decades later, I still remember it. It showed care and competence – care enough to personalize for each candidate, competence to pull it off at scale across campuses. ...

GPT-5 (Codex) follows instructions exactly as given. Usually a good thing, but sometimes this is what happens. AGENTS.md: ALWAYS WRITE TESTS before coding. Codex: Let me begin with the tests. (Spends 5 minutes writing tests.) Anand: Stop! This is a proof of concept. We don’t need tests! AGENTS.md: Write tests before coding. Drop tests for proof-of-concepts. Codex: (Proceeds to delete all existing tests.) Anand: STOP! We need those tests! ...

I use LLMs to create photos and comics. But they can generate any kind of illustration. So why limit ourselves? My problem is imagination: I know little about art. So, I asked ChatGPT, Claude, and DeepSeek: Suggest 10 unusual illustration styles that are not popular in social media yet but are visually striking. I would like to have an LLM create images in that style. For each of those, show me (and link to) an online image in that style. ...

Slides for my DataHack Summit talk (controversially) titled RIP Data Scientists are at https://sanand0.github.io/talks/2025-08-21-rip-data-scientists/ Summary: as data scientists we explore, clean, model, explain, deploy, and anonymize datasets. I live-vibe-coded each step with DGCA data in 35 minutes using ChatGPT. Of course, it’s the tasks that are dying, not the role. Data scientists will leverage AI, differentiate on other skills, and move on. But the highlight was an audience comment: “I’m no data scientist. I’m a domain person. I’ll tell you all this: If you don’t follow these practices, you won’t have a job with me!” ...

My Tools in Data Science course uses LLMs for assessments. We use LLMs to Suggest project ideas (I pick), e.g. https://chatgpt.com/share/6741d870-73f4-800c-a741-af127d20eec7 Draft the project brief (we edit), e.g. https://docs.google.com/document/d/1VgtVtypnVyPWiXied5q0_CcAt3zufOdFwIhvDDCmPXk/edit Propose scoring rubrics (we tweak), e.g. https://chatgpt.com/share/68b8eef6-60ec-800c-8b10-cfff1a571590 Score code against the rubric (we test), e.g. https://github.com/sanand0/tds-evals/blob/5cfabf09c21c2884623e0774eae9a01db212c76a/llm-browser-agent/process_submissions.py Analyze the results (we refine), e.g. https://chatgpt.com/share/68b8f962-16a4-800c-84ff-fb9e3f0c779a This changed our assessment process. It’s easier and better. Earlier, TAs took 2 weeks to evaluate 500 code submissions. In the example above, it took 2 hours. Quality held up: LLMs match my judgement as closely as TAs do but run fast and at scale. ...

Here’s my current answer when asked, “How do I use LLMs better?” Use the best models. O3 (via $20 ChatGPT), Gemini 2.5 Pro (free on Gemini app), or Claude 4 Opus (via $20 Claude). The older models are the default and far worse. Use audio. Speak & listen, don’t just type & read. It’s harder to skip and easier to stay in the present when listening. It’s also easier to ramble than to type. Write down what fails. Maintain that “impossibility list”. There is a jagged edge to AI. Retry every month; you can see how that edge shifts. Wait for better models. Many problems can be solved just by waiting a few months for a new model. You don’t need to find or build your own app. Give LLMs lots of context. It’s a huge enabler. Search, copy-pasteable files, past chats, connectors, APIs/tools, … Have LLMs write code. LLMs are bad at math. They’re good at code. Code hallucinates less. So you get creativity and reliability. Learn AI coding. 1. Build a game with ChatGPT/Claude/Gemini. 2. Create a tool useful to you. 3. Publish it on GitHub. APIs are cheaper than self-hosting. Don’t bother running your own models. Datasets matter. Building custom models does not. You can always fine-tune a newer model if you have the datasets. Comic via https://tools.s-anand.net/picbook/ ...

The Surprising Power of LLMs: Jack-of-All-Trades

I asked ChatGPT to analyze our daily innovation-call transcripts. I used command-line tools to fetch the transcripts and convert them into text:

# Copy the transcripts
rclone copy "gdrive:" . --drive-shared-with-me --include "Innovation*Transcript*.docx"

# Convert Word documents to Markdown
for f in *.docx; do
  pandoc "$f" -f docx -t gfm+tex_math_dollars --wrap=none -o "${f%.docx}.md"
done

# Compress into a single file
tar -cvzf transcripts.tgz *.md

… and uploaded it to ChatGPT with this prompt: ...

Measuring talking time with LLMs

I record my conversations these days, mainly for LLM use. I use them in 3 ways: Summarize what I learned and the next steps. Ideate as raw material for my Ideator tool: /blog/llms-as-idea-connection-machines/ Analyze my transcript statistics. For example, I learned that: When I’m interviewing, others ramble (speak long per turn), I am brief (fewer words/turn) and quiet (lower voice share). In one interview, I spoke ~30 words per turn. Others spoke ~120. My share was ~10%. When I’m advising or demo-ing, I ramble. I spoke ~120 words per turn in an advice call, and took ~75% of the talk-time. This pattern is independent of meeting length and group size. I used Codex CLI (command-line tool) for this, with the prompt: ...
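The stats above are simple to compute once the transcript is speaker-tagged. A minimal sketch of words-per-turn and talk share; the speakers and utterances here are made up (the real analysis ran via Codex CLI against my transcripts):

```python
from collections import defaultdict

# Hypothetical speaker-tagged transcript: one (speaker, utterance) per turn.
transcript = [
    ("Anand", "Tell me about your current role."),
    ("Candidate", "Sure. " + "word " * 118),
    ("Anand", "And what did you build there?"),
    ("Candidate", "word " * 121),
]

words = defaultdict(int)
turns = defaultdict(int)
for speaker, text in transcript:
    words[speaker] += len(text.split())
    turns[speaker] += 1

# Words/turn measures rambling; share of total words approximates talk time.
total_words = sum(words.values())
for speaker in words:
    per_turn = words[speaker] / turns[speaker]
    share = words[speaker] / total_words
    print(f"{speaker}: {per_turn:.0f} words/turn, {share:.0%} of talk time")
```

Word share is only a proxy for voice share; with per-turn timestamps you'd sum durations instead, but the interviewer-vs-advisor asymmetry shows up either way.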