Extracting AI advice

This weekend, two people asked me, roughly, “How do I use AI better?” This is a frequently asked question. I document my FAQs (time management, career advice, etc.), and it was time to add AI advice to the list. I often record online calls and transcribe them. I asked Gemini, Claude and ChatGPT for the best way to summarize 400 transcripts of ~40K each. Claude’s suggestion was the best:
- Use Gemini Flash (1M context, dirt cheap) to process calls in batches of 20-25
- Each batch → extract advice themes
- Aggregate batch results with Claude Sonnet for final synthesis
But I ignored it because it was too much work. (See my AI advice: “Ask for easier output”) ...
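Claude’s batch-then-aggregate suggestion above can be sketched roughly as follows. This is a minimal sketch, not the author's code: it assumes the "~40K" is tokens (the post doesn't say), and `extract_themes` and `synthesize` are hypothetical stand-ins for the Gemini Flash and Claude Sonnet calls.

```python
from typing import Callable, List, Sequence

CONTEXT_LIMIT = 1_000_000  # "1M context" from the suggestion
TRANSCRIPT_SIZE = 40_000   # "~40K each"; assumed here to mean tokens


def make_batches(items: Sequence[str], batch_size: int) -> List[List[str]]:
    """Split items into consecutive batches of at most batch_size."""
    return [list(items[i:i + batch_size]) for i in range(0, len(items), batch_size)]


def summarize(
    transcripts: Sequence[str],
    extract_themes: Callable[[List[str]], str],  # hypothetical per-batch model call
    synthesize: Callable[[List[str]], str],      # hypothetical final model call
    batch_size: int = 20,
) -> str:
    """Extract themes per batch, then aggregate the batch results once."""
    if batch_size * TRANSCRIPT_SIZE > CONTEXT_LIMIT:
        raise ValueError("batch would not fit in the assumed context window")
    batch_themes = [extract_themes(batch) for batch in make_batches(transcripts, batch_size)]
    return synthesize(batch_themes)


# Stand-in callables so the sketch runs without any API access.
transcripts = [f"transcript {n}" for n in range(400)]
result = summarize(
    transcripts,
    extract_themes=lambda batch: f"themes from {len(batch)} calls",
    synthesize=lambda themes: f"synthesis of {len(themes)} batch summaries",
)
print(result)  # synthesis of 20 batch summaries
```

Under these assumptions, a batch of 25 transcripts is exactly 1,000,000 tokens, which leaves no room for the prompt itself, so the lower end of the 20-25 range is the safer default.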

TDS Comic Generation

I use comics to make my course more engaging. Each question has a comic strip that explains what the question is trying to teach. For example, here’s the comic for the question that teaches students about prompt injection attacks:
For each question, I use this prompt on Nano Banana Pro via Gemini 3 Pro:
Create a simple black and white line drawing comic strips with minimal shading, with 1-2 panels, and clear speech bubbles with capitalized text, to explain why my online student quizzes teach a specific concept in a specific way. Use the likeness of the characters and style in the attached image from https://files.s-anand.net/images/gb-shuv-genie.avif.
1. GB: an enthusiastic socially oblivious geek chatterbox
2. Shuv: a cynic whose humor is at the expense of others
3. Genie: a naive, over-helpful AI that pops out of a lamp
Their exaggerated facial expressions to convey their emotions effectively.
---
Panel 1/2 (left):
GB (excited): I taught Genie to follow orders.
Shuv (deadpan): Genie, beat yourself to death.
Panel 2/2 (right): Genie is a bloody mess, having beaten itself to death.
GB (sheepish): Maybe obedient isn't always best...
… along with this reference image for character consistency: ...

TDS Jan 2026 GA1 released

Graded Assignment 1 (GA1) for the Tools in Data Science course is released and is due Sun 15 Feb 2026. See https://exam.sanand.workers.dev/tds-2026-01-ga1
If you already started, you might notice some questions have changed.
Why is GA1 changing? Because some questions don’t work. For example:
- We replaced Claude Artifacts with a Vercel question because Claude won’t allow a proxy anymore.
- A question had unintentionally wrong instructions. (Some questions have intentionally wrong instructions, but those are, …um… intentional).
- Someone changed an API key.
- … etc.
When will GA1 stabilize? Probably by end of day, Sun 9 Feb 2026? ...

Things I Learned - 08 Feb 2026

This week, I learned:
- The Disconnected Git Workflow explains how to use the git send-email workflow. That’s like using email instead of GitHub as the collaboration mechanism - decentralizing and reducing dependencies.
- Grok throws an HTTP 431 when you pass it a query over 6,890 characters in the URL. Here’s an example with 6,900 characters.
- As of now, there’s no way to tell uv to use the cache and install only missing repos (#15454). But this is Deno’s default behavior, making Deno a slightly better choice in this regard.
- Shelling Out Sucks shares common pitfalls when calling the shell from programs. Suggestion:
  - Shell-escape ALL inputs.
  - Use set -o pipefail to detect failures in the middle of a pipe chain.
  - Explicitly check the error code, not just stderr.
- dax, which is based on zx, is a simpler Deno-based alternative to shell scripts. See examples. [ChatGPT] However, the scripting language matters more when humans maintain shell scripts. Since I’m using AI, it’s easier to use bash scripts and let it handle any complexities.
- git push --force-with-lease is like git push --force but won’t overwrite if others have pushed in the meantime. Default to this instead of --force – it’s safer.
- Microsoft’s docfind generates a WASM search index for documents, building a dependency-free, browser-based search that is compact and fast.
- diffs seems like a promising library for rendering diffs in the browser.
- Genie 3 seems pretty good. We should expect to see World Models becoming usable in a few months.
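The three Shelling Out Sucks suggestions above can be combined in a few lines. This is a minimal sketch, assuming `bash` is on the PATH; `count_lines` is a hypothetical helper, and the `cat | wc -l` pipeline is deliberately contrived to show a failure in the middle of a pipe.

```python
import shlex
import subprocess


def count_lines(path: str) -> int:
    """Count lines in a file via a `cat | wc -l` pipeline, failing loudly on errors."""
    # Shell-escape every input so spaces or `;` in the path cannot inject commands.
    command = f"cat -- {shlex.quote(path)} | wc -l"
    # `-o pipefail` makes the pipeline fail if ANY stage fails, not just the last one.
    result = subprocess.run(
        ["bash", "-o", "pipefail", "-c", command],
        capture_output=True,
        text=True,
    )
    # Check the exit code explicitly instead of inferring success from stderr.
    if result.returncode != 0:
        raise RuntimeError(
            f"pipeline failed ({result.returncode}): {result.stderr.strip()}"
        )
    return int(result.stdout.strip())
```

Without pipefail, a missing file would make `cat` fail while `wc -l` still exits 0 and reports 0 lines, so the error would go unnoticed. In real Python code, avoiding the shell entirely (passing an argument list to `subprocess.run`) is usually simpler still.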

Migrating TDS from Docsify to Hugo

This morning, I migrated my Tools in Data Science course page from Docsify to Hugo using Codex. Why? Because Docsify was great for a single term. For multiple terms, archives became complex. I still could have made it work, but it felt like time to move towards a static site generator. I don’t know how Hugo or Go work. I didn’t look at the code. I just gave Codex instructions and it did the rest. This gives me a bit more confidence that educators can start creating their own course sites without needing coding or platforms. Soon, they might not be stuck with LMSs either - they can build their own. ...

RIP, Data Engineers

As AI marches along, another role at risk is the data engineer / database administrator. (Data scientists are already feeling the heat.) A common task for data engineers is to analyze SQL queries - to optimize and standardize. Pavan used Antigravity to analyze 1,500 SQL queries and found:
- 30% of queries are purely headcount / volume related. Much more than revenue (25%) or engagement (15%). That’s a sign of a tactical culture.
- 70% of the queries are about “What happened yesterday?” rather than “What will happen tomorrow?” - again, tactical culture.
Here’s the analysis. ...

Rise of the Indian TV Series

If you look at the IMDb titles with a 9+ rating and 50K+ votes this decade, there are only 4 entries. Every single one of them is an Indian TV series.
Title | Votes | Rating
Aspirants | 316,390 | 9.1
Scam 1992: The Harshad Mehta Story | 166,400 | 9.2
Sandeep Bhaiya | 76,586 | 9.1
Sapne Vs Everyone | 74,342 | 9.3
This is a new phenomenon. Last decade, there was only one Indian TV series in the same list: TVF Pitchers. ...

Gemini 3 Flash OCRs Dilbert accurately

Scott Adams, the author of Dilbert, passed away last month. While his work will live on, I was curious about the best way to build a Dilbert search engine. The first step is to extract the text. Pavan tested over half a dozen LLMs on ~30 Dilbert strips to see which one transcribed them best. Here are the results. Summary: Gemini 3 Flash does the best, and would cost ~$20 to process the entire Dilbert archive. But if you want a local solution, Qwen 3 VL 32b is the best. ...

When to use which Gemini mode

I continue to be impressed by Gemini 3 and it’s become my default agent. It writes in simpler language than ChatGPT (almost as eloquent as Claude), has much larger limits, and, of course, is unbeaten at generating images. The Gemini app has 3 modes: Fast, Thinking, and Pro. Here’s when to use each:
- Simple task, e.g., grammar check, translate, summarize, or basic question? Use Fast. Pro overthinks.
- Multi-step logic, e.g., planning a trip with constraints, checking 15 emails, or identifying a subtle error in code? Use Thinking. Flash-based thinking beats Pro.
- Large input, e.g. 300-page PDF, 2 hours of video, etc.? Use Pro. It uses the 1M+ token window well.
- Complex problem, e.g. PhD-level science or a legal contract review, with high stakes? Use Pro.
If you hit your Pro limit (which is pretty high!), just switch to Thinking, which is smart enough for most jobs anyway. ...
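The rules of thumb above fit in a small decision function. This is only an illustrative sketch: `choose_gemini_mode` and its boolean flags are hypothetical simplifications of the post's categories, not part of any Gemini API.

```python
def choose_gemini_mode(
    *,
    large_input: bool = False,    # e.g. a 300-page PDF or 2 hours of video
    high_stakes: bool = False,    # e.g. PhD-level science, legal contract review
    multi_step: bool = False,     # e.g. trip planning with constraints
    pro_quota_left: bool = True,  # False once the Pro limit is hit
) -> str:
    """Return "Fast", "Thinking", or "Pro" following the rules of thumb above."""
    if large_input or high_stakes:
        # Pro handles long context and hard problems; fall back to Thinking
        # when the Pro limit has been reached.
        return "Pro" if pro_quota_left else "Thinking"
    if multi_step:
        return "Thinking"
    # Simple tasks (grammar check, translate, summarize): Pro would overthink.
    return "Fast"


print(choose_gemini_mode())                                        # Fast
print(choose_gemini_mode(multi_step=True))                         # Thinking
print(choose_gemini_mode(large_input=True))                        # Pro
print(choose_gemini_mode(high_stakes=True, pro_quota_left=False))  # Thinking
```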

Breaking Rules in the Age of AI

Several educators have AI-enabled their courses, like:
- David Malan at Harvard CS50 provides an AI-powered “rubber duck debugger” trained on course-specific materials.
- Mohan Paturi at UC San Diego has deployed AI tutors to his students.
- Ethan Mollick at Wharton uses AI as tutor, coach, teammate, simulator, even student, and runs simulations.
- Jeremy Howard’s Fast.ai encourages students to use LLMs to write code, with a strict verification loop.
- Andrew Ng’s DeepLearning.AI integrates a chatbot into the platform, next to code cells, to handle syntax errors and beginner questions.
But no one seems to have eliminated reading material, nor added an “Ask AI” button to solve each question, nor run it at my scale (~3,000 students annually). ...

Things I Learned - 01 Feb 2026

This week, I learned:
- Android screen recorder is the easiest way to record phone and WhatsApp calls. But that won’t work for Google Meet, Teams, Zoom, etc. [Gemini]
- exiftool remains the best media metadata extractor (music, images, …) though it’s old, slow, and Perl-based. exiftool -csv -r ~/Music/ > music.csv exports all metadata as CSV. Installing the source via https://sourceforge.net/projects/exiftool/files/latest/download seems best. It’s a good alternative to mp3tag / puddletag UI-based exports. [ChatGPT] [Gemini]
- ⭐ Some questions are for us to learn. Some are Socratic, and meant for the answerer to learn. When working with AI agents and interns, I find myself asking them several questions whose answers I don’t need, but that are important for them along their journey. Roughly the equivalent of “Think step by step” converted into the Socratic method. For example:
  - Instead of “Build a demo for this client”, ask “Who is the audience? What’s their objective?” and THEN ask for a demo.
  - Instead of “Generate a dummy dataset for X”, ask “What interesting insights would we want when analyzing X?” and THEN ask for a dataset.
  - Instead of “Write this code”, ask “What’s the best architecture for this?” and THEN ask for code.
- Executable Markdown files with Unix pipes sounds like a clever idea. Prefix Markdown files with #!/usr/bin/env codex (or claude -p). Then, just write programs by describing them.
- Quotes from Isles of the Emberdark:
  - Really, he should have known better than to punch a senator. Important people had underlings you punched on their behalf, and he should have found one of those.
- ChatGPT Canvas has a cool feature for editing documents or code. Just select a portion, ask for changes, and it edits it. Importantly, it’s very fast.
- Greeking Out is a kid-friendly National Geographic podcast about ancient Greece and its influence on modern life.
- fly.io containers at sprites.dev seem impressive. You can SSH into them. They have public & private HTTPS URLs. They auto-sleep after 30s. You can checkpoint any time and restore the ENTIRE system. It’s FAST! This is great for agents. Just install Claude Code / Codex and other tools. Checkpoint it. Then ssh into it and use as required. The cost is typically ~12c/hour - which is expensive to run forever but great for bursts. [Simon Willison]
- I’m seeing the Collider Bias in action (on a small sample). The developers who can communicate well don’t code as well, and vice versa. Not because there’s a negative correlation - but because I’m eliminating people who can neither code nor communicate. But interestingly, over a 1-3 month horizon, the ones who code start communicating much better but the ones who communicate well don’t start coding much better. My theory is that the developers I work with are more communication-bottlenecked (e.g. lack of confidence) than unskilled (e.g. poor communicators).
- Prefer Zod for TypeScript validation and Ajv for schema validation. Typing has a lot of value, but don’t overdo it. It’s best used at fragile boundaries. [ChatGPT]
- ⭐ Notes from LLM poetry and the “greatness” question:
  - Gwern follows this process to create good poetry. It’s a good structure for ANY kind of expert workflow with LLMs today:
    1. Analyze the style, content, and intent of the original.
    2. Brainstorm 10+ different directions the poem could go. Emphasize diversity.
    3. Critique each direction. Rate 1-5 stars.
    4. Write the best one.
    5. Critique and edit line by line.
    6. Generate a new clean draft.
    7. Repeat at least twice.
    8. Print final version.
  - “As a poet and scholar of poetry I feel comfortable arguing that Gwern’s work engineering prompts is, in effect, writing poetry.”
  - Mercor uses expert poets to create a rubric. Models generate poems that experts grade, which refines the rubric, which trains the model.
  - But models tend to the mean and need nudges (from humans?) to surface outliers and ascribe meaning (uniquely human?), which is where greatness lies.
- Ethan Mollick: “I keep warning that so many of our systems are still built around the assumption that quality writing and analysis are costly and therefore meaningful signals. Our systems are very much not ready for the revelation that this is no longer true, as this planning objection AI shows.” Basically, AI lowers the cost of Government and Corporate interactions. It’d be a cool hack to agent-ify these to death, i.e. do all kinds of Government / Corporate interactions that were painful earlier, but now are much easier.
- I just realized: “Will AI take my job?” is a variant of “Will immigrants take my job?” or “Will affirmative action take my job?” Any increase in labor capacity is a threat. But then, the only way to get promoted is if someone takes your job. So, maybe we should ask: “How do I become their boss?” Better yet, tell your boss “I created a 4-agent team and got 2X done. Give me a new title.”
- Some simple yet powerful AI adoption principles from Will Larson - that I’ve seen work rather well:
  - Make tools accessible
  - Document tips & tricks
  - Highlight how people (especially senior leaders) are using it
- An analysis of 1,250 Claude user interviews indicates that:
  - Adoption of Creatives > Workforce > Scientists.
  - Interestingly, the identity threat and guilt of Creatives > Workforce > Scientists! Creatives feel they’re cheating, lazy, or not adding value!
  - Scientists use it less, but it’s more a tool and THEY verify. Sceptical verification is the strongest thread.
- Mintlify is proposing .well-known/skills/ as the directory to store LLM skills that sites want to publish. This could be an extension of the llms.txt mechanism.
- Open Responses is the open version of OpenAI’s Responses API. OpenRouter and HuggingFace support is a big deal, and though Google, Anthropic, Meta etc. don’t yet support it, they might.
- Restish converts OpenAPI specs into CLI tools - with shell completion. Combined with an OAuth CLI like oauth2c this is a great way to convert APIs to CLI commands. [Via]
- Vercel’s agent-browser seems a good CLI choice for browser automation, alongside playwright-cli. It may be worth switching from direct Playwright coding (on CDP). [ChatGPT]
- Capturing actions using HAR and passing it to LLMs seems like another clever way of using AI coding agents for browser automation. [Via]
  1. Open a browser. Open Devtools > Network and filter to HTML, XHR, WS, Other.
  2. Do what you want to automate, e.g. load LinkedIn, search, scroll, fetch next pages, etc.
  3. Devtools > Network > right click > “Save All As HAR”.
  4. Run the file through a HAR-sanitizer.
  5. Prompt: “Create a Python client to automate the actions I captured in file.har”.
- When any AI coding agent can build apps, value will probably migrate away from software to data, network (distribution and users), trust, taste, and physical goods. Owning these controls value. Also, infrastructure to run vibe-coded apps (e.g. auth, hosting, DB, LLM APIs, etc. bundled) will likely lead to Medium / WordPress like platforms.
- After 30 years of learning (and teaching) statistics, I finally found a good explanation of R². R²=80% means that ~80% of the variation is explained by the other variable. [Gemini]
- ⭐ People think numbers create trust; often they create attack surfaces.
  - Goodhart’s Law: “When a measure becomes a target, it ceases to be a good measure.” By providing a number, you invite people to “game” the system or find the flaws in how that number was manufactured.
  - The Precision Trap: While precise numbers can increase perceived credibility initially, they also lead to “anchoring.” If the number is even slightly off, the entire foundation of trust collapses more violently than it would for a general estimate.
  - Statistical Literacy Gap: Most people don’t argue with “vibes,” but many will argue with “averages” if their personal experience represents an outlier. The number creates a surface for anecdotal rebuttal.
- Eraser.io offers an AI architecture diagram generator that creates reasonable architectures. It uses its own diagram-as-code DSL, competing with D2, PlantUML, Mermaid.
- Exposing your workflow as a software interface productizes services businesses. For example, my auditors and immigration lawyers have portals where I can fill out forms, upload documents, see my status, etc. This standardizes their delivery, and creates a “product” moat.
- ⭐ Your “villains” or enemies are often alternatives/backups that have a role in the ecosystem, offering diversity/resilience when you’re wrong. Create roles and incentives for them rather than eliminating them. For example:
  - Don’t make LLMs do all the work. Create a role for the clunky SQL whose resilience saves the day when LLMs hallucinate.
  - Make the person who hates your prototype the Red Team Lead - to catch the flaws you miss.
  - Make the people who reject your product the scouts / innovators - to find alternatives you miss.
- Neon.com is like Supabase but without auth, functions, etc. It’s just Postgres as a service. An alternative for prototypes (that I haven’t tried yet). [ChatGPT]
- SuperTokens is an open-source self-hosted auth service that I’m hearing about more often, but haven’t tested. Seems to be ahead of alternatives like Auth.js / Better Auth. [ChatGPT]
- Bollywood Falls Out Of Love is a great visual data story on The Kontinentalist by Surbhi about the decline of romance and growth of nationalism in Bollywood genres.
- Recharts is a React charting library with some slick capabilities like brushing, customizable tooltips, and bar chart races. [Via Rukmini - Data for India]
- Qwen3 TTS is impressive. It voice-clones, streams, and the tone/style can be controlled via prompts. The model is small. I ran it locally without flash-attn (which I couldn’t get to work) and it took ~14 seconds to generate an audio file for 10 words on my GPU machine. Environment setup:
    uv venv --python 3.12
    UV_TORCH_BACKEND=auto uv pip install -U qwen-tts
- DeepSeek created an external memory system for LLMs that lets them look up (instead of computing to remember) knowledge. That means CPU RAM can be used instead of GPU, models can become smaller, and training can become faster. This looks like an example of how algorithms/ideas can continue the scaling laws. [Gemini via Jeremy Howard]
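One step in the HAR capture workflow above is running the file through a HAR sanitizer before handing it to an AI agent. The post doesn't name a tool, so here is a minimal sketch of what such a step might do, assuming the standard HAR layout (`log.entries[].request/response` with `headers` and `cookies` lists). `sanitize_har` and the header list are illustrative choices, not a specific sanitizer's behavior.

```python
import copy
import json

# Headers that commonly carry credentials. "x-api-key" is a widespread
# convention rather than a standard header; extend the set as needed.
SENSITIVE_HEADERS = {
    "authorization",
    "proxy-authorization",
    "cookie",
    "set-cookie",
    "x-api-key",
}


def sanitize_har(har: dict) -> dict:
    """Return a copy of a HAR document with credential values redacted."""
    clean = copy.deepcopy(har)  # leave the caller's data untouched
    for entry in clean.get("log", {}).get("entries", []):
        for side in ("request", "response"):
            message = entry.get(side, {})
            for header in message.get("headers", []):
                if header.get("name", "").lower() in SENSITIVE_HEADERS:
                    header["value"] = "REDACTED"
            for cookie in message.get("cookies", []):
                cookie["value"] = "REDACTED"
    return clean


# Tiny in-memory example instead of a real capture.
example = {
    "log": {
        "entries": [
            {
                "request": {
                    "headers": [
                        {"name": "Cookie", "value": "session=abc"},
                        {"name": "Accept", "value": "text/html"},
                    ],
                    "cookies": [{"name": "session", "value": "abc"}],
                },
                "response": {"headers": [], "cookies": []},
            }
        ]
    }
}
cleaned = sanitize_har(example)
print(json.dumps(cleaned["log"]["entries"][0]["request"]["headers"]))
```

This sketch does not scrub tokens in query strings, POST bodies, or response content, so a real capture still deserves a manual check before sharing.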