September 2025

Vibe-Scraping: Write outcomes, not scrapers

There hasn’t been a box-office explosion like Dangal in the history of Bollywood. CPI inflation-adjusted to 2024, it is the only film in the ₹3,000 Cr club. 3 Idiots (2009) is the first member of the ₹1,000 Cr club (2024-inflation-adjusted). The hot streak was 2013-2017: each year, a film crossed that bar: Dhoom 3, PK, Bajrangi Bhaijaan, Dangal, Secret Superstar. Since then, we never saw such a release except in 2023 (Jawan, Pathan). ...

How to review trending GitHub repos on VS Code

Here’s how I track trending GitHub repos each week. I run a scheduled script that saves a clean TSV I can scan fast. It uses uvx gtrending to fetch weekly trending repos for: Rust: High-quality system tools. (Anything in Rust seems cool.) Go: Reliable CLI/infra tools. (Like Rust, most Go code seems good.) Python: Most AI/ML stuff TypeScript: Most modern JS codebases JavaScript: Most front-end utilities Shell: Productivity scripts I pipe results through jq to extract: ...

Vibe Shopping

I’ve started vibe shopping, i.e. using ChatGPT to shop for small, daily items and buying without verifying. For example: “A metal rack for the floor: at least 2 ft * 1 ft * 2 ft, small gaps, popular options on Amazon.in.” https://chatgpt.com/share/68d61d68-7040-800c-936b-354749539308 “An optical wired mouse that’s smaller than usual, 4*+, popular, Prime-eligible for Chennai by the weekend on Amazon.in.” https://chatgpt.com/share/68d61e0d-420c-800c-bc71-821b9f9296a9 The best use is when I don’t know the right terms. In this case, the terms were wire rack and mini mouse. ...

Tools in Data Science Sep 2025 edition is live: https://tds.s-anand.net/. Major update: a new AI-Coding section and fresh projects. I teach TDS at the Indian Institute of Technology, Madras as part of the BS in Data Science. Anyone can audit. The course is public. You can read the content and practice assessments. I fed the May 2025 term student feedback into The Sales Mind and asked: What are the top non-intuitive / surprising inferences? What are interesting observations? What are high impact actions? Full analysis: https://chatgpt.com/share/68cba081-afc0-800c-9da3-75222e84a499: summary, outliers, and action ideas. ...

The 10 sites I visit most often

Here are the 10 most frequent sites I use (based on Microsoft Edge’s home bar): ChatGPT. It replaced Google as my default knowledge source. I prefer it over Gemini, Claude, etc. because the app has good features (memory from past conversations, code interpreter, strong voice mode, remote MCP on web app, etc.) The OpenAI models have pros and cons, but the app features are ahead of competition. Gmail. It’s my work inbox. Interestingly, I check it more (and respond faster) than social channels (e.g. WhatsApp, Google Chat, LinkedIn). It also doubles up as my task queue. Prime Video. I mainly watch The Mentalist. Totally love Patrick Jane! Google AI Studio. Mostly for transcription. It’s better than Gemini on UI, ability to handle uploads, file-formats, etc. It’s also free (though the data is used for training.) My Talks page. I give 1-1.5 talks a week, mostly on AI/ML topics. I use Marp to render Markdown slides and publish it here. Google Chat. It’s Straive’s social channel. I can’t use it from my phone, so I log in only if I need to check if I missed something. LinkedIn. It’s where I post by default. I don’t use it for networking and only connect with people I’ve met and know well. YouTube. Mostly for movie clips over dinner. I occasionally watch educational content. Playground. LLM Foundry is Straive’s internal gateway to multiple model APIs (I built it). I use it to experiment with models, grab API keys, and demo LLMs to clients. Squoosh. I compress every image, every time. Mostly into WebP (hands-down the best format today), typically lossless with an 8-color palette, or lossy at ~0-10% quality for photos. That’s my current home row. It will change. But the reasons probably won’t: fast, simple, automatable, and practical (for me).

Voice coding is the new live coding

In Feb 2025 at PyConf Hyderabad, I tried a new slide format: command-line slideshows in bash. I’ve used this format in more talks since then: LLMs in the CLI, PyCon Singapore, Jun 2025 Agents in the CLI, Singapore Python User Group, Jul 2025 DuckDB is the new Pandas, PyCon India, Sep 2025 It’s my favorite format. I can demo code without breaking the presentation flow. It also draws interest. My setup was the top question in my PyCon talk. ...

AfterSlides: Write Slides After Talks

25 years ago, Mr. Krishnan (IAS) amused us with anecdotes of bureaucrats writing meeting minutes before the meeting. This week, I flipped that. I wrote slides after the talk. I call them AfterSlides. Why. I ran a couple of Ask-Me-Anything (AMA) sessions where the audience set the agenda. I learned their interests. They got answers. No slides prepared. How. I okayed recording with the organizers, recorded on my phone, transcribed with Gemini, and asked ChatGPT to generate the AfterSlides. ...

Turning Generic Gifts Into Joy with AI

In 2001, I received a campus interview invitation from BCG. It opened like this: Dear Anand, We’d like to invite you to an interview on … We were impressed by your … … and went on to share 2-3 phrases about what they liked about my CV. A dozen of us got similar letters – each personalized! That was cool. Two decades later, I still remember it. It showed care and competence – care enough to personalize for each candidate, competence to pull it off at scale across campuses. ...

Tomorrow, we’ll be vibe-analyzing data at a Hasgeek Fifth Elephant workshop. It’s a follow-up to my DataHack Summit talk “RIP Data Scientists”. I showed how it’s possible to automate many data science tasks. In this workshop, the audience will be doing that. Slides: https://sanand0.github.io/talks/2025-09-16-vibe-analysis/ (minimal because… well, it’s “vibe analysis”. We’ll code as we go.) Here are datasets I’ll suggest to the audience: India Census 2011: https://www.kaggle.com/datasets/danofer/india-census MovieLens movies: https://grouplens.org/datasets/movielens/32m/ IMDb movies: https://datasets.imdbws.com/ Occupational Employment and Wage Statistics (OEWS): https://www.bls.gov/oes/tables.htm Global AI Job Market & Salary Trends 2025: https://www.kaggle.com/datasets/bismasajjad/global-ai-job-market-and-salary-trends-2025 Flight Delay Dataset: https://www.kaggle.com/datasets/shubhamsingh42/flight-delay-dataset-2018-2024 London House Price Data: https://www.kaggle.com/datasets/jakewright/house-price-data Exchange Rates to USD: https://www.kaggle.com/datasets/robikscube/exhange-rates-to-usd-from-imforg-updated-daily Thailand Road Accidents (2019-202): https://www.kaggle.com/datasets/thaweewatboy/thailand-road-accident-2019-2022 … but if you’d like stories from any interesting recent datasets (10K - 10M rows, easy-to-download), please suggest in the comments. 🙏 ...

GPT-5 (Codex) follows instructions exactly as given. Usually a good thing, but sometimes, it this is what happens. AGENTS.md: ALWAYS WRITE TESTS before coding. Codex: Let me begin with the tests. (Spends 5 minutes writing tests.) Anand: Stop! This is a proof of concept. We don’t need tests! AGENTS.md: Write tests before coding. Drop tests for proof-of-concepts. Codex: (Proceeds to delete all existing tests.) Anand: STOP! We need those tests! ...

I use LLMs to create photos and comics. But they can generate any kind of illustration. So why limit ourselves? My problem is imagination: I know little about art. So, I asked ChatGPT, Claude, and DeepSeek: Suggest 10 unusual illustration styles that are not popular in social media yet but are visually striking. I would like to have an LLM create images in that style. For each of those, show me an (and link to) an online image in that style. ...

My Tools in Data Science course uses LLMs for assessments. We use LLMs to Suggest project ideas (I pick), e.g. https://chatgpt.com/share/6741d870-73f4-800c-a741-af127d20eec7 Draft the project brief (we edit), e.g. https://docs.google.com/document/d/1VgtVtypnVyPWiXied5q0_CcAt3zufOdFwIhvDDCmPXk/edit Propose scoring rubrics (we tweak), e.g. https://chatgpt.com/share/68b8eef6-60ec-800c-8b10-cfff1a571590 Score code against the rubric (we test), e.g. https://github.com/sanand0/tds-evals/blob/5cfabf09c21c2884623e0774eae9a01db212c76a/llm-browser-agent/process_submissions.py Analyze the results (we refine), e.g. https://chatgpt.com/share/68b8f962-16a4-800c-84ff-fb9e3f0c779a This changed our assessments process. It’s easier and better. Earlier, TAs took 2 weeks to evaluate 500 code submissions. In the example above, it took 2 hours. Quality held up: LLMs match my judgement as closely as TAs do but run fast and at scale. ...

Slides for my DataHack Summit talk (controversially) titled RIP Data Scientists are at https://sanand0.github.io/talks/2025-08-21-rip-data-scientists/ Summary: as data scientists we explore, clean, model, explain, deploy, and anonymize datasets. I live-vibe-coded each step with DGCA data in 35 minutes using ChatGPT. Of course, it’s the tasks that are dying, not the role. Data scientists will leverage AI, differentiate on other skills, and move on. But the highlight was an audience comment: “I’m no data scientist. I’m a domain person. I’ll tell you all this: If you don’t follow these practices, you won’t have a job with me!” ...