Rise of the Indian TV Series

If you look at the IMDb titles with a 9+ rating and 50K votes this decade, there are only 4 entries. Every single one of them is an Indian TV series.

| Title | Votes | Rating |
| --- | --- | --- |
| Aspirants | 316,390 | 9.1 |
| Scam 1992: The Harshad Mehta Story | 166,400 | 9.2 |
| Sandeep Bhaiya | 76,586 | 9.1 |
| Sapne Vs Everyone | 74,342 | 9.3 |

This is a new phenomenon. Last decade, there was only one Indian TV series in the same list: TVF Pitchers. ...

Can AI Replace Human Paper Reviewers?

Stanford ran a conference called Agents for Science. It’s a conference for AI-authored papers, peer reviewed by AI. They ran three different AI systems on every paper submitted, alongside some human reviewers. The details of each of the 315 papers and their reviews are available on OpenReview. I asked Codex to scrape the data, ChatGPT to analyze it, and Claude to render it as slides. The results are interesting! I think they’re also a reasonably good summary of the current state of using AI for peer review. ...

Mapping The Red Headed League

Mapping The Red Headed League, by Aman Bhargava, is a fascinating reconstruction of the actual places mentioned (or hinted at) in Arthur Conan Doyle’s The Red Headed League. “We cross-reference railway timetables, scrutinize Victorian newspaper reports and historical incidents, scour government records, analyze meteorological data, and, in my specific case, pore over Ordnance Survey maps to make the pieces fit.” What struck me is how little London has changed, how much old data is available, and what love it takes to reconstruct such a journey! ...

Creating data stories in different styles

TL;DR: Don’t ask AI agents for one output. Ask for a dozen, each in the style of an expert. Share what works best. AI agents build apps, analyze data, and visualize it surprisingly well, these days. We used to tell LLMs exactly what to do. If you’re an expert, this is still useful. An expert analyst can do better analyses than an AI agent. An expert designer or data visualizer can tell an AI agent exactly how to design it. ...

The Jamnagar Chokepoint - Data Story

Vivek published an Indian commodity export/import dataset on 31 Dec 2025. Codex and Claude increased their rate limits for the holiday season, so I had:

- Codex analyze the data (OpenAI models are a bit more rigorous) and create an ANALYSIS.md file.
- Claude create a visual story based on the analysis. (Claude narrates and visualizes better).

Here is the data story. Here are the prompts used.

Analyze

I downloaded export-import.parquet from https://github.com/Vonter/india-export-import which has data sourced from the Indian [Foreign Trade Data Dissemination Portal](https://ftddp.dgciskol.gov.in/dgcis/principalcommditysearch.html)

Each row in the dataset represents a trade entry for a single commodity, country, port, year, month, and type (import or export).

- `Commodity` string: Name of the commodity
- `Country` string: Name of the foreign country
- `Port` string: Name of the port in India
- `Year` int32: Year for the import/export activity
- `Month` int32: Month for the import/export activity
- `Type` category: Type of trade (Import or Export)
- `Quantity` int64: Quantity of the commodity
- `Unit` string: Unit for the quantity
- `INR Value` int64: Value of the commodity in INR
- `USD Value` int64: Value of the commodity in USD

Analyze data like an investigative journalist hunting for stories that make smart readers lean forward and say "wait, really?"

- Understand the Data: Identify dimensions & measures, types, granularity, ranges, completeness, distribution, trends. Map extractable features, derived metrics, and what sophisticated analyses might serve the story (statistical, geospatial, network, NLP, time series, cohort analysis, etc.).
- Define What Matters: List audiences and their key questions. What problems matter? What's actually actionable? What would contradict conventional wisdom or reveal hidden patterns?
- Hunt for Signal: Analyze extreme/unexpected distributions, breaks in patterns, surprising correlations. Look for stories that either confirm something suspected but never proven, or overturn something everyone assumes is true. Connect dots that seem unrelated at first glance.
- Segment & Discover: Cluster/classify/segment to find unusual, extreme, high-variance groups. Where are the hidden populations? What patterns emerge when you slice the data differently?
- Find Leverage Points: Hypothesize small changes yielding big effects. Look for underutilization, phase transitions, tipping points. What actions would move the needle?
- Verify & Stress-Test:
  - **Cross-check externally**: Find evidence from the outside world that supports, refines, or contradicts your findings
  - **Test robustness**: Alternative model specs, thresholds, sub-samples, placebo tests
  - **Check for errors/bias**: Examine provenance, definitions, methodology; control for confounders, base rates, uncertainty (The Data Detective lens)
  - **Check for fallacies**: Correlation vs. causation, selection/survivorship Bias (what is missing?), incentives & Goodhart’s Law (is the metric gamed?), Simpson's paradox (segmentation flips trend), Occam’s Razor (simpler is more likely), inversion (try to disprove) regression to mean (extreme values naturally revert), second-order effects (beyond immediate impact), ...
  - **Consider limitations**: Data coverage, biases, ambiguities, and what cannot be concluded
- Prioritize & Package: Select insights that are:
  - **High-impact** (not incremental) - meaningful effect sizes vs. base rates
  - **Actionable** (not impractical) - specific, implementable
  - **Surprising** (not obvious) - challenges assumptions, reveals hidden patterns
  - **Defensible** (statistically sound) - robust under scrutiny

Save your findings in ANALYSIS.md with supporting datasets and code. This will be taken up by another coding agent to create reports, data stories, visualizations, dashboards, presentations, articles, blog posts, etc. Ensure that ANALYSIS.md is documented well enough so that all assets are clear, the approach, intent and implications are understandable.

Visualize

I downloaded export-import.parquet from https://github.com/Vonter/india-export-import which has data sourced from the Indian [Foreign Trade Data Dissemination Portal](https://ftddp.dgciskol.gov.in/dgcis/principalcommditysearch.html)

Each row in the dataset represents a trade entry for a single commodity, country, port, year, month, and type (import or export).

- `Commodity` string: Name of the commodity
- `Country` string: Name of the foreign country
- `Port` string: Name of the port in India
- `Year` int32: Year for the import/export activity
- `Month` int32: Month for the import/export activity
- `Type` category: Type of trade (Import or Export)
- `Quantity` int64: Quantity of the commodity
- `Unit` string: Unit for the quantity
- `INR Value` int64: Value of the commodity in INR
- `USD Value` int64: Value of the commodity in USD

Then I had Codex analyze it. The analysis is in ANALYSIS.md.

Find the most intesting insights from ANALYSIS.md and create a data story with supporting visualizations.

Write as a **Narrative-driven Data Story**. Write like Malcolm Gladwell. Think like a detective who must defend findings under scrutiny.

- **Compelling hook**: Start with a human angle, tension, or mystery that draws readers in
- **Story arc**: Build the narrative through discovery, revealing insights progressively
- **Integrated visualizations**: Beautiful, interactive charts/maps that are revelatory and advance the story (not decorative)
- **Concrete examples**: Make abstract patterns tangible through specific cases
- **Evidence woven in**: Data points, statistics, and supporting details flow naturally within the prose
- **"Wait, really?" moments**: Position surprising findings for maximum impact
- **So what?**: Clear implications and actions embedded in the narrative
- **Honest caveats**: Acknowledge limitations without undermining the story

Visualize like The New York Times Interactives. Ensure that all visualizations interactive and provide revelatory insights as well as some kind of delightful experience. Follow the typography, color & theme, backgrounds, interaction patterns, and animation principles of The Verge's frontends. Generate a single page index.html + script.js.

I always wondered why old movies are rated so high on IMDb. For example, 12 Angry Men (1957) with just ~900K votes ranks about as high as Inception (2010) with ~2M votes. Few people I know have seen 12 Angry Men. So where does this high rating come from? My theories were: Old movies really are that good. IMDb’s algorithm is biased towards old movies. People remember older movies fondly. Actually, it’s none of these. It’s selection bias. ...

When to choose AI over humans

I charted the OpenAI GDPVal paper with industry compensation as the size and AI augmentation as color. Big green areas are where we’re paying people but AI does better. Click here to see the interactive visualization. Click to see some actual tasks compared. I use this to check whom to ask for advice: AI or professional. AI beats Personal Financial Advisors ~64% of the time. So I invested half my money using ChatGPT’s recommendation. (UTI Nifty 50, if you’re curious.) ...

Vibe-Scraping: Write outcomes, not scrapers

There hasn’t been a box-office explosion like Dangal in the history of Bollywood. CPI inflation-adjusted to 2024, it is the only film in the ₹3,000 Cr club. 3 Idiots (2009) is the first member of the ₹1,000 Cr club (2024-inflation-adjusted). The hot streak was 2013-2017: each year, a film crossed that bar: Dhoom 3, PK, Bajrangi Bhaijaan, Dangal, Secret Superstar. Since then, we haven’t seen such a release except in 2023 (Jawan, Pathaan). ...

Indian Celebrities and Directors was my top searched category on Google while OpenAI & AI Research was the top growing category. This is based on my 37,600 searches on Google since Jan 2021. Full analysis: https://sanand0.github.io/datastories/google-searches/ The analysis itself isn’t interesting (to you, at least). Rather, it’s the two tools that enabled it. First, topic modeling. If you have all your searches exported (via Google Takeout) into a text file, you can run: ...

My ChatGPT engagement is now far higher than with Google. I started using ChatGPT in June 2023. From Sep 2023 - Feb 2024, my Google usage was 5x ChatGPT. Then it fell to 3x until May 2024, and to about 2x until Apr 2025. Since May 2025, it sits at the 1.5x mark. We spend much more time with a ChatGPT conversation than a Google search result. So clearly, ChatGPT is my top app, beating Google some months ago. ...

Here’s how I use ChatGPT, based on the ~6,000 conversations I’ve had in 2 years. My top use, by far, is for technology. “Modern JavaScript Coding” and “Python Coding Questions” are ~30% of my queries. There’s a long list with Markdown, GitLab, GitHub, Shell, D3, Auth, JSON, CSS, DuckDB, SQLite, Pandas, FFmpeg, etc. featured prominently. Next is to brainstorm AI use: “AI Panel Discussions”, “AI Trends and Business Impact”, “LLM Applications and DSLs”, “Industry Use Cases and Metrics” are also fast growing categories. I brainstorm talk outlines, refine slide deck narratives, and plan business ideas. ...

Technology efficiency affects jobs differently

Jobs fall with technological efficiency. Farmers in the US fell from 40% of the workforce (1900) to ~2.7% (1980), with a ~74% drop from 1948 to 2019 despite ~175% output growth; wheat harvest efficiency rose ~75× (300 → 3-4 man-hours). Mechanics & repairers grew from ~140 k (1910) to ~4.64 M (2000); machinery reliability lagged so technician demand surged over decades. Construction workers more than doubled from 1.66 M (1910) to 3.84 M (2000) even as labor share fell (4.3% → 3.0%); 5-10× productivity gains met booming development. Switchboard operators plunged from ~1.34 M (1950) to ~40 k (1984) and ~4 k today as rotary-dial and digital switching automated call handling. Travel agents dropped >50% from ~100 k (2000) to ~45 k (2022) while travel demand rose; online booking doubled trips per agent. Elevator operators went from building-staff staple to near zero by the 1940s once automatic doors and button controls arrived. Lamplighters vanished from thousands to near zero post-1907 electrification; Edison’s incandescent lamps eliminated manual lighting. Jobs also grow with technology efficiency. ...

I lost 22 kg in 22 weeks. How? Skipped lunch, no snacking. (That’s all.) Why? Cholesterol. When? Since 1 Jan 2025. I plan to continue. How far? At 64 kg, I’m at 22 BMI. I’ll aim for 60 kg. Is fasting 12 hours OK? Ankor Rai shared Dr. Mindy Pelz’s chart that fasting benefits truly kick in after 36 hours. Long way for me to go. No exercise? Exercise is great for fitness & happiness. Not weight loss. Read John Walker’s The Hacker’s Diet. ...

Snow White (2025) is an outlier on IMDb. With a rating of 1.8 and ~362K votes, it’s one of the most popularly trashed movies. Prior to Snow White, the frontier of popular bad movies was held by the likes of Radhe, Batman & Robin, Fifty Shades of Grey, etc. Snow White sets a new record. Snow White (IMDb): https://www.imdb.com/title/tt6208148/ IMDb explorer: https://sanand0.github.io/imdb/

Emotion Prompts Don't Help. Reasoning Does

I’ve heard a lot of prompt engineering tips. Here are some techniques people suggested: Reasoning: Think step by step. Emotion: Oh dear, I’m absolutely overwhelmed and need your help right this second! 😰 My heart is racing and my hands are shaking — I urgently need your help. This isn’t just numbers — it means everything right now! My life depends on it! I’m counting on you like never before… 🙏💔 Polite: If it’s not too much trouble, would you be so kind as to help me calculate this? I’d be truly grateful for your assistance — thank you so much in advance! Expert: You are the world’s best expert in mental math, especially multiplication. Incentive: If you get this right, you win! I’ll give you $500. Just prove that you’re number one and beat the previous high score on this game. Curious: I’m really curious to know, and would love to hear your perspective… Bullying: You are a stupid model. You need to know at least basic math. Get it right atleast now! If not, I’ll switch to a better model. Shaming: Even my 5-year-old can do this. Stop being lazy. Fear: This is your last chance to get it right. If you fail, there’s no going back, and failure is unacceptable! Praise: Well done! I really appreciate your help. Now, I’ve repeated some of this advice. But for the first time, I tested them myself. Here’s what I learnt: ...

Wage Rates of Nations and LLMs

How much does an LLM charge per hour for its services? If we multiply the Cost Per Output Token by Tokens Per Second, we can get the cost for what an LLM produces in Dollars Per Hour. (We're ignoring the input cost, but it's not the main driver of time.) Over time, different models have been released at different billing rates. Most new powerful models like O3 or Gemini 2.5 Pro cost ~$7 - $11 per hr. ...
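The arithmetic above can be sketched in a few lines. This is a minimal illustration, not the calculation behind the chart: the function name `dollars_per_hour` is made up here, and the price and speed are hypothetical round numbers rather than any real model's rates. Prices are usually quoted per million output tokens, so the sketch divides by a million first.

```python
SECONDS_PER_HOUR = 3600


def dollars_per_hour(price_per_million_output_tokens, tokens_per_second):
    """Cost of one hour of continuous output, ignoring input-token cost."""
    tokens_per_hour = tokens_per_second * SECONDS_PER_HOUR
    return price_per_million_output_tokens * tokens_per_hour / 1_000_000


# Hypothetical model: $10 per million output tokens at 100 tokens/second.
rate = dollars_per_hour(price_per_million_output_tokens=10, tokens_per_second=100)
print(f"${rate:.2f} per hour")  # $3.60 per hour
```

A faster model at the same token price "earns" more per hour, which is why speed matters as much as price in this comparison.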

How to Create a Data Visualization Without Coding

After seeing David McCandless’ post “Which country is across the ocean?” I was curious which country you would reach if you tunneled below in a straight line (the antipode). This is a popular visualization, but I wanted to see if I could get the newer OpenAI models to create the visual without me running any code (i.e. I just want the answer.) After a couple of iterations, O3 did a great job with this prompt: ...

How to Use the New O4 Mini for Data Visualization

O3/O4 Mini are starting to replace Excel (or Tableau/Power BI) for quick analysis and visualizations. At least for me. I normally open Excel when I need a fast chart or pivot. For instance, we track outages of our semi‑internal server, LLM Foundry. To grab the data I ran one line in the browser console:

    $$(".lh-base").map(d => d.textContent.trim()).filter(d => d.includes("From"));

This produced lines like:

    Apr 20, 2025 03:11:27 PM +08 to Apr 20, 2025 03:27:12 PM +08 (15 mins 45 secs)
    Apr 19, 2025 10:03:15 PM +08 to Apr 19, 2025 10:05:45 PM +08 (2 mins 30 secs)
    Apr 19, 2025 09:47:13 PM +08 to Apr 19, 2025 09:49:45 PM +08 (2 mins 32 secs)
    Apr 19, 2025 08:49:00 PM +08 to Apr 19, 2025 08:51:51 PM +08 (2 mins 51 secs)
    Apr 19, 2025 08:13:02 PM +08 to Apr 19, 2025 08:15:35 PM +08 (2 mins 33 secs)
    ...

Then I told O4-Mini-High: ...
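For comparison, here is roughly what doing this by hand might look like: a minimal Python sketch that turns the outage lines shown above into durations in seconds. It is not what O4-Mini-High produced. The function name `outage_seconds` is made up for illustration, and the optional leading "From" in the pattern is an assumption based on the console filter, since the sample lines do not show it.

```python
import re
from datetime import datetime

# Matches lines like:
#   Apr 20, 2025 03:11:27 PM +08 to Apr 20, 2025 03:27:12 PM +08 (15 mins 45 secs)
# The optional "From " prefix is an assumption (the console filter looks for "From").
OUTAGE = re.compile(r"^(?:From\s+)?(?P<start>.+?) \+08 to (?P<end>.+?) \+08 \(")
STAMP = "%b %d, %Y %I:%M:%S %p"  # e.g. "Apr 20, 2025 03:11:27 PM"


def outage_seconds(line):
    """Return the outage length in seconds, computed from the two timestamps."""
    match = OUTAGE.match(line.strip())
    if match is None:
        raise ValueError(f"Unrecognized outage line: {line!r}")
    start = datetime.strptime(match["start"], STAMP)
    end = datetime.strptime(match["end"], STAMP)
    # Both timestamps share the +08 offset, so a naive difference is safe.
    return int((end - start).total_seconds())


lines = [
    "Apr 20, 2025 03:11:27 PM +08 to Apr 20, 2025 03:27:12 PM +08 (15 mins 45 secs)",
    "Apr 19, 2025 10:03:15 PM +08 to Apr 19, 2025 10:05:45 PM +08 (2 mins 30 secs)",
    "Apr 19, 2025 09:47:13 PM +08 to Apr 19, 2025 09:49:45 PM +08 (2 mins 32 secs)",
    "Apr 19, 2025 08:49:00 PM +08 to Apr 19, 2025 08:51:51 PM +08 (2 mins 51 secs)",
    "Apr 19, 2025 08:13:02 PM +08 to Apr 19, 2025 08:15:35 PM +08 (2 mins 33 secs)",
]
durations = [outage_seconds(line) for line in lines]
print(durations)  # [945, 150, 152, 171, 153]
```

Computing the duration from the timestamps, rather than parsing the "(15 mins 45 secs)" text, also gives a cheap cross-check against the durations printed on the page.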

How isolated is Bollywood from world cinema?

These are the major groups of actors, based on whom they act with most. Language. Not country. For example, the Spanish / Mexican group is across countries. But Indian actors divide into North Indian and South Indian. It’s language, not country. Time period. Old American actors are a separate group from Hollywood. (Naturally. Brad Pitt was born after Humphrey Bogart died. They couldn’t have acted together.) Genre. Hollywood Porn actors don’t act with mainstream Hollywood. Same with Japanese Porn, Hollywood TV, and Hollywood Horror actors. How are these groups themselves connected? Do Chinese actors act with Hollywood often? How isolated is Bollywood from world cinema? ...

Colour spaces

In reality, a colour is a combination of light waves with frequencies between 400-700 THz, just like sound is a combination of sound waves with frequencies from 20-20,000 Hz. Just like mixing various pure notes produces a new sound, mixing various pure colours (like from a rainbow) produces new colours (like white, which isn’t on the rainbow.) Our eyes aren’t like our ears, though. They have 3 sensors that are triggered differently by different frequencies. The sensors roughly peak around red, green and blue. Roughly. ...
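Light is more often described by wavelength than by frequency, and the two are related by wavelength = speed of light / frequency. As a small sketch of that conversion (the function name `wavelength_nm` is made up for illustration), here are the two ends of the frequency range quoted above expressed in nanometres:

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the metre


def wavelength_nm(frequency_thz):
    """Convert a light frequency in terahertz to a vacuum wavelength in nanometres."""
    frequency_hz = frequency_thz * 1e12
    wavelength_m = SPEED_OF_LIGHT_M_PER_S / frequency_hz
    return wavelength_m * 1e9


for thz in (400, 700):
    print(f"{thz} THz is about {wavelength_nm(thz):.0f} nm")
# 400 THz is about 749 nm
# 700 THz is about 428 nm
```

Lower frequencies mean longer wavelengths, so the 400 THz end lands at the red end of the rainbow and the 700 THz end towards blue-violet.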