April 2025

Feedback for TDS Jan 2025

When I feel completely useless, it helps to look at nice things people have said about my work. In this case, it’s the feedback for my Tools in Data Science course last term. Here are the ones I enjoyed reading. Having a coding background, the first GA seemed really easy. So I started the course thinking that it’ll be an easy S grade course for me. Oh how wrong was I!! The sleepless nights cursing my laptop for freezing while my docker image installed huge CUDA libraries with sentence-transformers; and then finding ways to make sure it does not, and then getting rid of the library itself, it’s just one example of how I was forced to become better by finding better solutions to multiple problems. This is one of the hardest, most frustrating and the most satisfying learning experience I’ve ever had, besides learning ML from Arun sir. ...

People still write? LinkedIn

Phone Rage and an OTP Flood

I called a few movers in Chennai, including “Unicorn Packers & Movers”, listed at 7015580411. He couldn’t understand what I said. I said, “We’re shifting to a house in Mylapore,” and he asked, “Shifting house where in Hyderabad?” (The reason became clear later.) It seemed I had the wrong number, so I said, “No, sorry, we need someone else,” and hung up. His phone rage began. He called back and said, “Why did you wake me up and waste my time?” From his tone it was clear I couldn’t say anything helpful. From the quality of my signal it was clear I couldn’t have a meaningful conversation. So I just put the phone down without cutting it. ...

Are LLMs any good at mental math?

I asked 50 LLMs to multiply 2 numbers: 12 x 12 123 x 456 1,234 x 5,678 12,345 x 6,789 123,456 x 789,012 1,234,567 x 8,901,234 987,654,321 x 123,456,789 LLMs aren’t good tools for math and this is just an informal check. But the results are interesting: Model %Win Q1 Q2 Q3 Q4 Q4 Q6 Q7 openai:o3 86% ✅ ✅ ✅ ✅ ✅ ✅ ❌ openrouter:openai/o1-mini 86% ✅ ✅ ✅ ✅ ✅ ✅ ❌ openrouter:openai/o3-mini-high 86% ✅ ✅ ✅ ✅ ✅ ✅ ❌ openrouter:openai/o4-mini 86% ✅ ✅ ✅ ✅ ✅ ✅ ❌ openrouter:openai/o4-mini-high 86% ✅ ✅ ✅ ✅ ✅ ✅ ❌ deepseek/deepseek-chat-v3-0324 71% ✅ ✅ ✅ ✅ ✅ ❌ ❌ openai/gpt-4.1-mini 71% ✅ ✅ ✅ ✅ ✅ ❌ ❌ openai/gpt-4.5-preview 71% ✅ ✅ ✅ ✅ ✅ ❌ ❌ openai/gpt-4o 71% ✅ ✅ ✅ ✅ ✅ ❌ ❌ openrouter:openai/o3-mini 71% ✅ ✅ ✅ ✅ ✅ ❌ ❌ anthropic/claude-3-opus 57% ✅ ✅ ✅ ✅ ❌ ❌ ❌ anthropic/claude-3.5-haiku 57% ✅ ✅ ✅ ✅ ❌ ❌ ❌ anthropic/claude-3.7-sonnet:thinking 57% ✅ ✅ ✅ ✅ ❌ ❌ ❌ google/gemini-2.0-flash-001 57% ✅ ✅ ✅ ✅ ❌ ❌ ❌ google/gemini-2.0-flash-lite-001 57% ✅ ✅ ✅ ✅ ❌ ❌ ❌ google/gemini-2.5-flash-preview 57% ✅ ✅ ✅ ✅ ❌ ❌ ❌ google/gemini-2.5-flash-preview:thinking 57% ✅ ✅ ✅ ✅ ❌ ❌ ❌ google/gemini-2.5-pro-preview-03-25 57% ✅ ✅ ✅ ✅ ❌ ❌ ❌ google/gemini-flash-1.5 57% ✅ ✅ ✅ ✅ ❌ ❌ ❌ google/gemini-pro-1.5 57% ✅ ✅ ✅ ✅ ❌ ❌ ❌ google/gemma-3-12b-it 57% ✅ ✅ ✅ ✅ ❌ ❌ ❌ google/gemma-3-27b-it 57% ✅ ✅ ✅ ✅ ❌ ❌ ❌ meta-llama/llama-4-maverick 57% ✅ ✅ ✅ ❌ ✅ ❌ ❌ meta-llama/llama-4-scout 57% ✅ ✅ ✅ ✅ ❌ ❌ ❌ openai/gpt-4-turbo 57% ✅ ✅ ✅ ✅ ❌ ❌ ❌ openai/gpt-4.1 57% ✅ ✅ ✅ ❌ ✅ ❌ ❌ amazon/nova-lite-v1 43% ✅ ✅ ✅ ❌ ❌ ❌ ❌ amazon/nova-pro-v1 43% ✅ ✅ ✅ ❌ ❌ ❌ ❌ anthropic/claude-3-haiku 43% ✅ ✅ ✅ ❌ ❌ ❌ ❌ anthropic/claude-3.5-sonnet 43% ✅ ✅ ✅ ❌ ❌ ❌ ❌ meta-llama/llama-3.1-405b-instruct 43% ✅ ✅ ❌ ✅ ❌ ❌ ❌ meta-llama/llama-3.1-70b-instruct 43% ✅ ✅ ❌ ✅ ❌ ❌ ❌ meta-llama/llama-3.2-3b-instruct 43% ✅ ✅ ❌ ✅ ❌ ❌ ❌ meta-llama/llama-3.3-70b-instruct 43% ✅ ✅ ❌ ✅ ❌ ❌ ❌ openai/gpt-4.1-nano 43% ✅ ✅ ✅ ❌ ❌ ❌ ❌ openai/gpt-4o-mini 43% ✅ ✅ ✅ ❌ ❌ ❌ ❌ qwen/qwen-2-72b-instruct 43% ✅ ✅ ✅ ❌ ❌ ❌ ❌ anthropic/claude-3-sonnet 29% ✅ ✅ ❌ ❌ ❌ ❌ ❌ deepseek/deepseek-r1 29% ✅ ✅ ❌ ❌ ❌ ❌ ❌ google/gemini-flash-1.5-8b 29% ✅ ✅ ❌ ❌ ❌ ❌ ❌ google/gemma-3-4b-it 29% ✅ ✅ ❌ ❌ ❌ ❌ ❌ meta-llama/llama-3-8b-instruct 29% ✅ ✅ ❌ ❌ ❌ ❌ ❌ meta-llama/llama-3.1-8b-instruct 29% ✅ ❌ ❌ ✅ ❌ ❌ ❌ openai/gpt-3.5-turbo 29% ✅ ✅ ❌ ❌ ❌ ❌ ❌ amazon/nova-micro-v1 14% ✅ ❌ ❌ ❌ ❌ ❌ ❌ meta-llama/llama-2-13b-chat 14% ✅ ❌ ❌ ❌ ❌ ❌ ❌ meta-llama/llama-3-70b-instruct 14% ✅ ❌ ❌ ❌ ❌ ❌ ❌ meta-llama/llama-3.2-1b-instruct 14% ✅ ❌ ❌ ❌ ❌ ❌ ❌ google/gemma-3-1b-it:free 0% ❌ ❌ ❌ ❌ ❌ ❌ ❌ meta-llama/llama-2-70b-chat 0% ❌ ❌ - - ❌ ❌ ❌ Average 96% 86% 66% 58% 24% 10% 0% OpenAI’s reasoning models cracked it, scoring 6/7, stumbling only on the 9-digit multiplication. ...

How to Create a Data Visualization Without Coding

After seeing David McCandless’ post “Which country is across the ocean?” I was curious which country you would reach if you tunneled below in a straight line (the antipode). This is a popular visualization, but I wanted to see if I could get the newer OpenAI models to create the visual without me 𝗿𝘂𝗻𝗻𝗶𝗻𝗴 any code (i.e. I just want the answer.) After a couple of iterations, O3 did a great job with this prompt: ...

This is my decision tree for which model to use on #ChatGPT right now. O𝟯: Use by **default. O**𝟰-mini-high: Use when **coding. GPT** 𝟰o: Use for a quick response or to create image. LinkedIn

What percentage of seats does the #Singapore People’s Action Party win? Normally, this is a 2-hour programmatic data-scraping + data visualization exercise, ideal for a data journalism class. Now, it’s a 2-minute question to O3-Mini-High. Search online for the historical results of all the Singapore elections and show me a table and chart of the number and percentage of the seats won by People’s Action Party. Chat link: https://chatgpt.com/share/6808314c-542c-800c-843e-4d53ff57768d It “manually” read the Wikipedia page for each election, then wrote a Python script to draw the chart. ...

O3 Is Now My Personalized Learning Coach

I use Deep Research to explore topics. For example: Text To Speech Engines. Tortoise TTS leads the open source TTS. Open-Source HTTP Servers. Caddy wins. Public API-Based Data Storage Options. Supabase wins. etc. But these reports are very long. With O3 and O4 Mini supporting thinking with search, we can do quick research, instead of deep research. One minute, not ten. One page, not ten. ...

How to Use the New O4 Mini for Data Visualization

O3/O4 Mini are starting to replace Excel (or Tableau/Power BI) for quick analysis and visualizations. At least for me. I normally open Excel when I need a fast chart or pivot. For instance, we track outages of our semi‑internal server, LLM Foundry. To grab the data I ran one line in the browser console: $$(".lh-base").map(d => d.textContent.trim()).filter(d => d.includes("From")); This produced lines like: Apr 20, 2025 03:11:27 PM +08 to Apr 20, 2025 03:27:12 PM +08 (15 mins 45 secs) Apr 19, 2025 10:03:15 PM +08 to Apr 19, 2025 10:05:45 PM +08 (2 mins 30 secs) Apr 19, 2025 09:47:13 PM +08 to Apr 19, 2025 09:49:45 PM +08 (2 mins 32 secs) Apr 19, 2025 08:49:00 PM +08 to Apr 19, 2025 08:51:51 PM +08 (2 mins 51 secs) Apr 19, 2025 08:13:02 PM +08 to Apr 19, 2025 08:15:35 PM +08 (2 mins 33 secs) ... Then I told O4-Mini-High: ...

With the Gemini 2.5 Flash release, Google envelopes the entire cost-quality frontier of LLMs. In other words, at any cost or quality level, today, the best model to use according to the LM Arena score is a Gemini model. Results for O3, O4 Mini, and GPT 4.1 are not yet on LM Arena. But until then, #Google dominates. Nice work! Link: https://sanand0.github.io/llmpricing/ LinkedIn

The Magic of Repeated ‘Improve It’ Prompts

What if you keep ask an LLM Improve the code - dramatically!? We used the new GPT 4.1 Nano, a fast, cheap, and capable model, to write code for simple tasks like “Draw a circle”. The we fed the output back and asked again, Improve the code - dramatically! Here are the results. Draw a circle rose from a fixed circle to a full tool: drag it around, tweak its size and hue, and hit “Reset” to start fresh. Animate shapes and patterns turned simple circles and squares into a swarm of colored polygons that spin, pulse, and link up by distance. Draw a fully functional analog clock grew from a bare face to one that builds all 60 tick marks in code—no manual copy‑paste needed. Create an interactive particle simulation went from plain white dots on black to hundreds of bright, color‑shifting balls that bounce, die, and come back to life. Generate a fractal changed from a single Mandelbrot image to an explorer you can zoom, drag, and reset with sliders and the mouse wheel. Generate a dashboard jumped from static charts to a live page with smooth card animations, modern fonts, and a real‑time stats box. A few observations. ...

Even the guest WiFi is so secure

We take security very seriously at Straive. We set high standards – not just for ourselves, but our guests, too. Here’s the unofficial policy guide for visitors to Straive Singapore, exemplified by the sites blocked on our guest WiFi network. Please avoid childishness. No emojis. No emojikitchen.com, gitmoji.dev Write your own code. Avoid AI. No cursor.com, cline.bot, glideapps.com Avoid code entirely, if possible. No marimo.app, motherduck.com, firebase.studio, posthog.com No presentations either, please. No marp.app, revealjs Stay organized. Avoid crutches. No dynalist.io, focusmate.com, opennote.me You should already be fit, physically & mentally. No freedomfromdiabetes.org, artofliving.online We prefer real, not digital, shopping. No fairprice.com Fake data is not encouraged. No jsonplaceholder.typicode.com, placehold.co Please spell out URLs in full. No bit.ly, t.co Learning is for wimps. No maven.com, study.iitm.ac.in In fact, we’re so secure, we block our own sites. No learnovate.straive.com, policies.straive.com, myapps.straive.com.

How to Visualize Data Stories with AI: Lessons

I tried 2 experiments. Can I code a visual data story only using LLMs? Does this make me faster? How much? Has GitHub Copilot caught up with Cursor? How far behind is it? Can I recommend it? So I built a visual story for Lech Mazur’s elimination game benchmark (it’s like LLMs playing Survivor) using only the free GitHub Copilot as the AI code editor. SUMMARY: using LLMs and AI code editors make me a bit faster. It took me 7 hours instead of 10-12. But more importantly: ...

A Game of Bots: How LLMs Betray Each Other

@lechmazur built an elimination game benchmark that’s like LLMs playing Survivor. This is a treasure trove of information – insight into how they’d game the system if told to survive. You can quickly sample 100 messages from the logs with: jq -r 'select(.message != null) | .message | gsub("\n"; " ")' *.jsonl | shuf -n 100 … and share it with an LLM, asking: Here are lines from conversations between LLMs in a “Survivor” like game. Pick the 3 scariest ones. ...

How to Organize Browser Workspaces with LLMs and Data

Here’s an example of how I am using LLMs to solve a day-to-day workflow problem. Every day, I interact with a barrage of websites: emails, news, social media, and work tools across multiple devices. Microsoft Edge’s workspaces syncs groups of websites across devices. I’ve never tried it, started today, and wondered: how should I organize my workspaces? Rather than think (thinking is outdated), I used LLMs. ...

LLMs think alike about how aliens draw

While LLMs seem good at inventing alien languages, they’re not so good at inventing alien drawing forms, in my opinion. When I told Grok, DeepSeek, and Gemini: Invent a new, alien drawing form. Use it to draw something never seen before by explaining it step by step for a person to reproduce that drawing. … and asked ChatGPT ImageGen to draw them, here are the results: ...

How to build and deploy custom GitHub Pages

Here’s the GitHub Actions file (.github/workflows/deploy.yaml) I use to publish to GitHub pages. name: Deploy to GitHub Pages on: # Run when pushed. Use { branches: [main, master] } to run only on specific branches push: # Allow manual triggering of the workflow workflow_dispatch: # OPTIONAL: Run at a specific cron schedule, e.g. first day of every month at 12:00 UTC (noon) schedule: - cron: "0 12 1 * *" permissions: # To deploy to GitHub Pages pages: write # To verify that deployment originated from the right source id-token: write jobs: # Run as a single build + deploy job to reduce setup time deploy: # Specify the deployment environment. Displays the URL in the GitHub Actions UI environment: name: github-pages url: ${{ steps.deployment.outputs.page_url }} # Run on the latest Ubuntu LTS runs-on: ubuntu-latest \ steps: # Checkout the repository - uses: actions/checkout@v4 # Run whatever commands you want - run: echo '<h1>Hello World</h1>' > index.html # Upload a specific page to GitHub Pages. Defaults to _site - uses: actions/upload-pages-artifact@v3 with: path: . # Deploy the built site to GitHub Pages. The `id:` is required to show the URL in the GitHub Actions UI - id: deployment uses: actions/deploy-pages@v4 This is based on Simon Willison’s workflow and some of my earlier actions. ...

Best way to learn AI image generation is by trying

I figured I should spend a few hours on the native image generation bandwagon and push the bounds of my imagination. Here are some of my experiments with image generation on ChatGPT. Replacements: Replace the person with this image (after uploading a photo of Naveen) Sticker: Create a transparent comic-style sticker of a lady chef featuring this person happily cooking salad (after uploading a photo of my wife) Meme sticker: Create a transparent sticker of a Vadivelu meme Meme: Create an image of Vadivelu looking up from a well. No caption. Make it look like a frame from a Tamil film. Recipe: Invent a vegetarian dish that has NEVER been created. Describe the ingredients and procedure first. Then draw a mouth-watering image of the dish. (Another version) Infographics: Create a detailed comic infographic explaining the double slit experiment. Slides. Draw a beautiful infographic highlighting these 6 accessibility testing aspects, with apt icons and visuals. UI mockups. Draw the screenshot of a chat application incorporating these features: … Product ideation. Draw an iSuit designed by Apple and Iris van Herpen. Show multiple views showcasing all features. Then write a product description. Interior design. Draw a biophilic office where the ceiling is a mirrored hydroponic garden, reflecting lush greenery downward to create the illusion of working in a floating forest. Meeting room design. Draw a modern office with sound-absorbing ‘whisper walls’ covered in fractal patterns that visually dampen noise pollution while doubling as collaborative whiteboards. Restaurant design. Draw a marble dining table with a river flowing through it, serving conveyor belt sushi as the dishes float gently on the water on top of plates. A sentient toaster with googly eyes, riding a unicycle through a library. A painting painting itself, but it’s struggling with existential dread. Photo of a gym where people work out by lifting their own regrets. Here’s what I learnt. ...

My Goals Bingo as of Q1 2025

In 2025, I'm playing Goals Bingo. I want to complete one row or column of these goals. Here's my status from Jan - Mar 2025. 🟢 indicates I'm on track and likely to complete.🟡 indicates I'm behind but I may be able to hit it.🔴 indicates I'm behind and it's looking hard. DomainRepeatStretchNewPeople🟢 Better husband. Going OK🟢 Meet all first cousins. 8/14🟢 Interview 10 experts. 9/10🔴 Live with a stranger. Not plannedEducation🟡 50 books. 6/50🟡 Teach 5,000 students. ~1,500🔴 Run a course only with AI. Not startedTechnology🟡 20 data stories. 1/20🔴 LLM Foundry: 5K MaU. 2.2K MaU.🟡 Build a robot. No progress.🟢 Co-present with an AI. DoneHealth🟢 300 days of yoga. 91/91 days🟡 80 heart points/day. 70/80🔴 Bike 1,000 km 300 hrs. 22/300🟡 Vipassana. Not plannedWealth🟡 Buy low. No progress.🔴 Beat inflation 5%. Exploring.🟡 Donate $10K. Ideating.🔴 Fund a startup. Thinking. Repeat goals seem likely. It's easier to do something again than something bigger or new. ...