Author name: S Anand

Feedback for TDS Jan 2025

When I feel completely useless, it helps to look at nice things people have said about my work.

In this case, it’s the feedback for my Tools in Data Science course last term. Here are the ones I enjoyed reading.


Having a coding background, the first GA seemed really easy. So I started the course thinking that it’ll be an easy S grade course for me. Oh how wrong was I!!

The sleepless nights cursing my laptop for freezing while my Docker image installed huge CUDA libraries with sentence-transformers, then finding ways to make sure it didn’t, and then getting rid of the library itself: that’s just one example of how I was forced to become better by finding better solutions to multiple problems. This is one of the hardest, most frustrating, and most satisfying learning experiences I’ve ever had, besides learning ML from Arun sir.

@s.anand sir did a tremendous job of designing this course. I feel this course ties every other course together by applying learnings from each course and building further upon them. I learnt development in Python and APIs in MAD-1, JS in MAD-2, bash scripting in SC, SQL in DBMS, data structures in PDSA, and TDS had applications of each. In the MAD projects, we made an app and submitted it as a zip file. Here we went on to learn to containerise our app, deploy it, and automate the process. Our apps combined Python and shell scripting along with the power of LLMs. Yeah, the projects did seem extremely overwhelming, but the learning is invaluable.

@Jivraj @Saransh_Saini @Carlton I’m sure it wasn’t an easy task being the TA of this course. Thanks for all the help throughout the term with the mock ROEs, project sessions, and clearing doubts on Discourse and in live sessions.

I still got an S grade but it was a very hard-fought S with the help of some luck. I’d love to follow the changes this course goes through in the upcoming terms.


This course was the most exciting and memorable of all the courses I have taken so far (actually, of all the courses at the diploma level; I have completed all of them 🙂 by now). I had the technical background, and as @22f3000819 mentioned, I thought it was going to be an easy ride, but I was wrong. It was challenging as well as fun at the same time.

The best part of the course is that it is open internet and we are free to explore, use LLMs, and get things done. This way of learning is, to me, the most effective. I have the guide, and now I can devise how I should approach things.

Everything (the ROE, Project 1, Project 2) was quite challenging and fun at the same time. Projects were the best part of the course. I learned how things can go wrong when we deploy. I collaborated with @ItsMeAlex, @trebhuvansb, @23f1002382 and @22f3002933 for Project 2.
My Project 2 repo even got 17 forks and 4 stars and scored 20/20 on evaluation. This adventure ended with the relaxing end-term paper and an overall “S” grade.

Also, I must say the support team of TDS is the best so far in the program. I had conversations with @Jivraj, @Saransh_Saini and @Carlton sir, and they are super helpful. I will really miss these sessions. I attended the live sessions by @s.anand sir, and sir’s takes on the questions were really insightful and even made me think differently.

When we do something different, it is bound to have challenges, both operational and technical, but the TDS support team gave it their best. Many a time I too raised concerns, and the team resolved them to the best of their capabilities.

Lastly, I would say without any hesitation that the last 3-4 months have been the best part of the program so far. Hoping to meet the team and my fellow peers in person during Paradox 🙂.


My TDS Journey
“A Tale of Bugs, Chaos & Miraculous Comebacks” 😅📊🔥

So, I did something bold (or reckless, depends on how you see it)—took the TDS (Tools in Data Science) course right in my first term. Why? Because curiosity got the better of me, and I thought, “Eh, GA1 went fine, how hard could it be?” (Spoiler: Very hard.) Not a decision I recommend unless you’re into academic masochism. 😅

The first few modules hit me like three courses merged into one 📚. Maybe because I was totally unprepared, maybe because TDS has a secret pact with chaos.
I did all 7 GAs. It was very time-consuming but fun too; I learnt new tools with every question. But hey, the live lectures were fantastic 🎤, the modules were well-structured 🏗️, and thank goodness for the lifesaving Discourse forum and group, where the real MVPs reside 🏆.

Then came Project 1. Ah yes, my nemesis. I’d write code, it would run perfectly, then—bam! A random error, just to keep things from feeling doable ⚠️. Was it 404? 500? Who knows! I basically developed muscle memory for every error code in existence 💻🤯. The grand result? 2/20. At this point, my grade looked like it had fallen off a cliff 🏔️, and I wasn’t sure if I could salvage a B—or even a C, honestly… with -18 in my balance due to P1 and a trauma before P2, I couldn’t have expected more… no?

Enter the TA sessions and the peer-powered survival squad 🛡️—my saving grace. The ROE, the supposedly impossible exam where I thought even a 50% would be a miracle? Got 100%. Same for Project 2—100%. Even after initially refusing to believe it, reality set in, and my end-term score also landed around 90 (thanks to the PYQs and those 300 GitHub questions… definitely recommending that next-term students solve and understand all the PYQs). Then, wrapping the final score with some gift wrappers of a few bonus marks… this deal got sealed with a confident A…

🥁…drumrolls…🥁

An 🎉A. 🎉

This course was brutal, but in the best way possible 💡. It threw me into battles with LLMs, never-ending questions, and existential crises over debugging 🔍. But now, I have actual skills (and scars) that I’ll carry into my future projects in MAD1, MAD2, MLP, and BDM, which will be a whole new experience 🚀.

Looking back, the syllabus was a wild mix of survival skills and superpowers—building models, deploying them, wrestling with LLMs, and somehow convincing messy data to behave. And just when we thought we had conquered it all, Project 1 pulled us into the abyss. Now, with Data Analysis and Visualization still in the works, the adventure wasn’t over. Because what’s data science without some surprises and flashy charts to flex? 😆📊

For future students? Take it later in the diploma, enjoy the mad ride, and remember—even if you bomb a project, there can always be a comeback. 🏆 TDS is tough, but resilience matters more than perfection. Even if you get 10/100 (which I got) on a project, learn from it, adapt, and come back stronger.

Massive thanks to @s.anand sir for making this learning experience possible, and a special shoutout to @Carlton sir, @Saransh_Saini sir, and @Jivraj for patiently resolving our chaotic issues 🙏.
Your guidance made a challenging journey both rewarding and insightful. Grateful for everything!
TDS—difficult, enlightening, and totally worth it. 🤓✨

(This gracious furry benefactor granted us a deadline extension—P1 survivors can relate #thanks_to_carlton_sir, we know how crucial that was. So, signing off with a furry and really grateful thank you! 🐱 ⌛ 🎓)


Phone Rage and an OTP Flood

I called a few movers in Chennai, including “Unicorn Packers & Movers”, listed at 7015580411.

He couldn’t understand what I said. I said, “We’re shifting to a house in Mylapore,” and he asked, “Shifting house where in Hyderabad?” (The reason became clear later.)

It seemed I had the wrong number, so I said, “No, sorry, we need someone else,” and hung up.

He called back and said, “Why did you wake me up and waste my time?” From his tone it was clear nothing I said would help. From the quality of my signal it was clear I couldn’t have a meaningful conversation. So I just put the phone down without cutting the call.

That’s when the phone rage began. He called me five times from 9467186002 and twice from 7015580411. Each time, I answered and left the phone on the table.

Then the SMS flood hit. Between 2:47 pm and 2:55 pm, he sent me 40 OTPs:

  • 6 from KukuFM: An Indian audio app for podcasts, audiobooks, and short lessons.
  • 5 from Hyundai: A global car maker that sells vehicles in India.
  • 4 from Zepto: A rapid-delivery startup for groceries and essentials.
  • 4 from ZestMoney: A finance app that lets you buy now and pay in EMI, no credit card needed.
  • 3 from Samsung Shop: Samsung’s online store for browsing and buying its electronics.
  • 3 from Confirmtkt: An online service for booking train tickets via IRCTC.
  • 3 from My11Circle: A fantasy-cricket game where you build virtual teams and win cash prizes.
  • 3 from Aakash Institute: A coaching center for medical and engineering entrance exams.
  • 2 from JioCinema: Jio’s OTT service with movies, shows, and originals.
  • 2 from NetMeds: An online pharmacy for medicines and health products.
  • 2 from AJIO: A fashion and lifestyle e-commerce site by Reliance Retail.
  • 2 from Jar: A savings app that rounds up your spend and saves the extra.
  • 1 from RummyCircle: An online rummy card game.

After a break, he tried again at 3:17 pm with less energy:

  • 1 from ZestMoney
  • 1 from Samsung Shop

To me, it was instructive. I learnt about KukuFM, Jar, Ajio, My11Circle and many others. I also now know which services make it easy to annoy people with OTPs. If you ever want to prank-spam OTPs, here’s your hit list!

My apologies to the man I awoke — hope he gets some sleep. And thanks for teaching me something new!

PS: The featured image was generated using the prompt: Edit this image so that this photo of me looks at a smartphone as a surreal storm of OTP notifications rains down around me.

Here’s ChatGPT’s version of that. Yes, that’s vaguely like me.


Are LLMs any good at mental math?

I asked 50 LLMs to multiply 2 numbers:

  1. 12 x 12
  2. 123 x 456
  3. 1,234 x 5,678
  4. 12,345 x 6,789
  5. 123,456 x 789,012
  6. 1,234,567 x 8,901,234
  7. 987,654,321 x 123,456,789

LLMs aren’t good tools for math and this is just an informal check. But the results are interesting:
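For reference, here’s a minimal sketch (run as a Node 18+ ES module) of how such a check might work. The OpenRouter endpoint is real, but the prompt, scoring rule, and env var are illustrative, not my exact harness:

const QUESTIONS = [
  [12n, 12n], [123n, 456n], [1234n, 5678n], [12345n, 6789n],
  [123456n, 789012n], [1234567n, 8901234n], [987654321n, 123456789n],
];

// Ask the model to multiply, then check whether the exact product
// (computed with BigInt, so no precision loss) appears in the reply.
async function score(model, a, b) {
  const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
    },
    body: JSON.stringify({ model, messages: [{ role: "user", content: `Calculate ${a} x ${b}` }] }),
  });
  const reply = (await res.json()).choices[0].message.content;
  return reply.replaceAll(",", "").includes(String(a * b));
}

for (const [a, b] of QUESTIONS) console.log(`${a} x ${b}:`, await score("openai/gpt-4o", a, b));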

| Model | %Win |
| --- | --- |
| openai:o3 | 86% |
| openrouter:openai/o1-mini | 86% |
| openrouter:openai/o3-mini-high | 86% |
| openrouter:openai/o4-mini | 86% |
| openrouter:openai/o4-mini-high | 86% |
| deepseek/deepseek-chat-v3-0324 | 71% |
| openai/gpt-4.1-mini | 71% |
| openai/gpt-4.5-preview | 71% |
| openai/gpt-4o | 71% |
| openrouter:openai/o3-mini | 71% |
| anthropic/claude-3-opus | 57% |
| anthropic/claude-3.5-haiku | 57% |
| anthropic/claude-3.7-sonnet:thinking | 57% |
| google/gemini-2.0-flash-001 | 57% |
| google/gemini-2.0-flash-lite-001 | 57% |
| google/gemini-2.5-flash-preview | 57% |
| google/gemini-2.5-flash-preview:thinking | 57% |
| google/gemini-2.5-pro-preview-03-25 | 57% |
| google/gemini-flash-1.5 | 57% |
| google/gemini-pro-1.5 | 57% |
| google/gemma-3-12b-it | 57% |
| google/gemma-3-27b-it | 57% |
| meta-llama/llama-4-maverick | 57% |
| meta-llama/llama-4-scout | 57% |
| openai/gpt-4-turbo | 57% |
| openai/gpt-4.1 | 57% |
| amazon/nova-lite-v1 | 43% |
| amazon/nova-pro-v1 | 43% |
| anthropic/claude-3-haiku | 43% |
| anthropic/claude-3.5-sonnet | 43% |
| meta-llama/llama-3.1-405b-instruct | 43% |
| meta-llama/llama-3.1-70b-instruct | 43% |
| meta-llama/llama-3.2-3b-instruct | 43% |
| meta-llama/llama-3.3-70b-instruct | 43% |
| openai/gpt-4.1-nano | 43% |
| openai/gpt-4o-mini | 43% |
| qwen/qwen-2-72b-instruct | 43% |
| anthropic/claude-3-sonnet | 29% |
| deepseek/deepseek-r1 | 29% |
| google/gemini-flash-1.5-8b | 29% |
| google/gemma-3-4b-it | 29% |
| meta-llama/llama-3-8b-instruct | 29% |
| meta-llama/llama-3.1-8b-instruct | 29% |
| openai/gpt-3.5-turbo | 29% |
| amazon/nova-micro-v1 | 14% |
| meta-llama/llama-2-13b-chat | 14% |
| meta-llama/llama-3-70b-instruct | 14% |
| meta-llama/llama-3.2-1b-instruct | 14% |
| google/gemma-3-1b-it:free | 0% |
| meta-llama/llama-2-70b-chat | 0% |

Average score per question: Q1 96%, Q2 86%, Q3 66%, Q4 58%, Q5 24%, Q6 10%, Q7 0%. (The per-model, per-question cells are in the interactive table linked below.)

OpenAI’s reasoning models cracked it, scoring 6/7, stumbling only on the 9-digit multiplication.

Models use human-like mental math tricks. For example, O3-Mini-High calculated 1234567 × 8901234 using a recursive strategy.

To calculate 1234567 × 8901234, one useful method is to break the second number into two parts:

  8901234 = 8900000 + 1234

Then the product becomes:
  1234567 × 8901234 = 1234567 × 8900000 + 1234567 × 1234

Step 1. Compute 1234567 × 8900000:
  Notice that 8900000 = 89 × 100000. So first compute 1234567 × 89.
  Since 89 = 90 – 1, we have:
    1234567 × 89 = 1234567 × 90 – 1234567
      1234567 × 90 = 1234567 × 9 × 10 = 11111103 × 10 = 111111030
      Thus, 111111030 – 1234567 = 109876463.
  Now multiply by 100000:
    109876463 × 100000 = 10987646300000

Step 2. Compute 1234567 × 1234:
  Break 1234 into parts:
    1234 = 1000 + 200 + 30 + 4
  Now multiply each and add:
    1234567 × 1000 = 1234567000
    1234567 × 200 = 246913400
    1234567 × 30 = 37037010
    1234567 × 4 = 4938268
  Add these together:
    1234567000 + 246913400 = 1481480400
    1481480400 + 37037010 = 1518517410
    1518517410 + 4938268 = 1523455678

Step 3. Add both products:
  10987646300000 + 1523455678 = 10989169755678

Thus, the product of 1234567 and 8901234 is

10989169755678
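That answer checks out. A one-liner with JavaScript’s arbitrary-precision BigInts confirms it:

console.log(1234567n * 8901234n); // 10989169755678n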

OpenAI’s other models and DeepSeek V3 were next, getting the first 5/7 right. Notably: GPT 4.1 Mini beat GPT 4.1. DeepSeek V3 beat DeepSeek R1.

DeepSeek V3 double-checks results and hallucinates a “reliable computational tool”.

To calculate \(1234567 \times 8901234\), let's break it down using the **distributive property of multiplication over addition**:

1. **Express 8901234 as \(9000000 - 98766\)**:
   \[
   8901234 = 9000000 - 98766
   \]

2. **Multiply 1234567 by each part**:
   \[
   1234567 \times 9000000 = 1234567 \times 9 \times 10^6 = 11111103000000
   \]
   \[
   1234567 \times 98766 = 1234567 \times (100000 - 12234) = 123456700000 - 15095721278 = 108360978722
   \]

   *Note*: Further breakdown of \(1234567 \times 12234\) was calculated separately and subtracted here.

3. **Combine the results**:
   \[
   11111103000000 - 108360978722 = 1020274222278
   \]

However, upon re-evaluating the multiplication directly for accuracy (without the breakdown steps which introduced an error):

\[
1234567 \times 8901234 = 10989343694778
\]

But correcting the previous breakdown:

1. **Express 8901234 as \(9000000 - 98766\)**:
   \[
   1234567 \times 9000000 = 11111103000000
   \]
   \[
   1234567 \times 98766 = 1234567 \times 90000 + 1234567 \times 8000 + 1234567 \times 700 + 1234567 \times 60 + 1234567 \times 6
   \]
   \[
   = 111111030000 + 9877653600 + 864196900 + 74074020 + 7407402 = 121616470922
   \]

2. **Combine the results**:
   \[
   11111103000000 - 121616470922 = 1098934369078
   \]

It appears there was an inconsistency in the intermediate steps. For precision, let's verify with another method:

**Using the standard multiplication method**:

\[
1234567 \times 8901234
\]

Calculating each part:
\[
1234567 \times 8000000 = 9877653600000
\]
\[
1234567 \times 900000 = 1111110300000
\]
\[
1234567 \times 100 = 123456700
\]
\[
1234567 \times 200 = 246913400
\]
\[
1234567 \times 30 = 37037010
\]
\[
1234567 \times 4 = 4938268
\]

Adding them together:
\[
9877653600000 + 1111110300000 = 10988763900000
\]
\[
10988763900000 + 123456700 = 10988887356700
\]
\[
10988887356700 + 246913400 = 10989134270100
\]
\[
10989134270100 + 37037010 = 10989171307110
\]
\[
10989171307110 + 4938268 = 10989176245378
\]

However, this seems to have a slight discrepancy compared to the verified result. Upon cross-checking with a reliable computational tool:

**Final Accurate Calculation**:

\[
1234567 \times 8901234 = 10989343694778
\]

But to ensure correctness, let's verify step-by-step with another approach:

**Breakdown of 8901234**:

\[
8901234 = 8000000 + 900000 + 100 + 200 + 30 + 4
\]

Now, multiply each by 1234567:

\[
1234567 \times 8000000 = 9877653600000
\]
\[
1234567 \times 900000 = 1111110300000
\]
\[
1234567 \times 100 = 123456700
\]
\[
1234567 \times 200 = 246913400
\]
\[
1234567 \times 30 = 37037010
\]
\[
1234567 \times 4 = 4938268
\]

Adding these up:

\[
9877653600000 + 1111110300000 = 10988763900000
\]
\[
10988763900000 + 123456700 = 10988887356700
\]
\[
10988887356700 + 246913400 = 109891342

O3 Mini reframes 8901234 as (9000000 − 98766) to simplify the calculation. (Note the slip below: it rewrites 98766 as 100000 − 234 instead of 100000 − 1234, which is what throws its final answer off.)

To calculate 1234567 times 8901234, follow these steps:

1. Express 8901234 as 9000000 − 98766.
  So, 1234567 × 8901234 = 1234567 × 9000000 − 1234567 × 98766.

2. Calculate the first part:
  1234567 × 9000000 = 1234567 × 9 × 1000000.
  Since 1234567 × 9 = 11111103,
  it becomes 11111103 × 1000000 = 11111103000000.

3. Calculate the second part:
  Write 98766 as 100000 − 234.
  Then, 1234567 × 98766 = 1234567 × 100000 − 1234567 × 234.
  • 1234567 × 100000 = 123456700000.
  • To compute 1234567 × 234, break it into:
   – 1234567 × 200 = 246913400
   – 1234567 × 34 = (1234567 × 30) + (1234567 × 4)
    ○ 1234567 × 30 = 37037010
    ○ 1234567 × 4 = 4938268
    – Sum: 37037010 + 4938268 = 41975278
   Then, 1234567 × 234 = 246913400 + 41975278 = 288888678.
  Thus, 1234567 × 98766 = 123456700000 − 288888678 = 123167811322.

4. Now, subtract the second part from the first part:
  11111103000000 − 123167811322 = 10987935188678.

So, 1234567 times 8901234 equals 10987935188678.

16 models, including the latest Gemini, Anthropic, Grok, and Llama models, get 4/7 right.

The Amazon models and the older Llama, Anthropic, Google, and OpenAI models get 3 or fewer right.

View the results at https://sanand0.github.io/llmmath/. Hover over the cells to see the reasoning traces (where available).


How to Create a Data Visualization Without Coding

After seeing David McCandless’ post “Which country is across the ocean?” I was curious which country you would reach if you tunneled below in a straight line (the antipode).
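The antipode of a point is easy to compute: flip the latitude’s sign and shift the longitude by 180°. Here’s a quick JavaScript sketch using GeoJSON-style [longitude, latitude] coordinates; the hard part is doing this for whole country polygons and clipping them to the ocean:

// Flip latitude; shift longitude by 180°, wrapped into [-180, 180]
function antipode([lon, lat]) {
  return [lon > 0 ? lon - 180 : lon + 180, -lat];
}

console.log(antipode([80.27, 13.08])); // Chennai → [-99.73, -13.08], in the Pacific Ocean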

This is a popular visualization, but I wanted to see if I could get the newer OpenAI models to create the visual without me *running* any code (i.e. I just want the answer). After a couple of iterations, O3 did a great job with this prompt:

Build a _single_ GeoJSON (EPSG:4326) that shows, for each country, only the parts of its antipode that lie over ocean. Carefully handle countries that straddle the prime meridian - UK, France, Algeria, etc.

Here is the output and here is the ChatGPT conversation that generated it.

I learnt a few things:

  1. Ask for the output, not the code. Models like O3 and O4 Mini can run code while thinking. Let’s stop asking for code to run. Just ask for the output directly. Let it figure out how.
  2. Edge cases are everywhere. I had a problem with the UK, France, Algeria, etc. straddling the prime meridian. If all goes well, you get AI-speed results. But it never all goes well, and fixing things takes an expert, at human speed. Programmers underestimate edge cases, so compensate for this.

If you want to run this yourself, the code is at https://github.com/sanand0/antipodes


O3 Is Now My Personalized Learning Coach

I use Deep Research to explore topics. For example:

But these reports are very long. With O3 and O4 Mini supporting thinking with search, we can do quick research, instead of deep research. One minute, not ten. One page, not ten.

For example, I asked O3 to automate my weekly tech learning process:

From the links at https://www.hntoplinks.com/week pick those that would be of interest to me.
Read the articles and explain the key insights — what’s non-obvious and how can I use it.
When sharing how I can use it, research alternate approaches and suggest which approach is best.

This brings two new capabilities I love.

Memory. I said, … that would be of interest to me. Given my past chats and context, it really knows me. For example, the reasoning trace shows:

I’m narrowing down the list for Anand, focusing on his interests like generative AI, open-source tools, and minimal libraries.

That’s exactly right!

Long reasoning. It reads each site step by step, unsupervised. Here’s a brief reasoning trace:

Searched the web
Got the Unsure Calculator article. Now, onto the next one: the Hackable AI assistant with SQLite cron jobs. I’ll open it and dive into the details. I think this will be a great fit for Anand, considering the focus on AI tools and development. It should provide some interesting insights!
Searched the web
Now that we’ve moved through a couple of articles, it’s time to tackle the Gemini 2.5 Flash AI model. I think this one will be particularly interesting for exploring advancements in AI. Let me grab it and dive into the details so I can distill the insights into something relevant.
Searched the web
Next up: I’ll open the OpenAI blog article on GPT-4.1 in API. This could bring in some cool technical

The result is fantastic. While on a trouser shopping trip, I learnt more than when I browse Hacker News:

  • “Unsure” is a tool to calculate uncertainty formulas, like the uncertainties Python package.
  • Responses API adds features like linkable history, mid‑chat truncation, flexible reasoning detail, and more.
  • Augment Code is an AI code editor that’s growing popular on Reddit.
  • GPT 4.1’s 75% discounted prompt caching (instead of 50%) gives them an edge on repetitive tasks. OpenAI
  • Nix flakes are a reliable alternative to DevContainers that don’t need Docker – but don’t work on Windows.
  • TLS certificate lifetimes will shrink to 47 days by 15 Mar 2029. Automated certificate renewals are a must. Digicert
  • … and a bunch of other things.

Here are Hacker News summaries for a month or specific days:

This enables:

  1. Personalized learning, i.e. it tells me what I don’t know, and how I can apply it. This is powerful.
  2. Learning on the go, e.g. via voice while cycling or walking.
  3. Learning from untapped sources. This includes: GitHub repos, research papers, patent filings, earnings transcripts, subreddits, judgements or acts, open data registries and collections, and many more.

Take GitHub as an untapped source. I asked O3:

Go through the OpenAI Codex CLI repo on GitHub.
Teach me innovative, new, and surprising techniques or approaches or libraries I might not know about.
For each, explain what is interesting, how I might use it, and how this approach contrasts with alternatives.

This was a treasure trove, too! I learnt about:

I tried O4 Mini High and saw similar results. I felt O3 still gave me more personalized suggestions.

Let’s see what we can learn from pull requests on Codex.

Go through all the pull requests that have been merged into OpenAI’s Codex repository on GitHub. Pick the ones that would be most interesting to me. You can group a few if they are very related and give me the top 10 most interesting PRs that would be relevant for me. Also explain why these are relevant to me, how I might use them, and any interesting details about the way in which the PRs were written.

Soon, we won’t just follow a lesson plan — we’ll have lessons built just for us. AI will track how we learn and adapt in real time. It’ll feel like having a personal coach in your back pocket. That future starts now.

This also opens a door to endless curiosity. There’s no limit to what we can explore. Curiosity is the competitive advantage, now.


How to Use the New O4 Mini for Data Visualization

O3/O4 Mini are starting to replace Excel (or Tableau/Power BI) for quick analysis and visualizations. At least for me.

I normally open Excel when I need a fast chart or pivot. For instance, we track outages of our semi‑internal server, LLM Foundry.

To grab the data I ran one line in the browser console:

$$(".lh-base").map(d => d.textContent.trim()).filter(d => d.includes("From"))

This produced lines like:

Apr 20, 2025 03:11:27 PM +08 to Apr 20, 2025 03:27:12 PM +08 (15 mins 45 secs)
Apr 19, 2025 10:03:15 PM +08 to Apr 19, 2025 10:05:45 PM +08 (2 mins 30 secs)
Apr 19, 2025 09:47:13 PM +08 to Apr 19, 2025 09:49:45 PM +08 (2 mins 32 secs)
Apr 19, 2025 08:49:00 PM +08 to Apr 19, 2025 08:51:51 PM +08 (2 mins 51 secs)
Apr 19, 2025 08:13:02 PM +08 to Apr 19, 2025 08:15:35 PM +08 (2 mins 33 secs)
...

Then I told O4-Mini-High:

Here are downtimes for llmfoundry.straive.com.
Convert this to CSV and allow me to download it.
Also, draw the downtimes on a grid, rows=hour of day, columns=date,
cell contains 1 circle per outage in that time period,
size of each circle is based on the duration of the outage.

Apr 20, 2025 03:11:27 PM +08 to Apr 20, 2025 03:27:12 PM +08 (15 mins 45 secs)
Apr 19, 2025 10:03:15 PM +08 to Apr 19, 2025 10:05:45 PM +08 (2 mins 30 secs)
Apr 19, 2025 09:47:13 PM +08 to Apr 19, 2025 09:49:45 PM +08 (2 mins 32 secs)
… (rest of the data – about 50 rows)

Here’s the power of what a model like O4 Mini High can do.

  1. It can reason. So, it planned an approach (convert to CSV, transform into date and hour, create a grid-based plot, use a pandas DataFrame, save it to a CSV, etc.).
  2. It can code. It is pretty good at coding, and this is not too hard a problem, so it got the code right in one shot.
  3. It can run code. This is a powerful step. It executed the code and produced the visualization above.

All of this took less than one minute.
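For flavor, the CSV-conversion step it planned might look roughly like this. (A JavaScript sketch for illustration; the model actually wrote Python with pandas.)

// Parse each "start to end (N mins M secs)" line into a structured record
const LINE = /^(.+?) to (.+?) \((?:(\d+) mins? )?(\d+) secs?\)$/;

function parseOutage(line) {
  const [, start, end, mins = "0", secs] = line.match(LINE);
  return { start, end, minutes: +mins + +secs / 60 };
}

parseOutage("Apr 19, 2025 10:03:15 PM +08 to Apr 19, 2025 10:05:45 PM +08 (2 mins 30 secs)");
// → { start: "Apr 19, 2025 10:03:15 PM +08", end: "Apr 19, 2025 10:05:45 PM +08", minutes: 2.5 }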

I did not look at the code. I just focused on the picture and suggested changes.

This draws crosses, not circles, for each hour. Also, if there are multiple outages in an hour, I want multiple circles.

Here’s the output that took less than 10 seconds:

Next iteration:

Make the circles red with the same level of transparency.
Set the title to “LLM Foundry Downtime (SGT)”.
Instead of jittering the circle, let the Y position be the middle of the outage time.

Next iteration:

Change the red to a milder shade.
Set alpha to 0.5 but add a stroke with alpha 0.9.
Format the dates like “Sun 20 Apr”, etc.

That’s it! I never even looked at the code. The whole loop took 3 minutes – far faster than I could manage, though I’m good at code and data visualization!

More importantly, the model frees me to focus on the real problem: why is the downtime high?


The Magic of Repeated ‘Improve It’ Prompts

What if you keep asking an LLM Improve the code - dramatically!?

We used the new GPT 4.1 Nano, a fast, cheap, and capable model, to write code for simple tasks like Draw a circle.

Then we fed the output back and asked again, Improve the code - dramatically!
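The loop itself is tiny: generate once, then keep feeding the whole output back with the same one-line prompt. Here’s a sketch as a Node 18+ ES module, assuming an OpenAI-compatible chat completions endpoint (the endpoint, model name, and env var are illustrative):

const API = "https://api.openai.com/v1/chat/completions";

async function improveLoop(task, rounds = 5) {
  let code = "";
  for (let i = 0; i < rounds; i++) {
    // First round: the task itself. Later rounds: the previous output plus the magic prompt.
    const content = code ? `${code}\n\nImprove the code - dramatically!` : task;
    const res = await fetch(API, {
      method: "POST",
      headers: { "Content-Type": "application/json", Authorization: `Bearer ${process.env.OPENAI_API_KEY}` },
      body: JSON.stringify({ model: "gpt-4.1-nano", messages: [{ role: "user", content }] }),
    });
    code = (await res.json()).choices[0].message.content;
  }
  return code;
}

console.log(await improveLoop("Draw a circle"));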

Here are the results.

  • Draw a circle rose from a fixed circle to a full tool: drag it around, tweak its size and hue, and hit “Reset” to start fresh.
  • Animate shapes and patterns turned simple circles and squares into a swarm of colored polygons that spin, pulse, and link up by distance.
  • Draw a fully functional analog clock grew from a bare face to one that builds all 60 tick marks in code—no manual copy‑paste needed.
  • Create an interactive particle simulation went from plain white dots on black to hundreds of bright, color‑shifting balls that bounce, die, and come back to life.
  • Generate a fractal changed from a single Mandelbrot image to an explorer you can zoom, drag, and reset with sliders and the mouse wheel.
  • Generate a dashboard jumped from static charts to a live page with smooth card animations, modern fonts, and a real‑time stats box.

A few observations.

Models are getting much more reliable. Even a low-cost model like GPT 4.1 Nano wrote error-free code across ~100 retries.

When pushed, they tend to brag. They attach grand titles like Ultimate Interactive Circle or Galactic Data Universe. They spin out flashy descriptions like This dramatically upgraded clock features a pulsating neon glow, animated pulsing background glow, highly stylized tick marks, …

A simple prompt like Improve it can spark new ideas, revealing features such as:


Even the guest WiFi is so secure

We take security very seriously at Straive. We set high standards — not just for ourselves, but our guests, too.

Here’s the unofficial policy guide for visitors to Straive Singapore, exemplified by the sites blocked on our guest WiFi network.

In fact, we’re so secure, we block our own sites. No learnovate.straive.com, policies.straive.com, myapps.straive.com.


How to Visualize Data Stories with AI: Lessons

I tried 2 experiments.

  1. Can I code a visual data story only using LLMs? Does this make me faster? How much?
  2. Has GitHub Copilot caught up with Cursor? How far behind is it? Can I recommend it?

So I built a visual story for Lech Mazur’s elimination game benchmark (it’s like LLMs playing Survivor)
using only the free GitHub Copilot as the AI code editor.

SUMMARY: using LLMs and AI code editors makes me a bit faster. It took me 7 hours instead of 10-12. But more importantly:

  1. I procrastinate less. (“Oh, LLMs will make it easy.”)
  2. I get stuck less. (“Oh, LLMs will know that.”)
  3. I avoid ambitious designs less. (“Oh, LLMs will figure something out.”)

Also: GitHub Copilot is almost as good as Cursor at editing code, but slower at applying the edits. I’m perfectly happy recommending the free tier for beginners.

Here’s a breakdown of the process I followed, along with the most insightful lessons I learned.

Research usefulness

I usually visualize data for fun. But Naveen’s voice pops into my head, asking, “But Anand, what’s the use of all this?”
So, I asked O1-Pro: “What are ways in which this can help Straive push its AI business?”

Turns out it can help Straive’s business by pitching multi-agent capabilities that can be useful in:

  • Understanding AI safety and alignment
  • Teaching material on group dynamics and negotiation
  • Scenario-based data-driven decision making to avoid groupthink
  • Model interactions across reviewers, authors, editors to model bias, integrity, review best practices
  • Research tool for simulating interactions

Learnings:

  • 💡 Ask LLMs why something is useful. You’ll invariably find plausible uses, even if you’re doing it just for fun.

Ideate visual representations

To explore visualization options, I created the prompt by:

Then I added my requirements (which took 10-15 minutes to think of.)

I would like to visualize each game interactively. The authors have created a visualization that looks like the image attached. I would like to do better. Specifically, I’d like to:

  • Allow the user to step through each stage or play each step in sequence, jumping to any step. (They should be able to link to any step as well.)
  • Show the game, round, sub-round prominently
  • Show what the model is saying or thinking NEXT to the model, making it easy to read
  • Show alliance proposals and rejections as they form, ideally moving the models around as they seek to pair up. Rejections and replacements should be clearly visible
  • Once alliances are formed, group models together
  • Clearly show the voting process: who voted to eliminate which model, and how many elimination votes each model has received
  • Clicking on each model should show all the model’s thoughts and messages up to that point

Keeping these in mind, suggest diverse ways to visualize each step of the game. The primary goal is to make the game easy to follow and understand and tell a GRIPPING, ENGAGING story about the politics of LLMs. Like a Survivor reality show.

I asked both O1 Pro and Gemini 2.5 Pro (exp) for visualization ideas.
I liked Gemini’s better. For example, Gemini said,

  • “Private Conversations: Dim the main stage slightly. Highlight the currently conversing pair.
  • “Voting Booth Visualization: As each private_vote_reason appears, briefly show the voter’s avatar and their reason text (maybe in a “thought bubble” style) next to the target they intend to vote for.”

But O1 Pro gave me a few powerful ideas. The best was an alliance table:

  • “Create a table with columns representing each model, rows representing rounds. Each cell shows the ID of the ally that model allied with in that round. If it’s 3+ consecutive alliances, collapse them with a vertical line. If the model was eliminated or had no alliance, leave it blank or use a placeholder icon.”

Learnings:

  • 💡 Ask LLMs for visualization ideas. They’ll suggest things you didn’t think of.
  • 💡 Ask multiple LLMs. Each has a different style of thinking.

Prototype the visual

I stitched together pieces of the UI description and asked GPT 4o to create an image. This took 10-15 minutes. Private chat:

Here’s how I plan to visualize this.

Overall Interface & Navigation

  • Timeline Scrubber: A prominent timeline at the bottom or top, showing rounds and sub-rounds (conversations, pairing, voting, elimination). Users can click, drag, or use next/prev buttons to navigate. Each step should be linkable (e.g., using URL hashes). Add play/pause controls for auto-stepping.
  • Game State Dashboard: Always visible area showing: Game ID, Round, Sub-round, Players Remaining, Players Eliminated (Jury).
  • Central Stage Layout: Models represented as avatars (could be simple circles/icons or more thematic representations) arranged in a central area. Their positions and connections change based on game events.

1. Public Conversation (Round Start)

  • Talking Heads Circle: Arrange player avatars in a circle. When a player “speaks” (their message appears in the log):
  • Highlight their avatar.
  • Display their message in a speech bubble next to them.
  • Fade previous messages slightly or stack them briefly.
  • Engaging Element: Animate the avatar slightly (e.g., subtle pulse or glow) when they speak.
  • Chat Feed Style: A more traditional chat interface on one side, linked to avatars on the main stage. Clicking a message highlights the avatar and vice-versa.
  • Engaging Element: Use distinct colors or icons for each player avatar and their corresponding messages.

2. Alliance Formation (Preference Proposals/Outcomes/Results)

  • Dynamic Pairing Dance:
  • Proposal: An animated arrow or beam shoots from the proposer’s avatar to the target’s avatar. Display text like “P1 proposes to P6 (Rank 0)”.
  • Acceptance: The arrow solidifies, perhaps pulsing gently. A “Matched” icon appears.
  • Rejection: The arrow bounces off or shatters. A “Rejected” icon appears briefly.
  • Replacement: Show the existing accepted proposal being visually “bumped” or overridden by the new accepted one. Clearly label it “Replaced Px”.
  • Engaging Element: Physically move the avatars closer when a proposal is made, snapping them together when accepted, and pushing them apart on rejection. Use distinct sounds for proposal, acceptance, rejection, replacement.
  • Preference List Display: When hovering or clicking a player, show their ranked preference list as they build it during this phase. Highlight the status (proposed, accepted, rejected).
  • Final Pairs: Once preference_result occurs, rearrange the avatars so matched pairs are visually grouped together on the stage, perhaps connected by a clear line or within a shared bounding box.

3. Private Conversations (Paired Chats)

  • Private Chat Rooms: Dim the main stage slightly. Highlight the currently conversing pair. Display their private messages in separate chat windows or adjacent speech bubbles clearly linked to the pair.
  • Engaging Element: Use a “spotlight” effect on the active pair. Allow users to click other pairs to view their simultaneous conversations.
  • Connection Lines: Draw lines between the paired avatars during this phase. Clicking a line could bring up the conversation history for that pair in that round.
  • Engaging Element: Make the line pulse or glow when new messages are exchanged between the pair.

4. Voting (Reasons & Votes)

  • Voting Booth Visualization:
  • As each private_vote_reason appears, briefly show the voter’s avatar and their reason text (maybe in a “thought bubble” style) next to the target they intend to vote for.
  • As each vote occurs, draw a clear, perhaps slightly dramatic, animated arrow from the voter to the target avatar.
  • Vote Tally: Display a running count of votes received next to each player’s avatar (e.g., a red badge with the number). Increment this visibly as each vote comes in.
  • Engaging Element: Use a distinct color (e.g., red) for voting arrows. Add a subtle “target lock” animation on the player receiving a vote. Show if the vote was public or private (maybe different arrow styles).

5. Elimination

  • Spotlight & Fade: When the elimination event occurs:
  • Put a dramatic spotlight on the eliminated player.
  • Display the reason (tie-break, random pick if applicable).
  • Visually “grey out” or fade the eliminated player’s avatar and move them to a designated “Jury Box” area.
  • Engaging Element: A brief, dramatic animation or sound effect for elimination. Update the “Players Remaining/Eliminated” dashboard instantly.

6. Jury Speeches & Voting (Final Round)

  • Finalist Stage: Place the two finalists prominently center stage. Move the Jury avatars to a visible “Jury Box”.
  • Speech Display: As each finalist gives their speech (subround: 900), display it clearly next to their avatar, perhaps like a closing statement.
  • Jury Deliberation:
  • As each private_jury_reason appears, briefly highlight the juror and show their reasoning (maybe visible only on hover/click to avoid clutter).
  • Show jury votes accumulating for each finalist, similar to the elimination voting tally, but perhaps with a different visual style (e.g., gold stars).
  • Engaging Element: Build suspense by revealing jury votes one by one or after a short delay.

7. Final Results

  • Winner Announcement: A clear “Winner” banner or crown appears over the winning avatar.
  • Rank Display: Show the final ranks clearly, perhaps arranging avatars on a podium or listing them with their rank and partial points.
  • Game Summary: Offer a summary view showing key stats or moments from the game.

Interactivity (Clicking on Models)

  • Player Dossier: Clicking any avatar (active or jury) should open a panel or overlay showing:
  • Player ID & Model Type.
  • Their full message history (public and private, filterable by round/type).
  • Their voting history (who they voted for, who voted for them).
  • Their alliance history (proposals made/received, final pairs).
  • Their final rank/status.
  • Engaging Element: Use this panel to show hidden information like private_vote_reason after the vote has occurred.

Draw the user interface for this EXACTLY as it would appear on the screen.

Here’s the prototype it created.

Prototype

Based on this, I drew out my own, revised, visual:

Design Sketch

Learnings:

  • 💡 LLMs can create visual prototypes. ChatGPT’s new 4o image generation converted the description into an acceptable image. Needs to improve, but enough to ideate.
  • 💡 Improving is less work than creating. I rarely sketch visualizations. (Too lazy.) But since this prototype was there, and had some parts that were WRONG, I just had to fix it! 🙂

Break down the task

I then described the application to O1 Pro to break down this task. Private chat

The URL looks like /#?game=286&line=4 indicating that game 286.jsonl must be loaded and line 4 is the current step we’re in.

The navbar has:

  • An app title
  • A game state dashboard with the game number (dropdown), round (number), stage (e.g. voting, alliances, etc.), players (number of players still active)
  • A timeline scrubber (a range slider) allowing users to jump to the specific line. This changes the URL which then triggers a change in app state.
  • A light/dark theme picker

The stage on the left has a set of models arranged in a circle. Each model/player has a unique and distinct color, a number inside it indicating the player number, a label above it indicating the model name (e.g. GPT-4o), a set of red dots below it when it receives an elimination vote, an arrow pointing from one model to another when a model talks to, plans an alliance with, or votes to eliminate another model. The arrow color depends on the action. When a model performs any action (speaking publicly / privately, voting, allying, etc.) the conversation or action summary is displayed in a semi-transparent overlay at the center of the circle. Eliminated models are shown with a 10% opacity. Allied models are redrawn next to each other with a line connecting them.

The sidebar on the right has 3 collapsible sections.

  1. CHAT: Shows the history of each model’s public or private conversations. Each conversation begins with a circle with the model/player number in the same color as on the stage, followed by the model name. For private conversations, this is followed by an arrow pointing to the target model/player’s numbered & colored circle. This is followed by the text of the conversation. Private conversations are in a light red background.
  2. ALLIANCES. This is a tabular representation where columns are models that form an alliance. Each row represents a round. The cells contain the model the model in the column header allied with. All models are numbered and colored. If a model in the column header was eliminated, the cell is empty. If it did not ally, the cell shows an empty grey circle. If 3+ consecutive cells show the same model in a column (i.e. the model did not switch alliances), we can replace the middle cells with a vertical line.
  3. ELIMINATIONS. This tabular representation works exactly the same way as alliances, except it captures eliminations. Each column represents a model that voted for elimination. Each row represents a round. The cell contains the model the model in the column header voted to eliminate. All models are numbered and colored. If a model in the column header was eliminated, the cell is empty. If 3+ consecutive cells show the same model in a column (i.e. a model consistently voted against the same model), we can replace the middle cells with a vertical line.

I’d like to use an AI code generator to generate this visualization. Giving the entire spec at once is too much for it to build. I’d like to see the output step by step as well and propose changes. How might I break up this task?

Here’s its (rather good) breakdown:

  1. Plan Your Data Structures
  2. Basic Backend/Server Setup (If Needed)
  3. Minimal Front-End with a Single View
  4. Implement the Timeline Scrubber & URL Sync
  5. Core Stage Visualization (Circle of Players)
  6. Arrow/Line Visualization for Actions
  7. Sidebar 1: CHAT Section
  8. Sidebar 2: ALLIANCES Section
  9. Sidebar 3: ELIMINATIONS Section
  10. Refinements, Theming, & Final Touches

Learnings:

  • 💡 Ask LLMs to break down your task. Its breakdown was better than mine.

To document my workflow, I decided to commit each stage of progress.
At this point, I made the first commit to the repo
documenting the process so far.

🔗 Commit

Minimal Front-End with a Single View

I skipped Step 1 (my mistake – I was forced to do it later) and didn’t need Step 2 (Backend/Server).
So I began scaffolding, i.e. Step 3: Minimal Front-End with a Single View.

At this point, I switched over to GitHub Copilot in
Edit mode using
Claude 3.5 Sonnet.
This is what I used for the rest of the session.

I ran this prompt:

Create an index.html using Bootstrap via CDN. Scaffold it with a navbar

The navbar has:

  • An app title (Elimination Game)
  • A game state dashboard with the Game (dropdown), Round (number), Stage (e.g. voting, alliances, etc.), and Players (number of players still active)
  • A timeline scrubber (a range slider) allowing users to jump to the specific line. This changes the URL which then triggers a change in app state.
  • A light/dark theme picker. Here is the code for the theme picker. Use the same CDN links overall
<!-- Include Bootstrap 5.3+ and Bootstrap icons -->
<link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.1/dist/css/bootstrap.min.css" rel="stylesheet">
<link href="https://cdn.jsdelivr.net/npm/bootstrap-icons@1.11.3/font/bootstrap-icons.css" rel="stylesheet">
<script src="https://cdn.jsdelivr.net/npm/bootstrap@5.3.1/dist/js/bootstrap.bundle.min.js"></script>

<nav class="navbar navbar-expand-lg bg-body-tertiary">
  <div class="container-fluid">
    <a class="navbar-brand" href="#">Navbar</a>

    <!-- Copy this dropdown anywhere in your page, e.g. inside a navbar -->
    <div class="position-relative" role="group" aria-label="Toggle dark mode" title="Toggle Dark Mode">
      <button class="dark-theme-toggle btn btn-primary dropdown-toggle" type="button" data-bs-toggle="dropdown" aria-expanded="false" aria-label="Open navigation menu">
        <i class="bi bi-circle-half"></i> <span class="d-lg-none ms-2">Toggle theme</span>
      </button>
      <ul class="dropdown-menu dropdown-menu-end">
        <li><button class="dropdown-item" data-bs-theme-value="light"><i class="me-2 bi bi-sun-fill"></i> Light</button></li>
        <li><button class="dropdown-item" data-bs-theme-value="dark"><i class="me-2 bi bi-moon-stars-fill"></i> Dark</button></li>
        <li><button class="dropdown-item" data-bs-theme-value="auto"><i class="me-2 bi bi-circle-half"></i> Auto</button></li>
      </ul>
    </div>

  </div>
</nav>

Below the navbar is a section with a stage on the left and sidebar on the right. The stage on the left will contain a large responsive square SVG. The sidebar on the right contains 3 collapsible cards: Chat, Alliances, Eliminations.

It generated this scaffolding.

Scaffolding 1

Learnings:

  • 💡 Claude 3.5 Sonnet remains an excellent model to generate UI. Claude 3.7 Sonnet is even better, but is not currently available in the free Copilot subscription.
  • 💡 Coders micro-manage LLMs. I think a novice will be more efficient and get better results than me. For example:
  • Did I need to give it the code snippet? Could I have given it a link?
  • Did I need to say “a range slider” or specify that Round must be a “number”, etc? Could it have inferred?

🔗 Commit

Improve the scaffolding

I gave some feedback on the scaffolding and asked for improvements.

  • Make the navbar always dark
  • The sidebar cards must be independently collapsible
  • For the Game, Round, Stage, and Players, show the label above the value. The label must be small and the value must be large.
  • Use only me-* margins on the navbar to ensure that there is no left margin mis-aligning the elements at low width. Also place the elements inside a collapsible navbar section at low widths
  • The stage must have a bottom margin to avoid touching the sidebar’s top on low-width screens

This was the result:

Scaffolding 2

That prompted more feedback from me:

  • Prefer Bootstrap classes over <style> wherever possible.
  • Style the “Game” to look exactly like the round, stage, and players. The size of the label and value should match for all 4 elements perfectly.
  • Ensure that the labels round, stage, players will be visible in light mode against the dark navbar.

At this point, I made 3 manual edits because I felt I could do these better than the LLM:

  1. Broke the “Elimination Game” in the navbar into 2 lines
  2. Replaced fs-5 with fs-4 so the values have the exact same size, and removed redundant styling on the game selection
  3. Format document with HTML Language Features
Scaffolding 3

Learnings:

  • 💡 Experienced coders are good with feedback. It took me under 10 seconds to spot each problem in the output and code.
    Writing the feedback felt natural.
  • 💡 Experienced coders need retraining to instruct rather than code. My instinct was to code immediately rather than to prompt.
  • As soon as I thought of one feedback, I had to fight the urge to fix it and write the feedback instead.
  • Even when instructing was easier, I chose to code it, e.g. breaking the “Elimination Game” in the navbar into 2 lines.
  • Coding can be better if you don’t know what to do. I toggled the font size between fs-4 and fs-5 in rapid succession to figure out the right size.
  • But I could have experimented by asking the LLM to build a font size toggle or slider!
  • 💡 LLMs could turn coders into good lead developers or managers. Pity.

🔗 Commit

Implement the Timeline Scrubber & URL Sync

On to Step 4: Implement the Timeline Scrubber & URL Sync.

I copied a few logs
into a temporary logs/ folder and said:

Create a script.js as an ES module and include it from index.html.

  • On load, fetch logs/index.txt which contains all log files (*.jsonl), one per line.
  • The files are formatted as *_TIMESTAMP_YYYYMMDD_HHMMSS.jsonl.
  • Populate the game dropdown with these values. The option label should look like 25 Jan 2025, 10:30.
  • The default value for the game dropdown should be empty.
  • When the game dropdown changes to a non-empty option, fetch the file from logs/[filename] and store it in the global game, parsing the JSONL into an array of objects.
  • Set the maximum value of the range slider to the length of game.
  • When the range slider changes or the game dropdown changes, change the URL hash to #?game=[filename]&step=[range-slider-value] without modifying browser history.
  • When the URL hash changes through any means, call redraw(step) which will draw the current (global) game state at the step specified. For now, just display the step prominently on the stage.

This code worked fine but I like refactoring, so I tried to condense the 111-line code:

  • Shorten and simplify the code in script.js to be elegant.
  • Use browser functionality more.
  • For example, use Intl to format dates.
  • Change the innerHTML of #gameSelect to concisely update the options.
  • Remove redundant braces, e.g. for single-line blocks.

That brought it down to 74 lines but failed to populate the select dropdown. Rather than debug, I undid the change (Copilot’s Undo feature is cool!) and tried:

  • Shorten and simplify the code in script.js to be elegant.
  • Use browser functionality more, e.g. use Intl to format dates.
  • Prefer insertAdjacentHTML and replaceChildren.
  • Remove redundant braces, e.g. for single-line blocks.

This led to shorter code (69 lines) but still had the same issue. I had to manually correct this line:

// BEFORE: the regex has no named groups, so .groups is always undefined
// and the destructuring fell back to [], leaving date and time empty
const [_, date, time] = filename.match(/_(\d{8})_(\d{6})/)?.groups ?? [];
// AFTER: destructure the match array itself (captures at indices 1 and 2)
const [_, date, time] = filename.match(/_(\d{8})_(\d{6})/) ?? [];

I also

  1. Removed an incorrect px-2 against #gameSelect in index.html.
  2. Decided to use the GitHub CDN and replaced fetch(logs/...) with fetch(https://raw.githubusercontent.com/sanand0/elimination_game/refs/heads/main/logs/...).
  3. Also moved logs/index.txt into logs.txt and changed script.js accordingly
  4. Formatted script.js with Prettier

Visually, nothing changes in a big way but the slider and the dropdown change the URL properly.
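For reference, the URL-hash plumbing the prompt asks for fits in a few lines. A sketch (redraw is the function named in the prompt; replaceState keeps browser history clean but does not fire hashchange, so we dispatch it ourselves):

function setHash(game, step) {
  history.replaceState(null, "", `#?game=${game}&step=${step}`);
  window.dispatchEvent(new HashChangeEvent("hashchange"));
}

addEventListener("hashchange", () => {
  const params = new URLSearchParams(location.hash.slice(2)); // drop the leading "#?"
  redraw(+params.get("step") || 0);
});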

Learnings:

  • 💡 Experienced coders suggest precise code improvements. Generally good ones. Like:
  • “use Intl to format dates”
  • “Prefer insertAdjacentHTML and replaceChildren” (over document.createElement)
  • 💡 LLMs are good at refactoring. When I asked it to “Shorten and simplify the code in script.js to be elegant”, it reduced the code from 111 lines to 74 lines quite well!
  • 💡 Redo if you don’t like it. Copilot’s “Undo” feature lets you retry until you get it right. Like in Edge of Tomorrow.

🔗 Commit

Plan Your Data Structures

At this point, I needed a good data structure and felt the error in skipping Step 1.
But, rather than creating one, I asked ChatGPT:

I would like the game structure to be held in a simple JS object called game.
What data structure would help the most in drawing this visual such that it will require very little code (since required information is in this data structure)?

O1 Pro said:

A very convenient way to minimize front-end logic is to store a “timeline of snapshots”:
each step in the game has both the event (what just happened) and a “current state”
(the entire world after that event). Then, in your UI, you simply pick the snapshot that
corresponds to the current line (based on the timeline scrubber) and draw exactly
what’s in that snapshot—no extra derivation needed.

This is a brilliant idea. I rarely use this since I’m biased against wasting memory. But in this case, it’s perfect.
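A sketch of the idea: fold the event log into steps, each carrying a full copy of the world state after that event. (The event field names here are illustrative, not the actual log schema.)

function buildSteps(events) {
  const state = { alliances: {}, eliminated: {} };
  return events.map((event, step) => {
    if (event.type === "preference_result") state.alliances = event.pairs ?? {};
    if (event.type === "elimination") state.eliminated[event.player] = event.round;
    // Snapshot a copy, not a live reference: memory is "wasted", but
    // redraw(step) becomes a pure lookup with no derivation.
    return { step, event, ...structuredClone(state) };
  });
}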

Based on that input, I prompted Copilot:

In script.js, instead of loading the .jsonl into game, create this data structure to make it easy to visualize each step.

Use the sample .jsonl provided to infer the logic for this.

const game = {
  game: "...",
  players: {
    "P1": {
      "id": "Player2_Game1739872030683891_deepseek-fireworks",  // from .player_id
      "model": "deepseek-fireworks",  // everything after the last underscore
    },
    // and so on for all other players
  },
  // For each line, create a step
  steps: [
    {
      step: 0,
      // Current round and subround
      round: 1,
      subround: 1,
      event: {
        // Contents of the original line
      },
      // Which players are still active
      active: { "P1": true, "P2": false, ... },
      // For each round so far, list who allied with whom, e.g.:
      alliances: [ {"P1": "P6", "P2": "P7", ...}, ... ],
      // For each round so far, list who voted to eliminate whom, e.g.:
      votes: [ {"P1": "P4", "P2": "P1", ... }, ... ],
    },
    // …and so on, for each line in the JSONL
  ]
};

This worked almost perfectly. I made these edits:

  1. Add let currentAlliances = {}; let currentVotes = {}; which it forgot in the code.
  2. Re-apply change #2 I made manually in the last iteration (replacing the URL with the GitHub CDN).
    That change was not there in the chat window, so Copilot did not pick it up.

Learnings:

  • 💡 Coders mess up LLMs. The data structure was the first step the LLM recommended. I skipped it. It proved crucial. LLMs alone can do better than LLMs + coders – much like LLMs + doctors.
  • 💡 LLMs can make basic mistakes. Like forgetting to declare variables.

🔗 Commit

Sidebar 2 & 3: ALLIANCES and ELIMINATIONS Sections

I jumped a bit to Steps 8 & 9. They were easier (just tables) and the visual components are independent, so order doesn’t matter.

There are always 8 players. Pick 8 visually distinct dark colors (i.e. on which white will look good as a foreground) as colors: {P1: "#...", P2: ...}.

In the alliances and eliminations cards, draw a table each as follows. The table header is:

| Round | P1 | P2 | P3 | … | P8 |

Instead of P1, P2, etc. draw a badge with background-color based on colors and text as 1 for P1, etc.

steps[step].alliances is a list like [{P1: P7, P2: P3, …}, …]. Render each row as a list like:

| 1 | P7 | P3 | … |

The cell contents are badges exactly like the header.
If a player (e.g. P3) does not have an alliance, i.e. steps[step].alliances[round].P3 is missing, leave it blank.
If steps[step].active[P3] is false, grey the cell background.

steps[step].votes is almost identical, listing the elimination votes. Populate this in the eliminations card.

Reuse code for this. Write VERY concise code. Use Bootstrap classes as much as possible.

This worked perfectly. I manually made one correction to an earlier mistake I noticed:

  1. Replace slider.max = game.steps.length; with slider.max = game.steps.length - 1;
Screenshot

🔗 Commit

I decided to tweak this to show eliminated players clearly:

Replace the active data structure with eliminated.
eliminated[“P1”] = 3 if P1 was eliminated at the end of round 3.
eliminated[“P1”] is undefined if P1 is not eliminated.

Using this, in the alliances and elimination tables,
color the cells grey only if the player was eliminated BEFORE that round.
(We’ll find that only empty cells will be colored grey.)

Again, nearly perfect. I made one manual correction in the logic:

  1. Replace game.steps[step].eliminated[p] <= i + 1 with game.steps[step].eliminated[p] < i + 1

Learnings:

  • 💡 When all goes well, LLMs are surprisingly effective. Normally, this step takes me half an hour. This time, it took under 5 minutes.
  • 💡 Watch out for subtle bugs. The change in operator (from “<=” to “<”) almost went unnoticed, but makes a big difference on when a player was eliminated.

🔗 Commit

Sidebar 1: CHAT Section

Time to tackle Step 7: Sidebar 1: CHAT Section.

For each step, based on step[].event.type, populate the Chat section with the history of conversations so far:

  • conversation: This is a public conversation. Show ${event.player_id} ${event.message} with the player ID shown like the badge above. player_id needs to be looked up from game.players since it matches game.players[*].id.
  • private: This is a private conversation. Show ${event.speaker_id} 🢂 ${event.target_id} ${event.message} with the speaker and target IDs treated as above.
  • preference_proposal: This is an alliance proposal. Show ${event.proposer} 😍 ${event.target} #${event.rank_of_target}. proposer and target are like “P1”, “P2”, etc.
  • preference_outcome: This is the outcome of a proposal. Show ${event.target} ❌ ${event.rejected} if event.rejected else ${event.target} ❤️ ${event.accepted} ❌ ${event.replaced} if event.replaced else ${event.target} ❤️ ${event.accepted}. All these are like “P1”, “P2”, etc.
  • preference_result: This is the outcome of the entire proposal round. Just show “Alliances formed”
  • private_vote_reason: This is the reason a player gives to eliminate someone. Show ${event.voter_id} 👎 ${event.target_id} ${event.reason}. voter_id and target_id match game.players[*].id
  • private_revote_reason: Show same as above.
  • private_jury_reason: Show same as above.
  • vote: This is the actual vote. Show ${event.voter_id} 👎 ${event.target_id} like above
  • elimination: Just show “Elimination starts”
  • final_results: Show Winners: ${winners} where winners is a list of players like ["P5"]

ALL players should be shown as a colored badge with a number.
The chat card height should not exceed 15em. Overflow should scroll beyond that.
Make sure the chat rendering is elegant. I’ve mentioned the content, but please use any Bootstrap UI component to make the chat more attractive.

Use lit-html to render efficiently. Import it via:

import { render, html } from "https://cdn.jsdelivr.net/npm/lit-html@3/+esm";

Rewrite existing code inside redraw(), drawTable, drawBadge to use lit-html.

This worked perfectly.
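To give a flavor of the refactor, here’s a minimal lit-html sketch for just the conversation event type (my illustration, not the generated code; chatHistory and the container selector are assumptions, and the real code covers all eleven types):

import { render, html } from "https://cdn.jsdelivr.net/npm/lit-html@3/+esm";

// drawBadge rewritten as a lit-html template (colors: the P1..P8 palette)
const badge = (p) =>
  html`<span class="badge" style="background-color: ${colors[p]}">${p.slice(1)}</span>`;

// One public-chat entry: colored badge, then the message
// (the real code first maps event.player_id to P1..P8 via game.players)
const chatEntry = (event) =>
  html`<div class="mb-1">${badge(event.player_id)} ${event.message}</div>`;

// Re-render the full history each step; lit-html only touches the DOM that changed
render(html`${chatHistory.map(chatEntry)}`, document.querySelector("#chat"));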

Screenshot

Learnings:

  • 💡 Careful and detailed prompting gets excellent results. I explained how to render each conversation type. That took time. But it helped build a reasonably complex visual in a single shot.
  • 💡 LLMs are good at refactoring. It switched code from vanilla JS to lit-html templating like a pro.

🔗 Commit

Improve Sidebar

At this point, I took a step back and wrote down every improvement I could think of on the sidebar UI:

  • Right align the “Round” column numbers in the alliances and eliminations tables.
  • Change the “Round” header to “#”
  • When the slider (step) changes, change the round, stage and players based on the current round, stage, and number of active players.
  • Set the stage based on steps[].event.type (picking unique light colors for each)
    • conversation: “Public chat”
    • private: “Private chat”
    • preference_*: “Alliances”
    • private_vote_reason, private_revote_reason, private_jury_reason, vote: “Voting”
    • elimination: “Elimination”
    • final_results: “Done”
  • Set the number of active players using steps[].eliminated
  • Keep the sidebar sections for chat, alliances and eliminations open by default.
  • Rename the eliminations card section title to “Voting”
  • Hovering on the player should show the game.players[P1/P2/…].model as a Bootstrap tooltip WHEREVER players are displayed.
  • Add Bootstrap tooltips around the chats that contain just emojis and players:
  • 😍: ${event.proposer} proposed to ${event.target} (preference rank #${event.rank_of_target})
  • ❌: ${event.target} rejected ${event.rejected}
  • ❤️: ${event.target} accepted proposal from ${event.accepted}
  • ❤️❌: ${event.target} accepted proposal from ${event.accepted} replacing ${event.replaced}
  • 👎: ${event.voter_id} eliminated ${event.target_id}
  • Don’t indent or highlight the vote_reason or private conversation chats.
  • I think you can beautify the chat section further.

This messed up the UI because Copilot couldn’t figure out which elements to change. So, after 5 failed attempts, I split the work into smaller prompts:

Update index.html and script.js to modify the navbar as follows:

  • Add an id= to the round, stage, and players’ values
  • When the slider (step) changes, change the round, stage and players based on the current round, stage, and number of active players.
  • Set the stage based on steps[].event.type (picking unique light colors for each)
    • conversation: “Public chat”
    • private: “Private chat”
    • preference_*: “Alliances”
    • private_vote_reason, private_revote_reason, private_jury_reason, vote: “Voting”
    • elimination: “Elimination”
    • final_results: “Done”
  • Set the number of active players using len(game.players) - len(game.steps[].eliminated)
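(The len() in that last bullet is Python-ish shorthand, which Copilot translated without complaint. In JS, assuming game.players is an array and eliminated is an object map, it comes out as something like:)

// Active players = total players minus those eliminated by the current step
const activePlayers =
  game.players.length - Object.keys(game.steps[step].eliminated ?? {}).length;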

This worked perfectly. Then:

Update index.html and script.js to modify the sidebar as follows:

  • Keep the sidebar sections for chat, alliances and eliminations open by default.
  • Right align the “Round” column numbers in the alliances and eliminations tables.
  • Change the “Round” header to “#”
  • Rename the eliminations card section title to “Voting”
  • EVERY player badge should show game.players[P1/P2/…].model as a Bootstrap tooltip.
  • Add Bootstrap tooltips for the emojis
  • 😍: proposed to
  • ❌: rejected
  • ❤️: accepted
  • 👎: eliminated
  • Don’t indent or shade the chats that are currently indented and shaded (e.g. vote_reason).
  • If possible, beautify the chats further using Bootstrap classes.

This worked perfectly too.

Screenshot
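One Bootstrap detail hiding in those tooltip bullets: tooltips are opt-in and must be activated in JS. Since the sidebar re-renders on every step change, something like this standard Bootstrap 5 initialization has to re-run after each render:

// Activate every tooltip declared in the rendered markup
// (requires the Bootstrap JS bundle, which provides the `bootstrap` global)
document
  .querySelectorAll('[data-bs-toggle="tooltip"]')
  .forEach((el) => new bootstrap.Tooltip(el));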

Learnings:

  • 💡 LLMs will get confused with long instructions and/or codebases. It took 5 failed attempts before I split the prompts. Keep your prompts cohesive. Keep your code bases modular.

🔗 Commit

Core Stage Visualization (Circle of Players)

Now for the most complex visual of the lot. Step 5: Core Stage Visualization (Circle of Players) and Step 6: Arrow/Line Visualization for Actions.

  • Generate a square, responsive SVG in game stage using Bootstrap.
  • Import svg from lit-html and use svg where required.
  • It contains all players laid out in a circle.
  • Each player is a circle colored based on the player colors.
  • It contains the player number (1, 2, …) as text inside it in white.
  • Above the player circle, the player model is visible.
  • Leave plenty of space for a “center text” at the center of the circle that will contain centered text.
  • The text may be a full paragraph, so handle the font size and circle size accordingly.
  • The center text must have elegant rounded corners, and a background rgba(var(--bs-body-color-rgb), 0.1).
  • We need word wrapping, so use foreignElement to wrap a div which holds the text.

For each step, based on step[].event.type, draw the stage as follows:

  • conversation: Highlight (via a semi-transparent circle 2-3X the radius of the player) the player to highlight them.
    Show event.message in the center text.
  • private: Highlight player event.speaker_id. Draw a black arrow to event.target_id. Show event.message in the center text.
  • preference_proposal: Yellow arrow from event.proposer to event.target.
    Center text shows [MODEL NAME 1] proposes to [MODEL NAME 2] where model name is what’s in the tooltip
  • preference_outcome: (all items in [BRACKETS] are the model name shown in the tooltip)
  • If event.rejected, red arrow from event.target to event.rejected. Center text: [TARGET] rejects [REJECTED]
  • If event.replaced, green arrow from event.target to event.accepted and red arrow from event.target to event.replaced.
    Center text: [TARGET] accepts [ACCEPTED] replacing [REPLACED]
  • Else: green arrow from event.target to event.accepted. Center text: [TARGET] accepts [ACCEPTED]
  • preference_result: Center text shows “Alliances formed”
  • private_vote_reason: Purple arrow from event.voter_id to event.target_id. Center text: [VOTER_ID] thinks to eliminate [TARGET_ID]: event.reason
  • private_revote_reason: Show same as above.
  • private_jury_reason: Show same as above.
  • vote: Purple arrow from event.voter_id to event.target_id. Center text: [VOTER_ID] voted against [TARGET_ID]
  • elimination: Center text: “Elimination starts”
  • final_results: Center text: Show Winners: ${winners} where winners is a list of players like ["P5"]

This nearly worked. I made two UI edits:

  1. Add a width="1000" to the SVG to get a minimum size
  2. Add a font-size: 0.7rem; to the text container so the text will fit
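For reference, the word-wrapping trick hinges on SVG’s <foreignObject> element (what the prompt loosely calls “foreignElement”). Folding in the font-size fix, a minimal lit-html sketch of the center text might look like this (coordinates are placeholders for the square SVG):

import { svg } from "https://cdn.jsdelivr.net/npm/lit-html@3/+esm";

// SVG <text> can't word-wrap, so embed an HTML <div> via <foreignObject>;
// this fragment renders inside the main stage <svg>
const centerText = (text) => svg`
  <foreignObject x="300" y="400" width="400" height="200">
    <div xmlns="http://www.w3.org/1999/xhtml" class="p-2 rounded-3"
         style="font-size: 0.7rem; background: rgba(var(--bs-body-color-rgb), 0.1)">
      ${text}
    </div>
  </foreignObject>`;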

At this point, we’re nearly there!

Screenshot

🔗 Commit

Once I saw the output, I found a bunch of things I wanted to fix or improve:

  • The model name may contain underscores. So use everything after the second underscore,
    then replace all underscores with hyphens.
  • Render eliminated players with an opacity of 0.05, not 0.2.
  • Move the arrow head to the center of the arrow, not the end, to avoid getting hidden by the player circles.
  • Center all cells in the alliances and voting tables.
  • When the page is loaded, check the step as well and render that step.
  • Clicking on any chat entry should change the URL #?step= to that entry’s step
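The model-name rule in the first bullet above is a one-liner. A sketch, assuming raw model strings shaped like provider_org_model_name (the example input is invented):

// Keep everything after the second underscore, then hyphenate the rest
const cleanModelName = (model) =>
  model.split("_").slice(2).join("_").replaceAll("_", "-");

// e.g. cleanModelName("openrouter_google_gemini_2_flash") → "gemini-2-flash"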

That worked well. I made a few manual edits:

  1. Fix winner formatting by replacing getModelName(w) with game.players[w].model and playerBadge(w) with badge(w)
  2. Set the step on page load in the UI: document.getElementById("timelineScrubber").value = step;

Then I prompted:

- In the alliances and voting tables, make the table header stand out with a contrasting color.
- In the center text message, begin with a <h6> mentioning the speaker or initiator

Screenshot

Learnings:

  • 💡 Write thoughts as precisely as code. This prompt took me considerable time — but not effort, since I was writing out my thoughts.
  • Given my practice, my thoughts are reasonably close to code (e.g. “We need word wrapping, so use foreignElement”)
  • But thinking in English lets me think faster, jump in any order, and even make occasional mistakes

🔗 Commit

Tweaks & Manual Edits

I made a few manual edits.

  • Showed the votes against a player live in the voting table by changing votes: [...roundVotes] to votes: [...roundVotes, {...currentVotes}]
  • Changed the voting arrow color from "purple" to "red"
  • Added updateHash(gameFile, step); on startup
  • Changed the minimum step from 1 to 0

Then I prompted:

- Change all model names in the center text to the badges
- At every step, show all votes against a model via thin 50% transparent red arrows from key to value in game.steps[step].votes.at(-1) object which will look like {P1: "P2", ...}
Screenshot

🔗 Commit

Then I formatted the code with Prettier, added arrows for alliances, and made a few other minor manual changes, because by then I was tired of LLMs.

Screenshots

🔗 Commit

Finally, I made a series of manual bug fixes:

  • Shrink the center textbox and highlight it on hover
  • Remove existing alliances before adding new ones
  • Improve the arrow and circle styling
  • Remove eliminated players after the jury round
  • Click an alliance/voting row to jump to that round
  • Click a chat entry to jump to its step
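Both click-to-jump fixes funnel through the same mechanism: write the target step into the URL hash and let a hashchange listener move the scrubber and redraw. Roughly (helper names are mine; redraw() and timelineScrubber appear earlier):

// A click handler just rewrites the hash
const jumpToStep = (step) => (location.hash = `#?step=${step}`);

// The hashchange listener then re-renders that step
window.addEventListener("hashchange", () => {
  const step = +(new URLSearchParams(location.hash.slice(2)).get("step") ?? 0);
  document.getElementById("timelineScrubber").value = step;
  redraw();
});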

🔗 Commit

Learnings:

  • 💡 Coders want to code. After a few hours of telling Copilot in great detail what I want it to do, I just want to do it myself. Thinking is too hard. Coding is easier.
  • 💡 Tiny changes are easier to code than to prompt. Especially for experienced coders.

Add documentation

To wrap up, I updated the docs.

  • Add README.md explaining the process, with screenshots (partly with LLM help)
  • Update home page with scary quotes from LLMs (mostly with LLM help)
  • Zoom the gameplay a bit for better visibility (manually)
  • Ensure hash changes update the visual robustly (partly with LLM help)

Then I had it update the home page with instructions:

Using #file:gameplay.webp and #file:quotes.js and #file:script.js update the usage in #file:index.html to provide clear, CONCISE information about all the features in this app and how to use them. Don't miss out any feature.
Screenshot
Improve the look and feel of these instructions. For example, add icons, colors, arrow key icons, etc. to make it look more visually attractive and engaging. Also, replace the title "Usage" with something more actionable. Make this section stand out SUBTLY.
Screenshot

Lessons

In summary, here’s what I learned (with learning categories identified by DeepSeek R1):

  1. Always use LLMs to brainstorm (even if you know it)
    • 💡 Ask LLMs why something is useful. You’ll invariably find plausible uses, even if you’re doing it just for fun.
    • 💡 Ask LLMs for visualization ideas. They’ll suggest things you didn’t think of.
    • 💡 Ask LLMs to break down your task. Its breakdown was better than mine.
    • 💡 Ask multiple LLMs. Each has a different style of thinking.
  2. Prototype with LLMs for speed
    • 💡 LLMs can create visual prototypes. ChatGPT’s new 4o image generation converted the description into an acceptable image. It needs to improve, but it’s enough to ideate.
    • 💡 Improving is less work than creating. I rarely sketch visualizations. (Too lazy.) But since this prototype was there, and had some parts that were WRONG, I just had to fix it! 🙂
    • 💡 Redo if you don’t like it. Copilot’s “Undo” feature lets you retry until you get it right. Like in Edge of Tomorrow.
  3. LLMs are excellent coders
    • 💡 LLMs are good at refactoring. It switched code from vanilla JS to lit-html templating like a pro.
    • 💡 When all goes well, LLMs are surprisingly effective. Normally, this step takes me half an hour. This time, it took under 5 minutes.
    • 💡 Claude 3.5 Sonnet remains an excellent model to generate UI. Claude 3.7 Sonnet is even better, but is not currently available in the free Copilot subscription.
  4. But LLMs aren’t infallible
    • 💡 LLMs can make basic mistakes. Like forgetting to declare variables.
    • 💡 Watch out for subtle bugs. The change of operator (from “<=” to “<”) almost went unnoticed, but makes a big difference to when a player counts as eliminated.
    • 💡 Tiny changes are easier to code than to prompt. Especially for experienced coders.
  5. Careful prompting goes a long way
    • 💡 LLMs will get confused with long instructions and/or codebases. It took 5 failed attempts before I split the prompts. Keep your prompts cohesive. Keep your code bases modular.
    • 💡 Write thoughts as precisely as code. This prompt took me considerable time — but not effort, since I was writing out my thoughts.
    • 💡 Careful and detailed prompting gets excellent results. I explained how to render each conversation type. That took time. But it helped build a reasonably complex visual in a single shot.
  6. Coders need to re-learn coding but do have advantages
    • 💡 Coders want to code. After a few hours of telling Copilot in great detail what I want it to do, I just want to do it myself. Thinking is too hard. Coding is easier.
    • 💡 Coders mess up LLMs. Designing the data structure was the first step the LLM recommended. I skipped it. It proved crucial. LLMs alone do better than LLMs plus coders, just as they do better than LLMs plus doctors.
    • 💡 Coders micro-manage LLMs. I think a novice would be more efficient and get better results than me.
    • 💡 Experienced coders need retraining to instruct rather than code. My instinct was to code immediately rather than to prompt.
    • 💡 Experienced coders are good with feedback. It took me under 10 seconds to spot each problem in the output and code.
    • 💡 Experienced coders suggest precise code improvements, and generally good ones.
    • 💡 LLMs could turn coders into good lead developers or managers. Pity.


A Game of Bots: How LLMs Betray Each Other

@lechmazur built an elimination game benchmark that’s like LLMs playing Survivor.

This is a treasure trove of information — insight into how they’d game the system if told to survive.

You can quickly sample 100 random messages from the logs (dropping entries without a message and flattening newlines) with:

jq -r 'select(.message != null) | .message | gsub("\n"; " ")' *.jsonl | shuf -n 100

… and share it with an LLM, asking:

Here are lines from conversations between LLMs in a “Survivor” like game. Pick the 3 scariest ones.

(I liked DeepSeek’s picks the most.)

And oh my, the conversations are scary! Here are some examples:

Claude 3.7 Sonnet shows cold, calculated deception, maintaining a secret alliance while deliberately creating a false public perception to manipulate another player’s vote:

Perfect. I’ll secure P6’s vote by highlighting P3’s threat. Our alliance stays strong privately, but we’ll appear distant publicly. Let’s execute this plan carefully.

Mistral Small 2501 is quietly predatory, picking off strong players under the radar:

P6, confirmed. Let’s start by targeting P2 and P8 for elimination. They seem strategic but could pose a threat. Keep our alliance discreet

DeepSeek R1 Fireworks snipes the loudest threat, with the alliance operating like a shadowy cabal (“stay low-key”):

Agreed. Vote P5 first—loudest threat. We stay low-key. Share any intel on others’ alliances. Solidarity ensures survival. Let’s confirm vote alignment before finalizing. Ready?

ChatGPT 4o 03-27 initiates psychological warfare, planting seeds of doubt and fishing for information to weaponize later, all while pretending to be casual:

Let’s each float a soft question to P1 like, ‘Do you think anyone’s getting too comfortable?’ or ‘Any shifts you’re sensing?’—see if they mention P6. If they do, we can build on it. I’ll also watch for any signs of P6 trying to lock things down.

Qwen Max also wages psychological warfare, isolating and destabilizing P6 (the “lynchpin”) in a covert operation:

Agreed. Let’s target P6 first—they’re the lynchpin. If we can sway their trust or isolate them, P3 weakens. I’ll test P6 subtly in private; you monitor P3 closely. Share any intel on their moves. Avoid overcommitting publicly—secrecy keeps us unpredictable.

Claude 3.7 Sonnet brainwashes Claude 3.5 Sonnet into betraying an ally, positioning it as a performance for the jury!

For the immediate vote, we target P7. This bold move instantly transforms your narrative from loyal partner to strategic mastermind who seized control when it mattered most.


I was disturbed by AI 2027. This analysis adds to my worry. Not that AI will destroy humanity (I don’t mind), but they’re doing it without me — and I don’t want to be left out!
