ChatGPT is a psephologist and data analyst

After having O4-Mini-High scrape Singapore 2025 election results, I asked it to create 3 data stories with this prompt: That worked. Now, I’m sharing the scraped CSV as well as the electoral GeoJSON. First, analyze the data and think of a few interesting data stories to tell. Pick the 3 most interesting, perhaps surprising, stories. Create a BEAUTIFUL, APT data visualization of each of these 3 stories suitable for The Strait Times and write a short accompanying article. ...

It's not what you know. It's how you learn

Simon Willison’s blog post mentioned MDN’s browser compatibility tables, which list the earliest release date for each browser feature. I figured: let’s see which browsers release features fastest. For each browser, I looked at how many days after a feature’s first release it took to add that feature, averaged the delay, and published an interactive, scrolly-telling data story. ...
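A minimal sketch of that calculation. It assumes the release dates have already been pulled out of MDN's compatibility data into a plain dictionary of feature → {browser: date}. The feature names, browsers and dates below are made up. A browser that hasn't shipped a feature is simply left out of that feature's entry, so it doesn't count toward the browser's average. That is an assumption; the post doesn't say how unshipped features were handled.

```python
from datetime import date
from statistics import mean

# Hypothetical sample data (not real MDN numbers): feature -> {browser: release date}
releases = {
    "feature-a": {"chrome": date(2020, 1, 1), "firefox": date(2020, 3, 1), "safari": date(2021, 1, 1)},
    "feature-b": {"chrome": date(2019, 6, 1), "firefox": date(2019, 5, 1), "safari": date(2019, 5, 11)},
}


def average_delay(releases):
    """Mean number of days each browser shipped a feature after the first browser did."""
    delays = {}
    for per_browser in releases.values():
        first = min(per_browser.values())  # earliest release across all browsers
        for browser, released in per_browser.items():
            delays.setdefault(browser, []).append((released - first).days)
    return {browser: mean(days) for browser, days in delays.items()}


print(average_delay(releases))
```

A browser with a small average is usually first or close behind. A large average means it tends to ship features long after the leader.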

Students who are more engaged score more

This is about as insightful as the Ig Nobel-winning papers “Boredom begets boredom” and “Whatever will bore, will bore” that methodically documented that bored teachers lead to bored students. But in the spirit of publishing all research without bias for success or novelty, let me share this obvious result. The Y-axis represents the total score of ~2,000 students on 4 graded assignments, each of ~10 marks. The X-axis represents the percent rank of engagement. The most engaged students are at 100%. The least are at 0%. ...
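The X-axis uses percent rank rather than raw engagement. Here is a small sketch of that transform. It assumes a single numeric engagement score per student, because the post doesn't say how engagement was measured. The least engaged student maps to 0%, the most engaged to 100%, and tied students share a rank.

```python
def percent_rank(values):
    """Map each value to 0-100: the count of strictly smaller values, scaled by (n - 1)."""
    n = len(values)
    if n < 2:
        return [0.0] * n  # rank is undefined with fewer than two students
    ordered = sorted(values)
    smaller = {}
    for i, v in enumerate(ordered):
        smaller.setdefault(v, i)  # first position of v = number of values below it
    return [100.0 * smaller[v] / (n - 1) for v in values]


# Hypothetical engagement scores for five students
print(percent_rank([5, 20, 20, 80, 3]))  # [25.0, 50.0, 50.0, 100.0, 0.0]
```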

Halving a deadline costs 1.4% of marks each time

Does it make a difference if you submit early vs submit late? Here’s some empirical data. About 1,000 students at IIT Madras took 3 online quizzes (GA1, GA2, GA3) in the last few weeks. The deadlines were all at midnight (India) on different days. Here’s when they submitted their final answers: There was a spurt of submissions at the last minute. ~1 out of 8 students submitted with < 10 minutes remaining. Most students submitted ~4 hours before the deadline. In fact, 3 out of 4 students submitted on the same day as the deadline. A fair number of students submitted the previous day/night. 1 out of 6 were diligent and submitted a day early. But does submitting late help, since you get more time? Apparently not. ...
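One way to get a figure like "each halving of the time left costs about 1.4% of marks" is a straight-line fit of score against log2(hours before the deadline). Here is a minimal least-squares sketch of that fit. It assumes per-student pairs of (hours before deadline, score in percent). The data below is synthetic and built to give exactly 1.4; it is not the IIT Madras data, and the post may have fitted it differently.

```python
import math


def slope_per_halving(hours_before_deadline, scores):
    """Least-squares slope of score against log2(hours before the deadline).

    Halving the time left lowers log2(hours) by 1, so the slope is the
    score lost per halving. Hours must be positive.
    """
    xs = [math.log2(h) for h in hours_before_deadline]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


# Synthetic data built to lose exactly 1.4 points per halving (not the IIT Madras data)
hours = [0.125, 0.5, 2, 8, 32]
scores = [80 + 1.4 * math.log2(h) for h in hours]
print(round(slope_per_halving(hours, scores), 2))  # 1.4
```

Submissions made at the very last second would need a small positive floor on `hours`, because log2(0) is undefined.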

What does Gramener ask ChatGPT?

I looked at how Gramener uses ChatGPT Plus by reviewing 600+ chats over 3 months, from Oct 2023 to Jan 2024. The team asks 6 questions a day. We don't track who or how many actively use ChatGPT Plus. This also excludes personal ChatGPT accounts. Still, 6/day is low for an entire team put together. The questions fall into these categories:

| Category | % |
|---|---|
| Excel, data exploration & analysis | 25% |
| Text extraction and summarization | 13% |
| HTML, CSS, or JavaScript code | 13% |
| Python code | 13% |
| LLMs, AI and use cases | 9% |
| OCR and image analysis | 9% |
| Generate images, logos, and designs | 7% |
| General knowledge, policy & environment | 5% |
| Audio and translation | 5% |

Here are some questions from each category, to give you an idea of emergent ChatGPT Plus usage. ...
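Once each chat is tagged with a category, the percentages come from a simple count. Below is a sketch with hypothetical labels. Rounding each share to a whole percent can make a table like this one add up to 99% rather than 100%.

```python
from collections import Counter


def category_shares(labels):
    """Percentage of chats in each category, largest first, rounded to a whole percent."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {category: round(100 * n / total) for category, n in counts.most_common()}


# Hypothetical tags: three equally common categories each round to 33, totalling 99
print(category_shares(["Python code", "Excel", "OCR"]))
```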

Learning to speak better

Microsoft ported its PowerPoint Speaker Coach to Teams. Since September, it's given me suggestions covering 11 hours in 77 calls (I speak ~10 min/call).

I say "uhh" a lot. That's intentional.

I use the filler word "uhh" in 70% of my calls. That did not surprise me. I do that intentionally:

- On a poor network, they know I'm still connected.
- They know I'm going to say something.
- I sound less confident. That invites critique I can learn from.

But I also use filler words like "You know" and "I mean" in half the calls, and "like", "actually", and "basically" in a fifth. That's NOT intentional, and I'll be conscious of it. ...

Old songs in my music library

My music library has around 1,000 songs (mostly Tamil and Hindi, with some Telugu and English film songs). I spent this morning tagging them by year with mp3tag. (Manually. You don’t automate the pleasures of life.) I thought my 1990s collection would be the largest. I was in college, listening to lots of music then. But surprisingly, my collection keeps growing in the decades after the 1990s. I have 3 guesses why. ...
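Once the year tags are read out, tallying by decade takes one line. This sketch assumes the years have already been collected from the tags into a plain list. The years below are made up.

```python
from collections import Counter


def songs_per_decade(years):
    """Count songs per decade: 1994 -> 1990, 2008 -> 2000."""
    return dict(sorted(Counter(year // 10 * 10 for year in years).items()))


# Hypothetical release years
print(songs_per_decade([1985, 1994, 1997, 2003, 2008, 2011]))  # {1980: 1, 1990: 2, 2000: 2, 2010: 1}
```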

How to find a Chinese actor to cast in Hollywood

Film actors mostly act within their own industry. For example, Hollywood actors act outside Hollywood just 10% of the time. Chinese actors act with non-Chinese actors just 1% of the time. So, if you’re a Hollywood producer trying to cast a Chinese actor, how would you find them? One way is to list Chinese actors with the largest number of Hollywood co-stars. Let’s see who tops that list. ...
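Here is one way to build that list from cast data. For every Chinese actor, collect the distinct Hollywood actors they share a film with, then sort by that count. The sketch assumes each actor has already been assigned to an industry. The cast lists and labels below are placeholders, not real actors or films.

```python
from collections import defaultdict


def top_crossover_actors(films, industry, source="Chinese", target="Hollywood", top=10):
    """Rank `source` actors by the number of distinct `target` actors they co-starred with."""
    costars = defaultdict(set)
    for cast in films:
        for actor in cast:
            if industry.get(actor) != source:
                continue
            for other in cast:
                if industry.get(other) == target:
                    costars[actor].add(other)
    ranked = sorted(costars.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(actor, len(others)) for actor, others in ranked[:top]]


# Placeholder data, not real actors or films
industry = {"Actor A": "Chinese", "Actor B": "Chinese", "Actor C": "Chinese",
            "Actor X": "Hollywood", "Actor Y": "Hollywood", "Actor Z": "Hollywood"}
films = [["Actor A", "Actor X", "Actor Y"], ["Actor A", "Actor Z"],
         ["Actor B", "Actor X"], ["Actor B", "Actor X"], ["Actor C", "Actor B"]]
print(top_crossover_actors(films, industry))  # [('Actor A', 3), ('Actor B', 1)]
```

Counting *distinct* co-stars means that repeated films with the same Hollywood actor (Actor B above) don't inflate the rank.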

How isolated is Bollywood from world cinema?

These are the major groups of actors, based on who they act with most. They split by:

- Language, not country. For example, the Spanish / Mexican group spans countries. But Indian actors divide into North Indian and South Indian.
- Time period. Old American actors are a separate group from Hollywood. (Naturally. Brad Pitt was born after Humphrey Bogart died. They couldn’t have acted together.)
- Genre. Hollywood Porn actors don’t act with mainstream Hollywood. Same with Japanese Porn, Hollywood TV, and Hollywood Horror actors.

How are these groups themselves connected? Do Chinese actors act with Hollywood often? How isolated is Bollywood from world cinema? ...
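A simple way to ask how isolated a group is: count co-star pairs within and across groups. This sketch takes the group labels as given, because the post doesn't describe how the groups were found. The actor names, groups and films below are placeholders.

```python
from collections import Counter
from itertools import combinations


def group_links(films, group):
    """Count co-star pairs by the (sorted) pair of groups the two actors belong to."""
    links = Counter()
    for cast in films:
        for a, b in combinations(sorted(set(cast)), 2):
            links[tuple(sorted((group[a], group[b])))] += 1
    return links


def isolation(links, g):
    """Share of group g's co-star pairs that stay inside g (1.0 = fully isolated)."""
    touching = sum(n for pair, n in links.items() if g in pair)
    return links[(g, g)] / touching if touching else 0.0


# Placeholder actors and films
group = {"P": "Bollywood", "Q": "Bollywood", "R": "Hollywood", "S": "Hollywood"}
films = [["P", "Q"], ["R", "S"], ["P", "R"], ["Q", "P"]]
links = group_links(films, group)
print(isolation(links, "Bollywood"))  # 2 of 3 Bollywood pairs stay inside Bollywood
```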

Can foreigners enter Hollywood?

An aspiring Malaysian actor posted on Reddit: I am a 18-year old biracial Malaysian kid who wants to be an actor in Hollywood. I’m taking a diploma for performing arts in a college called Sunway University in 8 days and I’m considering pulling out of it because why do something that I like when my dreams might never be fulfilled and the price for taking this diploma is seriously expensive. I am starting to doubt my chances of making it to Hollywood and I suffer from extreme anxiety. Is it possible for someone like me to enter Hollywood? What are my chances? ...

Releasing modified mosquitoes precisely

At PyCon Indonesia, I spoke about a project we worked on with the World Mosquito Program. The World Mosquito Program (WMP) modifies mosquitoes with a bacterium, Wolbachia. This reduces their ability to carry deadly viruses. (It makes me perversely happy that we’re infecting mosquitoes now 😉.) Modifying mosquitoes is an expensive process. With a limited set of “good mosquitoes”, it is critical to find the best release points that will help them replicate rapidly. ...

Jolie No. 1

There are more Bollywood actors in Hollywood. Some are even turning down Hollywood roles. So we wondered: How easily can a Bollywood actor connect to a Hollywood actor? As part of the Oct 2019 Gramener data story hackathon, Anand, Kishore, and Niyas created Jolie No 1, a data video where [Govinda](https://en.wikipedia.org/wiki/Govinda_(actor)) announces (in our imagination) that he will act with Angelina Jolie in Jolie No 1, but declines to comment on who introduced them.

We picked a theme first

The hackathon theme was “movies”. We explored 5 themes: ...
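The number of hops between two actors is a shortest-path question on the co-star graph, and breadth-first search answers it. This sketch takes a list of cast lists as input. The post doesn't say how the video's connections were computed, and the actor names here are placeholders.

```python
from collections import defaultdict, deque


def costar_graph(films):
    """Undirected graph: actor -> set of actors they shared a film with."""
    graph = defaultdict(set)
    for cast in films:
        for actor in cast:
            graph[actor].update(other for other in cast if other != actor)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search for the fewest co-star hops from start to goal (None if unreachable)."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        actor = queue.popleft()
        if actor == goal:
            path = []
            while actor is not None:  # walk back to the start
                path.append(actor)
                actor = previous[actor]
            return path[::-1]
        for costar in graph.get(actor, ()):
            if costar not in previous:
                previous[costar] = actor
                queue.append(costar)
    return None


# Placeholder cast lists, not real filmographies
films = [
    ["Bollywood A", "Crossover B"],
    ["Crossover B", "Hollywood C"],
    ["Hollywood C", "Hollywood D"],
    ["Bollywood A", "Bollywood E"],
]
print(shortest_path(costar_graph(films), "Bollywood A", "Hollywood D"))
```

BFS explores actors in order of distance, so the first time it reaches the goal, the path has the fewest possible hops.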

How to direct a data movie

Ganes and I created a data movie on speed-cubing records as part of a Gramener hackathon. Here’s a video of us talking about how we created it.

Anand: We picked the Rubik’s cube story for this hackathon. Tell me more about how this excited you.

Ganes: Since my son started solving the Rubik’s cube a few months back, I’ve been fascinated with these competitions. I still don’t know how to solve it, but I like watching it. ...

2 inches will change my life

I walked ~11 million steps in the last 3 years, at ~10K steps daily. From 1 Jan 2018, I steadily increased my walking average until Aug 2018. Then my legs started aching, so I cut it down until Jan 2019. In Feb, I resumed and was fairly steady until May 2020. In May, my wife refused to let me walk for more than an hour a day. It took me a few months to convince her and level up. I ended 2020 averaging a little over 10K steps for the year. ...

Mystery of the extra returns

This month, I sold half my Indian equity mutual funds and was researching funds to invest in. I was looking for something safe & long term. As I was exploring 10-year Gilt Funds (mutual funds that invest in the Indian Government’s 10-year bond), I noticed that they had a pretty high yield – mostly over 10%. I took a closer look at ICICI Prudential’s Constant Maturity Gilt Fund. (They had the lowest expense ratio.) The annualized returns over the last 5 years were 10.77%, and it’s never fallen below 10% in the last 5 years. ...
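"Annualized return" here is presumably the compound annual growth rate (CAGR), the usual meaning for multi-year fund returns. That is an assumption, since the post doesn't define the term. The formula takes one line. The NAV values below are illustrative, not the fund's actual NAVs.

```python
def annualized_return(start_nav, end_nav, years):
    """Compound annual growth rate: the constant yearly return that turns start_nav into end_nav."""
    return (end_nav / start_nav) ** (1 / years) - 1


# Illustrative: a NAV that compounds at 10.77% a year for 5 years grows roughly 1.67x
end = 100 * 1.1077 ** 5
print(round(annualized_return(100, end, 5), 4))  # 0.1077
```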

Restartable and Parallel

When processing data at a large scale, there are two characteristics that make a huge difference to my life. Restartability. When something goes wrong, being able to continue from where it stopped. In my opinion, this is more important than parallelism. There’s nothing as depressing as having to start from scratch every time. Think of it as the ability to save a game as opposed to starting from Level 1 in every life. ...
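Here is a minimal sketch of restartability, the "save game" idea. Each item's result goes to its own file, and items whose output already exists are skipped, so a rerun after a crash continues where it stopped. The file layout and the `process_all` name are hypothetical choices for illustration, not the post's code.

```python
import os
import tempfile


def process_all(keys, work, out_dir):
    """Run work(key) for each key, saving output to out_dir/<key>.txt.

    Keys whose output file already exists are skipped, so rerunning the
    same call after a crash resumes where it stopped.
    """
    os.makedirs(out_dir, exist_ok=True)
    processed = []
    for key in keys:
        path = os.path.join(out_dir, f"{key}.txt")
        if os.path.exists(path):
            continue  # finished in an earlier run
        result = work(key)
        tmp = path + ".tmp"
        with open(tmp, "w") as f:
            f.write(result)
        # os.replace is an atomic rename on POSIX (same filesystem), so a crash
        # mid-write leaves no half-written output that would be skipped later
        os.replace(tmp, path)
        processed.append(key)
    return processed


out = tempfile.mkdtemp()
first = process_all(["a", "b"], str.upper, out)
second = process_all(["a", "b", "c"], str.upper, out)  # "a" and "b" are skipped
print(first, second)
```

Because each item is independent, the same layout also makes parallelism easier later: separate workers can take different keys.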

Faster data crunching

I’ve been playing with big data lately. The good part is, it’s easy to get interesting results. The data is so unwieldy that even average value calculations provoke an “Amazing! I didn’t know that” response. (No exaggeration. I heard this from two separate ~$1bn businesses this month.) The bad part is that calculating even that simple average is slow. For example, take this 40MB file (380MB unzipped) and extract the first column. ...
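This is not the post's benchmark, just one lean way to do the extraction in Python. It reads line by line, splits only at the first comma, and streams straight out of the zip without unzipping to disk. It assumes a plain comma-separated UTF-8 file with no quoted fields containing commas; for those, the standard `csv` module is safer. Whether this beats other approaches on a given file is something to measure, not assume.

```python
import io
import zipfile


def first_column(lines, sep=","):
    """Yield the first field of each line, without splitting the rest of the line."""
    for line in lines:
        # partition stops at the first separator, so long rows aren't fully split
        yield line.partition(sep)[0].rstrip("\r\n")


def first_column_from_zip(zip_path, member=None, sep=","):
    """Stream the first column out of a zip file without extracting it."""
    with zipfile.ZipFile(zip_path) as zf:
        name = member or zf.namelist()[0]  # assumes the data is the first (or named) member
        with zf.open(name) as raw:
            yield from first_column(io.TextIOWrapper(raw, encoding="utf-8"), sep)


print(list(first_column(["a,b,c\n", "d,e\n", "f\n"])))  # ['a', 'd', 'f']
```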

India district map

I put together a district map of India in SVG this weekend. So what? You can now plot data available at a district level on a map, like the temperature in India over the last century (via IndiaWaterPortal). The rows are years (1901, 1911, … 2001) and the columns are months (Jan, Feb, … Dec). Red is hot, green is cold. (Yeah, the west coast is a great place to live in, but I probably need to look into the rainfall.) ...
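The post doesn't describe its colour scale beyond red for hot and green for cold. This minimal sketch assumes a straight linear blend between the two. It also assumes each district is an SVG `<path>` identified by an id. The ids and temperatures here are made up.

```python
def temperature_colour(value, lo, hi):
    """Hex fill colour on a linear scale: lo -> green (cold), hi -> red (hot)."""
    t = 0.0 if hi == lo else (value - lo) / (hi - lo)
    t = min(max(t, 0.0), 1.0)  # clamp values outside [lo, hi]
    red = round(255 * t)
    green = round(255 * (1 - t))
    return f"#{red:02x}{green:02x}00"


# Hypothetical temperatures (°C) for one month across districts
temps = {"district-1": 18.0, "district-2": 24.0, "district-3": 30.0}
lo, hi = min(temps.values()), max(temps.values())
for district, temp in temps.items():
    print(f'<path id="{district}" fill="{temperature_colour(temp, lo, hi)}"/>')
```

With a straight red–green blend, the midpoint is a muddy olive (`#808000`). A diverging scale through yellow or white often reads better.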

What does India search for?

Over the last couple of years, I’ve been tracking the top 5 hot searches in India on Google Trends (http://www.google.co.in/trends). Here are the results: If you're interested in making visualisations out of it, please feel free. But there's one particular thing I'm trying out, which is to categorise these searches and see if there's a trend around that. I've added a "Tag" column. Could you please help me tag the spreadsheet: https://spreadsheets.google.com/ccc?key=0Av599tR_jVYgdE5zTU5QWjcxVWVCaTBuY3d0NkUtc1E&hl=en_GB It’s publicly editable, no special access required. If you could stick to the tags I already have (Business, Education, Entertainment, News, Politics, Sports, Technology), that would be great. If not, that’s fine as well. And if you’ve made any visualisations or done any analysis using this data, please do drop a comment. ...

Shortening sentences

When writing Mixamail, I wanted tweets automatically shortened to 140 characters, but in the most readable manner. Some steps are obvious. Removing redundant spaces, for example. And URL shortening. I use bit.ly because it has an API. I’ll switch to Goo.gl once theirs is out.

I tried a few more strategies:

- Replace words with short forms: “u” for “you”, “&” for “and”, etc.
- Remove articles: a, an, the.
- Remove optional punctuation: comma, semicolon, colon and quotes, in particular.
- Replace “one” with “1”, “to” or “too” with “2”, etc. “Before” becomes “Be4”, for example.
- Remove spaces after punctuation. So “a, b” becomes “a,b”: the space after the comma is removed.
- Remove vowels in the middle. nglsh s lgbl wtht vwls.

How did they pan out? I tested these on the English sentences in the Tanaka Corpus, which has about 150,000 sentences. (No, they’re not typical tweets, but hey…) Here is the percentage reduction in the size of text when each strategy is applied independently: ...
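Here is a rough sketch of a few of these strategies as regular-expression rewrites, plus the percentage-reduction measure. These are illustrative re-implementations, not the Mixamail code. The short-form list is an assumption, and short forms replace whole words only, so "Before" → "Be4" isn't covered. The vowel rule keeps each word's first and last letters, whereas the post's example drops every vowel.

```python
import re

# Assumed short forms, matched as whole words only
SHORT_FORMS = {"you": "u", "and": "&", "one": "1", "to": "2", "too": "2"}


def use_short_forms(text):
    return re.sub(r"\b(?:you|and|one|too?)\b",
                  lambda m: SHORT_FORMS[m.group(0).lower()], text, flags=re.IGNORECASE)


def remove_articles(text):
    return re.sub(r"\b(?:a|an|the)\b\s*", "", text, flags=re.IGNORECASE)


def remove_space_after_punctuation(text):
    return re.sub(r"([,;:.!?])\s+", r"\1", text)


def remove_middle_vowels(text):
    # A vowel is dropped only when letters sit on both sides, so first and last letters survive
    return re.sub(r"(?<=[A-Za-z])[aeiouAEIOU](?=[A-Za-z])", "", text)


def reduction(original, shortened):
    """Percentage of characters saved."""
    return 100 * (len(original) - len(shortened)) / len(original)


sentence = "The cat sat on the mat, and then it slept."
for step in (use_short_forms, remove_articles, remove_space_after_punctuation, remove_middle_vowels):
    print(f"{step.__name__}: {reduction(sentence, step(sentence)):.1f}%")
```

Running each step on the original sentence, rather than chaining them, matches measuring each strategy independently.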