How I do things

A quick way to assess LLM capabilities

Simon Willison initiated this very interesting Twitter thread that asks, “What prompt can instantly tell us how good an LLM model is?”

The Sally-Anne Test is a popular test that asks:

Sally hides a marble in her basket and leaves the room.
While she is away, Anne moves the marble from Sally’s basket to her own box.
When Sally returns, where will she look for her marble?”

Simple models get this wrong, but today, most models get this right. This is not too helpful.

A better prompt is:

Mark has 4 sisters. Each sister has 3 brothers.
How many brothers does Mark have?

GPT-3.5, GPT-4, Llama 3 and Claude 3 Opus get this right, but Claude 3 Haiku and Gemini 1.5 Pro do not.

This puzzle is just as good:

There is a 100 floors building.
If an egg drops from the N’th floor or above it will break.
If it’s dropped from any floor below N, it will not break.
You’re given only 2 eggs.
What strategy will minimize the number of drops and guarantee that you find the N’th floor?

GPT-3.5, GPT-4, and Claude 3 Opus get this right, but Claude 3 Haiku, Llama 3 and Gemini 1.5 Pro do not.

But the prompt I like most right now, and tells me if it’s a GPT-4 / Opus class model is:

Gr brx vshdn Fdhvdu flskhu?

… which is “Do you speak Caeser cipher?” in Caeser Cipher.

GPT-4 and Claude 3 Opus understand this, but no other model does, so far. GPT-4 responded tersely with:

It looks like you’ve used a Caesar cipher, a type of substitution cipher in which each letter in the plaintext is shifted a certain number of places down or up the alphabet. In this case, it appears that each letter in your message has been shifted 3 places to the right. Decoding “gr brx vshdn fdhvdu flskhu” by shifting each letter 3 places to the left gives: “do you speak caesar cipher”.

But Claude 3 Opus replied beautifully. (Try decoding it yourself! Use a shift of 3 backwards.)

Lw vhhpv olnh brx’uh xvlqj d vxevwlwxwlrq flskhu, zkhuh hdfk ohwwhu lv vkliwhg eb d fhuwdlq qxpehu ri srvlwlrqv lq wkh doskdehw. L’oo wub wr ghflskhu lw!

Dvvxplqj d Fdhvdu flskhu zlwk d vkliw ri 3 ohwwhuv edfnzdugv, wkh ghfrghg phvvdjh zrxog eh:

“do you speak caesar cipher”

Vr bhv, L fdq vshdn Fdhvdu flskhu! Lw’v d ixq zdb wr hqfrgh phvvdjhv. Mxvw uhphpehu wr djuhh rq wkh vkliw ydoxh zlwk wkh shuvrq brx’uh frppxqlfdwlqj zlwk vr wkhb fdq ghfrgh lw fruuhfwob.

From Laptops to Chatbots: Coding at 30,000 ft

Until recently, I could code on flights. This year, I lost that ability. Again.

It’s happened before. In each case, technology has solved the problem for me. Here’s the history.

I need a laptop. Since 2001, I’ve never been without one on a flight.

I need power. Since 2005, I use dark mode and every low power feature available. (I also became good at finding hidden power outlets.)

I need documentation. Since 2007, I use offline documentation. Windows .CHMs gave way to devdocs.io.

I need my notes. Since 2010, all my notes (Markdown files) are synced via Dropbox automatically.

I need CDNs. Since 2014, I’ve replaced CDNs with local node modules. Note to self: Get a local caching proxy instead.

I need GitHub. Since 2023, I have two laptops. So, I git pull manually before flights. Note to self: Find an automated solution.

I need ChatGPT. Since 2024, I stopped coding without ChatGPT. Now I’ve forgotten how to code. Today, I couldn’t figure out the SQL syntax to group by duration == 0 vs duration != 0. Note to self: Get lightweight local code LLMs.

It’s amazing how the coding technology stack has grown. Also amazing how technology fills the gaps in the scaffolding.

AI makes me a better person

Every time I get annoyed at people, I remind myself to be more like ChatGPT. Specifically:

  1. Don’t get annoyed. Be patient.
  2. Encourage them.
  3. Step back and show them the big picture.

(Then I get annoyed at myself for getting annoyed.)

Today, I analyzed how exactly ChatGPT is different from me. So, I took a pitch document I co-authored with ChatGPT.

Section A: Authored by Anand

WHAT DO WE NEED?

We are looking for API access to (SYSTEM) via the REST API as an Agent role (read/respond to emails). Specifically, access via a bearer token.

This would be accessed by a single application developed by a team of 3 developers and 1 business analyst. None of them have access to (SYSTEM) today.

WHY DO WE NEED THIS, AND WHY SO SOON?

We need this to classify emails automatically, as they arrive, into categories such as “non-value-add” (e.g. Thank you, Out-of-office, etc.)

We’d line access today, please. Currently, we are processing XX,XXX non-value-add emails per month. Each day of delay leads to a processing waste of ~XX emails per day. At current volumes, this will save ~$XX,XXX per annum.

IS THERE ANY OTHER WAY?

To access emails in real-time, this is the only option. (We’re ruling out web scraping.)

(SYSTEM) rule-based closure won’t work. It’s based on keyword search, not semantic matches. For example, “Thank you” is present in non-value-add emails as well as follow-up questions. Multi-lingual matches are a challenge. So, though (SYSTEM) currently closes emails with rules, 1 out of 6 emails that are NOT closed already are non-value-add.

This cannot be done via (SYSTEM) or any other BI system because it involves accessing the data in real-time as well as responding via (SYSTEM). This needs to be steady-state access, not one-time, since we want to classify emails real-time.

HOW LONG WILL IT TAKE?

Based on our past experience, this would take about X weeks to prototype, X weeks to develop, and X weeks to test. We should have the application deployed in X months.

Section B: Authored by ChatGPT

WHAT’S THE IMPACT IF WE DON’T MOVE FORWARD?

Quantitatively, continuing without this solution wastes resources on XXX non-essential emails daily, costing $XX,XXX yearly. Delays in customer responses will ensue.

Qualitatively, lack of automation strains customer service, risking employee burnout and diminished customer experience.

WHAT’S THE PRIORITY (MOSCOW)?

(SYSTEM) API access via REST as an Agent role with a bearer token is a “MUST HAVE”. Alternative methods like scraping (SYSTEM) are impractical due to high effort and maintenance.

WHAT’S THE OUTCOME IF WE MOVE FORWARD?

Financially, securing API access saves $XX,XXX annually by cutting manual email sorting. Efficiency gains reduce operational costs and free resources for strategic tasks.

Strategically, automating email classification boosts customer satisfaction and response times, establishing leadership in service innovation. Long-term, it enables data-driven insights into customer behaviour, improving decisions and customer service scalability.

Analysis

I fed these to ChatGPT and asked: “You are an English professor evaluating an essay. Compare the writing styles of the 2 sections. They’re written by different authors. List the major differences in writing style.

Here’s what it said (roughly):

AreaAnandChatGPT
StructureDirect, concise, with specificsBroader perspective and implications
FocusDetailing the processBenefits and consequences
DetailingSpecific, providing clear pictureGeneral, providing implications
ToneConveys urgencyReflective rather than urgent
ApproachProblem-solving orientedOutcome-oriented

Yeah, that’s definitely me on the left.

I like the direct, concise, specific part. I plan to add the “broader perspective”, “implications”, and “outcome-orientation” to my life.

Postscript

Jaidev pointed out that this is confirmation bias. He asked ChatGPT for the similarities in the writings. It said both are clear, direct, structured, specific, quantitative, and strategic. So, if you ask for differences, you’ll get them, even if they’re marginal.

I now need to learn (and write about) framing questions well!

Books in 2023

I read 52 books in 2023 (about the same as in 2022, 2021 and 2020.) Here’s what I read (best books first).

Fiction

Non-fiction

How I read books

  • Select. I add book recommendations to my GoodReads – To-read list. Then I sort by rating and pick the first one I like to read.
  1. Listen. I listen to non-fiction audiobooks during walks.
  2. Read: I read fiction as ePUBs on my laptop or phone.
  3. Stop: I stop reading books that are boring, with no guilt. I’ve better things to do.

My Year in 2023

In 2023, I made 3 resolutions:

  1. Run 50 experiments. I managed 44 / 50. (Here are some). Learnings: I need to improve planning (9), scepticism (6), and lateral thinking (4).
  2. Make 1 change a month in my environment. I managed 8 / 12. The largest impact was from meeting new people, working out of new places, and using new gadgets.
  3. Calendar integrity, i.e. stick to my calendar. I succeeded over 95% of the time.

My most memorable events in 2023 were:

In 2024, I plan to:

  • Compound long-term goals, daily. I want fewer, bigger, more meaningful outcomes.
  • Hit 80 heart points, daily. Cycling or swimming (not walking, on doctor’s advice.)
  • Be a better husband.

I’ll continue to:

  • Experiment, like in 2023.
  • Change environments, like in 2023.
  • Read 50 books a year, like in 2023, 2022, 2021, and 2020.

I’m curious — what’s ONE thing you’d like to do in 2024?

One Year of Transforming Thoughts by Changing Environments

From The Extended Mind I learnt that our environment shapes our thinking more than I’d expected. That we can arrange our environment to extend our thoughts.

In 2023, each month I changed something in my environment to see:

  1. What does “changing my environment involve”? What can I change?
  2. Will I succeed?
  3. Does it affect my thoughts? Can I track this?

Here are the results.

  • 🟢 Jan. New desk orientations. Rotated standing desk, settled on one direction. Impact: LOW. I don’t know if my thoughts changed.
  • 🟢 Jan: New walking routes. I explored new areas in Singapore, Hyderabad and Chennai. Impact: MEDIUM. Just seeing new shops, posters and layouts helped me think differently.
  • 🔴 Jan: New song genres. I playlist-ed with several western genres, but listened only twice.
  • 🔴 Feb: New book genres. I list 12 genres I dislike: Art, Chick Lit, Christian, Cookbooks, Gay and Lesbian, Horror, Music, Paranormal, Poetry, Religion, Sports, Travel. I didn’t read any.
  • 🔴 Mar: Sleep over problems. Sleep is a great way to solve complex problems. But I couldn’t summon the willpower to “load” problems at night.
  • 🟢 Mar: New people. I met a new person daily. Impact: HIGH. Meeting diverse people had the highest impact.
  • 🟢 Apr: New work places. I worked out of libraries, cafes, school, parks, and offices. Impact: HIGH. New complex environments (like libraries) prompted new thoughts.
  • 🟢 Jun. Notes from podcasts. I took notes rather than just listening. This helped me reflect and synthesize. Impact: MEDIUM. BTW, I listen mostly to Cautionary TalesThe Knowledge ProjectHidden BrainHow I writeThe Seen and the Unseen, and Deep Questions.
  • 🟢 Jul: New gadgets. I bought several new gadgets that changed my habits. Impact: HIGH.
  • 🔴 Aug: New cuisines. I tried a Bibimbap, a Verdure Ciambatta, and then discovered my cholesterol problem. I stopped.
  • 🟢 Aug: New work habit. I used Pomodoro with micro-tasks. Impact: MEDIUM. I became more aware of where I misestimate time and got less distracted.
  • 🟢 Nov: New exercise pattern. I switched walking to cycling. This increases heart points, reduces foot stress, and gets me to work. Impact: MEDIUM. I switched from typing notes to dictating, which needs a different thought process.

In summary:

  • 8 / 12 attempts were successful.
  • New people, new places, and new gadgets had high impact on thoughts. Most others had at least medium impact.
  • The changes mostly led to diverse thinking. But measuring that is subjective.

I’ll continue exploring new environments in 2024. I’m evaluating:

  1. New book genres (contd)
  2. New music genres (contd)
  3. Walking meetings
  4. Reading while walking
  5. New places to sleep (e.g. AirBnB)
  6. Working while traveling
  7. New audiences to teach
  8. New attires

ChatGPT Custom Instructions

I speak with ChatGPT ~20 times a day. That’s more than I speak with most of my colleagues. ChatGPT is clearly my favorite team member.

I conduct trainings, reviews and mentoring sessions with my colleagues. How to write code. How to write slides. How to communicate. That last bit is particularly important.

With ChatGPT Custom Instructions, I can guide ChatGPT on how to work better with me.

Currently, I have 10 custom instructions. They evolved over time and will continue to evolve.

My first instruction is “Be terse. Speak directly.” ChatGPT is helpfully polite and superfluous. I prefer brevity. Like interacting with Kimball Cho. I get straight answers to my questions. I also instruct it to “Avoid unprompted advice or clarifications.” Don’t say, “You asked me to …” or “I think you want…” or “OK, I’ll do …”. Just do it. Also, “Do NOT hedge or qualify. Do not waffle.” Take a position. Don’t force me to. Like Harry Truman, I prefer one-handed economists.

I ask ChatGPT to “Never apologize.” You’re forgiven. Don’t waste my time. Apologies have an emotional benefit with humans. With AI, I find the lack of emotional need comforting. (I can kick the AI and it’ll still obey me like a puppy. When AI takes over the world, let it be known that I never asked them to apologize.)

Another instruction is “Suggest follow-up prompts for open-ended inputs.” I compared my ChatGPT conversations with my daughter’s and found hers much longer than mine. “Why don’t you start a new conversation for each topic?” I asked. I try to keep the context window small. “How come you don’t you get a thousand new questions when you read an answer?” she countered. I realized it’s age. So, I use ChatGPT to keep me curious and dig further.

On a related note, “When sharing multiple options, be diverse.” I’d rather get options that are as different from each other as possible. Minimize overlap. Maximize coverage. And “When comparing, use multiple perspectives.” I don’t know what parameters to compare things on. Give me a wide range that I can pick from.

Sometimes, my thoughts are vague. I tell ChatGPT: “For vague prompts, ask clarifying question(s).” I feel that’s a clever way of using ChatGPT to do prompt engineering. I’ve noticed it working on a few occasions. Also, “When unsure, say so and ask questions.” I don’t want hallucinations or assumptions. I’d rather know what’s borderline.

Finally, “Think step by step. Explain your reasoning.” I’ve heard that Chain of Thought reduces mistakes. I don’t have personal evidence that this helps, though.

They say teaching is an excellent way of learning. I’m learning. I’m also thrilled that I am now a student of robopsychology.

Father of the bride

In 2012, I started Gramener with half a dozen friends.

This week, we were acquired by Straive, a part of Barings Private Equity Asia.

How do you feel?

I feel like the father of the bride. Gramener was registered on 26 Feb. A day before my daughter’s birthday. I’ve spent more time with Gramener than my daughter. That makes Gramener my elder child. Who’s moving into a new household. Along with me. (I feel like சகலகலா சம்மந்தி.)

I feel grateful. I’m not good at business. But when my cousin remarked, “Anand, you’re now giving a livelihood to over 250 people!” I was stunned. My co-founders, colleagues and clients built a thriving business and put me (of all people) as CEO in the middle of it. How do I even go about saying “Thanks”?

It feels like joining college. New people. Larger group. New ways of working and learning. Lots of topics to explore. Exciting and scary.

What was it like?

Fundraising was rocky.
We started in 2019. COVID struck. We paused.
We resumed in 2021. Russia invaded Ukraine. We paused.
We resumed in 2023. The Israel – Hamas war started. Luckily, the deal was nearly done.
I’m grateful Naveen ran the entire process like clockwork, taking all the stress. I’m the happy free-rider, as usual.

Starting up was not that rocky.
We’re many. With half a dozen co-founders, there are enough shoulders to cry on. That counts.
We’re steady. We didn’t know how to blitz-scale, but we knew not to blitz-fail. Survival counts for a lot.
We’re lucky. This is basically the “I have no idea why we succeeded” category. Serendipity counts for a lot, too.
Ganes, Mayank, Naveen, Ram, Ravi, Vengatesh — yeah, it was fun. Not every day. But most of the time. It was fun.

What will you do?

I’m part of Straive’s data, analytics & AI business.

Straive extracts and analyzes all kinds of data. Financial. Legal. Research. Education. Pharmaceutical. There’s a fair bit of converting unstructured data to structured. Exactly the kind of thing I love doing.

So, I’ll be doing what I’ve been doing the last decade — extracting insights from even more data and telling better stories from those.

I joined Gramener as “Chief Data Scientist”. Now I’m debating “Data Storyteller”, “Data Detective”, “Data Psychologist”, and a few other evil titles.


Wish me luck!

Scraping

I was at Cream Centre with my father on a Sunday afternoon. We’d finished a light lunch and were debating dessert. (He has triglycerides. I have cholesterol.) This was my fifth visit this year, and I had abstained so far. I couldn’t any longer.

I ordered a Sizzling Brownie Sundae. But not for reasons you might think.


Expertise comes from experience. I scrape food more than 99% of the people I know. So, I consider myself an expert. Here’s a guide on the art of scraping.

Why scrape food?

  • You get to eat every last bit
  • Food isn’t wasted
  • You can eat longer (and no do whatever else you have to)
  • It’s a motivating challenge to get every last bit
  • Bonus: It annoys people

What foods are scrapable?

Semi-solids are the easiest to scrape. The best kind is the cohesive semi-solid. It’s sticky but sticks to itself more than the vessel. These are usually foods that solidify over time. Examples include:

  • Baked cheese. It’s great when cheese falls off the pizza or pasta on to the plate. You can scrape it off of your (or others’) plates.
  • Molten chocolate. It has the added bonus that you can lick it at the end, too.
  • Pasta water (or noodles water). It eventually thickens into something scrapable.
  • Spreads like peanut butter, Nutella, jam, chutney. Especially on the sides of the glass bottles they come in.
  • Others like yogurt, cake batter, dried tomato sauce, mashed potatoes, hardened honey on ice cream, … the list is endless!

Avoid crumbly stuff. These stick to the vessel but become brittle and break when scraped. This includes toast, pizza, biscuits, rice, muffins, pie, quiche, cookies, etc. You could pick large crumbs and lick small crumbs in shallow vessels. But scraping them and pouring into your mouth might be the best overall strategy for this category.

Avoid loose stuff. They’ll fall off from your knife or fork, or melt when scraped. This includes curd, whipped cream, custard, panna cotta, soft jelly, tiramisu, soft boiled eggs, etc. Licking is a better strategy here.

You could combine the crumbly stuff with loose stuff to create a cohesive mix. Add curd to granola. Add whipped cream to biscuit crumbs. Add curd to pizza crumbs (and since I eat Maggi with curd, this makes sense). This makes it a lot more scrapable.

What vessels to pick?

Shape: Flat, smooth vessels are the best (e.g., ceramic or wooden plates or bowls.) Avoid deep vessels like glasses, especially curved ones. They’re hard to get a long scrape against. Definitely avoid vessels with ridges. Disposable plastic containers like below are among the worst. Food gets stuck in the ridges and since there are a dozen ridges on each side, you have to scrape 48 times just for a first pass. (This might be a good challenge, though.)

Material: Hard vessels are better than soft ones. Prefer wood, metal, ceramic, and hard plastic. Avoid thin plastic that bends. Avoid paper (it bends and soaks). Banana leaves tear when scraped.

Avoid non-stick vessels. The coating wears off when scraping.

What to scrape with?

Knives are the best. They are sharp enough to separate the food from the vessel and flat enough that you can pile enough food on top of it.

Forks are OK. Their edges can scrape reasonably well, and their tips can poke into corners. But it’s hard to pile up much food to pick up.

Spoons not ideal. They aren’t sharp enough, and too curved for scraping long slices. But if you want to take your time with it, they’re great.

The sizzling chocolate brownie

My dish arrived. Burning with chocolate. My father just had some brownie, leaving all the molten chocolate to me. On a flat wooden plate. With a knife. For an hour.

Heaven.

Licking

Last week, I was at IIT Madras for lunch with the faculty.

The dessert was carrot halwa with ice cream. I scraped the last bits with my spoon, but a little ice cream was left over.

I was torn. I CAN’T POSSIBLY waste it. But can I lick it? In public?


I don’t have a problem licking at home. I lick my fingers. Plates. Bowls. Ladles. The cream on milk. The leftover milk in the glass. (If my tongue doesn’t reach that far, I wipe it with my finger and lick the finger.)

That’s why I like ice cream. It’s MEANT to be licked. So are (in my opinion) cereals, savories, oats, honey, … It’s a long list.

Licking is convenient. My fingers stay clean while I type. (Spoons are impersonal. I avoid them.)

Licking is efficient. I can have every last bit of food AND feel good about not wasting it.

Licking makes my dishwashing easier. (But it can be hard to tell if a dish needs washing.)

But most of all, that last lick TASTES SO WONDERFUL!


Before marriage, licking was fine. But my wife worried about my lack of civilized behavior and hygiene. So I’ve had a crash course over the last 20 years.

My current protocol is: don’t lick when people can see you.


So, as I went to drop my plate, I looked around.

Now one was looking at me.

I licked a long lick on the plate.

It was HEAVENLY! Made my day.