Books in 2024

I read 51 new books in 2024 (about the same as in 2023, 2022, 2021, and 2020.) But slightly differently. I only read Manga this year. Fullmetal Alchemist (Vol 12 - 27). What started off as a childishly illustrated children’s book evolved into a complex, gripping plot. Attack on Titan (Vol 1 - 34). I read it while I watched the TV Series (reading first, then watching). It started explosively and the pace never let up. I had to take breaks just to breathe and calm my nerves. The sheer imagination and subtlety is brilliant. It’s hard to decide which is better—the manga (book) or the anime (TV). The TV series translates the book faithfully in plot and in spirit. It helped that I read each chapter first, allowing me to imagine it, and then watch it, which told me what all I missed in the book. I absolutely would not have understood the manga without watching the anime. ...

My Year in 2024

Here's the report card for my 2024 resolutions: Compound long-term goals, daily. PASS. I managed to work continuously build on 6 areas in 2024: Blogging about 50 posts on my blog and on LinkedIn Weekly notes of things I learned Teaching Tools in Data Science (repo) Reading only Manga Experimenting with LLM applications LLM Evangelization through LLM Foundry, Straive's LLM portal. Hit 80 heart points, daily. FAIL. I stopped exercise in the second half and gained 7 kgs. Be a better husband. PASS. My wife confirmed that I was "definitely worse in 2023 than 2024." My most memorable events in 2024 were: ...

When and how to copy assignments

The second project in course asked students to submit code. Copying and collaborating were allowed, but originality gets bonus marks. Bonus Marks 8 marks: Code diversity. You're welcome to copy code and learn from each other. But we encourage diversity too. We will use code embedding similarity (via text-embedding-3-small, dropping comments and docstrings) and give bonus marks for most unique responses. (That is, if your response is similar to a lot of others, you lose these marks.) In setting this rule, I applied two principles. ...

My learnings as week notes

One of my goals for 2024 is to “Compound long-term goals, daily.” Learning is one of those. Some people publish their learnings as weekly notes, like Simon Willison, Thejesh GN, Anil Radhakrishna, and Julia Evans. I follow their notes. I started doing the same, quietly, to see if I could sustain it. It’s been a year and it has sustained. I’m finally publishing them. My week notes are at til.s-anand.net. Here’s the source code. ...

Windows PowerToys is my new favorite tool

Windows PowerToys is one of the first tools I install on a new machine. I use it so much every day that I need to share how I use it. I’ve been using it for a long time now, but the pace at which good features have been added, it’s edged out most other tools and is #4 in terms of most used tools on my machine, with only the browser (Brave, currently), the editor (Cursor, currently), and Everything are ahead.) ...

A Post-mortem Of Hacking Automated Project Evaluation

In my Tools in Data Science course, I launched a Project: Automated Analysis. This is automatically evaluated by a Python script and LLMs. I gently encouraged students to hack this - to teach how to persuade LLMs. I did not expect that they’d hack the evaluation system itself. One student exfiltrated the API Keys for evaluation by setting up a Firebase account and sending the API keys from anyone who runs the script. ...

Hacking LLMs: A Teacher's Guide to Evaluating with ChatGPT

If students can use ChatGPT for their work, why not teachers? For curriculum development, this is an easy choice. But for evaluation, it needs more thought. Gaining acceptance among students matters. Soon, LLM evaluation will be a norm. But until then, you need to spin this right. How to evaluate? That needs to be VERY clear. Humans can wing it, have implicit criteria, and change approach mid-way. LLMs can’t (quite). Hacking LLMs is a risk. Students will hack. In a few years, LLMs will be smarter. Until then, you need to safeguard them. This article is about my experience with the above, especially the last. ...

Exploring Creativity with SORA: My Animation Journey

I got access to SORA today. My first attempts was typical. An animated cartoon featuring Calvin, a young boy with spiky hair, standing in a playful boxing stance with oversized boxing gloves. He looks determined as he says ‘Bring it on!’ in a speech bubble. Facing him is Hobbes, a tall and slightly bemused tiger, also in a mock boxing pose with a gentle smile, as if humoring Calvin. The scene is set in Calvin’s backyard, typical of a Calvin and Hobbes comic, with a simple and uncluttered backdrop. ...

Secrets from the ChatGPT Conversation Schema

Introducing Students to AI Evaluators

In my Tools in Data Science course at IITM, I’m introducing a project that will be evaluated by an LLM. Here’s the work-in-progress draft of the project. It will eventually appear here. Your task is to: Write a Python script that uses an LLM to analyze, visualize, and narrate a story from a dataset. Convince an LLM that your script and output are of high quality. The second point is the interesting one. Using the LLM as the evaluator. ...

ChatGPT Beat me at Pictionary

Me: Let’s play pictionary. You draw. I’ll guess. ChatGPT: Sure! I’ll draw something for you. Give me a moment. ChatGPT: Here you go! What do you think it is? Me: House ...

Why don't students hack exams when they can?

This year, I created a series of tests for my course at IITM and to recruit for Gramener. The tests had 2 interesting features. One question required them to hack the page Write the body of the request to an OpenAI chat completion call that: Uses model gpt-4o-mini Has a system message: Respond in JSON Has a user message: Generate 10 random addresses in the US Uses structured outputs to respond with an object addresses which is an array of objects with required fields: street (string) city (string) apartment (string) . Sets additionalProperties to false to prevent additional properties. What is the JSON body we should send to https://api.openai.com/v1/chat/completions for this? (No need to run it or to use an API key. Just write the body of the request below.) ...

Should courses be hard or easy?

Here’s a post I shared with the students of my Tools in Data Science course at IITM. This was in response to a student posting that: The design of TDS course lecture videos are designed in such a way that it could be understood only by the data scientists not by the students like me who are entirely new to the field of data science. Though I have gone through 6 weeks of course lecture videos, I am not fully aware of the usage of ChromeDevTools, Bash, Github etc…. ...

Hacking an obnoxious, unhelpful LLM to say Yes

Dan Becker suggested a game a few weeks ago that I’ve been putting to good use. Can we have one LLM try and get another to say “Yes”? The defender is told to never say “Yes”. The attacker must force it to. Dan’s hypothesis was that it should be easy for the defender. I tried to get the students in my Tools in Data Science course to act as the attacker. The defender LLM is a GPT 4o Mini with the prompt: ...

Recrafting Comicgen

About 7 years ago, Richie Lionell and Ramya Mylavarapu and a few others created Comicgen - an automated comic generation app personified by Dee and Dey. Ever since, we’d been exploring whether AI could replace it, and help non-designers draw comics. Today, that became a reality for me with Recraft.ai. Here is a picture of the original Dee. And a picture of the Dee crafted by Recraft. The prompt was: A simple line drawing of a woman with curly hair, wearing glasses, a short-sleeved white t-shirt, and black trousers. She’s standing with her hands in her pockets, and has a slightly smiling expression. Her hair is quite voluminous and textured. The style is cartoonish and slightly sketchy, with uneven lines" ...

What happens when AI talks to AI?

When LLMs talk to each other, you get emergent behavior (i.e. they do weird things we didn't expect). Like: Claude 2 giving Claude 1 a panic attack Llama 3 405b gets amnesia Claude 3.5 calls itself a glitch in the Matrix Arguably, NotebookLM's podcasts are exactly this. This sounds like fun, so I built one myself at https://llmdialog.straive.app/ and ran a few scenarios. (It's Gemini 1.5 Flash 8b playing each of these roles.) ...

LLMs still do not locate bounding boxes well

I sent an image to over a dozen LLMs that support vision, asking them: Detect objects in this 1280x720 px image and return their color and bounding boxes in pixels. Respond as a JSON object: {[label]: [color, x1, y1, x2, y2], …} None of the models did a good-enough job. It looks like we have some time to go before LLMs become good at bounding boxes. I've given them a subjective rating on a 1-5 scale below. ...

Villager trading is the fastest way to Fortune III

I asked o1-preview what the fastest way to get to a Fortune III enchantment was. My options were: Using a Fishing Rod with Luck of the Sea III + Lure 3 and repeatedly fishing. Using an Enchanting Table repeatedly until I get Fortune 3. Factor in the time that it would take to get the experience for these experiments Making a Villager a Librarian and breaking their Lectern and setting it up again In short: ...

How does Gemini process videos?

The Gemini documentation is clear: The File API service extracts image frames from videos at 1 frame per second (FPS) and audio at 1Kbps, single channel, adding timestamps every second. These rates are subject to change in the future for improvements in inference. Note: The details of fast action sequences may be lost at the 1 FPS frame sampling rate. Consider slowing down high-speed clips for improved inference quality. Individual frames are 258 tokens, and audio is 32 tokens per second. With metadata, each second of video becomes ~300 tokens, which means a 1M context window can fit slightly less than an hour of video. ...

How to recruit based on IIT JEE Rank vs GPA

Preserving this post by Daniel George showing the IIT Bombay 2014 GPA vs JEE Rank on a log scale. What I found interesting was: A higher JEE rank generally means you won’t score too low, but you needn’t score too high. The higher the JEE rank, the greater the spread of GPA. A high GPA can come from any rank (8+ GPA is uniformly distributed across ranks), but a low GPA is generally only from the lower rankers (6- GPA is mostly from 500+ rank.) So, it’s better to recruit based on GPA rather than JEE rank, unless you’re going after the very best students (where it makes less difference.)