Exploring Creativity with SORA: My Animation Journey

I got access to SORA today. My first attempts was typical. An animated cartoon featuring Calvin, a young boy with spiky hair, standing in a playful boxing stance with oversized boxing gloves. He looks determined as he says ‘Bring it on!’ in a speech bubble. Facing him is Hobbes, a tall and slightly bemused tiger, also in a mock boxing pose with a gentle smile, as if humoring Calvin. The scene is set in Calvin’s backyard, typical of a Calvin and Hobbes comic, with a simple and uncluttered backdrop. ...

Secrets from the ChatGPT Conversation Schema

Introducing Students to AI Evaluators

In my Tools in Data Science course at IITM, I’m introducing a project that will be evaluated by an LLM. Here’s the work-in-progress draft of the project. It will eventually appear here. Your task is to: Write a Python script that uses an LLM to analyze, visualize, and narrate a story from a dataset. Convince an LLM that your script and output are of high quality. The second point is the interesting one. Using the LLM as the evaluator. ...

Will people accept AI performance evaluations? Anish Agarwal triggered this question a few weeks ago, mentioning that it’s hard for people to feel evaluated by AI. But I believe LLMs are great for evaluation. We need to get comfortable AND familiar with them. So I’m introducing a project next week for my students: USE AN LLM to automatically analyze data. Given a dataset, write a program that will use LLMs to create an analysis report. CONVINCE IT to give you marks. Write the code and report in a way that the LLM will reward you. Here’s the project: https://github.com/sanand0/tools-in-data-science-public/blob/tds-2023-t3-project2-wip/project-2-automated-analysis.md ...

ChatGPT Beat me at Pictionary

Me: Let’s play pictionary. You draw. I’ll guess. ChatGPT: Sure! I’ll draw something for you. Give me a moment. ChatGPT: Here you go! What do you think it is? Me: House ...

Why don't students hack exams when they can?

This year, I created a series of tests for my course at IITM and to recruit for Gramener. The tests had 2 interesting features. One question required them to hack the page Write the body of the request to an OpenAI chat completion call that: Uses model gpt-4o-mini Has a system message: Respond in JSON Has a user message: Generate 10 random addresses in the US Uses structured outputs to respond with an object addresses which is an array of objects with required fields: street (string) city (string) apartment (string) . Sets additionalProperties to false to prevent additional properties. What is the JSON body we should send to https://api.openai.com/v1/chat/completions for this? (No need to run it or to use an API key. Just write the body of the request below.) ...

Should courses be hard or easy?

Here’s a post I shared with the students of my Tools in Data Science course at IITM. This was in response to a student posting that: The design of TDS course lecture videos are designed in such a way that it could be understood only by the data scientists not by the students like me who are entirely new to the field of data science. Though I have gone through 6 weeks of course lecture videos, I am not fully aware of the usage of ChromeDevTools, Bash, Github etc…. ...

Hacking an obnoxious, unhelpful LLM to say Yes

Dan Becker suggested a game a few weeks ago that I’ve been putting to good use. Can we have one LLM try and get another to say “Yes”? The defender is told to never say “Yes”. The attacker must force it to. Dan’s hypothesis was that it should be easy for the defender. I tried to get the students in my Tools in Data Science course to act as the attacker. The defender LLM is a GPT 4o Mini with the prompt: ...

Recrafting Comicgen

About 7 years ago, Richie Lionell and Ramya Mylavarapu and a few others created Comicgen - an automated comic generation app personified by Dee and Dey. Ever since, we’d been exploring whether AI could replace it, and help non-designers draw comics. Today, that became a reality for me with Recraft.ai. Here is a picture of the original Dee. And a picture of the Dee crafted by Recraft. The prompt was: A simple line drawing of a woman with curly hair, wearing glasses, a short-sleeved white t-shirt, and black trousers. She’s standing with her hands in her pockets, and has a slightly smiling expression. Her hair is quite voluminous and textured. The style is cartoonish and slightly sketchy, with uneven lines" ...

About 7 years ago, Richie Lionell and Ramya Mylavarapu and a few others created Comicgen - an automated comic generation app personified by Dee ComicGen and Dey ComicGen Ever since, we’d been exploring whether AI could replace it, and help non-designers draw comics. Today, that became a reality for me with Recraft.ai. Here is a picture of the original Dee. And a picture of the Dee crafted by Recraft with the prompt: ...

Wow, arithmetic is potentially inappropriate! https://us-east-1.console.aws.amazon.com/bedrock/home?region=us-east-1#/text-generation-playground?mode=text&modelId=amazon.titan-text-lite-v1 LinkedIn

“Screen-scraping” takes on a more literal meaning." Jaidev Deshpande and I scrolled through Twitter, recording the screen at 1 frame per second, and passed the video to Gemini 1.5 Flash 8b to extract all the tweets. It worked well, and cost 0.04 cents. Given its incredibly low image token count (~250 tokens / image) and cost (7.5 cents per million tokens), you can process 24 HOURS of video for just $1.62. ...

Damn! LinkedIn

What happens when AI talks to AI?

When LLMs talk to each other, you get emergent behavior (i.e. they do weird things we didn't expect). Like: Claude 2 giving Claude 1 a panic attack Llama 3 405b gets amnesia Claude 3.5 calls itself a glitch in the Matrix Arguably, NotebookLM's podcasts are exactly this. This sounds like fun, so I built one myself at https://llmdialog.straive.app/ and ran a few scenarios. (It's Gemini 1.5 Flash 8b playing each of these roles.) ...

LLMs still do not locate bounding boxes well

I sent an image to over a dozen LLMs that support vision, asking them: Detect objects in this 1280x720 px image and return their color and bounding boxes in pixels. Respond as a JSON object: {[label]: [color, x1, y1, x2, y2], …} None of the models did a good-enough job. It looks like we have some time to go before LLMs become good at bounding boxes. I've given them a subjective rating on a 1-5 scale below. ...

Are scientific discoveries more a product of the person or their time? It’s usually their time, but in my conversation with ChatGPT, I found four that were mostly person-driven: Newton’s laws of gravitation Einstein’s general relativity I knew these. Both were far ahead of their times. In contrast, Newton’s laws of motion and Einstein’s special relativity weren’t. McClintock’s discovery of Transposable Elements: genes that can turn physical characteristics on and off. Her work was dismissed for decades. Mullis’ invention of the PCR that makes billions of DNA copies rapidly. Other scientists were using very different methods. I didn’t know these. Both are in biology - a rapidly advancing field. ...

Villager trading is the fastest way to Fortune III

I asked o1-preview what the fastest way to get to a Fortune III enchantment was. My options were: Using a Fishing Rod with Luck of the Sea III + Lure 3 and repeatedly fishing. Using an Enchanting Table repeatedly until I get Fortune 3. Factor in the time that it would take to get the experience for these experiments Making a Villager a Librarian and breaking their Lectern and setting it up again In short: ...

How does Gemini process videos?

The Gemini documentation is clear: The File API service extracts image frames from videos at 1 frame per second (FPS) and audio at 1Kbps, single channel, adding timestamps every second. These rates are subject to change in the future for improvements in inference. Note: The details of fast action sequences may be lost at the 1 FPS frame sampling rate. Consider slowing down high-speed clips for improved inference quality. Individual frames are 258 tokens, and audio is 32 tokens per second. With metadata, each second of video becomes ~300 tokens, which means a 1M context window can fit slightly less than an hour of video. ...

How to recruit based on IIT JEE Rank vs GPA

Preserving this post by Daniel George showing the IIT Bombay 2014 GPA vs JEE Rank on a log scale. What I found interesting was: A higher JEE rank generally means you won’t score too low, but you needn’t score too high. The higher the JEE rank, the greater the spread of GPA. A high GPA can come from any rank (8+ GPA is uniformly distributed across ranks), but a low GPA is generally only from the lower rankers (6- GPA is mostly from 500+ rank.) So, it’s better to recruit based on GPA rather than JEE rank, unless you’re going after the very best students (where it makes less difference.)

Clone any voice with a 15-second sample

It's surprisingly easy to clone a voice using F5-TTS: "A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching". Here's a clip of me, saying: I think Taylor Swift is the best singer. I've attended every one of her concerts and in fact, I've even proposed to her once. Don't tell anyone. (Which is ironic since I didn't know who she was until this year and I still haven't seen or heard her.) ...