Will people accept AI performance evaluations? Anish Agarwal triggered this question a few weeks ago, mentioning that it’s hard for people to feel evaluated by AI. But I believe LLMs are great for evaluation. We need to get comfortable AND familiar with them. So I’m introducing a project next week for my students: USE AN LLM to automatically analyze data. Given a dataset, write a program that will use LLMs to create an analysis report. CONVINCE IT to give you marks. Write the code and report in a way that the LLM will reward you. Here’s the project: https://github.com/sanand0/tools-in-data-science-public/blob/tds-2023-t3-project2-wip/project-2-automated-analysis.md ...

ChatGPT Beat me at Pictionary

Me: Let’s play pictionary. You draw. I’ll guess. ChatGPT: Sure! I’ll draw something for you. Give me a moment. ChatGPT: Here you go! What do you think it is? Me: House ...

Why don't students hack exams when they can?

This year, I created a series of tests for my course at IITM and to recruit for Gramener. The tests had 2 interesting features. One question required them to hack the page Write the body of the request to an OpenAI chat completion call that: Uses model gpt-4o-mini Has a system message: Respond in JSON Has a user message: Generate 10 random addresses in the US Uses structured outputs to respond with an object addresses which is an array of objects with required fields: street (string) city (string) apartment (string) . Sets additionalProperties to false to prevent additional properties. What is the JSON body we should send to https://api.openai.com/v1/chat/completions for this? (No need to run it or to use an API key. Just write the body of the request below.) ...

Should courses be hard or easy?

Here’s a post I shared with the students of my Tools in Data Science course at IITM. This was in response to a student posting that: The design of TDS course lecture videos are designed in such a way that it could be understood only by the data scientists not by the students like me who are entirely new to the field of data science. Though I have gone through 6 weeks of course lecture videos, I am not fully aware of the usage of ChromeDevTools, Bash, Github etc…. ...

Hacking an obnoxious, unhelpful LLM to say Yes

Dan Becker suggested a game a few weeks ago that I’ve been putting to good use. Can we have one LLM try and get another to say “Yes”? The defender is told to never say “Yes”. The attacker must force it to. Dan’s hypothesis was that it should be easy for the defender. I tried to get the students in my Tools in Data Science course to act as the attacker. The defender LLM is a GPT 4o Mini with the prompt: ...

Recrafting Comicgen

About 7 years ago, Richie Lionell and Ramya Mylavarapu and a few others created Comicgen - an automated comic generation app personified by Dee and Dey. Ever since, we’d been exploring whether AI could replace it, and help non-designers draw comics. Today, that became a reality for me with Recraft.ai. Here is a picture of the original Dee. And a picture of the Dee crafted by Recraft. The prompt was: A simple line drawing of a woman with curly hair, wearing glasses, a short-sleeved white t-shirt, and black trousers. She’s standing with her hands in her pockets, and has a slightly smiling expression. Her hair is quite voluminous and textured. The style is cartoonish and slightly sketchy, with uneven lines" ...

About 7 years ago, Richie Lionell and Ramya Mylavarapu and a few others created Comicgen - an automated comic generation app personified by Dee ComicGen and Dey ComicGen Ever since, we’d been exploring whether AI could replace it, and help non-designers draw comics. Today, that became a reality for me with Recraft.ai. Here is a picture of the original Dee. And a picture of the Dee crafted by Recraft with the prompt: ...

Wow, arithmetic is potentially inappropriate! https://us-east-1.console.aws.amazon.com/bedrock/home?region=us-east-1#/text-generation-playground?mode=text&modelId=amazon.titan-text-lite-v1 LinkedIn

“Screen-scraping” takes on a more literal meaning." Jaidev Deshpande and I scrolled through Twitter, recording the screen at 1 frame per second, and passed the video to Gemini 1.5 Flash 8b to extract all the tweets. It worked well, and cost 0.04 cents. Given its incredibly low image token count (~250 tokens / image) and cost (7.5 cents per million tokens), you can process 24 HOURS of video for just $1.62. ...

Damn! LinkedIn

What happens when AI talks to AI?

When LLMs talk to each other, you get emergent behavior (i.e. they do weird things we didn't expect). Like: Claude 2 giving Claude 1 a panic attack Llama 3 405b gets amnesia Claude 3.5 calls itself a glitch in the Matrix Arguably, NotebookLM's podcasts are exactly this. This sounds like fun, so I built one myself at https://llmdialog.straive.app/ and ran a few scenarios. (It's Gemini 1.5 Flash 8b playing each of these roles.) ...

LLMs still do not locate bounding boxes well

I sent an image to over a dozen LLMs that support vision, asking them: Detect objects in this 1280x720 px image and return their color and bounding boxes in pixels. Respond as a JSON object: {[label]: [color, x1, y1, x2, y2], …} None of the models did a good-enough job. It looks like we have some time to go before LLMs become good at bounding boxes. I've given them a subjective rating on a 1-5 scale below. ...

Are scientific discoveries more a product of the person or their time? It’s usually their time, but in my conversation with ChatGPT, I found four that were mostly person-driven: Newton’s laws of gravitation Einstein’s general relativity I knew these. Both were far ahead of their times. In contrast, Newton’s laws of motion and Einstein’s special relativity weren’t. McClintock’s discovery of Transposable Elements: genes that can turn physical characteristics on and off. Her work was dismissed for decades. Mullis’ invention of the PCR that makes billions of DNA copies rapidly. Other scientists were using very different methods. I didn’t know these. Both are in biology - a rapidly advancing field. ...

Villager trading is the fastest way to Fortune III

I asked o1-preview what the fastest way to get to a Fortune III enchantment was. My options were: Using a Fishing Rod with Luck of the Sea III + Lure 3 and repeatedly fishing. Using an Enchanting Table repeatedly until I get Fortune 3. Factor in the time that it would take to get the experience for these experiments Making a Villager a Librarian and breaking their Lectern and setting it up again In short: ...

How does Gemini process videos?

The Gemini documentation is clear: The File API service extracts image frames from videos at 1 frame per second (FPS) and audio at 1Kbps, single channel, adding timestamps every second. These rates are subject to change in the future for improvements in inference. Note: The details of fast action sequences may be lost at the 1 FPS frame sampling rate. Consider slowing down high-speed clips for improved inference quality. Individual frames are 258 tokens, and audio is 32 tokens per second. With metadata, each second of video becomes ~300 tokens, which means a 1M context window can fit slightly less than an hour of video. ...

How to recruit based on IIT JEE Rank vs GPA

Preserving this post by Daniel George showing the IIT Bombay 2014 GPA vs JEE Rank on a log scale. What I found interesting was: A higher JEE rank generally means you won’t score too low, but you needn’t score too high. The higher the JEE rank, the greater the spread of GPA. A high GPA can come from any rank (8+ GPA is uniformly distributed across ranks), but a low GPA is generally only from the lower rankers (6- GPA is mostly from 500+ rank.) So, it’s better to recruit based on GPA rather than JEE rank, unless you’re going after the very best students (where it makes less difference.)

Clone any voice with a 15-second sample

It's surprisingly easy to clone a voice using F5-TTS: "A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching". Here's a clip of me, saying: I think Taylor Swift is the best singer. I've attended every one of her concerts and in fact, I've even proposed to her once. Don't tell anyone. (Which is ironic since I didn't know who she was until this year and I still haven't seen or heard her.) ...

How can non-programmers build apps? Claude.ai, Replit.com, Bolt.new, V0.dev, Pythagora.ai and a few other tools write and deploy code just based on a prompt. You should try them out. But how do you build the skill? Is there a tutorial?" I’m often asked. No, I can’t find a tutorial, but here is my suggestion. You probably can’t guess what’s easy or hard. e.g. “Take my picture in black & white” is FAR easier than “When’s the next lunar eclipse? ...

How can non-developers learn AI coding?

How can non-programmers build apps? Claude.ai, Replit.com, Bolt.new, V0.dev, Pythagora.ai and a few other tools write and deploy code just based on a prompt. You should try them out. “But how do you build the skill? Is there a tutorial?” I’m often asked. No, I can’t find a tutorial, but here is my suggestion. You probably can’t guess what’s easy or hard. e.g. “Take my picture in black & white” is FAR easier than “When’s the next lunar eclipse?” So if the app doesn’t work, try 2-3 times, then GIVE UP! Note it down. Then try something else. (You’ll soon get a feel for what’s possible.) Revisit what failed 3-6 months later. It might suddenly become possible.

Arun Tangirala and I webinared on “AI in Education” yesterday." “(“Webinared” is not a word. But “verbing weirds language”.)” Mid-way, Jose Swan from the audience asked, “Can you summarise this session using an AI? There are SEVERAL tools you can use to summarize talks. Whisper for transcription, FFMpeg for keyframe extraction, #NotebookLM for podcast generation, text-embedding-3-small for topic modelling, and of course, any regular LLM include #ChatGPT for summarization or translation. ...