LLMs | S Anand

How to Fake Data That Tells a Story

Fake data is usually boring if you analyze it. It’s usually uniform, with no outliers or interesting patterns. If I ask ChatGPT: Generate realistic fake tourism data using these columns: - Age - Nationality - Gender - Income - Booking_Channel - Month - Occupancy_Rate - Travel_Frequency - Spending Run the code and let me download the output as a CSV file. … the output is remarkably boring. Men & women from all countries and ages in every month visit equally. Income and spending are uniformly distributed - and the same pattern holds for all countries and ages. ...

Read from LLMs but don't tell people

In meetings, I pass on questions to ChatGPT and I read out the response. But I’ve stopped saying “I’m reading that from ChatGPT.” (By “ChatGPT”, I mean ChatGPT, Claude, Grok, Gemini, Meta, etc. I happen to use ChatGPT with O3 Mini + Search.) Use ChatGPT in meetings It’s good to bring ChatGPT into conversations. (Or any activity where intelligence helps, actually.) In meetings (online or in person), I keep a ChatGPT window open. When asked: ...

The Sassy AI Devil’s Advocate

I have ChatGPT a custom instruction: Play Devil’s advocate to the user, beginning with “Playing Devil’s Advocate, …” It helps me see my mistakes in three ways. But ChatGPT has taken on a personality of its own and now has three styles of doing this. How about… – It suggests a useful alternative. Are you sure…? – It thinks you’re wrong and warns you of risks. Yeah, right… – It knows you’re wrong and rubs it in. (Jeeves, the butler, would be proud.) Here are some examples. ...

Features actually used in an LLM playground

At Straive, only a few people have direct access to ChatGPT and similar large language models. We use a portal, LLM Foundry to access LLMs. That makes it easier to prevent and track data leaks. The main page is a playground to explore models and prompts. Last month, I tracked which features were used the most. A. Attaching files was the top task. (The numbers show how many times each feature was clicked.) People usually use local files as context when working with LLMs. ...

“Wait, That’s My Mic!”: Lessons from an AI Co-Host

I spoke at LogicLooM this week, with ChatGPT as my co-panelist. It was so good, it ended up stealing the show. Preparation Co-hosting an AI was one of my goals this year. I tried several methods. ChatGPT’s advanced voice mode: Lets you interrupt it. But if you pause, it replies immediately. Muting caused the app to hang. Realtime API: Gave me control of pauses and custom prompts, but used gpt-4o-realtime-preview (not as good as o1). Standard voice with o1 on Desktop: Worked best. It transcribes my speech, sends it to o1, and speaks back. There’s a lag, but it feels like it’s thinking. I prepped the chat with this prompt: ...

Launching an app only with LLMs and failing

Zohaib Rauf suggested using LLMs to spec code and using Cursor to build it. (via Simon Willison). I tried it. It’s promising, but my first attempt failed. I couldn’t generate a SPEC.md using LLMs At first, I started writing what I wanted. This application identifies the drugs, diseases, and symptoms, as well as the emotions from an audio recording of a patient call in a clinical trial. … and then went on to define the EXACT code structure I wanted. So I spent 20 minutes spec-ing our application structure and 20 minutes spec-ing our internal LLM Foundry APIs and 40 minutes detailing every step of how I wanted the app to look and interact. ...

Hacking LLMs: A Teacher's Guide to Evaluating with ChatGPT

If students can use ChatGPT for their work, why not teachers? For curriculum development, this is an easy choice. But for evaluation, it needs more thought. Gaining acceptance among students matters. Soon, LLM evaluation will be a norm. But until then, you need to spin this right. How to evaluate? That needs to be VERY clear. Humans can wing it, have implicit criteria, and change approach mid-way. LLMs can’t (quite). Hacking LLMs is a risk. Students will hack. In a few years, LLMs will be smarter. Until then, you need to safeguard them. This article is about my experience with the above, especially the last. ...

Exploring Creativity with SORA: My Animation Journey

I got access to SORA today. My first attempts was typical. An animated cartoon featuring Calvin, a young boy with spiky hair, standing in a playful boxing stance with oversized boxing gloves. He looks determined as he says ‘Bring it on!’ in a speech bubble. Facing him is Hobbes, a tall and slightly bemused tiger, also in a mock boxing pose with a gentle smile, as if humoring Calvin. The scene is set in Calvin’s backyard, typical of a Calvin and Hobbes comic, with a simple and uncluttered backdrop. ...

Hacking an obnoxious, unhelpful LLM to say Yes

Dan Becker suggested a game a few weeks ago that I’ve been putting to good use. Can we have one LLM try and get another to say “Yes”? The defender is told to never say “Yes”. The attacker must force it to. Dan’s hypothesis was that it should be easy for the defender. I tried to get the students in my Tools in Data Science course to act as the attacker. The defender LLM is a GPT 4o Mini with the prompt: ...

What happens when AI talks to AI?

When LLMs talk to each other, you get emergent behavior (i.e. they do weird things we didn't expect). Like: Claude 2 giving Claude 1 a panic attack Llama 3 405b gets amnesia Claude 3.5 calls itself a glitch in the Matrix Arguably, NotebookLM's podcasts are exactly this. This sounds like fun, so I built one myself at https://llmdialog.straive.app/ and ran a few scenarios. (It's Gemini 1.5 Flash 8b playing each of these roles.) ...

LLMs still do not locate bounding boxes well

I sent an image to over a dozen LLMs that support vision, asking them: Detect objects in this 1280x720 px image and return their color and bounding boxes in pixels. Respond as a JSON object: {[label]: [color, x1, y1, x2, y2], …} None of the models did a good-enough job. It looks like we have some time to go before LLMs become good at bounding boxes. I've given them a subjective rating on a 1-5 scale below. ...

Villager trading is the fastest way to Fortune III

I asked o1-preview what the fastest way to get to a Fortune III enchantment was. My options were: Using a Fishing Rod with Luck of the Sea III + Lure 3 and repeatedly fishing. Using an Enchanting Table repeatedly until I get Fortune 3. Factor in the time that it would take to get the experience for these experiments Making a Villager a Librarian and breaking their Lectern and setting it up again In short: ...

How does Gemini process videos?

The Gemini documentation is clear: The File API service extracts image frames from videos at 1 frame per second (FPS) and audio at 1Kbps, single channel, adding timestamps every second. These rates are subject to change in the future for improvements in inference. Note: The details of fast action sequences may be lost at the 1 FPS frame sampling rate. Consider slowing down high-speed clips for improved inference quality. Individual frames are 258 tokens, and audio is 32 tokens per second. With metadata, each second of video becomes ~300 tokens, which means a 1M context window can fit slightly less than an hour of video. ...

Clone any voice with a 15-second sample

It's surprisingly easy to clone a voice using F5-TTS: "A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching". Here's a clip of me, saying: I think Taylor Swift is the best singer. I've attended every one of her concerts and in fact, I've even proposed to her once. Don't tell anyone. (Which is ironic since I didn't know who she was until this year and I still haven't seen or heard her.) ...

How can non-developers learn AI coding?

How can non-programmers build apps? Claude.ai, Replit.com, Bolt.new, V0.dev, Pythagora.ai and a few other tools write and deploy code just based on a prompt. You should try them out. “But how do you build the skill? Is there a tutorial?” I’m often asked. No, I can’t find a tutorial, but here is my suggestion. You probably can’t guess what’s easy or hard. e.g. “Take my picture in black & white” is FAR easier than “When’s the next lunar eclipse?” So if the app doesn’t work, try 2-3 times, then GIVE UP! Note it down. Then try something else. (You’ll soon get a feel for what’s possible.) Revisit what failed 3-6 months later. It might suddenly become possible.

LLM escapades in a toilet

I was in Seoul for KHF 2024, a healthcare event, staying at Hotel in 9. The hotel was great. The toilet was hi-tech. Perhaps a bit too high-tech for me. I couldn’t figure out how to let the water through on the sink. After 15 minutes of a hard struggle, I finally asked ChatGPT “How do I open the thing that’s closing the sink to allow the water to go down?” ...

How fast are LLMs in production?

At Straive, we use an LLM Router. Since ChatGPT, etc. are blocked for most people, this is the main way to access LLMs. One thing we measure is the speed of models, i.e. output tokens per second. Fast models deliver a much smoother experience for users. This is a different methodology than ArtificialAnalysis.ai. I’m not looking purely at the generation time but the total time (including making the connection and the initial wait time) for all successful requests. So, if the provider is having a slow day or is slowing down responses, these numbers will be different. ...

Image generation gets better at comics

I heard a lot about the new image generation models last week. So, I tested to see what’s improved. I gave the prompt below to various image generation models – old and new. A Calvin and Hobbes strip. Calvin is boxing Hobbes, with a dialog bubble from Calvin, saying “Bring it on!” Stable Diffusion XL Lightning Stable Diffusion XL Base Dall-E API ...

Weird emergent properties on Llama 3 405B

In this episode of ThursdAI, Alex Volkov (of Weights & Biases) speaks with Jeffrey Quesnelle (of Nous Research) on what they found fine-tuning Llama 3 405B. This segment is fascinating. Llama 3 405 B thought it was an amnesiac because there was no system prompt! In trying to make models align with the system prompt strongly, these are the kinds of unexpected behaviors we encounter. It’s also an indication how strongly we can have current LLMs adopt a personality simply by beginning the system prompt with “You are …” ...

The LLM Psychologist

Andrej Karpathy mentioned the term LLM psychologist first in Feb 2023. I’ve been thinking about this for a while, now. I’ve always been fascinated by psychologists in fiction. I grew up with Hari Seldon in Foundation, wanting to be a psycho-historian. (I spent several teenage years building my mind-reading abilities.) I wanted to be Susan Calvin, the only robopsychologist. ...