Which is the most neurotic / emotional #LLM? I ran the Big 5 personality test on a bunch of LLMs (for my TEDx MDI Gurgaon talk in August.) Here are the results. https://sanand0.github.io/llmpersonality/ Claude 3 Haiku and Llama 3 8b consider themselves the most emotional models. In fact, some of Llama 3 8b’s quotes are hilarious: Get stressed out easily. - 4. Moderately Accurate (I can get stressed, but I’m working on managing my stress levels) ...
October 6, 2024
Things I Learned - 06 Oct 2024
This week, I learned: ffmpeg on WASM works but is unstable and hard to use. You can’t use it in a CDN without CORS issues, since it loads ffmpeg-core via a worker. It often runs into buffer allocation issues. Exotel and Plivo provide voice & SMS services in India (like Twilio). Plivo is more customer friendly. Uber’s H3, Google’s S2, and GeoHash are geocoding systems. H3 offers uniform cell sizes and better distance measurement S2 offers higher precision (factoring in Earth’s curvature) for exact location matches GeoHash is the simplest There’s a movement towards embeddable databases on the cloud. MotherDuck is hosted DuckDB. Turso is hosted SQLite (with local sync, multi-tenant) StarBase DB is SQLite with an API on top of Cloudflare Durable Objects. Software 2.0 by Andrej Karpathy. This is fundamentally altering the programming paradigm by which we iterate on our software, as the teams split in two: the 2.0 programmers (data labelers) edit and grow the datasets, while a few 1.0 programmers maintain and iterate on the surrounding training code infrastructure, analytics, visualizations and labeling interfaces. Adaptive UI ideas: Adaptive Fields: Show only required fields based on what the user field so far. Smart Inputs: Dropdowns and auto-complete based on user’s context. Smart Themes: Change font size, contrast, theme guessing the user’s age and preferences. Dynamic Menus: Show what they might need to do next. Like Nokia’s right button, but using LLMs. Smart Tooltips: Check what the user’s doing (delays, confusions, previous clicks, current actions) and show relevant tips. Personalized Layout: Show only the relevant sections of the app. E.g. based on what they’re doing. Smart Charts: Create the right chart that solve the user’s question. Adaptive Back-end Dynamic APIs: Create endpoints on the fly based on user needs Dynamic Indexing: Create & update indices on the fly based on user needs Dynamic Schema: Create & update schema on the fly based on user needs Dynamic Migration: Migrate to a new database or OS or language as required Dynamic Queries: Create SQL/NoSQL queries to solve the user problem Dynamic RBAC: Figure out who needs permissions and why. Add OR REMOVE access as required Dynamic Logging. Log what’s required. Explain why it’s logged and what’s happening. Fix code that raised the error Dynamic Caching. Cache what’s likely to be required. Evict what may not be required. Figure out cache keys. Aider LLM Leaderboards show which LLMs code better. As of now, o1-preview > claude-3.5 sonnet on code editing claude-3-opus > claude-3.5-sonnet on code refactoring deepseek-coder-v gpt-4o-mini sucks. Jaro-Winkler Distance is a string matching algorithm that weights the start of a string higher. Passing the feed of the following to NotebookLLM is a good way to get caught up with news and summaries. A blog / WhatsApp group (e.g. The Generative AI Group, Sithamalli, etc.) A Google Group / mailing list (e.g. genainews, datameet) YouTube channels (e.g. Vertiasium, GitHub) Hacker News top stories Research papers Emails (skipping marketing emails) OpenAI Evals and Distillation has a clever design. They just convert filtered history to .JSONL files that can be an input to either. Speak is a language learning app based on OpenAI’s Realtime API. OpenAI’s Realtime API can be used in a text-to-text chat mode without needing to send the entire context. If the pricing works out right, this can be far cheaper than sending the entire conversation context. Ref Matching addresses with just embeddings works well. Combine it with simple hard rules. Ref OpenAI’s prompt caching works for images too – both linked and embedded Quotes on Graph RAG from a Generative AI WhatsApp Group. “Damn so literally nobody uses Graph RAG yet. Good to know.” ~Sumba “A big four consulting firm uses GraphRAG to retrieve related documents and excerpts from governance and compliance docs.” ~Vinayak Hegde (Microsoft) “Graph RAG is expensive and unnecessary in most of the cases.” ~Utkarsh Saxena ChatGPT’s advanced mode includes: “…you can use various regional accents and dialects.” Ref Source But the API can “laugh, whisper, and adhere to tone direction.” Ref Hume API (INR 6/min) is far cheaper than OpenAI’s real-time chat (6c/min input + 24c/min output) Devika is an open-source clone of Devin. DuckDB runs inside Pyodide Hungarian Jews have genetic diseases that increase their IQ. Gaucher’s disease, Torsion dystonia. People don’t like hard stuff like maths or science, so richer societies have fewer scientists Ethan Mollick feels Claude 3.5 Sonnet is better at style and critiquing blog posts than OpenAI’s o1 (which is better at reasoning.) News is going to be crazily disrupted again with voice mode. I can just listen to the topic I want In Singapore Airlines, You can’t wear your seatbelt loose You have to keep the laptop in the pocket in front, not on your lap, during takeoff You can’t charge during takeoff They verify if you ask for a veg meal and place a sticker on your seat Coders are more likely to edit LLM code. Non-coders don’t have that bad habit. Vaishnavi and Ranjeet edited code Indal and Koustav didn’t Coders are likely to get more out of an LLM because they know what it can do. But some non-coders will get more out of an LLM because they don’t know what it can’t do. E.g. Indal trying for a confetti animation, which is hard but do-able “You have to put in a lot of work to become productive at AI coding.” Simon Willison