This week, I learned:

  • ffmpeg on WASM works but is unstable and hard to use.
    • You can’t use it in a CDN without CORS issues, since it loads ffmpeg-core via a worker.
    • It often runs into buffer allocation issues.
  • Exotel and Plivo provide voice & SMS services in India (like Twilio). Plivo is more customer friendly.
  • Uber’s H3, Google’s S2, and GeoHash are geocoding systems.
    • H3 offers uniform cell sizes and better distance measurement
    • S2 offers higher precision (factoring in Earth’s curvature) for exact location matches
    • GeoHash is the simplest
  • There’s a movement towards embeddable databases on the cloud.
    • MotherDuck is hosted DuckDB.
    • Turso is hosted SQLite (with local sync, multi-tenant)
    • StarBase DB is SQLite with an API on top of Cloudflare Durable Objects.
  • Software 2.0 by Andrej Karpathy.
    • This is fundamentally altering the programming paradigm by which we iterate on our software, as the teams split in two:
      • the 2.0 programmers (data labelers) edit and grow the datasets, while
      • a few 1.0 programmers maintain and iterate on the surrounding training code infrastructure, analytics, visualizations and labeling interfaces.
    • Adaptive UI ideas:
      • Adaptive Fields: Show only required fields based on what the user field so far.
      • Smart Inputs: Dropdowns and auto-complete based on user’s context.
      • Smart Themes: Change font size, contrast, theme guessing the user’s age and preferences.
      • Dynamic Menus: Show what they might need to do next. Like Nokia’s right button, but using LLMs.
      • Smart Tooltips: Check what the user’s doing (delays, confusions, previous clicks, current actions) and show relevant tips.
      • Personalized Layout: Show only the relevant sections of the app. E.g. based on what they’re doing.
      • Smart Charts: Create the right chart that solve the user’s question.
    • Adaptive Back-end
      • Dynamic APIs: Create endpoints on the fly based on user needs
      • Dynamic Indexing: Create & update indices on the fly based on user needs
      • Dynamic Schema: Create & update schema on the fly based on user needs
      • Dynamic Migration: Migrate to a new database or OS or language as required
      • Dynamic Queries: Create SQL/NoSQL queries to solve the user problem
      • Dynamic RBAC: Figure out who needs permissions and why. Add OR REMOVE access as required
      • Dynamic Logging. Log what’s required. Explain why it’s logged and what’s happening. Fix code that raised the error
      • Dynamic Caching. Cache what’s likely to be required. Evict what may not be required. Figure out cache keys.
  • Aider LLM Leaderboards show which LLMs code better. As of now,
    • o1-preview > claude-3.5 sonnet on code editing
    • claude-3-opus > claude-3.5-sonnet on code refactoring
    • deepseek-coder-v
    • gpt-4o-mini sucks.
  • Jaro-Winkler Distance is a string matching algorithm that weights the start of a string higher.
  • Passing the feed of the following to NotebookLLM is a good way to get caught up with news and summaries.
    • A blog / WhatsApp group (e.g. The Generative AI Group, Sithamalli, etc.)
    • A Google Group / mailing list (e.g. genainews, datameet)
    • YouTube channels (e.g. Vertiasium, GitHub)
    • Hacker News top stories
    • Research papers
    • Emails (skipping marketing emails)
  • OpenAI Evals and Distillation has a clever design. They just convert filtered history to .JSONL files that can be an input to either.
  • Speak is a language learning app based on OpenAI’s Realtime API.
  • OpenAI’s Realtime API can be used in a text-to-text chat mode without needing to send the entire context. If the pricing works out right, this can be far cheaper than sending the entire conversation context. Ref
  • Matching addresses with just embeddings works well. Combine it with simple hard rules. Ref
  • OpenAI’s prompt caching works for images too – both linked and embedded
  • Quotes on Graph RAG from a Generative AI WhatsApp Group.
    • “Damn so literally nobody uses Graph RAG yet. Good to know.” ~Sumba
    • “A big four consulting firm uses GraphRAG to retrieve related documents and excerpts from governance and compliance docs.” ~Vinayak Hegde (Microsoft)
    • “Graph RAG is expensive and unnecessary in most of the cases.” ~Utkarsh Saxena
  • ChatGPT’s advanced mode includes: “…you can use various regional accents and dialects.” Ref Source
    • But the API can “laugh, whisper, and adhere to tone direction.” Ref
  • Hume API (INR 6/min) is far cheaper than OpenAI’s real-time chat (6c/min input + 24c/min output)
  • Devika is an open-source clone of Devin.
  • DuckDB runs inside Pyodide
  • Hungarian Jews have genetic diseases that increase their IQ. Gaucher’s disease, Torsion dystonia.
  • People don’t like hard stuff like maths or science, so richer societies have fewer scientists
  • Ethan Mollick feels Claude 3.5 Sonnet is better at style and critiquing blog posts than OpenAI’s o1 (which is better at reasoning.)
  • News is going to be crazily disrupted again with voice mode. I can just listen to the topic I want
  • In Singapore Airlines,
    • You can’t wear your seatbelt loose
    • You have to keep the laptop in the pocket in front, not on your lap, during takeoff
    • You can’t charge during takeoff
    • They verify if you ask for a veg meal and place a sticker on your seat
  • Coders are more likely to edit LLM code. Non-coders don’t have that bad habit.
  • Coders are likely to get more out of an LLM because they know what it can do. But some non-coders will get more out of an LLM because they don’t know what it can’t do.
    • E.g. Indal trying for a confetti animation, which is hard but do-able
  • “You have to put in a lot of work to become productive at AI coding.” Simon Willison