This week, I learned:
- ffmpeg on WASM works but is unstable and hard to use.
- You can’t use it in a CDN without CORS issues, since it loads ffmpeg-core via a worker.
- It often runs into buffer allocation issues.
- Exotel and Plivo provide voice & SMS services in India (like Twilio). Plivo is more customer friendly.
- Uber’s H3, Google’s S2, and GeoHash are geocoding systems.
- H3 offers uniform cell sizes and better distance measurement
- S2 offers higher precision (factoring in Earth’s curvature) for exact location matches
- GeoHash is the simplest
- There’s a movement towards embeddable databases on the cloud.
- MotherDuck is hosted DuckDB.
- Turso is hosted SQLite (with local sync, multi-tenant)
- StarBase DB is SQLite with an API on top of Cloudflare Durable Objects.
- Software 2.0 by Andrej Karpathy.
- This is fundamentally altering the programming paradigm by which we iterate on our software, as the teams split in two:
- the 2.0 programmers (data labelers) edit and grow the datasets, while
- a few 1.0 programmers maintain and iterate on the surrounding training code infrastructure, analytics, visualizations and labeling interfaces.
- Adaptive UI ideas:
- Adaptive Fields: Show only required fields based on what the user field so far.
- Smart Inputs: Dropdowns and auto-complete based on user’s context.
- Smart Themes: Change font size, contrast, theme guessing the user’s age and preferences.
- Dynamic Menus: Show what they might need to do next. Like Nokia’s right button, but using LLMs.
- Smart Tooltips: Check what the user’s doing (delays, confusions, previous clicks, current actions) and show relevant tips.
- Personalized Layout: Show only the relevant sections of the app. E.g. based on what they’re doing.
- Smart Charts: Create the right chart that solve the user’s question.
- Adaptive Back-end
- Dynamic APIs: Create endpoints on the fly based on user needs
- Dynamic Indexing: Create & update indices on the fly based on user needs
- Dynamic Schema: Create & update schema on the fly based on user needs
- Dynamic Migration: Migrate to a new database or OS or language as required
- Dynamic Queries: Create SQL/NoSQL queries to solve the user problem
- Dynamic RBAC: Figure out who needs permissions and why. Add OR REMOVE access as required
- Dynamic Logging. Log what’s required. Explain why it’s logged and what’s happening. Fix code that raised the error
- Dynamic Caching. Cache what’s likely to be required. Evict what may not be required. Figure out cache keys.
- This is fundamentally altering the programming paradigm by which we iterate on our software, as the teams split in two:
- Aider LLM Leaderboards show which LLMs code better. As of now,
- o1-preview > claude-3.5 sonnet on code editing
- claude-3-opus > claude-3.5-sonnet on code refactoring
- deepseek-coder-v
- gpt-4o-mini sucks.
- Jaro-Winkler Distance is a string matching algorithm that weights the start of a string higher.
- Passing the feed of the following to NotebookLLM is a good way to get caught up with news and summaries.
- A blog / WhatsApp group (e.g. The Generative AI Group, Sithamalli, etc.)
- A Google Group / mailing list (e.g. genainews, datameet)
- YouTube channels (e.g. Vertiasium, GitHub)
- Hacker News top stories
- Research papers
- Emails (skipping marketing emails)
- OpenAI Evals and Distillation has a clever design. They just convert filtered history to .JSONL files that can be an input to either.
- Speak is a language learning app based on OpenAI’s Realtime API.
- OpenAI’s Realtime API can be used in a text-to-text chat mode without needing to send the entire context. If the pricing works out right, this can be far cheaper than sending the entire conversation context. Ref
- Matching addresses with just embeddings works well. Combine it with simple hard rules. Ref
- OpenAI’s prompt caching works for images too – both linked and embedded
- Quotes on Graph RAG from a Generative AI WhatsApp Group.
- “Damn so literally nobody uses Graph RAG yet. Good to know.” ~Sumba
- “A big four consulting firm uses GraphRAG to retrieve related documents and excerpts from governance and compliance docs.” ~Vinayak Hegde (Microsoft)
- “Graph RAG is expensive and unnecessary in most of the cases.” ~Utkarsh Saxena
- ChatGPT’s advanced mode includes: “…you can use various regional accents and dialects.” Ref Source
- But the API can “laugh, whisper, and adhere to tone direction.” Ref
- Hume API (INR 6/min) is far cheaper than OpenAI’s real-time chat (6c/min input + 24c/min output)
- Devika is an open-source clone of Devin.
- DuckDB runs inside Pyodide
- Hungarian Jews have genetic diseases that increase their IQ. Gaucher’s disease, Torsion dystonia.
- People don’t like hard stuff like maths or science, so richer societies have fewer scientists
- Ethan Mollick feels Claude 3.5 Sonnet is better at style and critiquing blog posts than OpenAI’s o1 (which is better at reasoning.)
- News is going to be crazily disrupted again with voice mode. I can just listen to the topic I want
- In Singapore Airlines,
- You can’t wear your seatbelt loose
- You have to keep the laptop in the pocket in front, not on your lap, during takeoff
- You can’t charge during takeoff
- They verify if you ask for a veg meal and place a sticker on your seat
- Coders are more likely to edit LLM code. Non-coders don’t have that bad habit.
- Coders are likely to get more out of an LLM because they know what it can do. But some non-coders will get more out of an LLM because they don’t know what it can’t do.
- E.g. Indal trying for a confetti animation, which is hard but do-able
- “You have to put in a lot of work to become productive at AI coding.” Simon Willison