This week, I learned:

  • What to use for hosting: ChatGPT
    • GitHub Pages: Static websites, medium files
    • Cloudflare Pages: Static websites, global delivery
    • Vercel: Frontend frameworks (e.g. Next.js) with high DX and ISR, small files
    • Netlify: JAMstack projects, minimal back-end, moderate files
    • Glitch: Small static projects
    • Render: Full-stack apps requiring databases and server-side compute
    • Firebase Hosting: Small sites, limited large files
    • Archive.org: Public archival, large files
    • Google Drive: File sharing, large files
    • Dropbox: File sharing, moderate files
    • Cloudflare R2: Static assets, large file delivery
  • Anthropic defines agents. Building effective agents + Cookbook
    • Augmented LLMs are LLMs enhanced with augmentations such as retrieval, tools, and memory.
    • Workflows are systems where LLMs and tools are orchestrated through predefined code paths.
      • Prompt chaining: Pipe each LLM output to the next LLM. A->B->C->Z. E.g. Write report, then translate. Extract results, then verify them. Successively ask follow-up questions.
      • Routing: One LLM decides which other LLM to call next. A->B|C|D->Z. E.g. Evaluate complexity, then pick the right model. Classify request type, then pick the right prompt.
      • Parallelize: Sectioning (and Orchestrator-workers): Break tasks into independent subtasks, then aggregate. A->B+C+D->Z. E.g. Evaluate contracts against different clauses in parallel.
      • Parallelize: Voting: Run same task multiple times, then vote. A->B+B+B->Z. E.g. Review code for prompt injection using different prompts. Evaluate content safety with different thresholds.
      • Evaluator-optimizer: One model checks another in a loop. A->B->A->B->…->Z. E.g. Literary translation. Self-healing code. Policy violation checks.
      • Human-in-the-loop Checkpoints: The workflow explicitly requests human review at certain stages. A->B->(Human)->C->Z. E.g. Sensitive content review. High-stakes decision making. Ambiguous tasks.
    • Agents are LLMs that dynamically direct their own processes and tool usage, consulting tools or the user as needed.
  • To download YouTube subtitles, use: yt-dlp -q --skip-download --convert-subs srt --write-sub --sub-langs "en" --write-auto-sub --print "requested_subtitles.en.url" "$url" Simon Willison
  • o1-preview diagnoses better than doctors. Harvard
  • OpenAI’s release of ephemeral tokens via sessions (valid for 1 minute) is a useful way of exposing apps for public demos. Currently, it works only for the Realtime API, though.
  • SpreadsheetLLM is a way of encoding spreadsheets in an LLM-friendly format. It’s good for 1K+ rows. For fewer rows, Markdown > XML > HTML. However, Table Meets LLM suggests that HTML > XML > Markdown, so this is unclear.
  • #HARD prompt. Ask video generators like Sora to generate text in videos. The result is of average quality.
  • GPT 4o Mini Realtime was released. A realtime conversation will cost ~50c/hr: about 36c/hr of audio input and 72c/hr of audio output. (I extrapolated from the 6c/min audio input cost for GPT 4o Realtime when it was $100/MTok. GPT 4o Mini Realtime is $10/MTok input and $20/MTok output.)
  • This is an interesting way to understand software. Prompt an LLM with: "Generate a Mermaid sequence diagram showing interactions based on this code." Ref
  • The King James Bible and all the Harry Potter books combined are each about 1M tokens (rounded off).
  • markdown2 is the new de facto Markdown library for Python.
  • Claude 3.5 Sonnet is way ahead of the competition on the LMSYS Webdev Arena.
  • Raspberry Pi 5 has a faster CPU and GPU, more RAM, 4K support, and multiple USB 3 ports.
  • Government websites, such as the official press release site, cannot be crawled from outside India. Hence the need for server farms in India!
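
The workflow patterns in the Anthropic bullet above can be sketched in a few lines of Python. This is a minimal sketch, not Anthropic's code: `LLM` is a hypothetical stand-in for any function that takes a prompt and returns text, the helper names (`chain`, `route`, `section`, `vote`, `evaluate_optimize`) are made up for illustration, and the convention that the evaluator replies "OK" to accept a draft is an assumption. Orchestrator-workers and human-in-the-loop checkpoints are left out.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical interface: any function that maps a prompt to a completion.
LLM = Callable[[str], str]


def chain(llm: LLM, prompts: list[str], text: str) -> str:
    """Prompt chaining: pipe each output into the next step (A->B->C->Z)."""
    for prompt in prompts:
        text = llm(f"{prompt}\n\n{text}")
    return text


def route(classifier: LLM, handlers: dict[str, LLM], text: str, default: str) -> str:
    """Routing: one call picks which handler runs next (A->B|C|D->Z)."""
    label = classifier(text).strip()
    handler = handlers.get(label, handlers[default])
    return handler(text)


def section(llm: LLM, subtasks: list[str], text: str) -> list[str]:
    """Sectioning: run independent subtasks in parallel; the caller aggregates."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda task: llm(f"{task}\n\n{text}"), subtasks))


def vote(llms: list[LLM], text: str) -> str:
    """Voting: run the same task several times and keep the most common answer."""
    with ThreadPoolExecutor() as pool:
        answers = list(pool.map(lambda llm: llm(text), llms))
    return Counter(answers).most_common(1)[0][0]


def evaluate_optimize(generator: LLM, evaluator: LLM, task: str, max_rounds: int = 3) -> str:
    """Evaluator-optimizer: revise the draft until the evaluator replies 'OK'."""
    draft = generator(task)
    for _ in range(max_rounds):
        feedback = evaluator(draft)
        if feedback.strip() == "OK":  # assumed acceptance convention
            break
        draft = generator(f"{task}\n\nPrevious draft:\n{draft}\n\nFeedback:\n{feedback}")
    return draft
```

Each helper takes plain callables, so the same code works with a real API client wrapped in a function or with stub lambdas for testing. `max_rounds` caps the evaluator loop so a never-satisfied evaluator cannot run forever.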
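
The GPT 4o Mini Realtime cost estimate above can be reproduced with a short calculation. This is a sketch of the extrapolation as described in that bullet: it derives the audio token rate from the 6c/min at $100/MTok baseline and applies the same rate to both input and output. The 50/50 split between listening and speaking is an assumption added here to get a single blended figure.

```python
# Baseline from the note: GPT 4o Realtime audio input cost 6c/min at $100/MTok.
BASELINE_CENTS_PER_MIN = 6
BASELINE_USD_PER_MTOK = 100

# Implied audio token rate: 6c/min / (10,000c per 1M tokens) = 600 tokens/min.
TOKENS_PER_MIN = BASELINE_CENTS_PER_MIN * 1_000_000 // (BASELINE_USD_PER_MTOK * 100)


def cents_per_hour(usd_per_mtok: float) -> float:
    """Cost in cents of one hour of continuous audio at the given token price."""
    tokens_per_hour = TOKENS_PER_MIN * 60
    return tokens_per_hour * usd_per_mtok * 100 / 1_000_000


input_cph = cents_per_hour(10)   # GPT 4o Mini Realtime input: $10/MTok
output_cph = cents_per_hour(20)  # GPT 4o Mini Realtime output: $20/MTok

# Assumption: the user speaks half the time and the model speaks the other half.
blended_cph = 0.5 * input_cph + 0.5 * output_cph

print(input_cph, output_cph, blended_cph)  # 36.0 72.0 54.0
```

Under these assumptions the blended figure comes to 54c/hr, in line with the ~50c/hr estimate.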