This week, I learned:
- What to use for hosting: ChatGPT
  - GitHub Pages: Static websites, medium files
  - Cloudflare Pages: Static websites, global delivery
  - Vercel: Frontend frameworks (e.g. Next.js) with high DX and ISR, small files
  - Netlify: JAMstack projects, minimal back-end, moderate files
  - Glitch: Small static projects
  - Render: Full-stack apps requiring databases and server-side compute
  - Firebase Hosting: Small sites, limited large files
  - Archive.org: Public archival, large files
  - Google Drive: File sharing, large files
  - Dropbox: File sharing, moderate files
  - Cloudflare R2: Static assets, large file delivery
- Anthropic defines agents. Building effective agents + Cookbook
  - Augmented LLMs are LLMs enhanced with augmentations such as retrieval, tools, and memory.
  - Workflows are systems where LLMs and tools are orchestrated through predefined code paths.
    - Prompt chaining: Pipe each LLM output to the next LLM. A->B->C->Z. E.g. Write report, then translate. Extract results, then verify them. Successively ask follow-up questions.
    - Routing: One LLM decides which other LLM to call next. A->B|C|D->Z. E.g. Evaluate complexity, then pick the right model. Classify request type, then pick the right prompt.
    - Parallelize: Sectioning (and Orchestrator-workers): Break tasks into independent subtasks, then aggregate. A->B+C+D->Z. E.g. Evaluate contracts against different clauses in parallel.
    - Parallelize: Voting: Run same task multiple times, then vote. A->B+B+B->Z. E.g. Review code for prompt injection using different prompts. Evaluate content safety with different thresholds.
    - Evaluator-optimizer: One model checks another in a loop. A->B->A->B->…->Z. E.g. Literary translation. Self-healing code. Policy violation checks.
    - Human-in-the-loop Checkpoints: The workflow explicitly requests human review at certain stages. A->B->(Human)->C->Z. E.g. Sensitive content review. High-stakes decision making. Ambiguous tasks.
  - Agents are LLMs that dynamically direct their own processes and tool usage, consulting tools or the user as needed.
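
The workflow patterns above are easier to compare as code. This is a minimal sketch, not Anthropic's implementation: every "LLM" here is a hypothetical callable that takes a prompt string and returns text, and the function names (`chain`, `route`, `vote`, `evaluate_and_optimize`) and the `{input}` placeholder are my own assumptions for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical stand-in for any model call: prompt in, text out.
LLM = Callable[[str], str]


def chain(llm: LLM, templates: List[str], text: str) -> str:
    """Prompt chaining (A->B->C->Z): feed each output into the next prompt."""
    for template in templates:
        text = llm(template.format(input=text))
    return text


def route(router: LLM, handlers: Dict[str, LLM], request: str) -> str:
    """Routing (A->B|C|D->Z): a router LLM returns a label that picks the handler."""
    label = router(request).strip()
    handler = handlers.get(label, handlers["default"])
    return handler(request)


def vote(llm: LLM, templates: List[str], text: str) -> str:
    """Voting (A->B+B+B->Z): run variants in parallel, return the majority answer."""
    with ThreadPoolExecutor() as pool:
        answers = list(pool.map(lambda t: llm(t.format(input=text)), templates))
    return Counter(answers).most_common(1)[0][0]


def evaluate_and_optimize(generator: LLM, evaluator: LLM, task: str,
                          max_rounds: int = 3) -> str:
    """Evaluator-optimizer (A->B->A->B->...->Z): revise until the evaluator says OK."""
    draft = generator(task)
    for _ in range(max_rounds):
        feedback = evaluator(draft)
        if feedback.strip().upper() == "OK":
            break
        draft = generator(
            f"{task}\n\nPrevious draft:\n{draft}\n\nFeedback:\n{feedback}"
        )
    return draft
```

The control flow lives in ordinary code in all four cases, which is what separates these workflows from agents: the model never chooses the next step outside the options the code gives it.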
- To download YouTube subtitles, use:
yt-dlp -q --skip-download --convert-subs srt --write-sub --sub-langs "en" --write-auto-sub --print "requested_subtitles.en.url" "$url"
Via Simon Willison
- o1-preview diagnoses better than doctors. Harvard
- OpenAI’s ephemeral tokens, issued via sessions and valid for 1 minute, are a useful way of exposing apps for public demos. Currently they work only with the Realtime API, though.
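
A minimal sketch of minting such a token on the server, so the browser never sees the real API key. It assumes the `/v1/realtime/sessions` endpoint and the `client_secret.value` response field as documented when the Realtime sessions feature launched; the model name is only an example, the helper names are hypothetical, and the API may have changed since.

```python
import json
import urllib.request

# Assumed endpoint from OpenAI's Realtime API docs at launch; verify before use.
SESSIONS_URL = "https://api.openai.com/v1/realtime/sessions"


def build_session_request(
    api_key: str, model: str = "gpt-4o-realtime-preview-2024-12-17"
) -> urllib.request.Request:
    """Build the POST request that asks OpenAI for a short-lived session."""
    body = json.dumps({"model": model}).encode("utf-8")
    return urllib.request.Request(
        SESSIONS_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def mint_ephemeral_token(api_key: str) -> str:
    """Call the API with the real key and return only the ephemeral token."""
    with urllib.request.urlopen(build_session_request(api_key)) as resp:
        session = json.load(resp)
    # Assumed response shape: {"client_secret": {"value": ..., "expires_at": ...}}
    return session["client_secret"]["value"]
```

The server would call `mint_ephemeral_token(os.environ["OPENAI_API_KEY"])` per visitor and hand the result to the front end, which then connects to the Realtime API with it before it expires.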
- SpreadsheetLLM is a way of encoding spreadsheets in an LLM-friendly format. It’s good for 1K+ rows. For fewer rows, it reports Markdown > XML > HTML. However, Table Meets LLM suggests that HTML > XML > Markdown, so the best format for small tables is unclear.
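
For reference, here is what the three plain encodings being compared look like for the same small table. This is only a sketch of the baseline formats, not SpreadsheetLLM's own encoding; the function names are mine, and the XML version assumes the column names are valid element names.

```python
from html import escape
from typing import List, Sequence


def to_markdown(header: Sequence[str], rows: List[Sequence[object]]) -> str:
    """Encode a table as a Markdown pipe table."""
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(str(c) for c in row) + " |" for row in rows]
    return "\n".join(lines)


def to_html(header: Sequence[str], rows: List[Sequence[object]]) -> str:
    """Encode a table as a compact HTML <table>."""
    head = "".join(f"<th>{escape(h)}</th>" for h in header)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(c))}</td>" for c in row) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"


def to_xml(header: Sequence[str], rows: List[Sequence[object]]) -> str:
    """Encode a table as XML, one element per cell named after its column."""
    items = "".join(
        "<row>"
        + "".join(f"<{h}>{escape(str(c))}</{h}>" for h, c in zip(header, row))
        + "</row>"
        for row in rows
    )
    return f"<table>{items}</table>"
```

Markdown is the shortest of the three and XML repeats the column name on every cell, so token cost and accuracy may well pull in different directions here.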
- #HARD prompt. Ask video generators like Sora to generate text in videos. The results are of average quality.
- GPT 4o Mini Realtime was released. A realtime conversation will cost ~50c/hr: roughly 36c per hour of input audio and 72c per hour of output audio. (I extrapolated from the 6c/min audio input cost for GPT 4o Realtime when it was $100/MTok. GPT 4o Mini Realtime is $10/MTok input and $20/MTok output.)
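
The extrapolation, written out. The 6c/min and per-MTok prices come from the note above; the 50/50 split between listening and speaking is my assumption to show how ~50c/hr can fall out of the two hourly rates, and the names are just for illustration.

```python
# Reference point from the note: GPT 4o Realtime audio input cost 6c/min at $100/MTok.
BASE_CENTS_PER_MIN = 6.0
BASE_USD_PER_MTOK = 100.0


def cents_per_hour(usd_per_mtok: float) -> float:
    """Scale the reference per-minute cost linearly by price, then to an hour."""
    return BASE_CENTS_PER_MIN * 60 * usd_per_mtok / BASE_USD_PER_MTOK


input_per_hour = cents_per_hour(10.0)   # 36.0 cents per hour of input audio
output_per_hour = cents_per_hour(20.0)  # 72.0 cents per hour of output audio


def conversation_cents_per_hour(input_share: float = 0.5) -> float:
    """Blend the two rates; input_share is an assumed fraction of time spent on input."""
    return input_share * input_per_hour + (1 - input_share) * output_per_hour


print(conversation_cents_per_hour())  # 54.0, i.e. roughly 50c/hr
```

This treats output audio as using the same tokens per minute as input audio, which is part of the extrapolation rather than a published figure.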
- This is an interesting way to understand software.
Generate a Mermaid sequence diagram showing interactions based on this code.
Ref
- The King James Bible and the complete Harry Potter series are each about 1M tokens (rounded off).
- markdown2 is the new de facto Markdown library for Python.
- Claude 3.5 Sonnet is way ahead of the competition on the LMSYS WebDev Arena.
- Raspberry Pi 5 has a faster CPU, more RAM, a better GPU, 4K support, and multiple USB 3 ports.
- Indian government websites, such as the official press releases site, cannot be crawled from outside India. Hence the need for server farms in India!