Things I Learned - 11 May 2025

I discovered how double-checking LLM outputs can slash error rates and compared Anthropic's new search tool pricing. I also found snapdom for element capture, explored Gemini's prompt caching, and documented some prompt evaluation frameworks.

This week, I learned:

snapdom is a fast, light, element capture alternative to html2canvas but doesn’t work well with non-CORS images or iframes.
Slidev (sli.dev) is a Markdown slide language, similar to Marp.
Don’t split your code into microservices until you need to scale. Ref
Vibe coding is like getting others’ code to work, which is exactly what most devs do. Simon Willison #ai-coding
Tofu Yakitori is a Japanese dish. It’s like a dhokla. Marinated tofu cubes brushed with that sweet‑savory tare (soy, mirin, sake, a hint of sugar), then grilled until caramel‑charred. One of the better (tasty + different) dishes I’ve had recently. I used ChatGPT to remind me of the dish name.
Trust, attitudes and use of artificial intelligence surveyed ~1,000 people in each of 47 countries (over 48,000 in total) on their views on AI. PDF
- Emerging economies trust and use AI more. It’s an opportunity to leapfrog.
- 26% of students use AI daily (vs 17% employees). Efficiency is the main benefit.
Gemini APIs now have automatic caching for 75% cost reduction if message is >1K (Flash) or >2K (Pro) tokens. Ref
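A rough sketch of what that discount means for one request. The price per million tokens and the token counts are hypothetical. The function also assumes that the 75% reduction applies only to the cached part of the prompt, and that ">1K" and ">2K" mean at least 1,024 and 2,048 tokens.

```python
# Minimum prompt sizes for automatic caching, per the note above
# (assumed here to mean 1,024 and 2,048 tokens).
MIN_TOKENS = {"flash": 1024, "pro": 2048}
CACHE_DISCOUNT = 0.75  # assumed to apply to cached tokens only


def input_cost(total_tokens: int, cached_tokens: int,
               usd_per_million: float, model: str) -> float:
    """Estimate the input cost in USD of one request (hypothetical prices)."""
    if total_tokens < MIN_TOKENS[model]:
        cached_tokens = 0  # prompt too short to qualify for caching
    cached_tokens = min(cached_tokens, total_tokens)
    fresh_tokens = total_tokens - cached_tokens
    billable = fresh_tokens + cached_tokens * (1 - CACHE_DISCOUNT)
    return billable * usd_per_million / 1_000_000
```

With a made-up price of $1 per million tokens, a 10,000-token prompt with 8,000 cached tokens is billed as 4,000 tokens, so the saving depends on how much of the prompt repeats.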
YOLO is much better than Gemini at object detection. Use it for pre-processing. Ref
Using [[n]] is probably the best citation format for inline search references in RAG. ChatGPT
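One practical upside of the format is that it is easy to pick out with a regular expression. A minimal sketch, assuming citations are 1-based indexes into a list of retrieved sources (the function name and sample data are made up for illustration):

```python
import re

# Matches [[1]], [[12]], ... and captures the number.
CITATION = re.compile(r"\[\[(\d+)\]\]")


def cited_sources(answer: str, sources: list[str]) -> list[str]:
    """Return the sources cited in `answer`, in first-mention order, without duplicates."""
    seen: list[int] = []
    for match in CITATION.finditer(answer):
        n = int(match.group(1))
        # Ignore citations that point outside the source list.
        if 1 <= n <= len(sources) and n not in seen:
            seen.append(n)
    return [sources[n - 1] for n in seen]
```

Out-of-range numbers are silently dropped here. A stricter version could flag them as possible hallucinated citations.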
⭐ Double-checking is surprisingly efficient since LLM hallucinations are mostly uncorrelated. LLMs perform human tasks (e.g. classifying customer support messages) at ~85% accuracy. This might be unacceptable. But by asking 2 moderately correlated LLMs and double-checking only the cases where they disagree, we reduce automation by ~20% but cut errors to 0.25%. Triple-checking reduces automation by ~25% but cuts errors to under 0.01%! Ref
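A minimal sketch of how to measure this on your own labeled sample: accept a label automatically only when both models agree, and send everything else for manual review. The function name and the data are hypothetical, and this is not the method from the reference.

```python
def double_check(preds_a, preds_b, truth):
    """Auto-accept items where both models agree; route disagreements to a human.

    Returns (automation_rate, error_rate), where error_rate is the share of
    all items that were auto-accepted with the wrong label.
    """
    if not (len(preds_a) == len(preds_b) == len(truth)) or not truth:
        raise ValueError("inputs must be non-empty and of equal length")
    # Keep only the items where the two models gave the same label.
    agreed = [(a, t) for a, b, t in zip(preds_a, preds_b, truth) if a == b]
    wrong = sum(1 for a, t in agreed if a != t)
    return len(agreed) / len(truth), wrong / len(truth)
```

The only errors left are the cases where both models make the same mistake, which is why the approach works better the less correlated the models are.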
Anthropic introduces web search in the API at $10 / 1K searches. Here’s how it compares (price per 1K searches):
- $0.1: DuckDuckGo Search API (RapidAPI) (monthly pricing)
- $3: Brave Search API
- $5: Google Custom Search JSON API
- $10: Zenserp
- $10: Anthropic Web Search Tool
- $15: SerpAPI
- $25: Bing Search API
- $35: Gemini API
- $35: OpenAI API
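To turn the list into a budget, a small sketch that scales each price to a given search volume. It assumes pricing is linear, with no free tiers, plan minimums, or token charges, which is probably not true for every provider.

```python
# USD per 1,000 searches, copied from the list above.
PRICE_PER_1K = {
    "DuckDuckGo Search API (RapidAPI)": 0.1,
    "Brave Search API": 3,
    "Google Custom Search JSON API": 5,
    "Zenserp": 10,
    "Anthropic Web Search Tool": 10,
    "SerpAPI": 15,
    "Bing Search API": 25,
    "Gemini API": 35,
    "OpenAI API": 35,
}


def search_cost(searches: int) -> list[tuple[str, float]]:
    """Cost in USD of `searches` queries per provider, cheapest first."""
    costs = ((name, price * searches / 1000) for name, price in PRICE_PER_1K.items())
    return sorted(costs, key=lambda item: item[1])
```

For example, `search_cost(50_000)` puts the Anthropic tool at $500 under these assumptions.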
India attacked Pakistan!
⭐ When writing notes, summarize the learnings and next steps at the end of the day.
GitHub Pages does not let you control the cache duration, but there are many creative workarounds. ChatGPT
- HTML meta tags: <meta http-equiv="Cache-Control" content="no-cache, no-store, must-revalidate">
- Use a service worker (blog)
- Proxy through a CDN. Cloudflare, Netlify
- Move to another static host: S3 + CloudFront, Heroku, Vercel, Surge, Firebase Hosting
Notes from the PromptEvals paper:
- Good evals must be:
  - Objectively MEASURABLE (even if by an LLM). Otherwise, we won’t know if it’s right.
  - Directly RELEVANT to the input/prompt. Otherwise, we’re not evaluating the input.
- Typical evals fall into 6 categories:
  - Structured output: Adhere to a schema (Markdown, HTML, DSL, JSON + Schema)
  - Multiple choice
  - Length constraints: N characters, words, sentences, list items, etc.
  - Semantic constraints: Exclude terms, topic relevance, follow grammar, etc.
  - Stylistic constraints: Style, tone, persona
  - Prevent hallucinations: Factual accuracy, instruction following
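
The first four categories can be checked with plain code. A minimal sketch with made-up function names (these are not from the paper); style and hallucination checks would typically need an LLM judge and are left out:

```python
import json


def check_json_keys(output: str, required_keys: set[str]) -> bool:
    """Structured output: parses as a JSON object containing the required keys."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and required_keys.issubset(data.keys())


def check_choice(output: str, choices: set[str]) -> bool:
    """Multiple choice: the answer is exactly one of the allowed options."""
    return output.strip() in choices


def check_max_words(output: str, limit: int) -> bool:
    """Length constraint: at most `limit` whitespace-separated words."""
    return len(output.split()) <= limit


def check_excludes(output: str, banned_terms: list[str]) -> bool:
    """Semantic constraint: none of the banned terms appear (case-insensitive)."""
    lowered = output.lower()
    return not any(term.lower() in lowered for term in banned_terms)
```

Each check returns a plain boolean, which keeps it objectively measurable in the sense above.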
