June 8, 2025

This week, I learned:

There’s a very interesting HN discussion on the AI coding of Cloudflare Workers OAuth Provider. My takeaways: #ai-coding
Write very comprehensive specs. Use LLMs to create the specs.
Reviewing is a skill we need to develop. Understanding others’ code takes effort. But LLM code is easier to review because it’s immediate and has no ego.
Unit tests are critical.
Use LLMs for well-understood specs, APIs, platforms and libraries to really save time.
Logic-less stuff like Markdown, JSON and HTML templates is a LOT easier to verify. Do more of that.
We can only make so many decisions in a day. AI coding saves us that effort.
Experts are not experts in every area. They benefit from LLMs in other areas.
LLMs are great for rubber ducking. Speaking and speccing really help.
LLMs make mistakes. So do most humans.
LLM speed makes coding more exhausting.
Use LLMs to understand codebases.
AI coding could reduce demand for developers. E.g. sysadmin demand plummeted with cloud infra and infrastructure-as-code. But niche use cases could grow, like how demand for photographers grew despite point-and-shoot cameras. The transaction cost of hiring even 1 person is high, and that will likely be a bottleneck. Plus, people can use LLMs themselves, so that will dampen niche demand.

Google introduced Google Vids last year. It’s a video creator styled like PowerPoint. Looks promising.

FastMCP looks like an easy way to build MCPs. (Yet to try it.)

o3 and, to a lesser extent, Claude Sonnet 4 are the models that can accurately summarize complex subjects and create a list of links without hallucinations. (Ref)

Claude Trace lets you record all interactions with Claude Code.

ElevenLabs now supports emotion and interruption. (Ref)

Thinking longer alone is not enough to scale intelligence. We need better models, too. (Ref)

Indian High Court judgements are now available as a public dataset on AWS and updated periodically. (Ref)

A few observations on AI code editors’ styles:
o3 is better at finding bugs than Jules, which tends to try to fix them rather than discover them.
Codex makes more minimal edits in PRs than Jules, which is more verbose.
Claude Code remains the best at faithfully creating and updating front-end apps.

Deep Research is great for fact-checking my notes! (ChatGPT)

Web bench evaluates LLMs in web development. Claude Sonnet remains ahead.

Vision language models rely heavily on past training and miss changes they don’t expect. (Ref)

Pure CSS tooltips are possible. (Julia Evans)

Google has an OAuth Playground, which is a convenient way to get a temporary OAuth token.

At the moment, the best speech to text for Android appears to be ChatGPT’s transcription. The default Android speech to text (which I thought was good) no longer feels adequate. Gemini mis-hears and doesn’t wait till I’m done. Whisper ASR has poor noise cancellation and a 30 second limit.

anyascii is a better alternative to unidecode. It supports more characters and also supports transliteration. I use it to strip out non-ASCII characters in ChatGPT’s output. (Commit)

DeepWiki creates human-facing docs for GitHub repos. (Example) It’s verbose and does not understand the nuances of context and implications. Context7 creates llms.txt for LLMs. (Example) It’s concise, example-oriented, and works only if relevant code snippets (e.g. API calls) can be generated from the codebase. It’s like creating an llms.txt automatically, e.g. https://context7.com/textualize/textual/llms.txt #ai-coding

We will move towards an organization structure where developers are embedded with business teams rather than working as a separate group. Sort of like embedded executive assistants instead of a central typing pool. (Making AI Work)
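
On the anyascii note above, here is a minimal sketch of stripping non-ASCII characters from LLM output. It assumes the `anyascii` package is installed (`pip install anyascii`). The `to_ascii` helper and the stdlib fallback are my own illustration, not what the linked commit does, and the fallback simply drops characters it cannot decompose rather than transliterating them.

```python
import unicodedata

try:
    # Third-party package: pip install anyascii
    from anyascii import anyascii
except ImportError:
    # Assumption: fall back to the standard library if anyascii is missing.
    anyascii = None


def to_ascii(text: str) -> str:
    """Return an ASCII-only version of text (hypothetical helper)."""
    if anyascii is not None:
        # anyascii transliterates, e.g. accented letters and curly quotes.
        return anyascii(text)
    # Fallback: decompose accented characters, then drop whatever is
    # still non-ASCII. This loses characters anyascii would transliterate.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    sample = "It’s a naïve café"
    print(to_ascii(sample))
```

With anyascii available, curly quotes and non-Latin scripts are transliterated; with the fallback they are dropped, which is the main reason to prefer the library.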