Things I Learned - 23 Nov 2025

This week, I learned: Here are some new CLI tools I installed: vd (visidata): Terminal spreadsheet viewer & editor for CSV, Excel, JSON, SQL, Parquet, etc. qsv: Fast CSV command line toolkit for slicing, filtering, aggregating, and analyzing CSV files. rga (ripgrep-all): ripgrep that searches PDFs, Office docs, EPUBs, zip files. pdfcpu: PDF processor for splitting, merging, optimizing, and manipulating PDF files. gum: Stylish CLI tool for creating interactive prompts, confirmations, and more. Models read pretty fast, consuming input tokens at ~4K-20K words per second. It’s the “speaking” (output token rate) that is the bottleneck. So shortening input doesn’t matter as much as shortening output for latence. ChatGPT When building agents, as of now, prefer native provider SDKs (OpenAI Agents SDK, Anthropic SDK) over even light abstractions like Vercel AI SDK or Pydantic. There are subtle issues related to error messages, response handling, cache handling, etc. that trip up abstractions given how early things are. Armin Ronacher Gone are the times when LLMs couldn’t do mental math. Now they’re computing base64 and SHA256 from memory, without needing code! Example Organizing a round table event in Singapore costs ~$75-150. Here’s what drives the cost variation # 50%: brand/location. 25%: food and beverage. 15%: duration (full day is only slightly more expensive than half day) 10%: date, demand, etc. 10%: add-ons: AV, etc. OpenRouter supports embedding models. BGE base seems pareto optimal with 0.5 cents / MTok and a good MTEB ranking. TOON vs JSON. Early days, and TOON seems to be marketing a lot, so I’m wary, but for large tabular data where input tokens are crunched, it seems a readable alternative to multiple CSVs, but not worth the hype. 0 19 Nov 2025. Always use GPT-5.1-Codex-Max instead of GPT-5.1-Codex. At every thinking level, it takes fewer tokens for similar or higher accuracy. Tibo ug -i --smart-case --bool 'word1 word2 ...' seems the cleanest way to find files that have all words. –smart-case uses case-insensitive if all words are lowercase, else case-sensitive. Examples: ug --bool '"exact phrase" word2' # exact phrase + other tokens anywhere ug --bool 'word1 word2 -word3' # must contain word1 AND word2, but NOT word3 ug --bool '("foo bar") OR baz' # grouped expressions and OR ug --bool 'word1 NEAR/5 word2' # match when words are within 5 tokens/words ug -Z2 'word' # allows up to 2 typos in 'word' ⭐ ug -i --smart-case --bool -Q lets you interactively search within files. This is the coolest feature! Fixing laptop issues is clearly a whole lot easier with an AI chatbot. I fixed these Ubuntu issues purely using Claude. It told me what to run. I ran it, shared the output, it diagnosed, told me what to do next, etc. until the issues were fixed. For example: My keyboard shortcuts stopped working. It turned out I edited my media-keys.dconf and removed the trailing slash. # A 3-finger tap mapped to a middle click and I couldn’t remove it. It turned out my touchegg.conf explicitly had this mapping. I disabled it. # My gnome extensions would get disabled every time the screen went to sleep. It turned out my extension cache was corrupted or stale. sudo apt install --reinstall gnome-shell-extension-manager and rm -rf ~/.cache/gnome-shell/ fixed it. # GhostScript seems the best way to compress PDFs via the CLI. Example: gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf Pandoc supports Lua filters which are a powerful way to customize the document conversion process. Here is a Lua filter that converts horizontal rules in a markdown document to page breaks and preserve in a Word document (OpenXML format) function HorizontalRule() return pandoc.RawBlock('openxml', '<w:p><w:r><w:br w:type="page"/></w:r></w:p>') end readpst - via sudo apt install pst-utils - extracts emails from Outlook PST files to mbox format. Useful for email migrations. Write tutorials or blog posts as you learn. Steve Klabnik Running a coding agent post mortem, e.g. “what worked well, what didn’t, and why? Next time, what are a few bullets I could include that will avoid these problems?” helps me prompt better next time. For example, Claude Code suggested: Use Firefox for headless browser automation (Chromium often crashes) Set HOME=/root when running Playwright with Firefox Start a local HTTP server rather than using file:// protocol External images may not load in screenshots due to network isolation

Thanks Pratap Vardhan – this was my best birthday gift this year! LinkedIn

Nano Banano Pro has excellent text generation (though it doesn’t always give you what you want in the first try). I couldn’t spot any errors in the generated text. Can you? I used this prompt (with the workshop details and my photo): Create a professional poster for the below, including all relevant information. Use my photo (attached) professionally. The NPTEL workshop is real, BTW. First 100 seats, I think. You can register here: https://elearn.nptel.ac.in/shop/iit-workshops/ongoing/computer-science/applied-vibe-coding-workshop/ ...

While meditating, I realized 75% of “LULL” is the letter “L”. (This sort of thing happens a lot when I meditate.) MUMMY (60% M) and DADDY (60% D) have lower percentage, but are longer, so maybe get a bonus? I asked Claude Code what would top such a list. It picked a dictionary, generated the 333 words with 4+ letters and >50% concentration. What did I like best? “ASSESSES”. 5/8 letters are “S”. That’s nearly two-thirds. ...

Things I Learned - 16 Nov 2025

This week, I learned: Windows 11 got some very practical updates. Notepad now supports Markdown preview natively. MS Paint has an opacity filter. Microsoft Copilot can share screens and speak/listen. Things I learn when Ubuntu drivers crashed on my laptop: The SG.GS Ubuntu ISO mirror is a lot faster than the official Ubuntu ISO download (5 min vs 12 hours). Rufus and balenaEtcher are the de facto tools for bootable USB drives from ISO. Gemini 2.5 Flash Image is not great at generating text. But a clever a workaround is to provide the rendered text as an image input! Also, Gemini 2.5 Flash Image seems to ignore commands that try style transfer (e.g. “turn me into Studio Ghibli”). GemImg FLIP animation is an efficient animation technique. Capture the First position Apply the Last position (changing position, size, rotation, etc.) Invert, i.e. apply just the transform that’ll move it back to the First position Plan the animation. This only needs to change transform, hence no DOM reflow. Asking coding agents to create a codemod for large-scale refactoring works well Peter Steinberger When to quit vs persist. # # Do stats/signals support positive outcome? QUIT if not. Crossed any limits you set for yourself? QUIT if so. (Run pre-mortems to find these stats/signals and limits.) Is the decision hard to reverse AND uncertainty high? QUIT if so. Else you can experiment cheaply. (Create reversibility.) Are youI continuing because of past effort or pride? QUIT if so. (Set review cadence.) Is there a better alternative? SWITCH if so. (Get outside help.) Once a model generates an output, an agentic look tends not to change the fundamental approach and just tweaks it. So, if a solution is directionally wrong, restarting works better than iterating. Agentic Pelican on a Bicycle Reading between the lines on the Microsoft OpenAI deal: Microsoft values OpenAI’s growth (financial return) than control Neither trusts the other enough to decide what’s AGI Microsoft gets some wins: models until 2032 (even post AGI) as well as research IP. Both parties expect AGI between 2027-2030. OpenAI keeps all consumer hardware - so is betting hard on hardware. It’s more Apple than Microsoft territory Divorce preparation: Microsoft can pursue AGI with other partners. OpenAI can purchase compute from anyone and release open weights models. Infra has more value than model dev! OlmoEarth is a set of image models trained on labelled geospatial data. That’s useful for deforestation and land cover monitoring, wildfire detection, urban growth monitoring, crop mapping, etc. The models are open weights and can be fine-tuned. Claude Code’s output styles are a way of using Claude Code for anything (e.g. writing, analysis, research, personal advice, etc.), not just coding. Create a ~/.claude/output-style/your-style-name.md and run /output-style your-style-name to replace the system prompt will be replaced. You can also use the --system-prompt and --append-system-prompt flags with the CLI. Following Ethan Mollick’s lead I asked: I can travel back in time to any time before 1500 in India and change only one thing. What is the single thing you would change? Nothing obvious.. ChatGPT: Create a single, simple, phonetic script for all public life in India around 1100 CE. Claude: institutionalize systematic historical recordkeeping, introduce limited liability commercial entities, and mandate systematic translation of Sanskrit technical texts into all major regional languages. How about now? ChatGPT suggests: make all public rules and records computable by law. Claude suggests: make all state-level entitlements and civil documentation fully portable across India. For the first time in history, Russian troops surrendered to a wheeled drone that carried 138 pounds of explosives - Washington Post. Given the cost and accessibility of drones, I guess drone terrorist attacks will soon emerge. HTML + JS apps will last longer than server-side apps and it makes sense to write more of those. For essential back-end services, keep them generic. Specific services layers I see are: Auth (e.g. Google Auth, Auth0, Supabase, …) Storage (e.g. Supabase, Firebase) LLMs (e.g. OpenAI, Claude, OpenRouter) Communications (e.g. EmailJS) … #TODO Extend with LLMs https://gistpreview.github.io/ is an unofficial GIST preview tool. It accepts a ?GIST_ID and displays the gist as a standalone HTML page. Simon Willison XSLT is deprecated in Chrome. So the <script> tag in XML will become the new way of rendering RSS/Atom. This is one of the rare “break-the-web” changes from browsers. Simon Willison “India has absurdly low internal migration - around 9% annual migration rate versus 25-30% in China or the US. Not because people don’t want to move, but because the cost of moving is artificially massive. You lose your ration card, state entitlements, kids’ school continuity, voting rights, …” # Rolf Dobelli’s The Not To-Do List is a good application of inversion. Also, the chapter titles themselves explain most of the message, which is very helpful. Just thinking about any of these can be a useful path to improvement. Let things fall apart Feed your weaker self Be unreliable Be an asshole Have high expectations Drift through the day Mess up your marriage Be a quitter Be hypocritical Cling to your bad habits Set the wrong goals Drink yourself miserable Get involved in other people’s drama Only learn from your own experience Be hyperactive on social media Indulge in road rage Surround yourself with negative people Micromanage your neighbours Say yes to drugs Get stuck in your career Never be playful Feel guilty Practise ingratitude Trust your banker Be paranoid Make other people feel unimportant Live in the past Listen to your inner voice Expect rationality Get nihilistic Catastrophize Consider money unimportant Cultivate a victim mentality Become a lapdog Get rich quick, get smart quick Ruminate Trade your reputation for money Never suffer Let your emotions define you Try to end it all Marry the wrong person – and stay with them Celebrate your resentment Join a cult Try to change people Say everything you think Spin multiple plates Do only shallow work Invite bad people into your life Go where the competition is strong Say yes to everything Crowd your life with gadgets Fall into the content trap DeepSeek-V3.2-Exp has linear inference time, i.e. longer inputs don’t take longer time. It picks the top 2K most relevant tokenss from the input instead. This can make model inference cheaper and faster. California’s Bill AB 316 makes the people who build autonomous systems liable for their actions. That’s quite a step. Udio and Universal are launching a platform to generate music in the style of famous artistes. An interesting new way to monetize. Fingerprinting music is a hot area. VaultGemma shows a fine-tuning approach that eliminates personal info that appears only once from memorization. It works by adding noise to weights and capping weights updates so that no one example has undue influence. Model quality is mostly the same. Amazon is giving drivers smart glasses to scan packages, get directions, capture proof of delivery and detect hazards. Cool! TechCrunch ⭐ Over 3 months, I’ve recorded ~180 calls. Processing each costs ~1.25 cents (GPT-5) and 1 year’s conversations cost ~$9. That’s incredible value for money if I hired GPT-5 / Codex as a data-driven personal coach to guide me on: What are my blindspots? That is, feedback people share with me that I ignore? What are the clusters of persona that I interact with and which of these have a positive and negative influence on me? Where am I am being unreliable? Where am I being an asshole? Where are my expectations high? Where are they low? Where would the opposite have helped? Where do I quit early? Where do I persist? Where would the opposite have helped? What good habits should I continue? What bad habits should I stop? What are the strongest opportunities to thank or praise that I missed? Is there a pattern? What triggers could I use to build this habit? Where have I tried to change people? Where have people tried to change me? Where have I spotted wrong questions? That is, rather than answering the question, I spotted the more apt question and answered that instead? … and a hundred other questions that I wouldn’t even know to ask. Sub-agents can run parallel / independent tasks while keeping the context window small. (But the advantage over xargs seems marginal.) Simon Willison Document, lint, type-check, add test cases (or other similar tasks) for all folders in a monorepo. Research and create a report for each topic in */RESEARCH.md. Synthesize learnings from each conversation in transripts/*.md. “If you’re signed into sensitive accounts like your bank or your email provider in your browser, simply summarizing a Reddit post could result in an attacker being able to steal money or your private data.” Brave OpenAI Atlas has a “Watch Mode” that will stop working if you move away from that tab. Useful to keep an eye on sensitive sites. Simon Willison “… image editing platforms seem like they’ll eat and subsume Photoshop… modern image editors – especially Nano Banana from Google Gemini – … they’re extremely effective and, increasingly, instructable” - Import AI. Facebook now suggests edits to photos - TechCruch. WebPerl runs Perl in the browser via WebAssembly. Simon Willison

When I realized Aishwarya Rai begins and ends with AI, I had to find out if there were more like her. It took a coding agent (Claude Code in this case) 10 minutes to find the 10 celebrities who share that distinction, at least across the 24,086 names on Wikipedia: Ai Nagai - Japanese playwright Aiguo Dai - Chinese-American atmospheric scientist Ai (poet) - American poet Aisea Nawai - Fijian rugby player Ai (singer) - Japanese-American singer Aisha Chughtai - Pakistani actress Aiyappan Pillai - Indian social reformer Aizawa Seishisai - Japanese Confucian scholar Ainmuire mac Sétnai - Irish high king Aisha Yousef al-Mannai - Qatari artist Glory be to these AI bookends! ...

Habits of a code addict

AI can be held to account

“Humans can be held to account. Not AI.” I hear this often. But it’s not true. Corporations are non-human, but they can enter into contracts and face criminal charges. Ships can be sued directly. Courts can arrest the vessel itself. Deities and temples in India can own property. Forests and rivers in New Zealand, Colombia, Spain, have been granted legal personhood. Medieval Europe has held animal trials (e.g. for “guilty” pigs). ...

I always wondered why old movies are rated so high on IMDb. For example, 12 Angry Men (1954) with just ~900K votes ranks about as high as Inception (2010) with ~2M votes. Few people I know have seen 12 Angry Men. So where does this high rating come from? My theories were: Old movies really are that good. IMDb’s algorithm is biased towards old movies. People remember older movies fondly. Actually, it’s none of these. It’s selection bias. ...

If a bot passes your exam, what are you teaching?

It’s incredible how far coding agents have come. They can now solve complete exams. That changes what we should measure. My Tools in Data Science course has a Remote Online Exam. It was so difficult that, in 2023, it sparked threads titled “What is the purpose of an impossible ROE?” Today, despite making the test harder, students solve it easily with Claude, ChatGPT, etc. Here’s today’s score distribution: ...

Things I Learned - 09 Nov 2025

This week, I learned: “But when an identity based belief was challenged, the brain responded as if under physical attack.” Why Engineers Can’t Be Rational About Programming Languages Notes from How to build a cult, Lulu Cheng, The Knowledge Project podcast Conviction is infectious. Communicate at the INTERSECTION of interests. Learn theirs Begin with “why your story matters to them” (first sentence). That beats “how you tell it” > “where you tell it”. The easiest way to align with an audience is to find your community. Humor, curiosity, awe, any strong emotion is a hook. Culture has momentum. Best way to break it is to show an alternative that works. People will copy that REPEAT messages over and over with complete CONVICTION to convince people who TRUST you. That works, but you need all three. Trust builds from likeability, repeated exposure, common beliefs. An excellent way to defend against online criticism (when it matters) is to just SHOW UP and THANK them for feedback. Serious reputational damage must either be fixed immediately - or you live with it forever. Between a story and statistics, the story will always wins. Never fight a story with a statistic. Dig into your statistics and uncover BETTER stories. ⭐ Prebuttals are a great idea. Start with all possible criticisms yourself and diffuse them. The other person has nothing left to say Sparring keeps you sharp. Spar with LLMs. To defend, show how the attack targets other people, increasing the surface area. Show how the SPECIFIC attack targets a larger group. Create a SPECIFIC cause worth fighting for. Each role has specific objective to optimise for. The leader’s role is to balance across these. Cheerleader effect. People look beautiful next to a cheerleader. Associations taint. Each person has dozens of aspects to their persona. We cannot remember all of them. Each person can make a choice on who they project themselves to be in any group. Shaping their persona. The Rainbow CSV extension may be causing delays (infinite spinner) when pasting Markdown in VS Code. Restarting it seems to fix the issue. ⭐ Claude scientific skills is a collection of skills teaching Claude how to use scientific libraries, databases, and APIs across several domains. This may be a good example of a non-trivial skill library - that is hard for AI coding agents to infer by themselves. Notes from How I use every Claude Code feature Use AGENTS.md as guardrails, not a manual. Document what it gets wrong. Use self-documenting tools/APIs rather than documenting. Docs: Explain why and when to read each doc. Never say “Never.” Explain when to which which alternative. Prefer CLIs for stateless tools, MCPs for stateful, authenticated, or complex (e.g. Playwright). Coding agents work well with version control. Simon Willison Break up uncommitted changes into small commits Rewrite branch history for readability Use gh CLI to fetch line-wise comments from a PR and make requested changes (e.g. renaming, refactoring, adding types, etc.) ⭐ When using MCPs or tools with private data, “color untrusted content in red, unsafe actions in blue, and never mix colors.” Good advice. ⭐ DeepWiki offers a codemaps feature that explains code in an interactive way. It shows a structured explanation on the left. You can click on any note to see the code on the right. It’s an effective way to understand how a library or tool executes a task. Here’s an example of how Mermaid works. Gemini offers RAG with free storage. RAG costs are quite high. This simplifies the process a lot. But I tried running the sample program and after an hour, it still had not completed uploading a single file. Best to wait and watch. OpenRouter supports embedding models using an OpenAI-like API Kimi K2 Thinking seems popular because It’s an open-weights model on par with the top models on Humanity’s Last Exam (text-only) and BrowseComp Can run 200-300 tool calls without human guidance 4x cheaper than GPT-5 with low tokens (32B active on 1T parameters, INT4 quantized) Based on responses to Simon Willison’s question, ChatGPT Fine-tuning helps when: Lower latency, e.g. for type-ahead, at lower cost (37 mentions) Structured extraction, parsing and classifiers, e.g. postal address, detecting secrets (18 mentions) Custom vision models, e.g. check containers (12 mentions) Domain-specific code and stacks (niche languages, stack-specific generation, text→SQL) (11 mentions) … and a long tail. Fine tuning does not help: When A base model plus prompting or RAG does as well or better (15 mentions) When you risk being leapfrogged by a new release (4 mentions) When cost and data do not justify the ROI (3 mentions) The data I can export from my Android phone includes the below. 🟢 indicates it’s tracked. 🟡 might need action, e.g. enabling / coding. # 🟢 GPS/GNSS location (current & history). Turn on device Location. If you want a timeline you can export, enable Google Location History and later export via Google Takeout → Location History (JSON/KML). 🟡 GNSS raw measurements (engineering traces). Android exposes GNSS “raw” logs on many devices; capture with dev tools or logging apps if supported (intended for research). See GNSS Raw Measurements API. 🟢 Wi-Fi scans (nearby SSIDs/BSSIDs). Toggle Location scanning → Wi-Fi scanning in Location settings; apps need location permission to read results. 🟡 Wi-Fi RTT distance to APs (indoor ranging). Apps can use Wi-Fi RTT (802.11mc/az) to measure distance to compatible APs; requires location permission. 🟢 Bluetooth proximity/traffic. For packet-level logs, enable Developer options → Enable Bluetooth HCI snoop log, then pull /sdcard/btsnoop_hci.log (Wireshark). 🟢 Cell towers (IDs, signal strength). Apps can read via TelephonyManager (e.g., getAllCellInfo()), with appropriate telephony permissions. 🟢 Activity recognition (walking, running, in vehicle). Apps must request ACTIVITY_RECOGNITION (runtime) from Android 10+. 🟢 Steps (step counter / detector). Use sensors API; from Android 10+ you must declare ACTIVITY_RECOGNITION to access step counter/step detector. 🟢 Accelerometer / gyroscope / magnetometer streams. Apps read via SensorManager; some high-rate reads require HIGH_SAMPLING_RATE_SENSORS. 🟢 Ambient light / proximity. Read via SensorManager; typically no special permission. 🟢 Google Fit data (steps, workouts, heart rate from wearables, etc.). Manage and export from Google Fit / Google account Download your data. 🟢 Contacts. MIUI → Settings → System apps → Contacts → Import/Export to .vcf (vCard). 🟢 Call history / SMS (device). MIUI local/cloud backup can include call logs & messages; export by creating a local/Cloud backup and downloading. Note: 3P apps can’t read call/SMS logs unless they’re the default dialer/SMS. 🟡 Gmail, Calendar, Contacts (Google). Export via Google Takeout (MBOX/ICS/CSV etc.). 🟡 WhatsApp / Telegram / Signal chats. Use in-app exports: WhatsApp → Export chat, Telegram Desktop → Export, Signal → encrypted backup. 🟢 Advertising ID. View/reset in Settings → Google → Ads (wording varies), per Google help on Ad ID reset. 🟡 Per-app screen time / unlocks / opens. Third-party “usage” apps (e.g., analytics or “digital wellbeing” clones) require Usage Access (PACKAGE_USAGE_STATS). Use Android’s UsageStatsManager or apps that export CSV. Stock Digital Wellbeing does not offer an export. 🟡 Notification history (last 24h). Settings → Notifications → Notification history → On. OEM-optional, but present on most devices. Viewable once enabled. 🟡 Notification content stream (live). Grant an app Notification access to capture/export notifications going forward. (User-granted API via NotificationListenerService.) | 🟢 Per-app data usage (mobile/Wi-Fi). Apps/ADB can query NetworkStatsManager; Settings shows per-app totals. Advanced dumps via adb shell dumpsys netstats. 🟡 Wi-Fi detailed logs. Developer options → Enable Wi-Fi verbose logging for richer diagnostics. 🟡 Bluetooth packet logs. Developer options → Enable Bluetooth HCI snoop log; export file and analyze in Wireshark. 🟢 Per-app storage usage. Apps/ADB can query StorageStatsManager; Settings shows per-app storage. 🟡 Photo/video metadata (EXIF incl. location). Enable “Save location” in Camera app to embed GPS in EXIF; export files normally (EXIF remains). | 🟢 Downloads & file metadata. Use a file manager or connect via USB; metadata is in the files themselves. | 🟢 Battery usage history (per-UID/app), wakelocks, jobs. Generate adb bugreport and analyze with Battery Historian or dumpsys batterystats. 🟡 System/device logs (logcat). You can view via ADB/Android Studio. Android restricts 3rd-party access to system-wide logs for privacy. 🟢 Developer quick tiles (Sensors off). Developer options → Quick settings developer tiles → Sensors off to globally cut Camera/Mic & SensorManager sensors on demand. 🟡 Google Takeout: one-stop export for Location History (Timeline), Gmail (MBOX), Calendar (ICS), Google Photos, Drive, YouTube, Fit, etc. MacroDroid, Automate and Tasker sound like powerful Android workflow automation tools. Some uses I can put it to: Automatically upload recordings to Dropbox Turn off hotspot when I reach office Vibrate if I’m walking slowly Adding <link rel="alternate" type="text/markdown" title="LLM-friendly version" href="/llms.txt"> is an emerging approach for pointing to LLMs.txt. It works. I asked Codex to read the CloudFlare vitest page. It read the file truncating the middle, found the <link rel="alternate" type="text/markdown" href="https://developers.cloudflare.com/workers/testing/vitest-integration/write-your-first-test/index.md"/ link in it, and reasoned “Considering fetching markdown instructions” and fetched the Markdown page. Giles’ Blog toon is a YAML-like format that’s LLM friendly and especially token-efficient (CSV-like) for tables. You can convert back and forth between JSON and toon. Food printing applies 3D printing techniques to create real food items. Given the art that this can create, I expect at least some adoption in niche restaurants. PMTiles lets you store map tiles as a single-file archive that libraries like MapLibre can read. Useful to avoid tile servers. Mirrow is a CLI SVG animation builder that converts a DSL to animated SVGs. However, it may be easier to use an LLM to create the animated SVG directly with SMIL than learning Mirrow (or teaching the LLM Mirrow). ⭐ One approach to giving memory (“episodic memory”) to coding agents is to allow them to search their logs.This gives them access to past discussions about a repo or other repos. To configure Gemini CLI with an AI router, set: "security.auth.selectedType": "gemini-api-key" in ~/.gemini/settings.json export GOOGLE_GEMINI_BASE_URL=https://llmfoundry.straive.com/gemini/ (or your AI router base URL for Gemini) export GEMINI_API_KEY=... (your AI router API key) Passing a HAR export to an LLM to build a scraper is a powerful idea! Lessons from Diagram Chasing Addy Osmani’s Gemini CLI tips are practical guides to using any coding agent, not just Gemini. I learnt about: Run shell commands with !, e.g. !ls -la or even !bash. It’s added to the chat. On-the-fly tool creation: ask it to write code for the task on the fly. Use it for system optimization, e.g. editing dotfiles, system customization, log error analysis, etc. Run GEMINI_SYSTEM_MD=... gemini -p "task" --yolo --format json < input.txt to run Gemini with a different system prompt and feed it input.txt to run in a pipeline. (FYI: Codex does not send a default system prompt, so there’s nothing to override.) There is a Gemini CLI Show and Tell thread with examples. This include Janitor AI, a Gemini CLI session viewer, etc. Hands on with Gemini CLI has several Use cases to try out. Renaming photos and organizing files are clever ones. AGENTS.md can be used like a decision log - rules, styles, or preferences that evolve over time - on a per-repo basis. Gemini’s /memory add feature helps with this. gemini --checkpointing is a useful “undo” feature. /restore rolls you back to a specific checkpoint. The overhead is small. Caching is only available with API key or Vertex AI, not OAuth login as of now OpenAI TTS costs are confusing. But in short TTS-1 costs $15 / MChars (max 4,096 chars per request), which ends up at ~86c / hour GPT-4o Mini TTS costs ~$16 / MChars (max 2K tokens which is ~7,000 chars per request), which ends up at ~88c / hour. Very similar cost, effectively TTS-1 HD is twice TTS-1. OpenAI has a usage API that provides cost as well as usage for completions, images, audio speeches, etc. These require an organization admin key Cost API: curl "https://api.openai.com/v1/organization/costs?start_time=$TIMESTAMP&project_ids=$PROJECT_ID&group_by=line_item" Audio speech usage API: curl "https://api.openai.com/v1/organization/usage/audio_speeches?start_time=$TIMESTAMP&project_ids=$PROJECT_ID&group_by=model"

Is all AI content slop?

Is all AI content slop? I asked Claude to: Analyze this thread. Then explain it like a Malcolm Gladwell New Yorker article. https://news.ycombinator.com/item?id=45820872 It gave me a beautiful, engaging and insightful essay about a 300+ message debate about AI vs humans on routine tasks. https://claude.ai/share/60c5810f-5c81-4970-8026-a24bf89c3392 Is this slop? One phrase stood out: There’s an irony here that the commenter doesn’t quite state but implies beautifully: we’ve spent so long celebrating automation because humans are imperfect that we’ve forgotten we also value humans because they’re imperfect. ...

OpenAI TTS cost

The OpenAI text-to-speech cost documentation is confusing. As of 2 Nov 2025: GPT-4o mini TTS costs $0.60 / MTok input and $12.00 / MTok audio output according to the model page and the pricing page. They also estimate this to be ~1.5c per minute - both for input and output. It supports up to 2,000 tokens input. TTS-1 costs $15 / MTok speech generated according to the model page but the pricing page says it's $15 / MChars. No estimate per minute is provided. Is supports up to 4,096 characters input. TTS-1 HD is twice as expensive as TTS-1 I wanted to find the approximate total cost for a typical text input measured per character and token. ...

Things I Learned - 02 Nov 2025

This week, I learned: TVMaze API is an API for TV shows, episodes, cast, crew, etc. Useful for TV-related apps as well as learning APIs. Awesome Skills is a curated list of prompts and skills for AI coding agents. ⭐ nokode is a API server that has no code: just LLMs responding. Interestingly, it is compliant. Just expensive, slow, forgetful and unreliable compared to code. All four are improving with time, indicating that coding may be transitional. Notes from Vanya Seth’s keynote at OSAI HYD Superpowers of Gen AI to keep in mind when exploring AI coding agent use cases: Translating. Requirements to code, code to code, language to queries, standard to standard. Finding info just-in-time (in context). How does this work? What’s this error? What tools are permitted in my org? Who knows what? E.g. Atlassian Rovo queries across JIRA, Confluence, etc. Brainstorming and ideation. Product ideation. Requirements. Testing gaps. Architecture review. Exploratory / scenario testing. Summarizing and clustering. Change logs, incident management, research data, docs summary. Challenges in using AI coding agents: Adoption imbalance. Only certain roles are amplified by AI. Coding, QA, more than planning, maintenance, AI ops, etc. What’s the impact of this? ⭐ Goldratt’s ToC implies that backlogs need to fill faster. Downstream becomes a bottleneck. Technical debt piles up. ACTION: Use AI across entire value chain, from research to maintenance. Locality. enhances roles (nodes), not relationships (links). They optimize local work, not global flow. Workflow tools are missing. Coordination overhead. Context Fragmentation. Translation problems. ⭐ Expand productive roles to cover neighboring tasks. Productive developers shift left and build backlogs; shift right to reduce code review, maintenance tasks. E.g. Move maintenance/production activities into development. Security, performance, monitoring, observability, cost, infrastructure. We spend time on IDE, CI/CD, Jira, Confluence, Prod observability tools. A typical Agent Development Platform (ADP) covers evals, guardrails, workflow builder, agent builder, observability, prompt management, AI gateway (LiteLLM), MCP servers, model fine-tuning, model serving, model repository, vector stores We need ADP Agents covering delivery risk, continuous security, prod issues RCA, observability, performance, accessibility, product research, infra optiimzation, test data generation, anomaly detection, release management ACTION: Share ADP photo with Patrick. ACTION: ⭐ Centralize skills (“knowledge packs”) and MCPs and observe which gets used most. Allow people to use more. Lethal Trifecta. There’s growing demand for higher productivity with AI code assistants. But the lethal trifecta makes them an attack vector. It has access to sensitive information, exfiltrate data, and read and follow unsafe instructions. Can lead to supply chain poisoning attacks. Regulated industries cannot adopt. Technical debt growth. More productivity leads to poor code quality which will slow down future work. See Software Engineering Excellence 2025 AI induced complacency. Sunk-cost fallacy on AI-generated code hurts. ACTION: Evaluate code quality continuously to reduce technical debt. Double-down on good engineering practices. Compliance. Model residency. Self-hosting is required. Data observability gaps. Data privacy, audit trails, etc. are concerns. Token economics. $20/day happens in Thoughtworks. Token cost is subsidized. Rogue AI usage. Use of dis-allowed tools; shadow IT. ROI justification. Hard to quantify productivity gains. Adoption. AI Literacy. Tap into organizational knowledge Champions & communities of practice to support cross-pollination. Use-case driven adoption. Teams identify based on AI superpowers. AI playbook. Share what worked, what didn’t work. AI automation is likely less if a high portion of work Has legal liability (e.g. pharmacist/judge vs shop attendant/lawyer) Is subjective (e.g. perfumer/auction appraiser vs lab chemist/insurance appraiser) Needs rapid contextual decisions (e.g. detective/fireman/ER vs parking enforcer) Via ChatGPT, Claude parse-sse from Sindre Sorhus is a more standards-compliant, more likely-to-be-maintained alternative to my async-sse package. Which is better: Comment A: 1 upvote, 0 downvotes (100% positive) or Comment B: 99 upvotes, 1 downvote (99% positive)? Use Wilson’s Lower Bound which measures “What % positive am I 95% confident of?” Claude Using this, we can measure metrics for tweets, like below. ChatGPT Popularity = (5 _ WLB(reposts / views) + 2 _ WLB(likes / views)) * Decay(half-life of 72 h) Memorability = (5 _ WLB(bookmarks / views) + 4 _ WLB(replies / views)) * Decay(half-life of 36 hours) A nice visual “benchmark” of text-to-image and image editing models. Seadream 4, Gemini 2.5 Flash, and Qwen Image Edit lead. This includes examples like straightening te Tower of Pisa - which only Flux.1 and Seadream 4 do well on; or removing only the brown M&Ms - which only Qwen Image Edit manages to. Arch is a pure LLM router. It supports multiple LLMs, flexible routing and observability but not auth. From Codex docs Add custom prompts in ~/.codex/prompts/xyz.md and launch as /prompts:xyz. Optional: description: and argument-hint: in YAML front-matter. For example, create prompts to refactor, rewrite in a developer’s style, document AGENTS.md, identify re-usable code, etc. AGENTS.override.md overrides parent directory AGENTS.md. AGENTS.md appends to parent AGENTS.md. Fallback names are allowed. codex exec supports streaming JSON codex exec accepts a CODEX_API_KEY= environment variable. codex uses an OPENAI_API_KEY. You can configure which environment variables are passed to the shell Codex reads 32KB from AGENTS.md by default Things that I currently follow and don’t follow from Peter Steinberger’s excellent Just Talk To It: Prefer Codex > Claude Code. Ask for options before executing Generate & review specs collaboratively You don’t need git worktrees Prefer subscriptions over API to reduce cost Store docs with code Give examples Use voice input Use Codex Web as a mobile inbox for ideas Prefer CLI over agentic platforms Prefer CLI tools over MCP Avoid ALL-CAPS for Codex. It follows instructions well Avoid sub-agents, RAG, etc. Iterate UI live. Watch changes Use 3-8 agents in parallel on a single repo. Make small, atomic commit checkpoints. Commit only what the agent touches Add ast-grep as a pre-commit hook to block rule violations. Keep custom prompts minimal (commit, automerge, massageprs, review, …). Just “commit” reduces context Cancel long tasks and ask what’s happening Prefer Medium over High reasoning. It decides level of thinking Share screenshots Use tmux to run CLIs persistently Schedule refactor time (20%). Use jscpd, knip, oxlint, … Don’t reset context. Cold start wastes time + tokens Write tests in the same context. Yields better tests, reveals bugs. Prototype in a separate folder / PR Queue continue messages** before stepping away Ask it to “Preserve intent and add comments at tricky spots”. Future you needs the WHY On hard problems, add “take your time”, “be comprehensive”, “read all related code”, “form hypotheses”, etc. Maintain an evolving AGENTS.md with product notes, naming, API patterns, test policy, ast-grep rules, etc. Delete stale guidelines Fascinating implications from Quantifying Human-AI Synergy ChatGPT Models vary in ability to uplift humans. Don’t just use standalone model benchmarks. People vary in ability to work with AI. Don’t just measure solo skills. Reward AI collaboration ability (delegation, prompting, verification, revision, …) Train models to ask for missing Theory-of-Mind cues: goal, beliefs, constraints, audience, success test Train people by asking them to predict what the model will get right/wrong, and validate Design UI and models for synergy. UI: Surface/solicit assumptions, intent, uncertainty, constraints. Model: Infer & adapt to evolving user state. OpenRouter image generation now includes GPT-5 Image Mini. An image costs about 1 cent. Here’s the code: curl 'https://openrouter.ai/api/v1/chat/completions' \ -H "Authorization: Bearer $OPENROUTER_API_KEY" \ -H "Content-Type: application/json" \ -d '{ model: "openai/gpt-5-image-mini", messages: [{ role: "user", content: "Draw a cat" }], modalities: ["image"], image_config: { "aspect_ratio": "16:9" } }' | jq -r '.choices[0].message.images[0].image_url.url' | cut -c23- | base64 -d > cat.png

Sometimes, technology creates truly memorable moments. Like when email connected me with my schoolmates in 1993. Or WhatsApp connected me with long-lost relatives in 2010. Today, Google Gemini took me back 55 years, converting the grainy black-and-white wedding photos of my parents into vivid high-resolution color images. So many people. Much younger. More alive. I look forward to when I can watch the video. Move around. Talk to them… Prompt: Convert this black and white photo to color. CAREFULLY ensure that the photo, especially faces, are EXACTLY the same. Use vivid colors and sharp photography, like in modern digital photos. Model: gemini-2.5-flash-image (nano-banana) Temperature: 0 ...

When to choose AI over humans

I charted the OpenAI GDPVal paper with industry compensation as the size and AI augmentation as color. Big green areas are we’re paying people where AI does better. Click here to see the interactive visualization. Clicking to see some actual tasks compared. I use this to check whom to ask advice: AI or professional. AI beats Personal Financial Advisors ~64% of the time. So I invested half my money using ChatGPT’s recommendation. (UTI Nifty 50, if you’re curious.) ...

Things I Learned - 26 Oct 2025

This week, I learned: Before founding a place to do good, work in a place that does good and learn. Ben Werdmuller What should we teach when vibe coding becomes good enough for non-coders? Ethan Mollick Problem decomposition Clear communication & spec writing Core technical foundations: file systems, access control, networking, APIs, version control, data structures, databases, deployment Software development skills: Debugging, Testing, Refactoring, Design patterns, UI/UX Project management: requirements, prioritization, scoping, … Codex CLI tips: codex --add-dir $DIR lets you write into $DIR codex --full-auto is the equivalent of codex --sandbox workspace-write --ask-for-approval on-request Terse code is not necessarily easier or harder for LLMs to write. It’s about how unusual (or not aligned with training data) the code is. Gabi Teoduru How are people using browser agents like Comet / Atlas? Simon Willison Most popular: YouTube video summaries with timestamps Most useful: Form filling: Government forms, data entry, repetitive bureaucratic tasks Foreign language navigation: Applying for pension in Korea, navigating sites in other languages Time reporting auto-completion Insurance claims: Reading policy documents and drafting appeals (successfully got claim reimbursed in India) Compliance training click throughs Next most useful: Shopping / planning Energy provider comparison - Comet checked current plan vs competitors on Check24, calculated exact annual savings per provider Financial tracking: Finding Amazon orders, tracking Airbnb spending with refund calculations, analyzing bank transactions Trip planning: Mapping 50-100 places on Google Maps automatically Interesting: Airport shuttle discovery - Found shuttle that user missed in manual searching HubFS mounts GitHub repos on the file system. Every file system action directly works on GitHub via a REST API. Useful for some scenarios but less useful for note-taking than something like GitDoc which offers a delayed sync. Ernest Ryu solved an open problem in convex optimization using ChatGPT. Quotes: ChatGPT is now at the level of solving some math research questions, but you do need an expert guiding it. ChatGPT was really effective at accelerating my progress. This work took about 12 hours, spread over 3 days. In hindsight, the proof is really simple. But I iterated through so many other strategies that didn’t pan out, and ChatGPT crucially helped to quickly explore and eliminate those dead-end approaches. Also, the key successful steps were suggested by ChatGPT. ChatGPT did not produce the proof in a single prompt. The process was highly interactive. It generated many arguments, roughly 80% of which were incorrect. Yet some were genuinely novel to me. Whenever I recognized a novel idea, whether correct or only partially so, I distilled the key insight and prompted ChatGPT to develop it further. My contribution: Filtering out incorrect arguments and accumulating a set of correct facts. Identifying promising new lines of reasoning and guiding ChatGPT to explore them further Recognizing when a strategy had been fully explored and deciding when to move on. ChatGPT’s contribution: Producing the final proof argument. Significantly accelerating my (or our) exploration of the many dead-end arguments, rapidly ruling out approaches that did not work. Comparing the GPT 4.1 and 5 models at all different of reasoning, I’ve switched my default from GPT 4.1 mini to GPT 5 mini (medium). Far smarter for a slightly higher cost. Artificial Analysis python -m pdb -c continue script.py or uv run -m pdb -c continue script.py runs a script and drops into pdb on unhandled exceptions (post-mortem). ChatGPT Technology removes constraints. We then do what we really value. Claude When writing became digitized, we stopped cared about spelling/handwriting for its own sake. Spelling bees and handwriting classes declined. “ur” is acceptable. When fitness tracking became easy, many just track, few exercise more. Few people value exercise When GPS became ubiquitous, we stopped learning geography. Most value arriving, not knowing When photography became unlimited, most captured moments. Few perfected shots I had Codex scrape my ~2,000 pending invites on LinkedIn and asked ChatGPT to analyze it. Here are learnings: ChatGPT, private Power-law. 5% of inviters account for ~42% of all common connections. Top 10 people alone for ~20%. IITM student invites are high (~14%), but with 0-2 common connects, i.e. distant strangers. EdTech is tiny in count but has the highest common connections per person (outlier-sensitive but real). Among ≥20-commons, many hold VP/Head/Site-Lead titles in Data/AI or GenAI (not just recruiters). GenAI people are 7-8% and steady across months. Not a useful signal to prioritize. Premium ~ Senior. Premium accounts show ~40% senior titles vs ~29% for non-premium. Finance invites have higher seniority rate and more common connects than healthcare. Followers have higher common connections (~6 vs ~4). ⭐ Memory can be code. Agent memory is anything it choose to persist. Agents can write code on the fly to automate tasks, save them, and serve the code on the next request, potentially modifying the code as required. This is like the conscious mind saving a habit for the subconscious to execute fast. Finally: Microsoft Office has an agent mode that lets you talk to it and do stuff. The Verge

I asked multiple coding agents and models to build the same app: Create a single-page web app at index.html that beautifully renders a GitHub user profile and activity comprehensively. Pick the ID in the URL ?id=…, default to ?id=torvalds. … and compared their quality, cost, and speed. My observations: Quality variance is the highest. Some models / agents produce great visuals, some average, some fail completely. Cost and time variance are lower among the successful models. About 2X variance in each. ...

Things I Learned - 19 Oct 2025

This week, I learned: ⭐ “… most engineers don’t have public commits. Senior engineers at large tech companies don’t work on open-source projects for the most part.” Why AI Can’t Do Hiring Cloudflare’s Sandbox feature in their Workers looks impressive. It supports streaming, web access to the container, and long-running processes. So we can spawn off a task and have it run a server (at least for a while) or a scraper. Gemini API has a Google Maps tool that it can refer to - like Google Search. Maps Grounding Earlier we needed humans to label data for RLHF. Now we don’t since AI can simulate it. This is a pattern. Once AI learns from a human, that human skill can be automated. How GPT-5 Thinks — OpenAI VP of Research Jerry Tworek The <output> element has a for= attribute indicating which <input> elements it is linked to and a form= attribute indicating which form it belongs to. This works well with screen readers. A good reason to use it more. Examples. Meta built a Code World Model. Basically an LLM that acts like a Python interpreter! sudo apt install moreutils installs a set of useful packages: chronic. Runs a command quietly (suppressing output) unless it fails — good for cron jobs where you only want noise on errors. chronic backup.sh combine. Combines lines from two input streams/files using boolean operations (AND, OR, XOR). combine AND fileA fileB errno. Look up symbolic names, numeric codes, and descriptions for standard errno values. errno -l; errno ENOENT; errno 2 ifdata. Query network interface properties (IP, byte counts, errors) in a script-friendly format. ifdata -sip eth0; ifdata -bops eth0 ifne. Run a command only if stdin is not empty, passing the input through. find . -name core | ifne mail -s "Core files found" admin isutf8. Check whether a file or stdin is valid UTF-8. isutf8 somefile.txt lckdo. Run a command while holding an exclusive lock to prevent concurrent runs. lckdo /var/run/mylockfile.cmd myscript.sh mispipe. Pipe two commands, but return the exit status of the first one (useful in pipelines). cmd1 mispipe cmd2 parallel. Run multiple commands in parallel, reading them from stdin or arguments. parallel < jobs.txt pee. Like tee, but sends stdin to multiple commands in parallel. echo "foo" | pee cmd1 cmd2 ⭐ sponge. Soak up all input before writing to output — enables in-place edits safely. sort file | sponge file ⭐ ts. Prefix each input line with a timestamp. tail -f logfile | ts vidir. Edit a directory listing in your editor to rename, move, or delete files in bulk. vidir ~/myfolder vipe. Insert a text editor into a pipeline to manually edit streamed input before output. cat file | vipe | wc -l zrun. Transparently decompress compressed files before passing them to a command. zrun cat file.gz Despite 20 years of SVG experience, I learnt new things from A Friendly Introduction to SVG and A Friendly Introduction to Paths Setting a <rect> width/height or a <circle> radius to zero removes the element instead of drawing a point. There’s no option to draw the stroke on the inside or outside of a shape/path. Only the center. You can override a path’s pathLength attribute to create a new internal scale for its length. It’s unclear where I can use this. <path> arcs have this syntax: A [rx],[ry] [rotation] [large-arc-flag] [sweep-flag] [end-x],[end-y]. SVG first fits an ellipse to these parameters and then draws the arc. If rx and ry of an arc is too small to connect the points, the SVG spec scales up rx and ry. [large-arc-flag]=1 literally uses the larger arc of the fitting ellipse. This is less common. [sweep-flag]=1 its the ellipse to make the connecting arc go clockwise. 0 is anti-clockwise. [rotation] is rarely used because we usually draw arcs and then rotate them. stroke-linejoin automatically flips from miter (sharp) to bevel (cut) if the sharp edge protrudes too long (e.g. small angles). Increasing stroke-miterlimit increases the cutoff (default: 4) ⭐ Always include a thoughtful gallery of examples with tools / libraries. This does more than showing what a tool can do. It’s use-case / domain transfer: showing what it’s useful for in real life - opening ideas, suggesting workflows. It’s style transfer: showing how to use it. ⭐ Here’s what expert AI coders increasingly focus on. Thomas Dohmke Delegation: context engineering agents for success; parallelizing. Verification: efficiently reviewing and testing code/output; setting stop-points. Expanding scope: instead of time saved as the metric. Education: teaching AI-based coding, debugging, reviewing/testing. Product management: combining requirements + UI design + architecture + engineering + deployment. Cross-discipline: blending code with design, governance, finance, marketing, … (“computational creators”). Notes from Taylor’s How I’m using coding agents: October 2025 Left monitor: 2-4 desktops (e.g. work, side-project). Right monitor: things I always want available Plan next task while first executes. Use plan mode to write to a plan file. Don’t start big tasks if you have meetings scheduled soon. Recent open source package hack methods seem to work more because of people/process than systems (Filippo): Phishing the author Pull requests running unsafe code in CI Taking over expired domain / user ID Stealing long-lived tokens uv run --python 3.14 --isolated --with-editable '.[test]' pytest runs pytest on a local project with a specific Python version. Simon Willison Notes from the State of AI Report 2025: Reasoning models are more fragile. Irrelevant phrases make reasoning models spend FAR more tokens and get wrong answers #21 AI systems are able to teach experts new concepts #41 An environment providing feedback / rewards enables continuous learning #52 E.g. Multi-robot chemical labs at U.Liverpool and NCSU #60 RLHF has a fundamental flaw: humans reward sycophancy #71 We can read what people are typing from brain signals outside the skull #73 Model intelligence-to-price ratio doubles every ~6 months #94 The AI companies’ valuations are also roughly doubling every ~6 months #181 OpenAI is offering Governments giga-watt campuses to run OpenAI models for citizens #122 A 1GW clusters costs $50bn capex and $11bn per annum #130 China has added ~10X the energy capacity as the US in 2024 #146 NVIDIA challengers are still far away #161 LLMs can “read between the lines” even if training data is censored #268 LLMs can pass information via hidden signals #270 Prediction: A major retailer reports >5% of online sales from agentic checkout. AI agent advertising spend hits $5B. #304 OpenAI’s leadership guide says: Align Explain WHY AI thoughtfully. Set a goal, e.g. everyone uses ChatGPT 20 times/day (Moderna). Use it yourself. Show how. Have business leaders run AI sessions Activate Launch an AI skills proram Set up an AI champions network Encourage experimentation (dedicated time, workshops, hackathons, …) Link to performance evaluations Amplify Create an AI knowledge base Share success stories (weekly) Create internal groups (Teams, Slack, …) Celebrate AI wins Accelerate Unblock AI tools and data access Simplify project selection. Quick feedback, clear priorities Unblock projects with a cross-functional council Give resources to successful teams Govern Publish a responsible AI playbook (what’s safe to try) Audit AI practices quarterly

Workshops That Teach Me More Than You

I don’t charge for workshops. Altruism? No: it’s self-interest. “If you’re not paying for it, you’re not the customer; you’re the product being sold.” Andrew Lewis, via Tim O’Reilly, 2010. My workshop process is designed to benefit me first. I pick topics I want to learn, not stuff useful to the audience. Example: I picked DuckDB for my PyCon India 2025 talk to learn it. ...