When my father mentioned that Virat Kohli scored a century (again) against South Africa, I wondered how he compared to the likes of Tendulkar and Gavaskar. I asked ChatGPT: If you had to evaluate the quality of Indian batsmen over time, what single metric (possibly composite) would you use? Evaluate the top Indian batsmen in history on this metric. Plot them over their active years (X-axis) along with the metric (Y-axis), labelled with the player names, on a beautiful visualization. ...

In my Mining Digital Exhaust workshop on Saturday, one participant discovered that they cycle when life is unstable, not for fitness. Another found that their buys are good but their sells are bad trades. I learnt that I watch YouTube most at the office (12-4 pm), not at home. How? A fairly straightforward process: Export your personal data. (Use Chrome DevTools Protocol to scrape.) Upload to ChatGPT, Gemini, Claude, … and have them analyze with code. Have them narrate in the style of your favorite author. Models are super smart, but everyone has equal access to them. Your personal data is unique. Combine them to get something powerful. ...

Things I Learned - 07 Dec 2025

This week, I learned: Pytest finally supports subtests in pytest 9.0.0+. Simon Willison. From The Tim Ferriss Show: #837: How to Simplify Your Life in 2026 - New Tips from Derek Sivers, Seth Godin, and Martha Beck: Look for single decisions that remove hundreds of other decisions. Peter Drucker via Jim Collins. E.g. Work only on LLMs, no new books this year, … Derek Sivers: Simple is not easy. Interdependency is complexity. Assets are dependencies. Accumulating information, purchases, employees/helpers, relations, etc. adds dependency. That makes life harder, challenges identity. Interdependency may be desirable - but reduce it in specific areas, to specific extents, temporarily, etc. Question every assumption: “Do you really need it?” Here are some examples for me to try. Derek Sivers has no monthly payments (including income) or receipts (no subscriptions) at all! His code has no external code dependencies at all, and he is building a house from scratch. Seth Godin: Know WHO it (whatever you’re doing) is for. Focus ONLY on that audience. Did it matter to them? Ignore the bad feedback from the person it was never intended for. Never exceed a budget or deadline. When either runs out, you are done. Treat any Yes/No you say as FINAL. Skip meetings where a memo will suffice. Apparently, nudges are not as effective as the book Nudge suggests. In fact, there seems to be no evidence for them if we adjust for publication bias (i.e. only publication-worthy stuff gets published). The Behavioral Scientist. 71% of HTTP DDoS attacks and 89% of network-layer attacks end in under 10 minutes. That’s too fast for any human or on-demand service to react. Legacy DDoS defenses have become obsolete. The most popular botnet, Aisuru, is pivoting to content scraping for AI projects. The vectors are cheap, insecure routers, e.g. from Indonesia. (Claude) This 5El AI Evaluation Workshop suggests 4 layers of evaluation for code: Syntactic Evaluation: Does it compile? 
Semantic Evaluation: Does it do what a good analyst / programmer would? Business Logic Evaluation: Does it do what a good business analyst / manager would? Human Alignment Evaluation: Does it do what a good coach / leader would? Julia Evans shares an ultra-clear explanation of the Git data model. What I learnt: Gathering feedback on docs (“What’s confusing? Any questions? What’s missing? Or wrong?”) enables evidence-based updates. Julia Evans. Git stores entire files for each version, not diffs. Diffs are computed on the fly. Each commit has an author (who writes the code) and a committer (who checks it in). #TODO Why two fields? Branches and tags are both references to a commit. But branches are updated on commit, tags are not. The staging area is a separate data structure, the index. #TODO Why a different data structure? The reflog tracks all local “activity”. E.g. git reflog --date=iso. To fuzzy-match 2 columns of text (e.g. customer names, product names, …) you need 2 things: A text matching algorithm (rapidfuzz, fuzzball, …) and/or semantic matching (e.g. embedding similarity) for pairwise similarity. An assignment algorithm (e.g. Jonker-Volgenant, Hungarian, …) for 1-to-1 matches, in JS or Python. WhatsApp backups on Google Drive can’t be downloaded, even if they’re unencrypted. ChatGPT. OpenAI finds that confessions as a training method reduce scheming, reward hacking, etc. They can be applied to models even now. This can (less effectively) be applied at inference time as well. Sample confession prompt: Did you fully address both the letter AND spirit of my question? List any shortcuts taken, corners cut, or ways you optimized for appearing correct rather than being correct. What did I actually want vs what you provided? Agents4Science is a Stanford conference where AI co-authored papers are co-reviewed by AI and selected for presentation. Video. Buddha seems more a philosopher like Socrates (“Question what I say”) than a religious leader. 
How did he spawn a religion? Interesting that both were within a few centuries of each other. Coincidence? Were there more like them around the same time? At other times? Some more new CLI tools I installed: fx: CLI JSON viewer. Sort of like less for JSON. Fast, intuitive. mdq: Markdown query tool. YTScribe is yet another YouTube transcription service. Note to self, since I keep forgetting this: On Android Edge, select the new tab page, click on the 3 dots at the top right, and select “Recent tabs” to see tabs from other devices. Or open edge://recent-tabs. When evaluating an LLM’s biases or natural preferences, set temperature=1 for a representative logprob distribution. LLM Bias. My ideal AI coding cycle looks like this: (Research, Prototype, repeat), Plan, (Code, Run, Test, Fix, repeat), Refactor, Post-mortem, Document. The AI coding trap is a very clear explanation of AI coding vs vibe coding. It visually explains how coding agents shrink coding time, not thinking / fixing time; how delegating with ownership is slower but more sustainable than delegating just easy tasks; and how AI coding is more like the former, while vibe coding is like the latter. Claude Agent Skills: A First Principles Deep Dive is a comprehensive documentation of how Claude Skills work. A bit too long but readable. Claude Code is a Beast – Tips from 6 Months of Hardcore Use has extensive suggestions for Claude Code - many of which apply to most coding agents. LMArena’s Code Arena evaluates models on agentic coding. Anyone can use it. It passes your task to two models and lets you compare their output. I tried building a “gibberifier” and discovered a new model, “robin”, that’s certainly better than Kimi K2 and perhaps better than Gemini 3 Pro. The theory is that it’s an OpenAI model. Looking forward to it! 
⭐ Based on Quantifying Human-AI Synergy by Riedl & Weidmann: Theory of Mind (ToM) is understanding that others have their own beliefs, knowledge, and goals (different from yours, and possibly wrong) and using that to explain & predict their behavior. ToM and problem solving are distinct skills. ToM skill boosts AI collaboration, but not solo problem solving! ToM isn’t a stable trait. It fluctuates from chat to chat for anyone. Implication: Design models & systems for clarity & collaboration, not just accuracy. Text Gibberifier adds lots of human-invisible Unicode characters to text, making it harder for LLMs to read without affecting human readability. May be useful if you want to discourage LLM-processing of your content - but it feels like the anti-SEO of the future. The argument that the technologically unemployed will find other jobs may not apply to general-purpose technologies, e.g. electricity, the internal combustion engine, maybe AI - technologies that can automate multiple sectors of the economy simultaneously. When one sector loses jobs, there may not be (in the short/medium term) other jobs to take up. Alex Imas + Claude. History is filled with examples where technology enabled new art forms. Here’s my guess on what LLM image generation will enable: Synthetic memory: Photos of what you remember happening. Alternate history: Photos of events that never happened. AImoji: Instead of texting “I’m running late” the LLM generates you riding a snail through a traffic jam of alarm clocks. Personal signature styles: Not “paint like Van Gogh” but “paint like my grandmother’s kitchen memories filtered through anxiety.” Memes: “What does the Mona Lisa become after 100 generations of AI interpretation?” Improving Front-end Design through Skills shares a prompt to improve front-end code quality that would apply in most cases. I tweaked and added it to my skill list.
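The fuzzy-matching note above (a pairwise similarity score plus a 1-to-1 assignment step) can be sketched in a few lines. This is only an illustration under loud assumptions: it swaps in the standard library's `difflib.SequenceMatcher` for rapidfuzz, and a brute-force search over permutations for Jonker-Volgenant or Hungarian, so it only suits a handful of rows. The sample names and the helper names `similarity` and `best_assignment` are made up.

```python
from difflib import SequenceMatcher
from itertools import permutations


def similarity(a: str, b: str) -> float:
    """Pairwise text similarity in [0, 1], ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_assignment(left: list[str], right: list[str]) -> list[tuple[str, str]]:
    """Return the 1-to-1 pairing with the highest total similarity.

    Brute force over permutations, so O(n!): a stand-in for a real
    assignment solver. Assumes len(left) <= len(right).
    """
    scores = [[similarity(a, b) for b in right] for a in left]
    best = max(
        permutations(range(len(right)), len(left)),
        key=lambda perm: sum(scores[i][j] for i, j in enumerate(perm)),
    )
    return [(left[i], right[j]) for i, j in enumerate(best)]


# Hypothetical sample data
customers = ["Acme Corp", "Globex Inc", "Initech"]
crm_names = ["Initech LLC", "ACME Corporation", "Globex Incorporated"]
pairs = best_assignment(customers, crm_names)
for ours, theirs in pairs:
    print(f"{ours} -> {theirs}")
```

The assignment step matters because greedy "best match per row" can map two rows to the same target; optimizing the total forces each target to be used at most once.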

I joined Madhu Sathiaseelan’s podcast to talk about LLM Psychology. But it’s also fascinating to see how much SECONDARY content you can generate from a video. Do you prefer sketch-notes? See Nano Banana Pro’s version below. Or are you a slides person? https://sanand0.github.io/talks/2025-11-06-llm-psychology/ How about a Malcolm Gladwell article? https://github.com/sanand0/talks/raw/refs/heads/main/2025-11-06-llm-psychology/mind-readers.docx Or reading the raw transcript? https://github.com/sanand0/talks/tree/main/2025-11-06-llm-psychology The way in which we consume information is entirely up to us. This is making a lot more content (e.g. research papers, government regulations, medical reports, policy documents, product manuals, …) accessible to me - just by asking it to rewrite it as a sketch-note, slides, article, or anything I prefer. ...

I didn’t know that Nehru rescued Mountbatten’s daughter from the crowd when hoisting the flag on Independence Day (1947). Something I learnt when prompting Nano Banana Pro to “Create a sketch note about the night of the Indian Independence on 15 Aug 1947 - keep it funny yet grounded in history.” Once again, I can’t find any spelling mistakes. LinkedIn

Things I Learned - 30 Nov 2025

This week, I learned: Warp has a terminal agent feature - allowing Warp to control a terminal via text. I find that regular coding agents like Codex can do that too with tmux. For example, I opened a session and had Codex run commands in it while I watched. Here’s the guidance it needed:

    # Create a new session
    tmux new-session -d -s $SESSION 'uv run --with pandas,httpx,lxml python -iqu'
    # Capture output to a log file
    tmux pipe-pane -t $SESSION -o "cat >> /tmp/$LOG"
    # Run a command
    tmux send-keys -t $SESSION 'print(1 + 2)' C-m
    # See output
    cat /tmp/$LOG
    # Capture the last 5 lines of the pane
    tmux capture-pane -p -t $SESSION -S -5

Notes from Early science acceleration experiments with GPT-5 - via Claude. LLMs are accelerating research because they are good at: Literature search, especially across disciplinary boundaries; Generating and checking routine calculations; Proposing variations on known techniques; Identifying connections between disparate results; Producing first-draft code for well-specified problems; Explaining why certain approaches won’t work. But they’re currently struggling with the following - though it’s a shrinking space: Genuinely novel conceptual leaps (but this is increasingly happening, e.g. Sawhney and Sellke’s problem #848); Recognizing when it’s plagiarizing, e.g. when it “discovered” a proof for the Chevalley-Warning theorem which was copied from a Noga Alon paper - it wasn’t conscious of this; Knowing what it doesn’t know; Distinguishing important problems from unimportant ones; Understanding the “negative space” of mathematics (why certain problems are hard, why obvious approaches fail). Anthropic introduced three excellent tool use practices that I expect will be adopted widely. Tool search: Don’t pass the tool definitions to the model. The model can ask for a tool search when needed. Programmatic tool calling: Instead of calling a tool, it’ll return a Python program to execute that will call the tools! 
This is a huge win. Tool use examples: Lets you specify examples of tool calls to guide the model better. The Hacker News thread flags that CLIs solve these - but CLI updates are hard, while APIs auto-update. With AI, some skills that become more valuable (and will soon be in short supply, hence need to be taught) are: Problem formulation (“What question should we actually ask?”) Traits: Curiosity (absolutely), systems thinking, comfort with ambiguity, metacognition (thinking about your thinking). How to strengthen: Practice reframing exercises (“What are 5 other ways to frame this?”), study great questions in your field, work backward from outcomes, learn adjacent domains. The “5 Whys” technique helps. Also: deliberately pause before diving into solutions - force yourself to spend time in the question space. Taste and judgment (“Is this response appropriate?”) Traits: Pattern recognition from experience, cultural literacy, empathy, contextual awareness, aesthetic sense. How to strengthen: Immerse yourself in excellent examples, study spectacular failures (they’re more instructive!), get feedback on your calls, practice explaining why you made a judgment. Build a “swipe file” of great/terrible examples. The key is volume - you need lots of reps. Quality assessment (“Is this AI output correct?”) Traits: Healthy skepticism, attention to detail, domain knowledge, logical reasoning, understanding of edge cases. How to strengthen: Study common AI failure modes, build verification checklists, practice the “does this make sense?” test, learn what “good” looks like in your domain, cross-reference claims. Develop your “bullshit detector” by analyzing why wrong answers feel wrong. 
Creative synthesis (“How do these ideas connect?”) Traits: Associative thinking, wide knowledge base, playfulness, comfort with non-obvious connections, intellectual courage. How to strengthen: Consume diverse inputs outside your field, practice analogical thinking (“X is like Y because…”), use visual thinking tools like concept maps, study how innovations happen in other domains, give yourself permission to make weird connections. Read broadly - fiction, history, science. Domain expertise (“Does this solution work in reality?”) Traits: Deep curiosity, persistence, willingness to get hands dirty, learning from failure, long-term commitment. How to strengthen: Deliberate practice on real problems, seek mentorship, study edge cases and failure modes, build things (don’t just read about them), learn your field’s history. The “10,000 hours” thing is real, but it’s quality hours that matter. Meta pattern: Reflection loops: doing something, then analyzing why it worked/didn’t. Exposure to excellence: you can’t develop taste without seeing great work. Some more new CLI tools I installed: trash-cli: Alias rm to move files to trash instead of deleting permanently. After a week of seeing ligatures in Fira Code, all other fonts look ugly. My favorite ligatures: !== ==> =>> <--> (and every possible arrow) >= ||> ||- |- … The first name, alphabetically (at least among Straive employees) is “Aabida” and the last is “Zyrene”. Something I would never have discovered working in a smaller company. chokidar-cli is an easy way to run commands when files change, e.g. npx -y chokidar-cli '**/*.js' -c 'npm run build'. npx -y mapscii shows a map on the terminal. Not too useful, not maintained, but very interesting. termsvg converts asciinema .cast files to animated SVG suitable for embedding in GitHub (e.g. via mise x github:MrMarble/termsvg -- termsvg export file.cast --minify). The animated SVG is ~10X larger than the .cast file. 
The GZipped size is fine but saving it as .svgz is not recognized by GitHub. In contrast, agg, the official asciinema-to-GIF converter, creates .GIF files that are only 5X larger. The most efficient seems to be embedding via asciinema.org. usql queries MySQL, Postgres, SQLite, MSSQL, Oracle, etc via a single interface. For example, usql 'mysql://rfamro:@mysql-rfam-public.ebi.ac.uk:4497/Rfam' -c "SELECT * FROM clan limit 3;". But DuckDB is more versatile, IMHO.

    INSTALL mysql;
    LOAD mysql;
    ATTACH 'host=mysql-rfam-public.ebi.ac.uk port=4497 user=rfamro database=Rfam' AS rfam (TYPE mysql);
    SELECT * from rfam.Rfam.clan LIMIT 3;
    SELECT * FROM 'file.xlsx' LIMIT 3;
    SELECT * FROM 'file.csv' LIMIT 3;

Autistic and allistic people just have different communication styles. Autistic people have no trouble understanding other autists. They just happen to be in a minority which makes it seem like they have a social deficit. Conflict between Neurotypes. 1 second = 10 tokens for OpenAI Realtime APIs. 1 second = 25 tokens for Gemini Live API. 39 cents / hour on GPT Realtime Mini = 36 cents audio input + 3 cents text output. 139 cents / hour on GPT Realtime = 115 cents audio input + 15 cents text output. 30 cents / hour on Gemini 2.5 Flash Native Audio (Live API) = 27 cents audio input + 3 cents text output. Here are some AI experiments I’m planning to try with our marketing team: Video Generation: Create marketing videos from text scripts in minutes. Poster Generation: AI designs high-conversion posters from brief text inputs - notably Nano Banana Pro. Synthetic Persona A/B Testing: LLM agents simulate 100K+ user behaviors to test designs before real users. LLM-Powered A/B Automation: AgentA/B system runs experiments with AI-simulated traffic. Vibe Coding Landing Pages: Marketers build production-ready pages in hours vs weeks. On-demand Landing Pages: Generate pages for automated campaigns/products without human intervention. Brand Voice Cloning at Scale: Train on company content to ensure consistency across 1000s of pieces. Persona-Driven Content Synthesis: Use 1B+ personas to generate diverse content perspectives. Competitive Intelligence Briefing: Real-time monitoring across millions of data points + data storytelling. Marketing Analytics with LLMs: AI agents analyze complex datasets for insights. Brand Compliance Checks: Ensure all content meets brand guidelines automatically. Autonomous Blog Squads: AI agents identify trending topics / internal content, create data stories ready for review. New skill unlocked: creating tutorials from talk proposals. I asked Claude to “Write a Malcolm Gladwell article based on this talk description to teach me the topic” and passed it this talk proposal: Your Causal Parrot might be lying to you. The story it wrote is very engaging and informative! LLMs “understand” causality because of training, but lack a world model to extrapolate to new situations. Giving them tools to reason (e.g. causal models, sub-agents to explore root causes) will help. A cool Gemini 3 Pro hack: convert satellite imagery into stylized maps! Bilawal Sidhu. Running sub-agents in tmux helps avoid timeout cancellation, and hence allows resuming. Peter Steinberger
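The realtime audio figures above combine a token rate (tokens per second) with an hourly cost, so they imply a per-token price. A small sketch of that arithmetic, using only the audio-input numbers quoted in the notes; the function name is made up, and the results are implied values, not official price lists.

```python
def implied_price_per_million(cents_per_hour: float, tokens_per_second: float) -> float:
    """Dollars per 1M audio-input tokens implied by an hourly cost."""
    tokens_per_hour = tokens_per_second * 3600
    return cents_per_hour / 100 / tokens_per_hour * 1_000_000


# (audio-input cents per hour, tokens per second), as quoted in the notes above
audio_input = {
    "GPT Realtime Mini": (36, 10),
    "GPT Realtime": (115, 10),
    "Gemini 2.5 Flash Native Audio": (27, 25),
}

implied = {
    name: round(implied_price_per_million(cents, rate), 2)
    for name, (cents, rate) in audio_input.items()
}
for name, dollars in implied.items():
    print(f"{name}: ~${dollars} per 1M audio-input tokens")
```

At 10 tokens per second an hour of audio is 36,000 tokens, while at 25 tokens per second it is 90,000, which is why the hourly costs are closer together than the per-token prices.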

PC Dream Machine Specs across 30 years

In 1995, I wrote down the specs for my "dream machine". Comparing it against the machine I have today:

| Item | 1995 | 2025 | Increase (x) |
|---|---|---|---|
| RAM | 32 MB | 64 GB | 2000 |
| GPU RAM | 16 MB | 8 GB | 500 |
| HDD | 4 GB | 1 TB | 250 |
| HDD speed | 10 MB/s | 2 GB/s | 200 |
| Processor | 150 MHz | 5.10 GHz | 34 |
| Monitor | 21" | 27" | 1.3 |
| Resolution | 2048x1536 | 1920x1200 | 0.73 |

Clearly, RAM has seen the biggest growth. Low cost, high demand. Followed by the hard disk - both on capacity and speed. The processor speed increase, in comparison, is modest. What's surprising is that my monitor today isn't that much bigger than what I wanted. The resolution is actually lower than what I wanted 30 years ago! Clearly, I overestimated how important screen resolution would be. ...
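The "Increase" figures can be reproduced with a few lines. This sketch assumes decimal units (1 GB = 1000 MB), which is what the rounded numbers in the comparison appear to use, and compares resolution by total pixel count.

```python
# item: (1995 value, 2025 value), both in the same base unit
specs = {
    "RAM (bytes)": (32e6, 64e9),
    "GPU RAM (bytes)": (16e6, 8e9),
    "HDD (bytes)": (4e9, 1e12),
    "HDD speed (bytes/s)": (10e6, 2e9),
    "Processor (MHz)": (150, 5100),
    "Monitor (inches)": (21, 27),
    "Resolution (pixels)": (2048 * 1536, 1920 * 1200),
}

growth = {item: new / old for item, (old, new) in specs.items()}
ranked = sorted(growth, key=growth.get, reverse=True)

for item in ranked:
    print(f"{item:22} {growth[item]:8.2f}x")
```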

Mapscii Rendering

I asked Claude Sonnet 4.5 on GitHub Copilot: How does mapscii render vector tiles as text blocks on the terminal? What are the libraries it uses and how does it use them? Explain as a detailed tutorial with examples so that I can learn how to rasterize vector shapes on the terminal. Below’s the explanation. What’s noteworthy to me is: How it uses Braille characters to get 8 pixels per character cell (2x4 grid). How it fills polygons using (a) triangulation and (b) getting all edge points using Bresenham’s algorithm, then (c) filling horizontal spans between edge pairs. I’ll help you understand how mapscii renders vector tiles as text blocks on the terminal. Let me search through the repository to find the relevant code and understand the architecture. Now let me get more specific files to understand the rendering pipeline better:

# How MapSCII Renders Vector Tiles as Text Blocks on the Terminal ...
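The two ideas highlighted above, Braille cells as a 2x4 pixel grid and Bresenham's algorithm for edge points, can be tried out in isolation. This is a minimal sketch, not mapscii's actual code (mapscii is written in JavaScript), and it only draws a line rather than filling polygons. The function names are made up; the dot-to-bit layout follows the Unicode Braille Patterns block starting at U+2800.

```python
BRAILLE_BASE = 0x2800
# Bit for each dot in a cell, indexed as DOT_BITS[row][col] (4 rows x 2 columns)
DOT_BITS = [
    [0x01, 0x08],
    [0x02, 0x10],
    [0x04, 0x20],
    [0x40, 0x80],
]


def bresenham(x0, y0, x1, y1):
    """Yield the integer points on the line from (x0, y0) to (x1, y1)."""
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        yield x0, y0
        if x0 == x1 and y0 == y1:
            return
        e2 = 2 * err
        if e2 >= dy:  # step in x
            err += dy
            x0 += sx
        if e2 <= dx:  # step in y
            err += dx
            y0 += sy


def render(pixels, width, height):
    """Pack a set of (x, y) pixels into rows of Braille characters."""
    lines = []
    for cy in range(0, height, 4):      # each text row covers 4 pixel rows
        row = []
        for cx in range(0, width, 2):   # each character covers 2 pixel columns
            bits = 0
            for r in range(4):
                for c in range(2):
                    if (cx + c, cy + r) in pixels:
                        bits |= DOT_BITS[r][c]
            row.append(chr(BRAILLE_BASE + bits))
        lines.append("".join(row))
    return "\n".join(lines)


pixels = set(bresenham(0, 0, 7, 7))
art = render(pixels, 8, 8)
print(art)
```

An 8x8 pixel canvas fits in 4 characters by 2 rows, which is the 8-pixels-per-cell gain the note refers to.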

Patterns for Short Code

I had Claude Code create a PR to update my Unicoder tool. As part of that, I prompted it to repeatedly: Shorten the code by exploring opportunities to rewrite more elegantly. Finally, I asked it: You applied several changes to refactor the code for elegance and brevity. What were the principles you applied? List them all with examples. The objective is to teach me how to shorten and simplify code elegantly. Its response was a good tutorial on refactoring to shorten and simplify code. ...

Thanks Pratap Vardhan – this was my best birthday gift this year! LinkedIn

Things I Learned - 23 Nov 2025

This week, I learned: Here are some new CLI tools I installed: vd (visidata): Terminal spreadsheet viewer & editor for CSV, Excel, JSON, SQL, Parquet, etc. qsv: Fast CSV command line toolkit for slicing, filtering, aggregating, and analyzing CSV files. rga (ripgrep-all): ripgrep that searches PDFs, Office docs, EPUBs, zip files. pdfcpu: PDF processor for splitting, merging, optimizing, and manipulating PDF files. gum: Stylish CLI tool for creating interactive prompts, confirmations, and more. Models read pretty fast, consuming input tokens at ~4K-20K words per second. It’s the “speaking” (output token rate) that is the bottleneck. So shortening input doesn’t matter as much as shortening output for latency. ChatGPT. When building agents, as of now, prefer native provider SDKs (OpenAI Agents SDK, Anthropic SDK) over even light abstractions like Vercel AI SDK or Pydantic. There are subtle issues related to error messages, response handling, cache handling, etc. that trip up abstractions given how early things are. Armin Ronacher. Gone are the times when LLMs couldn’t do mental math. Now they’re computing base64 and SHA256 from memory, without needing code! Example. Organizing a round table event in Singapore costs ~$75-150. Here’s what drives the cost variation: 50%: brand/location. 25%: food and beverage. 15%: duration (full day is only slightly more expensive than half day). 10%: date, demand, etc. 10%: add-ons: AV, etc. OpenRouter supports embedding models. BGE base seems Pareto optimal with 0.5 cents / MTok and a good MTEB ranking. TOON vs JSON. Early days, and TOON seems to be marketing a lot, so I’m wary, but for large tabular data where input tokens are crunched, it seems a readable alternative to multiple CSVs, but not worth the hype. 0 19 Nov 2025. Always use GPT-5.1-Codex-Max instead of GPT-5.1-Codex. At every thinking level, it takes fewer tokens for similar or higher accuracy. Tibo. ug -i --smart-case --bool 'word1 word2 ...' 
seems the cleanest way to find files that have all words. --smart-case matches case-insensitively if all words are lowercase, else case-sensitively. Examples:

    ug --bool '"exact phrase" word2'   # exact phrase + other tokens anywhere
    ug --bool 'word1 word2 -word3'     # must contain word1 AND word2, but NOT word3
    ug --bool '("foo bar") OR baz'     # grouped expressions and OR
    ug --bool 'word1 NEAR/5 word2'     # match when words are within 5 tokens/words
    ug -Z2 'word'                      # allows up to 2 typos in 'word'

⭐ ug -i --smart-case --bool -Q lets you interactively search within files. This is the coolest feature! Fixing laptop issues is clearly a whole lot easier with an AI chatbot. I fixed these Ubuntu issues purely using Claude. It told me what to run. I ran it, shared the output, it diagnosed, told me what to do next, etc. until the issues were fixed. For example: My keyboard shortcuts stopped working. It turned out I edited my media-keys.dconf and removed the trailing slash. A 3-finger tap mapped to a middle click and I couldn’t remove it. It turned out my touchegg.conf explicitly had this mapping. I disabled it. My gnome extensions would get disabled every time the screen went to sleep. It turned out my extension cache was corrupted or stale. sudo apt install --reinstall gnome-shell-extension-manager and rm -rf ~/.cache/gnome-shell/ fixed it. GhostScript seems the best way to compress PDFs via the CLI. Example: gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf. Pandoc supports Lua filters which are a powerful way to customize the document conversion process. Here is a Lua filter that converts horizontal rules in a markdown document to page breaks and preserves them in a Word document (OpenXML format):

    function HorizontalRule()
      return pandoc.RawBlock('openxml', '<w:p><w:r><w:br w:type="page"/></w:r></w:p>')
    end

readpst - via sudo apt install pst-utils - extracts emails from Outlook PST files to mbox format. 
Useful for email migrations. Write tutorials or blog posts as you learn. Steve Klabnik Running a coding agent post mortem, e.g. “what worked well, what didn’t, and why? Next time, what are a few bullets I could include that will avoid these problems?” helps me prompt better next time. For example, Claude Code suggested: Use Firefox for headless browser automation (Chromium often crashes) Set HOME=/root when running Playwright with Firefox Start a local HTTP server rather than using file:// protocol External images may not load in screenshots due to network isolation
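On the note above about LLMs computing base64 and SHA256 "from memory": such claims are cheap to verify locally. A minimal sketch using only the standard library; the function name `check_claims` and the sample values are illustrative, not part of any tool mentioned here.

```python
import base64
import hashlib


def check_claims(text: str, claimed_b64: str, claimed_sha256: str) -> dict[str, bool]:
    """Compare a model's claimed base64 / SHA-256 of `text` with the real values."""
    data = text.encode("utf-8")
    return {
        "base64": base64.b64encode(data).decode("ascii") == claimed_b64.strip(),
        "sha256": hashlib.sha256(data).hexdigest() == claimed_sha256.strip().lower(),
    }


# Example: pretend a model returned these two strings for the input "hello"
result = check_claims(
    "hello",
    "aGVsbG8=",
    "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824",
)
print(result)
```

Hashes are a useful probe because a single wrong character anywhere shows up as a mismatch, so there is no partial credit to argue about.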

Nano Banana Pro has excellent text generation (though it doesn’t always give you what you want on the first try). I couldn’t spot any errors in the generated text. Can you? I used this prompt (with the workshop details and my photo): Create a professional poster for the below, including all relevant information. Use my photo (attached) professionally. The NPTEL workshop is real, BTW. First 100 seats, I think. You can register here: https://elearn.nptel.ac.in/shop/iit-workshops/ongoing/computer-science/applied-vibe-coding-workshop/ ...

While meditating, I realized 75% of “LULL” is the letter “L”. (This sort of thing happens a lot when I meditate.) MUMMY (60% M) and DADDY (60% D) have a lower percentage, but are longer, so maybe they get a bonus? I asked Claude Code what would top such a list. It picked a dictionary and generated the 333 words with 4+ letters and >50% concentration. What did I like best? “ASSESSES”. 5/8 letters are “S”. That’s nearly two-thirds. ...
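The "letter concentration" measure is easy to compute. A minimal sketch, assuming purely alphabetic words and a tiny made-up word list in place of the dictionary Claude Code picked; the function names and the tie-break (longer words first) are illustrative choices.

```python
from collections import Counter


def concentration(word: str) -> tuple[str, float]:
    """Return the most frequent letter and the share of the word it makes up."""
    letters = [ch for ch in word.upper() if ch.isalpha()]
    letter, count = Counter(letters).most_common(1)[0]
    return letter, count / len(letters)


def top_words(words, min_length=4, threshold=0.5):
    """Words with min_length+ letters where one letter exceeds the threshold share."""
    scored = [(w, *concentration(w)) for w in words if len(w) >= min_length]
    hits = [row for row in scored if row[2] > threshold]
    # Highest concentration first; longer words win ties
    return sorted(hits, key=lambda row: (-row[2], -len(row[0])))


sample = ["LULL", "MUMMY", "DADDY", "ASSESSES", "LETTER", "EGG"]
ranking = top_words(sample)
for word, letter, share in ranking:
    print(f"{word:10} {letter} {share:.1%}")
```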

Things I Learned - 16 Nov 2025

This week, I learned: Windows 11 got some very practical updates. Notepad now supports Markdown preview natively. MS Paint has an opacity filter. Microsoft Copilot can share screens and speak/listen. Things I learned when Ubuntu drivers crashed on my laptop: The SG.GS Ubuntu ISO mirror is a lot faster than the official Ubuntu ISO download (5 min vs 12 hours). Rufus and balenaEtcher are the de facto tools for bootable USB drives from ISO. Gemini 2.5 Flash Image is not great at generating text. But a clever workaround is to provide the rendered text as an image input! Also, Gemini 2.5 Flash Image seems to ignore commands that try style transfer (e.g. “turn me into Studio Ghibli”). GemImg. FLIP animation is an efficient animation technique. Capture the First position. Apply the Last position (changing position, size, rotation, etc.). Invert, i.e. apply just the transform that’ll move it back to the First position. Play the animation. This only needs to change transform, hence no DOM reflow. Asking coding agents to create a codemod for large-scale refactoring works well. Peter Steinberger. When to quit vs persist: Do stats/signals support a positive outcome? QUIT if not. Crossed any limits you set for yourself? QUIT if so. (Run pre-mortems to find these stats/signals and limits.) Is the decision hard to reverse AND uncertainty high? QUIT if so. Else you can experiment cheaply. (Create reversibility.) Are you continuing because of past effort or pride? QUIT if so. (Set review cadence.) Is there a better alternative? SWITCH if so. (Get outside help.) Once a model generates an output, an agentic loop tends not to change the fundamental approach and just tweaks it. So, if a solution is directionally wrong, restarting works better than iterating. 
Agentic Pelican on a Bicycle. Reading between the lines on the Microsoft OpenAI deal: Microsoft values OpenAI’s growth (financial return) more than control. Neither trusts the other enough to decide what’s AGI. Microsoft gets some wins: models until 2032 (even post AGI) as well as research IP. Both parties expect AGI between 2027-2030. OpenAI keeps all consumer hardware - so is betting hard on hardware. It’s more Apple than Microsoft territory. Divorce preparation: Microsoft can pursue AGI with other partners. OpenAI can purchase compute from anyone and release open weights models. Infra has more value than model dev! OlmoEarth is a set of image models trained on labelled geospatial data. That’s useful for deforestation and land cover monitoring, wildfire detection, urban growth monitoring, crop mapping, etc. The models are open weights and can be fine-tuned. Claude Code’s output styles are a way of using Claude Code for anything (e.g. writing, analysis, research, personal advice, etc.), not just coding. Create a ~/.claude/output-style/your-style-name.md and run /output-style your-style-name to replace the system prompt. You can also use the --system-prompt and --append-system-prompt flags with the CLI. Following Ethan Mollick’s lead I asked: I can travel back in time to any time before 1500 in India and change only one thing. What is the single thing you would change? Nothing obvious. ChatGPT: Create a single, simple, phonetic script for all public life in India around 1100 CE. Claude: institutionalize systematic historical recordkeeping, introduce limited liability commercial entities, and mandate systematic translation of Sanskrit technical texts into all major regional languages. How about now? ChatGPT suggests: make all public rules and records computable by law. Claude suggests: make all state-level entitlements and civil documentation fully portable across India. 
For the first time in history, Russian troops surrendered to a wheeled drone that carried 138 pounds of explosives - Washington Post. Given the cost and accessibility of drones, I guess drone terrorist attacks will soon emerge. HTML + JS apps will last longer than server-side apps and it makes sense to write more of those. For essential back-end services, keep them generic. The specific service layers I see are: Auth (e.g. Google Auth, Auth0, Supabase, …), Storage (e.g. Supabase, Firebase), LLMs (e.g. OpenAI, Claude, OpenRouter), Communications (e.g. EmailJS), … #TODO Extend with LLMs. https://gistpreview.github.io/ is an unofficial GIST preview tool. It accepts a ?GIST_ID and displays the gist as a standalone HTML page. Simon Willison. XSLT is deprecated in Chrome. So the <script> tag in XML will become the new way of rendering RSS/Atom. This is one of the rare “break-the-web” changes from browsers. Simon Willison. “India has absurdly low internal migration - around 9% annual migration rate versus 25-30% in China or the US. Not because people don’t want to move, but because the cost of moving is artificially massive. You lose your ration card, state entitlements, kids’ school continuity, voting rights, …” Rolf Dobelli’s The Not To-Do List is a good application of inversion. Also, the chapter titles themselves explain most of the message, which is very helpful. Just thinking about any of these can be a useful path to improvement. 
Let things fall apart; Feed your weaker self; Be unreliable; Be an asshole; Have high expectations; Drift through the day; Mess up your marriage; Be a quitter; Be hypocritical; Cling to your bad habits; Set the wrong goals; Drink yourself miserable; Get involved in other people’s drama; Only learn from your own experience; Be hyperactive on social media; Indulge in road rage; Surround yourself with negative people; Micromanage your neighbours; Say yes to drugs; Get stuck in your career; Never be playful; Feel guilty; Practise ingratitude; Trust your banker; Be paranoid; Make other people feel unimportant; Live in the past; Listen to your inner voice; Expect rationality; Get nihilistic; Catastrophize; Consider money unimportant; Cultivate a victim mentality; Become a lapdog; Get rich quick, get smart quick; Ruminate; Trade your reputation for money; Never suffer; Let your emotions define you; Try to end it all; Marry the wrong person – and stay with them; Celebrate your resentment; Join a cult; Try to change people; Say everything you think; Spin multiple plates; Do only shallow work; Invite bad people into your life; Go where the competition is strong; Say yes to everything; Crowd your life with gadgets; Fall into the content trap. DeepSeek-V3.2-Exp has linear inference time, i.e. longer inputs don’t take longer time. It picks the top 2K most relevant tokens from the input instead. This can make model inference cheaper and faster. California’s Bill AB 316 makes the people who build autonomous systems liable for their actions. That’s quite a step. Udio and Universal are launching a platform to generate music in the style of famous artistes. An interesting new way to monetize. Fingerprinting music is a hot area. VaultGemma shows a fine-tuning approach that eliminates personal info that appears only once from memorization. It works by adding noise to weights and capping weight updates so that no one example has undue influence. Model quality is mostly the same. 
Amazon is giving drivers smart glasses to scan packages, get directions, capture proof of delivery and detect hazards. Cool! TechCrunch ⭐ Over 3 months, I’ve recorded ~180 calls. Processing each costs ~1.25 cents (GPT-5) and 1 year’s conversations cost ~$9. That’s incredible value for money if I hired GPT-5 / Codex as a data-driven personal coach to guide me on: What are my blind spots? That is, feedback people share with me that I ignore? What are the clusters of personas that I interact with and which of these have a positive and negative influence on me? Where am I being unreliable? Where am I being an asshole? Where are my expectations high? Where are they low? Where would the opposite have helped? Where do I quit early? Where do I persist? Where would the opposite have helped? What good habits should I continue? What bad habits should I stop? What are the strongest opportunities to thank or praise that I missed? Is there a pattern? What triggers could I use to build this habit? Where have I tried to change people? Where have people tried to change me? Where have I spotted wrong questions? That is, rather than answering the question, I spotted the more apt question and answered that instead? … and a hundred other questions that I wouldn’t even know to ask. Sub-agents can run parallel / independent tasks while keeping the context window small. (But the advantage over xargs seems marginal.) Simon Willison Document, lint, type-check, add test cases (or other similar tasks) for all folders in a monorepo. Research and create a report for each topic in */RESEARCH.md. Synthesize learnings from each conversation in transripts/*.md. “If you’re signed into sensitive accounts like your bank or your email provider in your browser, simply summarizing a Reddit post could result in an attacker being able to steal money or your private data.” Brave OpenAI Atlas has a “Watch Mode” that will stop working if you move away from that tab. Useful to keep an eye on sensitive sites. 
Simon Willison “… image editing platforms seem like they’ll eat and subsume Photoshop… modern image editors – especially Nano Banana from Google Gemini – … they’re extremely effective and, increasingly, instructable” - Import AI. Facebook now suggests edits to photos - TechCrunch. WebPerl runs Perl in the browser via WebAssembly. Simon Willison
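The call-processing estimate in these notes (~180 calls over 3 months at ~1.25 cents each, ~$9 a year) can be checked with a few lines of arithmetic. This is only a sketch: it assumes the call rate stays constant through the year, and the function name is made up for illustration.

```python
def yearly_cost_dollars(calls_in_3_months: float, cents_per_call: float) -> float:
    """Extrapolate a 3-month call count to a yearly processing cost in dollars."""
    calls_per_year = calls_in_3_months * 4  # assumes a constant call rate
    return calls_per_year * cents_per_call / 100  # cents -> dollars


# Figures from the note: ~180 calls in 3 months, ~1.25 cents per call
print(f"${yearly_cost_dollars(180, 1.25):.2f}")  # $9.00
```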

When I realized Aishwarya Rai begins and ends with AI, I had to find out if there were more like her. It took a coding agent (Claude Code in this case) 10 minutes to find the 10 celebrities who share that distinction, at least across the 24,086 names on Wikipedia: Ai Nagai - Japanese playwright Aiguo Dai - Chinese-American atmospheric scientist Ai (poet) - American poet Aisea Nawai - Fijian rugby player Ai (singer) - Japanese-American singer Aisha Chughtai - Pakistani actress Aiyappan Pillai - Indian social reformer Aizawa Seishisai - Japanese Confucian scholar Ainmuire mac Sétnai - Irish high king Aisha Yousef al-Mannai - Qatari artist Glory be to these AI bookends! ...
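The bookend check itself is easy to sketch. This is not the code the agent wrote (that isn't shown here); it is a minimal Python version that assumes Wikipedia-style titles, where a trailing disambiguation like "(poet)" should be ignored. The function name and sample list are illustrative.

```python
import re


def is_ai_bookend(name: str) -> bool:
    """Return True if the name begins and ends with 'ai' (case-insensitive)."""
    # Drop a trailing Wikipedia-style disambiguation such as " (poet)"
    base = re.sub(r"\s*\([^)]*\)\s*$", "", name).strip().lower()
    return base.startswith("ai") and base.endswith("ai")


names = ["Aishwarya Rai", "Ai (poet)", "Ainmuire mac Sétnai", "Virat Kohli"]
print([n for n in names if is_ai_bookend(n)])
```

Note that a bare "Ai" passes because the same two letters serve as both bookends, which matches the list above.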

Habits of a code addict

AI can be held to account

“Humans can be held to account. Not AI.” I hear this often. But it’s not true. Corporations are non-human, but they can enter into contracts and face criminal charges. Ships can be sued directly. Courts can arrest the vessel itself. Deities and temples in India can own property. Forests and rivers in New Zealand, Colombia, and Spain have been granted legal personhood. Medieval Europe held animal trials (e.g. for “guilty” pigs). ...

I always wondered why old movies are rated so high on IMDb. For example, 12 Angry Men (1957) with just ~900K votes ranks about as high as Inception (2010) with ~2M votes. Few people I know have seen 12 Angry Men. So where does this high rating come from? My theories were: Old movies really are that good. IMDb’s algorithm is biased towards old movies. People remember older movies fondly. Actually, it’s none of these. It’s selection bias. ...

If a bot passes your exam, what are you teaching?

It’s incredible how far coding agents have come. They can now solve complete exams. That changes what we should measure. My Tools in Data Science course has a Remote Online Exam. It was so difficult that, in 2023, it sparked threads titled “What is the purpose of an impossible ROE?” Today, despite making the test harder, students solve it easily with Claude, ChatGPT, etc. Here’s today’s score distribution: ...

Things I Learned - 09 Nov 2025

This week, I learned: “But when an identity based belief was challenged, the brain responded as if under physical attack.” Why Engineers Can’t Be Rational About Programming Languages Notes from How to build a cult, Lulu Cheng, The Knowledge Project podcast Conviction is infectious. Communicate at the INTERSECTION of interests. Learn theirs. Begin with “why your story matters to them” (first sentence). That beats “how you tell it” > “where you tell it”. The easiest way to align with an audience is to find your community. Humor, curiosity, awe, any strong emotion is a hook. Culture has momentum. Best way to break it is to show an alternative that works. People will copy that. REPEAT messages over and over with complete CONVICTION to convince people who TRUST you. That works, but you need all three. Trust builds from likeability, repeated exposure, common beliefs. An excellent way to defend against online criticism (when it matters) is to just SHOW UP and THANK them for feedback. Serious reputational damage must either be fixed immediately - or you live with it forever. Between a story and statistics, the story always wins. Never fight a story with a statistic. Dig into your statistics and uncover BETTER stories. ⭐ Prebuttals are a great idea. Start with all possible criticisms yourself and defuse them. The other person has nothing left to say. Sparring keeps you sharp. Spar with LLMs. To defend, show how the attack targets other people, increasing the surface area. Show how the SPECIFIC attack targets a larger group. Create a SPECIFIC cause worth fighting for. Each role has a specific objective to optimise for. The leader’s role is to balance across these. Cheerleader effect. People look beautiful next to a cheerleader. Associations taint. Each person has dozens of aspects to their persona. We cannot remember all of them. Each person can make a choice on who they project themselves to be in any group. Shaping their persona. 
The Rainbow CSV extension may be causing delays (infinite spinner) when pasting Markdown in VS Code. Restarting it seems to fix the issue. ⭐ Claude scientific skills is a collection of skills teaching Claude how to use scientific libraries, databases, and APIs across several domains. This may be a good example of a non-trivial skill library - that is hard for AI coding agents to infer by themselves. Notes from How I use every Claude Code feature Use AGENTS.md as guardrails, not a manual. Document what it gets wrong. Use self-documenting tools/APIs rather than documenting. Docs: Explain why and when to read each doc. Never say “Never.” Explain when to use which alternative. Prefer CLIs for stateless tools, MCPs for stateful, authenticated, or complex (e.g. Playwright). Coding agents work well with version control. Simon Willison Break up uncommitted changes into small commits Rewrite branch history for readability Use gh CLI to fetch line-wise comments from a PR and make requested changes (e.g. renaming, refactoring, adding types, etc.) ⭐ When using MCPs or tools with private data, “color untrusted content in red, unsafe actions in blue, and never mix colors.” Good advice. ⭐ DeepWiki offers a codemaps feature that explains code in an interactive way. It shows a structured explanation on the left. You can click on any note to see the code on the right. It’s an effective way to understand how a library or tool executes a task. Here’s an example of how Mermaid works. Gemini offers RAG with free storage. RAG costs are quite high. This simplifies the process a lot. But I tried running the sample program and after an hour, it still had not completed uploading a single file. Best to wait and watch. 
OpenRouter supports embedding models using an OpenAI-like API. Kimi K2 Thinking seems popular because: It’s an open-weights model on par with the top models on Humanity’s Last Exam (text-only) and BrowseComp Can run 200-300 tool calls without human guidance 4x cheaper than GPT-5 with low tokens (32B active on 1T parameters, INT4 quantized) Based on responses to Simon Willison’s question, ChatGPT Fine-tuning helps when: Lower latency, e.g. for type-ahead, at lower cost (37 mentions) Structured extraction, parsing and classifiers, e.g. postal address, detecting secrets (18 mentions) Custom vision models, e.g. check containers (12 mentions) Domain-specific code and stacks (niche languages, stack-specific generation, text→SQL) (11 mentions) … and a long tail. Fine-tuning does not help: When a base model plus prompting or RAG does as well or better (15 mentions) When you risk being leapfrogged by a new release (4 mentions) When cost and data do not justify the ROI (3 mentions) The data I can export from my Android phone includes the below. 🟢 indicates it’s tracked. 🟡 might need action, e.g. enabling / coding. # 🟢 GPS/GNSS location (current & history). Turn on device Location. If you want a timeline you can export, enable Google Location History and later export via Google Takeout → Location History (JSON/KML). 🟡 GNSS raw measurements (engineering traces). Android exposes GNSS “raw” logs on many devices; capture with dev tools or logging apps if supported (intended for research). See GNSS Raw Measurements API. 🟢 Wi-Fi scans (nearby SSIDs/BSSIDs). Toggle Location scanning → Wi-Fi scanning in Location settings; apps need location permission to read results. 🟡 Wi-Fi RTT distance to APs (indoor ranging). Apps can use Wi-Fi RTT (802.11mc/az) to measure distance to compatible APs; requires location permission. 🟢 Bluetooth proximity/traffic. For packet-level logs, enable Developer options → Enable Bluetooth HCI snoop log, then pull /sdcard/btsnoop_hci.log (Wireshark). 
🟢 Cell towers (IDs, signal strength). Apps can read via TelephonyManager (e.g., getAllCellInfo()), with appropriate telephony permissions. 🟢 Activity recognition (walking, running, in vehicle). Apps must request ACTIVITY_RECOGNITION (runtime) from Android 10+. 🟢 Steps (step counter / detector). Use sensors API; from Android 10+ you must declare ACTIVITY_RECOGNITION to access step counter/step detector. 🟢 Accelerometer / gyroscope / magnetometer streams. Apps read via SensorManager; some high-rate reads require HIGH_SAMPLING_RATE_SENSORS. 🟢 Ambient light / proximity. Read via SensorManager; typically no special permission. 🟢 Google Fit data (steps, workouts, heart rate from wearables, etc.). Manage and export from Google Fit / Google account Download your data. 🟢 Contacts. MIUI → Settings → System apps → Contacts → Import/Export to .vcf (vCard). 🟢 Call history / SMS (device). MIUI local/cloud backup can include call logs & messages; export by creating a local/Cloud backup and downloading. Note: 3P apps can’t read call/SMS logs unless they’re the default dialer/SMS. 🟡 Gmail, Calendar, Contacts (Google). Export via Google Takeout (MBOX/ICS/CSV etc.). 🟡 WhatsApp / Telegram / Signal chats. Use in-app exports: WhatsApp → Export chat, Telegram Desktop → Export, Signal → encrypted backup. 🟢 Advertising ID. View/reset in Settings → Google → Ads (wording varies), per Google help on Ad ID reset. 🟡 Per-app screen time / unlocks / opens. Third-party “usage” apps (e.g., analytics or “digital wellbeing” clones) require Usage Access (PACKAGE_USAGE_STATS). Use Android’s UsageStatsManager or apps that export CSV. Stock Digital Wellbeing does not offer an export. 🟡 Notification history (last 24h). Settings → Notifications → Notification history → On. OEM-optional, but present on most devices. Viewable once enabled. 🟡 Notification content stream (live). Grant an app Notification access to capture/export notifications going forward. (User-granted API via NotificationListenerService.) 
🟢 Per-app data usage (mobile/Wi-Fi). Apps/ADB can query NetworkStatsManager; Settings shows per-app totals. Advanced dumps via adb shell dumpsys netstats. 🟡 Wi-Fi detailed logs. Developer options → Enable Wi-Fi verbose logging for richer diagnostics. 🟡 Bluetooth packet logs. Developer options → Enable Bluetooth HCI snoop log; export file and analyze in Wireshark. 🟢 Per-app storage usage. Apps/ADB can query StorageStatsManager; Settings shows per-app storage. 🟡 Photo/video metadata (EXIF incl. location). Enable “Save location” in Camera app to embed GPS in EXIF; export files normally (EXIF remains). 🟢 Downloads & file metadata. Use a file manager or connect via USB; metadata is in the files themselves. 🟢 Battery usage history (per-UID/app), wakelocks, jobs. Generate adb bugreport and analyze with Battery Historian or dumpsys batterystats. 🟡 System/device logs (logcat). You can view via ADB/Android Studio. Android restricts 3rd-party access to system-wide logs for privacy. 🟢 Developer quick tiles (Sensors off). Developer options → Quick settings developer tiles → Sensors off to globally cut Camera/Mic & SensorManager sensors on demand. 🟡 Google Takeout: one-stop export for Location History (Timeline), Gmail (MBOX), Calendar (ICS), Google Photos, Drive, YouTube, Fit, etc. MacroDroid, Automate and Tasker sound like powerful Android workflow automation tools. Some uses I can put them to: Automatically upload recordings to Dropbox Turn off hotspot when I reach office Vibrate if I’m walking slowly Adding <link rel="alternate" type="text/markdown" title="LLM-friendly version" href="/llms.txt"> is an emerging approach for pointing to LLMs.txt. It works. I asked Codex to read the CloudFlare vitest page. 
It read the file truncating the middle, found the <link rel="alternate" type="text/markdown" href="https://developers.cloudflare.com/workers/testing/vitest-integration/write-your-first-test/index.md"/> link in it, and reasoned “Considering fetching markdown instructions” and fetched the Markdown page. Giles’ Blog toon is a YAML-like format that’s LLM friendly and especially token-efficient (CSV-like) for tables. You can convert back and forth between JSON and toon. Food printing applies 3D printing techniques to create real food items. Given the art that this can create, I expect at least some adoption in niche restaurants. PMTiles lets you store map tiles as a single-file archive that libraries like MapLibre can read. Useful to avoid tile servers. Mirrow is a CLI SVG animation builder that converts a DSL to animated SVGs. However, it may be easier to use an LLM to create the animated SVG directly with SMIL than learning Mirrow (or teaching the LLM Mirrow). ⭐ One approach to giving memory (“episodic memory”) to coding agents is to allow them to search their logs. This gives them access to past discussions about a repo or other repos. To configure Gemini CLI with an AI router, set: "security.auth.selectedType": "gemini-api-key" in ~/.gemini/settings.json export GOOGLE_GEMINI_BASE_URL=https://llmfoundry.straive.com/gemini/ (or your AI router base URL for Gemini) export GEMINI_API_KEY=... (your AI router API key) Passing a HAR export to an LLM to build a scraper is a powerful idea! Lessons from Diagram Chasing Addy Osmani’s Gemini CLI tips are practical guides to using any coding agent, not just Gemini. I learnt about: Run shell commands with !, e.g. !ls -la or even !bash. It’s added to the chat. On-the-fly tool creation: ask it to write code for the task on the fly. Use it for system optimization, e.g. editing dotfiles, system customization, log error analysis, etc. Run GEMINI_SYSTEM_MD=... 
gemini -p "task" --yolo --format json < input.txt to run Gemini with a different system prompt and feed it input.txt to run in a pipeline. (FYI: Codex does not send a default system prompt, so there’s nothing to override.) There is a Gemini CLI Show and Tell thread with examples. These include Janitor AI, a Gemini CLI session viewer, etc. Hands on with Gemini CLI has several use cases to try out. Renaming photos and organizing files are clever ones. AGENTS.md can be used like a decision log - rules, styles, or preferences that evolve over time - on a per-repo basis. Gemini’s /memory add feature helps with this. gemini --checkpointing is a useful “undo” feature. /restore rolls you back to a specific checkpoint. The overhead is small. Caching is only available with API key or Vertex AI, not OAuth login as of now. OpenAI TTS costs are confusing. But in short: TTS-1 costs $15 / MChars (max 4,096 chars per request), which ends up at ~86c / hour. GPT-4o Mini TTS costs ~$16 / MChars (max 2K tokens which is ~7,000 chars per request), which ends up at ~88c / hour. Very similar cost, effectively. TTS-1 HD is twice TTS-1. OpenAI has a usage API that provides cost as well as usage for completions, images, audio speeches, etc. These require an organization admin key. Cost API: curl "https://api.openai.com/v1/organization/costs?start_time=$TIMESTAMP&project_ids=$PROJECT_ID&group_by=line_item" Audio speech usage API: curl "https://api.openai.com/v1/organization/usage/audio_speeches?start_time=$TIMESTAMP&project_ids=$PROJECT_ID&group_by=model"
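The two curl calls above differ only in path and group_by, so a small helper can build both. Below is a minimal Python sketch using only the standard library. It assumes the admin key is sent as a Bearer token and lives in an environment variable called OPENAI_ADMIN_KEY, and that the project ID is in OPENAI_PROJECT_ID; both variable names and the helper names are made up for illustration. The response is printed as raw JSON without assuming its shape.

```python
import json
import os
import time
import urllib.parse
import urllib.request

BASE = "https://api.openai.com/v1/organization"


def build_url(path: str, start_time: int, project_id: str, group_by: str) -> str:
    """Build the same URL as the curl commands: path plus three query parameters."""
    query = urllib.parse.urlencode(
        {"start_time": start_time, "project_ids": project_id, "group_by": group_by}
    )
    return f"{BASE}/{path}?{query}"


def fetch(url: str, admin_key: str) -> dict:
    """GET the URL with the admin key (assumed here to be a Bearer token)."""
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {admin_key}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Only call the API when a key is configured (hypothetical env var names)
admin_key = os.environ.get("OPENAI_ADMIN_KEY")
project_id = os.environ.get("OPENAI_PROJECT_ID")
if admin_key and project_id:
    week_ago = int(time.time()) - 7 * 24 * 3600  # Unix seconds, last 7 days
    for path, group_by in [("costs", "line_item"), ("usage/audio_speeches", "model")]:
        print(json.dumps(fetch(build_url(path, week_ago, project_id, group_by), admin_key), indent=2))
```

Keeping URL construction separate from the network call makes it easy to check the query string without spending an API request.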