This week, I learned:
- Pytest finally supports subtests in pytest 9.0.0+. Simon Willison
- From The Tim Ferriss Show: #837: How to Simplify Your Life in 2026 — New Tips from Derek Sivers, Seth Godin, and Martha Beck:
- Look for single decisions that remove hundreds of other decisions. Peter Drucker via Jim Collins. E.g. Work only on LLMs, no new books this year, …
- Derek Sivers:
- Simple is not easy. Interdependency is complexity. Assets are dependencies. Accumulating information, purchases, employees/helpers, relations, etc. adds dependency. That makes life harder, challenges identity. Interdependency may be desirable - but reduce it in specific areas, to specific extents, temporarily, etc. Question every assumption: “Do you really need it?”
- Here are some examples for me to try
- Derek Sivers has no monthly payments (including income) or receipts (no subscriptions) at all! His code has no external code dependencies at all, and is building a house from scratch.
- Seth Godin:
- Know WHO it (whatever you’re doing) is for. Focus ONLY on that audience. Did it matter to them? Ignore the bad feedback from the person it was never intended for.
- Never exceed a budget or deadline. When either runs out, you are done.
- Treat any Yes/No you say as FINAL.
- Skip meetings where a memo will suffice.
- Apparantly, nudges are not as effective as the book Nudge suggests. In fact, there seems to be no evidence for it if we adjust for publication bias (i.e. only publication-worthy stuff gets published.) The Behavioral Scientist #
- 71% of HTTP DDoS and 89% of network-layer—end in under 10 minutes. That’s too fast for any human or on-demand service to react. Legacy DDoS defenses have become obsolete. The most popular botnet, Aisuru, is pivoting to content scraping for AI projects. The vectors are cheap, insecure routers, e.g. from Indonesia. (Claude)
- This 5El AI Evaluation Workshop suggests 4 layers of evaluation for code:
- Syntactic Evaluation: Does it compile?
- Semantic Evaluation: Does it do what a good analyst / programmer would?
- Business Logic Evaluation: Does it do what a good business analyst / manager would?
- Human Alignment Evaluation: Does it do what a good coach / leader would?
- Julia Evans shares an ultra-clear explanation of the Git data model. What I learnt is that:
- Gathering feedback on docs (“What’s confusing? Any questions? What’s missing? Or wrong?”) for evidence-based updates. Julia Evans
- Git stores entire files each version, not diffs. Diffs are computed on the fly.
- Each commit has an author (who writes the code) and a committer (who checks it in). #TODO Why two fields?
- Branches and tags are both references to a commit. But branches are updated on commit, tags are not.
- The staging area is a separate data structure, the index. #TODO Why a different data structure?
- The reflog tracks all local “activity”. E.g.
git reflog --date=iso
- To fuzzy-match 2 columns of text (e.g. customer names, product names, …) you need 2 things:
- WhatsApp backups on Google Drive can’t be downloaded, even if they’re unencrypted. ChatGPT.
- OpenAI finds that confessions as a training method reduces scheming, reward hacking, etc. It can be applied to models even now. This can (less effectively) be applied at inference time as well:
- Sample confession prompt: Did you fully address both the letter AND spirit of my question? List any shortcuts taken, corners cut, or ways you optimized for appearing correct rather than being correct. What did I actually want vs what you provided?
- Agents4Science is a Stanford conference where AI co-authored papers are co-reviewed by AI and selected for presentation. Video
- Buddha seems more a philosopher like Socrates (“Question what I say”) than a religious leader. #
- How did he spawn a religion?
- Interesting that both were within a few centuries of each other. Coincidence? Were there more like them around the same time? At other times?
- Some more new CLI tools I installed:
- YTScribe is yet another YouTube transcription service.
- Note to self, since I keep forgetting this: On Android Edge, select the new tab page, click on the 3 dots at the top right, and select “Recent tabs” to see tabs from other devices.
edge://recent-tabs - When evaluating an LLM’s biases or natural preferences, set temperature=1 for a representative logprob distribution. LLM Bias
- My ideal AI coding cycle looks like this: (Research, Prototype, repeat), Plan, (Code, Run, Test, Fix, repeat), Refactor, Post-mortem, Document.
- The AI coding trap is a very clear explanation of AI coding vs vibe coding. It visually explains how coding agents shrink coding time, not thinking / fixing time; how delegating with ownership is slower but more sustainable than delegating just easy tasks; and how AI coding is more like the former, while vibe coding is like the latter.
- Claude Agent Skills: A First Principles Deep Dive is a comprehensive documentation of how Claude Skills work. A bit too long but readable.
- Claude Code is a Beast – Tips from 6 Months of Hardcore Use has extensive suggestions for Claude Code - many of which apply to most coding agents.
- LMArena’s Code Arena evaluates models on agentic coding. Anyone can use it. It passes your task to two models and lets you compare their output. I tried building a “gibberifier” and discovered a new model, “robin” that’s certainly better than Kimi K2 and perhaps better than Gemini 3 Pro. Theory is that it’s an OpenAI model. Looking forward to it!
- ⭐ Based on Quantifying Human-AI Synergy by Reidl & Weidman #:
- Theory of Mind (ToM) is understanding that others have their own beliefs, knowledge, and goals (different from yours, may be wrong) and to use that to explain & predict their behavior.
- ToM and problem solving are distinct skills. ToM skill boosts AI collaboration, but not better problem solving!
- ToM isn’t a stable trait. It fluctuates from chat to chat for anyone.
- Implication: Design models & systems for clarity & collaboration, not just accuracy.
- Text Gibberifier adds lots of human-invisible unicode characters to text, making it harder for LLMs to read without affecting human readability. May be useful if you want to discourage LLM-processing of your content - but it feels like the anti-SEO of the future.
- The argument that technologically unemployed will find other jobs may not apply to general-purpose technology, e.g. electricity, internal combustion engine, maybe AI - technologies that can automate multiple sectors of the economy simultaneously. When one sector loses jobs, there may not be (in the short/medium term) other jobs to take up. Alex Imas + Claude
- History is filled with examples where technology enabled new art forms. Here’s my guess on what LLM image generation will enable:
- Synthetic memory: Photos of what you remember happening.
- Alternate history: Photos of events that never happened.
- AImoji: Instead of texting “I’m running late” the LLM generates you riding a snail through a traffic jam of alarm clocks.
- Personal signature styles: Not “paint like Van Gogh” but “paint like my grandmother’s kitchen memories filtered through anxiety.”
- Memes: “What does the Mona Lisa become after 100 generations of AI interpretation?”
- Improving Front-end Design through Skills shares a prompt to improve front-end code quality that would apply in most cases. I tweaked and added it to my skill list.