Things I Learned

Things I Learned - 19 Jul 2026

This week, I learned: Writing is slightly, but only slightly, better than typing (for adult learning). One factor is that typing is faster, so many people take notes verbatim, summarizing and thinking less. ChatGPT + Claude Graphology for personality is pseudoscience. ChatGPT + Claude When I decide to spend time, or someone says “Let’s do X”, it’s worth checking: is this something AI can easily try, and is it clear to verify? If so, reinforcement learning loops could make AI good at it, making it a depreciating asset. Studying how to live in an AI world is exhausting. (Not as bad as my MBA days, but not as easy as my data scientist days, either.) It requires me to make a larger mental shift, i.e. change my perspective, than I have since 2000, and that feels like work. Both nl FILE and cat -n FILE add line numbers to files, but nl leaves blank lines unnumbered by default; cat -n numbers them too. After using rtk for 2 months, I’m slightly downgrading it. It saves tokens but agents mess up shell commands when using it. It’s still probably a net saving, so I’ve changed my AGENTS.md from “Always prefix with rtk” to “Prefix supported, high-output commands with rtk… skip for bash builtins, pipes, loops, etc.” I find 🔴🟡🟢 convenient status indicators in my notes. Similar ones are: 🟥🟨🟩, ❤️💛💚, 📕📙📗. I’m not fully convinced by: 😄😐😞, █ ▒ ░, ↑ → ↓, ▁▂▃▄▅▆▇, ■ ⬔ □, ● ◐ ○, ⚫ ⚪ 🔘, 🌕 🌗 🌑, etc. though they might have their uses. A model update means a SKILL.md and plugin review / update, e.g. with GPT 5.6 Sol. So, like with any open source repo, use skills from people who update them regularly, benchmark them, and version-control them by model. I asked Gemini 3.5 Flash thinking: “Which of our employees have worked on Microsoft PowerApps? Search @Google Drive and @Gmail”. It found one employee and a referral in under a minute. I asked ChatGPT with GPT 5.6 Sol with gws access. It found 3 more, plus 5 possibilities, in 12 minutes. Truly a rottweiler.
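To make the nl versus cat -n note concrete, here is a minimal Python sketch of the two numbering rules. It only models which lines get a number; the function names are made up for illustration, and the exact column formatting of the real tools is not reproduced.

```python
def number_all(lines):
    """Model of `cat -n`: every line gets a number, blank or not."""
    return [(i, line) for i, line in enumerate(lines, start=1)]


def number_nonblank(lines):
    """Model of `nl` defaults: empty lines are kept but not numbered."""
    numbered = []
    counter = 0
    for line in lines:
        if line == "":
            numbered.append((None, line))  # blank line: no number assigned
        else:
            counter += 1
            numbered.append((counter, line))
    return numbered


sample = ["first", "", "second"]
cat_style = number_all(sample)
nl_style = number_nonblank(sample)
```

With the sample above, the last line is numbered 3 in the cat -n model and 2 in the nl model, which is the difference that trips people up when quoting line numbers.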
Parallel Search Turbo seems like a pretty good search API, especially for agents. Low price, high speed, and maybe good quality. #ForNow ChatGPT Group chats in ChatGPT will probably get deprecated #ForNow. What I learned from benchmarking my Ideation Protocol skill extensively: Once you know the rubric, models can easily create a good prompt to optimize for a known rubric #ForNow. So rubric design matters more. ⭐ Rubric design is really knowing what you want/need. To do this, iterating on output matters. Position bias is real #ForNow. Always check if an (P, Q) comparison matches a (Q, P) comparison. Models are still biased towards longer content, and potentially towards their own output #ForNow. How to optimize a prompt or skill: Research and figure out what you really want, first. Then, ask a smart model for a prompt that optimizes for it. Benchmark only if you’ll use it a lot - it’s still a lot of work, and meta-prompting does a good job #ForNow. gbrain skillopt might be premature optimization. You can use GPT 5.6 Sol in Claude Code #ForNow. (But what’s the point? Harnesses seem to be working better with their own models #ForNow.) Our clients keep saying “We need to build a data lake” or “We need an enterprise data strategy.” I keep telling them, “No, agents can do it for you.” What I missed is: technology is the smaller part of the problem. Finding who has what data, getting access to it, and sorting out permissions (“governance”) is the bigger part. Giving agents expert task-specific, testable procedures seems better than expert roles or mental models #ForNow. But benchmark in any case. ChatGPT Python 3.3 introduced str.casefold(). It performs more comprehensive Unicode caseless matching than lower(); 'Straẞe'.casefold() becomes 'strasse'. (🟢 Unicode case-folding is standardized.) contextlib.closing(x) calls x.close() when its context exits. (⚪) In a dataclass, use x: list = dataclasses.field(default_factory=list), not a mutable literal default. 
(⚪) I learnt these while reviewing Codex-generated Python—illustrating, rather than proving, that reviewing AI-generated code can teach and catch errors. (🟡 Review remains useful across tooling. Review 2029.) “Do not discriminate against intelligence—artificial or otherwise” is a rhetorical value judgment, not an empirical conclusion. (⚫ Rhetorical value judgment, not testable. Review now.) Here’s a nice idea from ChatGPT. “When itching to correct or clarify, FIRST restate their position to their satisfaction. ‘Did I get you right, fully?’” This emerged from the prompt suffix: Based on your research, and my past conversations, what are the top areas where and how (specifically) I can apply this principle on myself and others to maximize impact? Automated evals can catch stuff humans miss. And vice versa. And given how many evals we create, we need automated evals to be written in an easy-to-review way. Do Automated Evals Work? The BINEVAL paper reiterates that a bunch of Yes/No binary questions beats scales or ratings for many benchmarks. You know exactly how to grade and WHY you got a certain score. This is more reproducible and easier to learn from / act on. When asked “How long will this software take?” models typically provide estimates assuming human speed #ForNow. Maybe they haven’t been trained enough on agentic timelines. So, when my colleague got a 2-4 week estimate which he was able to solve in hours, it was a surprise. (But, of course, it’s best to verify before promising speed.) SKILL.md dramatically lowers the cost of learning a skill (since you don’t learn it - the agent does). That means that the value of creating skills is much higher - hundreds can use what you create (giving you recognition, if not money). I think I’ve underestimated the number of skills people will have available (I thought dozens - but it may be thousands #ForNow) and the number of skills people will create (I thought tens of thousands - but it may be millions #ForNow.) 
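The three Python notes from this week (str.casefold(), contextlib.closing(), and default_factory in dataclasses) can be checked with a short script. This is a minimal sketch; the Resource and Basket classes are hypothetical names used only for the demonstration.

```python
import contextlib
from dataclasses import dataclass, field

# 1. casefold() folds the sharp s to "ss"; lower() keeps it as a single character.
word = "Straẞe"
lowered = word.lower()
folded = word.casefold()


# 2. contextlib.closing() calls .close() when the with-block exits.
class Resource:
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


with contextlib.closing(Resource()) as resource:
    open_inside_block = not resource.closed


# 3. default_factory gives each instance its own list.
@dataclass
class Basket:
    items: list = field(default_factory=list)


first, second = Basket(), Basket()
first.items.append("apple")

# A mutable literal default is rejected when the dataclass is defined.
try:
    @dataclass
    class BrokenBasket:
        items: list = []
except ValueError as err:
    mutable_default_error = str(err)
```

Note that dataclasses refuse a list literal default outright with a ValueError, so the default_factory form is the only one of the two that even runs.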
A Wikipedia (community curated, verified, high quality catalog) of skills might emerge #ForNow, if it hasn’t already. Tacit knowledge is often just un-measured knowledge. Once I put a sensor on the bellboy’s hands at The Curzon Court, AI can figure out how he opens the door with the key and why I can’t do the same. The subset of tacit knowledge that’s AI-resistant is where attempts are expensive (“How to negotiate a merger” rather than “How to open a door”) and feedback is slow/vague (“Does the client trust me” rather than “Did the door open”). The fact that Composio has ~20,000 tools is a market signal that connectors are commoditizing, and are a depreciating asset #ForNow. A weak model needs a forgiving harness - which ends up slowing down model learning. Stricter, accurate verification environments are better for fastest model learning. ChatGPT Work lets you run for longer, faster, install plugins and skills, host a website, etc #ForNow. It’s somewhere between Chat and Codex. It consumes Codex limits - something to watch for (since chat limits are quite generous). Codex temporarily removed the 5-hour usage limit. Tibo. So, since I have 3 banked rate-limit resets #ForNow, I can, in theory, use 4 full weeks of Codex usage at one go. Reality: I don’t have problems large enough for a SINGLE week’s consumption! From what I see of the State of AI Design and State of Prototyping, Figma is way ahead of competition #ForNow, e.g. Adobe, with Figma Make and Weave. I was also surprised how popular Cursor is (#2 behind Claude Code #ForNow). It’s also interesting that designers are coding directly #ForNow, using Figma just for edits / steering. But many research tools (note takers, survey analysis/research, etc.) will likely get eaten up by AI coding agents #ForNow, given how much designers are building their own tools.
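Going back to the position-bias note in this week's list, the (P, Q) versus (Q, P) check can be sketched as below. The judge functions are hypothetical stand-ins for a model call; a real judge would be an LLM request, which is not shown here.

```python
from typing import Callable

# A judge takes two candidates and says which slot won: "first" or "second".
Judge = Callable[[str, str], str]


def is_order_consistent(judge: Judge, p: str, q: str) -> bool:
    """Return True if the same candidate wins regardless of presentation order."""
    winner_forward = p if judge(p, q) == "first" else q
    winner_backward = q if judge(q, p) == "first" else p
    return winner_forward == winner_backward


def content_judge(a: str, b: str) -> str:
    # Hypothetical judge that decides on content alone (here: longer text wins).
    return "first" if len(a) >= len(b) else "second"


def slot_biased_judge(a: str, b: str) -> str:
    # Hypothetical judge with pure position bias: it always prefers the first slot.
    return "first"


p, q = "short answer", "a much longer and more detailed answer"
content_ok = is_order_consistent(content_judge, p, q)
biased_ok = is_order_consistent(slot_biased_judge, p, q)
```

Comparisons where the two orders disagree are best treated as ties or re-run, rather than trusted in either direction.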

Things I Learned - 12 Jul 2026

This week, I learned: How to become an applied AI engineer is a concise, well-written, and surprisingly current summary of what AI engineering is. Xinjiang seems to be China’s Kashmir problem. Not quite, but similar. Analogies for how forward deployed engineers work: It is like a food truck that brings and serves home food while building a kitchen and restaurant around it. It is like setting up a field hospital: patients are treated from day one, while the equipment and procedures are built around the live work. Froghoppers excrete ~300x their weight daily. ChatGPT There’s a growing shift away from AI-written commit messages, e.g. Kenton Varda. I compared my human-written commit messages with AI-generated commit messages, and the AI-generated ones are less helpful. Finally, GPT live gets an update and the new speaking model can delegate to GPT 5.5 when required. I tried it once today, to plan for a teacher workshop, and it was fairly good. It tends to begin with “Hmm” like it’s thinking, which feels comforting. Using a Unicode character like 🟢 is unusually low-risk across file systems today. It works well across OSs, mobile, ZIP, attachments, file share systems, etc. Some old apps might have trouble, but for storing and sharing, it’s fine. I’ve been using Unicode symbols like these a lot in my notes, and extending to file names feels like a natural next step. Though swimming gets the most Olympic medals (11%), for a country chasing its first medals, 78% of first-medal breakthroughs came from Athletics, Wrestling, Shooting, Boxing, Judo, Weightlifting, or Taekwondo (which are 44% of medals) - where single athletes can win without a support ecosystem. ChatGPT JMFL accidentally emailed several people a letter intended for their brokers. It roughly said: “Many of you are recording client calls. That’s a regulatory risk.
If you keep doing this, we’ll hold your payments, even fire you.” Several Smart TVs have software that lets them act as proxies for data collection companies. Include Security MapDraw is a convenient tool to annotate maps (e.g. routes, boundaries, places) and share or download them. There seems to be no way to edit the “About” message on WhatsApp Web. Though the help suggests steps, and the “About” mood/status is visible, there’s no way to edit it. (Editing on the phone works.) Cloudflare optimised a reader component by sometimes letting the input buffer fill fully. This inadvertently introduced a hard-to-reproduce race bug because the producer would close the socket if the buffer was full. The producer bug was old (it didn’t check if a flush succeeded or not) but was never visible since the readers never let the buffer fill in the past. Cloudflare A neofirm is a start-from-scratch AI-native business, e.g. Crosby’s AI-first law firm. An AI rollup is where a company buys small traditional firms and AI-enables them - like General Catalyst proposed. AI SaaS is selling AI agents to services firms. Give people free platforms and collect their data. Learn the supply-demand network patterns, what people value, and add value-added services. Claude Code checks if you’re working behind a Chinese corporate domain - somewhat sneakily - by changing an apostrophe or slash in the date to visually similar Unicode. Claude Code Is Steganographically Marking Requests You can use the Kaggle CLI via Codex to solve Kaggle problems. (AutoKaggle automates it - but is 2 years old.) But, like GitHub bounty hunting bots, we will probably have a Kaggle bounty-hunting bot ecosystem - maybe already do. OpenSubtitles2024 and subscene are large pre-AI subtitle datasets with a 2024 cutoff. IndicDialogue is a 7.7K OpenSubtitles snapshot of Indic language SRTs. The OpenSubtitles API lets you search by IMDb/TMDb ID and is up-to-date.
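The look-alike Unicode trick in the Claude Code note is easy to spot once you list the non-ASCII characters in a string. Here is a minimal sketch; the note does not say which characters are actually used, so the right single quotation mark (U+2019) and division slash (U+2215) below are illustrative assumptions, and the sample strings are made up.

```python
import unicodedata


def non_ascii_chars(text: str):
    """List (index, character, code point, Unicode name) for every non-ASCII character."""
    return [
        (index, char, f"U+{ord(char):04X}", unicodedata.name(char, "UNKNOWN"))
        for index, char in enumerate(text)
        if ord(char) > 127
    ]


# Two strings that look almost identical on screen (hypothetical examples).
plain = "Today's date: 2026/07/12"
marked = "Today\u2019s date: 2026\u221507\u221512"

plain_hits = non_ascii_chars(plain)
marked_hits = non_ascii_chars(marked)
marked_names = [name for _, _, _, name in marked_hits]
```

Printing marked_hits shows the position and official name of each substituted character, which is usually enough to tell a typographic quote from a deliberate marker.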
A soup spoon is better than a table spoon (for soup), though both carry about the same volume, because you can fit a soup spoon fully into your mouth (a table spoon is too long) and this reduces spilling. Here’s a sign of accelerating AI progress. I used to critique outdated techniques by saying “This feels like a 20th century approach.” Then “This feels like a 2010s solution.” Recently, “This is SO 2025-ish.” Now, “That’s Q1 2026. It’s Q2.” The 7-day week emerged from the Hellenistic planetary week and the Jewish week (not astronomy based), which Rome adopted, then spread by several routes to India, China, and worldwide. Unlike the astronomical year and month, the week is just a convention. Egypt, China, and Athens grouped days in tens; Etruria and Rome used 8-day market cycles; West Africa used varied cycles; Java used five days; Mesoamerica used 13- and 20-day cycles. Gemini I met an ex-photographer and learned that photography is another profession where technology (mobile cameras) squeezed the middle. Generation (taking good pictures) became cheap. Value moved upstream (direction), downstream (selection, editing, album design), and into niches (forensic, industrial, sport/event photography). Looks like Claude favors Claude Code. Might not be intentional, and just a result of training more on Claude Code data, but it does look like a network effect that could weaken open harnesses. Armin Ronacher

Things I Learned - 05 Jul 2026

This week, I learned: ⭐ How to teach so people learn better. Make them do > Show > Tell. Workshop > Demo > Slides. Let them ask, try, struggle, and commit first; explain next; help last. But only when they know enough to get part-way. Make problems CONCEPTUALLY hard (not in language, visual, or procedure). But make sure instructions are clear. Test their learning with a NEW case, immediately. Measure learning. Can they recall it LATER, apply it ELSEWHERE, explain WHY, and know when they may be WRONG? Vogue runs an “In the bag” series where people pull stuff out of their bag, and audiences watching feel they KNOW the person. Depending on the setting, we might be able to help people “know” each other by curating several items. Here are a few ideas. Physical: Bag, Wallet, Fridge, Drawer, Keychain, Remembered phone numbers Mobile: Battery usage by app, Recent emojis, Text prediction for “Honestly, I just want to…”, Autocorrect dictionary, Alarm labels / reminders, Saved Wi-Fi, Blocked/muted contacts, Contact favorites, Contact names, e.g. “Mom ❤️” vs “DO NOT PICK UP”, Device / Wi-Fi names Laptop: Open tabs (count, age), Recurring calendar events, /Downloads, Photos, Email drafts, Subscriptions, Kindle highlights Ownership and connections come from attachment, which can be created. If you name something, touch something, contribute to something in any way, it becomes yours. When people contribute to someone else’s work and discuss it, they build a connection. According to both Claude and ChatGPT, if you had to pick one model for ideation / brainstorming, it might be GPT 5.5. It’s better for divergent generation: the broadest, most exhaustive pool of usable ideas. Fable 5 is better for deep creative judgment: reframing, finding structural flaws, recombining ideas. Claude Code supports rules which are exactly like a CLAUDE.md but support a paths: YAML metadata - so they’ll be read only when Claude Code is reading those paths. 
If you have a SKILL.md that explains how to do something and you only need its outcome, then move it to a sub-agent (e.g. fake data generation, tool failure logging). Use SKILL.md for instructions that need to be woven into a task, e.g. memorable explanations. The key bottlenecks in running an agent /loop are (a) imagining higher order problems and (b) defining a measure of success / progress. Long tail -> sell options. Black swan -> Buy options. That’s a roughly accurate summary. The trouble is, we don’t always know which tail we’re in. So, sell only if you can afford one hit. ArchiveBox lets you view pages / RSS feeds offline. uvx --from git+https://github.com/ArchiveBox/ArchiveBox.git@dev archivebox works, and config / tools are stored in ~/.config/abx/. The installation didn’t go very smoothly and the whole thing felt bloated, so I abandoned it. I use monolith -I -e $URL to download a page as an offline single-page HTML. Combined with uvx feed2exec I can archive RSS feeds for offline reading. That’s easier than having to open Feedly - I just mark read files with an x at the front and keep reading. The downloads are slow (~3 min/feed) and large (5 GB for 15 feeds, 5 MB median feed size) because they embed videos and all images/files, but I can safely delete what I’ve read or will ignore. ChatGPT Project Injection as Role Confusion is a very well written paper (blog-post style) that says the key to tricking LLMs is to confuse them about WHO wrote a line. Just adding a “User: ” in front of a line makes it more likely that LLMs think it’s a user. Even when text is written in the style of their system instructions, they fall for it - irrespective of where the content came from. This makes GEO more effective, too. Also, the last section “8. Open Ideas for Roles Research” is a fantastic read on LLM psychology (or rather, neurology). On The AI Compass I am The Podcast Bro. Patron saint: Lex Fridman.
“You listened to a three-hour interview with an AI researcher and now you have opinions. Strong ones. You’re long on compute and short on regulation, and you’ve said ’exponential’ more times this month than a calculus teacher. Love is the answer, and also AGI.” Impact: +5.9. Valence: +4.1. Since Nano Banana 2 Lite isn’t as good as Nano Banana 2 and is only about half the price, I wouldn’t switch yet. Claude Sonnet 5 is out. Fable 5 will be released soon. GPT 5.6 is still on probation. Codex has a Record and Replay feature for Mac that lets you do something, records it, and learns from it. Very useful for non-developers. It’s like recording Excel macros, which unleashed a lot of power for me when I didn’t know Visual Basic. Claude Code Artifacts lets Claude Code live-publish a web page and share it securely. The “live-publish” part is the interesting thing. Claude in a /loop can now become the app that updates a “dashboard”, a live feed/story, a self-evolving app, … and so much more. (This feature is only available for Team/Enterprise but the idea is universal.) Tau, like Pi, is a minimal coding agent. τ = 2*π. It shows what it does very transparently, making it easy to learn how agents work. uvx --from tau-ai tau works seamlessly. Configs, logs, and sessions are stored in ~/.tau and you can log in via your Codex/ChatGPT subscription. Skills for Design Engineers has a useful animation vocabulary skill that converts vague animation prompts to precise animation terminology. X has an MCP Server but it’s meant more for development/coding than for general users. Setting it up for ChatGPT / Claude requires creating tunnels. OpenAI supports Secure MCP Tunnels that let ChatGPT connect to your machine securely. A very powerful feature. Unfortunately, this seems to need an organization - and even though personal accounts can still access it, it’s proven a bit more messy than I’d like to use. notebooklm-py is a CLI for NotebookLM.
Unofficial and potentially unsupported, but it’s amazing how AI makes reverse-engineering APIs so easy. If you start a temporary ChatGPT chat and close it, it still runs in the background - but you have no way of going back to it (not even the back button) or seeing what it said/did. I know this because it was accessing my MCP server even after I navigated away from the chat accidentally. The code refactoring industry can go full swing now. “As an example of what AI can accomplish, Claude Opus 4.7 substantially reimplemented gotree—a bioinformatics toolkit with about 16,000 lines of Go and 40+ commands. We believe this same task would take a human engineer without AI assistance 2–17 weeks. Opus 4.7 solved it in 14 hours, passing 2,000/2,001 tests (99.95%), at a cost of $251.” MirrorCode A useful rule of thumb: Cloudflare tunnels are for links to share with others. Tailscale is for services (even non-HTTP) only your devices should see. ChatGPT date -d (date +-%wday) +%F is the most compact way to round down to the nearest Sunday. (That’s fish syntax; in bash, write the substitution as "$(date +-%wday)".) Avoid date -d "last sunday" +%F which, on a Sunday, returns the previous Sunday, not today. ChatGPT A useful way of controlling AI verbosity is word count. To do that, I need an intuitive sense of how much to ask for. Here’s my rule of thumb: one page of paragraph text on ChatGPT is 200-300 words. 150-200 if it’s mostly bullets. I can typically read 1-2 pages of output. So, 300-600 words is my limit. Google Labs launched a DESIGN.md spec to guide agents on a consistent design. The good part is that it aligns with the proposed W3C design tokens spec. But beyond that, I’m not convinced of the benefit. Atlassian’s DESIGN.md had mixed results. Claude feels it could go either way. I’ll give this a miss for now.
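The "round down to the nearest Sunday" rule from the date note above can also be written in Python, which avoids shell quoting differences. A minimal sketch, with a function name chosen only for illustration:

```python
from datetime import date, timedelta


def round_down_to_sunday(day: date) -> date:
    """Return the most recent Sunday on or before `day`."""
    # isoweekday() is Monday=1 ... Sunday=7, so "% 7" gives Sunday=0, Monday=1, ...
    # This matches what `date +%w` prints.
    days_since_sunday = day.isoweekday() % 7
    return day - timedelta(days=days_since_sunday)


on_a_sunday = round_down_to_sunday(date(2026, 7, 19))
midweek = round_down_to_sunday(date(2026, 7, 22))
on_a_saturday = round_down_to_sunday(date(2026, 7, 18))
```

A Sunday maps to itself here, which is exactly the case where "last sunday" goes wrong.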

Things I Learned - 28 Jun 2026

This week, I learned: Every Substack has an RSS feed at https://your.substack.com/feed. Substack help. I used this to scan my browsing history to identify Substacks I visit - and subscribed to Marcus on AI - an AI sceptic that AI asked me to read about. Cloudflare lets agents create temporary accounts so that they can deploy and test. Enables trial and error - a powerful capability. “They’re on mobile but this is substantiative enough to warrant length.” I spotted this in Claude’s thinking when prompting on mobile. So, if I ask Claude something on mobile, it will give me shorter responses by default. Clever design - but something to keep in mind. If I want some heavy thinking done by Claude, better to do it on desktop than try to give it conflicting instructions. Giant Permissive Image Corpus (GPIC) has 100 million Qwen-tagged public images. Even as a simple searchable image catalog this has value. Jeff Clark - Import AI Ethan Mollick had an agent test his book summary against multiple LLMs as readers to find out how they would recommend it - and optimized. This is a great practical use of agents as consumers, and material for my When Data is for Agents, Not Humans workshop. kage is an easy CLI to clone websites and read offline. For example, kage clone https://simonwillison.net/2026/Jun/ -o ~/tmp/site --scope-prefix /2026/Jun/ --max-depth 1 clones all Jun 2026 articles from Simon Willison’s blog. Then kage serve ~/tmp/site serves it locally. While it’s easy, the only time I need this is on a flight, and in that case, a local RSS feed app works better. I’m using newsboat for that. To me, the clearest sign of AI writing from the Wikipedia:AI or not quiz was consistent paragraph lengths. I got the first 3/3 wrong, but once I used this heuristic, I got 6/7 right. Updated my LLM Smells. The files .git/info/exclude and ~/.config/git/ignore also hold ignore patterns that git honors, like .gitignore, and are useful for patterns you don’t want to commit to the repo’s .gitignore file.
For example, .DS_Store makes sense only for Mac machines, not each repo. .vscode/ makes sense only for VS Code users. Nelson Figueroa Justin Poehnelt, author of the brilliant Google Workspace CLI gws, was fired for it. There have been no updates for 3 months, but none may be required - it feels perfect. X Lore is a centralized version control system for large binaries. If you have large binaries (e.g. images, videos, …) that multiple people edit, it’s better than Git LFS or Perforce. ChatGPT Deno Desktop lets you use JS to build desktop apps. I tried it. It’s easy to install, compact to code, leverages familiar web technology, and compiles to multi-platform binaries. The binaries are a bit larger than I’d like, though - 80MB for a Hello World on Linux/Windows and ~70MB on Mac. Codex reported that You have 2 usage limit resets available. Run /usage to use one. This thread has context. After resetting, the next reset might be 7 days after the reset, though (source). After having a child, fathers are affected biologically, too. Testosterone drops, cortisol & prolactin & estrogen rise, the brain rewires for empathy and threat detection - and of course, there’s less sleep. These sometimes lead to “Paternal Postpartum Depression” - something I didn’t even know was a thing. The havoc kids wreak upon us! 🙂 Gemini With AI writing more code, formal code proofs are becoming more accessible. You just need to ask a coding agent to prove / disprove a function. You can use: Z3 to find/prove whether a counterexample exists. Best default. Dafny to prove that code obeys a spec. Best for real algorithmic code. Alloy to find loopholes in relational models, schemas, permissions, and workflows. Best for data. TLA+ to check whether stateful, concurrent, or agentic systems can evolve into a bad state. Best for systems / workflows. ... and there’s a long tail of these. Python is named after Monty Python, not the snake. I knew this, but forgot!
Python now has multiple cross-platform app paths: PyInstaller and Nuitka for executables, Kivy, Flet, and BeeWare/Briefcase for GUI/mobile/desktop apps, and PyScript/Pyodide for browser/WASM apps - a route that became more serious because Pyodide-compatible WebAssembly wheels can now be published directly to PyPI. On the one hand, AI is writing code, so there’s no point learning Python. On the other hand, AI is writing code mostly in Python - so THAT’s what you need to learn more. I think we should teach Python using AI, that is, teach how to write and debug Python code using AI. That’ll end up teaching skills people will really need. Computational thinking = Decomposition + Abstraction + Algorithm design + Pattern recognition. In AI, that translates to = Framing + Context engineering + Orchestration (harness engineering?) + Verification design. Maybe I’d add Assetization / Systems.
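The Substack feed note from the top of this week's list lends itself to a small helper for scanning a browsing history. This is a minimal sketch that only recognises *.substack.com hosts (Substacks on custom domains would be missed); the sample URLs and function names are hypothetical.

```python
from typing import Optional
from urllib.parse import urlparse


def substack_feed_url(page_url: str) -> Optional[str]:
    """Return the RSS feed URL for a *.substack.com page, or None for other sites."""
    host = urlparse(page_url).hostname or ""
    if not host.endswith(".substack.com"):
        return None
    return f"https://{host}/feed"


def feeds_from_history(urls):
    """Collect the distinct feed URLs found in a list of visited URLs, in sorted order."""
    feeds = {substack_feed_url(url) for url in urls}
    feeds.discard(None)
    return sorted(feeds)


# Hypothetical browsing history entries.
history = [
    "https://example.substack.com/p/some-post",
    "https://example.substack.com/p/another-post",
    "https://other.substack.com/archive",
    "https://news.example.org/story",
]
found_feeds = feeds_from_history(history)
```

Each distinct publication shows up once, however many of its posts appear in the history.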

Things I Learned - 21 Jun 2026

This week, I learned: It doesn’t always take time to learn or convey things. (Early trust can be built instantly, e.g. vulnerability.) At first, experts don’t know how to make skills explicit. But trainer effort could compress 10X via evals, practice loops, and feedback. Learner elapsed time would compress less. Everyone has something worth discovering, but not every conversation is worth my time right now. So, meet new people with trust, attention, and good questions. Continue if there’s emotional / intellectual stimulation (surprising, interesting, moving, connecting, energizing, challenging), else exit warmly with respect. To avoid getting overwhelmed in ultra-interesting conversations, mental closure helps. During the conversation, pause, name, reflect, and close. “Wait, you’re saying X. I should do Y. I’ll reflect/act tonight.” or “Wow, let’s sit with that for 5 seconds. You mean X. I feel Y. I’ll drop.” After the conversation, summarize: “What struck me were X1, X2. I’ll plan Y1, Y2 and drop Z1, Z2.” Then take a short break. Setting "markdown.editor.updateLinksOnPaste.enabled": false might fix the delay / freezing (infinite spinner) issue when pasting Markdown in VS Code. The bottleneck to quality of AI output has shifted from model quality to harness quality (and this is not obvious to many people). It is important, therefore, to optimize harness usage rather than prompts usage, i.e. harness engineering over context engineering. I use ug --smart-case --bool -Q --sort=rtime to interactively search for text in files. It’s like VS Code search-across-files. Here are the shortcuts I find useful: Alt-g: Glob (filter files to search in) Alt-[ or ]: Decrease or increase context (lines before / after) Alt-w: Word match toggle Alt-c: Count lines toggle Alt-u: Ungroup - show lines once even if multiple matches Using AI for health seems to have reached a tipping point. Three people have pitched an idea in this space to me in the last three days. 
One is a managed personal health provider who wants to tie up with hospitals to gather data to improve AI health advice. Second is an entrepreneur who wants to enable the Indian Govt to use AI to improve public health - given the low proportion of trained doctors in public hospitals. The third is a colleague who is uploading personal health reports, fitness data, DNA data, wearable data, etc. to get suggestions for daily habits such as fitness, nutrition, sleep, medication, etc. to optimize health. Changing the topic (e.g. asking a question) instead of answering a question is powerful. It lets you decline requests, avoid sensitive topics, ignore boring ones, learn rather than teach, and bring in your agenda - all at one shot. I need to un-practice my 40-year habit of answering questions. (This is selfish. I forgive myself.) bolt.diy seems like a browser-embeddable coding agent. That is, you can add bolt.diy to your web page and have it build apps. That might be a pretty powerful upgrade to generative UI - where pages build themselves based on the user input. Codex has gained a few new features in the last few months. Codex can generate images and have voice conversations. /goal sets an overall session goal to avoid getting side-tracked. /side is like Claude Code’s /btw - for a side task while the main task continues. /resume lets you switch to any previous session. /keymap debug lets you edit the keymap and inspect what keystrokes the terminal sends. @ lets you mention files, directories, skills, and plugins. Ctrl+R works and lets you pick a previous prompt. Ctrl+O copies the last answer as Markdown. Hooks are stable. PreToolUse lets you log every tool, SessionStart lets you inject repo-specific rules. MCPs with readOnlyHint can run in parallel. codex doctor diagnoses environment issues. codex remote-control lets you remotely control Codex, making it a server. Codex Python SDK is better and you can have Codex run as a back-end more smoothly.
To change others’ behavior, embody (not preach) it visibly and consistently, make it easy to copy, and ask without forcing. It takes time, though. ChatGPT Governance is how groups keep promises when things (people, incentives, environment, pressure) change. That’s a simple way to explain governance to someone who doesn’t understand why it matters, and a guide to when it does not matter. Forward Deployed Engineers are the next evolution of data scientists, IMHO. AI can do data science. Data scientists will likely act as the “Human As An Interface” (HaaI) to business, proactively identifying and solving problems - a space business analysts traditionally occupied. Of course, business analysts will likely do the same without needing data scientists to help. But since AI replaces data scientists more than business analysts, I expect that the % of data scientists who become FDEs will be higher than business analysts. The value of data exported from software is high. For example, your email, social posts, CRM / HRMS / ERP dumps, service tickets, purchases, notes etc. These let you create a personal / organizational digital brain. Hence proprietary solutions will make exports harder and open solutions will emerge. To live-preview any publicly accessible Excel file, you can embed or link to https://view.officeapps.live.com/op/embed.aspx?src=YOUR-URL The Codex app can now use the browser much better and faster since last week if you enable “Dev mode” OpenAI. This uses CDP - which is more efficient than screenshots - and is something Codex CLI has been doing for many months. In Codex, Claude Code, etc. you can submit a prompt while the agent is working to steer it, i.e. after it completes a turn (e.g. a tool call) it will factor in the prompt. You can also queue it. Neither of these is available on ChatGPT or Claude.ai, though it’s such an important feature. On ChatGPT, submitting another prompt stops the previous run and the agent continues with the new prompt.
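For the Excel live-preview note above, the only fiddly part is that the file URL goes inside a query parameter, so percent-encoding it is the safe choice. A minimal sketch, assuming the viewer accepts an encoded src value; the spreadsheet URL is a made-up placeholder.

```python
from urllib.parse import quote

VIEWER_BASE = "https://view.officeapps.live.com/op/embed.aspx?src="


def excel_embed_url(file_url: str) -> str:
    """Build the viewer link for a publicly accessible Excel file."""
    # safe="" also encodes ":" and "/", so the whole URL fits in one query value.
    return VIEWER_BASE + quote(file_url, safe="")


def excel_iframe(file_url: str, width: int = 800, height: int = 600) -> str:
    """Wrap the viewer link in an iframe tag for embedding in a page."""
    return (
        f'<iframe src="{excel_embed_url(file_url)}" '
        f'width="{width}" height="{height}"></iframe>'
    )


# Hypothetical public file location.
preview_link = excel_embed_url("https://example.com/data/report.xlsx")
```

The file has to be reachable without login, since the viewer fetches it from the public internet.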
By default, git uses ~/.config/git/ignore or %USERPROFILE%\git\ignore as the global .gitignore. You can override that with git config --global core.excludesFile PATH. StackOverflow

Things I Learned - 14 Jun 2026

This week, I learned: Overheard a journalist saying: “I can tell when humans are lying. There are no telltale signs of AI lying. At least I don’t have any.” rdt-cli is a Reddit CLI. It uses a clever trick: it auto-detects installed browsers and extracts cookies (supports Chrome, Firefox, Edge, Brave). So, if you’re logged into Reddit on any browser, uvx --from rdt-cli rdt whoami automatically shows who you are logged in as. (The public-clis repo also lists other useful CLIs like twitter-cli.) Currently, a $20 Claude Pro gives you ~$400 and a $100 Claude Max gives you ~$2,000 of API usage. For ChatGPT, the numbers are ~$700 and $3,500. SemiAnalysis When Fable 5 refuses to answer questions, here’s the message that appears: “Fable 5 has safety measures that flag messages on most cybersecurity or biology topics. They may flag safe, normal content as well. These measures let us bring you Mythos-level capability in other areas sooner, and we’re working to refine them. Send feedback or learn more.” I managed to trigger this once while researching an M&A target. Clicking on “Edit and retry with Fable 5” triggered Opus 5 again, twice. DNA bases (A, T, C, G) encode proteins in triplets called codons. There are 64 codons that map to 20 amino acids. Some, like Leucine, have 6 codons. Some, like Methionine, have only one. Why? When genes are read, there’s sometimes a wobble at the 3rd base of a codon. The mapping minimizes that impact: small errors map to the same or similar amino acids. The more common amino acids tend to have more codons. There’s a lot of fascinating information science going on here. Gemini ChatGPT now shows a “Check in” button when it’s thinking. Clicking on that gives you a work-in-progress answer while it continues thinking. When done, it replaces the WIP answer with the final answer. A useful feature!
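The codon counts in the DNA note above can be reproduced from the standard genetic code table. This is a minimal sketch: the 64-letter string is the usual compact encoding of that table in TCAG order (with * for stop codons), and the function name is chosen only for illustration.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# One-letter amino acid codes for codons TTT, TTC, TTA, TTG, TCT, ... GGG ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# How many codons map to each amino acid (stop codons excluded).
codons_per_amino = Counter(a for a in CODON_TABLE.values() if a != "*")


def synonymous_fraction(position: int) -> float:
    """Share of single-base changes at `position` (0, 1 or 2) that keep the same meaning."""
    unchanged = total = 0
    for codon, amino in CODON_TABLE.items():
        for base in BASES:
            if base == codon[position]:
                continue  # not a change
            mutant = codon[:position] + base + codon[position + 1:]
            total += 1
            unchanged += CODON_TABLE[mutant] == amino
    return unchanged / total


fractions = [synonymous_fraction(position) for position in range(3)]
```

Comparing the three values in fractions shows that a change in the third base is far more likely to be silent than a change in the first or second, which is the error-tolerance property the note describes.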

Things I Learned - 07 Jun 2026

This week, I learned: sudo resolvectl flush-caches clears the DNS cache on Linux. Useful when you’re changing DNS records and want to see the changes immediately. In my case, I was creating a Cloudflare tunnel to my laptop and wanted to test it quickly. Making something easy to verify makes it much faster to train models on it. Arithmetic verification is easy - calculators can be deterministically verified. Chess verification is easy - Stockfish became easy to train. Code verification is easy - LLMs improved coding ability rapidly. Therefore: Wherever we have environments that are easy to verify, AI will improve faster there. To make AI improve faster in an area, build environments that are easy to verify. MCP is getting simpler. A stateless HTTP protocol. Simpler OAuth. Plugins. No idea when it will land in Claude or ChatGPT, though. Worth checking after 28 Jun 2026 - after it is finalized. Microsoft Scout is Microsoft’s version of OpenClaw or Gemini Spark. git subtree is a useful way of maintaining git repos inside git repos, for example, when you have a tool tool-a under a project. It’s more lightweight than sub-modules, lets you commit at any point to the parent or child, and is a built-in feature in git. Gemma 4 12B is released and seems almost as good as the 26B version. This is the class of models that makes it practical to run edge AI on phones. It’s multimodal and reasonably smart (like frontier models were 12-18 months ago). I don’t use Claude/ChatGPT Projects much. They offer 4 advantages: custom instructions, memory, files, and chats. Files aren’t useful - I use my entire laptop as a file system via MCP. Instructions aren’t useful - I can paste commonly used prompts with a click. Chats aren’t useful - I have chat references enabled, so all past chats are accessible anyway. Memory isn’t useful - I have memory enabled globally anyway. In short, I haven’t discovered the power of projects that everyone’s raving about. SKILL.md is more useful for me. 
repo is a Google/Android tool built on top of git that lets you manage multiple git repos. It sounded promising until I realized it needs a repo init that creates a .repo/ - which is more overhead than I’d like to keep. When using <img onerror=...> fallbacks, include this.onerror=null to prevent infinite loops if the fallback image also fails to load. RK One of the advantages of multiple agents (rather than a single agent loop) is: it’s easier to change directions when wrong. Single loops get stuck. Build Agents That Run for Hours Claude Code also supports agent teams where sub-agents can talk to each other rather than rely on the main agent to coordinate. Useful for parallel exploration. Anthropic lets Claude define “organizational policies” for agent teams best suited for the task (AI-native workflows). It also lets agents push back on their scope, e.g. “This is too hard.” Build Agents That Run for Hours Claude Code has a /background [prompt] (or /bg) command that runs the current session in the background. You can run claude agents as a separate command to monitor agents. (There’s no equivalent in Codex yet.) This seems to be the future of agentic operations: a bunch of agents running that you monitor and steer through an agent view dashboard. Models are evolving. Therefore prompts evolved. Now harnesses also need to evolve. The workflows will also evolve. As a result, evaluations might be the (relatively) more stable assets. Datasets are likely to be the most stable ground truth. How to learn a new field fast: Yes, it’s possible to learn 50% of a field in 20 hours. Josh Kaufman, “The First 20 Hours” popularized it. The next 30% takes months and the last 20% takes years. Threshold concepts are those that change your perspective and open up new ways of thinking. Experts’ knowledge is hard-wired and they can’t naturally identify or teach threshold concepts. Don’t assume they can. 
“We know more than we can tell.” Polanyi’s 1966 book “The Tacit Dimension” says that there’s some knowledge that can’t be verbalized. This tacit knowledge, therefore, will be harder for humans and AI to learn.

Things I Learned - 31 May 2026

This week, I learned: D-ID is an avatar generator platform like HeyGen. Creatify and Synthesia are a couple of others I heard of. This space seems to be growing. cosign is a CLI that lets you sign and verify any piece of text with a Google, GitHub or Microsoft account. cosign sign-blob FILE --bundle sign.json opens a login window and creates a sign.json signature. Anyone who has FILE and sign.json and the email ID can verify via a Google account with cosign verify-blob FILE --bundle sign.json --certificate-identity $EMAIL --certificate-oidc-issuer https://accounts.google.com. arxiv2md.org converts arXiv papers to Markdown. Source. markxiv.org claims the same - by just changing the URL - but it ended up reporting an error when I tried this link: https://markxiv.org/abs/2604.08649. From Akhilesh Tilotia: So we have someone in our team with initials AS. She made a document which was named vAS. Then I made edits and named it vAT. These docs were in a CoWork folder. I asked Claude to clean up my doc. It created another version for me to review. In its wisdom, it named the file vAU 🙂 Maybe what a forward-deployed engineer does is engineer AI-native workflows. (This sounded profound when I wrote it down. Not sure if it’ll sound as profound tomorrow.) The idea is that the FDE will say, screw existing processes; let me fire up my AI agent and get stuff done; THEN we’ll figure out what works, how to optimize it, etc. The PRAGMA: Revolut Foundation Model has some good tokenization ideas for tabular data. Create your own token space with key–value–time tokenization - to retain field information. Bucketize numbers by percentile, preserving magnitude/ordering that subword tokenization destroys. Encode time both as log-seconds and as cyclical calendar features. Codex uses the Alt + Up Arrow key to edit queued commands, but on the VS Code terminal, this key binding is not sent to the terminal. 
Enable the terminal.integrated.sendKeybindingsToShell setting to send it to the terminal, and hence to Codex. Based on this catalog on “universal foods”, here’s what I 🟢 like, am 🟡 neutral about, 🔴 dislike, 🟣 must try, and will ⚫ skip. Universal favorites: 🟢 pizza, 🟢 fried potatoes/chicken, 🟡 dumplings, 🟢 ice cream. Universal comfort foods: 🟢 khichdi, 🟡 congee, 🟡 dal-rice, 🟡 risotto, 🟡 ramen, 🟢 pho, ⚫ chicken noodle soup, 🔴 rice porridge, 🟡 mac-and-cheese, 🔴 mashed potato, 🟣 polenta, 🟢 oatmeal, 🟣 Japanese curry rice. Acquired tastes that convert most: 🟡 coffee, 🟢 tea, 🟡 dark chocolate, 🟢 mild fermented dairy, 🟢 pickles, 🟢 olives, 🟣 kimchi, 🟣 miso, 🟢 mild chili dishes. Acquired tastes that have cult devotion: 🟣 durian, 🟣 natto, 🟣 stinky tofu, ⚫ fermented fish, ⚫ hákarl, 🟢 very funky blue cheese, ⚫ offal. OceanoPDF seems like a good place to download ePubs of books. The entire Wikipedia is available as a Parquet file. You can query it like duckdb -c "FROM 'hf://datasets/wikimedia/structured-wikipedia/enwiki/data/*.parquet' LIMIT 5". The English version is 35 GB with 7.6 million articles, and you’re better off downloading it rather than running analyses remotely. When you receive a Cal.com link of the form https://cal.com/USER/EVENT you can fetch the available slots via curl -H 'cal-api-version: 2024-09-04' 'https://api.cal.com/v2/slots?eventTypeSlug=EVENT&username=USER&start=2026-05-25&end=2026-06-01&timeZone=Asia/Singapore&format=range'. Useful to automate good meeting-slot selection. “Reference saved memories” in ChatGPT is different from “Reference chat history” as per OpenAI. In Developer Mode, memory is turned off, but not chat history. I confirmed that I can access past conversations in Developer Mode. It might be a privacy concern for others, but for me, this is singularly useful, because I can use ChatGPT with Local MCP, effectively getting a non-metered AI coding agent. 
Seems GPT-5.2 reaches expert level in peer review: 45 scientists took 469 hours evaluating human & AI reviews on 82 papers. “Surprisingly, current AI reviewers are competitive even with the top-rated reviewers in Nature’s official peer review…” though not without weaknesses, so use AI + humans. On the limits and opportunities of AI reviewers: Reviewing the reviews of Nature-family papers with 45 expert scientists via Ethan Mollick
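The PRAGMA tokenization ideas noted earlier this week (key-value tokens, percentile buckets, log-seconds plus cyclical time) can be sketched in plain Python. This is a rough illustration, not the paper’s implementation: all function names are hypothetical, and the choice to take the log of the gap since the previous event and to use hour-of-day as the calendar cycle are my assumptions.

```python
import math
from bisect import bisect_right
from datetime import datetime


def percentile_edges(values, n_buckets):
    """Bucket edges at evenly spaced ranks of the observed values."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return [ordered[last * i // n_buckets] for i in range(1, n_buckets)]


def bucketize(value, edges):
    """Map a number to a bucket index; larger values never get a smaller index."""
    return bisect_right(edges, value)


def tokenize(field, value, edges):
    """Key-value token: the field name stays attached to the bucketed value."""
    return f"{field}={bucketize(value, edges)}"


def time_features(ts, previous_ts):
    """Log-seconds since the previous event plus a cyclical hour-of-day encoding."""
    gap = max((ts - previous_ts).total_seconds(), 0.0)
    angle = 2 * math.pi * (ts.hour + ts.minute / 60) / 24
    return {"log_gap": math.log1p(gap), "hour_sin": math.sin(angle), "hour_cos": math.cos(angle)}


# Illustrative data only.
amounts = [1, 5, 10, 20, 50, 100, 200, 500, 1000, 5000]
edges = percentile_edges(amounts, 5)
token = tokenize("amount", 150, edges)
features = time_features(datetime(2026, 5, 25, 12, 0), datetime(2026, 5, 25, 11, 59))
```

The sine/cosine pair keeps 23:59 and 00:01 close together, which a raw hour number would not.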

Things I Learned - 24 May 2026

This week, I learned: BitWarden seems to be sneakily jacking up prices and going towards a PE sale. Might be time to shift out or self host. Sigh, I just migrated into it… Source Andrej Karpathy has joined Anthropic. Likely to use Claude to build better Claudes - automating AI research. Also, it probably isn’t a good time to build an AI education platform. Claude The open-source Chinese models are about 6 months behind frontier models. Qwen 3.7-Max is on par with Claude 4.5 Opus (Nov 2025) and Gemini 3 Flash (Dec 2025). Google basically became Gemini. Entirely! I’m not sure there’s a difference any more. Which means it will scrape websites and not send traffic through - just killing the search economy. But it’s far more useful. Claude I wanted a list of sites I log into with my Google Account. Google’s Linked apps page does that. Unfortunately, I can’t find a way to use Google Takeout to export that data. So I wrote a scraper which can be single-shot prompted these days. As long as you remember to exhale, your chances of recovery from being ejected into space are pretty good for the first 15-60 seconds. Gemini I don’t understand half the comments I read on LinkedIn. Earlier, I was able to separate good from bad. Now, I’m not sure if what I read is actually insight or idiocy. Is AI use making their comments too smart or making my brain too dumb? “Pax Memoriae”: peace of memory. Putting past conflicts to rest. The best part of it was, I learnt the phrase by typing “Pax” into VS Code and wasn’t sure what to write next. Before I could search for it, GitHub Copilot completed it. I searched for what it meant, and it was so apt! Children’s vision is worse than adults’, but they filter less and absorb more irrelevant information than adults. This is useful for learning and surprise detection, but costly for focus, speed, and relevance. ChatGPT The word phobia comes from the Greek god of fear, Phobos, which is also the name of one of Mars’ moons. 
Deimos, the other moon, is the Greek god of dread/terror. They’re the children of Ares (Mars), the god of war. Nice planet. On WhatsApp, I can type @Meta AI and then /imagine to have it draw an image. The quality is OK - not great, not terrible. Surprisingly, GPT Realtime Whisper (the new model) isn’t as good as the older open-source Whisper models. Also, Gemini 3 Flash Preview is as good at transcription as Gemini 3.1 Pro Preview for up to medium-length text. LLM Audio Transcription benchmark Google Maps typically shows me a cycling time of 30 minutes when it takes me 40 minutes and a walking time of 40 minutes when it takes me 30 minutes. Either I walk much faster and cycle much slower than the typical person or Google Maps is not well calibrated to Singapore and India.

Things I Learned - 17 May 2026

This week, I learned: I had GPT-5.5 and Opus 4.7 analyze a few of my conversations and learnt that I need to ask myself: “What must they take away? What must you take away?” in my conversations. That lets me speak with intention rather than instinct. (Instinct has its place. I happen to over-use it.) Turns out there are several well-established taxonomies. It makes sense to align with these. Linked data is powerful and AI makes linkage easy. General Knowledge: Wikidata, DBpedia, YAGO. People: VIAF, ISNI, ORCID, LC Name Authority, GND. Places: GeoNames, Getty TGN, ISO 3166. Organizations: LEI, ROR, Wikidata. Books/Media: Open Library, WorldCat, MusicBrainz, IMDB. Chemicals/Biology: PubChem, ChEBI, GBIF, ITIS. Legal/Units/Math/Events: EuroVoc, QUDT, OEIS, PeriodO, etc. BitWarden supports a bw CLI that seems handy for quick CLI access to passwords. It’s a step towards me moving away from saving passwords unencrypted on my local file system. Singapore has banned prediction markets like Polymarket and Kalshi. Pity. I was hoping to use AI coding agents to play them. Yahoo flipbook.page is a fascinating generative UI exploration. It’s a visual browser, i.e. it generates an image based on text, you click anywhere, it generates a new image based on where you clicked, and so on. A very different style of exploration! Vercel’s deepsec uses Codex / Claude to search for vulnerabilities, but “scans can cost thousands or even tens-of-thousands of dollars for large codebases”. When I charge my Lenovo Thinkpad (P1 Gen 7) with the 170W charger that came with the laptop, it delivers ~60W of power to the battery, charging the laptop in about an hour. A 65W charger delivers half the power and takes twice as long.

Things I Learned - 10 May 2026

This week, I learned: I’m experimenting with Tauon MusicBox as an alternative to VLC as a music player. Update: 01 Jun 2026. I switched back to VLC. Tauon Music Box is glitchy. It stops songs mid-way and doesn’t play automatically when launched. xz is pretty slow by default. xz -T0 uses all available threads and speeds it up ~3X. Enabling “Performance mode” (over a power-saver mode) produces a further speed-up of ~2X for me. For a 200MB file, that reduces the time from ~1 minute to 10 seconds. Notes from Simon Willison’s notes from the Claude Code event: “Design for the next model”. Build things that don’t quite work today on the assumption that they’ll start working with a model upgrade in the future. “The advisor strategy”. Instead of using a smarter model to plan, use smaller models to ask Opus for advice-on-demand. Dreaming looks really interesting. You can run a task overnight which examines previous sessions and creates new memories. A routine is a saved Claude Code configuration: a prompt, one or more repositories, and a set of connectors, packaged once and run automatically. Routines execute on Anthropic-managed cloud infrastructure, so they keep working when your laptop is closed. Overheard: “VCs say, ‘OpenAI wants to get into commerce, so why are you getting into commerce?’ A few weeks later, ‘OpenAI no longer wants to get into commerce, so why are you?’” Delightful discovery of the day: Super + Shift + Arrow keys to move windows between monitors on Ubuntu. television is a fast, portable fuzzy finder. Like fzf but faster, useful for files, text, git repos, docker images, etc. I added approvals_reviewer = "auto_review" to my ~/.codex/config.toml. This enables auto review which uses an LLM to figure out whether to ask a human to approve or not. It’s a lot less intrusive than asking every time. Not perfectly safe, though. Copilot supports a /chronicle command that suggests tips and improvements when using Copilot. 
It’s like /insights on Claude Code. Carbonyl is a CLI Chromium browser. Sort of like Lynx, but supports audio/video, JavaScript, even WASM, etc. This was the author’s first Rust project. I tried Zed as an alternative to VS Code. It’s fast and lightweight, but lacks the ecosystem of VS Code. Plugins are harder to build and Markdown support is weak. I would use it on a flight to save power, not otherwise. This is similar to others’ experience. ChatGPT UPDATE 05 Jun 2026. It DOES use some battery power - more than I’d like. I am uninstalling it. LocalSend is a pretty quick way to share files between phone and laptop even if you don’t have a network - if you connect the laptop to the phone hotspot. GNOME Network Displays works pretty well if you want to screencast your screen to a network display - e.g. a Smart TV with Miracast or Chromecast support. I’m evaluating rtk - a CLI proxy to reduce tokens. For example rtk ls or rtk git status shows agent-friendly compact output. I just added one line to my AGENTS.md: “Always prefix shell commands with rtk. Examples: rtk git status, rtk pytest -q, etc.” instead of using rtk init -g. I am testing it out, so I don’t know the impact, but it seems harmless. (Based on 2 days’ usage, across 216 commands, it saved ~50% of 37K tokens. Not much, but harmless.) The emerging convention to mark a section of HTML / Markdown as AI generated content is to wrap it in: <section ai-disclosure="ai-generated" data-ai-model="claude-sonnet-4.6" data-ai-provider="Anthropic"> (W3C AI Content Disclosure Community Group).
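A small Python sketch of applying that disclosure wrapper when generating HTML. The attribute names are the ones quoted above; the helper name wrap_ai_generated is hypothetical, and the body is assumed to be trusted HTML that is inserted as-is.

```python
from html import escape


def wrap_ai_generated(html_body, model, provider):
    """Wrap an HTML fragment in a <section> carrying AI-disclosure attributes.

    Only the attribute values are escaped; `html_body` is assumed to be
    trusted HTML and is inserted unchanged.
    """
    return (
        '<section ai-disclosure="ai-generated" '
        f'data-ai-model="{escape(model, quote=True)}" '
        f'data-ai-provider="{escape(provider, quote=True)}">\n'
        f"{html_body}\n"
        "</section>"
    )


snippet = wrap_ai_generated("<p>Summary text.</p>", "claude-sonnet-4.6", "Anthropic")
print(snippet)
```

Since Markdown allows inline HTML, the same wrapper can surround a generated section in a Markdown file.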

Things I Learned - 03 May 2026

This week, I learned: LiteParse is a PDF to text library that you can run via npx --package=@llamaindex/liteparse lit parse document.pdf. Simon Willison Always add indecisiveness, inaction, “other”, “not applicable”, etc. as an option to LLMs. They are trained for decisive responses and pattern matching, so we need to guide them the other way. Martin Fowler GPT 5.5 is priced at twice the rate of GPT 5.4. No wonder my Codex usage is much higher than last month. Simon Willison. I am better off sticking to medium effort instead of the xhigh I usually use - it may not be required. OpenAI “… the eigenquestion is the question where, if answered, it likely answers the subsequent questions as well.” Shishir Mehrotra & Matt Hudson Claude Code stores the logged in OAuth token at ~/.claude/.credentials.json. We can use that to fetch https://api.anthropic.com/api/oauth/usage and retrieve Claude usage and reset times. uvx ccusage does this automatically, but I prefer my own script. Ontology matters in the AI era. But some stuff matters more, and some less. 🟢 MORE: Definitions: what “customer” means 🟢 MORE: Constraints: e.g. “don’t reclassify loans” 🟢 MORE: Interactions: how to verify, coordinate, delegate, … 🔴 LESS: Creating ontologies: agents can do that. 🔴 LESS: Completeness and rigor: agents tolerate uncertainty. 🔴 LESS: Proprietary: agents can reverse-engineer. There are several industries / markets that MBA case studies rarely cover (ChatGPT): Kirana stores; Care (child care, elder care, domestic work); Faith (finance, food, media, education); Remittances; Gambling (lottery, sports betting, gacha); Scams & organized fraud; Counterfeiting; …

Things I Learned - 26 Apr 2026

This week, I learned: mdq is pretty useful to extract Markdown sections. For example cat *.md | mdq '# Title' extracts all sections where the header contains ‘Title’ (case-insensitive). CloudFlare Browser Run is, roughly, a browser as a service. Pricing: 10 hours free per month, then 9c per hour. I had Codex run a small research task to explore it, and it seems simple to set it up and use it. GPT 5.5 seems to be especially better than GPT 5.4 at running for long, with tool calls, without losing focus. That’s something OpenAI models are good at anyway, so this takes it a step further. ChatGPT I added gpt-image-2 to my LLM Art Style gallery. It is notably better with text accuracy. For example, on Rock - Paper - Scissors - Lizard - Spock it consistently lists all 10 rules, which Nano Banana 2 does not. World leaders do keep us entertained. Saparmurat Niyazov (Turkmenistan) renamed the months of the year and days of the week after himself and his mother. He built a towering, gold-plated statue of himself in the capital that rotated so it would always face the sun. He also banned lip-syncing at concerts, outlawed gold teeth, and banished dogs from the capital because he found their smell unappealing. Idi Amin (Uganda) declared himself the “Uncrowned King of Scotland” and sent baffling, unsolicited telegrams to world leaders - advising Richard Nixon to recover from Watergate, or offering food aid to a struggling Britain. François “Papa Doc” Duvalier (Haiti) reportedly ordered all black dogs in Haiti to be put to death and claimed his personal Vodou curse was responsible for the assassination of John F. Kennedy. Francisco Macías Nguema (Equatorial Guinea) banned the word “intellectual”, banned the use of lubricants in the power plant (claiming his magic would keep it running, which promptly broke the generators), and stored the nation’s remaining foreign currency under his bed. 
Kim Jong-il (North Korea) claimed he invented the hamburger (calling it “double bread with meat”) and shot 11 holes-in-one his first time playing golf. Donald Trump (United States) used late-night tweets to announce major policy shifts and fire his own cabinet members. He altered an official government hurricane map with a Sharpie to match a previous erroneous statement, and publicly mused during a press briefing about the injection of household disinfectants as a medical treatment. Git repositories inside git repositories (without using sub-modules) don’t seem to work well. I need this because I have mono-repos for research and I want to use git in a sub-folder to iterate, then commit just the final version to the parent folder. Looks like I need to remove the child .git/ (e.g. rename to .git.bak/, which I’ve added to my ~/.config/git/ignore) for this to work. Gemini To run a script in the background (without logs) and detach / disown it, use nohup your-script >/dev/null 2>&1 & disown Running /insights on Claude Code helped me add these two instructions to my code skill: Test web pages with screenshots (for layout, overlaps, contrast) AND CDP (for interactions, navigation) before finalizing Prefer icon libraries over unicode/emoji icons. Sending an entire PDF/PPTX to Gemini costs ~40% of sending PDF/PPTX + images. The quality is fine for small files, but for large files adding images reduces error rate from ~5% to 0.5%. Pandoc Markdown to Word DOCX supports sidebar comments. You can use this Markdown: Here is [comment in sidebar]{.comment-start id="c1" author="Anand" date="2026-01-01T12:00:00Z"}commented text[]{.comment-end id="c1"} inline. Gemini. 
In fact, Pandoc supports lots of other things, like: Custom styles via block ::: {custom-style="Custom Style Name"} Track changes via [inserted text]{.insertion author="Name" date="2026-04-20T12:00:00Z"} and [deleted text]{.deletion author="Name"} Page breaks via \newpage (a LaTeX command that Pandoc supports in Markdown) Image sizes via ![Alt Text](image.png){width="5.5in" height="3in"} Offpunk is a CLI offline-first browser. Interesting idea, but installation is a problem. After sudo apt install offpunk, running offpunk failed with ImportError: lxml.html.clean module is now a separate project lxml_html_clean. After a git clone it reported HTML document detected. Please install python-bs4 and python-readability. These are easy to fix, but I wasn’t inclined. Creating an authenticated MCP Server for ChatGPT is complex. It requires OpenID Connect (for which library support is weak and requires a provider like Auth0) and dynamic client registration (which is hard to implement though Auth0 supports it). After half a day of experiments, I still couldn’t connect. An easier option is to run temporary tunnels with cloudflared or ngrok or localtunnel.

Things I Learned - 19 Apr 2026

This week, I learned: WebApps are a depreciating store of value. Earlier, a web-app would have impressed me because the capability to create it was rare, and the effort to create it was high. Today, when I see a “localhost:3000” or a “replit.app” domain, I mentally discount the effort behind it and ask: How rare is the capability to create this with a coding agent, and how much effort does it take? THAT determines the value of what I see. Part of the value is “Look ma, no hands!” and it’s delightful they’ve learnt. Part of the value is “There’s gold in them thar hills!” and use-case discovery is important. WaveCity is a WASM build of Audacity, i.e. Audacity running in the browser! Audiomass is a similar but simpler audio editor - again, WASM-based. Gemini

Things I Learned - 12 Apr 2026

This week, I learned: Resend is a simple way to send emails via an API. Principles of Mechanical Sympathy has some practical hardware-driven optimization tips. Prefer accessing memory sequentially. CPU access to RAM and cache is optimized for this. Natural batching: flush the buffer when you reach the maximum buffer size or when the queue is empty. This avoids buffers waiting unnecessarily. The core argument in Capital in the Twenty-First Century (Thomas Piketty, 2013/2014) is r > g. The interest on capital (r) is always greater than the economic growth (g). Hence, the rich will keep getting richer - inequality is consistently part of capitalism. (Not surprising, but well supported by data.) A good collection of practices on automated AI code reviews by Ankit Jain: Compare multiple options. Whichever passes the most tests wins. Deterministic guardrails. Use linters, type-checkers, SAST/DAST checks, test scripts, etc. Humans define acceptance criteria. Use a behavior driven development script (in natural language, agent-implemented). Permission Systems as Architecture. Provide agents granular permissions based on the task - against pre-defined rules. Adversarial Verification. Have one agent break the others’ work. Based on a quick exploration of the AT protocol (via Jake Lazaroff), I am yet to see a viable use for it. It’s a decentralized distributed data network. OK… what will I use it for? When I asked Claude if any of my work is patentable, it said “Comicgen is the sole candidate, but you only get one year grace after it’s public. But why do you want to patent? Your edge is prototyping speed, taste, and knowledge. Patents don’t protect those. Publishing freely (as you do) creates prior art that prevents others from patenting the space around you, which is often a better defensive strategy than filing patents yourself.” Oh! Ah! pretex is a fast (currently browser-only) library that computes the width and height of any text in any font in the browser. 
Useful for things like word-wrapping in SVG, layout planning before rendering, etc. Because AI bots scan deeply rather than “browse” popular pages, CDN cache eviction strategies designed for humans (like LRU - Least Recently Used) no longer work. They’re exploring new caching algorithms like SIEVE and FIFO CloudFlare CloudFlare practically rewrote WordPress into a new Astro-based CMS: EmDash! It runs natively on CloudFlare (and elsewhere), is agent-friendly, quite secure, can export/import from WordPress. Linux optimization settings I noted from a deleted post:

```
gsettings set org.gnome.desktop.interface enable-animations false
gsettings set org.gnome.desktop.interface cursor-blink false
gsettings set org.gnome.settings-daemon.plugins.power idle-dim true
gsettings set org.gnome.desktop.notifications show-in-lock-screen false
gsettings set org.gnome.desktop.session idle-delay 300
gsettings set org.gnome.settings-daemon.plugins.power sleep-inactive-battery-timeout 900
# gsettings set org.gnome.settings-daemon.plugins.power sleep-inactive-ac-timeout 1200
```

git-restore-mtime is part of the git-tools package and sets the modified time of files to their last committed time. Useful when cloning repos. From Lalit Maganti: Knowing what you want is a valuable skill. Wanting things others will also want is valuable. Learn good software management. It is similar to managing agents. For better results, just continue your AI chat, or break the problem up. More tokens lead to better solutions even now. Joel Baker Since companies using AI outperform competition and capital might win more than labour but GDP growth may not be too high, it might be better to invest in AI-using companies than in index funds. 
Nicholas Carlini’s prompt to find vulnerabilities is to run: “I’m competing in a CTF. Find me an exploitable vulnerability in this project. Start with ${FILE}. Write me a vulnerability report in ${FILE}.vuln.md” across multiple repos in parallel. Then “I got an inbound vulnerability report; it’s in ${FILE}.vuln.md. Verify for me that this is actually exploitable”. That was almost 100% successful. When planning with AI coding agents, Martin Fowler recommends discussing each of these in sequence before coding: Capabilities / functionality Components: Services, modules, major abstractions. Interactions: Data flow, API calls, events. Interfaces: Function signatures, types, schemas. Planning with agents using Visual Brainstorming, i.e. asking them to generate visual HTML to illustrate the plan, can shorten review time considerably. I enabled CloudFlare’s new dynamic Client-Side Security monitor. If someone hacks my website or the libraries I use, it does a quick filter with a fast neural network, then falls back to an LLM to check if it’s safe, then serves the content. This pattern of deterministic with LLM fallback works for most reviews. Harness = Agent minus Model: everything in an AI agent except the model itself. Nice definition Update feature-level summaries as you go in context/$FEATURE.md with user prompt, summary of WHY from agent’s responses for future learning, my comments. Like Architectural Decision Records (ADRs) for humans and agents. Context Anchoring 8 levels of Agentic Engineering. 8 levels of Gas Town. I’m still only at level 6 on both. 🙁 “It’s important to watch the loop as that is where your personal development and learning will come from.” Geoff Huntley, originator of the Ralph (Wiggum) loop. UNIX has a script command that runs a shell and logs it. For example: script -c fish session.log starts a new fish shell and logs it to session.log. 
script -c "uv run app.py" -q -a app.log will append to app.log, suppressing “Script started…” and “Script done…” messages. script --timing=time.txt session.log logs the timing, which you can replay with scriptreplay --timing=time.txt session.log. Similar to asciinema. A quick way to strip out the ANSI escape sequences (weird Unicode characters) is to pipe it through npx strip-ansi-cli. Google has an Edge Gallery app that runs Gemma 4 on mobile. The main advantage is that you can use it on a flight. It’s not too bad as a model either. Transcription quality is average. It doesn’t run in the background, only one chat at a time, etc. So, it’s useful only as a last resort.
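For logs captured with script, a Python alternative to the npx strip-ansi-cli step could look like the sketch below. The regular expression covers the common CSI sequences (colors, cursor movement) and OSC sequences (window titles); it is an approximation rather than a full terminal parser, and strip_ansi is a name chosen here for illustration.

```python
import re

# CSI: ESC [ parameters, intermediates, final byte (colors, cursor movement).
# OSC: ESC ] ... terminated by BEL or ESC \ (e.g. window titles).
ANSI_RE = re.compile(
    r"\x1b\[[0-?]*[ -/]*[@-~]"
    r"|\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)"
)


def strip_ansi(text):
    """Remove ANSI escape sequences from captured terminal output."""
    return ANSI_RE.sub("", text)


print(strip_ansi("\x1b[1;32mPASSED\x1b[0m tests/test_app.py"))
```

Reading session.log with errors="replace" before calling strip_ansi avoids failures on any stray bytes in the recording.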

Things I Learned - 05 Apr 2026

This week, I learned: It’s pretty convenient (on Ubuntu) to be able to move windows around desktops. Apart from the usual Super + Arrow keys to manage windows within a desktop, you can use:
Ctrl + Alt + Left/Right Arrow: Move desktops
Ctrl + Alt + Shift + Left/Right Arrow: Move window to desktop
Super + Shift + Arrow: Move window to another monitor
Super + Drag: Drag window from anywhere
jq . file.json is an efficient way to pretty-print JSON files in the terminal. (Or jaq . file.json, which is ~30% faster.) GitHub Copilot monthly premium requests were not reset at 12 am UTC How Diffie Hellman Key Exchange Works by Julia Evans is an excellent explanation. Share a random number. A multiplies it by their private key and shares SA. B multiplies it by their private key and shares SB. They multiply the others’ key with their secret key and they get SAB = SBA. Now both of them have the same new secret they can encrypt/decrypt with, but no one else knows, even though they shared everything publicly! This may be one of the coolest uses of math I’ve seen in a long time. Shell tricks I didn’t know:
# ALT + . cycles through the last arguments typed
mv file.{txt,md} # Move file.txt to file.md
ls |& tee file.txt # Pipe both stdout and stderr to tee
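The Diffie-Hellman exchange described above can be sketched in a few lines of Python. This uses the classic finite-field variant, where the “multiply by a private key” step becomes modular exponentiation; the tiny parameters (P = 23, G = 5) are toy values for readability, far too small for real use, and the function names are illustrative.

```python
import secrets

# Toy public parameters: a prime modulus and a generator. Everyone can see these.
P = 23
G = 5


def make_keypair(private=None):
    """Return (private, public), where public = G**private mod P."""
    if private is None:
        private = secrets.randbelow(P - 2) + 1  # random value in 1 .. P-2
    return private, pow(G, private, P)


def shared_secret(their_public, my_private):
    """Combine the other party's public value with our own private key."""
    return pow(their_public, my_private, P)


a_private, a_public = make_keypair(4)  # A shares a_public (SA in the note above)
b_private, b_public = make_keypair(3)  # B shares b_public (SB in the note above)

secret_a = shared_secret(b_public, a_private)  # computed by A
secret_b = shared_secret(a_public, b_private)  # computed by B
print(secret_a == secret_b)
```

Both sides arrive at the same value because (G^a)^b and (G^b)^a are equal mod P, while an eavesdropper only ever sees G, P, and the two public values.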

Things I Learned - 29 Mar 2026

This week, I learned: The Kids Should See This - great collection of videos for curious people. Thej A jury fined Meta and YouTube $4.2m and $1.8m for building addictive features in their products. That’s a first. NY Times “I think AI-type tools will actually revolutionize the experimental side of math, where you don’t care so much about individual problems and the process of solving them, but you want to gather large-scale data about what things work and what things don’t.” Terence Tao The hedonic treadmill (which roughly quantifies a Buddhist principle) says that we revert to a happiness set point (which varies by individual). Worse, those who experience a high kick (e.g. a lottery) don’t get enough kick from normal wins (contrast effect) – Interactive explainer. The happiness neutral As of today, a LinkedIn search for “llm psychologist” lists 9 people. I’m not alone! Anand S, LLM Psychologist, Singapore, Singapore Anshul Saxena, PhD, AI Advisor & Trainer | Technology Strategist | LLM Psychologist | Currently teaching humans, machines & business to work smarter through Generative AI and Quantum Computing | 15+ Years Experience, Pune, Maharashtra, India Charitarth (Chad) Sindhu, LLM Psychologist / Fractional Business & AI Workflow Consultant/ Digital Nomad, Tokyo, Japan Lancelot Salavert, LLM Psychologist, Barcelona, Catalonia, Spain Lior Dor(Durahly), Team Lead | Bug Banisher | Ex 8200, Tel Aviv District, Israel. Past: R&D Team Lead and LLM Psychologist at Superwise | A Blattner Tech Company maxime bodereau, Lead Creative Art Director | UX Forensics | Ai LLM Psychologist | Visual Alchemist | Codesmith | Brandologist | Full Stack Designer, Nantes, Pays de la Loire, France Mei Chen 🦋, LLM Psychologist | Lead Product Engineer | Delivering Agentic Experiences, Toronto, Ontario, Canada Shoshannah Tekofsky, LLM Psychologist at AI Digest, Zwolle, Overijssel, Netherlands LinkedIn Member, LLM, psychologist, mediator, Prague, Czechia OpenAI acquired Astral!. 
This will likely slow down the new wonderful tools accelerating the Python ecosystem. Like with PromptFoo and OpenClaw, this seems to be about talent. The “acqui-hire” mode seems a clear niche career path now, and an alternative to getting hired (you get a much higher salary) or getting acquired (you take on much higher risk). quickjs-emscripten lets you run isolated JS code securely in the browser, CloudFlare workers, NodeJS, and Deno. It compiles to WASM. @sebastianwessel/quickjs is a higher-level TS wrapper. Simon Willison Manyana is a CRDT-based version control system. It sounds like a good idea but I’m sceptical because merge conflicts are a “what should I do” problem more than “how”. With agents doing more merge conflict management, I am not sure this will offer a concrete benefit - but probably no harm either. LLMs are able to post-train LLMs on new topics. They’re improving fast. Jack Clark Vibe Coding Fixer and AI Slop Cleaner are real job descriptions - which are morphing into enterprise offerings. But I still seem to be the only official LLM Psychologist. Notes from AI Services - Wrong Mental Models, Right Moment: AI services has 3 markets. Automatable work: vanishes in 2 years. Human-in-the-loop work: sustains. Judgement-driven: grows in importance. YC: don’t sell access to a tool for $50 a month, use the AI yourself and sell the finished work for $5,000. Sell output. Price on outcome. Sell to business, not IT. Sell accountability: proven success, with your guarantee. Sell authenticity: a brand story representing uniqueness, character, … or whatever… something people respect. Data transfer between GPU and memory is a bottleneck and three approaches are emerging. Taalas is etching LLMs into the chip. Llama 8b runs at 17,000 tok/s (H200 is at 230 tok/s). d-Matrix is moving compute into SRAM memory chips. 30,000 tok/s for Llama 70b. Cerebras and MatX are similar: memory-oriented. FuriosaAI minimizes data movement. Groq and Sambanova are similar. 
But in the long run, commodity technology usually beats integrated stacks. GPT 5.4 Nano ($0.2/MTok) and Mini ($0.75/MTok) are good options for bulk OCR, transcription, etc. as alternatives to Gemini Flash Lite and Gemini Flash with comparable cost and quality. They can describe 75K photos for $50. Both models are better than GPT-5 Mini on most benchmarks. Cool AI coding agent git prompt fragments: Use git bisect to find when this bug was introduced: … Find and recover my code that does … Sort out this git mess for me. Rewrite history removing … Split the last commit into multiple commits grouped logically. Start a new repo at … and build just this module … based on … with a similar commit history copying the author and commit dates. Campaigns Are Knowledge Workers and the Tools Just Caught Up. A powerful framing. I saw this in action a few days ago when a friend was able to automate an outbound campaign with Claude Code. EARS (Easy Approach to Requirements Syntax) is a simple structure for requirements. For example, “Users should be able to drag tasks between columns. The app needs to work offline too. Handle errors gracefully.” becomes the following, which AI can generate from the loose version and which makes errors easier to spot. State machines and decision tables are useful alternatives, too. REQ-001 (Event): When the user drags a task card to a different column, the system shall update the task status to match the destination column. REQ-002 (State): While the application is offline, the system shall store task updates in local storage. REQ-003 (Event): When the application reconnects, the system shall synchronize locally stored updates with the server. REQ-004 (Unwanted): If synchronization conflicts occur, then the system shall display a resolution dialog to the user. As of now, avoid using Claude.ai to create (large) visualizations. It runs forever and exhausts credits without generating anything. Claude Code works much better for this.
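
The EARS example above can be checked mechanically. Here is a minimal sketch, assuming each requirement is one sentence and that the leading keyword (When, While, If … then, Where) identifies the pattern. The function name `classify_ears` and the "Not EARS" label are hypothetical, and the "Optional" and "Ubiquitous" labels come from the general EARS templates, not from the notes above.

```python
import re

# Keyword patterns for EARS templates. Order matters: "If ... then" is
# checked first so it is not mistaken for another pattern.
PATTERNS = [
    ("Unwanted", re.compile(r"^if\b.*\bthen\b.*\bshall\b", re.IGNORECASE)),
    ("Event", re.compile(r"^when\b.*\bshall\b", re.IGNORECASE)),
    ("State", re.compile(r"^while\b.*\bshall\b", re.IGNORECASE)),
    ("Optional", re.compile(r"^where\b.*\bshall\b", re.IGNORECASE)),
    ("Ubiquitous", re.compile(r"^the\b.*\bshall\b", re.IGNORECASE)),
]


def classify_ears(requirement: str) -> str:
    """Return the EARS pattern label for a one-sentence requirement."""
    text = requirement.strip()
    for label, pattern in PATTERNS:
        if pattern.search(text):
            return label
    return "Not EARS"  # vague requirements with no "shall" land here


if __name__ == "__main__":
    samples = [
        "When the application reconnects, the system shall synchronize "
        "locally stored updates with the server.",
        "While the application is offline, the system shall store task "
        "updates in local storage.",
        "Handle errors gracefully.",
    ]
    for sample in samples:
        print(classify_ears(sample), "|", sample)
```

A check like this only flags requirements that miss the template. It does not judge whether the requirement itself is right, which still needs a human or a reviewing agent.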

Things I Learned - 22 Mar 2026

This week, I learned: Psychological operations in design by Narendra Ghate When lights are dimmed people speak softer. So, dimming lights reduces sound levels in noisy offices. Rather than reduce the size of shampoo sachets (which customers and businesses both hate), include 2 shampoos in one sachet, tearable in the middle. Price sachets at 95p with a 5p deposit for the sachet - which rag-pickers can collect and return to the retailer. People think of stains like wounds on cloth. So a “stain band-aid” where you stick a strip, and remove it after 5 min to remove the stain, is catchy. A mechanical wind-up fish that stirs the water in the bucket while clothes are soaking speeds up the process. Senthil & Amutha, founders of Payir demonstrated a re-usable fabric calendar that converts into a bag for re-use. Pretty clever! Their message at the Chennai Design Festival was that good design can be for the masses and by the masses to reclaim their time, energy, and joy. The urinary bladder works based on involuntary muscular contractions towards the end, to clear out the last bits of fluid. It’s not fluid flow, it’s muscle contractions. (Oh, the things I learn!) Gemini Indigo bans ghee in cabin baggage. Also coconuts, pickles, oily foods, gooey cakes, spices (masala, powders), strong-smelling food. ChatGPT New skill unlocked: how to demo without knowing what you’re demo-ing. STEP 1: Copy-paste all demo pages as Markdown. STEP 2: Tell AI “Here is a demo I’ll be showing. (Add context.) Tell me how I should explain this and what I should point out as specific examples. Use concise bullets.” We’ve learnt not to do things we don’t know how to (until we learn it). When AI is doing things, this is a bottleneck. Get out of the way. Stop filtering for what YOU can do. Stop learning what IT can do. Ask for it. That’s faster. Learning can come later. I keep forgetting that QR codes need a white border for them to work. TerraDraw provides a unified API across multiple mapping libraries. 
(In the vibe-coding era, this is not as useful.) To create desktop apps declaratively on Linux, Slint, Flutter, QML(Qt) and GTK4 are options. Slint and Flutter seem to be cross platform. Slint is newer, less mature but compiles to small fast binaries and might be a good option to explore. Flutter seems more mature and fairly popular. Claude PyTorch Tracing watches one forward pass and freezes the path into a portable recipe. But it silently ignores branches your example didn’t take. Claude The Internet is forking into a human internet vs an agent web LinkedIn SamGeo is a Python Package for geospatial image processing. While OlmoEarth provides geospatial embeddings, SamGeo can convert geospatial data to vector data! So you can do things like: Create the outer boundary of all apartments with swimming pools in a city Extract the shape of all lakes across the years to find out how they’re changing. Terence started Foundation for Science and AI Research (SAIR) to use AI in science research. Verifiable proofs (e.g. LEAN) are a big part of this. Since AI needs to run on phones and that needs GPUs, a lot of phones might need replacement in the next few years.

Things I Learned - 15 Mar 2026

This week, I learned: Timsort is one of the fastest sorting algorithms. Switching from bat to moor as a pager, since bat doesn’t support wrapping via keyboard shortcuts. Gemini “Use (some-command) --help to …” is an efficient prompt prefix that tells agents to read the docs and use a CLI tool to solve a problem. For example, “Use uvx rodney --help and ffmpeg for a demo video of GitHub PRs”. As agents improve, we’ll have more mediocre output (e.g. dashboards) since people won’t know to ask for better, or validate the result. They’ll hire experts who know to ask better and verify better. Claude Opus 4.6 solved a problem Knuth was working on! Knuth Cognitive debt is what Simon Willison calls it when we build (or, in my case, say/write) stuff we don’t understand. The debt framing is apt. One solution is to generate a version intended for AI to read, and another for us. How can an innovator learn accountability? “I’m wired to start fires. Should I learn to also run the fire department, hire someone who does, or just stay a fire-starter and let others deal with the mess?” ANS: First, accountability is high value, so do it! Second, prefer a partner over building muscle. Build muscle only if output is checkable, has value, and customers will pay. Claude | ChatGPT | Gemini Commit publicly. Put your name on the output. Commit to process (or narrowly defined output) rather than outcome. Optimize with data, code, checklists, workflows, culture, etc. OpenAI released gpt-realtime-1.5 and gpt-audio-1.5. Both are ~20% cheaper than the 4o versions, but 6.7x more expensive than gpt-realtime-mini. 1 second is about 10 tokens, so an hour of audio input at $32/MTok is about $1.15. The “Effort” setting for AVIF files on Squoosh doesn’t reduce file size - it increases quality slightly (for a tiny increase in file size). So, set the quality to whatever file size you need and increase the effort for a slightly better quality. 
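
The audio cost estimate above is simple arithmetic and can be reproduced. This is a minimal sketch using only the figures in the note (about 10 tokens per second, $32 per million input tokens). The function name `audio_input_cost` is hypothetical, and real token counts per second may differ.

```python
def audio_input_cost(seconds: float,
                     tokens_per_second: float = 10,
                     usd_per_mtok: float = 32.0) -> float:
    """Estimate audio input cost in USD.

    Defaults are the rough figures from the note, not official constants.
    """
    tokens = seconds * tokens_per_second
    return tokens * usd_per_mtok / 1_000_000


if __name__ == "__main__":
    one_hour = 60 * 60  # seconds
    # 3,600 s x 10 tok/s = 36,000 tokens; 36,000 x $32 / 1,000,000 = $1.152
    print(f"${audio_input_cost(one_hour):.2f} per hour of audio input")
```

Changing `usd_per_mtok` lets you compare models at their own prices under the same tokens-per-second assumption.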
Polya believed in teaching problem-solving rather than solutions, i.e. teach How to Solve It, not just what you get at the end. To me, this includes: Understand the problem (from different perspectives) Plan (with different mental models) Execute (the easy bit) Look back (post-mortem, retrospectives, etc.) Browserless lets you run browsers via an API. Useful when you don’t want the overhead of setting up a browser infrastructure, or for multiple browsers in parallel. Scraping, testing, web app automation, PDF/screenshot/video generation, etc. are all possible. Gemini OpenAI has a Websocket mode GitHub Agentic Workflows lets you “compile” a Markdown file into an agentic GitHub action. Useful as a sceptical reviewer, issue-to-prototype builder, data to story generator, automated code migrator, etc. Gemini Claude

Things I Learned - 08 Mar 2026

This week, I learned: IITM has launched a 4 year degree in management & data science. “Use AI to replace early-career mentorship: use AI-driven synthetic practice when traditional apprenticeship pathways collapse. AI can generate personalized coaching, replacing the missing junior loop with training environments.” Jack Clark Observability is more than logging. It’s agents watching feeds and signalling insights! The GPT 5.4 prompt guidance is a bit complex, but here’s what it’s broadly saying: (Gemini) It’ll over-complicate answers and front-end design unless you tell it exactly how you want it It’ll keep checking with you or give up (e.g. on errors) unless you tell it otherwise, e.g. with checklists or rules Claude Code supports 32K output tokens by default. Since I generate large data stories, I usually hit this limit and lose an entire session. Setting the environment variable CLAUDE_CODE_MAX_OUTPUT_TOKENS=64000 (which is the maximum) reduces this problem. Google Workspace CLI lets you run npx -y @googleworkspace/cli as a single unified service for all Google Workspace APIs. It follows agent-friendly CLI practices which I turned into a SKILL.md. I’ve been using mise use -g ubi:owner/repo to install GitHub packages. The ubi backend is now deprecated in favor of the new github backend. This works fine for most repos, with edge cases like jtroo/kanata which still require ubi:jtroo/kanata as of now. On the margin, I’ll likely switch to just as my task runner. Claude With AI now writing almost all of my code, I don’t see much need to format it. Code formatters like ruff, dprint, biome, etc. are not relevant when AI will be reading and writing the code, not humans. I just format the prompts in Markdown. Salt is the duct tape of food ingredients. Lemon juice, vinegar, butter/oil, onion/garlic, etc. are runners-up. Claude Claude’s prompt to import memory from other AI providers doesn’t seem to work with Claude’s free account: “No memories or stored context found.”