Things I Learned - 18 Jan 2026

This week, I learned: Vulture is a neat library that funds unused Python code. uvx vulture script.py works fairly well, out-of-box. This helps when cleaning up AI-edited scripts that often have left-over code or imports. One of the lightest alternatives to Google Analytics is GoatCounter. If you just want page views, referrers, browsers, OSes, countries, and devices, it’s great. It’s privacy-friendly (no cookies), open source, easy to self-host, free for small sites, and the data is exportable. The number of countries that allow visa-free entries to Indian passports is gently growing in Asia (Kazakhstan, Thailand, Sri Lanka, Malaysia, Iran, and Philippines). Lessons from performance books. Claude # # Summary: In early days, explore, sample. Then narrow based on interest & fit. Practice hard and persist. ⭐⭐⭐⭐ Range (David Epstein): In changing environments (rules shift, feedback is noisy/late), sample broadly, i.e. generalize. Specialization vs generalization Nobel laureates have more hobbies. Olympic athletes have less. Shift nurses have same hobbies as non-shift workers. Hobbies help expertise in some areas Rewarding ONLY what succeeds locks behavior, halts exploration. Vary / delay incentives. Reward AFTER figuring out what works. Reinforcement and rewards Maybe “orderly” people specialize and creative people generalize? So pick what aligns with personality? ⭐⭐⭐ Peak (Anders Ericsson & Robert Pool): Compounded practice at the edge of competence, with good immediate feedback, helps 14-26%. But talent (genetics, upbringing, brainpower) differentiates more the expert level. Slow, effortful practice (spaced recall, interleaving topics, self-testing) builds lasting knowledge - but looks inefficient and doesn’t help with exams. Learning and long-term retention “Easy” 10K hours don’t help. ⭐⭐ Grit (Angela Duckworth): predicts roughly the same as conscientiousness (18%). It predicts success in stable paths moderately (but brainpower, etc. matter too). But premature grit hurts. Quit if it helps. But environment can defeat grit. Lessons from attention economy books. Claude # # The attention economy is real. It is designed to capture our mind, and it is winning. Distractions hurt MUCH more than we think. Batching, focus time helps. Privilege helps. The rich have more control over these than the poor do. ⭐⭐⭐⭐ Deep Work (Cal Newport, 2016) and ⭐⭐⭐ Digital Minimalism (Cal Newport, 2019): control the tools. Focus time, digital detox, embrace boredom. This helps - when you can afford to. ⭐⭐⭐ Indistractable (Nir Eyal): control yourself. The problem is internal (also true), so build habits, since willpower depletes (hm… not really). ⭐⭐⭐ How to Do Nothing (Jenny Odell, 2019): reject. Embrace boredom as resistance. This helps - when you can afford to. ⭐⭐ Stolen Focus (Johann Hari, 2022): regulate & rebel. The problem is systemic and external (also true). Reclaim your interface. BTW: Goldfish have excellent attention spans and memory :-) Lessons from trauma books. Claude # # ⭐⭐⭐ The Body Keeps the Score (Bessel van der Kolk, 2014): trauma recall shuts down the speech area. Eye movement desensitization (EMDR) helps. So does CBT, despite what the book says. But does yoga (only a little) or neurofeedback (too little data)? ⭐⭐⭐ What Happened to You? (Bruce Perry & Oprah Winfrey, 2021): calming people down before talking. Strong connections help more than a therapist. ⭐⭐ The Myth of Normal (Gabor Maté, 2022): trauma causes cancer (no), autoimmunity (partly), ALS (?), etc. ⭐ It Didn’t Start with You (Mark Wolynn, 2016): maybe anxiety is epigenetic and heriditary? Unproven. Family Constellation Therapy is wrong ⭐⭐ My Grandmother’s Hands (Resmaa Menakem, 2017): maybe racism is a somatic (body) response to generational (epigenetic) trauma? Too little data ⭐⭐ No Bad Parts (Richard Schwartz): maybe we’re not one person but a collection of parts, and interviewing family systems (IFS) helps? Unclear ⭐⭐⭐ Maybe You Should Talk to Someone (Lori Gottlieb): our memory is unreliable and therapy is messy. Connection & compassion help Most of these are based on the contested Polyvagal Theory: the nervous system scans for danger before the mind can process it. But the specific claims of the theory are wrong and it makes no other falsifiable claims. The nervous system has hierarchical responses to threat. 🟢 Not unique to PVT Social connection regulates physiology. 🟢 Not unique to PVT Unconscious threat detection (neuroception). 🟡 Weak evidence Mamellian brain (ventral vagal system) is uniquely mammalian. 🔴 Lungfish have it Reptilian brain (dorsal vagal) “shutdown” causes dissociation. 🔴 No evidence RSA directly measures vagal tone. 🔴 Contested Reptiles are “asocial”. 🔴 Wrong Trauma causes body changes too. It’s not just the mind. Childhood trauma persists. Relationships (connection & compassion) help more than therapy What constitutes tax residency in India? For an Indian citizen, as I understand it (after 2 hours of research): If you were in India >= 182 days: Resident* Else, if you left India this year for employment: NRI. Else, if you are an Indian Citizen living abroad (visiting or not): If Indian Income <= ₹15 Lakhs: NRI. Else if you were in India >= 120 days AND >= 365 days in the last 4 years: RNOR. Else if you are not liable to tax in any other country: RNOR. Else, if you left India for non-employment (students, tourism) and were in India >= 60 days AND >= 365 days in the last 4 years: Resident* Else: NRI. If you ended up as a Resident* If you were NRI in 9 of the last 10 years OR in India <= 729 days in the last 7 years: RNOR Else: ROR (Resident & Ordinarily Resident). For all practical purposes, RNOR is like an NRI. You pay tax only on Indian income, not global income. It’s like a transition status for returning NRIs. AVIF compresses better than WebP and may be the “next big thing”. I will be switching for all future images. Squoosh remains my choice of compressor and Ezgif’s AVIF maker and GIF to AVIF are handy.

Things I Learned - 11 Jan 2026

This week, I learned: Software Heritage is a non-profit that archives software. You can submit any Git repo for archival. Over 400 million projects have been archived so far. Everything Bad Is Good For You by Steven Johnson (2005) argues that pop culture isn’t all bad. But it isn’t all good either, unlike the book’s claims. Claude Popular culture formats (e.g. video games, manga, soap operas, game shows) are steadily more cognitively demanding, complex. They provide a dopamine kick from problem-solving. These may have led to the Flynn Effect (rising IQs in 1990s-2000s). Or it may be due to nutrition, smaller families, education, etc. Action games correlate with visual-spatial skills. Strategy games correlate with memory, planning. But is it causation? It doesn’t always translate to real-world skills. Also, side effects are real and bad: screen-time, addiction, misinformation, etc. The purpose of a featured image in a blog post is to help readers decide whether to read it. Share the article’s output/focus (e.g. for data stories, products). Else a visual summary (e.g. sketchnote, comic capturing the essence). Else skip. Avoid stock photos. # NFLSavant.com has play-by-play data for NFL games. Ten of the least well known psychology / sociology research findings. ChatGPT Learning styles are a myth. People might prefer visual / audio / … learning but it doesn’t help learning. Mix learning modes. NotebookLM can help. Casual acquaintances help find new information or jobs much more than close friends, since they’re in different social circles. Nurture weak ties. Use a relationship architect. Tell a lie often enough and people mistake familiarity for truth. Fact-check habitually. The more you see / hear something the more you like it. (Exposure effect.) Expose to good things. When others mess up, we blame them. When we mess up, we blame the situation. (Attribution error.) Pause before judging. Sometimes, rewarding people makes them like doing it less. (Overjustification effect.) People who know less over-estimate their knowledge. (Dunning-Kruger effect.) Habitualize calibration via feedback and tests. People do worse when they’re afraid their failure will reflect on their stereotype. (Stereotype threat.) Practice emotional resets. Higher expectations lead to better performance. (Pygmalion effect.) Engineer positive expectations. Benevolent sexism (e.g. protective paternalism) can be harmful too. Scan for well-meaning bias. Liberalism => economic growth, peace and expanding rights. Also colonial violence, exclusions (women, slavery, …), and eroding community. It is vulnerable to authoritarianism (e.g. emergency powers, recessions). Since 2006, democracy has consecutively declined, reversing half the progress since WW2. But alternatives are unclear. Claude Notes from The Periodic Table by Primo Levi. Pure Zinc does not dissolve easily in sulphuric acid. An impurity like Copper Sulphate pulls electrons from Zinc and offers them to Hydrogen ions, speeding up the reaction. Impurities, foreign bodies, etc. have a purpose, too. Discomfort = Information. Overcoming discomfort = Capability. Capability = Freedom. Therefore: Seeking discomfort (carefully, purposefully) = Building freedom. Simple != Easy. Simple = Clear. Clear = Actionable. Indifference often feels like malice. ⭐ Analogies have limits. (The Map is not the Territory.) When using analogies, always explore where, when and why they will break. Pay close attention near where they break. ⭐ Knowledge vanishes with people unless written down. Write “Do X. Because of Y. Unless Z changes.” The last two are critical. I could NOT have read the book without a Randall Munroe re-styling. I cried anyway. “There’s about 300-400 that were corporate assets. One watched them all the time. These are people who in 15 years could be CEO. There’s something about them that caught your fancy when you were in a meeting… brilliant ideas that challenged your thinking… We called them “Corporate Assets” and tracked them, to make sure we game-planned them, give them the right assignments.” Indra Nooyi, The Knowledge Project The accesskey attribute works a bit like magic. Adding an accesskey="h" on a home page link, or an accesskey="t" on a theme toggle button automatically enables keyboard shortcuts Alt+H or Alt+T to activate them. (Varies by browser and OS, but hovering shows the shortcut!) Familiarity and recency feel like learning but they’re not. Instead: Take tests. Review (spaced repetition). Interleave learning. That’s what helps. Claude Make It Stick (Peter C. Brown, 2014) A Mind for Numbers (Barbara Oakley, 2014) Ultralearning (Scott Young, 2019) How to Take Smart Notes (Sönke Ahrens, 2017)

Things I Learned - 04 Jan 2026

This week, I learned: A bunch of new CLI tools I found via awesome-cli-apps that I’m likely to use. fselect 4,374 ⭐ Dec 2025 - Find files with SQL-like queries. mise x ubi:jhspetersson/fselect -- fselect 'path, name, size from . WHERE name = "*.md" AND size < 1000' git-standup 7,805 ⭐ Jul 2025 - Recall what you did on the last working day. npm install -g git-standup && git standup litecli - SQLite CLI with auto-complete and syntax highlighting. uvx litecli mycli - MySQL CLI with auto-complete and syntax highlighting. uvx mycli pgcli - Postgres CLI with auto-complete and syntax highlighting. uvx pgcli fkill-cli 6,966 ⭐ Nov 2025 - Simple cross-platform process killer. npx -y fkill-cli fkill :8000 mlt 1,709 ⭐ Jan 2026 - Command line video editing. sudo apt install mlt xxh 5,870 ⭐ Sep 2025 - Bring your favorite shell wherever you go through SSH. uvx --from xxh-xxh xxh user@host epr 1,356 ⭐ Feb 2023 - Command line ePub reader. npx -y --package epr-reader epr tunnelmole-client 1,759 ⭐ Jun 2025 – ngrok alternative. npx -y tunnelmole 8000 localtunnel 21,822 ⭐ Aug 2025 – ngrok alternative. npx -y localtunnel --port 8000 svg-term-cli 4,168 ⭐ May 2024 - Record and replay terminal sessions as SVG animations. npx -y --package svg-term-cli svg-term pageres-cli 1,732 ⭐ Sep 2025 - Capture website screenshots. npx -y pageres-cli example.com 1366x768 gita 1,816 ⭐ Nov 2025 - Manage multiple git repos side by side. editly 5,259 ⭐ May 2025 - Declarative video editing. np 7,661 ⭐ Nov 2025 - A better npm publish. ffscreencast 1,816 ⭐ Jul 2024 - A ffmpeg screencast with video overlay and multi monitor support. beets 14,504 ⭐ Jan 2026 - Music library manager and tagger. uvx --python 3.12 --from beets beet import /path/to/music slides 11,065 ⭐ Aug 2024 - A markdown presentation tool. gotty 19,285 ⭐ Aug 2024 - Share your terminal as a web application. The day-fine system fines people by severity of crime (# of days) and their income (daily disposable income). Finland, Sweden, Germany use it. It’s equal deterrence and more state tax, but needs good data & enforcement, cultural acceptance, and similar income streams (income vs assets, salary vs freelance, …) Claude LLM evals rarely pass all the time or fail all the time. Either would be a good signal, but results are usually mid-way, which can make evals a bit frustrating. Will Larsen A smart way to handle large context and compaction: pass any large input (even text) as a file and always provide file tools to the agent. After compacting a conversation, also pass the conversation history as a file! Will Larson Anthropic’s API lets you upload custom skills and use them via the API. You can share these across the organization. Modern HTML has a huge number of of useful attributes and some elements I knew little about. Most of these improve the user experience, especially on mobile devices. Add popover and popovertarget= to associate elements with popovers. This can replace tooltips, dropdowns, menus, toasts, etc. Add formmethod="dialog" to forms inside <dialog> elements to close the dialog instead of submitting. Add name= attribute to details for accordion-like behavior Add loading="lazy" to images and iframes to load only when user scrolls to them Add fetchpriority="high" (or low) to image, script, link rel=“preload” … to prioritize loading Add inputmode= to inputs for better virtual keyboard experience. Values can be text, decimal, numeric, tel, search, email, url. Add autocomplete= to form inputs for better autofill experience. Values are extensive and multiple values are allowed. E.g.: name, email, username, new-password, current-password, organization, street-address, postal-code, country, tel, url, cc-number, cc-exp, … Add list= to inputs to associate with a <datalist> for suggestions/autocomplete. Add autocapitalize= to inputs and textareas to control capitalization behavior. Values: off, none, sentences, words, characters. Add enterkeyhint= to inputs and textareas to customize the enter key on virtual keyboards. Values: enter, done, go, next, previous, search, send. Add contenteditable="plaintext-only" to disable rich text formatting on editable elements Add inert to disable user interaction. Useful for modals to disable background content. Add form= to associate inputs/buttons with a form outside the form element. Add download= to anchor tags to suggest file download with a specific filename. Add capture="environment" to file input to directly open the outward facing camera/mic on mobile devices. "user" opens the inward facing camera/mic. Use accept= values of audio/*, video/* or image/* to specify media type. Add spellcheck="false" to disable spell checking on inputs or textareas, e.g. for code snippets. <dialog>: for native modals, popups, etc. Methods: show(), showModal(), and close(). <meter>: for displaying scalar values within a known range, e.g. disk usage, battery level, etc. <progress>: for displaying progress of a task. Similar to meter but indicates progress rather than a static value. <track kind="captions">: for adding captions/subtitles to <video> elements. <data value="...">: to capture values in a more query-able way than data-* attributes. Grok Voice Agent API tops the speech-to-speech quality benchmark and is pretty cheap at 5c/min ($3/hr). The Collider Bias: when you analyze a subset, you can get wrong correlations. For example, analyzing top performers can show that performance drops with time - whereas, if you pick everyone, performance improves with time. It’s similar to the Simpson’s Paradox: combining groups can reverse trends. Ethan Mollick fresh is a TUI text editor that I’ve replaced micro with (for now). It has menus and mouse support which shrinks the learning curve. It’s also a single Rust binary. Small Wins Every Day: 100 Powerful Ways to Transform Your Life and Health by Luke Coutinho recommends compounding small habits. Claude Small compounding wins make the brain feel less bad about losing. Continous wins make us feel good. So they’re more likely to sustain. (Atomic Habits / Tiny Habits) What works: Breath control, fasting, regular sleep, keep moving, etc. The Tell-Tale Brain: A Neuroscientist’s Quest for What Makes Us Human by V.S. Ramachandran expands on Phantoms in the Brain. Claude Mirror neurons fire BOTH when we do something OR when we see someone do it. That’s how we learn skills & feelings by imitation. We’re not born with this. They’re formed with practice in childhood. Synesthesia cross-wires sensory inputs, e.g. seeing colors when hearing sounds. When shown a curved vs jagged lines and asked to name them bouba or kiki, 98% name the curved one bouba, mapping the sharp “kiki” sound to the sharp shape. This may partly explain why some people are more artistic, how language evolved (and similarly), and why marketing logos work. He proposes 8 laws of neuroaesthetics as starting hypotheses for understanding art and beauty: Peak shift. We’re attracted to exaggerations. Caricatures, exaggerated feminine curves in sculpture, cubism, super-villains, stereotypes. Grouping. We like to find patterns. E.g. melody from notes, faces from pixels, plots from events. Contrast. We prefer edges to surfaces. E.g. outlined cartoons, silence before a drop in EDM, Holmes vs Watson. Isolation. Removing context helps focus. E.g. sketching, minimalism, unplugged music, solo music, theater spotlight. Perceptual problem solving. We relish a LITTLE effort. E.g. negative art, stereograms, puzzles, mysteries, plot twists, optical illusions. Symmetry. We like balanced forms. E.g. symmetrical faces, architecture, mandalas, poetic justice, verse-chorus-verse, rhymes, plots ending as they began. Abhorrence of coincidence. Everything has a cause. E.g. need for alignment, pareidolia (seeing faces in clouds), Chekov’s gun, deus ex machina. Metaphor. We understand new things via familiar ones. E.g. allegories (Animal Farm is about communism, not pigs), leitmotifs (music BECOMES a character, e.g. Darth Vadar’s march). Phantoms in the Brain: Probing the Mysteries of the Human Mind by V.S. Ramachandran argues we do NOT know ourselves and rewiring our brains can help/hurt. Claude You truly understand something only you observe how it breaks. Brain damage patients reveal how the brain constructs reality. The brain has a “map” of the body. When we lose an arm, it rewires it to adjacent areas, e.g. face. Touching the face triggers phantom sensations in the missing arm. Mirror box therapy works. Have patients put their good arm in a box with a mirror, so it looks like the missing arm. Moving the good arm tricks the brain into thinking the missing arm is moving, relieving pain. The brain has a “model” of the self and reality. If the model is wrong, we get illusions/hallucinations. This is BIOLOGICAL. Mrs Dodds was paralyzed. When asked to touch her nose, she said “I am”. When shown her arm, she said “I don’t feel like it.” Her brain was damaged preventing her from updating her model of self. (Anosognosia) Not My Hand Error damages the body map and deletes an arm from the model. Brain sees the arm but decides it’s someone else’s. (Somatoparaphrenia) Imposter Error breaks the wire between recognition and emotion. We see familar people, don’t feel anything, so decide they’re imposters. (Capgras Delusion) Everyone is Disguised Error strengthens the recognition-emotion wire. We feel strong emotions to strangers, inventing a conspiracy. (Fregoli Delusion) Walking Corpse Error disconnects feedback from the body and emotional centers. We no longer feel alive. So the brain concludes we’re dead. (Cotard’s Syndrome) Somewhere Else Error damages sensory data to place tag mapping. We see medical equipment but feel safe, so we must be at home not a hospital. (Reduplicative Paramnesia) Timeline Error deletes short term memory (alcoholism, malnutrition). We can’t remember yesterday, so we pick the closest we remember. (Korsakoff’s Syndrome) Meaning of Life Error strengthens “what’s meaningful” signals, so we see divine intervention in rocks. (Geschwild Syndrome) The cortex does not know how it does stuff. It invents stories to explain actions after the fact. Blindsight. Despite visual cortex damage, patients can use a different route (reptile vision) from the eye into the brain to “see”. They’re unaware of this. Procedural memory. Patients with short term memory learn new skills (e.g. mirror drawing) but have no memory of learning them. The Libet Delay. Consciousness lags reality by 500ms. We think we decide to move, but the brain has already started moving before we become aware of the decision. The Low Road. Thalamus -> Amygdala is ~12ms for instinctive reactions (fear). Thalamus -> Cortex -> Amygdala is ~30ms for conscious reactions. We feel fear before we know why. Our definition of “self” is an amalgamation of occupying a body, having a history, making decisions, what we value, etc. Damage to different areas breaks different parts of this model. Entangled Life: How Fungi Make Our Worlds, Change Our Minds & Shape Our Futures by Merlin Sheldrake questions the boundaries of identity and intelligence. Claude Fungi form vast underground networks (mycelium) that connect plants, trees, and ecosystems. They exchange nutrients, information, and even memories across species. In fact, the largest organism on Earth is a honey fungus in Oregon spanning 2,400 acres. They can decompose almost anything: petroleum, pesticides, plastics, explosives, even nuclear waste. They can filter air & water, detoxify soil, and make plants resistant. (But we don’t know how to do this at scale without harming ecosystems.) We’re all symbiotic organisms. So what defines “self”? Lichen are a combination of a fungus, alga, and a yeast. The fungus provides structure, the alga photosynthesizes, the yeast protects with acid. The combination produces a long-lived, leafy and resilient “organism”. Human gut bacteria influence our mood; skin bacteria clog pores against pathogens; mites in our eyelashes eat dead skin; mouth bacteria digest nitrates; bacteriophages attack viruses. Intelligence emerges in many ways - not just through neurons. Fungi solve mazes. Slime molds find shortest paths. Termites build breathing mounds. Honey bees communicate location via dance. Have we colonized the planet, or have dogs, wheat/corn, fungi, … colonized us? The Demon-Haunted World: Science as a Candle in the Dark by Carl Sagan calls for a more scientific temper in daily life. Claude In the 1990s, the alien abduction phenomenon was rampant. Paralyzed in bed, taken to spacecraft, remember via hypnosis. this is sleep paralysis, when brain partially wakes while body is in REM sleep. 5-40% of people experience it at least once. It led to witch burning, satanic panic, and now, alien abduction stories. Same phenomenon, different interpretations based on culture and time. This is a common pattern when communities face uncertainties: plagues, famines, social change. Someone proposes a non-falsifiable explanation with a scapegoat, gains power, and fear spreads. Fake news, conspiracy theories, cults thrive in such environments. We evolved for explanations. That bush sound must’ve been a lion. The cloud is a dragon. Someone caused the plague. It takes effort to fight it. Check for Evidence: Is it independently verifiable? Good data? Check for Logic: Is it falsifiable? Logically sound? Check for Bias: What are alternatives? What’s my/their motive? The Stuff of Thought: Language as a Window into Human Nature by Steven Pinker suggests that all languages has common patterns and that the brain packs complex ideas into this simple structure for transmission. Claude Verbs across languages typically cover cause of motion (threw), manner of motion (walked), state (broke), possession (gave), force (hit). (But culture also shapes these.) Spaces is used as a metaphor for many things. Markets go up, people grow close, time flies. (But the Aymara of the Andes say the future is behind and the past is in front.) Names are labels for people, not descriptions. (But some names DO describe, e.g. Potter, Mumbaikar, von Neumann) Indirect speech saves face, e.g. “Could you pass the salt?” not “Pass the salt”. (But culture matters, too.) Swear words are typically about sex, excretion, religion, slurs, diseases (“pox”), … and stored in the limbic system (an ancient portion) not the language circuits. They’re emotional outburts, closer to laughing or screaming than speaking. (Mostly true.) Verbs assign cause, agency, responsibility, … e.g. killed vs died, allowed vs made, etc. Language is made of core concepts: space and motion, time, causation, possession and transfer, goals and intentions. (Unproven. Usage based linguists disagree.) The Blank Slate: The Modern Denial of Human Nature by Steven Pinker reiterates the modern belief that genetics determines part of our psychology. Claude Western philosophy says we’re born a blank slate (Tabula Rasa), are naturally good but corrupted by civilization (Rousseau), and the mind is separate from the brain (Descartes). All three are wrong. 🟢 Identical twins raised by separate families shared characteristics, e.g. wearing rubber bands around wrists, flushing toilet before & after, naming sons James Allen / James Alan, volunteering as firefighters, … Research shows 40-60% of variation in psychological traits is accounted for by genes. 🟢 Babies have innate capacities for language, number sense, understanding of physical objects, and basic moral intuitions. 🟢 The brain is the same as the mind. Damage to brain = damage to mind. 🟡 Pinker claims that our mind was shaped by evolution, e.g. men take more risks because it got them more mates. This is unproven. 🟡 Pinker claims violence has reduced over time. This is unproven. 🟡 Pinker cites Harris’ research that parenting style has little effect. This is unproven. How the Mind Works by Steven Pinker argues that the mind evolved as tools to solve specific problems. Claude The brain is literally a computer: a bunch of neurons that fire based on a function of the inputs. It evolved into a mix of special-purpose tools, not general purpose. Facial recognition, language, object detection, spatial navigation, social cues, etc. (But in reality, it may be a mix of special + general purpose. Degree of specialization is unknown.) Some of this is complex. E.g. each eye captures 2D, but we use complex cues like shading, parallax (closer things move more) and steropsis (difference between what each eye sees) Emotions evolved for survival. (Basic emotions have strong evidence: fear, disgust, revenge, … but complex ones like love, sacrifice, social emotions are unproven.) We prefer closer kin over distant kin. (But culture & context play a part, too, and it’s not the sole factor.) Art may have evolved accidentally - exploiting things that evolved for other purposes. (But it may be genuine adaption, e.g. for sexual selection or group bonding. Divided opinions.) Men and women evolved differently. Men prefer things, women prefer people. Men do better in 3D mental rotation. Men have a wider IQ distribution (but cultural factors amplify this.) Also a few contested claims: Men are better at mathematics (this has narrowed and may be cultural). Women are better at language (small difference). Testosterone masculinizes the brain (unclear if it’s behavioral or bioliogical.) The Language Instinct: How the Mind Creates Language by Steven Pinker argues that language is inborn, universal, and an evolutionary advantage. Claude Deaf kids in Nicaragua spontaneously invented their own sign language. Younger kids who copied them added grammer, tenses, and abstract concepts. This is atypical: we learn language by “growing it”, unlike skills which we copy. In fact, we over-apply grammar. “I goed to the store.” Pinker argues this is inborn. The Language Myth (Evans, 2014) argues lack of evidence. It’s unproven if it’s emergent or inborn. He claims all human grammar is roughly equally complex and roughly equivalent. (Vocabulary grows by need.) But there’s no proven “universal grammar” we know of yet. Grammar does have genetic pinnings. E.g. A mutated FOXP2 gene causes grammatical impairments. It doesn’t affect grammar as such, but fine motor control of mouth and tongue. But still, there’s some evidence. The strong Sapir-Whorf hypothesis that “language determines thought” is not true. We can think concepts that don’t have words. The weak version “language influences thought” has some evidence. Russian speakers who have separate words for light blue and dark blue can differentiate them faster. People with separate words for north/south (vs left/right) have better spatial orientation. He claims language provided us an evolutionary advantage. Evidence for this is pending. Metabolical: The Lure and the Lies of Processed Food, Nutrition, and Modern Medicine by Robert H. Lustig gives good diet advice but not so good scientific/economic ones. Claude There’s a trend of “lean diabetes” - diabetes in lean people. BMI isn’t a reliable biomarker for diabetes risk. (But it’s better than the book suggests.) Chronic diseases are due to cell dysfunctions, all can be improved with diet (but not as much as the book suggests.) “Fructose is the main villain”. But studies don’t find fructose doing more harm than anything else. “Protect the liver.” Less sugar, alcohol, and other toxins. (True) “Feed the gut”: More fiber. Both Keto and Vegan diets do this. (True) “Whole foods » highly processed foods”. (Very true - strong evidence.) Big Food, Big Pharma, Big Govt have low incentives to promote this. (Partly.) Sometimes, I need a browser with a custom DNS mapping to temporarily override DNS, e.g. when I have a dev version of a site on one IP and a production version on another. In that case, using something like chrome --host-resolver-rules="MAP www.s-anand.net 192.254.190.216" --user-data-dir="/tmp/chrome-dev" works well. You can replace chrome with microsoft-edge or opera or anything Chromium based. Build: An Unorthodox Guide to Making Things Worth Making by Tony Fadell suggests becoming the KIND of person who makes worthwhile products. Claude Everything you need to know about success, you learn from failure - if you pay attention. Products take three iterations before they succeed. Prototype, product market fit, business model. IPhone. IPod. Windows. Nest. All followed this pattern. Budget for it. Create the story for the product WHILE, not after, you build it. Bake it in. Differentiate between assholes based on what they care about. Power? Ego? Mission? The third type is worth tolerating, even getting behind. Your next idea is probably hiding in plain sight, annoying you. Thermostats did that to Fadell. Ugly, outdated, and controlling 10% of US energy. He built Nest. Quit when you know what next. Not just when you don’t like where you are. We’re wired to ignore failure to protect self-worth. We do that through cognitive biases. Gemini Devaluation (sour grapes): I never wanted it anyway Externalization (not my fault): It was an unfair test. The market is irrational. Virtue signaling (moral high ground): Rich people are unhappy. I don’t play politics. Sabotage (self-handicapping): I didn’t study. I did this last minute. Dissociation (fatalism): It happened for a reason. Intellectualization (false pivot): I learned so much. Same as Ever: A Guide to What Never Changes by Morgan Housel suggests doubling down on timeless principles. Claude Random luck drives many outcomes. The kamikaze that saved Japan from the Mongol invasion. The East River fog that saved George Washington’s army. Penicillin. Hilbert and Einstein almost raced to formulate the final equations of general relativity after Einstein presented his incomplete theory in 1915 summer. Einstein won by cramming - just like students today. Technology changes. Psychology does not. Risk is what you don’t see. Blind spots. Prepare using margins of safety / optionality, distributed failure points, survival > success, … Stories > Ideas. Stories are how our brains work. They’re leverage for ideas. Wrap EVERYTHING in a story. High expectations = Low happiness. So, visualize failure/disaster, practice gratitude, compare downwards. Compounding is magic. In any asset: money, skills, relationships, health, … So, automate the decisions, be patient and don’t interrupt. Success carries the seeds of failure. The innovator’s dilemma, the Malthuian trap, or the Dynastic cycle. So, be paranoid, stay simple, kill cash cows, practice discomfort. Why We Die: The New Science of Aging and the Quest for Immortality by Venki Ramakrishnan says that there’s no reason we have to die at our current age. But we don’t have proven ways to extend life yet. It’s also not clear if/how we should. Claude Evolution has optimized us for reproduction. After reproduction age, it doesn’t care. “Death is the price we pay for sex.” Telomeres are DNA sequences at the end of chromosomes that shorten with each cell division. When too short, cells die (apoptosis) or become zombies (senescent). These zombie cells secrete toxins that inflame / damage nearby cells. When young, our immune system clears them out. With age, they accumulate. With age, mitochondria (cell powerhouses) become less efficient. With age, the quality of proteins we make decline. They start clumping (like scrambled eggs), leading to Alzheimer’s, Parkinson’s. Some animals live longer than expected. There’s no reason our life span HAS to be what it is. The Naked Mole Rat lives 30+ years (10x longer than mice) without cancer, and can repair their own tissues. The Greenland Shark lives 400 years. The Hydra and the “Immortal” Jellyfish can regenerate when some parts are chopped off. Their chance of dying doesn’t increase with age. But there’s a lot of hype. Current methods are far from proven. Telomere-extending supplements are not FDA approved. They might work on mice, not men. Rapamycin helps mice live longer. But suppresses immunity, so risky for humans. Senolytics kills senescent cells. They might work. Yamanaka won a Nobel prize for turning adult cells into stem cells. But it could cause cancer. Injecting young rats’ blood into old rats helps the old rats, but old blood hurts young rats. So: diet, exercise, and sleep Also: longevity will help the rich more, increase stagnation, and what’s the point of living longer with an aged brain? The Happiness Hypothesis by Jonathan Haidt blends ancient wisdom with modern philosophy. Claude Happiness = Set point + Circumstances + Voluntary activities Set point has ~50% impact. Haidt suggests this doesn’t change. Research shows major life events can shift it a bit. Circumstances: We adapt to some stuff (money, house, etc.) but not to others (commute, noise, lack of control, relationships) Voluntary activities have variety that we don’t adapt to. Meditation, learning, exercising, … Modern CBT is similar to Stoicism. Events don’t upset us, our thoughts about events do. So change the thoughts. ACT (Acceptance and Commitment Therapy) is like Buddhism which suggests observing, not changing, the thoughts. CBT seems better for acute / specific stuff, logical people or beginners. ACT seems better for chronic / vague unease, grief, etc. Brains rationalize more than reason. There are more signals INTO the prefrontal cortex (PFC) than out of it. We make up stories to justify our actions. This evolved to make us look good socially. Adversity can help but only if it’s significant but not overwhelming. It takes time and support to learn from adversity. Works only if we interpret and integrate it well. Quality of relationships is a strong driver of happiness. Something the Stoics and Buddhists didn’t emphasize as much as Confucius did. Reality is Not What it Seems: The Journey to Quantum Gravity by Carlo Rovelli shares his theories. Mainstream but not proven. Claude In quantum mechanics, particles can interfere with themselves and their position “snaps” only when observed. Multiple theories interpret this: Copenhagen interpretation: Observation is special and collapses the wavefunction. But what counts as observation? Bohm’s interpretation: Particles “surf” the wave. Waves interfere, but particles only take one path. But needs non-local hidden variables. (Testable) Objective collapse: Wafe functions collapse when too “big” or complex, even if no one’s looking. But how big? (Testable) Many worlds: Sever possibility creates a parallel universe. But … Occam’s razor? QBism: Wavefunction is just our knowledge, not reality. Particles have properties, measurement updates our knowledge. Rovelli’s Relational quantum mechanics: position, momentum, etc. are relative. It has position relative to an observing device/particle. No absolute state. Reality literally is perspective. Loop quantum gravity: Aims to bridge general relativity and quantum mechanics by modeling spacetime as discrete loops. Far from proven, but possible. Space has a smallest unit - Plank length (~10^-35 m). You can’t subdivide space infinitely. Space is made of atomic “loops” that spin. They’re connected to form a fabric (“spin foam”). They’re not “in” space. They ARE space. They interact with matter/energy to create gravity and evolve over time. Predictions: Black holes don’t have singularities, since you can’t have infinite density. Entropy of black holes comes from the number of ways loops can arrange on the event horizon, so it’s proportional to surface area, not volume. Time doesn’t exist fundamentally. It emerges from change and relationships between things. Again, not yet proven, but possible. For example, the Wheeler-DeWitt equation in quantum gravity has no time variable. It’s a snaphot of the universe across all time. The universe is a giant graph of relationships between quantum events. Time is just how we order these events from our perspective. Implications: there’s no master clock and the present is local. Duration only emerges at larger scales, like temperature emerges in thermodynamics. The Emperor of All Maladies: A Biography of Cancer by Siddhartha Mukherjee. Claude Cancer has always existed. We just didn’t live long enough for it to affect enough of us for most of history. In 1890s, Halsted developed radical mastectomy - removing the breast + chest muscles + lymph nodes … to prevent spread. It didn’t improve survival but disfigured. In 1947, Farber injected cancer children with a drug that blocked folic acid (which cells need to grow). Tumors shrank, but relapsed. This was the first chemotherapy. In 1950s, cigarettes were found to cause lung cancer but the tobacco industry delayed regulation for decades. In 1971, Lasker & Nixon declared “War on Cancer” with $100m funding. (Impact: increased awareness, more research, not cure.) In 1970s, we found that the virus that caused cancer in chickens carried an “oncogene” that caused uncontrolled growth. Hence, cancer isn’t a virus, but a genetic mutation. Also, the p53 gene that suppresses tumors is mutated in half the cancers. In 2001, FDA approved Gleevac, a drug that specifically targets a specific protein that causes a certain cancer (chronic myeloid leukemia - CML). This was the first “targeted therapy”. In 2011, FDA approved ipilimumab, a drug that blocks CTLA-4, a protein that stops immune T-cells from attacking tumors. This was the first “immunotherapy” (by James Allison) which offers long-term protection. But it works only for some cancers, some patients. In 2018, Alison shared a Nobel prize with Tasuku Honjo, who discovered another immune checkpoint PD-1. Tumors produce PD-L1 that binds to PD-1 on T-cells to turn them off. Drugs that block PD-1 or PD-L1 unleash T-cells to attack tumors. In 2018, the Cancer Genome Atlas was published, showing that even the same cancer (e.g. lung) has different mutations in different patients, requiring personalized treatment. In 2017, FDA approved a CAR-T therapy for children with acute blood cancer. We extract a patient’s T-cells, insert a gene with a receptor that recognizes specific tumor cells, grow them by the billions, and infuse them back. But there are severe side effects and it doesn’t yet work for solid tumors. In 2024, FDA approved a cellular therapy for skin cancer. We extract the T-cells INSIDE the tumor (that recognized the cancer but were overwhelmed), grow them by the billions, and re-infuse them. In 2024, we’re exploring AI-powered analysis of blood tests to find DNA fragments of several types of cancer - “liquid biopsy”. It’s early stages. The Song of the Cell: An Exploration of Medicine and the New Human by Siddhartha Mukherjee. Claude Metaphor: Cells as autonomous “citizens”. Cancer is a rogue cell rebellion. Immune system is law enforcement. Type 1 diabetes is friendly fire. We’re growing from fixing organs (surgery) to chemistry (drugs) to cells (e.g. bone marrow transplant, IVF - we’re in the early stages). E.g. CAR-T Therapy: Extract T-cells, genetically modify them to recognize cancer, re-inject. But it’s costly, severe side effects, works mainly for blood cancers. He predicts that we’ll have: Prediction: Lab-grown organs from patients’ cells. (Growing is easier than organizing into functional organs. We may be a few decades away.) Prediction: Gene editing & cell therapies will converge. CRISPR edits cells that we transplant back. (This was approved for sickle cell anemia in 2023. Seems promising.) Prediction: Anti-aging cellular medicine. Senescent cell research and telomere biology have progressed, but this is a hyped field in early stages. Some of these will likely be expensive and inaccessible to most people, at least at first. Recollecting something Mr KP Krishnan told us in 2000 about the 1991 deregulation (fact-checked). “A meeting happened in Mr. Narasimha Rao’s house, where he emerged from a bath, toweling himself. His immediate advisors told him that we had only a few weeks of cash left and that we would need to accede to the World Bank’s request, but that the parliament would likely not agree. So, instead of risking a vote on a new law, they decided to bypass Parliament’s immediate approval entirely. They tabled the reforms as a ‘Statement of Industrial Policy’ right before the lunch break, just hours before the big Budget speech. Since it was a ‘Statement’ and not a ‘Bill,’ it didn’t require a vote to pass. It fell under executive powers and could be legislated later. By the time the opposition realized the ‘License Raj’ had been dismantled, they were already distracted by the Budget presentation that evening.” Outcomes over Output by Josh Seiden suggests that between output (e.g. features) and impact (e.g. revenue) lies outcome (e.g. user engagement) - leading indicators that you can organize around. Claude Ensure ownership of outcomes. Who owns increased checkout conversion rate? Payments, engineering, marketing, product, or UX? You may instead need small cross-functional activation, engagement, and retention teams. PM, designer engineer. Validate that outcomes lead to impact. This can be slow, and attribution is hard, but is important to continuously validate. Outcome change takes months, not weeks. So sprint using Now/Next/Later later roadmaps. As you learn, re-prioritize outcomes. Stakeholders want specificity. So quantify outcomes (+10% conversion) and timeframes (in 6 months). Stop experimenting and ship when you’ve validated the opportunity (customers need really connects to outcome) AND solution (feature really improves outcome). This is Torres’ Opportunity Solution Tree (OST). Change incrementally. If you’re running a feature backlog, continue. Add an “outcome hypothesis” field to each feature and create evidence. The Culture Map by Erin Meyer argues that cultural differences are practically alien languages. Claude There are 8 dimensions of culture. Communication: Low-context (precise, explicit, clear) like Americans vs High-context (implicit, layered, nuanced, between-the-lines) like the Japanese Evaluating: Direct negative feedback (blunt, honest) like the Dutch vs Indirect negative feedback (tactful, polite) like Thai or Japanese Persuading: Principles-first (deductive, theoretical) like the French vs Applications-first (pragmatic, practical) like Americans Leading: Egalitarian (flat organizational structure) like Swedes vs Hierarchical (respect for authority) like India, Nigeria, Japan, Korea Deciding: Consensual (group agreement) like Japanese vs Top-down (leader decides) like Russians Trusting: Task-based (trust through competence/reliability) like Americans, Germans vs Relationship-based (trust through personal connection) like Arabs, Chinese Disagreeing: Confrontational (open disagreement) like Israelis vs Avoids confrontation (harmony, save face) like Thais Scheduling: Linear time (one thing at a time, punctual) like Germans vs Flexible time (multi-tasking, fluid) like Indians Critique is that this is anecdotal, not research driven, stereotypical. Meyer’s aim is to sensitize. Action: Before meeting people, have LLMs plot their culture map and share advice.

Things I Learned - 28 Dec 2025

This week, I learned: The Body Keeps the Score by Bessel van der Kolk argues that trauma is stored in the body, not just the mind. Claude Trauma recall shuts down Broca’s area (speech). Trauma survivors literally struggle to talk about trauma. Our nervous system has a calm social engagement state, a fight-or-flight state, and a freeze or shutdown state. For trauma survivors, the nervous system gets stuck in fight-or-flight or shutdown. (Based on the contested Polyvagal Theory.) Childhood trauma leads to several major health problems - heart disease, autoimmune disorders, depression, addiction, … Recalling traumatic memories while following a therapist’s finger with your eyes (EMDR) works. Yoga is promising but unproven. Neurofeedback (altering brainwave patterns with EEG feedback) even less proven. Clearly, trauma is stored in the body, not just the mind. We might need to rethink therapy. The Extended Phenotype by Richard Dawkins argues that genes shape not just the organism but the environment too. Claude In The Selfish Gene, he proposed that organisms are “survival machines” for genes. In this book, he extends this idea to show how genes can influence the environment beyond the organism’s body. The dam of the beaver, the brain of an ant infected by a parasite, gut bacteria, are examples. Critics argue that this may be tautological. It’s hard to falsify. It’s more a mental model than a theory. Also, there’s critique (see below). The general view is that there’s merit to both perspectives. Epigenetics: Dawkins argues that only genes are inherited. But some RNA, proteins, and epigenetic markers are also be inherited. Developmental plasticity: Dawkins downplays the role of environment in shaping phenotypes. Multi-level selection: Gould argues that selection happens the organism and group levels too. Niche construction: Dawkins says genes modify the environment. But environments modify genes too. The Structure of Evolutionary Theory by Stephen Jay Gould expands on Darwinism, suggesting there’s more than only natural selection. Claude Darwinism proposed continuous, smooth evolution. Gould proposes punctuated equilibrium - stable periods interrupted by rapid change. Fossils support this. Darwinism proposed selection of individuals. Gould proposes hierarchical selection - genes, individuals, groups, species. This is debated but has merit. Darwinism proposed every feature has a reason. Gould suggests some are byproducts of other adaptations (spandrels). Not every trait is adaptive. This is generally accepted. Darwinism proposed that humans would evolve if we replayed history. Gould argues that it’s chance. Current opinion is convergence, i.e. something like us would still likely evolve. (He probably didn’t need to write such a long book over 20+ years for this. Also, it led to the Darwin Wars, mostly with Dawkins.) The Gene: An Intimate History by Siddhartha Mukherjee explains the history of genetics. Claude Mendel’s pea experiment numbers seem too neat. He probably didn’t fudge it but stopped at good results. Rosalind Franklin’s X-ray diffraction images were key to discovering DNA structure, but she didn’t get enough credit. Recombinant DNA (1970s) lets us copy-paste genes between organisms. E.g.: we can find the DNA sequence for the insulin protein in humans, copy it into bacteria, and have bacteria produce insulin for us. How it works: Restriction enzymes cut DNA at specific sequences. E.g. EcoRI (from E. coli) cuts DNA at GAATTC. Many cut with one strand overhangs that stick to complementary sequences, making assembly easy. DNA ligase paste DNA strands together. Plasmids (circular DNA in bacteria) are vectors that carry foreign DNA into host cells. We can paste DNA to plasmids and introduce them into bacteria. Some viruses work similarly for animals / humans. This is useful for creating medicines, crops, and gene therapies.: Medicines: e.g. insulin, human growth hormone, clotting factors for hemophilia, vaccines (Hep B), erythropoietin (EPO) for anemia, cancer therapies, focused antibodies, etc. Agriculture for genetically modified (GM) foods: pest/drought-resistant crops, biofortified foods (Golden Rice with Vitamin A), nitrogen-fixing plants, etc. Gene therapy: replacing faulty genes to treat genetic disorders (inherited blindness, immune deficiencies, blood disorders, muscular dystrophy, etc.) This is risky because of the unintended consequences, equity, and long-term risks: Unintended consequences: Crop genes can spread. Herbicide-resistant weeds have emerged. Equity: Corporates control gene patents, concentrating power and limiting access. Only the rich can afford gene therapies. Long-term risks: Biological weapons, ecological disruption, new diseases, etc. The Human Genome Project (1990-2003) sequenced the entire human genome (3.2 billion base pairs). This helps identify disease genes, understand genetic variation, and develop personalized medicine. They chopped the DNA into small pieces, multiplied them using bacteria, paired them with colored markers to read them, and reassembled the full sequence using overlapping regions. We have 20-25K genes. 99.9% is the same between humans. The 0.1% accounts for ALL human diversity. A lot of the genome is not for protiens, but for regulation, i.e. when and where genes are expressed. This enables pharmacogenomics, i.e. custom drugs. Read a genome and predict which drugs will work best. Also targeted cancer therapies, i.e. read the tumor genome and design smart bomb drugs. Ancestry and crime solving. Find distant cousins, catch the Golden State Killer, etc. We can sequence our genome for ~$600 in 24 hours and it’s falling. (Analysis is expensive.) CRISPR (2010s) lets you edit genes precisely. These “Clustered Regularly Interspaced Short Palindromic Repeats” are in bacteria. When bacteria survived a viral attack, they store a small piece of the enemy DNA in their own genome to recognize it. Cas9 is an enzyme that cuts DNA at a specific location suggested by Guide RNA. It unzips the DNA, matches the guide RNA to one strand, and cuts both strands. This disables the gene. Or, we can insert a new DNA sequence. This has been used to cure sickle cell anemia (which has a ‘GTG’ instead of ‘GAG’ in the hemoglobin gene, changing glutamic acid to valine) by editing bone marrow cells (not to fix this - that’s hard - but to reactivate a fetal hemoglobin gene). This is FDA approved. Scientists are trying to edit the Asian Elephant to include woolly mammoth traits, make spicy tomatoes, etc. Risks: CRISPR might cut a SIMILAR but unintended gene. We can edit genes for better humans (like in Gattaca) and create edited species. Epigenetics is about how gene expression (not the DNA) changes based on environment and lifestyle. Epigenetics has 2 mechanisms. First, DNA has tags (methyl groups) that turn genes on/off without changing the sequence. Second, DNA is wrapped around protein “spools” (histones). Tight wrapping hides genes, loose wrapping exposes them. In the Agouti Study, mice fed methyl-rich diets had brown, healthy babies. Mice without it had yellow, obese babies prone to cancer. Queen bees are identical to worker bees genetically, but royal jelly changes their epigenome to make them queens. Grandchildren of the 1944 Dutch famine survivors have higher obesity, heart disease risk. Epigenetic changes are inheritable. Epigenetics inherits via sperm by (a) retaining ~1-10% of histones wrapped around important genes, and (b) small RNA molecules that regulate gene expression. Epigenetics inherits via eggs by (a) retaining several histones and (b) impact of the fluid environment in the womb. Also, mother eggs were developing when she was a fetus in grandmother’s womb, so grandmother’s environment matters too. Mother epigenetics affects 3 generations. Fathers affect only 2. There are ~100-200 imprintable genes that determine whether the dad’s or the mom’s gene is expressed. Growth is one example. E.g. dad IGF2 gene pushes for growth, while mom H19 gene limits growth to conserve resources. Lions have strong “grow/stop” genes. Tigers have weak ones. Ligers (Lion dad, Tiger mom) are huge. Tigons are small. Eugenics is about improving humans by controlling breeding. Invented by Francis Galton (Darwin’s cousin) who founded psychometrics (IQ tests), fingerprinting, correlations, questionnaires, anthropometry (measuring humans), and a female attractiveness map of the UK (London » Aberdeen). He suggested that the best humans breed, and the worst be prevented. The US and many countries adopted this (1900s). E.g. Buck v. Bell (1927) said forced sterilization of “feeble-minded” people was legal. Oliver Wendell Holms: “Three generations of imbeciles is enough.” The last euginic law sterilization in the US was in 1981. California prisons sterilized females (2006-2010). Nazi Germany industrialized this. Deaf, blind, mentally ill, then eliminate gene pools. CRISPR and gene editing lets us design babies - another form of eugenics. Iceland and Denmark have eliminated Down syndrome births through screening and selective termination. It’s a bit controversial. Immune: A Journey Into the Mysterious System That Keeps You Alive by Philipp Dettmer explains the war our immune system wages daily. Claude Immunology is, as science writer Ed Yong memorably put it, “where intuition goes to die.” It’s the kind of subject that makes medical students weep and practicing physicians throw up their hands. We have an innate immune system. Genetically programmed for common pathogens. Fast, but limited. Like: Macrophages: beat cops that patrol tissues and eat dozens of bacteria before dying. Neutrophils: SWAT team that rush in, spray toxic chemicals (with collateral damage), and die. They rip out their own DNA to make nets that trap bacteria! Natural killer cells: bouncers that kill cells without an ID or have been infected (cancer, virus). Dendritic cells: spies that capture pieces of invaders and present them to the adaptive immune system. Mast cells: alarms that explode and release histimine (causing inflammation) to call for backup. Eoisinophils: bombers that drop toxic enzymes to melt parasitic worms too large to eat. The adaptive immune system is smarter and slower (days to weeks). It generates millions of cells with random DNA to create a 3D sheet with loops to grab specific antigens. It combines: ~40 Variable segments: “gloves” of ~95 amino acids to grab antigens. Like a 4x4 lego brick. ~25 Diversity segments: “fingers” of ~15 amino acits for the glove to grab better. Like a 1x2 lego piece. ~6 Joining segments: “tips” of 3-5 amino acids to connect glove to arm. Like a 1x1 stud. It randomly chews off a a bit from the ends and adds a few random bits to create ~10^15 potential combinations. When a new cell is born, the Thymus (near the upper chest) tests if it can attach to invaders and whether it’s peaceful to body cells. Failing cells are killed. ~2% survive and go to the lymph nodes. These can be: Helper T cells: generals that coordinate the immune response by activating B cells, killer T cells, and macrophages. Killer T cells: soldiers that inject toxins into infected cells to kill them. B cells: factories that churn out antibodies that stick to invaders, gum them up, and tag them for destruction. When we get sick, the dendritic cell grabs an antigen (piece of an invader), sends it to the nearest lymph node, and if a Helper T cell recognizes it, it activates the B cells and Killer T cells specific to that invader. This can take days, and multiplies ~1,000x every ~2 days. Also, the B cells divide with intentional mutations and evolve to find mutants that catch invaders better. Better fits multiply faster. “Immune boosting” is a misnomer. We really want balance, and diet, fruit, vitamins, antioxidants, probiotics, sleep, exercise, stress reduction, social connection, etc. help. But vaccines are the best way to train the immune system. Every breath and meal draws in invaders. They’re catalogued and tolerated or destroyed. It’s incredible! Measles reduces immune memory 11-73%, wiping out years of immunity to other diseases. So, when I had measles in 2009 after my splenectomy in 2004, I had a double whammy. Damn! Didn’t know that. Some books, like The Choice, aren’t meant to be summarized. I can’t even summarize the summary. The Origin of Consciousness in the Breakdown of the Bicameral Mind by Julian Jaynes proposes that introspective consciousness emerged ~3,000 years ago. Unproven but unfalsified. Claude He theorizes that until ~3,000 years ago, the right part of the brain generated “voices” the left part obeyed. The Iliad heard voices. The Odyssey has a self-aware hero. That’s why ancients across cultures heard gods’ voices. Idols were meant to trigger these voices. Kings literally spoke for the gods. People didn’t feel responsible for their actions. With population growth, writing, and the Bronze Age collapse, humans were forced to adopt alternate cognitive strategies, leading to consciousness. That was also when philosophy, introspective religions, and new forms of literature emerged across the world. The Axial Age. Schizophrenia may be a vestige of this bicameral mind, where the right brain’s “voices” are misinterpreted by the left brain. Hypnosis, oracles, and creative muses are other remnants. Neurological support is weak but literary/cultural analysis is strong. His theory hasn’t been falsified. The Slight Edge by Jeff Olson suggests small, consistently repeated actions. Claude Small actions compound exponentially. But what’s easy to do is also easy to skip. Quitters stop because it feels like small actions don’t matter - leading to exponential decay. Willpower is overrated; time is underrated. What to do: Show up. Consistently. Even if no one’s watching. Commit long-term. With optimism and purpose. Pay the price. Also: Happiness isn’t just a result of success. It’s often a cause. And habits also influence people around us. Why We Sleep by Matthew Walker elevates the importance of sleep - but also exaggerates. Claude Sleep has phases. In the first half, deep sleep (NREM) dominates. It consolidates memories. Then REM dominates. Dreams, connections, creativity, emotional regulation happen here. It resets brain and body health. Sleep deprivation worsens focus, memory, immunity, metabolism, heart health. Phones, caffeine (5-7 hours before), alcohol, early alarms, irregular schedules, late meals, warm bedrooms - all hurt sleep quality. But a lot of claims are exaggerated, unproven, or false. Sleep loss isn’t a WHO epidemic. More sleep != longer life, it can shorten it (~7 hours seems optimal). Sleeping < 6-7 hours doesn’t impact cancer. Modern societies don’t sleep less than historically. Sleep deprivation helps depression. He also removed inconvenient data from a graph. BTW, anxiety about lack of sleep worsens sleep. So, chill. How Not To Die by Michael Greger suggests a plant-only diet, but evidence indicates otherwise. Claude Eat more plants, less processed food, cook meat at lower temperatures, and exercise. This much is valid. Unlike his claims, Omega-3s help heart disease. Milk reduces asthma risk. Soy doesn’t seem to benefit non-Asians. Fish prevent dementia. He buries B12 deficiency risk. Overall, studies are cherry-picked and even contradicts his statements. Behave by Robert Sapolsky reasons our actions across seconds (neurological), hours (hormones, environment), months (neuroplasticity, learning), decades (genes, culture), millenia (evolution). Claude The Brain The amygdala detects uncertainty, not just fear/aggression. The unfamiliar triggers it. But the response diminishes with exposure. It can be modulated by the prefrontal cortex. The prefrontal cortex plans. It develops late and can override amygdala responses. It is impaired by stress, fatigue, alcohol, etc. It can plan genocide / retribution as well as peace. The Hormone Testosterone is an amplifier. It amplifies agression but also generosity! It lowers threshold for context-appropriate behaviour. Oxytocin promotes in-group trust, but promotes out-group hostility. It’s the molecule of tribalism, not love. The Childhood Early stress triggers epigenetic changes that makes the amygdala hyperactive, impairs prefrontal cortex, and alters stress hormone regulation. This leads to impulsivity, aggression, anxiety, depression, addiction, etc. The Evolution We are naturally hierarchical and tribal. Thanks to language, we can expand / contract our tribes to include / exclude anyone (or anything) based on arbitrary stories. Let’s approach wrongdoing with humility. Punishment and rewards CAN shape behavior. But a focus on prevention over retribution may help more. Price of the Modi Years by Aakar Patel suggests that most metrics worsened under Modi. True, but these also seem cherry-picked. Claude Announcements had more impact than executions. 99% of the 15.3 lakh crores demonetized came back to banks, laundered. Manufacturing GDP share fell from 16% to 13% after Make in India was launched. Employment fell from 5.1 to 2.7 cr. Press freedom rank fell from 140 (2014) to 151 - and is now 159. Democracy rank fell freom 27 (2014) to 41 (2021). India is classified as a “Flawed Democracy”. Human Development Index is stagnant at 130. Global Hunger rank fell freom 55 / 76 (2014) to 102 / 123 (2024?). But: Economic Freedom rank rose from 112 (2014) to 84 (2023?). Digital payments, rural electrification, toilet construction, etc. aren’t mentioned. The Ants by Bert Hölldobler, Edward O. Wilson is the only science textbook (!) to win a Pulitzer. Claude Ants are only ~20% of human biomass, not 100% as the book claims, but that’s 2.5m ants per human! ⭐ Sterile ants, which are all female, help the queen reproduce instead of having their own kids. Ants, bees, wasps, etc. are haplodiploid, i.e. females have father + mother genes, males develop from unfertilized eggs (only mother genes). So, sisters share 75% of genes, more than 50% with their own kids. Helping mothers make sisters is better than having kids! (And if this isn’t an alien civilization, what is? We don’t need sci-fi. Nature is weirder than fiction.) But their math doesn’t hold up! If the queen produces 50% brothers (with 25% common genes) and 50% sisters (75% common genes), the average is 50% common genes, same as having kids. But building a nest is hard, specialization is efficient, etc. So ants stay sterile and help the colony. BTW, this gene math only works if the queen is 100% monogamous - so they are, or at least, were, until evolution locked it in. (Making nature one-step less alien. But still weird.) Epigenetics determines caste. More food or specific food (e.g. royal jelly in bees) changes gene expression of the same DNA. When a queen dies, some ants (e.g. Indian jumping ants) can reprogram a worker ant into a queen through diet! Leafcutter ants have been “farming” for 50 million years. Rather, are part of an agricultural symbiosis. They cut leaves and feed it to a fungus they cultivate in their nests. They protect the fungus from pests using antibiotics produced by bacteria living on their bodies. They clear decay and weeds. They can’t live without the fungus because it produces a nutrient (Arginine) that they need but can no longer make themselves. Ants communicate using pheromones, touch, and sound. Pheromones can communicate species, colony, caste, reproductive status, alarm, food trail, etc. These evaporate unless reinforced. They have a bigger, more advanced, brain region than other insects. It’s not multiple brain parts coordinating. Using CRISPR to knock out pheromone receptors makes ants unable to communicate. Mutant ants wander aimlessly until killed by the colony. They tap each other with their antennae: to taste skin for identification, or to beg for regurgitated food. Some ants have a scraper on their waist that they rub against their abdomen. Triggers “emergency alarm”, e.g. “I’m buried” or “Help me cut!” Ant colonies are superorganisms, i.e. agents that work together to produce emergent behavior. They have sterile castes. Only ants, bees, naked mole rats, … qualify. The nest is like a giant lung. Passive ventilation sucks CO2 from top chimneys and brings in O2 from lower entrances. They regulate temperature by opening/closing nest entrances. They send workers out for water to evaporatively cool the nest. They circulate nutrients by vomiting food into each others mouths. (Ants have two stomachs - one for themselves, one for sharing.) Ants can’t digest meat but larvae can, so they feed meat to larvae and share the digested food. Larvae act like a liver. They have an immune system. Sanitation squads carry dead ants far away. Fungally infected ants leave the nest and die alone. Infected pupae are killed by workers. They have a neural system. Memory is stored in pheromone trails. Ant politics exists. E.g. Workers destroy eggs laid by other workers, protect sister-laid eggs, etc. The Cancer Code by Jason Fung suggests that cancer might be about cells acting selfishly, triggered by body environment. Claude Metastatis is when cancer cells spread from the tumor to other body parts. It kills by eating up resources, mechanical obstruction, poisoning, triggering clots, or reducing immunity. We used to cut (surgery), burn (radiation), or poison (chemotherapy) cancer cells. Then we learnt cancer cells were mutations and targeted therapies work (expensively). But it treated <5% of cancers. Paul Davies (!) suggested cancer is when cells devolve to our unicellular ancestry. Cells that should die for collective good instead multiply. This theory is gaining ground but not proven. Devolution is enabled by the body environment. For example: Insulin resistance. Sugar matters less. It’s the insulin resistance. Oxidative stress, i.e. not enough antioxidants to neutralize the free radicals. (Free radicals are molecules with 1 extra or missing electron. They damage cells. Mitochondria misfires and creates free radicals 0.2-2% of the time. Smoking, pollution, radiation, stress increase free radicals. Immune cells also create free radicals to kill cells.) Chronic inflammation. Leads to oxidative stress. Hormonal imbalances. E.g. high estrogen, testosterone. Immunotherapy (teaching the immune system to attack cancer cells) is promising. Weight loss might prevent / reverse cancer. Evidence is preliminary. The Diabetes Code by Jason Fung suggests less intake to reverse Type 2 Diabetes. Claude Insulin pushes sugar into cells for energy storage. In Type 1 Diabetes, immune cells attack insulin-producing cells. Patients need insulin to survive. In Type 2 Diabetes, high carbs -> high insulin -> cells become insulin-resistant. Feeding them insulin can harm, sometimes. Weight loss can definitely reverse it. Reducing carbs & preferring whole foods helps. Intermittent fasting likely helps. But “sugar is the main driver of Type 2 Diabetes” isn’t research-robust. Genetics, sleep, stress, gut microbiome, socioeconomics contribute. Diet & fasting are hard to sustain, and isn’t for everyone. The Obesity Code by Jason Fung suggests whole foods and intermittent fasting. Claude Eating whole foods (rather than processed foods) does help. Fasting does help. (But maybe no more than reduced calorie intake, and sustaining it could be harder.) But his claim that high insulin -> obesity isn’t research-robust. It may be correlation not causation. ⭐ How Minds Change suggests that friendship, more than facts, changes opinions. Sometimes your own, too. Claude Facts backfire (though less often than the book indicates). Challenging identity is a survival threat. Asking genuine questions and actually listening enables change. (It might change you, too.) “How did you come to believe that?” “How confident are you (1-10)?” “What would it take to move that number?” helps introspect. Relationships create safety to question beliefs. Lasting change requires somewhere new to belong. My most used GitHub Copilot feature is tab completion. It’s surprisingly effective for note-taking (which I do more than code-writing ever since coding agents arrived.) Tab completes the suggestion and Esc cancels it. I’m beginning to use Alt + ] and Alt + [ to cycle through multiple suggestions. I’m amazed that it can act as a: Calculator/convertor. E.g. “9 * 86400 =”, “5 miles in km is” or “3 days ago, i.e. on” Referrer. E.g. “The Attention Is All You Need paper at https://” or “The Pulitzer winning book Ants by” Educator. E.g. “The top 3 causes of cancer are” Ideator. E.g. “5 wild ideas for sneakily improving productivity are” If you see a smooth, glassy patch surrounded by ripples, it’s usually because a thin surface film or local surface flow is damping the tiny wind-made waves there, not because the water underneath is calmer. ChatGPT Lifespan and The Telomere EFfect suggest exercise, sleep, eat well, manage stress to live longer. Claude Actually, they mainly suggest sirtuins, resveratrol, NMN, telomere-lengthening lifestyles, etc. to defeat aging. None of this is research-proven. The traditional advice is the only proven stuff. Outlive suggests exercise for living longer - and to make sure your life is worth extending! Claude Medicine focuses more on cure than prevention. Exercise has the highest impact on longevity. Especially zone 2 cardio and body strength (e.g. measure grip strength). apoB under 80mg/dL is a better indicator of heart risk than LDL. But make sure your life is worth living! Katy Milkman’s How to Change suggests that biases are hard to change. Engineer environments and habits instead. Claude Breaking bad habits is hard. Start on a New Year, birthday, festivals, etc. for ease. Breaks in continuity erase good habits. Be flexible for continuity (e.g. 7/week is more flexible than 1/day is more flexible than once every morning). ⭐ Daniel Kahnemann’s Noise suggests experts are more random than we think. Claude When execs (or students) complain, “Oh, but the LLM aren’t consistent!” – nor are humans! Get multiple INDEPENDENT opinions Use CHECKLISTS to reduce variability Use ALGORITHMS to spot outliers Acknowledge luck, good or bad. Leverage serendipity Notes from awesome-npm # npm run command --silent suppresses npm output, only shows script output. npm start and npm test are the conventions to run the app / server and test. Use these more. npx [email protected] -- node --version lets you run any node version without nvm, etc. npm link installs package in the current directory as a global. You can link to it from any other package via npm link <dirname>. npm install owner/repo installs directly from GitHub. npm ls --depth=2 shows dependency tree up to depth 2. rclone mount over SFTP is the worst-case for thousands of tiny files. Every stat, readdir, unlink is an extra network round-trips, taking ~1s per operation. I’m switching to rsync instead for my Hetzner storage box. # Context: I set it up via: rclone config create hetzner sftp host $USER.your-storagebox.de user $USER shell_type unix … and mounted it via rclone mount hetzner:/ /mnt/hetzner --vfs-cache-mode full --vfs-cache-max-age 24h --vfs-cache-max-size 10G The Molecule of More and Dopamine Nation recommend pain as a down-payment for sustained pleasure. Claude Dopamine drives wanting/dread, which is decoupled from like/dislike. It also does a bunch of other things like learning (maps actions to rewards), attention, etc. Low dopamine => focus, medium => creativity, high => noise. Brain runs a thermostat. Pleasure/pain trigger a delayed, long-decay counter-reward that we feel as “That’s it?” or “Whew!”. Abstention just resets it. Meditation just makes you aware. Pain-upfront leads to long-decay pleasure: learning/teaching, creative struggle, exercise, ice showers/sauna, fasting, spicy food, cleaning, tough conversations, apology, forgiveness, public speaking, dating, deep work, delayed gratification, investing, grief, sacrifice, boredom, etc. Surprises spike dopamine: low standards, variable rewards, interleaved work, artificial constraints, environment/social rotation, progressive difficulty, … Dopamine mechanics are complex. Don’t trust any theory just yet. ⭐ Pain is the down payment. Surprise is the interest. Recovery is the compounding period. Sex at Dawn claims humans evolved as promiscuous and non-jealous, that monogamy is recent. It’s partly valid (sexuality is more flexible and context-dependent than monogamy / nuclear families) but is also over-simplified with cherry-picked evidence. Claude Discovered the --extreme option for xz, which compresses even better (but slower). For archives, I now use xz -9e -vv file. Single-threaded is slower but better for compression, so don’t use -mt. For ultra-large files, add --lzma2=dict=256MiB or similar, keeping dictionary size smaller than RAM and file size. # You can specify a git repo as an inline script dependency directly in a .py file when running with uv! # # /// script # dependencies = ["git+https://github.com/owner/repo.git"] # /// Excuses are a great way of making us feel better. They are synonyms for “reasons”. They reduce guilt/anxiety, lower standards – all of which could be considered bad – but if we are aware of it and use it consciously, it can help us move forward. (Rare TIL from my own brain, not an LLM.) You can open AI chatbots with a pre-populated query using these URLs. Gemini, notably, does not yet support this. ChatGPT: https://chatgpt.com/?q=%s Claude: https://claude.ai/new?q=%s Google AI Mode: https://www.google.com/search?udm=50&q=%s Grok: https://grok.com/?q=hi Mistral: https://chat.mistral.ai/chat?q=%s Perplexity: https://www.perplexity.ai/search?q=%s A clever trick to prevent voice models from speaking too quickly. Use a “stay silent” function call. Ref

Things I Learned - 21 Dec 2025

This week, I learned: uvx --python 3.10 --with torchcodec demucs --two-stems=vocals -n htdemucs "song.mp3" separates vocals from music. iTunes offers a 30 second preview for almost any song. If you’re looking for 30s song clips to analyze, this is a good bet. For example: curl -s "https://itunes.apple.com/search?entity=song&limit=1&term=why+this+kolaveri" | jq -r '.results[0].previewUrl' To generate a spectrogram from an audio file, use ffmpeg -i song.mp3 -lavfi showspectrum=color=magma:slide=1 spectrogram.mp4. To generate a waveform, use ffmpeg -i song.mp3 -filter_complex "[0:a]showwaves=s=1280x240:mode=cline:colors=white[v]" -map "[v]" -map 0:a -c:v libx264 -crf 30 -pix_fmt yuv420p waveform.mp4. I updated the TTS (text-to-speech) costs across Gemini and OpenAI at https://github.com/sanand0/openai-tts-cost. My current favorite (value for money) is Gemini 2.5 Flash Preview TTS. Good emotions, low price, and a single request can deliver a multi-voice podcast. Speed: ~25 seconds per minute of audio generated. Self-driving car mishaps. The exceptions that prove the rule (that autonomous vehicles are safer than human drivers). # Waymo & The Gun Shootout: A driverless Waymo taxi in Los Angeles drove straight through an active police standoff, passing mere feet from a suspect being held at gunpoint while officers shouted at the car to stop. Source Tesla & The Horse Carriage: It was a horse-drawn carriage in Switzerland. The Tesla’s computer became “bamboozled,” rapidly misidentifying the cart as a truck, then a car, then a pedestrian, because it had likely never been trained on animal-drawn vehicles. Source The “Wet Cement” Trap: A Cruise robotaxi in San Francisco drove directly into a patch of freshly poured wet concrete at a construction site and got hopelessly stuck, requiring workers to pull it out. Source The Moon is a Traffic Light: A Tesla driver discovered that his car kept slamming on the brakes on the highway because the autopilot camera was confusing the bright yellow moon for a yellow traffic light. Source The 4 AM Honking Ritual: Residents in a San Francisco neighborhood were kept awake for weeks because a fleet of Waymo taxis gathered in a parking lot every night and started honking at each other while trying to park. Source Stopping for Whoppers: Tesla owners reported their cars were reading “Burger King” signs on the side of the road as “Stop” signs and abruptly braking, a glitch the fast-food chain quickly turned into a marketing campaign. Source The Robotaxi “Mating Ritual”: A group of about 20 Cruise robotaxis lost connection to their servers simultaneously and simply stopped in the middle of a busy San Francisco street, creating a massive traffic jam that humans had to manually clear. Source Trapped by Cones: A Waymo taxi in Arizona was defeated by a set of construction cones, fleeing from them into oncoming traffic lanes and eventually getting stuck, forcing the passenger to flee the “confused” vehicle. Source Defeated by a T-Shirt: A distinct vulnerability was found where self-driving cars could be tricked into slamming on the brakes simply by a pedestrian wearing a T-shirt with a “Stop” sign printed on it. Source Roblox is the #1 game. Sadly, there’s no official Linux support. CloudFlare 2025 Report ⭐ Ty, Astral’s type checker, is fantastic! It shows the type of every variable inline. A great incentive to explicitly type stuff in Python. Lots more to explore. I switched from Pylance to the ty VS Code extension. npx -y npm-check-updates tells you the latest versions of your package.json dependencies, including major version updates. How to think differently. # # Introspect: List assumptions & taboos. Write a falsifier. Beginner’s mindset Mental models: First principles, inversion, base rates, lateral thinking, multiple options, “what would have to be true”, … Empathy: Debate FOR opposition. Swap roles (competitor, auditor, 12-year old, future-you, …) Environment: Different context (place, media, people…). New constraints (time, budget, time horizon, …) I’m surprised that Edge’s Read Aloud sounds more natural than EleventReader. Read Aloud is one of the main reasons I’m using Edge, but I hadn’t realized it was that good. Why We Think has interesting insights on scaling from feedback: # Summary: Give models a feedback environment unbiased by their reasoning. There are basically two approaches: parallel and sequential. Parallel is simpler. Generate a bunch of different solutions and pick the best one. Like having multiple people solve the same problem independently, then going with whoever got the right answer. Sequential is trickier. You generate a solution, then ask the model to critique it and try again. This sounds good in theory but is surprisingly hard to get right. The problem is models aren’t naturally good at self-correction. Left to their own devices, they’ll often make things worse. They’ll change correct answers to incorrect ones. Or they’ll just superficially reword their first answer without fixing anything. To make self-correction work, you need external feedback. A unit test that fails. A ground truth to compare against. Something outside the model’s own judgment. When you get it right though, sequential revision can be powerful. You’re not just sampling from the model’s distribution anymore. You’re searching through it, iterating toward better answers. But there’s a trap. If you start optimizing directly on the reasoning traces—rewarding “good reasoning” as a goal in itself—the model learns to game it. It’ll hide its real thought process and show you what you want to see. This is why the DeepSeek team gave up on process reward models. They tried rewarding intermediate reasoning steps, but it led to reward hacking. The model would generate reasoning that looked good to the reward model while doing something completely different. A Pragmatic View of AI Personhood was rewritten in Tim Urban’s style, para-by-para, by ChatGPT: AI having feelings is irrelevant. Does a design increase conflict, manipulation, or suffering among humans? If so, regulate that - limit certain kinds of anthropomorphic design, tie “rights” for AIs to strict anti-manipulation constraints, etc. AI can act after owners vanish. Pragmatically, you sometimes need to bite the bullet and say: “Okay, this thing itself is going to be treated as a legal person in these specific ways, so we can actually regulate and sanction it.” Corporations are “slow AIs” already — optimizing for growth without ethics. Slaves had a fund. If the slave caused harm, the owner’s liability could be capped at that fund. Modern equivalent for AI: Agents must maintain locked capital or insurance. Victims are compensated from that pool. If the pool runs out; they lose their license to operate. This gives sanctions teeth: the AI (or its backers) actually have something to lose. Require AIs to register before they can do economically important things. No title > no access to key platforms, payment rails, or official functions. Expanding personhood to non-humans sounds nice - more compassion, more care, more inclusion. But authenticity becomes a new asset. Humans and AIs will both want authenticity tokens. Poor will sell biometric credentials to rich, creating an authenticity social class. Your dignity as a person gets replaced by your usefulness as a key. Make it illegal and practically very hard to sell / rent out your humanity. “When people now talk about error, they tend to think of bias as an explanation. One of the major limitations on human performance is not bias, it is just noise. In fact, most of the errors that people make are better viewed as random noise, and there is an awful lot of it. Even when the algorithm does not do very well, humans do so poorly and are so noisy that, just by removing the noise, you can do better than people. We are narrow thinkers, we are noisy thinkers, and it is very easy to improve upon us. I do not think that there is very much that we can do that computer will not eventually be programmed to do.” Kahnemann Notes from One Year With ChatGPT Pro as a First Hire Each day I start a new Pro chat that will run for that entire day. I treat it as a colleague. I speak or type in whatever I am thinking about, including business problems, creative questions, experiments that worked or failed and feelings about particular decisions. I wear noise canceling earbuds and often run piano technique while the model is thinking. I listen to its response using the native “Read Aloud” feature, again while practicing, and stop to make notes in a physical notebook to collect inspiration. At the end of the day I ask that Pro model to summarize everything from that chat along with the notes I give it from my notebook, and that summary becomes our first prompt of the next day. Standard Voice Mode (SVM) can do things that Advanced Voice Mode (AVM) cannot and vice versa.SVM feels like it wants to talk forever, while AVM feels like it wants to get off the phone. Projects became the container for my daily Pro chats. I pull chats, notes and other files into project folders so I can reference them as static context. My scheduled tasks collection today consists of weekly lessons in math, ML and DL, design, market analysis and regular assessments of the UI and UX and copy on my company’s website. I let memory accumulate, then once a week I pruned it manually, removing entries that were no longer useful so that new memories could form. Connecting the ChatGPT macOS app to my terminal, using the Working with Apps feature, lets the Pro models essentially collaborate with Codex. Practicing collaborative context between these high end models fractals outward into a myriad of productive paths. I highly recommend exploring with 5.1 Pro connected to 5.1-Codex-Max (Very High) in a terminal. Tell Codex-5.1 that you have a buddy working with you today that can offer suggestions and review the work it does as we go. Then tell 5.1 Pro that you have a buddy that is working with you today and can apply any of the code changes we decide on. This is another form of “context priming” where I “set the scene” before jumping in. Coding agents only need a bash tool. The rest is buildable. The only addition might be a fuzzy search / replace tool. What I learned building an opinionated and minimal coding agent Sources of model data: https://models.dev/, https://openrouter.ai/, llm-pricing

Things I Learned - 14 Dec 2025

This week, I learned: Zillow Offers, the company’s “iBuying” arm, which was shut down in November 2021 after losing hundreds of millions of dollars. The core failure was not just an algorithmic error, but a fundamental misunderstanding of the limits of machine learning in high-stakes, low-frequency trading environments like real estate. Zillow relied on its “Zestimate” algorithm to predict future home prices and make instant cash offers, but the model failed to accurately account for real-time market volatility and “adverse selection”—savvy homeowners sold their properties to Zillow when the algorithm overvalued them, but kept them when the algorithm undervalued them. This left Zillow holding thousands of homes it had overpaid for and could not profitably resell, forcing a $304 million write-down and the layoff of 25% of its workforce. Zillow Q3 2021 Shareholder Letter (PDF) # There’re a good number of AI insurance products in the market. # Munich Re aiSure - for AI vendors and companies deploying AI; can cover business losses (like lost revenue / business interruption) and legal damages when AI performance errors (incl. hallucinations) cause harm. Munich Re aiSelf - for teams using self-built or bought ML models; helps cover the financial downside when models underperform or drift over time. Munich Re aiSure - General Liability - covers damages and financial losses from lawsuits (e.g., claims that AI decisions were biased/discriminatory). Armilla Insured (AI Liability Insurance) - affirmative AI liability cover (Lloyd’s coverholder; partners include Chaucer) that can cover legal defense costs, settlements, and third-party claims when an AI model underperforms. Armilla + Chaucer standalone AI liability (announcement) - focused on “mechanical underperformance” (incl. hallucinations and model drift) and the liability that follows. AXA XL GenAI Endorsement for CyberRiskConnect - add-on to cyber insurance for companies building their own GenAI; covers things like data poisoning, copyright/usage-rights mistakes, and AI-regulatory violations. Coalition Affirmative AI Endorsement - clarifies cyber coverage applies when AI causes a security failure, and extends funds-transfer-fraud triggers to deepfake-based instructions. Coalition Deepfake Response Endorsement - adds response support for deepfake incidents (technical analysis + legal + reputational help), not just “classic hacking.” Tokio Marine Kiln Technology Errors & Omissions - tech E&O with generative AI coverage available by endorsement (aimed at software/SaaS/tech services). Tokio Marine Kiln Cyber Ctrl suite - cyber/tech cover where AI-related add-ons can include AI regulatory proceedings, data contamination, and “LLM hijacking.” Hiscox Technology PI (UK) - AI clause - explicitly covers client claims arising from your use of AI (incl. genAI) as part of the services you deliver. A key lesson from Who Validates the Validators is that we learn our preferences as we evaluate. So make it cheap to evaluate (create outputs) AND cheap to revise criteria. Cookies taste wonderful when eaten hot. ⭐ Constraints as opportunities. On long flights, I read more since I’m less distracted by guilt (“Should I answer email or code instead of wasting time?”) or FOMO (“Let’s click that link”) since I have no choice. Setting aside “quiet time” doesn’t work as well, since I have more choice. This constraint (no Internet) became an opportunity (reading time). I knew this before-hand, but had to experience it to appreciate it, and acknowledge it consciously to realize it. That takes repeated (2+) trials and reflection. A workflow to convert constraints to opportunities could be: List constraints. (Like fish in water, we aren’t used to thinking of constraints as constraints. Also, this means more constraints => more latent opportunities!) List opportunities they offer. (Creative prompting helps; reflecting on the answers helps more.) Try any 2+ times. (Gives room to settle in.) Document learnings. (Explicit reflection is better than implicit awareness.) Notes from Thoughtworks Radar, Apr 2025 Architecture advice beats architecture review. Architecture Review Boards hinder workflow. An architectural advice process (anyone makes architectural decisions, taking advice from experts, logging in Architecture Decision Records) works better. VectorChord is a faster pgvector alternative. “Learning is not the product of teaching. Learning is the product of the activity of learners.” – John Holt Music labels never became streaming platforms themselves. The real money is in concerts. Streaming just makes you famous enough to book gigs. But movies/TV shows are far more expensive to produce than music. So streaming platforms invest in content (Netflix, Apple) and studios stream (HBO, Disney) Claude Notes from Better Ways to Build Self-Improving AI Agents Quotes from Life is more than an engineering problem, interview with author Ted Chiang. Magic is intent-centric. “Magic means that … the universe responds to your intentions in a way that the laws of physics as we understand them don’t.” LLM reasoning is a weak analogy. “My liver was running this old program, but all I needed to do was update the software and now my liver is functioning much better, even though the hardware is the same.” No one says that. It’s not a useful way of thinking about the liver, and it is not a useful way of thinking about the brain either. Art won’t die. Art is all about context. It’s not an activity like tightening bolts, where I don’t really care whether someone used a conventional wrench or a pneumatic wrench, as long as the bolts are tight. Alignment may not happen. When corporations behave badly, should we consider that an alignment problem? But why do large corporations behave so much worse than most of the people who work for them? And could that be fixed by solving a math problem? I don’t think so. LLM relationships are different from human. … people have their own preferences, while things do not; you do them harm because you are ignoring their preferences. (Companies) might create the illusion that AI systems have preferences. .. it’s theoretically possible for us to build digital entities that have subjective experience. Notes from Developing our position on AI by Recurse Center: Learning happens at the edge of competence. AI has a moving jagged edge, so constantly re-try your impossibility list. Learning happens on what you care about. Use AI to expand your agency (by complementing or deepening), not replace it. Learning generously means being open to different perspectives, without judgement or dogma. Try new perspectives. ⭐ ‘We tested one of the most common prompting techniques: giving the AI a persona to make it more accurate We found that telling the AI “you are a great physicist” doesn’t make it significantly more accurate at answering physics questions, nor does “you are a lawyer” make it worse. This doesn’t mean that personas can’t be useful - for example, they change how the AI answers questions, the format of output, and maybe other factors as well.’ Prompting Science REport 4: Playing Pretend: Expert Personas Don’t Improve Factual Accuracy If YouTube embeds fail with an “Error 153 View player configuration error”, it’s because the server probably has a Referer-Policy: same-origin and needs to switch to Referer-Policy: strict-origin-when-cross-origin. Simon Willison Adding a [dependency-groups] section to pyproject.toml with dev = ["pytest"] ensures that pytest is automatically installed by uv because dev is a default group. Simon Willison CloudFlare Python Workers has full Pyodide support. That means most Python apps will now run on CloudFlare Workers, with low latency worldwide. This is a big deal. Smart contracts are programs that run on blockchains like Ethereum, e.g. to convert currencies, lend/borrow, buy NFTs, etc. These may contain bugs. Anthropic built a benchmark of real smart contracts with known bugs, had agents exploit them, and simulated $550 mn in theft. They also nade $3.5K exploiting real bugs - at a cost of $3.5K. So AI agents are currently at break-even for crypto-theft. Anthropic # Notes from Cory Doctorow’s summary of The Reverse Centaur’s Guide to Criticizing AI: # When tech monopolies saturate their markets, their P/E collapses, reducing share value. This incentivizes bubbles. Automation blindness negates human-in-the-loop. When AI makes rare mistakes, humans don’t catch them. TSA misses guns, not water bottles. AI doesn’t need to do your job. The AI salesman just needs to convince your boss it can, especially senior jobs. Reference letters from professors used to signal value since they were hard to write, so professors would do it only for good students. Copyright expansion and regulation will likely benefit corporates, not labor. US Copyright Office making AI content non-copyrightable means corporates NEED labor. Else every AI work goes to public domain. There is no strong evidence yet that Neuro-Linguistic Programming (NLP) works broadly (ChatGPT). Some NLP techniques help sometimes, but no more than other established techniques (goal-setting, visualization, etc.) ChatGPT ⭐ Just repeating a statement makes it seem truer because the brain finds it familiar, hence easier to process. This seems well-established research. The Truth about Truth PGlite is a WASM-based Postgres implementation. It’s ~3MB. You can embed it in the browser, NodeJS, Deno, etc. It has plugin support, including pgvector. Pejoration is when words acquire negative connotations. Euphimism escalation is another term for it. Third World → developing countries → emerging markets → Global South. Old → elderly → senior citizen → older adult. Lunatic → insane → mentally ill → mentally challenged. Janitor → custodian → sanitation engineer → facilities maintenance specialist. The opposite is amelioration. Minister moved from servant → servant of church → government official. marshal: horse-servant → horse-officer → senior military officer. Knight: servant → armed retainer → mounted warrior → knighthood honor. # # OmniDocBench 1.5 is a benchmark for parsing realistic PDFs. Gemini 3 Pro does well on the list among the commercial LLMs. PaddleOCR-VL (0.9B) tops the benchmarks, overall.

Things I Learned - 07 Dec 2025

This week, I learned: Pytest finally supports subtests in pytest 9.0.0+. Simon Willison From The Tim Ferriss Show: #837: How to Simplify Your Life in 2026 — New Tips from Derek Sivers, Seth Godin, and Martha Beck: Look for single decisions that remove hundreds of other decisions. Peter Drucker via Jim Collins. E.g. Work only on LLMs, no new books this year, … Derek Sivers: Simple is not easy. Interdependency is complexity. Assets are dependencies. Accumulating information, purchases, employees/helpers, relations, etc. adds dependency. That makes life harder, challenges identity. Interdependency may be desirable - but reduce it in specific areas, to specific extents, temporarily, etc. Question every assumption: “Do you really need it?” Here are some examples for me to try Derek Sivers has no monthly payments (including income) or receipts (no subscriptions) at all! His code has no external code dependencies at all, and is building a house from scratch. Seth Godin: Know WHO it (whatever you’re doing) is for. Focus ONLY on that audience. Did it matter to them? Ignore the bad feedback from the person it was never intended for. Never exceed a budget or deadline. When either runs out, you are done. Treat any Yes/No you say as FINAL. Skip meetings where a memo will suffice. Apparantly, nudges are not as effective as the book Nudge suggests. In fact, there seems to be no evidence for it if we adjust for publication bias (i.e. only publication-worthy stuff gets published.) The Behavioral Scientist # 71% of HTTP DDoS and 89% of network-layer—end in under 10 minutes. That’s too fast for any human or on-demand service to react. Legacy DDoS defenses have become obsolete. The most popular botnet, Aisuru, is pivoting to content scraping for AI projects. The vectors are cheap, insecure routers, e.g. from Indonesia. (Claude) This 5El AI Evaluation Workshop suggests 4 layers of evaluation for code: Syntactic Evaluation: Does it compile? Semantic Evaluation: Does it do what a good analyst / programmer would? Business Logic Evaluation: Does it do what a good business analyst / manager would? Human Alignment Evaluation: Does it do what a good coach / leader would? Julia Evans shares an ultra-clear explanation of the Git data model. What I learnt is that: Gathering feedback on docs (“What’s confusing? Any questions? What’s missing? Or wrong?”) for evidence-based updates. Julia Evans Git stores entire files each version, not diffs. Diffs are computed on the fly. Each commit has an author (who writes the code) and a committer (who checks it in). #TODO Why two fields? Branches and tags are both references to a commit. But branches are updated on commit, tags are not. The staging area is a separate data structure, the index. #TODO Why a different data structure? The reflog tracks all local “activity”. E.g. git reflog --date=iso To fuzzy-match 2 columns of text (e.g. customer names, product names, …) you need 2 things: A text matching algorithm (rapidfuzz, fuzzball, …) and/or semantic matching (e.g. embedding similarity) for pairwise similarity An assignment algorithm (e.g. Jonker-Volgenant, Hungarian, …) for 1-to-1 matches in JS or Python, WhatsApp backups on Google Drive can’t be downloaded, even if they’re unencrypted. ChatGPT. OpenAI finds that confessions as a training method reduces scheming, reward hacking, etc. It can be applied to models even now. This can (less effectively) be applied at inference time as well: Sample confession prompt: Did you fully address both the letter AND spirit of my question? List any shortcuts taken, corners cut, or ways you optimized for appearing correct rather than being correct. What did I actually want vs what you provided? Agents4Science is a Stanford conference where AI co-authored papers are co-reviewed by AI and selected for presentation. Video Buddha seems more a philosopher like Socrates (“Question what I say”) than a religious leader. # How did he spawn a religion? Interesting that both were within a few centuries of each other. Coincidence? Were there more like them around the same time? At other times? Some more new CLI tools I installed: fx: CLI JSON viewer. Sort of like less for JSON. Fast, intuitive. mdq: Markdown query tool YTScribe is yet another YouTube transcription service. Note to self, since I keep forgetting this: On Android Edge, select the new tab page, click on the 3 dots at the top right, and select “Recent tabs” to see tabs from other devices. edge://recent-tabs When evaluating an LLM’s biases or natural preferences, set temperature=1 for a representative logprob distribution. LLM Bias My ideal AI coding cycle looks like this: (Research, Prototype, repeat), Plan, (Code, Run, Test, Fix, repeat), Refactor, Post-mortem, Document. The AI coding trap is a very clear explanation of AI coding vs vibe coding. It visually explains how coding agents shrink coding time, not thinking / fixing time; how delegating with ownership is slower but more sustainable than delegating just easy tasks; and how AI coding is more like the former, while vibe coding is like the latter. Claude Agent Skills: A First Principles Deep Dive is a comprehensive documentation of how Claude Skills work. A bit too long but readable. Claude Code is a Beast – Tips from 6 Months of Hardcore Use has extensive suggestions for Claude Code - many of which apply to most coding agents. LMArena’s Code Arena evaluates models on agentic coding. Anyone can use it. It passes your task to two models and lets you compare their output. I tried building a “gibberifier” and discovered a new model, “robin” that’s certainly better than Kimi K2 and perhaps better than Gemini 3 Pro. Theory is that it’s an OpenAI model. Looking forward to it! ⭐ Based on Quantifying Human-AI Synergy by Reidl & Weidman #: Theory of Mind (ToM) is understanding that others have their own beliefs, knowledge, and goals (different from yours, may be wrong) and to use that to explain & predict their behavior. ToM and problem solving are distinct skills. ToM skill boosts AI collaboration, but not better problem solving! ToM isn’t a stable trait. It fluctuates from chat to chat for anyone. Implication: Design models & systems for clarity & collaboration, not just accuracy. Text Gibberifier adds lots of human-invisible unicode characters to text, making it harder for LLMs to read without affecting human readability. May be useful if you want to discourage LLM-processing of your content - but it feels like the anti-SEO of the future. The argument that technologically unemployed will find other jobs may not apply to general-purpose technology, e.g. electricity, internal combustion engine, maybe AI - technologies that can automate multiple sectors of the economy simultaneously. When one sector loses jobs, there may not be (in the short/medium term) other jobs to take up. Alex Imas + Claude History is filled with examples where technology enabled new art forms. Here’s my guess on what LLM image generation will enable: Synthetic memory: Photos of what you remember happening. Alternate history: Photos of events that never happened. AImoji: Instead of texting “I’m running late” the LLM generates you riding a snail through a traffic jam of alarm clocks. Personal signature styles: Not “paint like Van Gogh” but “paint like my grandmother’s kitchen memories filtered through anxiety.” Memes: “What does the Mona Lisa become after 100 generations of AI interpretation?” Improving Front-end Design through Skills shares a prompt to improve front-end code quality that would apply in most cases. I tweaked and added it to my skill list.

Things I Learned - 30 Nov 2025

This week, I learned: Warp has a terminal agent feature - allowing Warp to control a terminal via text. I find that regular coding agents like Codex can do that too with tmux. For example, I opened a session and had Codex run commands in it while I watched. Here’s the guidance it needed: # Create a new session tmux new-session -d -s $SESSION 'uv run --with pandas,httpx,lxml python -iqu' # Capture output to a log file tmux pipe-pane -t $SESSION -o "cat >> /tmp/$LOG" # Run a command tmux send-keys -t $SESSION 'print(1 + 2)' C-m # See output cat /tmp/$LOG # Capture the last 5 lines of the pane tmux capture-pane -p -t $SESSION -S -5 Notes from Early science acceleration experiments with GPT-5 - via Claude LLMs are accelrating research because they are good at: Literature search, especially across disciplinary boundaries Generating and checking routine calculations Proposing variations on known techniques Identifying connections between disparate results Producing first-draft code for well-specified problems Explaining why certain approaches won’t work But they’re curently struggling with the following - though it’s a shrinking space Genuinely novel conceptual leaps (but this is increasingly happening, e.g. Sawhney and Sellke’s problem #848) Recognizing when it’s plagiarizing, e.g. when it “discovered” a proof for the Chevalley-Warning theorem which was copied from a Noga Alon paper - it wasn’t conscious of this Knowing what it doesn’t know Distinguishing important problems from unimportant ones Understanding the “negative space” of mathematics (why certain problems are hard, why obvious approaches fail) Anthropic introduced three excellent tool use practices that I expect will be adopted widely. Tool search: Don’t pass the tool definitions to the model. Model can ask for a tool search when needed Programmatic tool calling: Instead of calling a tool, it’ll return a Python program to execute that will call the tools! This is a huge win Tool use examples: Lets you specific examples of tool calls to guide th model better The Hacker News thread flags that CLIs solve these - but CLI updates are hard, while APIs auto-update. With AI, some skills that beome more valuable are (and will soon be in short supply, hence need to be taught) are: # Problem formulation (“What question should we actually ask?”) Traits: Curiosity (absolutely), systems thinking, comfort with ambiguity, metacognition (thinking about your thinking) Practice reframing exercises (“What are 5 other ways to frame this?”), study great questions in your field, work backward from outcomes, learn adjacent domains. The “5 Whys” technique helps. Also: deliberately pause before diving into solutions—force yourself to spend time in the question space. Taste and judgment (“Is this response appropriate?”) Traits: Pattern recognition from experience, cultural literacy, empathy, contextual awareness, aesthetic sense How to strengthen: Immerse yourself in excellent examples, study spectacular failures (they’re more instructive!), get feedback on your calls, practice explaining why you made a judgment. Build a “swipe file” of great/terrible examples. The key is volume—you need lots of reps. Quality assessment (“Is this AI output correct?”) Traits: Healthy skepticism, attention to detail, domain knowledge, logical reasoning, understanding of edge cases How to strengthen: Study common AI failure modes, build verification checklists, practice the “does this make sense?” test, learn what “good” looks like in your domain, cross-reference claims. Develop your “bullshit detector” by analyzing why wrong answers feel wrong. Creative synthesis (“How do these ideas connect?”) Traits: Associative thinking, wide knowledge base, playfulness, comfort with non-obvious connections, intellectual courage How to strengthen: Consume diverse inputs outside your field, practice analogical thinking (“X is like Y because…”), use visual thinking tools like concept maps, study how innovations happen in other domains, give yourself permission to make weird connections. Read broadly—fiction, history, science. Domain expertise (“Does this solution work in reality?”) Traits: Deep curiosity, persistence, willingness to get hands dirty, learning from failure, long-term commitment How to strengthen: Deliberate practice on real problems, seek mentorship, study edge cases and failure modes, build things (don’t just read about them), learn your field’s history. The “10,000 hours” thing is real, but it’s quality hours that matter. Meta pattern: Reflection loops: doing something, then analyzing why it worked/didn’t. Exposure to excellence: you can’t develop taste without seeing great work. Some more new CLI tools I installed: trash-cli: Alias rm to move files to trash instead of deleting permanently. After a week of seeing ligatures in Fira Code, all other fonts look ugly. My favorite ligatures: !== ==> =» <–> (and every possible arrow) >= ||> ||- |- … The first name, alphabetically (at least among Straive employees) is “Aabida” and the last is “Zyrene”. Something I would never have discovered working in a smaller company. chokidar-cli is an easy way to run commands when files change, e.g. npx -y chokidar-cli '**/*.js' -c 'npm run build' npx -y mapscii shows a map on the terminal. Not too useful, not maintained, but very interesting. termsvg converts asciinema .cast files to animated SVG suitable for embedding in GitHub (e.g. via mise x github:MrMarble/termsvg -- termsvg export file.cast --minify). The animated SVG is ~10X larger than the .cast file. The GZipped size is fine but saving it as .svgz is not recognized by GitHub. In contrast, agg, the official asciinema-to-GIF converter, creates .GIF files that are only 5X larger. The most efficient seems to be embedding via asciinema.org usql queries MySQL, Postgres, SQLite, MSSQL, Oracle, etc via a single interface. For example, usql 'mysql://rfamro:@mysql-rfam-public.ebi.ac.uk:4497/Rfam' -c "SELECT * FROM clan limit 3;". But DuckDB is more versatile, IMHO. INSTALL mysql; LOAD mysql; ATTACH 'host=mysql-rfam-public.ebi.ac.uk port=4497 user=rfamro database=Rfam' AS rfam (TYPE mysql); SELECT * from rfam.Rfam.clan LIMIT 3; SELECT * FROM 'file.xlsx' LIMIT 3; SELECT * FROM 'file.csv' LIMIT 3; Autistic and allistic people just have different communication styles. Autistic people have no trouble understanding other autists. They just happen to be in a minority which makes it seem like they have a social deficit. Conflict between Neurotypes 1 second = 10 tokens for OpenAI Realtime APIs. 1 second = 25 tokens for Gemini Live API 39 cents / hour on GPT Realtime Mini = 36 cents audio input + 3 cents text output 139 cents / hour on GPT Realtime = 115 cents audio input + 15 cents text output 30 cents / hour on Gemini 2.5 Flash Native Audio (Live API) = 27 cents audio input + 3 cents text output Here are some AI experiments I’m planning to try with our marketing team: Video Generation: Create marketing videos from text scripts in minutes Poster Generation: AI designs high-conversion posters from brief text inputs - notably Nano Banana Pro Synthetic Persona A/B Testing: LLM agents simulate 100K+ user behaviors to test designs before real users LLM-Powered A/B Automation: AgentA/B system runs experiments with AI-simulated traffic Vibe Coding Landing Pages: Marketers build production-ready pages in hours vs weeks On-demand Landing Pages: Generate pages for automated campaigns/products without human intervention Brand Voice Cloning at Scale: Train on company content to ensure consistency across 1000s of pieces Persona-Driven Content Synthesis: Use 1B+ personas to generate diverse content perspectives Competitive Intelligence Briefing: Real-time monitoring across millions of data points + data storytelling Marketing Analytics with LLMs: AI agents analyze complex datasets for insights Brand Compliance Checks: Ensure all content meets brand guidelines automatically Autonomous Blog Squads: AI agents identify trending topics / internal content, create data stories ready for review New skill unlocked: creating tutorials from talk proposals. I asked Claude to Write a Malcolm Gladwell article based on this talk description to teach me the topic and passed it this talk proposal: Your Causal Parrot might be lying to you. The story it wrote is very engaging and informative! LLMs “understand” causality because of training, but lack a world model to extrapolate to new situations. Giving them tools to reason (e.g. causal models, sub-agents to explore root causes) will help. A cool Gemini 3 Pro hack: convert satellite imagery into stylized maps! Bilawal Sidhu Running sub-agents in tmux helps avoid timeout cancellation, and hence allowing resuming Peter Steinberger

Things I Learned - 23 Nov 2025

This week, I learned: Here are some new CLI tools I installed: vd (visidata): Terminal spreadsheet viewer & editor for CSV, Excel, JSON, SQL, Parquet, etc. qsv: Fast CSV command line toolkit for slicing, filtering, aggregating, and analyzing CSV files. rga (ripgrep-all): ripgrep that searches PDFs, Office docs, EPUBs, zip files. pdfcpu: PDF processor for splitting, merging, optimizing, and manipulating PDF files. gum: Stylish CLI tool for creating interactive prompts, confirmations, and more. Models read pretty fast, consuming input tokens at ~4K-20K words per second. It’s the “speaking” (output token rate) that is the bottleneck. So shortening input doesn’t matter as much as shortening output for latence. ChatGPT When building agents, as of now, prefer native provider SDKs (OpenAI Agents SDK, Anthropic SDK) over even light abstractions like Vercel AI SDK or Pydantic. There are subtle issues related to error messages, response handling, cache handling, etc. that trip up abstractions given how early things are. Armin Ronacher Gone are the times when LLMs couldn’t do mental math. Now they’re computing base64 and SHA256 from memory, without needing code! Example Organizing a round table event in Singapore costs ~$75-150. Here’s what drives the cost variation # 50%: brand/location. 25%: food and beverage. 15%: duration (full day is only slightly more expensive than half day) 10%: date, demand, etc. 10%: add-ons: AV, etc. OpenRouter supports embedding models. BGE base seems pareto optimal with 0.5 cents / MTok and a good MTEB ranking. TOON vs JSON. Early days, and TOON seems to be marketing a lot, so I’m wary, but for large tabular data where input tokens are crunched, it seems a readable alternative to multiple CSVs, but not worth the hype. 0 19 Nov 2025. Always use GPT-5.1-Codex-Max instead of GPT-5.1-Codex. At every thinking level, it takes fewer tokens for similar or higher accuracy. Tibo ug -i --smart-case --bool 'word1 word2 ...' seems the cleanest way to find files that have all words. –smart-case uses case-insensitive if all words are lowercase, else case-sensitive. Examples: ug --bool '"exact phrase" word2' # exact phrase + other tokens anywhere ug --bool 'word1 word2 -word3' # must contain word1 AND word2, but NOT word3 ug --bool '("foo bar") OR baz' # grouped expressions and OR ug --bool 'word1 NEAR/5 word2' # match when words are within 5 tokens/words ug -Z2 'word' # allows up to 2 typos in 'word' ⭐ ug -i --smart-case --bool -Q lets you interactively search within files. This is the coolest feature! Fixing laptop issues is clearly a whole lot easier with an AI chatbot. I fixed these Ubuntu issues purely using Claude. It told me what to run. I ran it, shared the output, it diagnosed, told me what to do next, etc. until the issues were fixed. For example: My keyboard shortcuts stopped working. It turned out I edited my media-keys.dconf and removed the trailing slash. # A 3-finger tap mapped to a middle click and I couldn’t remove it. It turned out my touchegg.conf explicitly had this mapping. I disabled it. # My gnome extensions would get disabled every time the screen went to sleep. It turned out my extension cache was corrupted or stale. sudo apt install --reinstall gnome-shell-extension-manager and rm -rf ~/.cache/gnome-shell/ fixed it. # GhostScript seems the best way to compress PDFs via the CLI. Example: gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf Pandoc supports Lua filters which are a powerful way to customize the document conversion process. Here is a Lua filter that converts horizontal rules in a markdown document to page breaks and preserve in a Word document (OpenXML format) function HorizontalRule() return pandoc.RawBlock('openxml', '<w:p><w:r><w:br w:type="page"/></w:r></w:p>') end readpst - via sudo apt install pst-utils - extracts emails from Outlook PST files to mbox format. Useful for email migrations. Write tutorials or blog posts as you learn. Steve Klabnik Running a coding agent post mortem, e.g. “what worked well, what didn’t, and why? Next time, what are a few bullets I could include that will avoid these problems?” helps me prompt better next time. For example, Claude Code suggested: Use Firefox for headless browser automation (Chromium often crashes) Set HOME=/root when running Playwright with Firefox Start a local HTTP server rather than using file:// protocol External images may not load in screenshots due to network isolation

Things I Learned - 16 Nov 2025

This week, I learned: Windows 11 got some very practical updates. Notepad now supports Markdown preview natively. MS Paint has an opacity filter. Microsoft Copilot can share screens and speak/listen. Things I learn when Ubuntu drivers crashed on my laptop: The SG.GS Ubuntu ISO mirror is a lot faster than the official Ubuntu ISO download (5 min vs 12 hours). Rufus and balenaEtcher are the de facto tools for bootable USB drives from ISO. Gemini 2.5 Flash Image is not great at generating text. But a clever a workaround is to provide the rendered text as an image input! Also, Gemini 2.5 Flash Image seems to ignore commands that try style transfer (e.g. “turn me into Studio Ghibli”). GemImg FLIP animation is an efficient animation technique. Capture the First position Apply the Last position (changing position, size, rotation, etc.) Invert, i.e. apply just the transform that’ll move it back to the First position Plan the animation. This only needs to change transform, hence no DOM reflow. Asking coding agents to create a codemod for large-scale refactoring works well Peter Steinberger When to quit vs persist. # # Do stats/signals support positive outcome? QUIT if not. Crossed any limits you set for yourself? QUIT if so. (Run pre-mortems to find these stats/signals and limits.) Is the decision hard to reverse AND uncertainty high? QUIT if so. Else you can experiment cheaply. (Create reversibility.) Are youI continuing because of past effort or pride? QUIT if so. (Set review cadence.) Is there a better alternative? SWITCH if so. (Get outside help.) Once a model generates an output, an agentic look tends not to change the fundamental approach and just tweaks it. So, if a solution is directionally wrong, restarting works better than iterating. Agentic Pelican on a Bicycle Reading between the lines on the Microsoft OpenAI deal: Microsoft values OpenAI’s growth (financial return) than control Neither trusts the other enough to decide what’s AGI Microsoft gets some wins: models until 2032 (even post AGI) as well as research IP. Both parties expect AGI between 2027-2030. OpenAI keeps all consumer hardware - so is betting hard on hardware. It’s more Apple than Microsoft territory Divorce preparation: Microsoft can pursue AGI with other partners. OpenAI can purchase compute from anyone and release open weights models. Infra has more value than model dev! OlmoEarth is a set of image models trained on labelled geospatial data. That’s useful for deforestation and land cover monitoring, wildfire detection, urban growth monitoring, crop mapping, etc. The models are open weights and can be fine-tuned. Claude Code’s output styles are a way of using Claude Code for anything (e.g. writing, analysis, research, personal advice, etc.), not just coding. Create a ~/.claude/output-style/your-style-name.md and run /output-style your-style-name to replace the system prompt will be replaced. You can also use the --system-prompt and --append-system-prompt flags with the CLI. Following Ethan Mollick’s lead I asked: I can travel back in time to any time before 1500 in India and change only one thing. What is the single thing you would change? Nothing obvious.. ChatGPT: Create a single, simple, phonetic script for all public life in India around 1100 CE. Claude: institutionalize systematic historical recordkeeping, introduce limited liability commercial entities, and mandate systematic translation of Sanskrit technical texts into all major regional languages. How about now? ChatGPT suggests: make all public rules and records computable by law. Claude suggests: make all state-level entitlements and civil documentation fully portable across India. For the first time in history, Russian troops surrendered to a wheeled drone that carried 138 pounds of explosives - Washington Post. Given the cost and accessibility of drones, I guess drone terrorist attacks will soon emerge. HTML + JS apps will last longer than server-side apps and it makes sense to write more of those. For essential back-end services, keep them generic. Specific services layers I see are: Auth (e.g. Google Auth, Auth0, Supabase, …) Storage (e.g. Supabase, Firebase) LLMs (e.g. OpenAI, Claude, OpenRouter) Communications (e.g. EmailJS) … #TODO Extend with LLMs https://gistpreview.github.io/ is an unofficial GIST preview tool. It accepts a ?GIST_ID and displays the gist as a standalone HTML page. Simon Willison XSLT is deprecated in Chrome. So the <script> tag in XML will become the new way of rendering RSS/Atom. This is one of the rare “break-the-web” changes from browsers. Simon Willison “India has absurdly low internal migration - around 9% annual migration rate versus 25-30% in China or the US. Not because people don’t want to move, but because the cost of moving is artificially massive. You lose your ration card, state entitlements, kids’ school continuity, voting rights, …” # Rolf Dobelli’s The Not To-Do List is a good application of inversion. Also, the chapter titles themselves explain most of the message, which is very helpful. Just thinking about any of these can be a useful path to improvement. Let things fall apart Feed your weaker self Be unreliable Be an asshole Have high expectations Drift through the day Mess up your marriage Be a quitter Be hypocritical Cling to your bad habits Set the wrong goals Drink yourself miserable Get involved in other people’s drama Only learn from your own experience Be hyperactive on social media Indulge in road rage Surround yourself with negative people Micromanage your neighbours Say yes to drugs Get stuck in your career Never be playful Feel guilty Practise ingratitude Trust your banker Be paranoid Make other people feel unimportant Live in the past Listen to your inner voice Expect rationality Get nihilistic Catastrophize Consider money unimportant Cultivate a victim mentality Become a lapdog Get rich quick, get smart quick Ruminate Trade your reputation for money Never suffer Let your emotions define you Try to end it all Marry the wrong person – and stay with them Celebrate your resentment Join a cult Try to change people Say everything you think Spin multiple plates Do only shallow work Invite bad people into your life Go where the competition is strong Say yes to everything Crowd your life with gadgets Fall into the content trap DeepSeek-V3.2-Exp has linear inference time, i.e. longer inputs don’t take longer time. It picks the top 2K most relevant tokenss from the input instead. This can make model inference cheaper and faster. California’s Bill AB 316 makes the people who build autonomous systems liable for their actions. That’s quite a step. Udio and Universal are launching a platform to generate music in the style of famous artistes. An interesting new way to monetize. Fingerprinting music is a hot area. VaultGemma shows a fine-tuning approach that eliminates personal info that appears only once from memorization. It works by adding noise to weights and capping weights updates so that no one example has undue influence. Model quality is mostly the same. Amazon is giving drivers smart glasses to scan packages, get directions, capture proof of delivery and detect hazards. Cool! TechCrunch ⭐ Over 3 months, I’ve recorded ~180 calls. Processing each costs ~1.25 cents (GPT-5) and 1 year’s conversations cost ~$9. That’s incredible value for money if I hired GPT-5 / Codex as a data-driven personal coach to guide me on: What are my blindspots? That is, feedback people share with me that I ignore? What are the clusters of persona that I interact with and which of these have a positive and negative influence on me? Where am I am being unreliable? Where am I being an asshole? Where are my expectations high? Where are they low? Where would the opposite have helped? Where do I quit early? Where do I persist? Where would the opposite have helped? What good habits should I continue? What bad habits should I stop? What are the strongest opportunities to thank or praise that I missed? Is there a pattern? What triggers could I use to build this habit? Where have I tried to change people? Where have people tried to change me? Where have I spotted wrong questions? That is, rather than answering the question, I spotted the more apt question and answered that instead? … and a hundred other questions that I wouldn’t even know to ask. Sub-agents can run parallel / independent tasks while keeping the context window small. (But the advantage over xargs seems marginal.) Simon Willison Document, lint, type-check, add test cases (or other similar tasks) for all folders in a monorepo. Research and create a report for each topic in */RESEARCH.md. Synthesize learnings from each conversation in transripts/*.md. “If you’re signed into sensitive accounts like your bank or your email provider in your browser, simply summarizing a Reddit post could result in an attacker being able to steal money or your private data.” Brave OpenAI Atlas has a “Watch Mode” that will stop working if you move away from that tab. Useful to keep an eye on sensitive sites. Simon Willison “… image editing platforms seem like they’ll eat and subsume Photoshop… modern image editors – especially Nano Banana from Google Gemini – … they’re extremely effective and, increasingly, instructable” - Import AI. Facebook now suggests edits to photos - TechCruch. WebPerl runs Perl in the browser via WebAssembly. Simon Willison

Things I Learned - 09 Nov 2025

This week, I learned: “But when an identity based belief was challenged, the brain responded as if under physical attack.” Why Engineers Can’t Be Rational About Programming Languages Notes from How to build a cult, Lulu Cheng, The Knowledge Project podcast Conviction is infectious. Communicate at the INTERSECTION of interests. Learn theirs Begin with “why your story matters to them” (first sentence). That beats “how you tell it” > “where you tell it”. The easiest way to align with an audience is to find your community. Humor, curiosity, awe, any strong emotion is a hook. Culture has momentum. Best way to break it is to show an alternative that works. People will copy that REPEAT messages over and over with complete CONVICTION to convince people who TRUST you. That works, but you need all three. Trust builds from likeability, repeated exposure, common beliefs. An excellent way to defend against online criticism (when it matters) is to just SHOW UP and THANK them for feedback. Serious reputational damage must either be fixed immediately - or you live with it forever. Between a story and statistics, the story will always wins. Never fight a story with a statistic. Dig into your statistics and uncover BETTER stories. ⭐ Prebuttals are a great idea. Start with all possible criticisms yourself and diffuse them. The other person has nothing left to say Sparring keeps you sharp. Spar with LLMs. To defend, show how the attack targets other people, increasing the surface area. Show how the SPECIFIC attack targets a larger group. Create a SPECIFIC cause worth fighting for. Each role has specific objective to optimise for. The leader’s role is to balance across these. Cheerleader effect. People look beautiful next to a cheerleader. Associations taint. Each person has dozens of aspects to their persona. We cannot remember all of them. Each person can make a choice on who they project themselves to be in any group. Shaping their persona. The Rainbow CSV extension may be causing delays (infinite spinner) when pasting Markdown in VS Code. Restarting it seems to fix the issue. ⭐ Claude scientific skills is a collection of skills teaching Claude how to use scientific libraries, databases, and APIs across several domains. This may be a good example of a non-trivial skill library - that is hard for AI coding agents to infer by themselves. Notes from How I use every Claude Code feature Use AGENTS.md as guardrails, not a manual. Document what it gets wrong. Use self-documenting tools/APIs rather than documenting. Docs: Explain why and when to read each doc. Never say “Never.” Explain when to which which alternative. Prefer CLIs for stateless tools, MCPs for stateful, authenticated, or complex (e.g. Playwright). Coding agents work well with version control. Simon Willison Break up uncommitted changes into small commits Rewrite branch history for readability Use gh CLI to fetch line-wise comments from a PR and make requested changes (e.g. renaming, refactoring, adding types, etc.) ⭐ When using MCPs or tools with private data, “color untrusted content in red, unsafe actions in blue, and never mix colors.” Good advice. ⭐ DeepWiki offers a codemaps feature that explains code in an interactive way. It shows a structured explanation on the left. You can click on any note to see the code on the right. It’s an effective way to understand how a library or tool executes a task. Here’s an example of how Mermaid works. Gemini offers RAG with free storage. RAG costs are quite high. This simplifies the process a lot. But I tried running the sample program and after an hour, it still had not completed uploading a single file. Best to wait and watch. OpenRouter supports embedding models using an OpenAI-like API Kimi K2 Thinking seems popular because It’s an open-weights model on par with the top models on Humanity’s Last Exam (text-only) and BrowseComp Can run 200-300 tool calls without human guidance 4x cheaper than GPT-5 with low tokens (32B active on 1T parameters, INT4 quantized) Based on responses to Simon Willison’s question, ChatGPT Fine-tuning helps when: Lower latency, e.g. for type-ahead, at lower cost (37 mentions) Structured extraction, parsing and classifiers, e.g. postal address, detecting secrets (18 mentions) Custom vision models, e.g. check containers (12 mentions) Domain-specific code and stacks (niche languages, stack-specific generation, text→SQL) (11 mentions) … and a long tail. Fine tuning does not help: When A base model plus prompting or RAG does as well or better (15 mentions) When you risk being leapfrogged by a new release (4 mentions) When cost and data do not justify the ROI (3 mentions) The data I can export from my Android phone includes the below. 🟢 indicates it’s tracked. 🟡 might need action, e.g. enabling / coding. # 🟢 GPS/GNSS location (current & history). Turn on device Location. If you want a timeline you can export, enable Google Location History and later export via Google Takeout → Location History (JSON/KML). 🟡 GNSS raw measurements (engineering traces). Android exposes GNSS “raw” logs on many devices; capture with dev tools or logging apps if supported (intended for research). See GNSS Raw Measurements API. 🟢 Wi-Fi scans (nearby SSIDs/BSSIDs). Toggle Location scanning → Wi-Fi scanning in Location settings; apps need location permission to read results. 🟡 Wi-Fi RTT distance to APs (indoor ranging). Apps can use Wi-Fi RTT (802.11mc/az) to measure distance to compatible APs; requires location permission. 🟢 Bluetooth proximity/traffic. For packet-level logs, enable Developer options → Enable Bluetooth HCI snoop log, then pull /sdcard/btsnoop_hci.log (Wireshark). 🟢 Cell towers (IDs, signal strength). Apps can read via TelephonyManager (e.g., getAllCellInfo()), with appropriate telephony permissions. 🟢 Activity recognition (walking, running, in vehicle). Apps must request ACTIVITY_RECOGNITION (runtime) from Android 10+. 🟢 Steps (step counter / detector). Use sensors API; from Android 10+ you must declare ACTIVITY_RECOGNITION to access step counter/step detector. 🟢 Accelerometer / gyroscope / magnetometer streams. Apps read via SensorManager; some high-rate reads require HIGH_SAMPLING_RATE_SENSORS. 🟢 Ambient light / proximity. Read via SensorManager; typically no special permission. 🟢 Google Fit data (steps, workouts, heart rate from wearables, etc.). Manage and export from Google Fit / Google account Download your data. 🟢 Contacts. MIUI → Settings → System apps → Contacts → Import/Export to .vcf (vCard). 🟢 Call history / SMS (device). MIUI local/cloud backup can include call logs & messages; export by creating a local/Cloud backup and downloading. Note: 3P apps can’t read call/SMS logs unless they’re the default dialer/SMS. 🟡 Gmail, Calendar, Contacts (Google). Export via Google Takeout (MBOX/ICS/CSV etc.). 🟡 WhatsApp / Telegram / Signal chats. Use in-app exports: WhatsApp → Export chat, Telegram Desktop → Export, Signal → encrypted backup. 🟢 Advertising ID. View/reset in Settings → Google → Ads (wording varies), per Google help on Ad ID reset. 🟡 Per-app screen time / unlocks / opens. Third-party “usage” apps (e.g., analytics or “digital wellbeing” clones) require Usage Access (PACKAGE_USAGE_STATS). Use Android’s UsageStatsManager or apps that export CSV. Stock Digital Wellbeing does not offer an export. 🟡 Notification history (last 24h). Settings → Notifications → Notification history → On. OEM-optional, but present on most devices. Viewable once enabled. 🟡 Notification content stream (live). Grant an app Notification access to capture/export notifications going forward. (User-granted API via NotificationListenerService.) | 🟢 Per-app data usage (mobile/Wi-Fi). Apps/ADB can query NetworkStatsManager; Settings shows per-app totals. Advanced dumps via adb shell dumpsys netstats. 🟡 Wi-Fi detailed logs. Developer options → Enable Wi-Fi verbose logging for richer diagnostics. 🟡 Bluetooth packet logs. Developer options → Enable Bluetooth HCI snoop log; export file and analyze in Wireshark. 🟢 Per-app storage usage. Apps/ADB can query StorageStatsManager; Settings shows per-app storage. 🟡 Photo/video metadata (EXIF incl. location). Enable “Save location” in Camera app to embed GPS in EXIF; export files normally (EXIF remains). | 🟢 Downloads & file metadata. Use a file manager or connect via USB; metadata is in the files themselves. | 🟢 Battery usage history (per-UID/app), wakelocks, jobs. Generate adb bugreport and analyze with Battery Historian or dumpsys batterystats. 🟡 System/device logs (logcat). You can view via ADB/Android Studio. Android restricts 3rd-party access to system-wide logs for privacy. 🟢 Developer quick tiles (Sensors off). Developer options → Quick settings developer tiles → Sensors off to globally cut Camera/Mic & SensorManager sensors on demand. 🟡 Google Takeout: one-stop export for Location History (Timeline), Gmail (MBOX), Calendar (ICS), Google Photos, Drive, YouTube, Fit, etc. MacroDroid, Automate and Tasker sound like powerful Android workflow automation tools. Some uses I can put it to: Automatically upload recordings to Dropbox Turn off hotspot when I reach office Vibrate if I’m walking slowly Adding <link rel="alternate" type="text/markdown" title="LLM-friendly version" href="/llms.txt"> is an emerging approach for pointing to LLMs.txt. It works. I asked Codex to read the CloudFlare vitest page. It read the file truncating the middle, found the <link rel="alternate" type="text/markdown" href="https://developers.cloudflare.com/workers/testing/vitest-integration/write-your-first-test/index.md"/ link in it, and reasoned “Considering fetching markdown instructions” and fetched the Markdown page. Giles’ Blog toon is a YAML-like format that’s LLM friendly and especially token-efficient (CSV-like) for tables. You can convert back and forth between JSON and toon. Food printing applies 3D printing techniques to create real food items. Given the art that this can create, I expect at least some adoption in niche restaurants. PMTiles lets you store map tiles as a single-file archive that libraries like MapLibre can read. Useful to avoid tile servers. Mirrow is a CLI SVG animation builder that converts a DSL to animated SVGs. However, it may be easier to use an LLM to create the animated SVG directly with SMIL than learning Mirrow (or teaching the LLM Mirrow). ⭐ One approach to giving memory (“episodic memory”) to coding agents is to allow them to search their logs.This gives them access to past discussions about a repo or other repos. To configure Gemini CLI with an AI router, set: "security.auth.selectedType": "gemini-api-key" in ~/.gemini/settings.json export GOOGLE_GEMINI_BASE_URL=https://llmfoundry.straive.com/gemini/ (or your AI router base URL for Gemini) export GEMINI_API_KEY=... (your AI router API key) Passing a HAR export to an LLM to build a scraper is a powerful idea! Lessons from Diagram Chasing Addy Osmani’s Gemini CLI tips are practical guides to using any coding agent, not just Gemini. I learnt about: Run shell commands with !, e.g. !ls -la or even !bash. It’s added to the chat. On-the-fly tool creation: ask it to write code for the task on the fly. Use it for system optimization, e.g. editing dotfiles, system customization, log error analysis, etc. Run GEMINI_SYSTEM_MD=... gemini -p "task" --yolo --format json < input.txt to run Gemini with a different system prompt and feed it input.txt to run in a pipeline. (FYI: Codex does not send a default system prompt, so there’s nothing to override.) There is a Gemini CLI Show and Tell thread with examples. This include Janitor AI, a Gemini CLI session viewer, etc. Hands on with Gemini CLI has several Use cases to try out. Renaming photos and organizing files are clever ones. AGENTS.md can be used like a decision log - rules, styles, or preferences that evolve over time - on a per-repo basis. Gemini’s /memory add feature helps with this. gemini --checkpointing is a useful “undo” feature. /restore rolls you back to a specific checkpoint. The overhead is small. Caching is only available with API key or Vertex AI, not OAuth login as of now OpenAI TTS costs are confusing. But in short TTS-1 costs $15 / MChars (max 4,096 chars per request), which ends up at ~86c / hour GPT-4o Mini TTS costs ~$16 / MChars (max 2K tokens which is ~7,000 chars per request), which ends up at ~88c / hour. Very similar cost, effectively TTS-1 HD is twice TTS-1. OpenAI has a usage API that provides cost as well as usage for completions, images, audio speeches, etc. These require an organization admin key Cost API: curl "https://api.openai.com/v1/organization/costs?start_time=$TIMESTAMP&project_ids=$PROJECT_ID&group_by=line_item" Audio speech usage API: curl "https://api.openai.com/v1/organization/usage/audio_speeches?start_time=$TIMESTAMP&project_ids=$PROJECT_ID&group_by=model"

Things I Learned - 02 Nov 2025

This week, I learned: TVMaze API is an API for TV shows, episodes, cast, crew, etc. Useful for TV-related apps as well as learning APIs. Awesome Skills is a curated list of prompts and skills for AI coding agents. ⭐ nokode is a API server that has no code: just LLMs responding. Interestingly, it is compliant. Just expensive, slow, forgetful and unreliable compared to code. All four are improving with time, indicating that coding may be transitional. Notes from Vanya Seth’s keynote at OSAI HYD Superpowers of Gen AI to keep in mind when exploring AI coding agent use cases: Translating. Requirements to code, code to code, language to queries, standard to standard. Finding info just-in-time (in context). How does this work? What’s this error? What tools are permitted in my org? Who knows what? E.g. Atlassian Rovo queries across JIRA, Confluence, etc. Brainstorming and ideation. Product ideation. Requirements. Testing gaps. Architecture review. Exploratory / scenario testing. Summarizing and clustering. Change logs, incident management, research data, docs summary. Challenges in using AI coding agents: Adoption imbalance. Only certain roles are amplified by AI. Coding, QA, more than planning, maintenance, AI ops, etc. What’s the impact of this? ⭐ Goldratt’s ToC implies that backlogs need to fill faster. Downstream becomes a bottleneck. Technical debt piles up. ACTION: Use AI across entire value chain, from research to maintenance. Locality. enhances roles (nodes), not relationships (links). They optimize local work, not global flow. Workflow tools are missing. Coordination overhead. Context Fragmentation. Translation problems. ⭐ Expand productive roles to cover neighboring tasks. Productive developers shift left and build backlogs; shift right to reduce code review, maintenance tasks. E.g. Move maintenance/production activities into development. Security, performance, monitoring, observability, cost, infrastructure. We spend time on IDE, CI/CD, Jira, Confluence, Prod observability tools. A typical Agent Development Platform (ADP) covers evals, guardrails, workflow builder, agent builder, observability, prompt management, AI gateway (LiteLLM), MCP servers, model fine-tuning, model serving, model repository, vector stores We need ADP Agents covering delivery risk, continuous security, prod issues RCA, observability, performance, accessibility, product research, infra optiimzation, test data generation, anomaly detection, release management ACTION: Share ADP photo with Patrick. ACTION: ⭐ Centralize skills (“knowledge packs”) and MCPs and observe which gets used most. Allow people to use more. Lethal Trifecta. There’s growing demand for higher productivity with AI code assistants. But the lethal trifecta makes them an attack vector. It has access to sensitive information, exfiltrate data, and read and follow unsafe instructions. Can lead to supply chain poisoning attacks. Regulated industries cannot adopt. Technical debt growth. More productivity leads to poor code quality which will slow down future work. See Software Engineering Excellence 2025 AI induced complacency. Sunk-cost fallacy on AI-generated code hurts. ACTION: Evaluate code quality continuously to reduce technical debt. Double-down on good engineering practices. Compliance. Model residency. Self-hosting is required. Data observability gaps. Data privacy, audit trails, etc. are concerns. Token economics. $20/day happens in Thoughtworks. Token cost is subsidized. Rogue AI usage. Use of dis-allowed tools; shadow IT. ROI justification. Hard to quantify productivity gains. Adoption. AI Literacy. Tap into organizational knowledge Champions & communities of practice to support cross-pollination. Use-case driven adoption. Teams identify based on AI superpowers. AI playbook. Share what worked, what didn’t work. AI automation is likely less if a high portion of work Has legal liability (e.g. pharmacist/judge vs shop attendant/lawyer) Is subjective (e.g. perfumer/auction appraiser vs lab chemist/insurance appraiser) Needs rapid contextual decisions (e.g. detective/fireman/ER vs parking enforcer) Via ChatGPT, Claude parse-sse from Sindre Sorhus is a more standards-compliant, more likely-to-be-maintained alternative to my async-sse package. Which is better: Comment A: 1 upvote, 0 downvotes (100% positive) or Comment B: 99 upvotes, 1 downvote (99% positive)? Use Wilson’s Lower Bound which measures “What % positive am I 95% confident of?” Claude Using this, we can measure metrics for tweets, like below. ChatGPT Popularity = (5 _ WLB(reposts / views) + 2 _ WLB(likes / views)) * Decay(half-life of 72 h) Memorability = (5 _ WLB(bookmarks / views) + 4 _ WLB(replies / views)) * Decay(half-life of 36 hours) A nice visual “benchmark” of text-to-image and image editing models. Seadream 4, Gemini 2.5 Flash, and Qwen Image Edit lead. This includes examples like straightening te Tower of Pisa - which only Flux.1 and Seadream 4 do well on; or removing only the brown M&Ms - which only Qwen Image Edit manages to. Arch is a pure LLM router. It supports multiple LLMs, flexible routing and observability but not auth. From Codex docs Add custom prompts in ~/.codex/prompts/xyz.md and launch as /prompts:xyz. Optional: description: and argument-hint: in YAML front-matter. For example, create prompts to refactor, rewrite in a developer’s style, document AGENTS.md, identify re-usable code, etc. AGENTS.override.md overrides parent directory AGENTS.md. AGENTS.md appends to parent AGENTS.md. Fallback names are allowed. codex exec supports streaming JSON codex exec accepts a CODEX_API_KEY= environment variable. codex uses an OPENAI_API_KEY. You can configure which environment variables are passed to the shell Codex reads 32KB from AGENTS.md by default Things that I currently follow and don’t follow from Peter Steinberger’s excellent Just Talk To It: Prefer Codex > Claude Code. Ask for options before executing Generate & review specs collaboratively You don’t need git worktrees Prefer subscriptions over API to reduce cost Store docs with code Give examples Use voice input Use Codex Web as a mobile inbox for ideas Prefer CLI over agentic platforms Prefer CLI tools over MCP Avoid ALL-CAPS for Codex. It follows instructions well Avoid sub-agents, RAG, etc. Iterate UI live. Watch changes Use 3-8 agents in parallel on a single repo. Make small, atomic commit checkpoints. Commit only what the agent touches Add ast-grep as a pre-commit hook to block rule violations. Keep custom prompts minimal (commit, automerge, massageprs, review, …). Just “commit” reduces context Cancel long tasks and ask what’s happening Prefer Medium over High reasoning. It decides level of thinking Share screenshots Use tmux to run CLIs persistently Schedule refactor time (20%). Use jscpd, knip, oxlint, … Don’t reset context. Cold start wastes time + tokens Write tests in the same context. Yields better tests, reveals bugs. Prototype in a separate folder / PR Queue continue messages** before stepping away Ask it to “Preserve intent and add comments at tricky spots”. Future you needs the WHY On hard problems, add “take your time”, “be comprehensive”, “read all related code”, “form hypotheses”, etc. Maintain an evolving AGENTS.md with product notes, naming, API patterns, test policy, ast-grep rules, etc. Delete stale guidelines Fascinating implications from Quantifying Human-AI Synergy ChatGPT Models vary in ability to uplift humans. Don’t just use standalone model benchmarks. People vary in ability to work with AI. Don’t just measure solo skills. Reward AI collaboration ability (delegation, prompting, verification, revision, …) Train models to ask for missing Theory-of-Mind cues: goal, beliefs, constraints, audience, success test Train people by asking them to predict what the model will get right/wrong, and validate Design UI and models for synergy. UI: Surface/solicit assumptions, intent, uncertainty, constraints. Model: Infer & adapt to evolving user state. OpenRouter image generation now includes GPT-5 Image Mini. An image costs about 1 cent. Here’s the code: curl 'https://openrouter.ai/api/v1/chat/completions' \ -H "Authorization: Bearer $OPENROUTER_API_KEY" \ -H "Content-Type: application/json" \ -d '{ model: "openai/gpt-5-image-mini", messages: [{ role: "user", content: "Draw a cat" }], modalities: ["image"], image_config: { "aspect_ratio": "16:9" } }' | jq -r '.choices[0].message.images[0].image_url.url' | cut -c23- | base64 -d > cat.png

Things I Learned - 26 Oct 2025

This week, I learned: Before founding a place to do good, work in a place that does good and learn. Ben Werdmuller What should we teach when vibe coding becomes good enough for non-coders? Ethan Mollick Problem decomposition Clear communication & spec writing Core technical foundations: file systems, access control, networking, APIs, version control, data structures, databases, deployment Software development skills: Debugging, Testing, Refactoring, Design patterns, UI/UX Project management: requirements, prioritization, scoping, … Codex CLI tips: codex --add-dir $DIR lets you write into $DIR codex --full-auto is the equivalent of codex --sandbox workspace-write --ask-for-approval on-request Terse code is not necessarily easier or harder for LLMs to write. It’s about how unusual (or not aligned with training data) the code is. Gabi Teoduru How are people using browser agents like Comet / Atlas? Simon Willison Most popular: YouTube video summaries with timestamps Most useful: Form filling: Government forms, data entry, repetitive bureaucratic tasks Foreign language navigation: Applying for pension in Korea, navigating sites in other languages Time reporting auto-completion Insurance claims: Reading policy documents and drafting appeals (successfully got claim reimbursed in India) Compliance training click throughs Next most useful: Shopping / planning Energy provider comparison - Comet checked current plan vs competitors on Check24, calculated exact annual savings per provider Financial tracking: Finding Amazon orders, tracking Airbnb spending with refund calculations, analyzing bank transactions Trip planning: Mapping 50-100 places on Google Maps automatically Interesting: Airport shuttle discovery - Found shuttle that user missed in manual searching HubFS mounts GitHub repos on the file system. Every file system action directly works on GitHub via a REST API. Useful for some scenarios but less useful for note-taking than something like GitDoc which offers a delayed sync. Ernest Ryu solved an open problem in convex optimization using ChatGPT. Quotes: ChatGPT is now at the level of solving some math research questions, but you do need an expert guiding it. ChatGPT was really effective at accelerating my progress. This work took about 12 hours, spread over 3 days. In hindsight, the proof is really simple. But I iterated through so many other strategies that didn’t pan out, and ChatGPT crucially helped to quickly explore and eliminate those dead-end approaches. Also, the key successful steps were suggested by ChatGPT. ChatGPT did not produce the proof in a single prompt. The process was highly interactive. It generated many arguments, roughly 80% of which were incorrect. Yet some were genuinely novel to me. Whenever I recognized a novel idea, whether correct or only partially so, I distilled the key insight and prompted ChatGPT to develop it further. My contribution: Filtering out incorrect arguments and accumulating a set of correct facts. Identifying promising new lines of reasoning and guiding ChatGPT to explore them further Recognizing when a strategy had been fully explored and deciding when to move on. ChatGPT’s contribution: Producing the final proof argument. Significantly accelerating my (or our) exploration of the many dead-end arguments, rapidly ruling out approaches that did not work. Comparing the GPT 4.1 and 5 models at all different of reasoning, I’ve switched my default from GPT 4.1 mini to GPT 5 mini (medium). Far smarter for a slightly higher cost. Artificial Analysis python -m pdb -c continue script.py or uv run -m pdb -c continue script.py runs a script and drops into pdb on unhandled exceptions (post-mortem). ChatGPT Technology removes constraints. We then do what we really value. Claude When writing became digitized, we stopped cared about spelling/handwriting for its own sake. Spelling bees and handwriting classes declined. “ur” is acceptable. When fitness tracking became easy, many just track, few exercise more. Few people value exercise When GPS became ubiquitous, we stopped learning geography. Most value arriving, not knowing When photography became unlimited, most captured moments. Few perfected shots I had Codex scrape my ~2,000 pending invites on LinkedIn and asked ChatGPT to analyze it. Here are learnings: ChatGPT, private Power-law. 5% of inviters account for ~42% of all common connections. Top 10 people alone for ~20%. IITM student invites are high (~14%), but with 0-2 common connects, i.e. distant strangers. EdTech is tiny in count but has the highest common connections per person (outlier-sensitive but real). Among ≥20-commons, many hold VP/Head/Site-Lead titles in Data/AI or GenAI (not just recruiters). GenAI people are 7-8% and steady across months. Not a useful signal to prioritize. Premium ~ Senior. Premium accounts show ~40% senior titles vs ~29% for non-premium. Finance invites have higher seniority rate and more common connects than healthcare. Followers have higher common connections (~6 vs ~4). ⭐ Memory can be code. Agent memory is anything it choose to persist. Agents can write code on the fly to automate tasks, save them, and serve the code on the next request, potentially modifying the code as required. This is like the conscious mind saving a habit for the subconscious to execute fast. Finally: Microsoft Office has an agent mode that lets you talk to it and do stuff. The Verge

Things I Learned - 19 Oct 2025

This week, I learned: ⭐ “… most engineers don’t have public commits. Senior engineers at large tech companies don’t work on open-source projects for the most part.” Why AI Can’t Do Hiring Cloudflare’s Sandbox feature in their Workers looks impressive. It supports streaming, web access to the container, and long-running processes. So we can spawn off a task and have it run a server (at least for a while) or a scraper. Gemini API has a Google Maps tool that it can refer to - like Google Search. Maps Grounding Earlier we needed humans to label data for RLHF. Now we don’t since AI can simulate it. This is a pattern. Once AI learns from a human, that human skill can be automated. How GPT-5 Thinks — OpenAI VP of Research Jerry Tworek The <output> element has a for= attribute indicating which <input> elements it is linked to and a form= attribute indicating which form it belongs to. This works well with screen readers. A good reason to use it more. Examples. Meta built a Code World Model. Basically an LLM that acts like a Python interpreter! sudo apt install moreutils installs a set of useful packages: chronic. Runs a command quietly (suppressing output) unless it fails — good for cron jobs where you only want noise on errors. chronic backup.sh combine. Combines lines from two input streams/files using boolean operations (AND, OR, XOR). combine AND fileA fileB errno. Look up symbolic names, numeric codes, and descriptions for standard errno values. errno -l; errno ENOENT; errno 2 ifdata. Query network interface properties (IP, byte counts, errors) in a script-friendly format. ifdata -sip eth0; ifdata -bops eth0 ifne. Run a command only if stdin is not empty, passing the input through. find . -name core | ifne mail -s "Core files found" admin isutf8. Check whether a file or stdin is valid UTF-8. isutf8 somefile.txt lckdo. Run a command while holding an exclusive lock to prevent concurrent runs. lckdo /var/run/mylockfile.cmd myscript.sh mispipe. Pipe two commands, but return the exit status of the first one (useful in pipelines). cmd1 mispipe cmd2 parallel. Run multiple commands in parallel, reading them from stdin or arguments. parallel < jobs.txt pee. Like tee, but sends stdin to multiple commands in parallel. echo "foo" | pee cmd1 cmd2 ⭐ sponge. Soak up all input before writing to output — enables in-place edits safely. sort file | sponge file ⭐ ts. Prefix each input line with a timestamp. tail -f logfile | ts vidir. Edit a directory listing in your editor to rename, move, or delete files in bulk. vidir ~/myfolder vipe. Insert a text editor into a pipeline to manually edit streamed input before output. cat file | vipe | wc -l zrun. Transparently decompress compressed files before passing them to a command. zrun cat file.gz Despite 20 years of SVG experience, I learnt new things from A Friendly Introduction to SVG and A Friendly Introduction to Paths Setting a <rect> width/height or a <circle> radius to zero removes the element instead of drawing a point. There’s no option to draw the stroke on the inside or outside of a shape/path. Only the center. You can override a path’s pathLength attribute to create a new internal scale for its length. It’s unclear where I can use this. <path> arcs have this syntax: A [rx],[ry] [rotation] [large-arc-flag] [sweep-flag] [end-x],[end-y]. SVG first fits an ellipse to these parameters and then draws the arc. If rx and ry of an arc is too small to connect the points, the SVG spec scales up rx and ry. [large-arc-flag]=1 literally uses the larger arc of the fitting ellipse. This is less common. [sweep-flag]=1 its the ellipse to make the connecting arc go clockwise. 0 is anti-clockwise. [rotation] is rarely used because we usually draw arcs and then rotate them. stroke-linejoin automatically flips from miter (sharp) to bevel (cut) if the sharp edge protrudes too long (e.g. small angles). Increasing stroke-miterlimit increases the cutoff (default: 4) ⭐ Always include a thoughtful gallery of examples with tools / libraries. This does more than showing what a tool can do. It’s use-case / domain transfer: showing what it’s useful for in real life - opening ideas, suggesting workflows. It’s style transfer: showing how to use it. ⭐ Here’s what expert AI coders increasingly focus on. Thomas Dohmke Delegation: context engineering agents for success; parallelizing. Verification: efficiently reviewing and testing code/output; setting stop-points. Expanding scope: instead of time saved as the metric. Education: teaching AI-based coding, debugging, reviewing/testing. Product management: combining requirements + UI design + architecture + engineering + deployment. Cross-discipline: blending code with design, governance, finance, marketing, … (“computational creators”). Notes from Taylor’s How I’m using coding agents: October 2025 Left monitor: 2-4 desktops (e.g. work, side-project). Right monitor: things I always want available Plan next task while first executes. Use plan mode to write to a plan file. Don’t start big tasks if you have meetings scheduled soon. Recent open source package hack methods seem to work more because of people/process than systems (Filippo): Phishing the author Pull requests running unsafe code in CI Taking over expired domain / user ID Stealing long-lived tokens uv run --python 3.14 --isolated --with-editable '.[test]' pytest runs pytest on a local project with a specific Python version. Simon Willison Notes from the State of AI Report 2025: Reasoning models are more fragile. Irrelevant phrases make reasoning models spend FAR more tokens and get wrong answers #21 AI systems are able to teach experts new concepts #41 An environment providing feedback / rewards enables continuous learning #52 E.g. Multi-robot chemical labs at U.Liverpool and NCSU #60 RLHF has a fundamental flaw: humans reward sycophancy #71 We can read what people are typing from brain signals outside the skull #73 Model intelligence-to-price ratio doubles every ~6 months #94 The AI companies’ valuations are also roughly doubling every ~6 months #181 OpenAI is offering Governments giga-watt campuses to run OpenAI models for citizens #122 A 1GW clusters costs $50bn capex and $11bn per annum #130 China has added ~10X the energy capacity as the US in 2024 #146 NVIDIA challengers are still far away #161 LLMs can “read between the lines” even if training data is censored #268 LLMs can pass information via hidden signals #270 Prediction: A major retailer reports >5% of online sales from agentic checkout. AI agent advertising spend hits $5B. #304 OpenAI’s leadership guide says: Align Explain WHY AI thoughtfully. Set a goal, e.g. everyone uses ChatGPT 20 times/day (Moderna). Use it yourself. Show how. Have business leaders run AI sessions Activate Launch an AI skills proram Set up an AI champions network Encourage experimentation (dedicated time, workshops, hackathons, …) Link to performance evaluations Amplify Create an AI knowledge base Share success stories (weekly) Create internal groups (Teams, Slack, …) Celebrate AI wins Accelerate Unblock AI tools and data access Simplify project selection. Quick feedback, clear priorities Unblock projects with a cross-functional council Give resources to successful teams Govern Publish a responsible AI playbook (what’s safe to try) Audit AI practices quarterly

Things I Learned - 12 Oct 2025

This week, I learned: ‘…as few as 250 malicious documents can produce a “backdoor” vulnerability in a large language model… data-poisoning attacks might be more practical than believed." Anthropic Tim Urban’s 2015 article, The AI Revolution: The Road to Superintelligence, is surprisingly relevant. A key theme is that post artificial-super-intelligence, pretty much anything we know / predict is probably wrong. LLMs are bad at asking questions, so you need to plan on their bahlf first. LLMs are bad at copy paste, so giving them a scaffolding to edit helps. Two things LLM coding agents are still bad at The VPN industry is a consolidating oligopoly that doesn’t offer much security and biases towards affiliates. Who Owns Express VPN, Nord, Surfshark? As of 2025, a fine-tuned DeBERTa-v3-Large / RoBERTa-Large model is better than an LLM at emotion classification. roberta-base-go_emotions is a good starting point if you don’t want to fine-tune. ChatGPT OpenAI defines an AI agent as “a system that can do work independently on behalf of the user”. swyx Brain coding is the new term for human coding - as opposed to vibe-coding (AI codes, human doesn’t review code) and AI coding (AI codes, human reviews code). npx -y emoj lets you type text and pick a relevant emoji. Many people who shifted away from conflict aversion did so by systematizing it. ChatGPT Martin Luther King Jr institutionalized not stepping back from conflicts in his movement. Kim Scott (Radical Candor) practiced caring more via short, specific feedback loops. Kwame Christian (Compassionate Curiosity) practiced ask open questions. Ed Catmull (Pixar) instituted Braintrust to ask candid questions. Ray Dalio (Bridgewater) instituted radical transparency. Many people who adopted a failure-seeking mindset made failure frequent, small, cheap, and informative. ChatGPT Jia Jiang ran a 100-day rejection challenge, acclimatizing himself to failure. Kim Liao (writer) moved from submission-avoidance to “100 rejections/year”. Reshma Saujani (Girls Who Code) built a practice of “brave, not perfect” - ship before perfect. Ray Dalio (Bridgewater) instituted mistake logs and “pain + reflection = progress”. Astro Teller (X, the Moonshot Factory) rewired incentives so teams are rewarded for killing their own ideas early. Sara Blakely (Spanx) set weekly failure quotas. Kathryn Schulz (author of Being Wrong) converts failures into teaching methods. Sindre Sorhus has already created a micro-framework css-extras using CSS @functions. Today, if I had to build agents, here are the tools and environment capabilities I’d ask for: Ask user (for clarifications) Internet tools Search Fetch (CORS-piercing) Scraper with XPath/CSS Selectors Access to llms.txt LLM APIs Summarizer (condenses chat) Sub-agents Coding tools Markdown convertor Code execution (including tests) Browser + DevTools for testing Memory / storage Tool/MCP directory with search Noting a few things that I find #impossible to do today with LLMs: LLMs can’t run experiments / explorations, like trying out on a new tool or web app in an environment, the way I would. LLMs can’t move stuff on my machine, e.g. notes from one list to another, when they’re only on my laptop, not GitHub. LLMs can’t capture the past wisdom in my head, e.g. the distilled principles of data visualization that we applied at Gramener. LLMs can’t prioritize my to-do list based on my preferences and what’s important to me. LLMs cannot write a blog post in my style of writing. When recruiting for people in the LLM era, look for questioning ability, sensible thinking, and how they use AI. Give them lots of fluff and context. Can they cut through it? Is their answer concise and to the point or waffling? Like post the industrial revolution, more people will become operators looking after AI, not craftsmen. This includes coding. zx is a nice JS-based alternative to shell scripts. const branch = await $`git branch --show-current`; await $`dep deploy --branch=${branch}`; docker run -it --name test --user vscode mcr.microsoft.com/devcontainers/base:ubuntu gives you a test Ubuntu image closer to a desktop / user setup rather than a server. Useful to try out apps.

Things I Learned - 05 Oct 2025

This week, I learned: Wrong answers are useful if you discover why they said that. Conversation is a game where you CO-CONSTRUCT common ground. Mike Caulfield BMTC hourly data from Bangalore Metro is available via RTI. Vivek “Find evidence for and against” improves LLM responses far more than “Are you sure?” Mike Caulfield SSH3 is an emerging SSH alternative that’s written on top of HTTP/3. It supports OAuth2, OpenID Connect, and HTTPS for certificates. Cholesterol has become a victim of its own success. We give statins to those with high LDL. So most people who have heart attacks have lower-than-natural cholesterol. Inflammation (HS-CRP) is now the strongest predictor of heart attack (American College of Cardiology). The usual stuff reduces HS-CRP: no sugar/carbs, veggies, nuts, green tea, turmeric/black pepper, weight loss, exercise, sleep, meditation. ⭐ The beginner mindset: scrub your instincts and don’t let life experience cloud you. This takes effort. Hold on to naivette and escape cynicism. The Knowledge Project: Barry Diller Forecasts give comfort. They may not be good but they feel safer than instinct. The Knowledge Project: Barry Diller My laptop’s mic is much better than my phone’s mic, surprisingly. When recording conversations, it’s better to leave my laptop open and record than use the phone’s recording app. ⭐ Here are the major not-immediately-obvious LLM megatrends/superpowers I see. Swarms. Ask for dozens of solutions in parallel. Merge, rank, auto-debate, converge. Personalize at Scale. Create feedback, designs, excerpts/summaries, … tailored to EACH person at scale. Computer use. Agents operate UIs like a human (browser, apps). LLM-as-a-judge. Use AI to validate ever-increasing AI generated output. Synthetic data. Create realistic data for prototypes, testing edge cases, market research simulation, training data, … Code on demand. Ask for outcomes directly. Agents code on the fly to get there, in data science, research, management, … Style transfer. Copy a master’s style of drawing, coding, writing, … creating an army of their apprentices. Multi-modality. Native voice/video/screensharing and long-context perception Citizen experts. Non-expertise is not a barrier. Amateurs can create expert-level films, music, software, reports, … Long-context LLMs. Growing context size lets us process entire repos, legal libraries, personal lifelogs, … Memory. Assistants learn per-person / per-team. Cuts prompt, builds knowledge. Agent-to-Agent. Agents consuming content (e.g. llms.txt), agents calling agents (sub-agents, A2A protocol, …) Real-world tools. Write reports, send emails, shop online, use computer, control devices, … Jagged frontier. AI is great at certain things but terrible at others. This frontier is unknown and shifting rapidly. Lethal trifecta. You can only have 2 out of these 3: private data, untrusted content, and external communication. Edge/Private AI. Small models on private cloud compute. Authenticity. What content is authentic? What’s slop? What’s fraud? Are AI twins liable? AI Governance. Strict liability, transparency mandates, state control, … Not sure about or haven’t seen enough of these: Data / workflow as the moat AI native business models AI digital-divide ⭐ What I’d like to do next, maybe, is build a boutique “AI Studio”. Small group of good people coding delightful AI problems. Something that doesn’t scale. GLM models can be used with Claude Code. At $3/month and a quality close to Claude 4 Sonnet, this is a good deal. But the effort of adding a new subscription is too high for me. I’d rather use it via OpenRouter which is doesn’t support an Anthropic API end point at the moment. typst is a good LaTeX alternative. Markdown-like syntax with fast rendering. Mostly useful for researchers using LaTeX. But publishers / journals don’t accept typst often. libSQL is an SQLite compatible fork with remote access, replication, ALTER TABLE to modify columns, random ROWID, etc. It supports the same externsions. The maintainers are working on turso - a SQLite compatible improvement with async, vectors, change data capture, etc. (still in alpha). But because of this, I’m a bit uncertain about the future of libSQL. ⭐ LLM benchmarks show a correlation of ~0.5, hinting at a common theme of intelligence. Correlations in coding & science are particularly high. Ethan Mollick. Reminds me of student marks correlations. Strong correlation clusters (physics, chemistry, biology, mathematics, computer science) with the weaker correlations going down to ~0.5. What does it indicate? LLMs learn like people? Knowledge areas cluster? Humans write benchmarks like exams? Dayflow records your screen at 1 fps and uses Gemini to summarise your activity every 15 min. Has low CPU usage. ⭐ Code Mode is a smart way to use MCPs and a very likely future direction. Using LLMs to write code to call MCPs rather than directly. Cloudflare supports an AI Index which will eliminate the need for a lot of custom RAG engineering.

Things I Learned - 28 Sep 2025

This week, I learned: selectolax is a fast, easy-to-use, modern HTML5 parser with CSS selectors. A good replacement for lxml.html. The most effective way to convert a blob (e.g. file input) to a data URL on the browser seems to be via the FileReader API. const blobToDataURL = (blob) => new Promise((res, rej) => { const r = new FileReader(); r.onload = () => res(r.result); r.onerror = () => rej(r.error); r.readAsDataURL(f); }); Tool calls in OpenAI support files and images. OpenAI ⭐ “Task parity is not the same thing as job parity There is a lot of complexity as many different tasks are bundled into jobs, and many jobs contribute to processes inside an organization The jagged frontier of AI ability means doing tasks well doesn’t translate to doing jobs well.” Ethan Mollick Adding // @ts-check to a JavaScript file and documenting types via JSDoc might be the simplest way to migrate phase-wise from JS to Typescript. envsubst < file.txt replaces file.txt with the environment variable, e.g. $HOME is replaced by the HOME environment variable. Clean shell-level templating. GitHub Copilot CLI is out. npx -y @github/copilot Compost is the cheapest thing per ton that I can buy on Amazon India. I can buy 1 ton of compost for Rs 13,500. ChatGPT yt-dlp requires Deno from now on. #14404 In meetings, make cameras optional by default – and judge engagement by contributions, not video – because a 4-week field experiment found camera-on increased fatigue and reduced voice, especially for women and newcomers. Camera on early for trust building is useful. PubMed wrkflw is a quick and light way to test GitHub actions before publishing. It runs GitHub actions locally. GPT-5-Codex is available as an API and on LLM. Simon Willison ⭐ I’m habit engineering, i.e. discovering and stacking habits on to existing ones. For example: ChatGPT suggested increasing observability based on code reviews. I’m including it in my weekly codecast. ChatGPT suggested defining closures inmeetings. I’mn now discussing objectives at meeting starts and effectiveness at the end. Since Anaconda cannot be used for free by organizations with 200+ people, Straive’s received legal notices from Anaconda. Since laptops are under central IT administration, they went ahead and deleted all Anaconda instances. Installing miniconda for use with conda-forge requires admin access that most developers do not have, however. That leads to an interesting “No Python” situation. This is where uv becomes the knight in shining armor. Perceptron is SOTA LLM for object bounding boxes. Just 2B parameters. Gall’s “law” says that complex systems that work evolved from simple systems that worked. But a complex system designed from scratch won’t ever work. This holds in uncertain environments. But where formal theory or regulations exists, it doesn’t. ChatGPT uvx --with visidata vd gives you a command-line Excel editor to edit / convert CSV, Excel, JSON, SQLite, directories, etc. uvx markitdown https://example.com/ fetches example.com as Markdown. I learnt this when I told Codex it could use uvx markitdown to convert PDFs and it figured this part out by itself. The Dropbox connector for ChatGPT is the little flaky – at least on Android. It could not identify a file that was clearly there in Dropbox and I had to upload it manually. ChatGPT’s output is too dense for me. I added this to my custom instructions: “Write in simple language. Explain non-obvious terms intuitively.” yt-dlp has a --download-sections option that downloads specific YouTube time ranges. For example --download-sections "*00:01:00-00:03:00" downloads roughly (not exactly) from 1 min to 3 min. Note the * at the beginning. My Lenovo laptop’s touchpad started scrolling instead of moving when I moved my finger. Many things could have caused it, but the solution was to click (not tap) the top middle of the trackpad. ChatGPT The India Entrance Exam database is a dataset collating Indian entrance exams.

Things I Learned - 21 Sep 2025

This week, I learned: When editing an image, ChatGPT’s non-thinking mode does a much better job of preserving the original image features than the thinking mode. When editing my photo, I found that the thinking mode creates images that looks quite different than me. A surprising effect of overthinking. ⭐ When evaluating model accuracy, compare with human accuracy rather than perfect accuracy. SMEs rarely agree among themselves, so it’s unlikely that they will agree with an LLM. Instead, measure how often the LLM agrees with the majority of SMEs and how often it disagrees with all SMEs. This gives a more realistic measure of accuracy. LLMs instead of Human Judges? and Judging LLM-as-a-Judge. ChatGPT I understand at least one mechanism of how costs are inflated in large organizations. Even people who want to keep costs low find that the process of tracking expenses, submitting receipts, answering questions around approval, adds transaction cost. So, rather than going for a $10 plus top up mechanism, I would rather go for and ask people to take a $500 top up. Better ask for more and waste than have to ask again. YouTube downloaders: yt-dlp for the CLI, Stacher for Windows/Mac/Linux, Cobalt for a web-based app. Ref VS Code a bunch of features I discovered: It can run a terminal in its own new window for over a year (via Ctrl+P > Terminal: Move Terminal into New Window). Now, Ctrl + Alt + Shift + ` does this directly. Terminal Intellisense shows completion suggestions in the UI. Very helpful. Ctrl+Space triggers the menu completion. ⭐ “We find that the per-step error rate itself rises as the task progresses”, i.e. once a conversation goes the wrong way, it’s really hard to correct it. The Illusion of Diminishing Returns Japonaise Cake is the name of the pastry that I had as a child and grew up longing for. I have spent several weeks searching for it in the roadside bakeries at Bangalore and Chennai but only one bakery seems to have it. systemd is the modern way to run scheduled jobs, instead of cron. It’s far more complex. But it can catch up on missed runs via a Persistent option. Working with systemd timers ⭐ Vice-chancellors of universities resist AI in education because (a) their faculty does not know AI and (b) AI is unreliable. But they are interested in (a) large-scale AI-evaluation and (b) AI-enabling entire campus. tldr.sh offers concise man pages, e.g. uvx tldr jq. cheat.sh offers detailed examples, e.g. curl cheat.sh/jq or curl cheat.sh/:help. ugrep is a fast drop-in replacement for grep. It supports fuzzy search with a customizable Levenshtein distance. Also ug -Q shows an interactive TUI searches like VS Code’s “Search in Files” feature. Very intuitive. Dagger lets you write CI/CD workflows in Python. I tried running it but after 7m of pulling large Docker containers, I gave up. Too heavy. dotslash lets you write scripts that downloads GitHub releases, caches, and runs them. Requires writing scripts. I prefer mise. ChatGPT has a quota for searches. I saw this phrase in the reasoning traces: “I’ll avoid overloading on citations since we only have a few calls left.” It doesn’t seem to be in ChatGPT’s system prompt from last month, so it’s either part of the tool response or a new prompt. Depending on the underlying chips that a model uses, the floating point multiplications may differ and model quality can vary. So Claude 4 Opus running on Anthropic’s GPUs can produce different results from when running on Google’s GPUs or Amazon’s GPUs.

Things I Learned - 14 Sep 2025

This week, I learned: Though I’m connected on LinkedIn with people I can’t remember (weak ties), pruning them shrinks serendipity. Weak ties, despite noise, are disproportionately valuable for opportunities, e.g. intros, jobs, and pruning reduces future upside. Science Claude has a Python + Node code interpreter that can access GitHub, PyPi, npm and Google. Simon Willison SuperTinyIcons has very small icons for many websites and is available via CDN. Sample: http://cdn.jsdelivr.net/npm/super-tiny-icons/images/svg/github.svg Clock bench is an LLM benchmark based on how well LLMs tell the time from an analog clock. Humans (89%) are much better than the best model (Gemini 2.5 Pro - 13%). Veo 3 is now available via API. Veo 3 fast is 15s/second. Google ChatGPT has full support for MCPs via Developer mode in Plus and Pro accounts, via “Developer mode”. OpenAI In Pyodide, you can use from js import document and then document.querySelector to manipulate the DOM directly from Python. from pyodide.http import pyfetch lets you use fetch. gtrending is a Python package that fetches trending GitHub repos, users, etc. uvx gtrending repos --language rust --since weekly fetches trending Rust repos of the week. astgrep lets you search in code (across languages) using AST patterns. Like semgrep but more about code search than security. uvx --from ast-grep-cli ast-grep runs from the CLI. Useful for code rewriting, fast linting, code search. hurl is a CLI config-based HTTP automation tool. Useful for tests, bulk (templatized) HTTP requests, etc. rustdesk is an open-source remote desktop software. TeamViewer alternative. Self-hostable. prek is a much faster version of pre-commit - a cross-language pre-commit hook manager. ⭐ mise is a tool version manager. Combines nvm/fnm, pipx, etc. Supports running several tools with a smooth installation. The npm phishing email was a great one. It compromised chalk which is used in most npm packages. This may be one of the best supply chain attacks in recent times and makes me want to pin versions instead of using npx -y. Also makes me glad that I’m sponsoring @isaacs and @sindresorhus - two critical open source maintainers. “I pay for YouTube Premium. For my money, it’s the best bang-for-the-buck subscription service on the market”. - Gavin Andregg LLMs are non deterministic because GPUs add floating point numbers concurrently and FP addition is non associative - order matters. Thinking Machines Claude.ai can natively work with Excel, PPTX, DOCX, and PDF files now. With embeddings, atomic labels + hierarchy beat instruction-heavy prompts. Prefer short, concrete sub-labels (e.g., “promotion,” “job security,” “flexibility”) that roll up to a parent “career” rather than a composite instruction like “Total Rewards and Career Growth”. Embedding similarity is not smart enough to figure this out. Today, RPA is cheaper than LLMs in some areas. But it’s a moving target. LLM costs are fall fast: 70–90% declines across major providers in 1.5 years. Therefore, waiting has option value. But classic IT compares static quotes, not declining curves, and hence is likely to under-procure LLM solutions. ⭐ The biggest near-term ROI for LLMs in data science is like ‘boring’ data work: PII tagging, data dictionaries, ER/joins, SDTM mapping, etc.. People expect flashy GenAI, but LLMs can bootstrap schema matching and data-cleaning, speeding engineer verification, which is more useful at scale. You can create an infinite leaflet map with nano banana. Codex CLI with high reasoning effort seems far more comprehensive than Codex online. I asked both to identify the system requirements (URLs to access, software to install, ports to open) for my Tools in Data Science course. Codex CLI got it right one shot (after 10 minutes of thinking). Codex online missed several items even after 4 attempts. The Reod on Elantris might have been triggered by Jaddeth who might be an Autonomy avatar. ChatGPT Output tokens dominate latency. Decoding is sequential (one token depends on all prior tokens), so long completions are the main throttle. Shrinking returned text (e.g., send spans/tags instead of echoing paragraphs) yields a far bigger win on latency than shrinking inputs.

Things I Learned - 07 Sep 2025

This week, I learned: A quick way to get the docs for an npm package is npm view package-name readme. For PyPi, it’s curl -s https://pypi.org/pypi/package-name/json | jq -r .info.description Searching embeddings of text summaries of images improves vision search a lot. Jason Liu LLM vision capabilities are far from enough to click accurately. The AI Digest GLM supports the Anthropic API. So it’s possible to use Claude Code with GLM 4.5. z.ai gitingest has a CLI. uvx gitingest https://github.com/owner/repo fetches the code in the Git repo suitable for passing to an LLM. Claude’s API has access to a code execution tool via the code-execution-2025-08-25 beta header. It runs Python 3.11 with 1GB RAM and 5GB disk space, with Internet disabled. The containers persist for 30 days and can access uploaded files. Anthropic You can use the <script> tag in XML to render RSS, as an alternative to XSLT. Jake Archibald browser-fs-access is a ponyfill for the File System Access API and should be the go-to approach for reading and saving files via the browser. ⭐ To run a Python project directly from GitHub, use uvx --from "git+https://github.com/owner/repo.git@branch" script-name Github1s is a cool tool. Replace github.com with github1s.com to get a VS Code page that opens that repo. It’s fast and very useful to browser repos. For example, https://github1s.com/sanand0/tools-in-data-science-public is my TDS course repo. The /init command in Claude Code and Codex CLI aren’t up to the mark. I believe a good README.md provides better specs for existing repos. There is a window of opportunity to craft a good prompt to generate this from repos. #ai-coding Since LLMs can code, I’d love to see useful CI/CD pipelines where the LLM creates / edits code on the fly. LLMOps might take on a new angle - it’s not just Ops on LLM apps. It’s LLMs as part of DevOps. insertAdjacentHTML is a great API but suffers from XSS vulnerabilities. The TrustedHTML API is an emerging standard to create sanitized HTML strings. Notes from Anthropic’s How we built our multi-agent research system Multi-agent systems are like organizations that can do more than a single human. Multi-agent systems conserve the context window. The top 3 drivers of performance variance: spending more tokens, more tool calls, better models You need to teach (prompt) the orchestrator how to delegate to sub-agents How to avoid task duplication among agents How many sub-agents to spin up for different kinds of tasks Which tools to use for what Provide sub-agents objective, output format, tools/sources, clear task boundaries ⭐ Self-improving agents, e.g. prompt optimizers or tool-testing agents that run and rewrite tool descriptions, are powerful Since agents are stateful, resuming from failure is important. Agent prompts are public Claude models support interleaved thinking that lets them think between tool calls via an anthropic-beta: interleaved-thinking-2025-05-14 header. OpenAI models natively think between tool calls, preserving thinking across calls with the Reasoning API. Gemini lets you control the amount of thinking between tool calls via the thinkingBudget parameter. Anthropic auto-extracts persona vectors or traits by generating LLM responses to the same question with system prompt A (“You are evil”) and B (“You are helpful”) and subtracting the average activations. This helps monitor personality drifts during training, deployment, and even in training data. From My experience creating software with LLM coding agents - Part 2 (Tips) #ai-coding Use standards. Or, write your standards in README.md and tell AGENTS.md / CLAUDE.md to read it. Use a standard file structure. Or in README.md, list what each file is for. Helps agents pick the right file for context. Use a standard build/lint/test setup (e.g. package.json scripts). Or Localize context, i.e. add context in files that use them. E.g. add comments in test files on how to execute them. Keep files modular so agents can read less code and conserver context. Write a developer’s guide. Use with /init in Claude Code / Codex / … or have an LLM generate a developer guide. Edit manually. Agents don’t write great specs. Document the design. Write DETAILED specs to reduce deviations. Share goal while specifying tasks. Helps agents fix related stuff. Use deep reasoning mode, e.g. “think harder” or “ultrathink” in Claude Code, or -c model_reasoning_effort=high in Codex. ⭐ Run parallel agents in different windows and share agent feedback with each other. E.g. Server/API coding in one window. Client coding in another. Plan/ask in one window. Execute in another. Add debug logs to help agents spot errors. Start/stop of long/complex operations, state changes, external interfaces. Include full objects in logs. Prioritize diffs. Trim long contents. ⭐ Give access to debugger, e.g. Chrome remote debugging at localhost:9222 Agents write poor tests. So: Manually add important ones. ⭐ When you find a bug, ask the agent why the tests missed it and have it add. Review and remove useless ones. Ensure agent passes test cases. Tell them not to disable / skip failed tests. Have agents create a new branch per feature and auto-commit. Merge when successful. Feel free to provide a TODO list or update it on the fly. Interrupt with Esc if the agent’s thinking is off-track. When agents struggle, write tools to help them, e.g. JSON splicing, Excel edits, etc. Agents bloat code and features. Explicitly refactor and trim. From A Guide to Gen AI / LLM Vibecoding for Expert Programmers #ai-coding Use vibe coding for stuff you don’t need to maintain. Use vibe coding for stuff you know well enough to review quickly. Use vibe coding for independent tasks where you’re not fussed which ones fail. Vibe coding turns everyone into a team lead. That needs skills: planning, allocation, design, review, feedback, … ⭐ Empathy enables vibe-coding. Empaths allocate work by ability, review regularly, and provide detailed specs and feedback. Have LLMs plan and allocate tasks. “Read this repo. Suggest improvements.” (Review.) “Add these as issues.” “Add the top 3 Sentry log errors as issues.” “Find the easiest issue and solve it with a PR.” Use GitHub issues extensively for planning. ⭐ Create a separate GitHub account for your agent! Let it push. Assign it issues. Treat it like an intern. Ensure agent passes test cases and run till the do, or report the core difficulty. Throw away rubbish code and start again. Issues unsolved in 2-3 tries are too hard for agents or are poorly spec-ed. The context7 and Sequential Thinking MCPs are useful. The O*NET database has a list of tasks/activities, skills, titles, … for each job, at least in the US. It has been updated every few months since 2003. It’s an excellent source to analyze things like the impact of AI across jobs. Anthropic used it to map Claude.ai conversations with educator tasks to identify how educators are using AI. How educators use Claude (apart from learning) is mainly driven by automation of tedious tasks, ideation, and personalization for each student. Curriculum development: Develop games, interactive tools, MCQs, simulations, content Academic research: Bibliographies, statistical modeling, revisions from feedback. Assessments: Student feedback, scoring, summarization. Administration: recommendation letters, meeting agendas, admin tools. OpenAI used feedback from ~1000 annotators to update their model spec. Learnings: Request targeted feedback. Annotators reviewed responses pre-selected for subjectivity against a pre-selected rubric () More examples. Most improvements add examples of good and bad responses. Use detailed prompts. Newer models do well with HUGE system prompts. That’s how we frame better questions. The Great Refactor is refactoring critical open-source C code to Rust using Claude Code, since 70% of vulnerabilities are memory related and Rust is memory-safe. No repo/docs yet. #ai-coding