Things I Learned - 24 Aug 2025

This week, I learned:

Pilots like to have fun, too. While awaiting landing clearance at Kolkata, our IndiGo pilot weaved tight curves just above the clouds at steep angles, giving us stunning views and a mildly thrilling experience. (Or maybe they were just following a flight path.)
Since LLMs allow ANYONE to become “good enough” in most fields (marketing, medicine, management, and so on), here are my guesses on the impact. ChatGPT
- Companies-of-one will grow. Sole founder can handle support functions.
- Specialists will generalize. Consultants will code. Marketers will design.
- Wages will compress. Seniors will earn less as juniors can do more.
- Layers will compress. Organizations need fewer hierarchies as 1 person can do more.
- Shadow apps will grow. Anyone can code. Users build apps with prompts, sheets, agents, outside of IT SDLC. Like Excel sheets.
- Governance will grow. Non-experts are acting like experts. Validation is more important.
- Uneconomical apps will thrive. 1:1 tutoring. Continuous decision making or A/B testing.
- Leaders will convince better. Persuasion scales.
- Brand (authenticity, trust, skill), Channel (distribution, audience) and Data are primary differentiators.
Codex and Codex CLI now support image attachments.
Notes from discussion on education with Srikanth Nadhumuni
- Indian higher education has done better, e.g. with the IITs, than primary education, where ASER consistently shows that 5th graders can’t read 2nd grade books.
- The National Education Policy (NEP) is focusing on FLN (foundational literacy and numeracy). The goal is universal FLN by 2027.
- Teaching FLN in local languages beats English. Teacher, parent and community support is high. Learning English as a second language is faster. Other countries (France, Germany, Japan) do this.
- Voice LLMs could help, but may not be toddler-ready, nor strong enough in all local languages.
- But high-quality textbook translation with local nuances is a one-time human-in-the-loop effort that AI can support.
- India’s 1 crore teachers have a mandatory 50 hrs/year training requirement that is largely under-implemented.
- Sendhil Mullainathan is working on extracting features from student answers to questions and generating remedial content purely as a black box. Results beat explainability.
⭐ Creating systems that rapidly improve from feedback is the key to success. Rapidity, quality of improvement, quantity of feedback are all enablers.
CBDC (Central Bank Digital Currency) is RBI’s Web 3.0 protocol. It allows purpose-driven transfers, e.g. money meant for education can only be spent on education.
Meta-prompting with placeholders is a prompt-improvement technique (similar to LLM interviewing). Have LLMs create the prompt with “fill-in-the-blanks”. This makes it much easier for people to fill out.
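A minimal sketch of the fill-in-the-blanks step, assuming the LLM has already drafted a prompt with `{{placeholder}}` markers. The marker syntax, the sample prompt and the helper names are illustrative assumptions, not part of any specific tool.

```python
import re

# Hypothetical meta-prompt, as an LLM might draft it, with blanks for a person to fill.
META_PROMPT = (
    "You are a {{role}}. Write a {{format}} for {{audience}} "
    "about {{topic}}. Keep it under {{word_limit}} words."
)

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def placeholders(template: str) -> list[str]:
    """List the blanks a person needs to fill, in order of first appearance."""
    seen: list[str] = []
    for name in PLACEHOLDER.findall(template):
        if name not in seen:
            seen.append(name)
    return seen


def fill(template: str, answers: dict[str, str]) -> str:
    """Substitute the answers; fail loudly if any blank is left unanswered."""
    missing = [name for name in placeholders(template) if name not in answers]
    if missing:
        raise KeyError(f"Unfilled placeholders: {missing}")
    return PLACEHOLDER.sub(lambda match: answers[match.group(1)], template)


if __name__ == "__main__":
    print(placeholders(META_PROMPT))
    print(fill(META_PROMPT, {
        "role": "science teacher",
        "format": "one-page handout",
        "audience": "5th graders",
        "topic": "the water cycle",
        "word_limit": "200",
    }))
```

Listing the blanks first is what makes this easy for people: they see a short form to fill in rather than a prompt to write.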
MassGen is a multi-agent orchestrator. Early days, experimental. It has multiple agents answer, then vote on each other’s answers, picking the best.
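The answer-then-vote pattern can be sketched in a few lines. This is not MassGen's API: the `Agent` and `Voter` types, the function name and the stub agents below are all hypothetical, with plain functions standing in for LLM calls.

```python
from collections import Counter
from typing import Callable

Agent = Callable[[str], str]             # question -> answer
Voter = Callable[[str, list[str]], int]  # (question, answers) -> index of preferred answer


def answer_then_vote(question: str, agents: list[Agent], voters: list[Voter]) -> str:
    """Collect one answer per agent, let every voter pick one, return the most-voted answer."""
    answers = [agent(question) for agent in agents]
    votes = Counter(voter(question, answers) for voter in voters)
    # On a tie, most_common keeps the answer that received a vote first.
    best_index, _ = votes.most_common(1)[0]
    return answers[best_index]


# Stub agents and voters (assumptions for illustration only).
def terse(question: str) -> str:
    return "Paris"


def wrong(question: str) -> str:
    return "Lyon"


def detailed(question: str) -> str:
    return "Paris, the capital of France"


def prefers_longest(question: str, answers: list[str]) -> int:
    return max(range(len(answers)), key=lambda i: len(answers[i]))


def prefers_shortest(question: str, answers: list[str]) -> int:
    return min(range(len(answers)), key=lambda i: len(answers[i]))


if __name__ == "__main__":
    winner = answer_then_vote(
        "What is the capital of France?",
        agents=[terse, wrong, detailed],
        voters=[prefers_longest, prefers_shortest, prefers_longest],
    )
    print(winner)
```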
DSPy auto-optimizes prompts based on input-output pairs or evals. Typical improvements are ~10-20%. My opinion: avoid. It’s a good idea, but has too much abstraction that hides the implementation. Worth learning from but not implementing unless you (a) have evals + metrics and (b) you KNOW you need to change models and (c) it’s a long-term project where the learning curve is worth it. Claude and ChatGPT
How LLM “Attention” works: It takes each word’s embedding, moves it closer to similar words’ embeddings (e.g. Apple moves towards phone or orange depending on context). More similar words have a higher pull, like gravity. Luis Serrano
- Similarity isn’t symmetric. E.g. “Coke” moves “drink” more towards it, but “drink” pulls “Coke” less, since “drink” could refer to other things.
- Think of the pull (“Tinder similarity”) as “what A wants” (key matrix, which pulls other words) multiplied by “what B offers” (query matrix, which is pulled by other words). This leads to two different similarity matrices.
- Multi-head attention is where a neural net gives different weightages to different similarity matrices based on context.
- Value matrix transforms the embedding space so that the best next word is more similar.
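To make the asymmetric pull concrete, here is a toy single-head attention in plain Python. The 2-D embeddings and the hand-picked `W_QUERY` matrix are assumptions chosen for illustration, not learned weights, and the key and value matrices are left as the identity to keep the arithmetic easy to follow.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def softmax(xs):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(embeddings, w_query, w_key, w_value):
    """Scaled dot-product attention for one head."""
    queries = [matvec(w_query, e) for e in embeddings]
    keys = [matvec(w_key, e) for e in embeddings]
    values = [matvec(w_value, e) for e in embeddings]
    scale = math.sqrt(len(keys[0]))
    # scores[i][j]: how strongly word i is pulled towards word j.
    scores = [[sum(a * b for a, b in zip(q, k)) / scale for k in keys] for q in queries]
    weights = [softmax(row) for row in scores]
    dims = range(len(values[0]))
    # Each word's new embedding is a weighted average of all the value vectors.
    moved = [[sum(w * v[d] for w, v in zip(row, values)) for d in dims] for row in weights]
    return scores, weights, moved


WORDS = ["coke", "drink"]
EMBEDDINGS = [[1.0, 0.0], [0.0, 1.0]]
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]
W_QUERY = [[1.0, 3.0], [0.0, 1.0]]  # hand-picked so the two directions differ

scores, weights, moved = attention(EMBEDDINGS, W_QUERY, IDENTITY, IDENTITY)

if __name__ == "__main__":
    for word, row, new in zip(WORDS, weights, moved):
        print(word, [round(w, 2) for w in row], [round(x, 2) for x in new])
```

Because queries and keys come from different matrices, `scores[1][0]` (drink towards coke) differs from `scores[0][1]` (coke towards drink). With these numbers, "drink" moves further towards "coke" than "coke" moves towards "drink".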
Reading the Obsidian docs is like a master class in Markdown note-taking. Features like properties, embedding YouTube, bases, tags, etc. provide food for thought. The ObsidianMD subreddit has interesting tips.
- Summarize takeaways on top of each section
- Use atomic notes: one file per idea. Link liberally
- YAML front-matter you can query, e.g. tags, project, status, …
- Use GFM admonitions, e.g. > [!NOTE]
- Store images in a predictable way, e.g. ![Alt text](./img/2025-08-21-screenshot.webp) – ALWAYS with alt text
- Use diff fences for edits / doc changes
- Task lists with inline dates, e.g. - [ ] 2025-08-21 Draft a letter
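Queryable front-matter is easy to try outside Obsidian, too. Below is a minimal sketch, assuming notes are plain strings with flat `key: value` front-matter between `---` lines. The parser handles only that flat subset (plus inline `[a, b]` lists); a real vault would want a proper YAML parser. The note names and fields are made up.

```python
def parse_front_matter(text: str) -> dict:
    """Parse flat `key: value` front-matter between the opening and closing `---` lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta  # closing delimiter reached
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip anything that is not `key: value`
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            meta[key.strip()] = [item.strip() for item in value[1:-1].split(",") if item.strip()]
        else:
            meta[key.strip()] = value
    return {}  # no closing delimiter: treat the note as having no front-matter


def notes_where(notes: dict[str, str], key: str, expected: str) -> list[str]:
    """Return the names of notes whose front-matter has `key` equal to `expected`."""
    return sorted(name for name, text in notes.items()
                  if parse_front_matter(text).get(key) == expected)


# Hypothetical notes, one idea per file.
NOTES = {
    "attention.md": "---\ntags: [llm, notes]\nproject: til\nstatus: draft\n---\nAttention pulls embeddings.",
    "fln.md": "---\ntags: [education]\nproject: til\nstatus: done\n---\nFLN in local languages.",
    "scratch.md": "No front-matter here.",
}

if __name__ == "__main__":
    print(notes_where(NOTES, "status", "draft"))
```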
How to research better. Abhishek Divekar
- Have an objective when researching. Filter research based on that.
- Research backwards. Pick a relevant paper. Go through relevant citations. Typically, there are only 1 or 2 directly related ancestors.
- Don’t waste time searching. Gemini Deep Research is a great way to find and read papers.
- Don’t read the abstract. Read the introduction, which is the summary. It’s just a page. (The abstract is an LLM-ized version of the introduction. Not as effective.)
MCPs aren’t much more useful than tool calling for developers. They’re powerful when packaging for external parties (non-developers, other teams, clients, etc.). Developers can work just fine with tool calling. Nitin Agarwal
Cybersecurity AI is an open-source LLM-based cyber-security tool that auto scans networks for vulnerabilities.
⭐ LLMs have solved several complex tasks (e.g. topic modelling, summarization). We need to adopt these as building blocks, like functions, and build better solutions. Abhishek Divekar
codex -c model_reasoning_effort=high lets you run Codex CLI with highest reasoning effort. This has a separate limit that resets every 5 hours. https://x.com/thsottiaux/status/1958035261947781262
Truly agentic systems have high Autonomy, Complexity, and Reliability. Workflows have low autonomy. Agentic systems with high autonomy currently aren’t very complex or reliable, but will improve over time. Deepak Sharma
Allow humans to intervene while agent loops execute, even unsolicited, to improve collaboration. Deepak Sharma
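One way to picture unsolicited intervention: the agent loop drains a human inbox before every step. This is a sketch under assumptions: the `run_agent` function, the `"stop"` convention and the string steps are all hypothetical, and a real agent would fold the notes into its context rather than just logging them.

```python
import queue


def run_agent(steps: list[str], human_inbox: "queue.Queue[str]") -> list[str]:
    """Run steps in order, checking for unsolicited human input before each one."""
    log: list[str] = []
    for step in steps:
        # Drain everything the human has sent since the last step.
        while True:
            try:
                note = human_inbox.get_nowait()
            except queue.Empty:
                break
            if note == "stop":
                log.append("stopped by human")
                return log
            log.append(f"human: {note}")
        log.append(f"agent: {step}")
    return log


if __name__ == "__main__":
    inbox: "queue.Queue[str]" = queue.Queue()
    inbox.put("use the staging database")
    print(run_agent(["plan", "execute", "report"], inbox))
```

Since `queue.Queue` is thread-safe, the human side can push notes from another thread while the loop is running.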
Given the early, experimental days of AI, the better KPIs might be more about experimentation (e.g. number of prototypes) than operational (e.g. cost reduction). Krishnakumar Menon
⭐ Policy-as-code is an emerging theme. Allow users to create their own guardrails policy. Or, take existing policy documents and convert them into an LLM-based evaluator. Krishnakumar Menon
⭐ “Potentially nitpicky but competitive advantage in AI goes not so much to those with data but those with a data engine: iterated data aquisition, re-training, evaluation, deployment, telemetry. And whoever can spin it fastest. Slide from Tesla to ~illustrate but concept is general.” Andrej Karpathy, Dec 2022
The skills AI coding needs are very similar to a tech lead’s or an architect’s. Tanika Gupta #ai-coding
- Estimating tool capability & task allocation
- Task breakdown
- Spec-ing: which of user personas, user-journey maps, wireframes, technical architecture, pseudo-code
- Standards: tech stack, tools, linters, security, doc standards
- Git versioning & collaboration
- Code review. (Using AI.) Providing feedback. Modularity, naming, …
- Automated validation
- Post-mortem. Learning from errors and successes, choices LLM made
The ROI of prompting carefully and using meta-prompts is high. Prompt clarity reduces iterations & dead-ends. The initial time spent (10-15 min) pays off with just a single reduced iteration (time to generate + review). Tanika Gupta
⭐ Prefer passing a spec.md to AI coding agents rather than directly typing-in prompts. This lets you meta-prompt and (collaboratively) iterate on the spec.md, version the prompts as specs, and generate specs as documentation. Tanika Gupta
⭐ Models need environments to learn. So far, we have been providing training data. But an environment to interact with, and learn from by itself, is more powerful. That requires a standard for environments. This is a powerful emerging area.
The crux of experimentation is learning from a postmortem. From that perspective, I have been experimenting a lot but not documenting or learning from it. Decision logs with postmortems are a more apt device for me.
Gemini API includes a url_context tool to explicitly scrape websites. API
Ontologies are more than taxonomies or schemas. They’re truths or rules, e.g., “no person has more than two parents”. Helps consistency checking and inference.
- Terminological knowledge (T-Box) is domain rules and constraints (e.g., “a student is a person who attends a course”).
- Assertional knowledge (A-Box) is instance-level facts (e.g., “Mary attends Physics 101”).
- Tools & Formats
  - SHACL. A W3C language for validating RDF graphs. ShEx is easier and popular.
  - Notation3. A W3C assertion and logic language which is a superset of RDF.
  - EYE Reasoner. Prolog-based N3 (Notation3) reasoner. CLI + API-friendly. Can perform rule-based reasoning and generate new triples.
  - HermiT. OWL 2 DL reasoner. Can check consistency, classify ontologies, compute entailments. CLI and Java API. Modern, maintained.
  - Apache Jena. Java framework for RDF/SPARQL. Built-in reasoners (RDFS, OWL mini/micro/full). CLI via riot, arq (SPARQL query engine). Popular for RDF graph stores + inference.
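A toy stand-in for what a reasoner does with the two examples above, in plain Python rather than SHACL, N3 or OWL syntax. The triples, predicate names and helper functions are all assumptions for illustration: one T-Box rule infers new facts, the other flags an inconsistency in the A-Box.

```python
from collections import defaultdict

# A-Box: instance-level facts as (subject, predicate, object) triples. Made-up data.
ABOX = {
    ("Mary", "attends", "Physics101"),
    ("Physics101", "type", "Course"),
    ("Mary", "hasParent", "Ann"),
    ("Mary", "hasParent", "Bob"),
    ("Mary", "hasParent", "Cid"),
}


def infer_students(facts):
    """T-Box rule: anyone who attends a course is a student. Returns the new triples."""
    courses = {s for s, p, o in facts if p == "type" and o == "Course"}
    return {(s, "type", "Student") for s, p, o in facts if p == "attends" and o in courses}


def parent_violations(facts, limit=2):
    """T-Box constraint: no person has more than `limit` parents. Returns the offenders."""
    parents = defaultdict(set)
    for s, p, o in facts:
        if p == "hasParent":
            parents[s].add(o)
    return {child: sorted(names) for child, names in parents.items() if len(names) > limit}


if __name__ == "__main__":
    print(infer_students(ABOX))     # inference: new triples
    print(parent_violations(ABOX))  # consistency checking: rule breakers
```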
Do developers feel this way? #ai-coding
In another example of vibe coding, an instructor for my TDS course vibe-coded most of an exam using Copilot and Sonnet. 6/8 questions worked one-shot. The two #ai-coding failures were interesting:
- One failed because of sample vs population stats. Copilot asked for sample variance but coded variance() instead of sampleVariance().
- Another failed because of rounding off. NumPy code rounds off differently from Python or JS code.
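Both pitfalls are easy to reproduce with Python's standard library. The exam's actual data and functions aren't shown here, so the numbers below are made up and `round_half_up` is a hypothetical helper that mimics a half-up convention such as JavaScript's `Math.round`.

```python
import math
import statistics

scores = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Same data, two different answers depending on the denominator.
population_variance = statistics.pvariance(scores)  # divides by n
sample_variance = statistics.variance(scores)       # divides by n - 1


def round_half_up(x: float) -> int:
    """Round .5 towards positive infinity, like JavaScript's Math.round."""
    return math.floor(x + 0.5)


halves = [0.5, 1.5, 2.5]
half_even = [round(x) for x in halves]        # Python 3's built-in rounds half to even
half_up = [round_half_up(x) for x in halves]  # half-up convention

if __name__ == "__main__":
    print(population_variance, sample_variance)
    print(half_even, half_up)
```

An auto-graded answer that is off by a factor of n / (n - 1), or by one in the last digit, fails even though the code "works". That makes both cases worth a unit test before an exam ships.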
Meditation is about noticing distraction and returning to focus. So, distraction is necessary and good. #beliefs
#ai-coding can make us overconfident. (At least, it makes me overconfident.) AI coding tools create surprisingly good output, but only ~20% of the time. I cannot commit to a specific task based on that. Instead, it’s better to rely on AI coding estimates for portfolios, e.g. promise to share something cool without mentioning what. Or do something cool first, then share.
Notes from podcast with Daniel Kahneman. The Knowledge Project.
- Happiness is pleasure in the moment. Satisfaction is the meaningful story of our life. When we think, we want satisfaction. When we feel, we want happiness. The thinking brain and feeling brain optimize for slightly different things. E.g. the thinking brain packs the calendar with satisfying tasks that the feeling brain feels unhappy executing. Both are good for us. We don’t know which matters more.
- Behavior change is harder than we think. Usually, it’s better not to expect success in changing others, or ourselves. Instead, understand why that behavior makes sense. Our behaviour is an equilibrium of forces. Weakening “bad” forces is easier than strengthening “good” forces, since it lowers tension. That’s inversion!
- Behaviours tell us more about situations than personality. We assume otherwise. That’s an attribution error.
- Motivation is complex. People can do bad things for good reasons and vice versa.
- “Feelings get in the way of clear thinking.” Example: I vibe-coded the last 2 questions of TDS GA7 on Claude Code. It didn’t run. I delayed fixing it for 5 days, afraid it would be a major effort. It ended up a 2 min fix. It could have been major, but checking would have helped. Fear prevented that.
- Things that hamper clear thinking: intuition, emotion, beliefs. Beliefs are often formed based on people we admire or identify with, not reason.
  - Prefer rules, systems and processes. Willpower is an illusion.
  - Delegate decisions to unemotional agents. (But agents misjudge perceived value of gain or loss!)
  - Break down the problem, analyze it, THEN form an intuition. Be disciplined in delaying intuition or forming an opinion.
- Environment shapes thinking but it’s not obvious how, e.g. some people work better in noisy cafes. Some colors are more calming.
- Protect dissenters and dissent. It’s painful and costly, and needs nurturing.
Node.js runs TypeScript files natively. It strips the type annotations rather than type-checking them.
Codex can clone any GitHub repo. So I can ask it to pull one or more repos, understand their code, and use that as a template or reference. This makes my repositories (and others’) reusable templates. Using newer libraries and platforms becomes easier, too. #ai-coding
Tracking AI runs an IQ test on various LLMs every week. GPT 5 Pro leads, currently, followed by Claude 4 Opus and Gemini 2.5 Pro. It’s surprising how far behind GPT 5 is at the moment.
LLMs are faster than me. So me learning and doing what the LLM says is a bottleneck. Get out of the way. For example, do not learn. Do not execute. Do not verify. Give LLMs the tools to deploy, verify and iterate to improve.
