2025 | S Anand

Things I Learned - 14 Sep 2025

This week, I learned: Though I’m connected on LinkedIn with people I can’t remember (weak ties), pruning them shrinks serendipity. Weak ties, despite noise, are disproportionately valuable for opportunities, e.g. intros, jobs, and pruning reduces future upside. Science Claude has a Python + Node code interpreter that can access GitHub, PyPi, npm and Google. Simon Willison SuperTinyIcons has very small icons for many websites and is available via CDN. Sample: http://cdn.jsdelivr.net/npm/super-tiny-icons/images/svg/github.svg Clock bench is an LLM benchmark based on how well LLMs tell the time from an analog clock. Humans (89%) are much better than the best model (Gemini 2.5 Pro - 13%). Veo 3 is now available via API. Veo 3 fast is 15s/second. Google ChatGPT has full support for MCPs via Developer mode in Plus and Pro accounts, via “Developer mode”. OpenAI In Pyodide, you can use from js import document and then document.querySelector to manipulate the DOM directly from Python. from pyodide.http import pyfetch lets you use fetch. gtrending is a Python package that fetches trending GitHub repos, users, etc. uvx gtrending repos --language rust --since weekly fetches trending Rust repos of the week. astgrep lets you search in code (across languages) using AST patterns. Like semgrep but more about code search than security. uvx --from ast-grep-cli ast-grep runs from the CLI. Useful for code rewriting, fast linting, code search. hurl is a CLI config-based HTTP automation tool. Useful for tests, bulk (templatized) HTTP requests, etc. rustdesk is an open-source remote desktop software. TeamViewer alternative. Self-hostable. prek is a much faster version of pre-commit - a cross-language pre-commit hook manager. ⭐ mise is a tool version manager. Combines nvm/fnm, pipx, etc. Supports running several tools with a smooth installation. The npm phishing email was a great one. It compromised chalk which is used in most npm packages. This may be one of the best supply chain attacks in recent times and makes me want to pin versions instead of using npx -y. Also makes me glad that I’m sponsoring @isaacs and @sindresorhus - two critical open source maintainers. “I pay for YouTube Premium. For my money, it’s the best bang-for-the-buck subscription service on the market”. - Gavin Andregg LLMs are non deterministic because GPUs add floating point numbers concurrently and FP addition is non associative - order matters. Thinking Machines Claude.ai can natively work with Excel, PPTX, DOCX, and PDF files now. With embeddings, atomic labels + hierarchy beat instruction-heavy prompts. Prefer short, concrete sub-labels (e.g., “promotion,” “job security,” “flexibility”) that roll up to a parent “career” rather than a composite instruction like “Total Rewards and Career Growth”. Embedding similarity is not smart enough to figure this out. Today, RPA is cheaper than LLMs in some areas. But it’s a moving target. LLM costs are fall fast: 70–90% declines across major providers in 1.5 years. Therefore, waiting has option value. But classic IT compares static quotes, not declining curves, and hence is likely to under-procure LLM solutions. ⭐ The biggest near-term ROI for LLMs in data science is like ‘boring’ data work: PII tagging, data dictionaries, ER/joins, SDTM mapping, etc.. People expect flashy GenAI, but LLMs can bootstrap schema matching and data-cleaning, speeding engineer verification, which is more useful at scale. You can create an infinite leaflet map with nano banana. Codex CLI with high reasoning effort seems far more comprehensive than Codex online. I asked both to identify the system requirements (URLs to access, software to install, ports to open) for my Tools in Data Science course. Codex CLI got it right one shot (after 10 minutes of thinking). Codex online missed several items even after 4 attempts. The Reod on Elantris might have been triggered by Jaddeth who might be an Autonomy avatar. ChatGPT Output tokens dominate latency. Decoding is sequential (one token depends on all prior tokens), so long completions are the main throttle. Shrinking returned text (e.g., send spans/tags instead of echoing paragraphs) yields a far bigger win on latency than shrinking inputs.

I use LLMs to create photos and comics. But they can generate any kind of illustration. So why limit ourselves? My problem is imagination: I know little about art. So, I asked ChatGPT, Claude, and DeepSeek: Suggest 10 unusual illustration styles that are not popular in social media yet but are visually striking. I would like to have an LLM create images in that style. For each of those, show me an (and link to) an online image in that style. ...

My Tools in Data Science course uses LLMs for assessments. We use LLMs to Suggest project ideas (I pick), e.g. https://chatgpt.com/share/6741d870-73f4-800c-a741-af127d20eec7 Draft the project brief (we edit), e.g. https://docs.google.com/document/d/1VgtVtypnVyPWiXied5q0_CcAt3zufOdFwIhvDDCmPXk/edit Propose scoring rubrics (we tweak), e.g. https://chatgpt.com/share/68b8eef6-60ec-800c-8b10-cfff1a571590 Score code against the rubric (we test), e.g. https://github.com/sanand0/tds-evals/blob/5cfabf09c21c2884623e0774eae9a01db212c76a/llm-browser-agent/process_submissions.py Analyze the results (we refine), e.g. https://chatgpt.com/share/68b8f962-16a4-800c-84ff-fb9e3f0c779a This changed our assessments process. It’s easier and better. Earlier, TAs took 2 weeks to evaluate 500 code submissions. In the example above, it took 2 hours. Quality held up: LLMs match my judgement as closely as TAs do but run fast and at scale. ...

Slides for my DataHack Summit talk (controversially) titled RIP Data Scientists are at https://sanand0.github.io/talks/2025-08-21-rip-data-scientists/ Summary: as data scientists we explore, clean, model, explain, deploy, and anonymize datasets. I live-vibe-coded each step with DGCA data in 35 minutes using ChatGPT. Of course, it’s the tasks that are dying, not the role. Data scientists will leverage AI, differentiate on other skills, and move on. But the highlight was an audience comment: “I’m no data scientist. I’m a domain person. I’ll tell you all this: If you don’t follow these practices, you won’t have a job with me!” ...

Things I Learned - 07 Sep 2025

This week, I learned: A quick way to get the docs for an npm package is npm view package-name readme. For PyPi, it’s curl -s https://pypi.org/pypi/package-name/json | jq -r .info.description Searching embeddings of text summaries of images improves vision search a lot. Jason Liu LLM vision capabilities are far from enough to click accurately. The AI Digest GLM supports the Anthropic API. So it’s possible to use Claude Code with GLM 4.5. z.ai gitingest has a CLI. uvx gitingest https://github.com/owner/repo fetches the code in the Git repo suitable for passing to an LLM. Claude’s API has access to a code execution tool via the code-execution-2025-08-25 beta header. It runs Python 3.11 with 1GB RAM and 5GB disk space, with Internet disabled. The containers persist for 30 days and can access uploaded files. Anthropic You can use the <script> tag in XML to render RSS, as an alternative to XSLT. Jake Archibald browser-fs-access is a ponyfill for the File System Access API and should be the go-to approach for reading and saving files via the browser. ⭐ To run a Python project directly from GitHub, use uvx --from "git+https://github.com/owner/repo.git@branch" script-name Github1s is a cool tool. Replace github.com with github1s.com to get a VS Code page that opens that repo. It’s fast and very useful to browser repos. For example, https://github1s.com/sanand0/tools-in-data-science-public is my TDS course repo. The /init command in Claude Code and Codex CLI aren’t up to the mark. I believe a good README.md provides better specs for existing repos. There is a window of opportunity to craft a good prompt to generate this from repos. #ai-coding Since LLMs can code, I’d love to see useful CI/CD pipelines where the LLM creates / edits code on the fly. LLMOps might take on a new angle - it’s not just Ops on LLM apps. It’s LLMs as part of DevOps. insertAdjacentHTML is a great API but suffers from XSS vulnerabilities. The TrustedHTML API is an emerging standard to create sanitized HTML strings. Notes from Anthropic’s How we built our multi-agent research system Multi-agent systems are like organizations that can do more than a single human. Multi-agent systems conserve the context window. The top 3 drivers of performance variance: spending more tokens, more tool calls, better models You need to teach (prompt) the orchestrator how to delegate to sub-agents How to avoid task duplication among agents How many sub-agents to spin up for different kinds of tasks Which tools to use for what Provide sub-agents objective, output format, tools/sources, clear task boundaries ⭐ Self-improving agents, e.g. prompt optimizers or tool-testing agents that run and rewrite tool descriptions, are powerful Since agents are stateful, resuming from failure is important. Agent prompts are public Claude models support interleaved thinking that lets them think between tool calls via an anthropic-beta: interleaved-thinking-2025-05-14 header. OpenAI models natively think between tool calls, preserving thinking across calls with the Reasoning API. Gemini lets you control the amount of thinking between tool calls via the thinkingBudget parameter. Anthropic auto-extracts persona vectors or traits by generating LLM responses to the same question with system prompt A (“You are evil”) and B (“You are helpful”) and subtracting the average activations. This helps monitor personality drifts during training, deployment, and even in training data. From My experience creating software with LLM coding agents - Part 2 (Tips) #ai-coding Use standards. Or, write your standards in README.md and tell AGENTS.md / CLAUDE.md to read it. Use a standard file structure. Or in README.md, list what each file is for. Helps agents pick the right file for context. Use a standard build/lint/test setup (e.g. package.json scripts). Or Localize context, i.e. add context in files that use them. E.g. add comments in test files on how to execute them. Keep files modular so agents can read less code and conserver context. Write a developer’s guide. Use with /init in Claude Code / Codex / … or have an LLM generate a developer guide. Edit manually. Agents don’t write great specs. Document the design. Write DETAILED specs to reduce deviations. Share goal while specifying tasks. Helps agents fix related stuff. Use deep reasoning mode, e.g. “think harder” or “ultrathink” in Claude Code, or -c model_reasoning_effort=high in Codex. ⭐ Run parallel agents in different windows and share agent feedback with each other. E.g. Server/API coding in one window. Client coding in another. Plan/ask in one window. Execute in another. Add debug logs to help agents spot errors. Start/stop of long/complex operations, state changes, external interfaces. Include full objects in logs. Prioritize diffs. Trim long contents. ⭐ Give access to debugger, e.g. Chrome remote debugging at localhost:9222 Agents write poor tests. So: Manually add important ones. ⭐ When you find a bug, ask the agent why the tests missed it and have it add. Review and remove useless ones. Ensure agent passes test cases. Tell them not to disable / skip failed tests. Have agents create a new branch per feature and auto-commit. Merge when successful. Feel free to provide a TODO list or update it on the fly. Interrupt with Esc if the agent’s thinking is off-track. When agents struggle, write tools to help them, e.g. JSON splicing, Excel edits, etc. Agents bloat code and features. Explicitly refactor and trim. From A Guide to Gen AI / LLM Vibecoding for Expert Programmers #ai-coding Use vibe coding for stuff you don’t need to maintain. Use vibe coding for stuff you know well enough to review quickly. Use vibe coding for independent tasks where you’re not fussed which ones fail. Vibe coding turns everyone into a team lead. That needs skills: planning, allocation, design, review, feedback, … ⭐ Empathy enables vibe-coding. Empaths allocate work by ability, review regularly, and provide detailed specs and feedback. Have LLMs plan and allocate tasks. “Read this repo. Suggest improvements.” (Review.) “Add these as issues.” “Add the top 3 Sentry log errors as issues.” “Find the easiest issue and solve it with a PR.” Use GitHub issues extensively for planning. ⭐ Create a separate GitHub account for your agent! Let it push. Assign it issues. Treat it like an intern. Ensure agent passes test cases and run till the do, or report the core difficulty. Throw away rubbish code and start again. Issues unsolved in 2-3 tries are too hard for agents or are poorly spec-ed. The context7 and Sequential Thinking MCPs are useful. The O*NET database has a list of tasks/activities, skills, titles, … for each job, at least in the US. It has been updated every few months since 2003. It’s an excellent source to analyze things like the impact of AI across jobs. Anthropic used it to map Claude.ai conversations with educator tasks to identify how educators are using AI. How educators use Claude (apart from learning) is mainly driven by automation of tedious tasks, ideation, and personalization for each student. Curriculum development: Develop games, interactive tools, MCQs, simulations, content Academic research: Bibliographies, statistical modeling, revisions from feedback. Assessments: Student feedback, scoring, summarization. Administration: recommendation letters, meeting agendas, admin tools. OpenAI used feedback from ~1000 annotators to update their model spec. Learnings: Request targeted feedback. Annotators reviewed responses pre-selected for subjectivity against a pre-selected rubric () More examples. Most improvements add examples of good and bad responses. Use detailed prompts. Newer models do well with HUGE system prompts. That’s how we frame better questions. The Great Refactor is refactoring critical open-source C code to Rust using Claude Code, since 70% of vulnerabilities are memory related and Rust is memory-safe. No repo/docs yet. #ai-coding

Problems that only one student can solve

Jaidev’s The Bridge of Asses reminded me of my first coding bridge. It was 1986. I’d completed class 6 and was in a summer coding camp at school. M Kothandaraman (“MK Sir”) was teaching us how to swap variables in BASIC on the BBC Micro. This code prints the first name in alphabetical order (“Alice”): 10 A = "Bob" 20 B = "Alice" 30 IF A > B THEN 40 TEMP = A 50 A = B 60 B = TEMP 70 END 80 PRINT A The homework was to print all details of the first alphabetical name: ...

Things I Learned - 31 Aug 2025

This week, I learned: ⭐ Habit tooling can expand habit-building capacity. I already use tools to support my habits. Habit stacking “sticks” new habits to old ones. By sticking new habits into existing tools, I can automate this. (For example, I extended my meeting record fish script with an echo reminding me to write the meeting goal, my role, practice kind candor, and measure effectiveness.) ⭐ The crux of Arthashastra’s advice on defeating an enemy is removing support: मित्राणि भेदयेत्, मित्रं च शत्रोः। Dis-unite friends, enemies from their allies. अमात्यान् द्रव्यैः, जनपदं भेदयेत्। Bribe their ministers, sow discord among subjects. बलं चोच्छिनत्ति, कोशं चोपशोषयेत्। Break the army, exhaust the treasury. ततोऽन्योन्यवैरिणं कुर्यात्। Then set them against each other as mutual foes. Consensus is dangerous in venture capital. “Because if everyone inside the firm sees the same thing, it probably means the market already does too. And when the market sees it, the upside is limited.” Guillermo Flor This CodeMonkeys paper suggests running a mixture of agents in parallel for multiple code + test tasks and auto-pick the best by running and LLM-rewriting tests. #ai-coding We think a new pricing model might emerge for outsourced knowledge work that leads to lower client cost & quality at higher margins. ChatGPT LLMs do the task; multiple LLMs cross-check. Three tiers: Auto-pass (no human), Light review, Full review. Each tier has a clear price and SLA. Using LLMs as validators is one of the safest ways of introducing LLMs into a process. If the human ignores it, no loss. If it spots new errors or the human gets new ideas, quality improves at low cost. I finally get why elders in my family prefers eating in a pure (rather than a mixed) vegetarian restaurant. When in Vietnam, I could pick dishes in pure vegetarian restaurants without worrying about whether they were meat or not, even when I didn’t understand what the dishes were about. That confidence to proceed without fear is a powerful enabler. There’s emerging evidence that jobs automated by (not augmented or unaffected by) AI have fewer entry-level jobs. Experienced workers are less affected. Compensation is affected less. Canaries in the Coal Mine CloudFlare AutoRAG lets you index any website and expose it as an API + Chatbot with a model of your choice. This is available on the free tier, too. The API follows NLWeb, Microsoft’s open standard for LLMs and MCPs to interact with websites in natural language. Cloudflare has an image transformation API that also acts as a CDN. Apart from basic transformations, it can auto detect and crop faces, remove backgrounds, and more. oklch seems the best color model supported by all modern browsers. We can use relative colors with it, making color palette design much easier: #darker-color { background-color: oklch(from var(--base-color) calc(l - 0.15) c h); } Malware embedded in the compromised nx build tool leveraged Claude/Gemini CLI to offload fingerprintable password-gathering code into prompts, making detection significantly harder for traditional security tools. semgrep Codex CLI has several updates VS Code plugin with remote container execution Drag & drop image support PR Docs Queued (editable) messages PR Web search via --search PR Esc-Esc to edit previous messages Docs Our team passed an image to an LLM for OCR (especially to identify formatting, e.g. bold, italics, etc.), then passed the output and the image to another LLM for improvement. Interestingly, the best LLM (Gemini 2.5 Pro, for this sample of 8 images) out-performed the two-stage workflow. Perhaps incorrect results confuse more than the correct results help? This needs more research. OpenAI now has a series of llms.txt URLs. Rust seems to catch errors better at compile-time than many typed languages like TypeScript. That makes it better for larger projects (or for AI coding). The unexpected productivity boost of Rust #ai-coding Image APIs that support hotlinking and searching (useful to support LLM-generated content, e.g. slides or presentations): Openverse: CC, scale, simple REST. Wikimedia Commons: CC, historic/diagram breadth. Pixabay: easy, free, broad, but license fuzzier. Pexels: beautiful but custom license. Unsplash: stylish but restrictive. OpenClipart: niche, useful for icons. ⭐ For mental tiredness, the impact of sleep > workload > mood/stress > environment (travel, light, air) > posture > food/drink. To rebound, nap > bright light > exercise > fresh air > water > posture/breathing. ChatGPT In my internal meetings, I tend to ask many questions (1 per 8 turns), but fewer open-ended ones (~40%) compared with others. I also praise once every 22 turns - among the lowest in our group. I could ask more open-ended questions and acknowledge good work. # When seeking advice, people sometimes think aloud, become repetitive, and introduce detail before clarifying intent. Kind candor helps. You can: State time boundaries. “We have 20 min. If we spend 5 min on your question, we’ll have 15 for solutions.” Clarify intent upfront. “Before we dive in: What can I help with?” Interrupt, summarize, clarify early. “Cooperative interruptions” are seen as supportive. E.g. “I get this: six accelerators, two done. Great! What can I help with? To accelerate?” rclone is the cleanest way to copy files from Google Drive. I ran rclone config to set it up with Google Drive via native app OAuth key. Then, rclone copy "gdrive:" transcripts/ --drive-shared-with-me --include "**Transcript*.docx" copied all transcripts including “Shared with me” files (not just drives). The --drive-shared-with-me enables this. What makes Claude Code so damn good has a detailed review of Claude Code’s system prompt and is a great for ideas on using LLMs for coding. #ai-coding With AI coding, task breakdown, context right-sizing, and automated testing are key levers. #ai-coding

Here’s my current answer when asked, “How do I use LLMs better?” Use the best models. O3 (via $20 ChatGPT), Gemini 2.5 Pro (free on Gemini app), or Claude 4 Opus (via $20 Claude). The older models are the default and far worse. Use audio. Speak & listen, don’t just type & read. It’s harder to skip and easier to stay in the present when listening. It’s also easier to ramble than to type. Write down what fails. Maintain that “impossibility list”. There is a jagged edge to AI. Retry every month, you can see how that edge shifts. Wait for better models. Many problems can be solved just by waiting a few months for a new model. You don’t need to find or build your own app. Give LLMs lots of context. It’s a huge enabler. Search, copy-pasteable files, past chats, connectors, APIs/tools, … Have LLMs write code. LLMs are bad at math. They’re good at code. Code hallucinates less. So you get creativity and reliability. Learn AI coding. 1. Build a game with ChatGPT/Claude/Gemini. 2. Create a tool useful to you. 3. Publish it on GitHub. APIs are cheaper than self hosting. Don’t bother running your own models. Datasets matter. Building custom models does not. You can always fine-tune a newer model if you have the datasets. Comic via https://tools.s-anand.net/picbook/ ...

The Surprising Power of LLMs: Jack-of-All-Trades

I asked ChatGPT to analyze our daily innovation-call transcripts. I used command-line tools to fetch the transcripts and convert them into text: # Copy the transcripts rclone copy "gdrive:" . --drive-shared-with-me --include "Innovation*Transcript*.docx" # Convert Word documents to Markdown for f in *.docx; do pandoc "$f" -f docx -t gfm+tex_math_dollars --wrap=none -o "${f%.docx}.md" done # Compress into a single file tar -cvzf transcripts.tgz *.md … and uploaded it to ChatGPT with this prompt: ...

If I turned female, this is what I’d look like. gpt-image-1: “Make this person female with minimal changes.” Hm…. maybe… just as an experiment…? LinkedIn

Measuring talking time with LLMs

I record my conversations these days, mainly for LLM use. I use them in 3 ways: Summarize what I learned and the next steps. Ideate as raw material for my Ideator tool: /blog/llms-as-idea-connection-machines/ Analyze my transcript statistics. For example, I learned that: When I’m interviewing, others ramble (speak long per turn), I am brief (less words/turn) and quiet (lower voice share). In one interview, I spoke ~30 words per turn. Others spoke ~120. My share was ~10%. When I’m advising or demo-ing, I ramble. I spoke ~120 words per turn in an advice call, and took ~75% of the talk-time. This pattern is independent of meeting length and group size. I used Codex CLI (command-line tool) for this, with the prompt: ...

Things I Learned - 24 Aug 2025

This week, I learned: Pilots like to have fun, too. While awaiting landing clearance at Kolkata, our IndiGo pilot weaved tight curves just above the clouds at steep angles, giving us stunning views and a mildly thrilling experience. (Or maybe they were just following a flight path.) Since LLMs allow ANYONE to become “good enough” in most fields (marketing, medicine, management), and so on, here’re are my guesses on the impact. ChatGPT Companies-of-one will grow. Sole founder can handle support functions. Specialists will generalize. Consultants will code. Marketers will design. Wages will compress. Seniors will earn less as juniors can do more. Layers will compress. Organizations need fewer hierarchies as 1 person can do more. Shadow apps will grow. Anyone can code. Users build apps with prompts, sheets, agents, outside of IT SDLC. Like Excel sheets. Governance will grow. Non-experts are acting like experts. Validation is more important. Uneconomical apps will thrive. 1:1 tutoring. Continous decision making or A/B testing. Leaders will convince better. Persuasion scales. Brand (authenticity, trust, skill), Channel (distribution, audience) and Data are primary differentiators. Codex and Codex CLI now support image attachments. Notes from discussion on education with Srikanth Nadhumuni Indian higher education has done better, e.g. with the IITs, than primary education, where ASER consistently shows that 5th graders can’t read 2nd grade books. The National Education Policy (NEP) is focusing on FLN (foundational numeracy and literacy). The goal is universal FLN by 2027. Teacing FLN in local languages beats English. Teachers, parents, community support are high. Learning English as a second language is faster. Other countries (France, Germany, Japan) do this. Voice LLMs could help, but may not be toddler-ready, nor strong enough in all local langauges. But high-quality textbook translation with local nuances is a one-time human-in-the-loop effort that AI can support. India’s 1 crore teachers have a mandatory 50 hrs/year training requirement that is largely under-implemented. Senthil Mullainathan is working on extracting features from student answers to questions and generating remedial content purely as a black-box. Results beat explainability. ⭐ Creating systems that rapidly improve from feedback is the key to success. Rapidity, quality of improvement, quantity of feedback are all enablers. CBDC (Central Bank Digital Currency) is RBI’s Web 3.0 protocal. It allows purpose-driven transfers, e.g. money meant for education can only be spent on education. Meta-prompts with placeholders is a prompt-improvement technique (similar to LLM interviewing). Have LLMs create the prompt with “fill-in-the-blanks”. This makes it much easier for people to fill out. MassGen is a multi-agent orchestrator. Early days, experimental. It has multiple agents answer, then vote on each others’ answers, picking the best. DSPy auto-optimizes prompts based on input-output pairs or evals. Typical improvements are ~10-20%. My opinion: avoid. It’s a good idea, but has too much abstraction that hides the implementation. Worth learning from but not implementing unless you (a) have evals + metrics and (b) you KNOW you need to change models and (c) it’s a long-term project where the learning curve is worth it. Claude and ChatGPT How LLM “Attention” works: It takes each word’s embedding, moves it closer to similar words’ embeddings (e.g. Apple moves towards phone or orange depending on context). More similar words have a higher pull, like gravity. Luis Serrano Similarity isn’t symmetric. E.g. “Coke” moves “drink” more towards it, but “drink” pulls “Coke” less, since “drink” could refer to other things. Think of the pull (“Tinder similarity”) as “what A wants” (key matrix, which pulls other words) multipled by “what B offers” (query matrix, which is pulled by other words). This leads to two different similarity matrices. Multi-head attention is where a neural net gives different weightages to different similarity matrices based on context. Value matrix transforms the embedding space so that the next best next-word is more similar. Reading the Obsidian docs is like a master class in Markdown note-taking. Features like properties, embedding YouTube, bases, tags, etc. provide food for thought. The ObsidianMD subreddit has interesting tips. Summarize takeaways on top of each section Use atomic notes: one file per idea. Link liberally YAML front-matter you can query, e.g. tags, project, status, … Use GFM admonitions, e.g. > [!NOTE] Store images in a predictable way, e.g. ![Alt text](./img/2025-08-21-screenshot.webp) – ALWAYS with alt text Use diff fences for edits / doc changes Task lists with inline dates, e.g. - [ ] 2025-08-21 Draft a letter How to research better. Abhishek Divekar Have an objective when researching. Filter research based on that. Research backwards. Pick a relevant paper. Go through relevant citations. Typically, there are only 1 or 2 directly related ancestors. Don’t waste time searching. Gemini Deep Research is a great way to find and read papers. Don’t read the abstract. Read the introduction, which is the summary. It’s just a page. (The abstract is an LLM-ized versionof the introduction. Not as effective.) MCPs aren’t much more useful than tool calling for developers. They’re powerful when packaging for external parties (non-developers, other teams, clients, etc.). Developers can work just fine with tool calling. Nitin Agarwal Cybersecurity AI is an open-source LLM-based cyber-security tool that auto scans networks for vulnerabilities. ⭐ LLMs have solved several complex tasks (e.g. topic modelling, summarization). We need to adopt these as building blocks, like functions, and build better solutions. Abhishek Divekar codex -c model_reasoning_effort=high lets you run Codex CLI with highest reasoning effort. This has a separate limit that resets every 5 hours. https://x.com/thsottiaux/status/1958035261947781262 Truly agentic systems have high Autonomy, Complexity, and Reliability. Workflows have low autonomy. Agentic systems with high autonomy currently aren’t very complex or reliable, but will improve over time. Deepak Sharma Allow humans to intervene while agent loops execute, even unsolicited, to improve collaboration. Deepak Sharma Given the early, experimental days of AI, the better KPIs might be more about experimentation (e.g. number of prototypes) than operational (e.g. cost reduction). Krishnakumar Menon ⭐ Policy-as-code is an emerging theme. Allow users to create their own guardrails policy. Or, take existing policy documents and convert them into an LLM-based evaluator. Krishnakumar Menon ⭐ “Potentially nitpicky but competitive advantage in AI goes not so much to those with data but those with a data engine: iterated data aquisition, re-training, evaluation, deployment, telemetry. And whoever can spin it fastest. Slide from Tesla to ~illustrate but concept is general.” Andrej Karpathy, Dec 2022 The skills AI coding needs are very similar to tech-lead’s or an architect’s. Tanika Gupta #ai-coding Estimating tool capability & task allocation Task breakdown Spec-ing: which of user personas, user-journey maps, wireframes, technical architecture, psuedo-code Standards: tech stack, tools, linters, security, doc standards Git versioning & collaboration Code review. (Using AI.) Providing feedback. Modularity, naming, … Automated validation Post-mortem. Learning from errors and successes, choices LLM made The ROI of prompting carefully and using meta-prompts is high. Prompt clarity reduces iterations & dead-ends. The initial time spent (10-15 min) pays off with just a single reduced iteration (time to generate + review). Tanika Gupta ⭐ Prefer passing a spec.md to AI coding agents rather than directly typing-in prompts. This lets you meta-prompt and (collaboratively) iterate on the spec.md, version the prompts as specs, and generate specs as documentation. Tanika Gupta ⭐ Models need environments to learn. So far, we have been providing training data. But an environment to interact with, and learn from by itself, is more powerful. That requires a standard for environments. This is a powerful emerging area. The crux of experimentation is the learning from a postmortem. From that perspective I have been experimenting a lot but not been documenting or learning from that. Decision logs with post mortem are a more apt device for me. Gemini API includes a url_context tool to explicitly scrape websites. API Ontologies are more than taxonomies or schemas. They’re truths or rules, e.g., “no person has more than two parents”. Helps consistency checking and inference. # Terminological knowledge (T-Box) is domain rules and constraints (e.g., “a student is a person who attends a course”). Assertional knowledge (A-Box) is instance-level facts (e.g., “Mary attends Physics 101”). Tools & Formats SHACL. A W3C language for validating RDF graphs. ShEx is easier ad popular. Notation3. A W3C assertion and logic language which is a superset of RDF. EYE Reasoner. Prolog-based N3 (Notation3) reasoner. CLI + API-friendly. Can perform rule-based reasoning and generate new triples. HermiT. OWL 2 DL reasoner. Can check consistency, classify ontologies, compute entailments. CLI and Java API. Modern, maintained. Apache Jena. Java framework for RDF/SPARQL. Built-in reasoners (RDFS, OWL mini/micro/full). CLI via riot, arq (SPARQL query engine). Popular for RDF graph stores + inference. Do developers feel this way? #ai-coding In another example of vibe coding, an instructor for my TDS course vibe-coded most of an exam using Copilot and Sonnet. 6/8 questions worked one-shot. The two #ai-coding failures were interesting: One failed because of sample vs population stats. Copilot asked for sample variance but coded variance() instead of sampleVariance(). Another failed because of rounding off. NumPy code rounds off differently from Python or JS code. Meditation is about noticing distraction and returning to focus. So, distraction is necessary and good. #beliefs #ai-coding can make us overconfident. (At least, it makes me overconfident.) They create surprisingly good output, but only ~20% of the time. I cannot commit to a specific task based on that. Instead, it’s better to rely on AI coding estimates for portfolios, e.g. promise to share something cool without mentioning what. Or do something cool first, then share. Notes from podcast with Daniel Kahnemann. The Knowledge Project. Happiness is pleasure in the moment. Satisfaction is the meaningful story of our life. When we think, we want satisfaction. When we feel, we want happiness. The thinking brain and feeling brain optimize for slightly different things. E.g. The thinking brain packs the calendar with satisfying tasks that the feeling brain feels unhappy executing Both are good for us. We don’t know which matters more. Behavior change is harder than we think. Usually, it’s better not to expect success in changing others, or ourselves. Instead, understand why that behavior makes sense. Our behaviour is an equilibrium of forces. Weakening “bad” forces is easier than strengthening “good” forces, since it lowers tension. That’s inversion! Behaviours tell us more about situations than personality. We assume otherwise. That’s an attribution error. Motivation is complex. People can do bad things for good reasons and vice versa. “Feelings get in the way of clear thinking.” Example: I vibe-coded the last 2 questions of TDS GA7 on Claude Code. It didn’t run. I delayed fixing it for 5 days, afraid it would a major effort. It ended up a 2 min fix. It could have been major, but checking would have helped. Fear prevented that. Things that hamper clear thinking: intuition, emotion, beliefs. Beliefs are often formed based on people we admire or identify, not reason. Prefer rules, systems and processes. Willpower is an illusion. Delegate decisions to unemotional agents. (But agents misjudge perceived value of gain or loss!) Break down the problem, analyze it, THEM form an intuition. Be disciplined in delaying intuition or forming an opinion Environment shapes thinking but it’s not obvious how, e.g. some people work better in noisy cafes. Some colors are more calming. Protect dissenters and dissent. It’s painful and costly, and needs nurturing. NodeJS runs TypeScript files natively. Codex can clone any GitHub repo. So I can ask it to pull one or more repos, understand their code, and use that as a template or reference. This makes my repositories (and others’) reusable templates. Using newer libraries and platforms becomes easier, too. #ai-coding Tracking AI runs an IQ test on various LLMs every week. GPT 5 Pro leads, currently, followed by Claude 4 Opus and Gemini 2.5 Pro. It’s surprising how far behind GPT 5 is at the moment. LLMs are faster than me. So me learning and doing what the LLM says is a bottleneck. Get out of the way. For example do not learn. Do not execute. Do not verify. Give LLMs the tools to deploy, verify and iterate to improve.

LLMs as Idea Connection Machines

In a recent talk at IIT Madras, I highlighted how large language models (LLMs) are taking over every subject of the MBA curriculum: from finance to marketing to operations to HR, and even strategy. One field that seemed hard to crack was innovation. Innovation also happens to be my role. But LLMs are encroaching into that too. LLMs are great connection machines: fusing two ideas into a new, useful, surprising idea. That’s core to innovation. If we can get LLMs daydreaming, they could be innovative too. ...

Things I Learned - 17 Aug 2025

This week, I learned: Git partial clone lets you fetch files on-demand! E.g. git clone --filter='blobs:size=100k' <repo> will clone files under 100K and fetch the rest only on checkout. Over time, Git LFS capabilities will migrate into native Git. Ref ⭐ From Daniel Kahneman, The Knowledge Project Podcast. Key lesson. Have lower expectations. Behavior change is hard. Happiness is pleasure in the moment. Satisfaction is the meaningful story of our life. When reflecting, the thinking brain wants satisfaction. When feeling, the feeling brain feels happiness. The 2 brains optimize for different things. The thinking brain packs the calendar with satisfying tasks that the feeling brain hates doing. Happiness & pleasure are both are good for us. We don’t know which matters more. Behavior change is harder than most people think. Usually, it’s better not to expect success. Changing others, or ourselves. Instead, understand the cause of that behavior. Behaviour is an equilibrium of forces. Weakening forces preventing right behaviour is easier than strengthening forward forces. It lowers tension. That’s inversion! Behaviours are more about situations than personality. We assume otherwise - that’s an attribution error. Environment shapes thinking but it’s not obvious how, e.g. some people work better in noisy cafes. Some colors are more calming. Leadership & delegation Motivation is complex. People can do bad things for good reasons and vice versa. So, delegate decisions to unemotional agents. But agents misjudge perceived value of gain or loss! People prefer over-confident intuitive leaders over slow, deliberate leaders. Protect dissenters and dissent. It’s painful and costly, and needs nurturing. Negotiation is about understanding, not convincing. “Feelings get in the way of clear thinking.” Example: I vibe-coded the last 2 questions of TDS GA7 on Claude Code. It didn’t run. I delayed fixing it for 5 days, afraid it would a major effort. It ended up a 2 min fix. It could have been major, but checking would have helped. Fear prevented that. Intuition, emotion, beliefs hamper clear thinking. Beliefs are often formed based on people we admire or identify, not reason. What enables clear thinking (all are hard): Pragmatism. Don’t threaten your identity, the leader, etc. Else none of this works. Rules, systems and processes. Willpower is illusion. Alignment is an illusion. “Whereever there is judgement, there is noise, and more than what people think.” Standards. Shared, consistent scales of evaluation. Super-forecasters use probability scales. Deliberation. Slow decision making. Decomposition. Break down the problem, analyze it, THEN form an intuition. Be disciplined in delaying intuition or forming an opinion. Pre-mortems. “Write the history of the disaster this decision led to.” Decision journals with post-mortems. Pros, cons and alternatives from failed decisions, e.g. Ray Dalio’s principles. Change of mind. Independent data. Use data. Keep evidence gatherers independent of decision makers. Preparation. Have decision makers write down decisions before discussing. Increases diversity. DuckDB’s feature engineering capabilites are faster than scikit-learn. DuckDB Developers are encoding their entire SDLC workflow into Claude commands ChatGPT #ai-coding Commands are used for: Requirements: Research sub-agent, task breakdown into todos.md, creating specs.md from todos.md Progress tracking: session logging, effort tracking, updating status, planning next steps Project setup: initializing, adding deps, scaffolding features Development: code review, debug error (five whys), explain code, refactor code Optimization: optimize build, DB, caching Testing: TDD, generate test cases, set up unit/integration/E2E testing, analyze coverage Security: security audits, dependency vulnerability scans Integration: sync tasks between GitHub and Linear (two-way issue synchronization, PR linking) Deployment: prepare releases, hotfix deploys, rollbacks, containerization, CI pipeline setup Patterns of usage Sub-agents Command handoffs, i.e. one command invoking another Shared among a team in a repo, enforcing standards & sharing best practices Integration with specific tools / APIs (e.g. Linear) ⭐ LLMs can hyper-personalize demos. E.g. an LLM document generator demo accepts a role, document type, and prompt. The demo-er says “Bank, LinkedIn marketing” and the LLM auto-populates the fields aptly, re-purposing the demo. From the GPT 5 coding cheatsheet: Be precise and avoid conflicting information. Use a prompt optimizer to check for inconsistencies. Use the right reasoning effort. Prefer medium or low reasoning to avoid overthinking simple problems. Use XML-like syntax to help structure instructions Avoid overly firm language, e.g. “You MUST be THOROUGH” vs “Thoroughly”. Give room for planning and self-reflection. Explain what to do in steps, asking it to think deeply Control the eagerness of your coding agent, e.g. do not ask for confirmation, parallelize tool calls, use more tools, etc. ⭐ Assets are any leveragable stored capability. Money is one, but there are several one can “invest” in, be an agent of, or perhaps steal. Wealth (investments, income) Regenerative assets (land, carbon credits, renewables) Contacts (reference customers, hiring pipeline, talent bench, weak-ties) Distribution channels (repeatable routes to users: partnerships, marketplaces, APIs, SEO) Attention (your audience, whom you can reach directly) Trust/reputation in communities (community capital in employers, clients, forums, society, search keywords) Personal brand “edges” (moral authority, values lived aloud, distinctive taste or stance) Data (your clean, labeled, joined data corpus) Code (models, algorithms, components, templates, libraries, tools, evals; versioned) Content (blog posts, video tutorials, case studies, demos, stories, slides, docs) Knowledge (notes, decision logs, knowledge graph, institutional memory) Playbooks & runbooks (process checklists that survived fire, SOPs, scenario plans) Habits & policies (operating cadence, rituals, governance & compliance muscle) Optionality (cash buffer, credit lines, slack time, real options, small bets) Agreements (MSAs/SLAs, pre-negotiated contracts) IP (copyrights, trade secrets, trademarks) Health & energy reserves ⭐ Intense negative emotions get in the way of clear thinking. Curiosity, humor, kindness, and gratitude help. (Intense positive emotions like awe, passion, etc. help creativity and are not so bad.) #beliefs I like to think I’m a Python expert. When I saw a client use this code, I told her the indentation is wrong. It ran just fine. And people think only LLMs hallucinate. This is undocumented, but the way to get an Gemini ephemeral auth token for the live API is below. (Update time as required.) ChatGPT Learnings from a discussion on vibe-coding between Kunal Jain, Ravi Nadimpalli and me. #ai-coding On the Vibe Coding Process & Strategy The 80/20 Rule is Real: The first 80% of a project is incredibly fast, but the final 20% (debugging, custom features, production-readiness) is extremely difficult and time-consuming. Validation is the New Bottleneck: Since coding is now much faster, the critical, time-consuming task has shifted to reviewing, testing, and validating the LLM’s output. “Spec-Locking” is Crucial: Providing the LLM with detailed, well-defined, and “thinly sliced” specifications is essential for getting good results. Vague requests lead to poor outcomes. It’s Not Production-Ready (Yet): The consensus is that vibe coding is excellent for prototypes, demos, and go-to-market (GTM) activities but is not yet reliable for building production-grade applications from scratch. Code is Brittle & Unstable: An application that works perfectly one day can inexplicably break the next, as the underlying agent might make undocumented changes. Impact on Roles & The Future of Work The Rise of QC/Validation: The Quality Control (QC) function will become larger and more critical to manage the new challenge of validating AI-generated work. Product Managers Shift Focus: PMs can move away from tedious documentation (like flowcharts) and focus more on high-level business strategy, using vibe coding to create quick prototypes. Democratization of Building: It empowers non-coders to build functional apps and helps professionals upskill faster by “conversing” with an LLM on complex topics. New Forms of Cheating: The technology is creating novel ways for people to cheat in interviews, such as using tools that provide real-time subtitles of answers. The “Jagged Edge” of AI: The technology excels at certain tasks (like GTM content) but fails at others, creating new upstream bottlenecks where teams must rapidly generate more of the “AI-friendly” work. Practical Hacks & Takeaways Meta-Prompting: Use an LLM to refine and improve your prompt before giving it to the final tool. This helps fill in gaps and add necessary detail. Human-First Drafting: For creative or nuanced work (like writing), it’s often better to write the first draft yourself and use the LLM to polish it, rather than starting with a generic AI draft. Use Structured Prompts: For predictable and clean output, providing instructions in a structured format (JSON is OK but not needed) is highly effective. LLM as a Judge: Use LLMs to evaluate and grade content, code, and other outputs, dramatically speeding up the review process. Automate Learning & Documentation: Use tools to transcribe conversations automatically and create personalized revision quizzes from notes and documents. Voice is a Powerful Modality: Using voice-to-code allows for capturing more complex ideas faster and can be done while multitasking (e.g., walking), capitalizing on “dead time.” For live transcription, Gemini 2.5 Flash Live costs 0.6c/min of audio ($3/MTok x 32 tokens/second) while GPT 4o Mini Realtime costs ~2c/min and GPT 4o Realtime costs ~8c/min. ChatGPT I set up MCPs Codex CLI by adding this to ~/.codex/config.toml. I’ve disabled it for faster startup (this takes ~2 seconds) and raised an enhancement issue for MCP lazy loading Anthropic launched a remote MCP connector in their API. OpenAI Responses API already had remote MCP support. Gemini will likely follow, opening up new tool capabilities. The APIs can directly call the MCPs as part of their thinking. Turns out Indian English is a well studied topic. Indianisms like “can able to”, “need not to”, “why because…”, “if suppose…”, “return back”, “revert back”, “angry on”, “discuss about”, “order for”, “do one thing…”, “give me a missed call”, “what is your good name”, “kindly adjust”, “we are like that only”, “he is coming only”, “today itself”, “now only”, “prepone”, “pass out (of college)”, “out of station”, “do the needful”, “hotel”, “batchmate”, “cousin-brother / cousin-sister”, “I have a doubt”, “I am understanding”, “she is knowing”, “you’re coming, no?” etc. are discussed in Pingali Sailaja’s Indian English. ChatGPT Astral is building pyx - a paid PyPi alternative. It aims to solve problems like PyTorch CUDA builds. Knowing them, it’ll be fabulous. I look forward to when they build a Python hosting service. ⭐ Here’s one way to improve LLMs apps in real-time. After sending a response, send the prompt + input + output + optional user feedback to an LLM-as-a-judge asking for feedback to improve the prompt. Revise the prompt based on the improvement. Now the app has improved, real-time, based on human/LLM feedback. Refine this process to ensure that the revisions are smooth and positive. GPT 4.1 (and presumably GPT 5) models have been trained on a specific diff format useful for code diff-patching. PseudoPatch is a Python package that implements their apply_patch() function. Aider supports multiple edit formats that are commonly referenced as a standard. Code Surgery has a good walkthrough of various strategies. These are similar to Google’s diff-match-patch approach (which fuzzy matches and then patches) but does not require line numbers. ChatGPT Here are some query parameters ChatGPT.com unofficially supports: ?q=... prefills in a new chat and often auto-submits, especially small text #. Useful for: A custom search engine in your browser An “Ask ChatGPT about selection” bookmarklet, etc. Links (e.g. from courses, FAQs, etc.) for tasks or learning … but not for custom GPTs ?model=... selects a model (e.g., gpt-5-thinking). ?hints=search enables Search mode ?temporary-chat=true opens a new temporary chat Tavus is another AI avatar platform. Synthesia. Market leader; $2.1B valuation; enterprise trusted. Good: Realism, enterprise features, templating. But: Price, usage caps, slower avatar setup HeyGen. Rapidly growing; $500M valuation. Good: Avatar realism, speed, affordability. But: Basic collaboration, support, scene complexity Colossyan. Favored L&D focus. Good: Interactive & educational tools, good value. But: Less polished avatars, slower renders D-ID. Frequently cited alternative. Good: Speed, flexibility, custom avatars. But: Watermarks, fewer templates Elai.io. Repeats in alternatives lists. Good: Storyboarding, educational formats. But: Limited templates, render time Hour One. Also common in alternative lists. Good: Photoreal avatars, expression control. But: Missing advanced features like screen capture Others. Niche or emerging tools. Good: Varies by platform. But: Less adoption, fewer reviews Training companies are offering “Labs-as-a-service” as part of their AI training. Corporates ban LLMs, but need employees trained. Trainers offer a bundled package where they also offer access to LLMs are part of their course. Interesting business-model value-add. ⭐ I’m meta-AI-coding. I wrote a crude prompt in prompts.md, told Codex “prompts.md has a prompt under the “# Improve schema” section starting line 294. This is a prompt that will be passed to Claude Code to implement. Ask me questions as required and improve the prompt so that the results will be in line with my expectations, one-shot.” After a few discussions, it generated this remarkable prompt. This prompt was easy for me to review AND easy for Claude Code to understand because of the lack of inconsistencies. Use the Ask-Code pattern. In Codex, speak the requirement and have it rewrite the prompt asking clarifying questions pressing the Ask button instead of Code. Then, answer its questions. Then press Code. A Forward Deployed Engineer (FDE) is a hybrid role, part software engineer, part product manager, and part consultant, focused on deeply integrating a company’s technology with a specific client’s needs. Based on what I’ve seen of AI coding, new developers need to learn these skills. #ai-coding Context engineering Documentation Automated testing Standards Capabilities of platforms Modularity (and DRY vs WET) Code composition Code reviews Blindspots continue to be the insight with maximum RoI. Discovering something we’re not even aware we’re unaware of opens up the largest possibilities. #beliefs My top sources to discover blindspots are: Feedback. Especially feedback we reject, ignore, or miss. Things we run/shy away from. Across clients, providers (e.g. Bedrock) and products (e.g. Cursor) I have observed capacity bottlenecks for Claude models which don’t seem to affect OpenAI models as much. Increasing the size of an image improves OCR accuracy for LLM models (or at least Claude 4 Sonnet). Anecdotally, resizing 2x did not work on a number of examples but 2.5x - 3x did. This increases the cost to 6.25x or 9x, however. Discussion at PyConSG Edu Summit 2025. Padlet Discussion validation Interesting ways students use AI Use AI to refactor/debug whole codebases Get AI to create questions for practice ChatGPT Study mode Students like to upload photos. We can teach them to upload these to ChatGPT and ask questions. What teaching practices / assessment design can help students think for themselves before turning to AI? ChatGPT Interactive orals / micro-vivas (short, process-focused). Strong alignment with “interactive oral assessment” research and guidance in the AI era: improves authenticity, reduces outsourcing/contract cheating, and checks understanding. Make them low-stakes but frequent. How: 5–8 min viva tied to a task; students must explain choices, failures, and next steps. Authentic / project-based assessments students can self-validate (observable outputs). Project-based and “authentic” assessment meta-reviews show consistent positive effects (achievement, thinking skills, motivation), especially in STEM and small teams. Design tasks with local data/constraints so generic LLM answers are only a baseline. How: “Default AI answer” gets a pass; “A-grade” requires empirical validation, custom data, or optimisation trade-offs with metrics. Pair programming + peer critique on whiteboards/pseudocode. Evidence (meta-analyses & CS-ed studies) supports pair programming for learning and retention; code tracing/peer instruction deepen understanding before coding. How: Rotate driver/navigator; force commit-message style rationales; 10-minute “whiteboard dry-run” before touching IDE. Process-over-product with structured reflection. Metacognitive/reflective interventions show medium-to-large effects on achievement; they also build habits that resist blind acceptance of AI outputs. Keep reflections short but structured. How: “What I asked AI; what it missed; how I verified; what I’d change next time.” “No-AI under secure conditions” mixed with AI-permitted coursework. Matches national/institutional guidance for GenAI-aware assessment design. Use secure, time-boxed checks for fundamentals; allow AI elsewhere with audit trails. Primary research (interviews/user studies) before design/coding. Fits the “authentic assessment” literature and reduces LLM substitution. Grade on research protocol + synthesis rigor, not word count. Explicit problem-solving frames (initial/current/goal state). Classic problem-solving scaffolds; improves formulation before querying AI. Pair with short “assumption logs.” (General pedagogy supported; CT depends on domain knowledge – see caveat below.) Caveat (important): Critical thinking depends on domain knowledge. Don’t expect generic CT drills to transfer without content mastery. Plan tasks so students must recall/apply specific knowledge before or alongside AI. How can we train students to use AI critically instead of accepting the output blindly? ChatGPT Teach “lateral reading” and SIFT for source checking. Stanford’s Civic Online Reasoning work and Caulfield’s SIFT method offer actionable heuristics for verifying claims, URLs, and citations that LLMs surface. Build these into rubrics. Run “AI auditing” labs (hallucination hunts). Students collect/label model mistakes, missing assumptions, and fabricated citations – an approach aligned with UNESCO’s call for AI literacy and validation. Use online judges with hidden tests + adversarial cases. Autograding literature supports hidden tests for robust generalization; it trains students to verify and not overfit to visible specs – or to AI’s surface patterns. “Sandwich” workflow: spec → implement 1–2 reps → let AI complete → verify rigorously. Mirrors human-in-the-loop patterns in industry; use checklists for unit/property tests and invariants before accepting AI output. Live-coding with an AI assistant on display (to show failure modes). Demonstrates nondeterminism/limitations in real time; supports critical habits. Pair with a post-mortem template. Prompt red-teaming/jailbreak exercises (safe scope). Students learn that guardrails can be bypassed and why verification matters. Keep it ethical and bounded. Build a knowledge base first. Reinforce that CT sits on content knowledge; teach students to explain why an AI answer is plausible or not, citing domain facts. Notes from “My Thoughts on Computational Thinking in the Generative AI Era” by LEONG Hon Wai, ex-NUS, at PyConSG Edu Summit 2025 Students from China don’t like to write, express their ideas, and share. That’s changing now. Computational thinking is pretty new (Jeannette Wing, 2006), actually, based on Papert (1980). It’s too early to abandon it. It enables effective learning attitudes: Tinker (experiment & play): helps finding diverse problems to generalize into Debug (find & fix bugs) Create (design & make) Persevere (keep going): but only if it’s productive, i.e failing in new ways Collaborate & communicate Teaching this is hard. Get students to WANT to do computational thinking. Problem formulation (among the computational thinking blocks) is more important than before. Leveraging Computational Thinking in the Era of Generative AI argues that computational thinking manifests in prompt/context engineering. We’re moving from “Computational Thinking” to “Computational Action” – where we’re talking to AI coders that actually deploy apps that DO stuff. Notes from “Make Learning Easy and Fun @ NLB LearnX” by Goh Soon Seng, NLB, at PyConSG Edu Summit 2025 Libraries have a Pi Python Makers Club, open for all. Bi-monthly meetings. Quarterly Pi Python workshop. Space provides 3D printers, Raspberry Pi, sensors, etc. Notes from “Teaching Goals and Plans - How we might help students improve problem-solving” by Dr Norman Lee, SUTD, at PyConSG Edu Summit 2025 Programming is hard. E.g. Solving the Rainfall problem “Sum numbers until 99999” needs several building blocks: Python syntax Getting user input While loop Controlling while loop with counter Accumulation If-else Merging (or composing) such blocks is the hard part. In Learning to program = learning to construct mechanisms and explanations, Soloway, shares 4 compositions. Abutment: Put one block after another Nesting: Put one block inside another Merging: Interleave the code in the blocks Tailoring: Modify the code in the blocks But you need to already have those primitives (patterns) to put together. The “expert blind spot” blinds experts to this. Actionable ideas: Teach patterns explicitly Create exercises on applying them Use Parsons problems: Fill in the blanks. Re-order lines of code. But design problem carefully Step through a debugger. BUT students must predict next line, not passive watching Teach to from one format (psuedocode, flowchart, another language like Excel) to Python. Helps multiple modes of learning Notes from “AISG programmes” by Chen Qeiquang, AI Singapore, AI Apprentice Programme (AIAP) Assistant Head Full-time. For SG citizens. $4,000/month. Build 3-6 month MVPs for startups, SMEs, or corporates. 300/1000 delivered so far. No lectures/tutorials. Focus is: topic assignments, discussion with mentors, apprentice sharing sessions. Includes an LLM Application Developer Program. Notes from “Scaffolding the Problem-Solving Process for Introductory Computing Students” by Ashish Dandekar, NUS, at PyConSG Edu Summit 2025 Built an intelligent tutoring system Encourage students to create their own pattern banks / cheat sheets. “Find 2 more problems that can be solved in the same way.” Focusing on the problem-solving process shrinks the gap. Students above the 50th percentile of pre-assessment did not improve much. The lowest percentile improved the most. “At NUS, I know that even if I give 0.5% weightage for students attending tutorials, everyone will attend it for those ‘free marks’.” Notes from “Exploring Multi-Agent Generative AI in Education and Career Advisory” by Dr Yeo Wee Kiang, NUS, at PyConSG Edu Summit 2025 ⭐ “When you have a high fever, do you speak more sense or nonsense? Nonsense. LLM temperature is like that. But it can also sound creative!” The router pattern is a powerful query rewriter. Redirects the query to specialized prompts/agents. Useful tools you can build for students: Course Mentor, Interview Coach, Job planner/matcher. Notes from “Do we need to teach coding given vibe-coding tools?” by Dr. Oka Kurniawan, SUTD, at PyConSG Edu Summit 2025 Paper: What the Science of Learning Teaches Us About Arithmetic Fluency says mental math helps mathematicians. Fluency bootstraps higher-level thinking. MIT Media Lab’s Project: Your Brain on ChatGPT. Explores impact on brain. Bran-only group had the widest ranging brain networks. AI accumulates cognitive debt. Paper: “A Study of the Difficulties of Novice Programmers” struggle with: Syntax Problem solving Tools Computing concepts Analytical thinking / debugging Polya’s How to Solve It is the base problem solving framework for maths and can be adapted to computing Expert programmers have enough patterns to match against. Novices don’t. We need a bottoms-up framework instead Give them a concrete case. Have them generalize (loops, functional, vectors) Have them implement (debugging) Have them break it (test) All via vibe-coding! The chats are tracked!! Paper: First Things First: Providing Metacognitive Scaffolding for Interpreting Problem Prompts Students often get the problem wrong Reading student conversations helps figure it out LLMs can figure it out too! Paper: The Widening Gap: The Benefits and Harms of Generative AI for Novice Programmers Good coders got better with AI. Were able to ignore unhelpful advice. Poor coders got worse! Thought they performed better than they did. Increased illusion of competence. The Bebras Challenge is a global non-programming computational thinking (CT) challenge. Examples. Singapore runs a National Junior Informatics Olympiad that learns from Bebras. It tests the mindset behind coding, specifically “computational thinking”: Problem formulation (added recently, and is increasingly important) Decomposition (and composition): break the problem down Pattern recognition: find the building blocks Abstraction: generalize useful blocks, drop irrelevant ones Algorithmic thinking: write the steps to solve Validation (not part of original list, but critical): how to efficiently check if this works Apple’s Embedding Atlas (Demo - slow, needs WebGPU) is an embeddings visualizer, like Tensorflow Projector or Mantis (Demo). John Kotter’s organizational change model is the accepted practice for top-down change, while ADKAR is for bottom up. It’s surprising how obviously effective both are to someone who has effected both kinds of changes, but there is NO WAY I would have appreciated either during my MBA. Wikipedia: Change management The OpenAI Chat Completions API has a few interesting and (relatively) new options: verbosity. low: concise response, medium: default, high: verbose reasoning_effort: minimal: almost none. medium: default. Or low, high. truncation: auto: truncate response by dropping input items in the middle. disabled: default prediction: speeds up output for minor corrections to text prompt_cache_key: tailors per-user caches CSS nesting can be used with media queries too! Julia Evans id3v2, mid3v2 and eyeD3 seem the cleanest way of editing MP3 tags on the CLI. mid3v2 was already installed on my system. Learnings people shared in Ask HN: What trick of the trade took you too long to learn? Finance & housing Time is a non-renewable asset. Lifestyle design matters as much as net worth. Future-proof against regret. The present matters, too. Home ownership ties up location choice, capital and has hidden costs. Market timing & geographic arbitrage has an outsized effect. Software Align abstraction to domain. Avoid premature abstraction (Don’t Repeat Yourself vs Write Everything Twice) and over-abstraction. Temporary fixes tend to stick. Stop-gap regexes last for years. Consistency is a quality multiplier. Small inconsistencies cause disproportionate harm. git bisect is a regression-finding superpower. It’s OK to write tests covering key parts of legacy codebases - 100% coverage isn’t critical. Document architectural decisions: why this approach. See Diátaxis. Flow metrics predict delivery better than (arbitrary) estimates. Building features without linking to delivery spesd wastes resources. Life habits & learning You have the right to say “no”. Small, consistent actions beat dramatic changes. Persistence beats skill. You’re allowed to change your mind. Over-cleverness backfires. Witty code & communication lead to confusion. Context is king. Without background, everything is mis-interpretable. Fun leads to excellence. Excellence leads to fun. The meta-lesson here is how I discovered these: Run topicmodel to identify topics Feed the output CSV to ChatGPT and ask it to share lessons topic-by-by-topic # Topic modeling can be extended in many ways. # Structural Topic Models factor in metadata, like year (numeric) or category or author (categorical). Relational Topic Models factor in undirected graph relationships, e.g. parent documents Graph-Regularized Topic Models factors in arbitrary graph relationships, e.g. weighted, directed Neural (GNN + Topic Model) approaches work better for large graphs, long-range dependencies, etc. Some ways to inject graph structure into topic similarities to, for example, cluster threaded discussions. # Start with a graph similarity matrix S, like # a regularized graph Laplacian (based on degree - adjacency matrix) a similarity matrix like graph2vec from Graph Kernel a node-embedding karateclub. Option 1: “Smoothen” the embedding matrix multiplying it with S (i.e. spread each document towards neighbors), then calculate similarities Option 2: Take the weighted average of S and the embedding similarity matrix You can extract Hacker News comments as a threaded discussion pasting this into the DevTools console:

Alibaba released an open-source coding model (qwen-coder) and tool (qwen-code). qwen-code + qwen-coder cost 8 cents and made 3 mistakes. https://lnkd.in/gguSGdv6 qwen-code + claude-sonnet-4 cost 104 cents and made no mistakes. https://lnkd.in/gEPnVS-F claude-code cost 29 cents and made no mistakes. https://lnkd.in/gyCVeAr4 There’s no reason to shift yet, but it’s a good step in the development of open code models & tools. LinkedIn

Indian Celebrities and Directors was my top searched category on Google while OpenAI & AI Research was the top growing category. This is based on my 37,600 searches on Google since Jan 2021. Full analysis: https://sanand0.github.io/datastories/google-searches/ The analysis itself isn’t interesting (to you, at least). Rather, it’s the two tools that enabled it. First, topic modeling. If you have all your searches exported (via Google Takeout) into a text file, you can run: ...

Meta AI Coding: Using AI to Prompt AI

I’m “meta AI coding” – using an AI code editor to create the prompt for an AI code editor. Why? Time. The task is complex. If the LLM (or I) mess up, I don’t want re-work. Review time is a bottleneck. Cost. Codex is free on my $20 OpenAI plan. Claude Code is ~$1 per chat, so I want value. Learning. I want to see what a good prompt looks like. So, I wrote a rough prompt in prompts.md, told Codex: ...

Things I Learned - 10 Aug 2025

This week, I learned: OpenAI supports a tool "type": "custom" that lets it write code as an argument to a tool call. Great for code / SQL generation. Even more powerfully, you can generate output following specific grammars, e.g. STL files, PostgreSQL dialect, Mermaid/PlantUML diagrams, OpenAPI specs, Vega-Lite JSONs, Cron expressions, GraphQL SDLs, Dockerfiles, Terraform HCLs, or any DSL! # #ai-coding The OpenAI playground has a GPT-5 Prompt Optimizer that can migrate prompts to GPT-5. Docsify 4.13.1 is 2 years old and uses [email protected] which is 5 years old. Newer plugins like marked-directive don’t work with it. Though docsify v5.0.0-rc1 is in development, it may be the better option for modern Markdown plugins. Here’s sample code. CommonMark has a powerful directive syntax proposal that lets you add classes, attributes, and arbitrary plugins to Markdown. For example, :abbr[MD]{#id .class title="Markdown"} for inline directives. Plugins exist for marked, markdown-it and remark. biomejs and dprint are gaining traction as prettier alternatives. I’m yet to try them but keen to explore. Skip biomejs for now. It uses tabs (not spaces) and does not respect .gitignore by default. Handling these is too much work. ⭐ Code generation is more flexible than tool calling. LLMs can’t write a tool-call loop, for example, but they can write code to run an API in a loop. So, I like telling the LLM to “write code using these APIs” than giving it APIs to tool-call. #ai-coding npx -y ccusage is an easy way of summarizing your Claude Code usage and cost. My cost so far (since 21 July) is about $10. The median session cost is ~50 cents. Most of it ($7) was from a single temporary coding chat that I kept continuing for way too long, building up the context window. # defuddle can be used in the browser to get the main content from web pages. A replacement for Mozilla Readability. # Modern Node.js Patterns for 2025 include these 5 features I’m excited by: Single-executable bundling. node --experimental-sea-config sea-config.json builds standalone binaries. ES Modules. Use node: prefix for built-in imports. import { createServer } from 'node:http'; Watch mode. Use node --watch file.js auto-reloads when file.js or dependencies change. Env file. Use node --env-file=.env loads .env as environment variables. node:test is a full-featured test framework with --watch and coverage. Concise explanations speed up decisions because they’re faster to read and understand (obvious). They’re also easier to combine with other ideas (less obvious). # I’ve been uncertain about htmx for some time now. This tutorial, HTMX is hard, so let’s get it right, convinced me that it’s too far from my mental model, so I’m unlikely to ever use it. ⭐ Slow, effortful practice (spaced recall, interleaving topics, self-testing) builds lasting knowledge but looks inefficient and doesn’t help with exams. # #beliefs GitDoc VS Code extension auto-commits and syncs notes. I dropped gitwatch in favor of this. It’s interesting that Gemini Deep Research cannot access Google Drive while Gemini can. On the other hand, ChatGPT Deep Research can access Google Drive but ChatGPT cannot. A trend that AI coding will only accelerate: “It is now possible for tiny teams to make principled software that millions of people use, unburdened by investors. … you need far less money and far fewer employees to reach far more customers. That wave is only just beginning.” # #ai-coding Typed languages are better suited for vibe coding. This will likely lead to the growth of typed languages (TypeScript, Rust, Go) but also of typing in untyped languages (e.g. Python) # #ai-coding Instead of Celery, Redis, Kafka, etc. as task queues, we could the file system as a message queue. For example, pending/task-01.json moves to wip/task-01.json to done/task-01.json. Folders for state/tags, files for task details. Foam is a note-taking VS Code extension. The WikiLinks, tags and backlinking features align naturally with Markdown note-taking. Via Steph Ango who uses Obsidian which nudged me to search for WikiLink-ing features in VS Code. I’m an open data hawk. But here are things I should remind myself of. # Privacy incubates creativity. People self-censor when watched. Privacy shields fragile ideas. Power assymetry. Big players can leverage openness more, e.g. Cambridge Analytics + Facebook data. Context matters. What’s harmless in one setting can be toxic in another. One-way door. Data can’t be unshared. Don’t scrap brakes dreaming of perfect roads. Anticipate tyrannical regimes / cultures. Not your call. You don’t share your neighbour’s medical records. One Punch Man is available as manga. I watched the anime first and assumed that came first. Apparently not. ⭐ In “kind” environments (stable rules, rapid and accurate feedback), specialize. In “wicked” environments (rules shift, feedback is noisy/late), generalize. ChatGPT Models’ ability to orchestrate longer workflows will improve. Factor that into your application design. Claude Code can already handle over 70 tasks in a workflow What happens when LLMs play Chinese Whispers / the Telephone Game? Here are learnings. ChatGPT Drift increases faster than linear with hops. Bigger models do better, but constrained prompts (“Copy the text exactly; change nothing.”) have a bigger impact. Low temperature improves copying fidelity. But even after “forgetting”, LLMs reproduce rare content if they’re trained on it. “In fact, React Native looks set to become the most engine-agnostic JavaScript runtime around”. The Many, Many, Many, JavaScript Runtimes of the Last Decade OMDb (simple) and TMDb (comprehensive) are API-friendly alternatives to the IMDb. copyparty seems one of the most feature-rich file servers out there. Single Python file, runs on any OS, works with any client, and optimized for speed. Video Quotes I enjoyed from Linus Torvalds’ TED interview I want to not have external stimulation. You can kind of see, on the walls are this light green. I’m told that at mental institutions they use that on the walls. It’s like a calming color. … the main thing I worry about in my computer is – it really has to be completely silent. If the cat comes up, it sits in my lap. And I want to hear the cat purring. I did not start Linux as a collaborative project. I started it as one in a series of many projects I had done at the time for myself, partly because I needed the end result, but even more because I just enjoyed programming. I’m actually not a people person. But I do love other people who comment and get involved in my project. The big point for me was not being alone and having 10, maybe 100 people being involved. Going from 100 people to a million people is not a big deal – to me. Well, I mean, maybe it is if you want to sell your result then it’s a huge deal. But if you’re interested in the technology and you’re interested in the project, the big part was getting the community. So Git is my second big project, which was only created for me to maintain my first big project. And this is literally how I work. Well, I do code for fun – but I want to code for something meaningful so every single project I’ve ever done has been something I needed. Apparently, my sister said that my biggest exceptional quality was that I would not let go. I can’t do UI to save my life. Good taste is about really seeing the big patterns and kind of instinctively knowing what’s the right way to do things. Companies like Google and many others have made, arguably, like, billions of dollars out of your software. Does that piss you off? No. No, it doesn’t piss me off for several reasons. And one of them is, I’m doing fine. But the other reason is – I mean, without doing the whole open source and really letting go thing, Linux would never have been what it is. I think one reason open source works so well in code (is that …) Code either works or it doesn’t. The Uses This site has interviewed professionals for decades. From their repo I scraped the top developer apps post 2020: CloudFlare has an Iceberg data catalog in R2 Data Catalog. Iceberg is like Parquet but supports metadata, time-travel, and schema edits. But I’m yet to find a single publicly accessible Iceberg catalog. Its open-data adoption is not as high as Parquet’s. Apache Iceberg vs Parquet Observable Notebook 2 is the new notebook format from Mike Bostock. It is vanilla JS and embeddable into other pages. THis would have been a big deal 2 years ago, but with the LLM ecosystem today, I’m not sure if it matters as much. To add CORS support to CloudFlare pages protected by Zero Trust, add a _headers file to your repo. (This is different from the Zero Trust CORS which allows automated logins.) Sample _headers that lets logged-in users fetch pages via fetch("...", { credentials: "include" }): /* Access-Control-Allow-Credentials: true Access-Control-Allow-Origin: https://your-site.example.com Access-Control-Allow-Methods: GET, HEAD Access-Control-Allow-Methods: * As corporates restrict the use of LLMs, I see employees purchasing personal laptops to use LLMs on. An interesting trend! openai-python has a CLI. You can run uvx openai api chat.completions.create --stream -m gpt-4.1-nano -g developer 'Translate to Chinese' -g user "Hello" for example Anthropic has an OpenAI compatible API at https://api.anthropic.com/v1/. Claude Code tips from Things that didn’t work by Armin Rocher #ai-coding Speech-to-text. Cannot stress this enough but talking to the machine means you’re more likely to share more about what you want it to do. I maintain some basic prompts and context for copy-pasting at the end or the beginning of what I entered. I ended up preloading executables on the PATH that override the default ones, steering Claude toward the right tools, e.g. running python asks it to use uv. I use the task tool frequently for basic parallelization and context isolation. Simply taking time to talk to the machine and give clear instructions outperforms elaborate pre-written prompts. Forcing myself to evaluate the automation has another benefit: I’m less likely to just blindly assume it helps me. Research indicates that we don’t know in advance which prompts will help. Evals beat prompt engineering. Ethan Mollick

My ChatGPT engagement is now far higher than with Google. I started using ChatGPT in June 2023. From Sep 2023 - Feb 2024, my Google usage was 5x ChatGPT. Then, fell to 3x until May 2024. Then about 2x until Apr 2025. Since May 2025, it sits at the 1.5x mark. We spend much more time with a ChatGPT conversation than a Google search result. So clearly, ChatGPT is my top app, beating Google some months ago. ...

Giving Back Money

At the end of my 2021 graduation interview, All India Radio asked: Interviewer: What would, if you are asked to give back something to the country, what would be that? Anand: I really don’t know. At this stage, I don’t know what I’m capable of and what I can contribute, but whatever it will be, I suspect the bulk of it will come later towards my career. ...