Is all AI content slop?

Is all AI content slop? I asked Claude to: Analyze this thread. Then explain it like a Malcolm Gladwell New Yorker article. https://news.ycombinator.com/item?id=45820872 It gave me a beautiful, engaging and insightful essay about a 300+ message debate about AI vs humans on routine tasks. https://claude.ai/share/60c5810f-5c81-4970-8026-a24bf89c3392 Is this slop? One phrase stood out: There’s an irony here that the commenter doesn’t quite state but implies beautifully: we’ve spent so long celebrating automation because humans are imperfect that we’ve forgotten we also value humans because they’re imperfect. ...

OpenAI TTS cost

The OpenAI text-to-speech cost documentation is confusing. As of 2 Nov 2025: GPT-4o mini TTS costs $0.60 / MTok input and $12.00 / MTok audio output according to the model page and the pricing page. They also estimate this to be ~1.5c per minute - both for input and output. It supports up to 2,000 tokens input. TTS-1 costs $15 / MTok speech generated according to the model page but the pricing page says it's $15 / MChars. No estimate per minute is provided. Is supports up to 4,096 characters input. TTS-1 HD is twice as expensive as TTS-1 I wanted to find the approximate total cost for a typical text input measured per character and token. ...

Things I Learned - 02 Nov 2025

This week, I learned: TVMaze API is an API for TV shows, episodes, cast, crew, etc. Useful for TV-related apps as well as learning APIs. Awesome Skills is a curated list of prompts and skills for AI coding agents. ⭐ nokode is a API server that has no code: just LLMs responding. Interestingly, it is compliant. Just expensive, slow, forgetful and unreliable compared to code. All four are improving with time, indicating that coding may be transitional. Notes from Vanya Seth’s keynote at OSAI HYD Superpowers of Gen AI to keep in mind when exploring AI coding agent use cases: Translating. Requirements to code, code to code, language to queries, standard to standard. Finding info just-in-time (in context). How does this work? What’s this error? What tools are permitted in my org? Who knows what? E.g. Atlassian Rovo queries across JIRA, Confluence, etc. Brainstorming and ideation. Product ideation. Requirements. Testing gaps. Architecture review. Exploratory / scenario testing. Summarizing and clustering. Change logs, incident management, research data, docs summary. Challenges in using AI coding agents: Adoption imbalance. Only certain roles are amplified by AI. Coding, QA, more than planning, maintenance, AI ops, etc. What’s the impact of this? ⭐ Goldratt’s ToC implies that backlogs need to fill faster. Downstream becomes a bottleneck. Technical debt piles up. ACTION: Use AI across entire value chain, from research to maintenance. Locality. enhances roles (nodes), not relationships (links). They optimize local work, not global flow. Workflow tools are missing. Coordination overhead. Context Fragmentation. Translation problems. ⭐ Expand productive roles to cover neighboring tasks. Productive developers shift left and build backlogs; shift right to reduce code review, maintenance tasks. E.g. Move maintenance/production activities into development. Security, performance, monitoring, observability, cost, infrastructure. We spend time on IDE, CI/CD, Jira, Confluence, Prod observability tools. A typical Agent Development Platform (ADP) covers evals, guardrails, workflow builder, agent builder, observability, prompt management, AI gateway (LiteLLM), MCP servers, model fine-tuning, model serving, model repository, vector stores We need ADP Agents covering delivery risk, continuous security, prod issues RCA, observability, performance, accessibility, product research, infra optiimzation, test data generation, anomaly detection, release management ACTION: Share ADP photo with Patrick. ACTION: ⭐ Centralize skills (“knowledge packs”) and MCPs and observe which gets used most. Allow people to use more. Lethal Trifecta. There’s growing demand for higher productivity with AI code assistants. But the lethal trifecta makes them an attack vector. It has access to sensitive information, exfiltrate data, and read and follow unsafe instructions. Can lead to supply chain poisoning attacks. Regulated industries cannot adopt. Technical debt growth. More productivity leads to poor code quality which will slow down future work. See Software Engineering Excellence 2025 AI induced complacency. Sunk-cost fallacy on AI-generated code hurts. ACTION: Evaluate code quality continuously to reduce technical debt. Double-down on good engineering practices. Compliance. Model residency. Self-hosting is required. Data observability gaps. Data privacy, audit trails, etc. are concerns. Token economics. $20/day happens in Thoughtworks. Token cost is subsidized. Rogue AI usage. Use of dis-allowed tools; shadow IT. ROI justification. Hard to quantify productivity gains. Adoption. AI Literacy. Tap into organizational knowledge Champions & communities of practice to support cross-pollination. Use-case driven adoption. Teams identify based on AI superpowers. AI playbook. Share what worked, what didn’t work. AI automation is likely less if a high portion of work Has legal liability (e.g. pharmacist/judge vs shop attendant/lawyer) Is subjective (e.g. perfumer/auction appraiser vs lab chemist/insurance appraiser) Needs rapid contextual decisions (e.g. detective/fireman/ER vs parking enforcer) Via ChatGPT, Claude parse-sse from Sindre Sorhus is a more standards-compliant, more likely-to-be-maintained alternative to my async-sse package. Which is better: Comment A: 1 upvote, 0 downvotes (100% positive) or Comment B: 99 upvotes, 1 downvote (99% positive)? Use Wilson’s Lower Bound which measures “What % positive am I 95% confident of?” Claude Using this, we can measure metrics for tweets, like below. ChatGPT Popularity = (5 _ WLB(reposts / views) + 2 _ WLB(likes / views)) * Decay(half-life of 72 h) Memorability = (5 _ WLB(bookmarks / views) + 4 _ WLB(replies / views)) * Decay(half-life of 36 hours) A nice visual “benchmark” of text-to-image and image editing models. Seadream 4, Gemini 2.5 Flash, and Qwen Image Edit lead. This includes examples like straightening te Tower of Pisa - which only Flux.1 and Seadream 4 do well on; or removing only the brown M&Ms - which only Qwen Image Edit manages to. Arch is a pure LLM router. It supports multiple LLMs, flexible routing and observability but not auth. From Codex docs Add custom prompts in ~/.codex/prompts/xyz.md and launch as /prompts:xyz. Optional: description: and argument-hint: in YAML front-matter. For example, create prompts to refactor, rewrite in a developer’s style, document AGENTS.md, identify re-usable code, etc. AGENTS.override.md overrides parent directory AGENTS.md. AGENTS.md appends to parent AGENTS.md. Fallback names are allowed. codex exec supports streaming JSON codex exec accepts a CODEX_API_KEY= environment variable. codex uses an OPENAI_API_KEY. You can configure which environment variables are passed to the shell Codex reads 32KB from AGENTS.md by default Things that I currently follow and don’t follow from Peter Steinberger’s excellent Just Talk To It: Prefer Codex > Claude Code. Ask for options before executing Generate & review specs collaboratively You don’t need git worktrees Prefer subscriptions over API to reduce cost Store docs with code Give examples Use voice input Use Codex Web as a mobile inbox for ideas Prefer CLI over agentic platforms Prefer CLI tools over MCP Avoid ALL-CAPS for Codex. It follows instructions well Avoid sub-agents, RAG, etc. Iterate UI live. Watch changes Use 3-8 agents in parallel on a single repo. Make small, atomic commit checkpoints. Commit only what the agent touches Add ast-grep as a pre-commit hook to block rule violations. Keep custom prompts minimal (commit, automerge, massageprs, review, …). Just “commit” reduces context Cancel long tasks and ask what’s happening Prefer Medium over High reasoning. It decides level of thinking Share screenshots Use tmux to run CLIs persistently Schedule refactor time (20%). Use jscpd, knip, oxlint, … Don’t reset context. Cold start wastes time + tokens Write tests in the same context. Yields better tests, reveals bugs. Prototype in a separate folder / PR Queue continue messages** before stepping away Ask it to “Preserve intent and add comments at tricky spots”. Future you needs the WHY On hard problems, add “take your time”, “be comprehensive”, “read all related code”, “form hypotheses”, etc. Maintain an evolving AGENTS.md with product notes, naming, API patterns, test policy, ast-grep rules, etc. Delete stale guidelines Fascinating implications from Quantifying Human-AI Synergy ChatGPT Models vary in ability to uplift humans. Don’t just use standalone model benchmarks. People vary in ability to work with AI. Don’t just measure solo skills. Reward AI collaboration ability (delegation, prompting, verification, revision, …) Train models to ask for missing Theory-of-Mind cues: goal, beliefs, constraints, audience, success test Train people by asking them to predict what the model will get right/wrong, and validate Design UI and models for synergy. UI: Surface/solicit assumptions, intent, uncertainty, constraints. Model: Infer & adapt to evolving user state. OpenRouter image generation now includes GPT-5 Image Mini. An image costs about 1 cent. Here’s the code: curl 'https://openrouter.ai/api/v1/chat/completions' \ -H "Authorization: Bearer $OPENROUTER_API_KEY" \ -H "Content-Type: application/json" \ -d '{ model: "openai/gpt-5-image-mini", messages: [{ role: "user", content: "Draw a cat" }], modalities: ["image"], image_config: { "aspect_ratio": "16:9" } }' | jq -r '.choices[0].message.images[0].image_url.url' | cut -c23- | base64 -d > cat.png

Sometimes, technology creates truly memorable moments. Like when email connected me with my schoolmates in 1993. Or WhatsApp connected me with long-lost relatives in 2010. Today, Google Gemini took me back 55 years, converting the grainy black-and-white wedding photos of my parents into vivid high-resolution color images. So many people. Much younger. More alive. I look forward to when I can watch the video. Move around. Talk to them… Prompt: Convert this black and white photo to color. CAREFULLY ensure that the photo, especially faces, are EXACTLY the same. Use vivid colors and sharp photography, like in modern digital photos. Model: gemini-2.5-flash-image (nano-banana) Temperature: 0 ...

When to choose AI over humans

I charted the OpenAI GDPVal paper with industry compensation as the size and AI augmentation as color. Big green areas are we’re paying people where AI does better. Click here to see the interactive visualization. Clicking to see some actual tasks compared. I use this to check whom to ask advice: AI or professional. AI beats Personal Financial Advisors ~64% of the time. So I invested half my money using ChatGPT’s recommendation. (UTI Nifty 50, if you’re curious.) ...

Things I Learned - 26 Oct 2025

This week, I learned: Before founding a place to do good, work in a place that does good and learn. Ben Werdmuller What should we teach when vibe coding becomes good enough for non-coders? Ethan Mollick Problem decomposition Clear communication & spec writing Core technical foundations: file systems, access control, networking, APIs, version control, data structures, databases, deployment Software development skills: Debugging, Testing, Refactoring, Design patterns, UI/UX Project management: requirements, prioritization, scoping, … Codex CLI tips: codex --add-dir $DIR lets you write into $DIR codex --full-auto is the equivalent of codex --sandbox workspace-write --ask-for-approval on-request Terse code is not necessarily easier or harder for LLMs to write. It’s about how unusual (or not aligned with training data) the code is. Gabi Teoduru How are people using browser agents like Comet / Atlas? Simon Willison Most popular: YouTube video summaries with timestamps Most useful: Form filling: Government forms, data entry, repetitive bureaucratic tasks Foreign language navigation: Applying for pension in Korea, navigating sites in other languages Time reporting auto-completion Insurance claims: Reading policy documents and drafting appeals (successfully got claim reimbursed in India) Compliance training click throughs Next most useful: Shopping / planning Energy provider comparison - Comet checked current plan vs competitors on Check24, calculated exact annual savings per provider Financial tracking: Finding Amazon orders, tracking Airbnb spending with refund calculations, analyzing bank transactions Trip planning: Mapping 50-100 places on Google Maps automatically Interesting: Airport shuttle discovery - Found shuttle that user missed in manual searching HubFS mounts GitHub repos on the file system. Every file system action directly works on GitHub via a REST API. Useful for some scenarios but less useful for note-taking than something like GitDoc which offers a delayed sync. Ernest Ryu solved an open problem in convex optimization using ChatGPT. Quotes: ChatGPT is now at the level of solving some math research questions, but you do need an expert guiding it. ChatGPT was really effective at accelerating my progress. This work took about 12 hours, spread over 3 days. In hindsight, the proof is really simple. But I iterated through so many other strategies that didn’t pan out, and ChatGPT crucially helped to quickly explore and eliminate those dead-end approaches. Also, the key successful steps were suggested by ChatGPT. ChatGPT did not produce the proof in a single prompt. The process was highly interactive. It generated many arguments, roughly 80% of which were incorrect. Yet some were genuinely novel to me. Whenever I recognized a novel idea, whether correct or only partially so, I distilled the key insight and prompted ChatGPT to develop it further. My contribution: Filtering out incorrect arguments and accumulating a set of correct facts. Identifying promising new lines of reasoning and guiding ChatGPT to explore them further Recognizing when a strategy had been fully explored and deciding when to move on. ChatGPT’s contribution: Producing the final proof argument. Significantly accelerating my (or our) exploration of the many dead-end arguments, rapidly ruling out approaches that did not work. Comparing the GPT 4.1 and 5 models at all different of reasoning, I’ve switched my default from GPT 4.1 mini to GPT 5 mini (medium). Far smarter for a slightly higher cost. Artificial Analysis python -m pdb -c continue script.py or uv run -m pdb -c continue script.py runs a script and drops into pdb on unhandled exceptions (post-mortem). ChatGPT Technology removes constraints. We then do what we really value. Claude When writing became digitized, we stopped cared about spelling/handwriting for its own sake. Spelling bees and handwriting classes declined. “ur” is acceptable. When fitness tracking became easy, many just track, few exercise more. Few people value exercise When GPS became ubiquitous, we stopped learning geography. Most value arriving, not knowing When photography became unlimited, most captured moments. Few perfected shots I had Codex scrape my ~2,000 pending invites on LinkedIn and asked ChatGPT to analyze it. Here are learnings: ChatGPT, private Power-law. 5% of inviters account for ~42% of all common connections. Top 10 people alone for ~20%. IITM student invites are high (~14%), but with 0-2 common connects, i.e. distant strangers. EdTech is tiny in count but has the highest common connections per person (outlier-sensitive but real). Among ≥20-commons, many hold VP/Head/Site-Lead titles in Data/AI or GenAI (not just recruiters). GenAI people are 7-8% and steady across months. Not a useful signal to prioritize. Premium ~ Senior. Premium accounts show ~40% senior titles vs ~29% for non-premium. Finance invites have higher seniority rate and more common connects than healthcare. Followers have higher common connections (~6 vs ~4). ⭐ Memory can be code. Agent memory is anything it choose to persist. Agents can write code on the fly to automate tasks, save them, and serve the code on the next request, potentially modifying the code as required. This is like the conscious mind saving a habit for the subconscious to execute fast. Finally: Microsoft Office has an agent mode that lets you talk to it and do stuff. The Verge

I asked multiple coding agents and models to build the same app: Create a single-page web app at index.html that beautifully renders a GitHub user profile and activity comprehensively. Pick the ID in the URL ?id=…, default to ?id=torvalds. … and compared their quality, cost, and speed. My observations: Quality variance is the highest. Some models / agents produce great visuals, some average, some fail completely. Cost and time variance are lower among the successful models. About 2X variance in each. ...

Things I Learned - 19 Oct 2025

This week, I learned: ⭐ “… most engineers don’t have public commits. Senior engineers at large tech companies don’t work on open-source projects for the most part.” Why AI Can’t Do Hiring Cloudflare’s Sandbox feature in their Workers looks impressive. It supports streaming, web access to the container, and long-running processes. So we can spawn off a task and have it run a server (at least for a while) or a scraper. Gemini API has a Google Maps tool that it can refer to - like Google Search. Maps Grounding Earlier we needed humans to label data for RLHF. Now we don’t since AI can simulate it. This is a pattern. Once AI learns from a human, that human skill can be automated. How GPT-5 Thinks — OpenAI VP of Research Jerry Tworek The <output> element has a for= attribute indicating which <input> elements it is linked to and a form= attribute indicating which form it belongs to. This works well with screen readers. A good reason to use it more. Examples. Meta built a Code World Model. Basically an LLM that acts like a Python interpreter! sudo apt install moreutils installs a set of useful packages: chronic. Runs a command quietly (suppressing output) unless it fails — good for cron jobs where you only want noise on errors. chronic backup.sh combine. Combines lines from two input streams/files using boolean operations (AND, OR, XOR). combine AND fileA fileB errno. Look up symbolic names, numeric codes, and descriptions for standard errno values. errno -l; errno ENOENT; errno 2 ifdata. Query network interface properties (IP, byte counts, errors) in a script-friendly format. ifdata -sip eth0; ifdata -bops eth0 ifne. Run a command only if stdin is not empty, passing the input through. find . -name core | ifne mail -s "Core files found" admin isutf8. Check whether a file or stdin is valid UTF-8. isutf8 somefile.txt lckdo. Run a command while holding an exclusive lock to prevent concurrent runs. lckdo /var/run/mylockfile.cmd myscript.sh mispipe. Pipe two commands, but return the exit status of the first one (useful in pipelines). cmd1 mispipe cmd2 parallel. Run multiple commands in parallel, reading them from stdin or arguments. parallel < jobs.txt pee. Like tee, but sends stdin to multiple commands in parallel. echo "foo" | pee cmd1 cmd2 ⭐ sponge. Soak up all input before writing to output — enables in-place edits safely. sort file | sponge file ⭐ ts. Prefix each input line with a timestamp. tail -f logfile | ts vidir. Edit a directory listing in your editor to rename, move, or delete files in bulk. vidir ~/myfolder vipe. Insert a text editor into a pipeline to manually edit streamed input before output. cat file | vipe | wc -l zrun. Transparently decompress compressed files before passing them to a command. zrun cat file.gz Despite 20 years of SVG experience, I learnt new things from A Friendly Introduction to SVG and A Friendly Introduction to Paths Setting a <rect> width/height or a <circle> radius to zero removes the element instead of drawing a point. There’s no option to draw the stroke on the inside or outside of a shape/path. Only the center. You can override a path’s pathLength attribute to create a new internal scale for its length. It’s unclear where I can use this. <path> arcs have this syntax: A [rx],[ry] [rotation] [large-arc-flag] [sweep-flag] [end-x],[end-y]. SVG first fits an ellipse to these parameters and then draws the arc. If rx and ry of an arc is too small to connect the points, the SVG spec scales up rx and ry. [large-arc-flag]=1 literally uses the larger arc of the fitting ellipse. This is less common. [sweep-flag]=1 its the ellipse to make the connecting arc go clockwise. 0 is anti-clockwise. [rotation] is rarely used because we usually draw arcs and then rotate them. stroke-linejoin automatically flips from miter (sharp) to bevel (cut) if the sharp edge protrudes too long (e.g. small angles). Increasing stroke-miterlimit increases the cutoff (default: 4) ⭐ Always include a thoughtful gallery of examples with tools / libraries. This does more than showing what a tool can do. It’s use-case / domain transfer: showing what it’s useful for in real life - opening ideas, suggesting workflows. It’s style transfer: showing how to use it. ⭐ Here’s what expert AI coders increasingly focus on. Thomas Dohmke Delegation: context engineering agents for success; parallelizing. Verification: efficiently reviewing and testing code/output; setting stop-points. Expanding scope: instead of time saved as the metric. Education: teaching AI-based coding, debugging, reviewing/testing. Product management: combining requirements + UI design + architecture + engineering + deployment. Cross-discipline: blending code with design, governance, finance, marketing, … (“computational creators”). Notes from Taylor’s How I’m using coding agents: October 2025 Left monitor: 2-4 desktops (e.g. work, side-project). Right monitor: things I always want available Plan next task while first executes. Use plan mode to write to a plan file. Don’t start big tasks if you have meetings scheduled soon. Recent open source package hack methods seem to work more because of people/process than systems (Filippo): Phishing the author Pull requests running unsafe code in CI Taking over expired domain / user ID Stealing long-lived tokens uv run --python 3.14 --isolated --with-editable '.[test]' pytest runs pytest on a local project with a specific Python version. Simon Willison Notes from the State of AI Report 2025: Reasoning models are more fragile. Irrelevant phrases make reasoning models spend FAR more tokens and get wrong answers #21 AI systems are able to teach experts new concepts #41 An environment providing feedback / rewards enables continuous learning #52 E.g. Multi-robot chemical labs at U.Liverpool and NCSU #60 RLHF has a fundamental flaw: humans reward sycophancy #71 We can read what people are typing from brain signals outside the skull #73 Model intelligence-to-price ratio doubles every ~6 months #94 The AI companies’ valuations are also roughly doubling every ~6 months #181 OpenAI is offering Governments giga-watt campuses to run OpenAI models for citizens #122 A 1GW clusters costs $50bn capex and $11bn per annum #130 China has added ~10X the energy capacity as the US in 2024 #146 NVIDIA challengers are still far away #161 LLMs can “read between the lines” even if training data is censored #268 LLMs can pass information via hidden signals #270 Prediction: A major retailer reports >5% of online sales from agentic checkout. AI agent advertising spend hits $5B. #304 OpenAI’s leadership guide says: Align Explain WHY AI thoughtfully. Set a goal, e.g. everyone uses ChatGPT 20 times/day (Moderna). Use it yourself. Show how. Have business leaders run AI sessions Activate Launch an AI skills proram Set up an AI champions network Encourage experimentation (dedicated time, workshops, hackathons, …) Link to performance evaluations Amplify Create an AI knowledge base Share success stories (weekly) Create internal groups (Teams, Slack, …) Celebrate AI wins Accelerate Unblock AI tools and data access Simplify project selection. Quick feedback, clear priorities Unblock projects with a cross-functional council Give resources to successful teams Govern Publish a responsible AI playbook (what’s safe to try) Audit AI practices quarterly

Workshops That Teach Me More Than You

I don’t charge for workshops. Altruism? No: it’s self-interest. “If you’re not paying for it, you’re not the customer; you’re the product being sold.” Andrew Lewis, via Tim O’Reilly, 2010. My workshop process is designed to benefit me first. I pick topics I want to learn, not stuff useful to the audience. Example: I picked DuckDB for my PyCon India 2025 talk to learn it. ...

Things I Learned - 12 Oct 2025

This week, I learned: ‘…as few as 250 malicious documents can produce a “backdoor” vulnerability in a large language model… data-poisoning attacks might be more practical than believed." Anthropic Tim Urban’s 2015 article, The AI Revolution: The Road to Superintelligence, is surprisingly relevant. A key theme is that post artificial-super-intelligence, pretty much anything we know / predict is probably wrong. LLMs are bad at asking questions, so you need to plan on their bahlf first. LLMs are bad at copy paste, so giving them a scaffolding to edit helps. Two things LLM coding agents are still bad at The VPN industry is a consolidating oligopoly that doesn’t offer much security and biases towards affiliates. Who Owns Express VPN, Nord, Surfshark? As of 2025, a fine-tuned DeBERTa-v3-Large / RoBERTa-Large model is better than an LLM at emotion classification. roberta-base-go_emotions is a good starting point if you don’t want to fine-tune. ChatGPT OpenAI defines an AI agent as “a system that can do work independently on behalf of the user”. swyx Brain coding is the new term for human coding - as opposed to vibe-coding (AI codes, human doesn’t review code) and AI coding (AI codes, human reviews code). npx -y emoj lets you type text and pick a relevant emoji. Many people who shifted away from conflict aversion did so by systematizing it. ChatGPT Martin Luther King Jr institutionalized not stepping back from conflicts in his movement. Kim Scott (Radical Candor) practiced caring more via short, specific feedback loops. Kwame Christian (Compassionate Curiosity) practiced ask open questions. Ed Catmull (Pixar) instituted Braintrust to ask candid questions. Ray Dalio (Bridgewater) instituted radical transparency. Many people who adopted a failure-seeking mindset made failure frequent, small, cheap, and informative. ChatGPT Jia Jiang ran a 100-day rejection challenge, acclimatizing himself to failure. Kim Liao (writer) moved from submission-avoidance to “100 rejections/year”. Reshma Saujani (Girls Who Code) built a practice of “brave, not perfect” - ship before perfect. Ray Dalio (Bridgewater) instituted mistake logs and “pain + reflection = progress”. Astro Teller (X, the Moonshot Factory) rewired incentives so teams are rewarded for killing their own ideas early. Sara Blakely (Spanx) set weekly failure quotas. Kathryn Schulz (author of Being Wrong) converts failures into teaching methods. Sindre Sorhus has already created a micro-framework css-extras using CSS @functions. Today, if I had to build agents, here are the tools and environment capabilities I’d ask for: Ask user (for clarifications) Internet tools Search Fetch (CORS-piercing) Scraper with XPath/CSS Selectors Access to llms.txt LLM APIs Summarizer (condenses chat) Sub-agents Coding tools Markdown convertor Code execution (including tests) Browser + DevTools for testing Memory / storage Tool/MCP directory with search Noting a few things that I find #impossible to do today with LLMs: LLMs can’t run experiments / explorations, like trying out on a new tool or web app in an environment, the way I would. LLMs can’t move stuff on my machine, e.g. notes from one list to another, when they’re only on my laptop, not GitHub. LLMs can’t capture the past wisdom in my head, e.g. the distilled principles of data visualization that we applied at Gramener. LLMs can’t prioritize my to-do list based on my preferences and what’s important to me. LLMs cannot write a blog post in my style of writing. When recruiting for people in the LLM era, look for questioning ability, sensible thinking, and how they use AI. Give them lots of fluff and context. Can they cut through it? Is their answer concise and to the point or waffling? Like post the industrial revolution, more people will become operators looking after AI, not craftsmen. This includes coding. zx is a nice JS-based alternative to shell scripts. const branch = await $`git branch --show-current`; await $`dep deploy --branch=${branch}`; docker run -it --name test --user vscode mcr.microsoft.com/devcontainers/base:ubuntu gives you a test Ubuntu image closer to a desktop / user setup rather than a server. Useful to try out apps.

Tamil AI

I was testing LLMs’ sense of Tamil humor with this quote: Extend this post with more funny Tamil words that end with .ai - mentioning why they’re funny. Chenn.ai is the artificial intelligence capital of India. Kadal.ai Kad.ai Dos.ai Vad.ai Ad.ai Thal.ai Mallig.ai Aratt.ai And finally Podad.ai All spoken in namma bash.ai 😅 The Chinese models didn’t fare well. DeepSeek made up words. Mood.ai - An AI that perfectly captures your mood. Sokk.ai - The AI for when you’re bored. Thanni.ai - A hydration assistant. Qwen too. ...

How to create a data-driven exam strategy

Can ChatGPT give teachers data-driven heuristics on student grades? I uploaded last term’s scores from about 1,700 students in my Tools in Data Science course and asked ChatGPT: This sheet contains the scores of students … (and explained the columns). I want to find out what are the best predictors of the total plus bonus… (and explained how scores are calculated). I am looking for simple statements with 80%+ correctness along the lines of: ...

Vibe-Coding for Interesting Data Stories

Last weekend, I fed Codex my browser history and said “explore.” It found a pattern I call rabbit holes – three ways we browse: Linear spiral - one page > next page > next. E.g. filing income tax, clicking “next” on the PyCon schedule. Hub & spoke - hub > open tabs > back to hub. E.g. exploring DHH’s Ubuntu setup, checking Firebase config. Wide survey - source > many, many pages. E.g. clearing inbox, scanning news. Then Claude Code built this lovely data story. ...

The Non-Obvious Impact of Reasoning Defaults

Yesterday, I discovered how much reasoning improves model quality. My Tools in Data Science assignment asks students to draft an llms.txt file for ipify and auto-checks with GPT-5 Nano - a fast, cheap reasoning model. I set reasoning_effort to minimal and ran this checklist: 1. Starts with "# ipify" and explains ipify. 2. Markdown sections on API access, support (e.g. GitHub, libraries). 3. Covers API endpoints (IPv4, IPv6, universal) and formats (text, JSON, JSONP). 4. Mentions free, no-auth usage, availability, open-source, safeguards. 5. Has maintenance metadata (e.g. "Last updated: <Month YYYY>"). 6. Mentions robots.txt alignment. Stay concise (no filler, <= ~15 links). If even one checklist item is missing or wrong, fail it. Respond with EXACTLY one line: PASS - <brief justification> or FAIL - <brief explanation of the first failed item>. With a perfect llms.txt, it claimed “Metadata section is missing” and “JSONP not mentioned” – though both were present. ...

Things I Learned - 05 Oct 2025

This week, I learned: Wrong answers are useful if you discover why they said that. Conversation is a game where you CO-CONSTRUCT common ground. Mike Caulfield BMTC hourly data from Bangalore Metro is available via RTI. Vivek “Find evidence for and against” improves LLM responses far more than “Are you sure?” Mike Caulfield SSH3 is an emerging SSH alternative that’s written on top of HTTP/3. It supports OAuth2, OpenID Connect, and HTTPS for certificates. Cholesterol has become a victim of its own success. We give statins to those with high LDL. So most people who have heart attacks have lower-than-natural cholesterol. Inflammation (HS-CRP) is now the strongest predictor of heart attack (American College of Cardiology). The usual stuff reduces HS-CRP: no sugar/carbs, veggies, nuts, green tea, turmeric/black pepper, weight loss, exercise, sleep, meditation. ⭐ The beginner mindset: scrub your instincts and don’t let life experience cloud you. This takes effort. Hold on to naivette and escape cynicism. The Knowledge Project: Barry Diller Forecasts give comfort. They may not be good but they feel safer than instinct. The Knowledge Project: Barry Diller My laptop’s mic is much better than my phone’s mic, surprisingly. When recording conversations, it’s better to leave my laptop open and record than use the phone’s recording app. ⭐ Here are the major not-immediately-obvious LLM megatrends/superpowers I see. Swarms. Ask for dozens of solutions in parallel. Merge, rank, auto-debate, converge. Personalize at Scale. Create feedback, designs, excerpts/summaries, … tailored to EACH person at scale. Computer use. Agents operate UIs like a human (browser, apps). LLM-as-a-judge. Use AI to validate ever-increasing AI generated output. Synthetic data. Create realistic data for prototypes, testing edge cases, market research simulation, training data, … Code on demand. Ask for outcomes directly. Agents code on the fly to get there, in data science, research, management, … Style transfer. Copy a master’s style of drawing, coding, writing, … creating an army of their apprentices. Multi-modality. Native voice/video/screensharing and long-context perception Citizen experts. Non-expertise is not a barrier. Amateurs can create expert-level films, music, software, reports, … Long-context LLMs. Growing context size lets us process entire repos, legal libraries, personal lifelogs, … Memory. Assistants learn per-person / per-team. Cuts prompt, builds knowledge. Agent-to-Agent. Agents consuming content (e.g. llms.txt), agents calling agents (sub-agents, A2A protocol, …) Real-world tools. Write reports, send emails, shop online, use computer, control devices, … Jagged frontier. AI is great at certain things but terrible at others. This frontier is unknown and shifting rapidly. Lethal trifecta. You can only have 2 out of these 3: private data, untrusted content, and external communication. Edge/Private AI. Small models on private cloud compute. Authenticity. What content is authentic? What’s slop? What’s fraud? Are AI twins liable? AI Governance. Strict liability, transparency mandates, state control, … Not sure about or haven’t seen enough of these: Data / workflow as the moat AI native business models AI digital-divide ⭐ What I’d like to do next, maybe, is build a boutique “AI Studio”. Small group of good people coding delightful AI problems. Something that doesn’t scale. GLM models can be used with Claude Code. At $3/month and a quality close to Claude 4 Sonnet, this is a good deal. But the effort of adding a new subscription is too high for me. I’d rather use it via OpenRouter which is doesn’t support an Anthropic API end point at the moment. typst is a good LaTeX alternative. Markdown-like syntax with fast rendering. Mostly useful for researchers using LaTeX. But publishers / journals don’t accept typst often. libSQL is an SQLite compatible fork with remote access, replication, ALTER TABLE to modify columns, random ROWID, etc. It supports the same externsions. The maintainers are working on turso - a SQLite compatible improvement with async, vectors, change data capture, etc. (still in alpha). But because of this, I’m a bit uncertain about the future of libSQL. ⭐ LLM benchmarks show a correlation of ~0.5, hinting at a common theme of intelligence. Correlations in coding & science are particularly high. Ethan Mollick. Reminds me of student marks correlations. Strong correlation clusters (physics, chemistry, biology, mathematics, computer science) with the weaker correlations going down to ~0.5. What does it indicate? LLMs learn like people? Knowledge areas cluster? Humans write benchmarks like exams? Dayflow records your screen at 1 fps and uses Gemini to summarise your activity every 15 min. Has low CPU usage. ⭐ Code Mode is a smart way to use MCPs and a very likely future direction. Using LLMs to write code to call MCPs rather than directly. Cloudflare supports an AI Index which will eliminate the need for a lot of custom RAG engineering.

Tools in Data Science Sep 2025 edition is live: https://tds.s-anand.net/. Major update: a new AI-Coding section and fresh projects. I teach TDS at the Indian Institute of Technology, Madras as part of the BS in Data Science. Anyone can audit. The course is public. You can read the content and practice assessments. I fed the May 2025 term student feedback into The Sales Mind and asked: What are the top non-intuitive / surprising inferences? What are interesting observations? What are high impact actions? Full analysis: https://lnkd.in/gVWVqaxN: summary, outliers, and action ideas. ...

The 11 sites I visit most: ChatGPT. It’s replaced Google as my default knowledge source. I prefer it over Gemini, Claude, etc. because the app has good features (memory from past conversations, code interpreter, strong voice mode, remote MCP on web app, etc.) The OpenAI models have pros and cons, but the app features are ahead of competition. Gmail. It’s my work inbox. Interestingly, I check it more (and respond faster) than social channels (e.g. WhatsApp, Google Chat, LinkedIn). It also doubles up as my task queue. WhatsApp. It’s my default phone + messaging app. A fair bit of my work communication happens here, too. Prime Video. I mainly watch The Mentalist. Totally love Patrick Jane! Google AI Studio. Mostly for transcription. It’s better than Gemini on UI, ability to handle uploads, file-formats, etc. It’s also free (though the data is used for training.) My Talks page: https://sanand0.github.io/talks/. I give 1-1.5 talks a week, mostly on AI/ML topics. I use Marp to render Markdown slides and publish it here. Google Chat. It’s Straive’s social channel. I can’t use it from my phone, so I log in only if I need to check if I missed something. LinkedIn. It’s where I post by default. I don’t use it for networking and only connect with people I’ve met and know well. YouTube. Mostly for movie clips over dinner. I occasionally watch educational content. LLM Foundry: https://llmfoundry.straive.com/. LLM Foundry is Straive’s internal gateway to multiple model APIs (I built it). I use it to experiment with models, grab API keys, and demo LLMs to clients. Squoosh. I compress every image, every time. Mostly into WebP (hands-down the best format today), typically lossless with an 8-color palette, or lossy at ~0-10% quality for photos. The list will change. But the reasons probably won’t: fast, simple, automatable, and practical (for me). ...

Vibe-Scraping: Write outcomes, not scrapers

There hasn’t been a box-office explosion like Dangal in the history of Bollywood. CPI inflation-adjusted to 2024, it is the only film in the ₹3,000 Cr club. 3 Idiots (2009) is the first member of the ₹1,000 Cr club (2024-inflation-adjusted). The hot streak was 2013-2017: each year, a film crossed that bar: Dhoom 3, PK, Bajrangi Bhaijaan, Dangal, Secret Superstar. Since then, we never saw such a release except in 2023 (Jawan, Pathan). ...

Things I Learned - 28 Sep 2025

This week, I learned: selectolax is a fast, easy-to-use, modern HTML5 parser with CSS selectors. A good replacement for lxml.html. The most effective way to convert a blob (e.g. file input) to a data URL on the browser seems to be via the FileReader API. const blobToDataURL = (blob) => new Promise((res, rej) => { const r = new FileReader(); r.onload = () => res(r.result); r.onerror = () => rej(r.error); r.readAsDataURL(f); }); Tool calls in OpenAI support files and images. OpenAI ⭐ “Task parity is not the same thing as job parity There is a lot of complexity as many different tasks are bundled into jobs, and many jobs contribute to processes inside an organization The jagged frontier of AI ability means doing tasks well doesn’t translate to doing jobs well.” Ethan Mollick Adding // @ts-check to a JavaScript file and documenting types via JSDoc might be the simplest way to migrate phase-wise from JS to Typescript. envsubst < file.txt replaces file.txt with the environment variable, e.g. $HOME is replaced by the HOME environment variable. Clean shell-level templating. GitHub Copilot CLI is out. npx -y @github/copilot Compost is the cheapest thing per ton that I can buy on Amazon India. I can buy 1 ton of compost for Rs 13,500. ChatGPT yt-dlp requires Deno from now on. #14404 In meetings, make cameras optional by default – and judge engagement by contributions, not video – because a 4-week field experiment found camera-on increased fatigue and reduced voice, especially for women and newcomers. Camera on early for trust building is useful. PubMed wrkflw is a quick and light way to test GitHub actions before publishing. It runs GitHub actions locally. GPT-5-Codex is available as an API and on LLM. Simon Willison ⭐ I’m habit engineering, i.e. discovering and stacking habits on to existing ones. For example: ChatGPT suggested increasing observability based on code reviews. I’m including it in my weekly codecast. ChatGPT suggested defining closures inmeetings. I’mn now discussing objectives at meeting starts and effectiveness at the end. Since Anaconda cannot be used for free by organizations with 200+ people, Straive’s received legal notices from Anaconda. Since laptops are under central IT administration, they went ahead and deleted all Anaconda instances. Installing miniconda for use with conda-forge requires admin access that most developers do not have, however. That leads to an interesting “No Python” situation. This is where uv becomes the knight in shining armor. Perceptron is SOTA LLM for object bounding boxes. Just 2B parameters. Gall’s “law” says that complex systems that work evolved from simple systems that worked. But a complex system designed from scratch won’t ever work. This holds in uncertain environments. But where formal theory or regulations exists, it doesn’t. ChatGPT uvx --with visidata vd gives you a command-line Excel editor to edit / convert CSV, Excel, JSON, SQLite, directories, etc. uvx markitdown https://example.com/ fetches example.com as Markdown. I learnt this when I told Codex it could use uvx markitdown to convert PDFs and it figured this part out by itself. The Dropbox connector for ChatGPT is the little flaky – at least on Android. It could not identify a file that was clearly there in Dropbox and I had to upload it manually. ChatGPT’s output is too dense for me. I added this to my custom instructions: “Write in simple language. Explain non-obvious terms intuitively.” yt-dlp has a --download-sections option that downloads specific YouTube time ranges. For example --download-sections "*00:01:00-00:03:00" downloads roughly (not exactly) from 1 min to 3 min. Note the * at the beginning. My Lenovo laptop’s touchpad started scrolling instead of moving when I moved my finger. Many things could have caused it, but the solution was to click (not tap) the top middle of the trackpad. ChatGPT The India Entrance Exam database is a dataset collating Indian entrance exams.

How to review trending GitHub repos on VS Code

Here’s how I track trending GitHub repos each week. I run a scheduled script that saves a clean TSV I can scan fast. It uses uvx gtrending to fetch weekly trending repos for: Rust: High-quality system tools. (Anything in Rust seems cool.) Go: Reliable CLI/infra tools. (Like Rust, most Go code seems good.) Python: Most AI/ML stuff TypeScript: Most modern JS codebases JavaScript: Most front-end utilities Shell: Productivity scripts I pipe results through jq to extract: ...