Vibe Shopping

I’ve started vibe shopping, i.e. using ChatGPT to shop for small, daily items and buying without verifying. For example:
“A metal rack for the floor: at least 2 ft * 1 ft * 2 ft, small gaps, popular options on Amazon.in.” https://chatgpt.com/share/68d61d68-7040-800c-936b-354749539308
“An optical wired mouse that’s smaller than usual, 4*+, popular, Prime-eligible for Chennai by the weekend on Amazon.in.” https://chatgpt.com/share/68d61e0d-420c-800c-bc71-821b9f9296a9
It works best when I don’t know the right terms. In these cases, the terms were wire rack and mini mouse. ...

Tools in Data Science Sep 2025 edition is live: https://tds.s-anand.net/. Major update: a new AI-Coding section and fresh projects.
I teach TDS at the Indian Institute of Technology, Madras as part of the BS in Data Science. Anyone can audit. The course is public: you can read the content and practice assessments.
I fed the May 2025 term’s student feedback into The Sales Mind and asked:
- What are the top non-intuitive / surprising inferences?
- What are interesting observations?
- What are high-impact actions?
Full analysis (summary, outliers, and action ideas): https://chatgpt.com/share/68cba081-afc0-800c-9da3-75222e84a499 ...

The 10 sites I visit most often

Here are the 10 sites I use most often (based on Microsoft Edge’s home bar):
1. ChatGPT. It replaced Google as my default knowledge source. I prefer it over Gemini, Claude, etc. because the app has good features (memory from past conversations, code interpreter, strong voice mode, remote MCP on the web app, etc.). The OpenAI models have pros and cons, but the app features are ahead of the competition.
2. Gmail. It’s my work inbox. Interestingly, I check it more often (and respond faster) than social channels (e.g. WhatsApp, Google Chat, LinkedIn). It also doubles as my task queue.
3. Prime Video. I mainly watch The Mentalist. Totally love Patrick Jane!
4. Google AI Studio. Mostly for transcription. It’s better than the Gemini app on UI, handling uploads, file formats, etc. It’s also free (though the data is used for training).
5. My Talks page. I give 1-1.5 talks a week, mostly on AI/ML topics. I use Marp to render Markdown slides and publish them here.
6. Google Chat. It’s Straive’s social channel. I can’t use it from my phone, so I log in only to check if I missed something.
7. LinkedIn. It’s where I post by default. I don’t use it for networking and only connect with people I’ve met and know well.
8. YouTube. Mostly for movie clips over dinner. I occasionally watch educational content.
9. Playground. LLM Foundry is Straive’s internal gateway to multiple model APIs (I built it). I use it to experiment with models, grab API keys, and demo LLMs to clients.
10. Squoosh. I compress every image, every time. Mostly into WebP (hands-down the best format today), typically lossless with an 8-color palette, or lossy at ~0-10% quality for photos.
That’s my current home row. It will change. But the reasons probably won’t: fast, simple, automatable, and practical (for me).

Voice coding is the new live coding

In Feb 2025 at PyConf Hyderabad, I tried a new slide format: command-line slideshows in bash. I’ve used this format in more talks since then:
- LLMs in the CLI, PyCon Singapore, Jun 2025
- Agents in the CLI, Singapore Python User Group, Jul 2025
- DuckDB is the new Pandas, PyCon India, Sep 2025
It’s my favorite format. I can demo code without breaking the presentation flow. It also draws interest: my setup was the top question in my PyCon talk. ...

Things I Learned - 21 Sep 2025

This week, I learned:
- When editing an image, ChatGPT’s non-thinking mode preserves the original image features much better than the thinking mode. When I edited my photo, the thinking mode created images that looked quite different from me. A surprising effect of overthinking.
- ⭐ When evaluating model accuracy, compare it with human accuracy rather than with perfect accuracy. SMEs rarely agree among themselves, so they are unlikely to agree with an LLM either. Instead, measure how often the LLM agrees with the majority of SMEs and how often it disagrees with all SMEs. This gives a more realistic measure of accuracy. Refs: LLMs instead of Human Judges?, Judging LLM-as-a-Judge, ChatGPT.
- I understand at least one mechanism by which costs inflate in large organizations. Even people who want to keep costs low find that tracking expenses, submitting receipts, and answering approval questions add transaction costs. So rather than starting at $10 and topping up as needed, I’d ask people to take a $500 top-up. Better to ask for more and waste some than to have to ask again.
- YouTube downloaders: yt-dlp for the CLI, Stacher for Windows/Mac/Linux, Cobalt as a web-based app. Ref
- VS Code features I discovered:
  - VS Code has been able to run a terminal in its own window for over a year (via Ctrl+P > Terminal: Move Terminal into New Window). Now, Ctrl + Alt + Shift + ` does this directly.
  - Terminal IntelliSense shows completion suggestions in the UI. Very helpful. Ctrl+Space triggers the completion menu.
- ⭐ “We find that the per-step error rate itself rises as the task progresses”, i.e. once a conversation goes the wrong way, it’s really hard to correct. The Illusion of Diminishing Returns
- Japonaise Cake is the name of the pastry I had as a child and grew up longing for. I have spent several weeks searching for it in roadside bakeries in Bangalore and Chennai, but only one bakery seems to have it.
- systemd timers are the modern way to run scheduled jobs, instead of cron. They’re far more complex, but they can catch up on missed runs via the Persistent=true option. Working with systemd timers
- ⭐ Vice-chancellors of universities resist AI in education because (a) their faculty don’t know AI and (b) AI is unreliable. But they are interested in (a) large-scale AI evaluation and (b) AI-enabling the entire campus.
- tldr.sh offers concise man pages, e.g. uvx tldr jq. cheat.sh offers detailed examples, e.g. curl cheat.sh/jq or curl cheat.sh/:help.
- ugrep is a fast drop-in replacement for grep. It supports fuzzy search with a customizable Levenshtein distance. Also, ug -Q opens an interactive TUI that searches like VS Code’s “Search in Files” feature. Very intuitive.
- Dagger lets you write CI/CD workflows in Python. I tried running it, but after 7 minutes of pulling large Docker images, I gave up. Too heavy.
- dotslash lets you write scripts that download GitHub releases, cache them, and run them. It requires writing scripts. I prefer mise.
- ChatGPT has a quota for searches. I saw this phrase in the reasoning traces: “I’ll avoid overloading on citations since we only have a few calls left.” It doesn’t appear in last month’s ChatGPT system prompt, so it’s either part of the tool response or a new prompt.
- Depending on the chips a model runs on, floating-point multiplications may differ, and so can model quality. So Claude 4 Opus served by Anthropic can produce different results from the same model served on Google’s or Amazon’s hardware.
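A minimal Python sketch of the two SME-relative metrics from the evaluation note above. It assumes each item has one LLM label and a non-empty list of SME labels; the function name `sme_relative_accuracy` and the sample labels are hypothetical, and a tie for the majority resolves to the label seen first.

```python
from collections import Counter


def sme_relative_accuracy(llm_labels, sme_labels):
    """Compare LLM labels with a panel of SMEs instead of a single 'gold' answer.

    llm_labels: one label per item (hypothetical input format).
    sme_labels: for each item, the list of labels given by the SMEs.
    Returns (share of items where the LLM matches the SME majority,
             share of items where the LLM disagrees with every SME).
    """
    agree_majority = 0
    disagree_all = 0
    for llm, smes in zip(llm_labels, sme_labels):
        # most_common is stable, so ties go to the label that appears first
        majority, _ = Counter(smes).most_common(1)[0]
        if llm == majority:
            agree_majority += 1
        if llm not in smes:
            disagree_all += 1
    n = len(llm_labels)
    return agree_majority / n, disagree_all / n


llm = ["A", "B", "C", "A"]
smes = [["A", "A", "B"], ["B", "C", "B"], ["A", "A", "A"], ["B", "A", "B"]]
print(sme_relative_accuracy(llm, smes))  # (0.5, 0.25)
```

Here the LLM matches the majority on half the items but contradicts every SME on only one item in four. The second number is the one that flags answers no human would give.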

AfterSlides: Write Slides After Talks

25 years ago, Mr. Krishnan (IAS) amused us with anecdotes of bureaucrats writing meeting minutes before the meeting. This week, I flipped that. I wrote slides after the talk. I call them AfterSlides. Why. I ran a couple of Ask-Me-Anything (AMA) sessions where the audience set the agenda. I learned their interests. They got answers. No slides prepared. How. I okayed recording with the organizers, recorded on my phone, transcribed with Gemini, and asked ChatGPT to generate the AfterSlides. ...

Turning Generic Gifts Into Joy with AI

In 2001, I received a campus interview invitation from BCG. It opened like this: Dear Anand, We’d like to invite you to an interview on … We were impressed by your … … and went on to share 2-3 phrases about what they liked about my CV. A dozen of us got similar letters – each personalized! That was cool. Two decades later, I still remember it. It showed care and competence – care enough to personalize for each candidate, competence to pull it off at scale across campuses. ...

GPT-5 (Codex) follows instructions exactly as given. Usually that’s a good thing, but sometimes this is what happens.
AGENTS.md: ALWAYS WRITE TESTS before coding.
Codex: Let me begin with the tests. (Spends 5 minutes writing tests.)
Anand: Stop! This is a proof of concept. We don’t need tests!
AGENTS.md: Write tests before coding. Drop tests for proof-of-concepts.
Codex: (Proceeds to delete all existing tests.)
Anand: STOP! We need those tests! ...

Tomorrow, we’ll be vibe-analyzing data at a Hasgeek Fifth Elephant workshop. It’s a follow-up to my DataHack Summit talk “RIP Data Scientists”, where I showed how it’s possible to automate many data science tasks. In this workshop, the audience will do that themselves.
Slides: https://sanand0.github.io/talks/2025-09-16-vibe-analysis/ (minimal because… well, it’s “vibe analysis”. We’ll code as we go.)
Here are the datasets I’ll suggest to the audience:
- India Census 2011: https://www.kaggle.com/datasets/danofer/india-census
- MovieLens movies: https://grouplens.org/datasets/movielens/32m/
- IMDb movies: https://datasets.imdbws.com/
- Occupational Employment and Wage Statistics (OEWS): https://www.bls.gov/oes/tables.htm
- Global AI Job Market & Salary Trends 2025: https://www.kaggle.com/datasets/bismasajjad/global-ai-job-market-and-salary-trends-2025
- Flight Delay Dataset: https://www.kaggle.com/datasets/shubhamsingh42/flight-delay-dataset-2018-2024
- London House Price Data: https://www.kaggle.com/datasets/jakewright/house-price-data
- Exchange Rates to USD: https://www.kaggle.com/datasets/robikscube/exhange-rates-to-usd-from-imforg-updated-daily
- Thailand Road Accidents (2019-2022): https://www.kaggle.com/datasets/thaweewatboy/thailand-road-accident-2019-2022
… but if you’d like stories from any interesting recent dataset (10K - 10M rows, easy to download), please suggest it in the comments. 🙏 ...

I use LLMs to create photos and comics. But they can generate any kind of illustration. So why limit ourselves? My problem is imagination: I know little about art. So, I asked ChatGPT, Claude, and DeepSeek: Suggest 10 unusual illustration styles that are not popular in social media yet but are visually striking. I would like to have an LLM create images in that style. For each of those, show me (and link to) an online image in that style. ...

Things I Learned - 14 Sep 2025

This week, I learned:
- Though I’m connected on LinkedIn with people I can’t remember (weak ties), pruning them shrinks serendipity. Weak ties, despite the noise, are disproportionately valuable for opportunities (e.g. intros, jobs), so pruning reduces future upside. Science
- Claude has a Python + Node code interpreter that can access GitHub, PyPI, npm, and Google. Simon Willison
- SuperTinyIcons has very small icons for many websites and is available via CDN. Sample: http://cdn.jsdelivr.net/npm/super-tiny-icons/images/svg/github.svg
- ClockBench is an LLM benchmark based on how well LLMs tell the time from an analog clock. Humans (89%) are much better than the best model (Gemini 2.5 Pro, 13%).
- Veo 3 is now available via API. Veo 3 Fast costs $0.15 per second of video. Google
- ChatGPT has full support for MCPs via “Developer mode” in Plus and Pro accounts. OpenAI
- In Pyodide, you can use from js import document and then document.querySelector to manipulate the DOM directly from Python. from pyodide.http import pyfetch lets you use fetch.
- gtrending is a Python package that fetches trending GitHub repos, users, etc. uvx gtrending repos --language rust --since weekly fetches the week’s trending Rust repos.
- ast-grep lets you search code (across languages) using AST patterns. It’s like semgrep, but more about code search than security. uvx --from ast-grep-cli ast-grep runs it from the CLI. Useful for code rewriting, fast linting, and code search.
- hurl is a config-based CLI for HTTP automation. Useful for tests, bulk (templatized) HTTP requests, etc.
- rustdesk is open-source remote desktop software. A self-hostable TeamViewer alternative.
- prek is a much faster version of pre-commit, a cross-language pre-commit hook manager.
- ⭐ mise is a tool version manager. It combines nvm/fnm, pipx, etc. and installs and runs several tools smoothly.
- The npm phishing email was a great one. It compromised chalk, which a huge number of npm packages depend on. This may be one of the most effective supply chain attacks in recent times and makes me want to pin versions instead of using npx -y. It also makes me glad that I’m sponsoring @isaacs and @sindresorhus, two critical open source maintainers.
- “I pay for YouTube Premium. For my money, it’s the best bang-for-the-buck subscription service on the market.” - Gavin Andregg
- LLM inference is non-deterministic because floating-point addition is non-associative (order matters), and GPU kernels change the order in which they add numbers depending on the batch size, which depends on server load. Thinking Machines
- Claude.ai can now natively work with Excel, PPTX, DOCX, and PDF files.
- With embeddings, atomic labels + hierarchy beat instruction-heavy prompts. Prefer short, concrete sub-labels (e.g. “promotion”, “job security”, “flexibility”) that roll up to a parent “career” over a composite instruction like “Total Rewards and Career Growth”. Embedding similarity is not smart enough to figure this out.
- Today, RPA is cheaper than LLMs in some areas. But it’s a moving target. LLM costs are falling fast: 70–90% declines across major providers in 1.5 years. So waiting has option value. But classic IT compares static quotes, not declining curves, and is therefore likely to under-procure LLM solutions.
- ⭐ The biggest near-term ROI for LLMs in data science is in ‘boring’ data work: PII tagging, data dictionaries, ER/joins, SDTM mapping, etc. People expect flashy GenAI, but LLMs can bootstrap schema matching and data cleaning, leaving engineers to verify faster, which is more useful at scale.
- You can create an infinite Leaflet map with Nano Banana.
- Codex CLI with high reasoning effort seems far more comprehensive than Codex online. I asked both to identify the system requirements (URLs to access, software to install, ports to open) for my Tools in Data Science course. Codex CLI got it right in one shot (after 10 minutes of thinking). Codex online missed several items even after 4 attempts.
- The Reod on Elantris might have been triggered by Jaddeth, who might be an Autonomy avatar. ChatGPT
- Output tokens dominate latency. Decoding is sequential (each token depends on all prior tokens), so long completions are the main throttle. Shrinking returned text (e.g. returning spans/tags instead of echoing paragraphs) cuts latency far more than shrinking inputs.
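The root cause behind the non-determinism note above can be seen with ordinary Python floats (IEEE 754 doubles). This is only a sketch of the arithmetic, assuming nothing about GPU kernels: it shows that regrouping or reordering the same additions changes the result, while `math.fsum` (a correctly rounded sum) does not depend on order.

```python
import math
import random

# Floating-point addition is not associative: grouping changes the rounding.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left, right, left == right)  # 0.6000000000000001 0.6 False

# The same effect at scale: summing identical numbers in a different order
# (as a different batch layout or reduction order might) usually changes the result.
random.seed(0)
values = [random.uniform(-1, 1) * 10 ** random.randint(-8, 8) for _ in range(10_000)]
shuffled = values[:]
random.shuffle(shuffled)
forward = sum(values)      # left-to-right in the original order
reordered = sum(shuffled)  # left-to-right in a different order
exact = math.fsum(values)  # correctly rounded, so independent of order
print(forward - reordered, forward - exact)
```

The differences are tiny, but in an LLM they can flip which token has the highest probability, and from then on the generations diverge.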

Slides for my DataHack Summit talk (controversially) titled RIP Data Scientists are at https://sanand0.github.io/talks/2025-08-21-rip-data-scientists/ Summary: as data scientists we explore, clean, model, explain, deploy, and anonymize datasets. I live-vibe-coded each step with DGCA data in 35 minutes using ChatGPT. Of course, it’s the tasks that are dying, not the role. Data scientists will leverage AI, differentiate on other skills, and move on. But the highlight was an audience comment: “I’m no data scientist. I’m a domain person. I’ll tell you all this: If you don’t follow these practices, you won’t have a job with me!” ...

My Tools in Data Science course uses LLMs for assessments. We use LLMs to:
- Suggest project ideas (I pick), e.g. https://chatgpt.com/share/6741d870-73f4-800c-a741-af127d20eec7
- Draft the project brief (we edit), e.g. https://docs.google.com/document/d/1VgtVtypnVyPWiXied5q0_CcAt3zufOdFwIhvDDCmPXk/edit
- Propose scoring rubrics (we tweak), e.g. https://chatgpt.com/share/68b8eef6-60ec-800c-8b10-cfff1a571590
- Score code against the rubric (we test), e.g. https://github.com/sanand0/tds-evals/blob/5cfabf09c21c2884623e0774eae9a01db212c76a/llm-browser-agent/process_submissions.py
- Analyze the results (we refine), e.g. https://chatgpt.com/share/68b8f962-16a4-800c-84ff-fb9e3f0c779a
This changed our assessment process. It’s easier and better. Earlier, TAs took 2 weeks to evaluate 500 code submissions. In the example above, it took 2 hours. Quality held up: LLMs match my judgement as closely as TAs do, but run fast and at scale. ...

Things I Learned - 07 Sep 2025

This week, I learned:
- A quick way to get the docs for an npm package is npm view package-name readme. For PyPI, it’s curl -s https://pypi.org/pypi/package-name/json | jq -r .info.description
- Searching embeddings of text summaries of images improves vision search a lot. Jason Liu
- LLM vision is still far from accurate enough to click reliably. The AI Digest
- GLM supports the Anthropic API, so it’s possible to use Claude Code with GLM 4.5. z.ai
- gitingest has a CLI. uvx gitingest https://github.com/owner/repo fetches the code in the Git repo in a form suitable for passing to an LLM.
- Claude’s API has access to a code execution tool via the code-execution-2025-08-25 beta header. It runs Python 3.11 with 1GB RAM and 5GB disk space, with Internet access disabled. The containers persist for 30 days and can access uploaded files. Anthropic
- You can use the <script> tag in XML to render RSS, as an alternative to XSLT. Jake Archibald
- browser-fs-access is a ponyfill for the File System Access API and should be the go-to approach for reading and saving files via the browser.
- ⭐ To run a Python project directly from GitHub, use uvx --from "git+https://github.com/owner/repo.git@branch" script-name
- Github1s is a cool tool. Replace github.com with github1s.com to get a VS Code page that opens that repo. It’s fast and very useful for browsing repos. For example, https://github1s.com/sanand0/tools-in-data-science-public is my TDS course repo.
- The /init command in Claude Code and Codex CLI isn’t up to the mark. I believe a good README.md provides better specs for existing repos. There is a window of opportunity to craft a good prompt that generates this from repos. #ai-coding
- Since LLMs can code, I’d love to see useful CI/CD pipelines where the LLM creates / edits code on the fly. LLMOps might take on a new angle: it’s not just Ops on LLM apps. It’s LLMs as part of DevOps.
- insertAdjacentHTML is a great API but is vulnerable to XSS. TrustedHTML (part of the Trusted Types API) is an emerging standard to create sanitized HTML strings.
- Notes from Anthropic’s How we built our multi-agent research system:
  - Multi-agent systems are like organizations that can do more than a single human.
  - Multi-agent systems conserve the context window.
  - The top 3 drivers of performance variance: spending more tokens, more tool calls, better models.
  - You need to teach (prompt) the orchestrator how to delegate to sub-agents:
    - How to avoid task duplication among agents
    - How many sub-agents to spin up for different kinds of tasks
    - Which tools to use for what
  - Provide sub-agents an objective, output format, tools/sources, and clear task boundaries.
  - ⭐ Self-improving agents, e.g. prompt optimizers or tool-testing agents that run and rewrite tool descriptions, are powerful.
  - Since agents are stateful, resuming from failure is important.
  - Agent prompts are public.
- Claude models support interleaved thinking, which lets them think between tool calls, via an anthropic-beta: interleaved-thinking-2025-05-14 header. OpenAI models natively think between tool calls, preserving thinking across calls with the Responses API. Gemini lets you control the amount of thinking between tool calls via the thinkingBudget parameter.
- Anthropic auto-extracts persona vectors (traits) by generating LLM responses to the same question with system prompt A (“You are evil”) and B (“You are helpful”) and subtracting the average activations. This helps monitor personality drift during training and deployment, and even in training data.
- From My experience creating software with LLM coding agents - Part 2 (Tips) #ai-coding
  - Use standards. Or, write your standards in README.md and tell AGENTS.md / CLAUDE.md to read it.
    - Use a standard file structure. Or, in README.md, list what each file is for. This helps agents pick the right file for context.
    - Use a standard build/lint/test setup (e.g. package.json scripts).
  - Or localize context, i.e. add context in the files that use it, e.g. add comments in test files on how to execute them.
  - Keep files modular so agents can read less code and conserve context.
  - Write a developer’s guide. Use /init in Claude Code / Codex / … or have an LLM generate a developer guide. Edit it manually. Agents don’t write great specs.
  - Document the design. Write DETAILED specs to reduce deviations. Share the goal while specifying tasks. This helps agents fix related stuff.
  - Use deep reasoning mode, e.g. “think harder” or “ultrathink” in Claude Code, or -c model_reasoning_effort=high in Codex.
  - ⭐ Run parallel agents in different windows and share agent feedback with each other. E.g.:
    - Server/API coding in one window. Client coding in another.
    - Plan/ask in one window. Execute in another.
  - Add debug logs to help agents spot errors:
    - Start/stop of long/complex operations, state changes, external interfaces.
    - Include full objects in logs. Prioritize diffs. Trim long contents.
  - ⭐ Give agents access to a debugger, e.g. Chrome remote debugging at localhost:9222
  - Agents write poor tests. So:
    - Manually add important ones.
    - ⭐ When you find a bug, ask the agent why the tests missed it and have it add one.
    - Review and remove useless ones.
  - Ensure agents pass test cases. Tell them not to disable / skip failed tests.
  - Have agents create a new branch per feature and auto-commit. Merge when successful.
  - Feel free to provide a TODO list or update it on the fly.
  - Interrupt with Esc if the agent’s thinking is off-track.
  - When agents struggle, write tools to help them, e.g. JSON splicing, Excel edits, etc.
  - Agents bloat code and features. Explicitly refactor and trim.
- From A Guide to Gen AI / LLM Vibecoding for Expert Programmers #ai-coding
  - Use vibe coding for stuff you don’t need to maintain.
  - Use vibe coding for stuff you know well enough to review quickly.
  - Use vibe coding for independent tasks where you’re not fussed which ones fail.
  - Vibe coding turns everyone into a team lead. That needs skills: planning, allocation, design, review, feedback, …
  - ⭐ Empathy enables vibe coding. Empaths allocate work by ability, review regularly, and provide detailed specs and feedback.
  - Have LLMs plan and allocate tasks. “Read this repo. Suggest improvements.” (Review.) “Add these as issues.” “Add the top 3 Sentry log errors as issues.” “Find the easiest issue and solve it with a PR.”
  - Use GitHub issues extensively for planning.
  - ⭐ Create a separate GitHub account for your agent! Let it push. Assign it issues. Treat it like an intern.
  - Ensure the agent passes test cases and keeps running until it does, or reports the core difficulty.
  - Throw away rubbish code and start again. Issues unsolved in 2-3 tries are too hard for agents or are poorly specced.
- The context7 and Sequential Thinking MCPs are useful.
- The O*NET database has a list of tasks/activities, skills, titles, … for each job, at least in the US. It has been updated every few months since 2003. It’s an excellent source to analyze things like the impact of AI across jobs.
- Anthropic used it to map Claude.ai conversations to educator tasks to identify how educators use AI. Apart from learning, educators use Claude mainly to automate tedious tasks, ideate, and personalize for each student:
  - Curriculum development: games, interactive tools, MCQs, simulations, content.
  - Academic research: bibliographies, statistical modeling, revisions from feedback.
  - Assessments: student feedback, scoring, summarization.
  - Administration: recommendation letters, meeting agendas, admin tools.
- OpenAI used feedback from ~1000 annotators to update their model spec. Learnings:
  - Request targeted feedback. Annotators reviewed responses pre-selected for subjectivity against a pre-selected rubric.
  - More examples. Most improvements add examples of good and bad responses.
  - Use detailed prompts. Newer models do well with HUGE system prompts.
  - That’s how we frame better questions.
- The Great Refactor is refactoring critical open-source C code to Rust using Claude Code, since 70% of vulnerabilities are memory-related and Rust is memory-safe. No repo/docs yet. #ai-coding
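The persona-vector note above boils down to a difference of means. Below is a toy Python sketch of that arithmetic, assuming each response has already been reduced to one activation vector (the real method reads activations from a model layer; the 3-dimensional numbers and function names here are hypothetical). Projecting a new activation onto the vector gives a score that can be tracked for drift.

```python
def mean_vector(rows):
    """Average equal-length activation vectors, dimension by dimension."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def persona_vector(acts_with_trait, acts_without_trait):
    """Mean activation under prompt A ("You are evil") minus mean under prompt B."""
    a = mean_vector(acts_with_trait)
    b = mean_vector(acts_without_trait)
    return [x - y for x, y in zip(a, b)]


def trait_score(activation, vector):
    """Project an activation onto the unit-normalized persona vector."""
    norm = sum(v * v for v in vector) ** 0.5
    return sum(a * v for a, v in zip(activation, vector)) / norm


# Toy "activations" (hypothetical numbers, not from a real model)
evil = [[1.0, 0.2, 0.0], [0.8, 0.0, 0.2]]
helpful = [[0.0, 0.2, 0.0], [0.2, 0.0, 0.2]]
v = persona_vector(evil, helpful)
print(v)                               # roughly [0.8, 0.0, 0.0]
print(trait_score([0.9, 0.1, 0.1], v))  # roughly 0.9
```

Averaging over many responses cancels out what the two prompts share (the question itself), leaving mostly the direction that separates the traits.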

Problems that only one student can solve

Jaidev’s The Bridge of Asses reminded me of my first coding bridge. It was 1986. I’d completed class 6 and was in a summer coding camp at school. M Kothandaraman (“MK Sir”) was teaching us how to swap variables in BASIC on the BBC Micro. This code prints the first name in alphabetical order (“Alice”):
10 A$ = "Bob"
20 B$ = "Alice"
30 IF A$ > B$ THEN TEMP$ = A$ : A$ = B$ : B$ = TEMP$
40 PRINT A$
The homework was to print all details of the first alphabetical name: ...

Things I Learned - 31 Aug 2025

This week, I learned:
- ⭐ Habit tooling can expand habit-building capacity. I already use tools to support my habits. Habit stacking “sticks” new habits to old ones. By sticking new habits into existing tools, I can automate this. (For example, I extended my meeting-recording fish script with an echo reminding me to write the meeting goal and my role, practice kind candor, and measure effectiveness.)
- ⭐ The crux of Arthashastra’s advice on defeating an enemy is removing support:
  - मित्राणि भेदयेत्, मित्रं च शत्रोः। Dis-unite friends; separate enemies from their allies.
  - अमात्यान् द्रव्यैः, जनपदं भेदयेत्। Bribe their ministers, sow discord among subjects.
  - बलं चोच्छिनत्ति, कोशं चोपशोषयेत्। Break the army, exhaust the treasury.
  - ततोऽन्योन्यवैरिणं कुर्यात्। Then set them against each other as mutual foes.
- Consensus is dangerous in venture capital. “Because if everyone inside the firm sees the same thing, it probably means the market already does too. And when the market sees it, the upside is limited.” Guillermo Flor
- This CodeMonkeys paper suggests running a mixture of agents in parallel on multiple code + test tasks and auto-picking the best by running and LLM-rewriting tests. #ai-coding
- We think a new pricing model might emerge for outsourced knowledge work that leads to lower client cost & quality at higher margins. ChatGPT
  - LLMs do the task; multiple LLMs cross-check.
  - Three tiers: Auto-pass (no human), Light review, Full review.
  - Each tier has a clear price and SLA.
- Using LLMs as validators is one of the safest ways of introducing LLMs into a process. If the human ignores them, there’s no loss. If they spot new errors or the human gets new ideas, quality improves at low cost.
- I finally get why elders in my family prefer eating in a pure (rather than a mixed) vegetarian restaurant. In Vietnam, I could pick dishes in pure vegetarian restaurants without worrying about whether they contained meat, even when I didn’t understand what the dishes were. That confidence to proceed without fear is a powerful enabler.
- There’s emerging evidence that jobs AI automates (rather than augments or leaves unaffected) have fewer entry-level positions. Experienced workers are less affected. Compensation is affected less. Canaries in the Coal Mine
- Cloudflare AutoRAG lets you index any website and expose it as an API + chatbot with a model of your choice. This is available on the free tier, too. The API follows NLWeb, Microsoft’s open standard for LLMs and MCPs to interact with websites in natural language.
- Cloudflare has an image transformation API that also acts as a CDN. Apart from basic transformations, it can auto-detect and crop faces, remove backgrounds, and more.
- oklch seems to be the best color model supported by all modern browsers. We can use relative colors with it, making color palette design much easier: #darker-color { background-color: oklch(from var(--base-color) calc(l - 0.15) c h); }
- Malware embedded in the compromised nx build tool used Claude/Gemini CLI to offload fingerprintable password-gathering code into prompts, making detection significantly harder for traditional security tools. semgrep
- Codex CLI has several updates:
  - VS Code plugin with remote container execution
  - Drag & drop image support (PR, Docs)
  - Queued (editable) messages (PR)
  - Web search via --search (PR)
  - Esc-Esc to edit previous messages (Docs)
- Our team passed an image to an LLM for OCR (especially to identify formatting, e.g. bold, italics), then passed the output and the image to another LLM for improvement. Interestingly, the best single LLM (Gemini 2.5 Pro, for this sample of 8 images) outperformed the two-stage workflow. Perhaps incorrect results confuse more than correct results help? This needs more research.
- OpenAI now has a series of llms.txt URLs.
- Rust seems to catch errors at compile time better than many typed languages like TypeScript. That makes it better for larger projects (or for AI coding). The unexpected productivity boost of Rust #ai-coding
- Image APIs that support hotlinking and searching (useful for LLM-generated content, e.g. slides or presentations):
  - Openverse: CC, scale, simple REST.
  - Wikimedia Commons: CC, historic/diagram breadth.
  - Pixabay: easy, free, broad, but license fuzzier.
  - Pexels: beautiful but custom license.
  - Unsplash: stylish but restrictive.
  - OpenClipart: niche, useful for icons.
- ⭐ For mental tiredness, the impact of sleep > workload > mood/stress > environment (travel, light, air) > posture > food/drink. To rebound: nap > bright light > exercise > fresh air > water > posture/breathing. ChatGPT
- In my internal meetings, I tend to ask many questions (1 per 8 turns), but fewer open-ended ones (~40%) than others do. I also praise once every 22 turns, among the lowest in our group. I could ask more open-ended questions and acknowledge good work.
- When seeking advice, people sometimes think aloud, become repetitive, and introduce detail before clarifying intent. Kind candor helps. You can:
  - State time boundaries. “We have 20 min. If we spend 5 min on your question, we’ll have 15 for solutions.”
  - Clarify intent upfront. “Before we dive in: What can I help with?”
  - Interrupt, summarize, clarify early. “Cooperative interruptions” are seen as supportive. E.g. “I get this: six accelerators, two done. Great! What can I help with? To accelerate?”
- rclone is the cleanest way to copy files from Google Drive. I ran rclone config to set it up with Google Drive via a native-app OAuth key. Then, rclone copy "gdrive:" transcripts/ --drive-shared-with-me --include "**Transcript*.docx" copied all transcripts, including “Shared with me” files (not just drives). The --drive-shared-with-me flag enables this.
- What makes Claude Code so damn good has a detailed review of Claude Code’s system prompt and is great for ideas on using LLMs for coding. #ai-coding
- With AI coding, task breakdown, context right-sizing, and automated testing are key levers. #ai-coding
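The three-tier pricing idea above needs a rule that routes each task to a tier. Here is a minimal Python sketch, assuming each cross-checking LLM returns a simple pass/fail verdict. The routing rule and the `light_threshold` default are hypothetical choices for illustration, not part of the original idea.

```python
from enum import Enum


class Tier(Enum):
    AUTO_PASS = "auto-pass"        # no human
    LIGHT_REVIEW = "light review"
    FULL_REVIEW = "full review"


def route(checker_verdicts, light_threshold=0.5):
    """Pick a review tier from cross-checking LLMs' pass/fail verdicts.

    Hypothetical rule: unanimous pass -> auto-pass;
    pass share >= light_threshold -> light review; otherwise full review.
    """
    if not checker_verdicts:
        return Tier.FULL_REVIEW  # nothing cross-checked, so a human must look
    share = sum(checker_verdicts) / len(checker_verdicts)
    if share == 1:
        return Tier.AUTO_PASS
    if share >= light_threshold:
        return Tier.LIGHT_REVIEW
    return Tier.FULL_REVIEW


for verdicts in ([True, True, True], [True, True, False], [True, False, False]):
    print(verdicts, route(verdicts).value)
```

Each tier can then carry its own price and SLA. Tightening `light_threshold` trades margin for quality.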

Here’s my current answer when asked, “How do I use LLMs better?”
- Use the best models. o3 (via $20 ChatGPT), Gemini 2.5 Pro (free on the Gemini app), or Claude 4 Opus (via $20 Claude). The older models are the default and far worse.
- Use audio. Speak & listen, don’t just type & read. It’s harder to skip and easier to stay in the present when listening. It’s also easier to ramble than to type.
- Write down what fails. Maintain that “impossibility list”. There is a jagged edge to AI. Retry every month to see how that edge shifts.
- Wait for better models. Many problems can be solved just by waiting a few months for a new model. You don’t need to find or build your own app.
- Give LLMs lots of context. It’s a huge enabler. Search, copy-pasteable files, past chats, connectors, APIs/tools, …
- Have LLMs write code. LLMs are bad at math. They’re good at code. Code hallucinates less. So you get creativity and reliability.
- Learn AI coding. 1. Build a game with ChatGPT/Claude/Gemini. 2. Create a tool useful to you. 3. Publish it on GitHub.
- APIs are cheaper than self-hosting. Don’t bother running your own models.
- Datasets matter. Building custom models does not. You can always fine-tune a newer model if you have the datasets.
Comic via https://tools.s-anand.net/picbook/ ...

The Surprising Power of LLMs: Jack-of-All-Trades

I asked ChatGPT to analyze our daily innovation-call transcripts. I used command-line tools to fetch the transcripts and convert them into text:
# Copy the transcripts
rclone copy "gdrive:" . --drive-shared-with-me --include "Innovation*Transcript*.docx"
# Convert Word documents to Markdown
for f in *.docx; do
  pandoc "$f" -f docx -t gfm+tex_math_dollars --wrap=none -o "${f%.docx}.md"
done
# Compress into a single file
tar -cvzf transcripts.tgz *.md
… and uploaded it to ChatGPT with this prompt: ...

If I turned female, this is what I’d look like. gpt-image-1: “Make this person female with minimal changes.” Hm…. maybe… just as an experiment…? LinkedIn

Measuring talking time with LLMs

I record my conversations these days, mainly for LLM use. I use them in 3 ways:
1. Summarize what I learned and the next steps.
2. Ideate, as raw material for my Ideator tool: /blog/llms-as-idea-connection-machines/
3. Analyze my transcript statistics.
For example, I learned that:
- When I’m interviewing, others ramble (speak long per turn) while I am brief (fewer words/turn) and quiet (lower voice share). In one interview, I spoke ~30 words per turn. Others spoke ~120. My share was ~10%.
- When I’m advising or demo-ing, I ramble. I spoke ~120 words per turn in an advice call, and took ~75% of the talk time.
This pattern is independent of meeting length and group size. I used Codex CLI (command-line tool) for this, with the prompt: ...
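The two statistics above (words per turn and talk share) are simple to compute once a transcript is split into speaker turns. This is a hand-written Python sketch, not the code Codex CLI produced. It assumes the transcript is already a list of (speaker, text) pairs and approximates talk share by word share rather than audio duration. The speaker names and sample lines are hypothetical.

```python
from collections import defaultdict


def talk_stats(turns):
    """turns: list of (speaker, text) pairs.

    Returns {speaker: (words_per_turn, talk_share)}, where talk_share is the
    speaker's share of all words (an approximation of share of talk time).
    """
    words = defaultdict(int)
    count = defaultdict(int)
    for speaker, text in turns:
        words[speaker] += len(text.split())
        count[speaker] += 1
    total = sum(words.values())
    return {s: (words[s] / count[s], words[s] / total) for s in words}


transcript = [
    ("Me", "What brought you here?"),
    ("Guest", "I have been working on data pipelines for a few years and wanted to learn more"),
    ("Me", "Why?"),
    ("Guest", "Because the tooling keeps changing"),
]
for speaker, (wpt, share) in talk_stats(transcript).items():
    print(f"{speaker}: {wpt:.1f} words/turn, {share:.0%} of words")
# Me: 2.5 words/turn, 19% of words
# Guest: 10.5 words/turn, 81% of words
```

Word share is a reasonable proxy when everyone speaks at a similar pace. Timestamps from the transcription would give true talk time.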