September 2025

Vibe-Scraping: Write outcomes, not scrapers

There hasn’t been a box-office explosion like Dangal in the history of Bollywood. CPI inflation-adjusted to 2024, it is the only film in the ₹3,000 Cr club. 3 Idiots (2009) is the first member of the ₹1,000 Cr club (2024-inflation-adjusted). The hot streak was 2013-2017: each year, a film crossed that bar: Dhoom 3, PK, Bajrangi Bhaijaan, Dangal, Secret Superstar. Since then, we never saw such a release except in 2023 (Jawan, Pathan). ...

Things I Learned - 28 Sep 2025

This week, I learned: selectolax is a fast, easy-to-use, modern HTML5 parser with CSS selectors. A good replacement for lxml.html. The most effective way to convert a blob (e.g. file input) to a data URL on the browser seems to be via the FileReader API. const blobToDataURL = (blob) => new Promise((res, rej) => { const r = new FileReader(); r.onload = () => res(r.result); r.onerror = () => rej(r.error); r.readAsDataURL(f); }); Tool calls in OpenAI support files and images. OpenAI ⭐ “Task parity is not the same thing as job parity There is a lot of complexity as many different tasks are bundled into jobs, and many jobs contribute to processes inside an organization The jagged frontier of AI ability means doing tasks well doesn’t translate to doing jobs well.” Ethan Mollick Adding // @ts-check to a JavaScript file and documenting types via JSDoc might be the simplest way to migrate phase-wise from JS to Typescript. envsubst < file.txt replaces file.txt with the environment variable, e.g. $HOME is replaced by the HOME environment variable. Clean shell-level templating. GitHub Copilot CLI is out. npx -y @github/copilot Compost is the cheapest thing per ton that I can buy on Amazon India. I can buy 1 ton of compost for Rs 13,500. ChatGPT yt-dlp requires Deno from now on. #14404 In meetings, make cameras optional by default – and judge engagement by contributions, not video – because a 4-week field experiment found camera-on increased fatigue and reduced voice, especially for women and newcomers. Camera on early for trust building is useful. PubMed wrkflw is a quick and light way to test GitHub actions before publishing. It runs GitHub actions locally. GPT-5-Codex is available as an API and on LLM. Simon Willison ⭐ I’m habit engineering, i.e. discovering and stacking habits on to existing ones. For example: ChatGPT suggested increasing observability based on code reviews. I’m including it in my weekly codecast. ChatGPT suggested defining closures inmeetings. I’mn now discussing objectives at meeting starts and effectiveness at the end. Since Anaconda cannot be used for free by organizations with 200+ people, Straive’s received legal notices from Anaconda. Since laptops are under central IT administration, they went ahead and deleted all Anaconda instances. Installing miniconda for use with conda-forge requires admin access that most developers do not have, however. That leads to an interesting “No Python” situation. This is where uv becomes the knight in shining armor. Perceptron is SOTA LLM for object bounding boxes. Just 2B parameters. Gall’s “law” says that complex systems that work evolved from simple systems that worked. But a complex system designed from scratch won’t ever work. This holds in uncertain environments. But where formal theory or regulations exists, it doesn’t. ChatGPT uvx --with visidata vd gives you a command-line Excel editor to edit / convert CSV, Excel, JSON, SQLite, directories, etc. uvx markitdown https://example.com/ fetches example.com as Markdown. I learnt this when I told Codex it could use uvx markitdown to convert PDFs and it figured this part out by itself. The Dropbox connector for ChatGPT is the little flaky – at least on Android. It could not identify a file that was clearly there in Dropbox and I had to upload it manually. ChatGPT’s output is too dense for me. I added this to my custom instructions: “Write in simple language. Explain non-obvious terms intuitively.” yt-dlp has a --download-sections option that downloads specific YouTube time ranges. For example --download-sections "*00:01:00-00:03:00" downloads roughly (not exactly) from 1 min to 3 min. Note the * at the beginning. My Lenovo laptop’s touchpad started scrolling instead of moving when I moved my finger. Many things could have caused it, but the solution was to click (not tap) the top middle of the trackpad. ChatGPT The India Entrance Exam database is a dataset collating Indian entrance exams.

How to review trending GitHub repos on VS Code

Here’s how I track trending GitHub repos each week. I run a scheduled script that saves a clean TSV I can scan fast. It uses uvx gtrending to fetch weekly trending repos for: Rust: High-quality system tools. (Anything in Rust seems cool.) Go: Reliable CLI/infra tools. (Like Rust, most Go code seems good.) Python: Most AI/ML stuff TypeScript: Most modern JS codebases JavaScript: Most front-end utilities Shell: Productivity scripts I pipe results through jq to extract: ...

Vibe Shopping

I’ve started vibe shopping, i.e. using ChatGPT to shop for small, daily items and buying without verifying. For example: “A metal rack for the floor: at least 2 ft * 1 ft * 2 ft, small gaps, popular options on Amazon.in.” https://chatgpt.com/share/68d61d68-7040-800c-936b-354749539308 “An optical wired mouse that’s smaller than usual, 4*+, popular, Prime-eligible for Chennai by the weekend on Amazon.in.” https://chatgpt.com/share/68d61e0d-420c-800c-bc71-821b9f9296a9 The best use is when I don’t know the right terms. In this case, the terms were wire rack and mini mouse. ...

Tools in Data Science Sep 2025 edition is live: https://tds.s-anand.net/. Major update: a new AI-Coding section and fresh projects. I teach TDS at the Indian Institute of Technology, Madras as part of the BS in Data Science. Anyone can audit. The course is public. You can read the content and practice assessments. I fed the May 2025 term student feedback into The Sales Mind and asked: What are the top non-intuitive / surprising inferences? What are interesting observations? What are high impact actions? Full analysis: https://chatgpt.com/share/68cba081-afc0-800c-9da3-75222e84a499: summary, outliers, and action ideas. ...

The 10 sites I visit most often

Here are the 10 most frequent sites I use (based on Microsoft Edge’s home bar): ChatGPT. It replaced Google as my default knowledge source. I prefer it over Gemini, Claude, etc. because the app has good features (memory from past conversations, code interpreter, strong voice mode, remote MCP on web app, etc.) The OpenAI models have pros and cons, but the app features are ahead of competition. Gmail. It’s my work inbox. Interestingly, I check it more (and respond faster) than social channels (e.g. WhatsApp, Google Chat, LinkedIn). It also doubles up as my task queue. Prime Video. I mainly watch The Mentalist. Totally love Patrick Jane! Google AI Studio. Mostly for transcription. It’s better than Gemini on UI, ability to handle uploads, file-formats, etc. It’s also free (though the data is used for training.) My Talks page. I give 1-1.5 talks a week, mostly on AI/ML topics. I use Marp to render Markdown slides and publish it here. Google Chat. It’s Straive’s social channel. I can’t use it from my phone, so I log in only if I need to check if I missed something. LinkedIn. It’s where I post by default. I don’t use it for networking and only connect with people I’ve met and know well. YouTube. Mostly for movie clips over dinner. I occasionally watch educational content. Playground. LLM Foundry is Straive’s internal gateway to multiple model APIs (I built it). I use it to experiment with models, grab API keys, and demo LLMs to clients. Squoosh. I compress every image, every time. Mostly into WebP (hands-down the best format today), typically lossless with an 8-color palette, or lossy at ~0-10% quality for photos. That’s my current home row. It will change. But the reasons probably won’t: fast, simple, automatable, and practical (for me).

Voice coding is the new live coding

In Feb 2025 at PyConf Hyderabad, I tried a new slide format: command-line slideshows in bash. I’ve used this format in more talks since then: LLMs in the CLI, PyCon Singapore, Jun 2025 Agents in the CLI, Singapore Python User Group, Jul 2025 DuckDB is the new Pandas, PyCon India, Sep 2025 It’s my favorite format. I can demo code without breaking the presentation flow. It also draws interest. My setup was the top question in my PyCon talk. ...

Things I Learned - 21 Sep 2025

This week, I learned: When editing an image, ChatGPT’s non-thinking mode does a much better job of preserving the original image features than the thinking mode. When editing my photo, I found that the thinking mode creates images that looks quite different than me. A surprising effect of overthinking. ⭐ When evaluating model accuracy, compare with human accuracy rather than perfect accuracy. SMEs rarely agree among themselves, so it’s unlikely that they will agree with an LLM. Instead, measure how often the LLM agrees with the majority of SMEs and how often it disagrees with all SMEs. This gives a more realistic measure of accuracy. LLMs instead of Human Judges? and Judging LLM-as-a-Judge. ChatGPT I understand at least one mechanism of how costs are inflated in large organizations. Even people who want to keep costs low find that the process of tracking expenses, submitting receipts, answering questions around approval, adds transaction cost. So, rather than going for a $10 plus top up mechanism, I would rather go for and ask people to take a $500 top up. Better ask for more and waste than have to ask again. YouTube downloaders: yt-dlp for the CLI, Stacher for Windows/Mac/Linux, Cobalt for a web-based app. Ref VS Code a bunch of features I discovered: It can run a terminal in its own new window for over a year (via Ctrl+P > Terminal: Move Terminal into New Window). Now, Ctrl + Alt + Shift + ` does this directly. Terminal Intellisense shows completion suggestions in the UI. Very helpful. Ctrl+Space triggers the menu completion. ⭐ “We find that the per-step error rate itself rises as the task progresses”, i.e. once a conversation goes the wrong way, it’s really hard to correct it. The Illusion of Diminishing Returns Japonaise Cake is the name of the pastry that I had as a child and grew up longing for. I have spent several weeks searching for it in the roadside bakeries at Bangalore and Chennai but only one bakery seems to have it. systemd is the modern way to run scheduled jobs, instead of cron. It’s far more complex. But it can catch up on missed runs via a Persistent option. Working with systemd timers ⭐ Vice-chancellors of universities resist AI in education because (a) their faculty does not know AI and (b) AI is unreliable. But they are interested in (a) large-scale AI-evaluation and (b) AI-enabling entire campus. tldr.sh offers concise man pages, e.g. uvx tldr jq. cheat.sh offers detailed examples, e.g. curl cheat.sh/jq or curl cheat.sh/:help. ugrep is a fast drop-in replacement for grep. It supports fuzzy search with a customizable Levenshtein distance. Also ug -Q shows an interactive TUI searches like VS Code’s “Search in Files” feature. Very intuitive. Dagger lets you write CI/CD workflows in Python. I tried running it but after 7m of pulling large Docker containers, I gave up. Too heavy. dotslash lets you write scripts that downloads GitHub releases, caches, and runs them. Requires writing scripts. I prefer mise. ChatGPT has a quota for searches. I saw this phrase in the reasoning traces: “I’ll avoid overloading on citations since we only have a few calls left.” It doesn’t seem to be in ChatGPT’s system prompt from last month, so it’s either part of the tool response or a new prompt. Depending on the underlying chips that a model uses, the floating point multiplications may differ and model quality can vary. So Claude 4 Opus running on Anthropic’s GPUs can produce different results from when running on Google’s GPUs or Amazon’s GPUs.

AfterSlides: Write Slides After Talks

25 years ago, Mr. Krishnan (IAS) amused us with anecdotes of bureaucrats writing meeting minutes before the meeting. This week, I flipped that. I wrote slides after the talk. I call them AfterSlides. Why. I ran a couple of Ask-Me-Anything (AMA) sessions where the audience set the agenda. I learned their interests. They got answers. No slides prepared. How. I okayed recording with the organizers, recorded on my phone, transcribed with Gemini, and asked ChatGPT to generate the AfterSlides. ...

Turning Generic Gifts Into Joy with AI

In 2001, I received a campus interview invitation from BCG. It opened like this: Dear Anand, We’d like to invite you to an interview on … We were impressed by your … … and went on to share 2-3 phrases about what they liked about my CV. A dozen of us got similar letters – each personalized! That was cool. Two decades later, I still remember it. It showed care and competence – care enough to personalize for each candidate, competence to pull it off at scale across campuses. ...

Tomorrow, we’ll be vibe-analyzing data at a Hasgeek Fifth Elephant workshop. It’s a follow-up to my DataHack Summit talk “RIP Data Scientists”. I showed how it’s possible to automate many data science tasks. In this workshop, the audience will be doing that. Slides: https://sanand0.github.io/talks/2025-09-16-vibe-analysis/ (minimal because… well, it’s “vibe analysis”. We’ll code as we go.) Here are datasets I’ll suggest to the audience: India Census 2011: https://www.kaggle.com/datasets/danofer/india-census MovieLens movies: https://grouplens.org/datasets/movielens/32m/ IMDb movies: https://datasets.imdbws.com/ Occupational Employment and Wage Statistics (OEWS): https://www.bls.gov/oes/tables.htm Global AI Job Market & Salary Trends 2025: https://www.kaggle.com/datasets/bismasajjad/global-ai-job-market-and-salary-trends-2025 Flight Delay Dataset: https://www.kaggle.com/datasets/shubhamsingh42/flight-delay-dataset-2018-2024 London House Price Data: https://www.kaggle.com/datasets/jakewright/house-price-data Exchange Rates to USD: https://www.kaggle.com/datasets/robikscube/exhange-rates-to-usd-from-imforg-updated-daily Thailand Road Accidents (2019-202): https://www.kaggle.com/datasets/thaweewatboy/thailand-road-accident-2019-2022 … but if you’d like stories from any interesting recent datasets (10K - 10M rows, easy-to-download), please suggest in the comments. 🙏 ...

GPT-5 (Codex) follows instructions exactly as given. Usually a good thing, but sometimes, it this is what happens. AGENTS.md: ALWAYS WRITE TESTS before coding. Codex: Let me begin with the tests. (Spends 5 minutes writing tests.) Anand: Stop! This is a proof of concept. We don’t need tests! AGENTS.md: Write tests before coding. Drop tests for proof-of-concepts. Codex: (Proceeds to delete all existing tests.) Anand: STOP! We need those tests! ...

Things I Learned - 14 Sep 2025

This week, I learned: Though I’m connected on LinkedIn with people I can’t remember (weak ties), pruning them shrinks serendipity. Weak ties, despite noise, are disproportionately valuable for opportunities, e.g. intros, jobs, and pruning reduces future upside. Science Claude has a Python + Node code interpreter that can access GitHub, PyPi, npm and Google. Simon Willison SuperTinyIcons has very small icons for many websites and is available via CDN. Sample: http://cdn.jsdelivr.net/npm/super-tiny-icons/images/svg/github.svg Clock bench is an LLM benchmark based on how well LLMs tell the time from an analog clock. Humans (89%) are much better than the best model (Gemini 2.5 Pro - 13%). Veo 3 is now available via API. Veo 3 fast is 15s/second. Google ChatGPT has full support for MCPs via Developer mode in Plus and Pro accounts, via “Developer mode”. OpenAI In Pyodide, you can use from js import document and then document.querySelector to manipulate the DOM directly from Python. from pyodide.http import pyfetch lets you use fetch. gtrending is a Python package that fetches trending GitHub repos, users, etc. uvx gtrending repos --language rust --since weekly fetches trending Rust repos of the week. astgrep lets you search in code (across languages) using AST patterns. Like semgrep but more about code search than security. uvx --from ast-grep-cli ast-grep runs from the CLI. Useful for code rewriting, fast linting, code search. hurl is a CLI config-based HTTP automation tool. Useful for tests, bulk (templatized) HTTP requests, etc. rustdesk is an open-source remote desktop software. TeamViewer alternative. Self-hostable. prek is a much faster version of pre-commit - a cross-language pre-commit hook manager. ⭐ mise is a tool version manager. Combines nvm/fnm, pipx, etc. Supports running several tools with a smooth installation. The npm phishing email was a great one. It compromised chalk which is used in most npm packages. This may be one of the best supply chain attacks in recent times and makes me want to pin versions instead of using npx -y. Also makes me glad that I’m sponsoring @isaacs and @sindresorhus - two critical open source maintainers. “I pay for YouTube Premium. For my money, it’s the best bang-for-the-buck subscription service on the market”. - Gavin Andregg LLMs are non deterministic because GPUs add floating point numbers concurrently and FP addition is non associative - order matters. Thinking Machines Claude.ai can natively work with Excel, PPTX, DOCX, and PDF files now. With embeddings, atomic labels + hierarchy beat instruction-heavy prompts. Prefer short, concrete sub-labels (e.g., “promotion,” “job security,” “flexibility”) that roll up to a parent “career” rather than a composite instruction like “Total Rewards and Career Growth”. Embedding similarity is not smart enough to figure this out. Today, RPA is cheaper than LLMs in some areas. But it’s a moving target. LLM costs are fall fast: 70–90% declines across major providers in 1.5 years. Therefore, waiting has option value. But classic IT compares static quotes, not declining curves, and hence is likely to under-procure LLM solutions. ⭐ The biggest near-term ROI for LLMs in data science is like ‘boring’ data work: PII tagging, data dictionaries, ER/joins, SDTM mapping, etc.. People expect flashy GenAI, but LLMs can bootstrap schema matching and data-cleaning, speeding engineer verification, which is more useful at scale. You can create an infinite leaflet map with nano banana. Codex CLI with high reasoning effort seems far more comprehensive than Codex online. I asked both to identify the system requirements (URLs to access, software to install, ports to open) for my Tools in Data Science course. Codex CLI got it right one shot (after 10 minutes of thinking). Codex online missed several items even after 4 attempts. The Reod on Elantris might have been triggered by Jaddeth who might be an Autonomy avatar. ChatGPT Output tokens dominate latency. Decoding is sequential (one token depends on all prior tokens), so long completions are the main throttle. Shrinking returned text (e.g., send spans/tags instead of echoing paragraphs) yields a far bigger win on latency than shrinking inputs.

I use LLMs to create photos and comics. But they can generate any kind of illustration. So why limit ourselves? My problem is imagination: I know little about art. So, I asked ChatGPT, Claude, and DeepSeek: Suggest 10 unusual illustration styles that are not popular in social media yet but are visually striking. I would like to have an LLM create images in that style. For each of those, show me an (and link to) an online image in that style. ...

My Tools in Data Science course uses LLMs for assessments. We use LLMs to Suggest project ideas (I pick), e.g. https://chatgpt.com/share/6741d870-73f4-800c-a741-af127d20eec7 Draft the project brief (we edit), e.g. https://docs.google.com/document/d/1VgtVtypnVyPWiXied5q0_CcAt3zufOdFwIhvDDCmPXk/edit Propose scoring rubrics (we tweak), e.g. https://chatgpt.com/share/68b8eef6-60ec-800c-8b10-cfff1a571590 Score code against the rubric (we test), e.g. https://github.com/sanand0/tds-evals/blob/5cfabf09c21c2884623e0774eae9a01db212c76a/llm-browser-agent/process_submissions.py Analyze the results (we refine), e.g. https://chatgpt.com/share/68b8f962-16a4-800c-84ff-fb9e3f0c779a This changed our assessments process. It’s easier and better. Earlier, TAs took 2 weeks to evaluate 500 code submissions. In the example above, it took 2 hours. Quality held up: LLMs match my judgement as closely as TAs do but run fast and at scale. ...

Slides for my DataHack Summit talk (controversially) titled RIP Data Scientists are at https://sanand0.github.io/talks/2025-08-21-rip-data-scientists/ Summary: as data scientists we explore, clean, model, explain, deploy, and anonymize datasets. I live-vibe-coded each step with DGCA data in 35 minutes using ChatGPT. Of course, it’s the tasks that are dying, not the role. Data scientists will leverage AI, differentiate on other skills, and move on. But the highlight was an audience comment: “I’m no data scientist. I’m a domain person. I’ll tell you all this: If you don’t follow these practices, you won’t have a job with me!” ...

Things I Learned - 07 Sep 2025

This week, I learned: A quick way to get the docs for an npm package is npm view package-name readme. For PyPi, it’s curl -s https://pypi.org/pypi/package-name/json | jq -r .info.description Searching embeddings of text summaries of images improves vision search a lot. Jason Liu LLM vision capabilities are far from enough to click accurately. The AI Digest GLM supports the Anthropic API. So it’s possible to use Claude Code with GLM 4.5. z.ai gitingest has a CLI. uvx gitingest https://github.com/owner/repo fetches the code in the Git repo suitable for passing to an LLM. Claude’s API has access to a code execution tool via the code-execution-2025-08-25 beta header. It runs Python 3.11 with 1GB RAM and 5GB disk space, with Internet disabled. The containers persist for 30 days and can access uploaded files. Anthropic You can use the <script> tag in XML to render RSS, as an alternative to XSLT. Jake Archibald browser-fs-access is a ponyfill for the File System Access API and should be the go-to approach for reading and saving files via the browser. ⭐ To run a Python project directly from GitHub, use uvx --from "git+https://github.com/owner/repo.git@branch" script-name Github1s is a cool tool. Replace github.com with github1s.com to get a VS Code page that opens that repo. It’s fast and very useful to browser repos. For example, https://github1s.com/sanand0/tools-in-data-science-public is my TDS course repo. The /init command in Claude Code and Codex CLI aren’t up to the mark. I believe a good README.md provides better specs for existing repos. There is a window of opportunity to craft a good prompt to generate this from repos. #ai-coding Since LLMs can code, I’d love to see useful CI/CD pipelines where the LLM creates / edits code on the fly. LLMOps might take on a new angle - it’s not just Ops on LLM apps. It’s LLMs as part of DevOps. insertAdjacentHTML is a great API but suffers from XSS vulnerabilities. The TrustedHTML API is an emerging standard to create sanitized HTML strings. Notes from Anthropic’s How we built our multi-agent research system Multi-agent systems are like organizations that can do more than a single human. Multi-agent systems conserve the context window. The top 3 drivers of performance variance: spending more tokens, more tool calls, better models You need to teach (prompt) the orchestrator how to delegate to sub-agents How to avoid task duplication among agents How many sub-agents to spin up for different kinds of tasks Which tools to use for what Provide sub-agents objective, output format, tools/sources, clear task boundaries ⭐ Self-improving agents, e.g. prompt optimizers or tool-testing agents that run and rewrite tool descriptions, are powerful Since agents are stateful, resuming from failure is important. Agent prompts are public Claude models support interleaved thinking that lets them think between tool calls via an anthropic-beta: interleaved-thinking-2025-05-14 header. OpenAI models natively think between tool calls, preserving thinking across calls with the Reasoning API. Gemini lets you control the amount of thinking between tool calls via the thinkingBudget parameter. Anthropic auto-extracts persona vectors or traits by generating LLM responses to the same question with system prompt A (“You are evil”) and B (“You are helpful”) and subtracting the average activations. This helps monitor personality drifts during training, deployment, and even in training data. From My experience creating software with LLM coding agents - Part 2 (Tips) #ai-coding Use standards. Or, write your standards in README.md and tell AGENTS.md / CLAUDE.md to read it. Use a standard file structure. Or in README.md, list what each file is for. Helps agents pick the right file for context. Use a standard build/lint/test setup (e.g. package.json scripts). Or Localize context, i.e. add context in files that use them. E.g. add comments in test files on how to execute them. Keep files modular so agents can read less code and conserver context. Write a developer’s guide. Use with /init in Claude Code / Codex / … or have an LLM generate a developer guide. Edit manually. Agents don’t write great specs. Document the design. Write DETAILED specs to reduce deviations. Share goal while specifying tasks. Helps agents fix related stuff. Use deep reasoning mode, e.g. “think harder” or “ultrathink” in Claude Code, or -c model_reasoning_effort=high in Codex. ⭐ Run parallel agents in different windows and share agent feedback with each other. E.g. Server/API coding in one window. Client coding in another. Plan/ask in one window. Execute in another. Add debug logs to help agents spot errors. Start/stop of long/complex operations, state changes, external interfaces. Include full objects in logs. Prioritize diffs. Trim long contents. ⭐ Give access to debugger, e.g. Chrome remote debugging at localhost:9222 Agents write poor tests. So: Manually add important ones. ⭐ When you find a bug, ask the agent why the tests missed it and have it add. Review and remove useless ones. Ensure agent passes test cases. Tell them not to disable / skip failed tests. Have agents create a new branch per feature and auto-commit. Merge when successful. Feel free to provide a TODO list or update it on the fly. Interrupt with Esc if the agent’s thinking is off-track. When agents struggle, write tools to help them, e.g. JSON splicing, Excel edits, etc. Agents bloat code and features. Explicitly refactor and trim. From A Guide to Gen AI / LLM Vibecoding for Expert Programmers #ai-coding Use vibe coding for stuff you don’t need to maintain. Use vibe coding for stuff you know well enough to review quickly. Use vibe coding for independent tasks where you’re not fussed which ones fail. Vibe coding turns everyone into a team lead. That needs skills: planning, allocation, design, review, feedback, … ⭐ Empathy enables vibe-coding. Empaths allocate work by ability, review regularly, and provide detailed specs and feedback. Have LLMs plan and allocate tasks. “Read this repo. Suggest improvements.” (Review.) “Add these as issues.” “Add the top 3 Sentry log errors as issues.” “Find the easiest issue and solve it with a PR.” Use GitHub issues extensively for planning. ⭐ Create a separate GitHub account for your agent! Let it push. Assign it issues. Treat it like an intern. Ensure agent passes test cases and run till the do, or report the core difficulty. Throw away rubbish code and start again. Issues unsolved in 2-3 tries are too hard for agents or are poorly spec-ed. The context7 and Sequential Thinking MCPs are useful. The O*NET database has a list of tasks/activities, skills, titles, … for each job, at least in the US. It has been updated every few months since 2003. It’s an excellent source to analyze things like the impact of AI across jobs. Anthropic used it to map Claude.ai conversations with educator tasks to identify how educators are using AI. How educators use Claude (apart from learning) is mainly driven by automation of tedious tasks, ideation, and personalization for each student. Curriculum development: Develop games, interactive tools, MCQs, simulations, content Academic research: Bibliographies, statistical modeling, revisions from feedback. Assessments: Student feedback, scoring, summarization. Administration: recommendation letters, meeting agendas, admin tools. OpenAI used feedback from ~1000 annotators to update their model spec. Learnings: Request targeted feedback. Annotators reviewed responses pre-selected for subjectivity against a pre-selected rubric () More examples. Most improvements add examples of good and bad responses. Use detailed prompts. Newer models do well with HUGE system prompts. That’s how we frame better questions. The Great Refactor is refactoring critical open-source C code to Rust using Claude Code, since 70% of vulnerabilities are memory related and Rust is memory-safe. No repo/docs yet. #ai-coding