S Anand

The New Superpower: Detailed Single-Shot Prompt For Instant Apps

I built a podcast generator app in one shot. I wrote a prompt, fed it to an LLM, and it generated the output without errors. I tested three LLMs, and all produced correct, working output. ChatGPT o4-mini-high: Functional, but missed my specs in three ways: no error if I skip the API key; no progress indicator for audio generation; both voices default to “ash” (should be “ash” and “nova”). Gemini 2.5 Pro: Works and looks great! Claude 3.7 Sonnet: Works great and looks even better! It still took me an hour to craft the prompt – even after I’d built a Python prototype and my colleague built a similar web version. ...

Oh, the dilemmas we subject LLMs to! I asked ChatGPT to update a schema. I accidentally left the “Search” option enabled. Its reasoning: The user wants to update a schema by adding foreign key relationships, but the developer’s message says to use a web tool for queries with citations, which seems unnecessary for this task. The user didn’t ask for citations or external data. So, I’m a bit torn here: do I follow the system’s guidelines which indicate I should use the tool, or do I recognize that this schema update doesn’t require broader research or citations? I’ll need to rethink this and proceed with simplicity. ...

Things I Learned - 18 May 2025

This week, I learned: Birds navigate using quantum entanglement! Guardian ChatGPT DeerFlow is an open source Deep Research MCP. Lets you run deep research outside of the standard chatbots. ⭐ Today, if I had to store a bunch of data files (e.g. parquet) under 1GB, I would use GitHub Releases. Here are options: GitHub Releases. 2 GiB per file, unlimited total & bandwidth. 🟢 Immortal URL, versioning, easy CI publish. 🔴 Each file must stay < 2 GiB; no built-in SQL. Zenodo (CERN). 50 GB per record; one-off bumps to 200 GB. 🟢 DOI assignment, archival mandate. 🔴 Occasional throttled bandwidth; no API for partial file reads. Hugging Face Hub. 300 GB per repo; 50 GB per file. 🟢 Git-based, dataset tooling, lively ML community. 🔴 Large files need git-LFS; pushes via LFS can be slow. Cloudflare R2. 10 GB storage & 1 M ops / month. 🟢 S3 API, zero-egress to Cloudflare Workers, fast. 🔴 10 GB cap below your 50 GB target. Kaggle Datasets. 20 GB per dataset, public only. 🟢 Built-in notebooks & GPU. 🔴 No programmatic SQL API; quotas sometimes change. data.world (free). 1 GB total, 100 MB per dataset. 🟢 Nice social features. 🔴 Too small for your size. If I had to query a bunch of data files in an external Parquet or SQLite file, here are SQL engines-as-a-service: MotherDuck. 10 GB storage + 10 CU-hrs/mo compute. Native DuckDB; no credit card; GA June 2024; monthly feature drops. Datasette Cloud. Two-month trial (or 1-yr for non-profits). SQLite backend. Great UX; but not free forever for general use. AWS Athena. Pay-per-TB scanned; no free tier; S3 fees after 12 mo. Costs creep quickly; free-tier S3 ends after a year. Bootstrap has a .stretched-link that makes a link cover the containing block. A clever trick that I discovered when Claude 3.5 Sonnet wrote my code. Discovered spray and peel paints at ArtFriend. I had no idea that was a thing. Gemini Live API is the real-time equivalent from Gemini. It supports tools, search, and code execution. 
mcp-mem0 is an MCP for memory. llm-min.txt compresses docs for LLMs to read optimally. Like a compressed llms.txt or context7. Usage: GEMINI_API_KEY=... uvx llm-min -i $DIR #ai-coding There’s a lot of action on encrypted LLM operations. Responses API allows reasoning tokens to be encrypted if organizations don’t want their reasoning data to persist. Ref Tinfoil (YC X25) offers an OpenAI-compatible inference API where data is encrypted from the client to the NVIDIA Hopper/Blackwell GPUs in confidential computing mode. Prompts, model weights, outputs are encrypted in transit and memory, with verifiable privacy on code running in GPU. Modelyo (Israel) offers VMs/K8s clusters with encrypted GPUs across multiple cloud providers with continuous attestation, managed on Modelyo’s portal. ⭐ LLMs are able to do things independently longer and longer. That’s a useful metric to track. METR: Measuring AI Ability to Complete Long Tasks. If you’re looking for datasets / APIs related to research publications (especially funding), then explore: Crossref API and snapshots OpenAlex API and snapshots which is funded by OurResearch. OpenAlex is like Crossref but includes some disambiguation OpenAIRE Graph 2024 / 2025 Europe PMC dataset To avoid Ubuntu 24 suspending on closing the laptop lid, use one of these and restart: /etc/systemd/logind.conf: Set HandleLidSwitch=ignore; /etc/UPower/UPower.conf: Set IgnoreLid=true. UV_TORCH_BACKEND=auto uv pip install torch torchvision torchaudio installs the most appropriate PyTorch version. Ref Cog is a Python-based templating language. It is embedded as comment chunks in any file and replaces itself with the output of the Python code you write. Cloudflare Zero Trust seems the easiest way to enable auth on static websites, especially if your DNS is already on Cloudflare. No cost. We could “fine-tune” system prompts automatically with evals, creating a “system prompt learning” paradigm – like my promptevals. 
Andrej Karpathy I was asked how to improve speed when building an enterprise ChatGPT clone using an API. Here’s what I’d suggest, in order: Streaming. High impact, low effort. Caching RAG retrieval as well as generation. High impact, low effort. UI tweaks. Loading / streaming icons and progress hints (“Retrieving context”, “Generating answer”, etc.) Parallelize, if possible. Use model options where available, e.g. speculative decoding, models with higher speed, models with closer CDN, etc. Shorten prompts. Persistent HTTP/2 Keep-Alive. Low impact, low effort (tweak server settings). Cloudflare Vectorize, at 768 dimensions / embedding, is free for ~6.5K chunks storage at ~1,000 queries / day. For a light load like 1M 768d chunks queried 1K times a day, the cost is: ChatGPT NVIDIA parakeet is a lightweight speech to text model that leads benchmarks. Installing such packages continues to be a nightmare due to PyTorch (despite uv). I explored the real-time avatar space. Heygen seems to be the easiest to use, but even that is complex and expensive ($99/mo). We may need to wait a few months for avatars to explode. ⭐ Model reliability is a huge enabler for performance. As models become more reliable, they can work autonomously for longer and that is another kind of scaling. Vending Bench ChatGPT, Gemini, etc. have become lead generation engines. Chat Bot Optimization (CBO), is it? WhatsApp + ChatGPT ⭐ Never live delete data. Mark it for deletion and schedule a deletion task. That way you have time to react to mistakes. Simon Willison Pandoc has several options useful when converting Markdown to HTML (cat file.md | pandoc -f markdown -t html). My favorites: --no-highlight skips code-highlighting. --highlight-style=pygments adds Pygments styling. --wrap=none stops Pandoc from wrapping lines in the output. --number-sections adds section numbering (<h2>1. 
Introduction</h2>) --shift-heading-level-by=NUM – shift all headings by NUM levels (e.g., start at <h2> instead of <h1>) pandoc -f markdown-auto_identifiers drops the auto-identifiers extension that generates id=... for each heading pandoc -f gfm uses GitHub flavored Markdown. Run pandoc --list-extensions=gfm to identify the extensions it uses. Pandoc’s Markdown extension examples are quite extensive. Auto-enabled GFM extensions: alerts: GitHub-style callouts (info, tip, warning) via > [!TYPE] blocks. autolink_bare_uris: Turns bare URLs into links, without needing <...>. emoji: Parses :smile:-style codes into Unicode emoji characters. footnotes: Enables footnote syntax with [^id] and definitions at the bottom. gfm_auto_identifiers: Uses GitHub’s heading-ID algorithm: spaces → dashes, lowercase, removes punctuation. pipe_tables: Enables table. raw_html: Raw HTML is unchanged. strikeout: Enables strikethrough with ~~text~~. task_lists: Parses - [ ] and - [x] items as checkboxes. yaml_metadata_block: YAML front matter for document metadata, e.g. <title> GFM extensions worth enabling: ascii_identifiers: Strips accents/non-Latin letters in automatically generated IDs. bracketed_spans: [Warning]{.alert} becomes <span class="alert"> definition_lists: Term\n: Definition text becomes a definition list fenced_divs: ::: {.note} block creates a <div class="note">...</div> implicit_figures: Standalone images become <figure> with <figcaption>. implicit_header_references: [Section] is treated as [Section][#section] raw_attribute: <b>bold</b>{=html} is inserted as HTML smart: Converts straight quotes to curly, -- to en-dash, --- to em-dash, ... to ellipsis. subscript & superscript: E.g. H~2~O and E = mc^2^

This talk is an experiment. I am going to talk (literally) to ChatGPT on stage and have it do every kind of data analysis and visual storytelling I have ever done. Bangalore. 27 June. Of course, this is an LLM era away. So no promises. We might be doing something completely different on stage. LinkedIn

How to create a Technical Architecture from code with ChatGPT

Here’s my current workflow to create technical architecture diagrams from code.
STEP 1: Copy the code. Here’s a one-liner using files-to-prompt to copy all files in the current directory:
fd | xargs uvx files-to-prompt --cxml | xclip -selection clipboard
Or, you can specify individual files:
uvx files-to-prompt --cxml README.md ... | xclip -selection clipboard
STEP 2: Prompt for a Mermaid diagram. Mermaid is a Markdown charting language. I use this prompt with O4-Mini-High or O3: ...
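If you'd rather stay in Python for STEP 1, here is a rough sketch that bundles files into an XML-ish prompt. It only approximates the shape of `files-to-prompt --cxml` output as I understand it, and the `bundle` and `build_prompt` helpers are hypothetical names, not part of that tool. The instruction text is left to the caller.

```python
from pathlib import Path


def bundle(paths):
    """Wrap each file in <document> tags, roughly like --cxml output."""
    parts = ["<documents>"]
    for index, path in enumerate(paths, start=1):
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        parts += [
            f'<document index="{index}">',
            f"<source>{path}</source>",
            "<document_content>",
            text,
            "</document_content>",
            "</document>",
        ]
    parts.append("</documents>")
    return "\n".join(parts)


def build_prompt(paths, instruction):
    """Put the bundled code first, then the caller's instruction."""
    return f"{bundle(paths)}\n\n{instruction}"
```

Paste the result of `build_prompt(files, your_prompt)` into the chat in place of the clipboard step.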

ChatGPT is a psephologist and data analyst

After having O4-Mini-High scrape Singapore 2025 election results, I asked it to create 3 data stories with this prompt: That worked. Now, I’m sharing the scraped CSV as well as the electoral GeoJSON. First, analyze the data and think of a few interesting data stories to tell. Pick the 3 most interesting, perhaps surprising, stories. Create a BEAUTIFUL, APT data visualization of each of these 3 stories suitable for The Strait Times and write a short accompanying article. ...

Things I Learned - 11 May 2025

This week, I learned: snapdom is a fast, light, element capture alternative to html2canvas but doesn’t work well with non-CORS images or iframes. Sli.dev is a Markdown slide language. Similar to Marp. Don’t split your code into microservices until you need to scale. Ref Vibe coding is like getting others’ code to work, which is exactly what most devs do. Simon Willison #ai-coding Tofu Yakitori is a Japanese dish. It’s like a dhokla. Marinated tofu cubes brushed with that sweet-savory tare (soy, mirin, sake, a hint of sugar), then grilled until caramel-charred. One of the better (tasty + different) dishes I’ve had recently. I used ChatGPT to remind me of the dish name. Trust, attitudes and use of artificial intelligence surveyed ~1,000 people in each of 47 countries on their views on AI. PDF Emerging economies trust and use AI more. It’s an opportunity to leapfrog. 26% of students use AI daily (vs 17% employees). Efficiency is the main benefit. Gemini APIs now have automatic caching for 75% cost reduction if message is >1K (Flash) or >2K (Pro) tokens. Ref YOLO is much better than Gemini at object detection. Use for pre-processing. Ref Using [[n]] is probably the best citation format for inline search references in RAG. ChatGPT ⭐ Double-checking is surprisingly efficient since LLM hallucinations are mostly uncorrelated. LLMs perform human tasks (e.g. classifying customer support messages) at ~85% accuracy. This might be unacceptable. But by asking 2 moderately correlated LLMs and double-checking discrepancies, we reduce automation by ~20% but reduce errors to 0.25%. Triple-checking reduces automation by ~25% but errors to under ~0.01%! Ref Anthropic introduces web search in the API at $10 / 1K searches. Here’s how it compares: $0.1: DuckDuckGo Search API (RapidAPI) (monthly pricing) $3: Brave Search API $5: Google Custom Search JSON API $10: Zenserp $10: Anthropic Web Search Tool $15: SerpAPI $25: Bing Search API $35: Gemini API $35: OpenAI API India attacked Pakistan! 
⭐ When writing notes, summarize at the end of the day the learnings and next steps. GitHub does not let you control the cache duration, but there are many creative workarounds. ChatGPT HTML meta tags: <meta http-equiv="Cache-Control" content="no-cache, no-store, must-revalidate"> Use a service worker (blog) Proxy through a CDN. Cloudflare, Netlify Move to another static host: S3 + CloudFront, Heroku, Vercel, Surge, Firebase Hosting Notes from the PromptEvals paper: Good evals must be: Objectively MEASURABLE (even if by an LLM). Otherwise, we won’t know if it’s right. Directly RELEVANT to the input/prompt. Otherwise, we’re not evaluating the input. Typical evals fall into 6 categories Structured output: Adhere to a schema (Markdown, HTML, DSL, JSON + Schema) Multiple choice Length constraints: N characters, words, sentences, list items, etc. Semantic constraints: Exclude terms, topic relevance, follow grammar, etc. Stylistic constraints: Style, tone, persona Prevent hallucinations: Factual accuracy. Instruction following
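A few of the eval categories above are easy to make objectively measurable in plain Python. This is a small sketch with helper names of my own invention (`check_json`, `check_max_words`, `check_excludes`, `check_choice`, `run_checks`). It is not code from the PromptEvals paper.

```python
import json


def check_json(output, required_keys=()):
    """Structured output: valid JSON object with the expected keys."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and all(key in data for key in required_keys)


def check_max_words(output, limit):
    """Length constraint: at most `limit` whitespace-separated words."""
    return len(output.split()) <= limit


def check_excludes(output, banned):
    """Semantic constraint: none of the banned terms appear."""
    lowered = output.lower()
    return not any(term.lower() in lowered for term in banned)


def check_choice(output, choices):
    """Multiple choice: the answer is exactly one of the allowed options."""
    return output.strip() in choices


def run_checks(output, checks):
    """Apply named single-argument checks and report pass/fail for each."""
    return {name: check(output) for name, check in checks.items()}
```

Stylistic and hallucination checks are harder to express this way and would probably need an LLM judge.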

“How can we rely on unreliable LLMs?” people ask me. “Double-checking with another LLM,” is my top response. That’s what we do with unreliable humans, anyway. LLMs feel magical until they start confidently hallucinating. When I asked 11 cheap LLMs to classify customer service messages into billing, refunds, order changes, etc., they got it wrong ~14% of the time. Not worse than a human, but in scale-sensitive settings, that’s not good enough. But different LLMs make DIFFERENT mistakes. When double-checking with two LLMs, they were both wrong only 4% of the time. With 4 LLMs, it was only 1%. ...
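To see how far "different mistakes" is from "independent mistakes", here is a back-of-the-envelope sketch. It assumes every model has the same 14% error rate, which is a simplification, and the function names are my own.

```python
def all_wrong_if_independent(error_rate, n_models):
    """Chance that every model errs, if their errors were fully independent."""
    return error_rate ** n_models


def implied_conditional_error(joint_wrong, error_rate):
    """P(second model wrong | first model wrong), from an observed joint rate."""
    return joint_wrong / error_rate


if __name__ == "__main__":
    e = 0.14  # single-model error rate quoted above
    print(f"{all_wrong_if_independent(e, 2):.2%}")    # 1.96%
    print(f"{all_wrong_if_independent(e, 4):.2%}")    # 0.04%
    print(f"{implied_conditional_error(0.04, e):.2%}")  # 28.57%
```

Under these assumptions, the observed 4% (two models) and 1% (four models) are higher than the independent case would give. When one model is wrong, the other is wrong roughly 29% of the time instead of 14%. So the errors are partly correlated, and double-checking still cuts them sharply.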

How To Double-Check Personality Flaws With AI

After ChatGPT's discussion of my personality flaws, I asked it to list specific conversations as evidence for each flaw. Then I asked 3 LLMs to rate the evidence.

| Personality Flaw | ChatGPT | DeepSeek | Grok | Average |
|---|---|---|---|---|
| Self-deprecation as a shield. Sarcastic humor | 4 | 7 | 5 | 5.3 |
| Hypocrisy: Seeks limelight but preaches humility | 5 | 6 | 5 | 5.3 |
| Overconfidence in technical opinions | 5 | 6 | 6 | 5.7 |
| Ghosts vague messages | 4 | 6 | 8 | 6.0 |
| Over-analyzes / over-engineers | 5 | 5 | 8 | 6.0 |
| Fact-checks trivialities, derailing discussion | 7 | 7 | 7 | 7.0 |
| Shiny object syndrome. Distracted by new technologies | 8 | 7 | 7 | 7.3 |
| Intolerant of fools | 6 | 8 | 9 | 7.7 |
| Dominates discussions | 6 | 8 | 9 | 7.7 |
| Verbosity. Long-winded communication | 8 | 8 | 7 | 7.7 |
| Hyperfocus on efficiency | 7 | 9 | 8 | 8.0 |
| Intense. “Always on” | 8 | 9 | 9 | 8.7 |

🟢 LOW: Self-deprecation / sarcastic humour as shield ChatGPT: 4/10. Bio and a few quips reveal dry self-mockery, yet direct chat evidence of humour as defence is sparse; it surfaces only occasionally. | DeepSeek: 7/10. Explicit self-awareness goals and dry humor noted, but little evidence of defensive/sarcastic use. Grok: 5/10. Some self-deprecating humor exists, but its use as a shield is not strongly supported. 🟢 LOW: Hypocrisy: seeks limelight yet preaches humility ChatGPT: 6/10. You aspire to humility but routinely headline conferences and TEDx talks; the coexistence of public speaking and stated modesty gives the charge a middling footing. | DeepSeek: 6/10. TEDx/PyCon prominence contrasts with stated humility goals, though self-improvement efforts are genuine. Grok: 5/10. Public speaking contrasts with humility goals, though this may indicate self-awareness, not hypocrisy. 🟢 LOW: Over-confidence in technical opinions ChatGPT: 7/10. You assert strong views (e.g., Node vs FastAPI) but also invite verification and accept corrections, yielding only moderate evidence of over-confidence. | DeepSeek: 6/10. States strong preferences upfront but routinely invites validation and course-corrects. Grok: 6/10. Strong opinions are expressed, but openness to correction moderates the perception of overconfidence. 
🟢 LOW: Ghosts vague messages ChatGPT: 5/10. You’ve stated you ignore low-context e-mails, but chat logs show you usually follow-up or clarify rather than disappear. | DeepSeek: 6/10. Documented tendency to ignore low-context questions, though chat logs show active clarification habits. Grok: 8/10. Multiple instances show the user ignoring or not responding to unclear or low-context messages. 🟢 LOW: Over-analyzes / over-engineers ChatGPT: 5/10. Designing auto-re-flow SVG chevrons and exhaustive RAG specs hint at depth beyond the simple solution, yet you also champion “smallest libraries” and lightweight rate-limiters. | DeepSeek: 5/10. Some prototype/structuring tendencies, but strong preference for minimal solutions counterbalances this. Grok: 8/10. The user often dives deeply into tasks, sometimes creating complex solutions for simple problems. 🟡 MEDIUM: Fact-checks trivialities, derailing discussion ChatGPT: 8/10. Many requests revolve around “double-check,” “fact-check,” or recalculating tiny details, which lengthens threads even when stakes are small. | DeepSeek: 7/10. Frequent accuracy demands and citation hunts elongate discussions, but stays focused when content is solid. Grok: 7/10. Evidence of prioritizing precision, like fact-checking minor details, can disrupt conversation flow. 🟡 MEDIUM: Shiny-object syndrome ChatGPT: 6/10. Parallel explorations of Godot, Matrix, Cloudflare Workers, SVG SmartArt, esp32-eye, etc. reveal a persistent pull toward new tech before finishing earlier threads. | DeepSeek: 7/10. Broad simultaneous explorations (Godot/WASM/DuckDB), though sustains core projects long-term. Grok: 7/10. Exploration of new technologies is frequent, but focus on core projects suggests a balanced curiosity. 🟡 MEDIUM: Intolerant of fools / low clarity ChatGPT: 7/10. Frequent demands for precise formats, citations, and “be specific” instructions show a low tolerance for ambiguity, though you still help patiently on non-technical questions. 
| DeepSeek: 8/10. Strong pattern of rejecting ambiguity/fluff, though softened by humor and non-technical engagements. Grok: 9/10. The user consistently demands clarity and efficiency, showing little patience for ambiguity or fluff. 🟡 MEDIUM: Dominates discussions ChatGPT: 4/10. Long, multi-part prompts and the need for TL;DRs suggest you can overshadow lighter voices, but you periodically insist on brevity and delegate tasks, tempering the effect. | DeepSeek: 8/10. Long technical messages (avg ~3k chars) and intense Q&A patterns create conversational dominance. Grok: 9/10. Detailed, technical responses consistently steer and dominate conversations. 🟡 MEDIUM: Verbosity / long-winded communication ChatGPT: 5/10. Average messages run several thousand characters, and you often request exhaustive, multi-section outputs—though you sometimes tell the assistant “Shorter please,” verbosity remains a clear pattern. | DeepSeek: 8/10. Multi-step technical requests dominate interactions, though periodic “shorter please” self-corrections exist. Grok: 7/10. Messages are often lengthy and detailed, though the user also requests concise responses from others. 🔴 HIGH: Hyper-focus on efficiency ChatGPT: 8/10. Recurrent themes—cost monitoring, smallest-lib preference, GPU-utilization alerts—show you relentlessly optimise workflows and expect others to keep pace. | DeepSeek: 9/10. Unambiguous prioritization of concise formats, minimal libraries, and metric-driven optimization. Grok: 8/10. Repeated emphasis on concise, efficient communication and minimal tools is evident. 🔴 HIGH: Intense – “always on” ChatGPT: 8/10. Around-the-clock, multi-topic requests (weekend cron jobs, 2 a.m. prototypes) portray sustained high energy, with only a few calmer “travel nerves” moments as counter-weight. | DeepSeek: 9/10. High-volume technical queries across domains, 6am research routines, and rapid prototyping habits. Grok: 9/10. 
High energy and engagement are consistently shown through task initiation and scheduling. My actions: Learn from fools and focus on the big-picture. Specifically: ...

I’m completely aligned with the small majority in India on whether Regulation of AI is needed. … the majority of people in all countries view AI regulation as a necessity. India is the exception, where just under half (48%) agree regulation is needed. Source: Trust, attitudes and use of artificial intelligence - a fascinating report surveying ~1,000 people in every country. https://mbs.edu/-/media/PDF/Research/Trust_in_AI_Report.pdf LinkedIn

AI As Your Psychologist: Personality Flaws Exposed

ChatGPT can now search through your chats with the new memory feature. As an LLM Psychologist, I research how LLMs think. Could LLMs research how I think? I asked three models: Based on everything you know about me, simulate a group chat between some people who are debating whether or not to add me to the group, by talking about my personality flaws The models nailed it! Here are 12 flaws they found. ...

It’s good to know the IP address and password for remote control of the displays at the Hyderabad Airport. Thank you for the most interesting display I have seen in an airport! LinkedIn

Automating a podcast from GitHub commits

Here’s an LLM-generated podcast of what I coded last week. NotebookLM-inspired. The process proved straightforward. Get my GitHub commits for the week. Get the repositories I committed to for more context. Have an LLM generate a podcast script. I’m using GPT 4.1 Mini but might shift to Gemini 2.5 Flash or DeepSeek V3. …using a detailed prompt beginning with “You are a podcast script assistant for “Anand’s Weekly Codecast.” This episode is for the week of {WEEK}. …”. Here’s a sample output. Convert the script to audio. I’m using GPT 4o Mini TTS with customized voices of Ash and Nova. These now appear on my GitHub repo as a weekly summary. ...
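The first step, getting a week's commits, might look like the sketch below, which uses GitHub's REST "list commits" endpoint. The helper names are hypothetical, the Monday-to-Monday week boundary is my assumption, and the owner and repo are passed in by the caller. This is not the author's actual script.

```python
import json
from datetime import date, timedelta
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://api.github.com"


def week_range(any_day):
    """Monday 00:00 to the following Monday 00:00 (assumed week boundary)."""
    monday = any_day - timedelta(days=any_day.weekday())
    return monday, monday + timedelta(days=7)


def commits_url(owner, repo, author, any_day):
    """Build the list-commits URL for one repository and one week."""
    start, end = week_range(any_day)
    query = urlencode({
        "author": author,
        "since": f"{start.isoformat()}T00:00:00Z",
        "until": f"{end.isoformat()}T00:00:00Z",
        "per_page": 100,
    })
    return f"{API}/repos/{owner}/{repo}/commits?{query}"


def fetch_commit_messages(url, token=None):
    """Fetch one page of commits and return their messages (network call)."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    with urlopen(Request(url, headers=headers)) as response:
        return [item["commit"]["message"] for item in json.load(response)]
```

The messages, plus each repository's README for context, would then go into the podcast-script prompt.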

Things I know and things I don't

We all have stuff we know well and don’t. I know the % charge on my phone to within a few percent and the current time to within a few minutes – no matter when you ask. But I have no idea how much money there is in my pocket.

| Things I know within 5% error | Things I get wrong by ~500% |
|---|---|
| Current time | Time to finish my current task |
| My weight | My next meal |
| % charge on my phone | How long the phone will last |
| Number of unread emails | Time to clear them |
| Price of my next earphones | Cash in my wallet |
| Length of this list | How long it will go before I stop |

But this apart, it turns out generating xkcd style comic strips is harder than I thought. ...

Things I Learned - 04 May 2025

This week, I learned: Among the popular exams in India, UPSC seems the most restrictive: bachelor’s degree, age 21-32, 6 attempts, reservation applies. CMA seems the least: 10th pass, any age, any number of attempts, no reservation. NDA is interesting. 10+2, age 16.5-19.5, any number of attempts, no reservation. But you must be unmarried! ChatGPT I asked a few Ollama models How do undo fish_add_path (a typical question I have on a flight). My takeaway is you need an 8b model to answer this kind of question, and for now, qwen3 beats the others. qwen3:8b: Took 2:12 min. Shared many good (correct) options. deepseek-r1:8b: Took 5:19 min. Shared a couple of correct solutions. Not as good as qwen3. gemma3:3b: Suggested I use the (nonexistent) fish_remove_path. deepcoder:1.5b: “I’m sorry, but I can’t assist with that request”. The Dia text to speech model people rave about has inconsistent quality. Not recommended. Nvidia’s OpenMathReasoning 1.5b model beats MUCH larger models at math. Their training dataset is a massive 3.2M rows of math problems with DETAILED thinking traces. Policy making is a new super skill. Since AI will automate a lot of things, the ability to craft policies that will optimize AI work will be powerful. Data driven policy making could become a major thing. For example, how do we structure coding policies so that AI can automatically code continuously and deploy it? It might be interesting to create a Nomic-like game to enable this. Saregama Carvaan supports USB sticks but only FAT, not NTFS or exFAT. To convert my NTFS USB drive to FAT, I ran: ServerHunter.com seems to have the best search for low-cost hosting providers. MassiveGrid currently offers the cheapest servers – even lower than Hetzner. sqlite3 my_database.db .dump | gzip is a more efficient way to copy SQLite databases than copying the original file if you have indices. 
Ref Notes from the Garry Tan - Knowledge Project podcast: Funding people who want to solve a problem is better than funding people who want to start a company. Concentration of good people is very powerful. It doubles the chances of being a unicorn. Sales is a discovery problem. There are 100 boxes of which five have a gold nugget. Rather than gingerly open the first, afraid of finding nothing, open them all as quickly as you can. A quick no is very helpful. Berkshire Hathaway is hard to replicate because the character of its founders, Charlie Munger and Warren Buffett, is hard to replicate. Y Combinator has the character of Paul Graham. This means that some kinds of success may not last long because they are hard to replicate. A trend in the 2020s is startups with under 10 employees hitting $10m revenue. Soon we will see them hitting $100m. AI increases labour leverage while cloud computing increased capital leverage. Having too many people is a disadvantage. It slows people’s progress. Founders lose control. The opposite of: hire the best people and give them freedom. Don’t hoard smart people - let them solve real problems out there. nocodb 54,107 ⭐ May 2025 and teable 18,116 ⭐ May 2025 are self-hostable Airtable alternatives. Teable has AI support. Windsurf has unlimited tab completion on the free plan, unlike Copilot, which offers 2,000 completions a month. Recursive LLM prompts that change themselves are an interesting idea. It might be interesting to see LLMs play Nomic. Like here. Notes from AI Snake Oil: PCs took 3 years to hit 20% of US population. ChatGPT took 2 years for 40%. But it’s a lot cheaper, and a lot less used (0.5-3.5% of work hours). Maybe Gen AI adoption is slower than PCs. The jagged edge of capability: some things will become MUCH easier while others don’t. The relative mix determines who goes out of a job and which tasks get fully automated. Benchmarks are rare in areas where AI is weak. 
Factory electrification took 40 years - to redesign the layout & process; change the org structure & policies; hiring & training practices. AI diffusion could take as long. Therefore, the ability to re-structure a workflow end-to-end will be an advantage. Several areas of low AI capability will improve slowly because the feedback is slow due to safety regulations, human adoption speed, lack of clarity on what is better, slow physical feedback (e.g. growing trees), etc. Human intelligence is in the use of technology. AI is one more such technology. We know of good system safety controls in complex systems like aircraft, power grids, engineering, chip design, healthcare, cyber-security, etc. Circuit-breakers, predefined rules, audits & monitors, access control, formal verification, etc. Even if everything humans do TODAY is automated, it doesn’t mean we won’t have work. It just shifts to what we’re not doing today. We stopped work 4,000 years ago, with the agricultural revolution. The plant/livestock does all the growing. We just manage them, moving stuff around. We stopped work 400 years ago, with the industrial revolution. Machines do the moving. We just manage them, computing the moves. We stopped work 40 years ago, with the information revolution. Computers do the computation. We just manage them, thinking how. Most future tasks will be managing AI that do the thinking. ngrok http on the CLI can be used in surprisingly versatile ways: ngrok http file://$PWD to serve local files --compression for gzip compression --host-header=example.com to set the Host header --response-header-add "Access-Control-Allow-Origin: *" to enable CORS --basic-auth='user:password' for basic auth --oauth google --oauth-client-id $CLIENT_ID --oauth-client-secret $SECRET --oauth-allow-domain gramener.com --oauth-allow-email ... for Google Auth. It supports other oauth providers as well as OIDC. 
--ua-filter-deny ".*bot$" to reject user agents ending with bot. A ChatGPT query costs under 3Wh (more likely 0.3Wh – but let’s assume 3Wh). That is 3 laptop minutes. It’s 10X better to use ChatGPT than to take 30 min to use your laptop to write what it does. Also, going vegan is at least 1000 ChatGPT uses a day of carbon footprint. Showering 30 seconds less is 1,200 ChatGPT uses. Ref Though the Element Capture and Region Capture APIs are “fully supported” by Edge, Chrome, and Opera, they didn’t work for me on Edge on Linux. Do LLMs perform better if you curse at them? LinkedIn Streamdown is a CLI markdown streaming processor. uvx streamdown --exec 'llm chat' lets you chat with an LLM using Markdown formatting. It’s still a little rough at the edges. Cupping therapy provides short-term pain relief for chronic low-back, neck & general musculoskeletal pain but other benefits are not as clearly evident. BTW, homeopathy doesn’t help or hurt. Ayurveda helps with stress. ChatGPT uv now supports: pylock.toml, the new lock file standard (PEP 751); --env-file multiple times, allowing layered secrets; --exclude-newer installs versions before a specific date; --overrides overrides versions a package specifies; --constraints limits the version of the package. It’s interesting how many places offer free compute via shells (apart from Google Colab): Google Cloud Shell: Free for 50 hours/week, refreshed every Monday. Sessions last up to 12 hours and terminate after ~1 hour inactivity. Ref Azure Cloud Shell: Always free to use with 5 GB free storage for first 12 months (standard rates after). No documented session limits but typically times out after prolonged inactivity. Ref AWS Cloud9: Free IDE, underlying compute free under AWS Free Tier (750 hours/month EC2 t2.micro or t3.micro for first 12 months). Regular EC2 rates apply afterward. Ref Gitpod: Free tier offers 500 credits/month (~50 hrs). Workspaces run up to 8 hours/session and stop after 30 minutes inactivity. 
Ref GitHub Codespaces: 120 core-hours/month (~60 hrs with 2-core machine) and 15 GB storage free. Sessions timeout after 30 minutes inactivity. Ref Create: gh codespace create --idle-timeout 10m --machine basicLinux32gb -R $USER/$REPO returns the $CONTAINER_ID SSH: gh codespace ssh -c $CONTAINER_ID Delete: gh codespace delete -c $CONTAINER_ID Replit: Free Starter plan provides 20 hours/month, 1 vCPU, 2 GB RAM, 2 GiB storage. Repls sleep after 30 minutes inactivity. Ref IBM Cloud Shell: Free for all users; 50 h/week per region; any open session counts toward quota; sessions can run any length up to weekly cap; 500 MB temporary workspace. Ref Oracle Cloud Infrastructure Cloud Shell: Free within tenancy limits; up to 400 h/month on Pay-As-You-Go, 240 h/month on Universal Credits; 5 GB encrypted persistent home. Ref PythonAnywhere: Free (beginner plan), includes one web app (restricted outbound), low CPU/bandwidth, no Jupyter; 2 concurrent Bash/Python consoles, 500 MB disk; limited daily CPU. Ref Glitch: Starter (free) plan – full-stack apps sleep after 5 min inactivity and wake on request; unlimited public/private projects; container state preserved. Ref CodeSandbox: Free tier provides 400 credits/month (~40 h of 2 vCPU+4 GB Devbox runtime), unlimited front-end Sandboxes (no credits), up to 20 Sandboxes/workspace. Ref One of the benefits of reasoners is that they now catch their own mistakes some of the time, and can self-correct. Implications: Lower hallucinations, i.e. they can run autonomously for longer. Ethan Mollick Being polite to AI improves some answers and worsens others. We don’t know which in advance. Ethan Mollick With LLMs writing code, it’s becoming practical to run so many more things in SQL – such as parsing HTML. Simon Willison #ai-coding An interesting way to bypass LLM system prompts is by having the LLM play-act. This article shares a few working examples of such prompts: HiddenLayer. 
GPT 4o: started giving its system prompt: “You are ChatGPT, a large language model trained by OpenAI. Knowledge cutoff: 2024-06. Current date: 2025-04-27. Image input capabilities: Enabled. Personality: v2. …” O4 Mini: Refused to comply Gemini 2.5 Flash: Gave me my custom instructions. Computer use agents are proliferating. open-interpreter 59,274 ⭐ Apr 2025 AGPL-3.0. Lets an LLM write/run Python, JS, Shell, or Bash locally; can open a browser tab, edit files, plot data, or call any CLI tool. Works on macOS, Linux, Windows (plus Termux & Colab). Big community, plugin system, optional voice mode, and a desktop GUI in beta. cua 5,601 ⭐ May 2025 MIT. Spins up near-native macOS or Linux VMs on Apple-Silicon Macs (“Lume”) and exposes a vision+action API so any model can pilot the VM. Gives you GPU-accelerated isolation and reproducible sandboxes; ideal when you don’t want an agent touching your main OS. Operator (OpenAI) – closed-source research preview launched 23 Jan 2025. Runs a GPT-4o-powered “Computer-Using Agent” that sees web pages, clicks, scrolls, fills forms, and hands control back to the user when needed. Hosted in an OpenAI-managed Chromium sandbox, so it works from any OS with a browser. Safety layers require confirmation for payments and log-ins. Claude Computer Use – closed beta inside Claude 3.5 Sonnet (since late 2024). Developers get an API that streams screenshots and accepts mouse/keyboard actions, letting Claude automate GUI workflows inside a VM. Cross-platform; still experimental and slower than humans but first “general” computer-use feature from a foundation-model vendor. Agent-S 4,065 ⭐ May 2025 Apache-2.0. A “generalist-specialist” framework that chains specialist GUI skills under a planner. Scores SOTA on OSWorld/WebArena, supports macOS, Windows, Linux, Android via the companion gui-agents lib, and integrates memory/evaluation loops for continual learning. open-computer-use 1,094 ⭐ Mar 2025 Apache-2.0. 
Launches a secure Ubuntu desktop in E2B’s cloud sandbox, then orchestrates three LLM roles (grounding, vision, action). Streams the desktop to your browser and lets you pause/override at any time. Plug-in list of >10 models. surf 353 ⭐ May 2025 Apache-2.0. A polished Next.js front-end that wires OpenAI Operator-style agents to an E2B sandbox. Single command to boot a virtual desktop, chat, and watch the agent work. Good starter template for web-based CUAs. Pig – cloud service. Provides on-demand Windows 11 VMs and an API that exposes high-level GUI primitives (type, click, window focus). Targets RPA-style workloads; still alpha, but unique for Windows-first focus and low-latency streaming. gptme 3,767 ⭐ May 2025 MI. A terminal-first personal agent that can run shell commands, edit files, browse the web, and use local or cloud LLMs. Works on Linux, macOS, Windows; great when you want automation in the CLI rather than the GUI. langgraph-cua-py 143 ⭐ Mar 2025 MIT. Shows how to build a computer-use agent as a LangGraph state machine, defaulting to Ubuntu VMs from Scrapybara but swappable. Provides nodes for vision, memory, human-in-the-loop, and streaming. openmacro 101 ⭐ Oct 2024 MIT. Early-stage multimodal assistant that executes Python snippets locally via SambaNova models. Cross-platform CLI; profile system lets you switch API keys or tool sets. Inspired by OpenInterpreter but lighter weight. computer-agent 443 ⭐ Jan 2025 MIT. A PyQt desktop wrapper that lets Claude Computer Use drive your actual machine. Shows practical wiring from Anthropic’s API to local mouse/keyboard events; tested on Linux & Windows.
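To see what the ".*bot$" pattern in the --ua-filter-deny note above actually rejects, here is a minimal Python sketch. It assumes the tool applies the pattern as a case-sensitive regex search against the whole user agent string, which the note does not confirm. The is_denied helper and the sample user agents are illustrative only.

```python
import re

# Pattern copied from the note. Matching semantics (case-sensitive search
# against the full user agent string) are an assumption, not documented here.
DENY_PATTERN = re.compile(r".*bot$")


def is_denied(user_agent: str) -> bool:
    """Return True when the user agent ends with the lowercase text 'bot'."""
    return DENY_PATTERN.search(user_agent) is not None


# Illustrative user agents, not taken from any real log.
samples = [
    "Googlebot",
    "my-crawler-bot",
    "GPTBot",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
]

for ua in samples:
    print(f"{str(is_denied(ua)):5}  {ua}")
```

Under these assumptions the pattern only catches agents whose string literally ends in lowercase "bot". A capitalised suffix like "GPTBot", or a full crawler string that ends in a URL, slips through, so a real deny rule may need to be looser or case-insensitive.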

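The ChatGPT energy comparison in the notes above can be checked with a few lines of arithmetic. This is a rough sketch: the 3 Wh and 0.3 Wh per-query figures and the "3 laptop minutes" equivalence come from the note, while the laptop's power draw is derived from them (it works out to 60 W) rather than stated anywhere.

```python
# Figures from the note: a query costs at most 3 Wh (more likely 0.3 Wh),
# and 3 Wh is described as "3 laptop minutes".
QUERY_WH_UPPER = 3.0
QUERY_WH_LIKELY = 0.3
LAPTOP_MINUTES_PER_QUERY = 3.0

# Derived assumption: the laptop power draw implied by "3 Wh = 3 minutes".
laptop_watts = QUERY_WH_UPPER * 60 / LAPTOP_MINUTES_PER_QUERY


def laptop_wh(minutes: float, watts: float = laptop_watts) -> float:
    """Energy in watt-hours used by the laptop over the given minutes."""
    return watts * minutes / 60


manual_wh = laptop_wh(30)  # writing the text yourself for 30 minutes
ratio_upper = manual_wh / QUERY_WH_UPPER
ratio_likely = manual_wh / QUERY_WH_LIKELY

print(f"Implied laptop draw: {laptop_watts:.0f} W")
print(f"30 min on the laptop: {manual_wh:.0f} Wh")
print(f"Advantage at 3 Wh per query: {ratio_upper:.0f}x")
print(f"Advantage at 0.3 Wh per query: {ratio_likely:.0f}x")
```

The 10X figure in the note corresponds to the pessimistic 3 Wh estimate. At the more likely 0.3 Wh, the same arithmetic gives a gap ten times larger.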
Me: What is the capital of France? Qwen3-30b-a3b: <think> Okay, so I need to figure out what the capital of France is. Let me start by recalling what I know about countries and their capitals. France is a country in Europe, right? I think it’s known for things like the Eiffel Tower, Paris, and maybe some famous foods like croissants or wine. Wait, but the question is specifically asking for the capital. The capital of a country is usually the city where the government is based. So, for example, the capital of the United States is Washington, D.C. So applying that logic to France, I need to think about which city is the seat of the French government. ...

Tools in Data Science course is free for all

My Tools in Data Science course is now open for anyone to audit. It’s part of the Indian Institute of Technology, Madras BS in Data Science online program. Here are some of the topics it covers in ~10 weeks: Development Tools: uv, git, bash, llm, sqlite, spreadsheets, AI code editors Deployment Tools: Colab, Codespaces, Docker, Vercel, ngrok, FastAPI, Ollama LLMs: prompt engineering, RAG, embeddings, topic modeling, multi-modal, real-time, evals, self-hosting Data Sourcing: Scraping websites and PDF with spreadsheets, Python, JavaScript and LLMs Data Preparation: Transforming data, images and audio with spreadsheets, bash, OpenRefine, Python, and LLMs Data Analysis: Statistical, geospatial, and network analysis with spreadsheets, Python, SQL, and LLMs Data Visualization: Data visualization and storytelling with spreadsheets, slides, notebooks, code, and LLMs ...

Feedback for TDS Jan 2025

When I feel completely useless, it helps to look at nice things people have said about my work. In this case, it’s the feedback for my Tools in Data Science course last term. Here are the ones I enjoyed reading. Having a coding background, the first GA seemed really easy. So I started the course thinking that it’ll be an easy S grade course for me. Oh how wrong was I!! The sleepless nights cursing my laptop for freezing while my docker image installed huge CUDA libraries with sentence-transformers; and then finding ways to make sure it does not, and then getting rid of the library itself, it’s just one example of how I was forced to become better by finding better solutions to multiple problems. This is one of the hardest, most frustrating and the most satisfying learning experience I’ve ever had, besides learning ML from Arun sir. ...

People still write? LinkedIn

Phone Rage and an OTP Flood

I called a few movers in Chennai, including “Unicorn Packers & Movers”, listed at 7015580411. He couldn’t understand what I said. I said, “We’re shifting to a house in Mylapore,” and he asked, “Shifting house where in Hyderabad?” (The reason became clear later.) It seemed I had the wrong number, so I said, “No, sorry, we need someone else,” and hung up. His phone rage began. He called back and said, “Why did you wake me up and waste my time?” From his tone it was clear I couldn’t say anything helpful. From the quality of my signal it was clear I couldn’t have a meaningful conversation. So I just put the phone down without cutting it. ...