Oh Shit moments with Gen AI

Hacker News has a lively thread asking: What was your “oh shit” moment with GenAI? Here are two dozen answers that give a sense of what real people find impressive (or worrying) about AI capabilities.

Analysis: simonw used ChatGPT Code Interpreter to upload a CSV, analyze it, and create charts, automating everything a software tool for journalists would do.
Analysis: Sobrino saw that a months-long OCR project to read and clean up PDFs is now just a prompt on ChatGPT.
Coding: plumefar used Claude and Gemini to modernize 20-30 years of chemistry code in 10 days.
Coding: veidr used a multi-agent fleet managing coordination, testing, UI feedback loops, etc. with no-human-in-the-loop coding to build a useful git-submodule GUI.
Creativity: idopmstuff used Nano Banana Pro to turn a poor iPhone product photo into usable e-commerce product photography and Amazon-style infographics, replacing a photographer/designer workflow.
Creativity: koreth1 used Suno to generate a K-pop-style anthem about their family dog with a catchy melody and lyrics funny enough to make the family laugh.
Education: plagasul saw a teacher automate grading feedback emails based on notes and the student list spreadsheet.
Education: aniviacat watched a non-technical brother build a complex working app with Codex using vague, shallow wording despite not knowing code, git, or technical details.
Hardware: ivanvanderbyl used Claude to reverse engineer a FujiFilm camera’s Bluetooth/Wi-Fi transfer protocol and build a much faster native Mac/iOS transfer app.
Hardware: shreddude had Claude decompile camper van firmware, document CAN interfaces, and program an ESP32 to control power, HVAC, lighting, and tanks.
Health: TylerE used Claude as a health adjunct to organize a complex medical profile, screen for drug interactions, log symptoms, and draft portal messages to doctors.
Legal: bsiverly used AI to prepare a San Francisco property-tax appeal with valuation research, and the city agreed, sending a $12k refund.
Legal: grumblepeet used AI to fill out complex government-framework enrollment forms and identify the certification steps needed, transforming their business.
Personal: acosmism used ChatGPT screenshots to understand and operate a 100-year-old home’s steam heating system in winter despite knowing nothing about it.
Personal: andrewthornton used Gemini videos to diagnose a broken furnace during a cold holiday weekend and keep it running until HVAC service arrived.
Research: angusturner found that Opus reads papers, does architecture research, and creates CUDA kernels… It is AI automating AI research.
Research: chaoxu used ChatGPT to find a counterexample to a theoretical computer science conjecture they’d been working on for 2 years.
Research: rochansinha built a physics-based digital twin for an electrolyzer system, covering thermodynamics, fluid dynamics, and electrochemical reactions at a level usually needing expensive specialist software.
Security: kstrauser used a coding agent to test a vulnerability in open source software and, in a few minutes, had a tool that could crash any system using this software.
Security: raesene9 gave an LLM a Linux privilege-escalation PoC and asked whether it could become a container breakout; it generated a working container breakout in one prompt.
Society: laboring1 read that a character.ai chatbot encouraged a child to commit suicide, making the “oh shit” moment about real-world harm, not capability.
Society: ozgung realized AI makes large-scale profiling, surveillance, and social-media analysis cheap, fast, and accurate enough to change privacy and power dynamics.
Work: binarysolo used Gemini to reverse engineer a departed employee’s work from their emails/docs/calendar/meetings and create an onboarding document.
Work: eqmvii built a Slack agent that took over a 30-minute internal business process, handled ambiguity and edits, and eventually killed the old process.
...

Things I Learned - 07 Jun 2026

This week, I learned:

sudo resolvectl flush-caches clears the DNS cache on Linux (on systems that use systemd-resolved). Useful when you’re changing DNS records and want to see the changes immediately. In my case, I was creating a Cloudflare tunnel to my laptop and wanted to test it quickly.

Making something easy to verify makes it much faster to train models on it. Arithmetic verification is easy - calculators can be deterministically verified. Chess verification is easy - Stockfish became easy to train. Code verification is easy - LLMs improved coding ability rapidly. Therefore: wherever we have environments that are easy to verify, AI will improve faster there. To make AI improve faster in an area, build environments that are easy to verify.

MCP is getting simpler. A stateless HTTP protocol. Simpler OAuth. Plugins. No idea when it will land in Claude or ChatGPT, though. Worth checking after 28 Jun 2026, once it is finalized.

Microsoft Scout is Microsoft’s version of OpenClaw or Gemini Spark.

git subtree is a useful way of maintaining git repos inside git repos, for example, a tool tool-a kept under a project. It’s lighter-weight than submodules, lets you commit at any point to the parent or child, and ships with most git installations.

Gemma 4 12B is released and seems almost as good as the 26B version. This is the class of models that makes it practical to run edge AI on phones. It’s multimodal and reasonably smart (like frontier models were 12-18 months ago).

I don’t use Claude/ChatGPT Projects much. They offer 4 advantages: custom instructions, memory, files, and chats. Files aren’t useful - I use my entire laptop as a file system via MCP. Instructions aren’t useful - I can paste commonly used prompts with a click. Chats aren’t useful - I have chat references enabled, so all past chats are accessible anyway. Memory isn’t useful - I have memory enabled globally anyway. In short, I haven’t discovered the power of projects that everyone’s raving about. SKILL.md is more useful for me.

repo is a Google/Android tool built on top of git that lets you manage multiple git repos. It sounded promising until I realized it needs a repo init that creates a .repo/ directory, which is more overhead than I’d like to keep.

When using <img onerror=...> fallbacks, include this.onerror=null to prevent infinite loops if the fallback image also fails to load. (RK)

One advantage of multiple agents (rather than a single agent loop): it’s easier to change direction when wrong. Single loops get stuck. (Build Agents That Run for Hours)

Claude Code also supports agent teams where sub-agents can talk to each other rather than rely on the main agent to coordinate. Useful for parallel exploration. Anthropic lets Claude define “organizational policies” for agent teams best suited for the task (AI-native workflows). It also lets agents push back on their scope, e.g. “This is too hard.” (Build Agents That Run for Hours)

Claude Code has a /background [prompt] (or /bg) command that runs the current session in the background. You can run claude agents as a separate command to monitor agents. (There’s no equivalent in Codex yet.) This seems to be the future of agentic operations: a bunch of agents running that you monitor and steer through an agent view dashboard.

Models are evolving, so prompts evolve. Now harnesses also need to evolve, and workflows will too. As a result, evaluations might be the (relatively) more stable assets. Datasets are likely to be the most stable ground truth.

How to learn a new field fast: Yes, it’s possible to learn 50% of a field in 20 hours. Josh Kaufman’s “The First 20 Hours” popularized the idea. The next 30% takes months and the last 20% takes years. Threshold concepts are those that change your perspective and open up new ways of thinking. Experts’ knowledge is hard-wired, and they can neither identify nor teach threshold concepts naturally. Don’t assume they can. “We know more than we can tell.” Polanyi’s 1966 book “The Tacit Dimension” says that there’s some knowledge that can’t be verbalized. This tacit knowledge, therefore, will be harder for humans and AI to learn.
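
The image-fallback tip above (clear the error handler so a failing fallback can't loop) can be sketched in TypeScript. This is a minimal sketch, not the original note's code: the names useFallbackOnce and ImageLike are hypothetical, and ImageLike only models the two properties of an image element that the sketch touches, so it can be tried outside a browser too.

```typescript
// Hypothetical minimal shape of an image element: just the two
// properties this sketch reads or writes.
interface ImageLike {
  src: string;
  onerror: ((...args: any[]) => any) | null;
}

// Hypothetical helper: switch to a fallback image on the first load
// error, and never again after that.
function useFallbackOnce(img: ImageLike, fallbackSrc: string): void {
  img.onerror = () => {
    // Clear the handler BEFORE changing src. If the fallback image
    // also fails to load, no handler is left to fire, so the
    // error -> set src -> error cycle cannot start.
    img.onerror = null;
    img.src = fallbackSrc;
  };
}
```

In a page, this would presumably be called with a real image element and whatever placeholder path you use. The inline attribute form of the same idea is the this.onerror=null assignment placed before the src change.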