How to publish an eBook in 60 minutes

I published an eBook on Amazon. It takes an hour if you have the content ready. STEP 1 (10 min): Set up a Kindle Direct Publishing account with your address, bank details, and tax info. STEP 2 (15 min): Export my London 2000 blog archive and convert to Markdown. STEP 3 (10 min): Reformat the Markdown by writing a script in Cursor. Here’s the prompt: Write a Python script that reads *.md including the YAML frontmatter, adds the YAML title as H1, date (yyyy-mm-dd) like Sun, 01 Jan 2000 in a new para after the frontmatter and before the content. ...
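The prompt above can be sketched in Python as follows. This is a minimal illustration, not the script Cursor produced. It assumes flat `key: value` frontmatter (a real YAML parser would be needed for anything nested), and the names `reformat` and `reformat_folder` are hypothetical.

```python
from datetime import date
from pathlib import Path


def reformat(text: str) -> str:
    """Add the title as H1 and a formatted date between frontmatter and body."""
    if not text.startswith("---\n"):
        return text  # no frontmatter: leave the text untouched
    end = text.find("\n---\n", 3)
    if end == -1:
        return text  # unterminated frontmatter: leave the text untouched
    head, body = text[: end + 5], text[end + 5:]

    # Assumption: simple "key: value" lines only, no nested YAML.
    meta = {}
    for line in text[4:end].splitlines():
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip().strip("'\"")

    extra = []
    if "title" in meta:
        extra.append(f"# {meta['title']}")
    if "date" in meta:
        # The stored date is yyyy-mm-dd; display it as weekday, day month year.
        extra.append(date.fromisoformat(meta["date"][:10]).strftime("%a, %d %b %Y"))
    return head + "\n" + "\n\n".join(extra + [body.lstrip("\n")])


def reformat_folder(folder: str) -> None:
    """Rewrite every *.md file in the folder in place."""
    for path in Path(folder).glob("*.md"):
        path.write_text(reformat(path.read_text(encoding="utf-8")), encoding="utf-8")
```

Calling `reformat_folder(".")` would rewrite the Markdown files in the current directory, so it is worth running on a copy first.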

The vCard QR Code is The New Business Card

Arindam Roy suggested a business idea post-COVID: an online QR code for business cards. The benefits are clear. Never print or carry cards. The scanned card goes straight to the other person’s contacts. There might be social metrics we could capture as well. That idea never got past the discussion stage, though I’ve begged for it a few times. Last year, I decided to use existing tools to solve the problem. My current approach: generate a vCard QR code. ...
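As a rough sketch of the vCard half of this approach, the snippet below builds a vCard 3.0 payload as text. The function name and the sample contact are made up for illustration. Turning the text into a QR image is left to whichever QR encoder you prefer and is not shown here.

```python
def _esc(value: str) -> str:
    """Escape characters that are special inside vCard text values."""
    return (
        value.replace("\\", "\\\\")
        .replace(";", "\\;")
        .replace(",", "\\,")
        .replace("\n", "\\n")
    )


def make_vcard(first: str, last: str, phone: str, email: str, org: str = "") -> str:
    """Return a vCard 3.0 string; this is the payload a QR code would carry."""
    lines = [
        "BEGIN:VCARD",
        "VERSION:3.0",
        f"N:{_esc(last)};{_esc(first)};;;",
        f"FN:{_esc(first)} {_esc(last)}",
    ]
    if org:
        lines.append(f"ORG:{_esc(org)}")
    lines += [f"TEL;TYPE=CELL:{phone}", f"EMAIL:{email}", "END:VCARD"]
    return "\r\n".join(lines) + "\r\n"  # vCard lines end with CRLF


# Hypothetical contact, for illustration only.
card = make_vcard("Ada", "Lovelace", "+10000000000", "ada@example.com")
```

Keeping the card to a few short fields keeps the QR code small enough to scan easily.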

LLM GPU or API? The Cost Will Surprise You

Say you want to use Llama 3.3 70b Instruct. You could: Use it via an API. OpenRouter offers it at ~12 cents / MTok. Azure offers it at 71 cents. Your price may vary. Self-host it on a major cloud provider. Azure offers A100 80GB at ~$3.67 / hour. In a day, you could generate ~0.5-1M tokens. Self-host it on an emerging cloud provider. Lambda Labs offers A100 80GB at ~$1.79 / hour. Again, ~0.5-1M tokens a day. Clearly, self-hosting is cheaper if you run it continuously. Let’s say we run for 1 million tokens every day. Then: ...
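The arithmetic behind the 1M-tokens-a-day comparison can be sketched as below, using only the prices quoted above. It assumes a single GPU rented around the clock and ignores storage, networking, and engineering time. The function names are illustrative.

```python
# Prices quoted in the post (USD).
API_PRICE_PER_MTOK = {"OpenRouter": 0.12, "Azure": 0.71}
GPU_PRICE_PER_HOUR = {"Azure A100 80GB": 3.67, "Lambda Labs A100 80GB": 1.79}


def api_cost_per_day(price_per_mtok: float, mtok_per_day: float = 1.0) -> float:
    """API cost scales with the number of tokens generated."""
    return round(price_per_mtok * mtok_per_day, 2)


def gpu_cost_per_day(price_per_hour: float, hours_per_day: float = 24) -> float:
    """GPU rental is billed per hour, whatever the token count."""
    return round(price_per_hour * hours_per_day, 2)


def daily_costs(mtok_per_day: float = 1.0) -> dict:
    """Daily cost in USD for each option at the given token volume."""
    costs = {
        f"API: {name}": api_cost_per_day(price, mtok_per_day)
        for name, price in API_PRICE_PER_MTOK.items()
    }
    costs.update(
        {f"GPU: {name}": gpu_cost_per_day(price) for name, price in GPU_PRICE_PER_HOUR.items()}
    )
    return costs


for option, usd in daily_costs().items():
    print(f"{option}: ${usd:.2f} per day")
```

At 1M tokens a day, the quoted prices work out to $0.12 to $0.71 via the API, against $88.08 (Azure) or $42.96 (Lambda Labs) for a GPU running 24 hours.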

It's so easy to fool yourself

I asked 4 LLMs to “Suggest catchy rhyming slogans to promote journaling.” Here are the suggestions: GPT 4.5 Write it right, day or night! Reflect, inspect, journaling perfect! Express to impress, journaling’s best! Clear your mind, write and unwind! Pen your zen, again and again! Daily pages, inner sages! Take a pause, write your cause! Emotions flow, journals grow! Write, ignite, your inner light! Ink it, think it, don’t just blink it! Pen your quest, ease your stress! From confusion to conclusion—journal your solution! Keep a book, take a look, watch your growth hook! Thoughts on paper, worries taper! Write to heal, make it real! Claude 3.7 Sonnet Write it down, turn that frown upside down! Pen your thoughts, connect the dots. Journal every day, keep the stress away. Ink your mind, treasures you’ll find. Write and reflect, your future self will thank you direct. Scribble and doodle, your mind will stay in the noodle. Journal with flair, show yourself you care. Jot it in ink, give yourself time to think. Fill the page, release the rage. Write your story, capture the glory. Daily writing, life enlightening. Express in a line, your day will shine. Thoughts on paper, life feels safer. Write with heart, that’s the smart start. Journaling’s neat, makes life complete. Gemini 2.5 Pro Thoughts unclear? Write them down here. Let the ink help you think. Untangle your mind, one line at a time. To make things clear, keep your journal near. When your head’s in a knot, give journaling a shot. Feeling stressed? Put your pen to the test. Write it down, let worries drown. Ease your mind, leave troubles behind. Let your feelings take flight, just sit down and write. Turn the page on inner rage. Write to know, watch yourself grow. Explore your soul, take control. Pen and page, turn a new age. Discover your way, write every day. Know your heart? Make journaling your art. Capture the day in your own way. Make moments last, write down the past. 
Don’t let memories stray, write them today. Reflect and connect, gain self-respect. Life moves fast? Make memories last. Get in the flow, write and grow. Pen in hand, understand. Daily lines, clear designs. Just take note, stay afloat. Write your heart, make a fresh start. Grok 3: Write your day, let thoughts play! Pen in hand, your mind expands! Journal your way, seize the day! Feel the groove, let words move! Scribe your soul, make it whole! Ink flows free, just be me! Page by page, unlock your sage! Write it down, wear the crown! At first glance, GPT 4.5 didn’t impress me. Claude 3.7 Sonnet did. I also didn’t like Gemini 2.5 Pro, but Grok was great. ...

AI Imagined Languages Unlock New Thoughts

I prompted Gemini 2.5 Pro Experimental 03-25 to “Invent a new, alien language. Use it to discover unusual insights.” Here’s the output in full. Thinking Here’s a thinking process that could lead to the alien language “Glimmering” and its insights: Understand the Core Request: The user wants a new, alien language and wants to use it to discover unusual insights. This means the language needs to be fundamentally different from human languages, and its structure/concepts should enable thinking in ways that are difficult or non-intuitive in English (or other human languages). Brainstorm Alien Concepts: What makes something “alien”? Biology: Different senses (echolocation, electroreception, sensing magnetic fields, different light spectrum perception), different lifecycles (metamorphosis, symbiosis, collective consciousness), different physical forms (non-corporeal, energy-based, silicon-based, plant-like) Environment: Extreme conditions (gas giant atmosphere, deep ocean trench, vacuum of space, high radiation), different physics (maybe?), non-linear time perception. Psychology/Society: Collective vs. individualistic focus, different emotional spectrum, logic-based vs. empathy-based, non-hierarchical structures, different concepts of self, time, causality. Choose a Core Concept for the Language: Let’s pick something that inherently forces a different perspective. Idea 1: Focus on relationships/connections. Instead of nouns for objects, words describe the relationship between things. “Tree” might be “earth-sky connector” or “shade-giver.” Idea 2: Focus on states of change/flux. Everything is described by its process, not its static state. “River” is “downward-flowing-water-process.” Idea 3: Focus on sensory perception, but a non-human sense. Maybe based on resonance, vibration, or emotional energy. Idea 4: Focus on potentiality and probability. Words describe not what is, but what could be or the likelihood of something. 
Let’s go with Idea 3, leaning towards resonance and interconnectedness, maybe linked to light or energy. This feels suitably “alien” and potentially insightful. Let’s call the species the “Lumin.” Develop Language Principles based on the Core Concept (Resonance/Light/Interconnectedness): No Static Nouns: Objects aren’t fixed things but patterns of energy or resonance. Words describe the pattern or the interaction. Verbs are Primary: Action, interaction, and change are fundamental. Static description is secondary. Emphasis on Relationality: How things resonate with each other is crucial. Prepositions or affixes will be vital for showing connection/influence. Nuance through Intensity/Quality: Instead of just “red,” maybe “vibrant-red-resonance” vs. “fading-red-resonance.” Adjectives become qualities of the resonance/action. Concept of “Self”: Maybe less individualistic. “Self” is a locus of interacting resonances, not a discrete entity. Time: Perhaps less linear, more about overlapping patterns or echoes of resonance. Create Basic Lexicon and Grammar Sketch: Core concept words: Glim (verb root): To resonate, to shine, to exist as a pattern of energy. Sha (particle/prefix): Connection, interaction, resonance with. Vor (particle/suffix): Intensity, vibrancy, strength of pattern. Zil (particle/suffix): Fading, weakening, diminishing pattern. Kai (concept): A stable, recognizable pattern (closest to a noun, but dynamic). Ree (concept): Flow, change, movement of energy/resonance. Nuum (concept): The interconnected field/medium through which resonance travels (the “ether” or “fabric of reality”). Grammar Ideas: Sentence structure: Often Verb-like concept first, then modifiers showing relationships and qualities. Glim-sha-Kai-Vor (Strong pattern resonates with…). No strict tense: Use particles to indicate persistence (-vor), fading (-zil), or initiation (Ree- prefix?). Pronouns: Maybe based on resonance signature? 
Me could be Kai-An (This-pattern), You could be Kai-Sha (Interacting-pattern). Collective We could be Sha-Glim-Vor (Strong-interacting-resonance). Name the Language: Based on the core concepts: “Resonance Speak,” “Light Tongue,” “Pattern Weave.” Let’s go with something evocative: Glimmering (or Glim-Ree in the language itself). Translate Concepts and Look for Insights: Now, apply Glimmering to human concepts. “Love”: In English, often a noun (a state) or verb (an action). In Glimmering? Sha-Glim-Vor (Strong-interacting-resonance). Insight: Love isn’t a possession or a static feeling, but an active, powerful resonance between patterns. It requires mutual participation and reinforcement. Lack of interaction (Sha-Glim-Zil) means the “love” fades, not because it was “lost,” but because the resonance weakened. “Identity/Self”: English emphasizes a stable, internal self. In Glimmering? Kai-An-Ree (This-pattern-in-flux). Or perhaps Sha-Nuum-Glim (Resonance-interacting-with-the-field). Insight: The “self” isn’t a fixed entity but a constantly changing pattern defined by its interactions with everything else. Identity is relational and dynamic, not intrinsic and static. There’s no core “you” separate from your connections and resonance. “Knowledge/Truth”: English often treats these as objective, attainable things. In Glimmering? Kai-Sha-Glim-Vor (A strong, stable resonance between a pattern [observer] and another pattern [observed]). Or Nuum-Ree-Kai (Recognizing a flow/pattern within the field). Insight: Truth isn’t absolute but a measure of the strength and stability of the resonance between the observer and the observed. “Knowing” something is actively resonating with its pattern. Disagreement isn’t “wrongness” but a difference in resonance patterns (Sha-Glim-Kai- dissonance_marker). Understanding requires aligning or acknowledging different resonances. “Death”: English sees it as an end, cessation. In Glimmering? Kai-Zil-Nuum (Pattern-fading-into-the-field). 
Or Glim-Ree-Sha (Resonance-flowing-elsewhere/transforming). Insight: Death isn’t an absolute end but a transformation. The specific pattern (Kai) dissolves, its energy/resonance (Glim) rejoining the larger field (Nuum) or contributing to new patterns (Ree-Sha). It emphasizes continuity of energy/information, just in a different form. The “loss” is the cessation of a specific interactive pattern. “Problem/Conflict”: English frames it as an obstacle. In Glimmering? Sha-Glim-Dissonance (Interacting-resonance-with-clashing/instability). Insight: Conflict isn’t inherently “bad” but a state of unstable or disharmonious resonance. The “solution” isn’t necessarily elimination, but finding a way to shift the resonances (Ree-Sha-Glim) into a more stable or harmonious pattern, potentially creating something new in the process. Refine and Elaborate: Flesh out the examples. Add more nuance. Make the insights clearer. Ensure the language structure consistently supports the insights. Add a concluding thought about the value of such different perspectives. Ensure the “alienness” is maintained. For instance, avoid direct human analogies where possible, focusing on the resonance/pattern concept. Add a brief description of the hypothetical speakers (the Lumin) to ground the language’s origin. Self-Critique: Does the language really lead to unusual insights, or are these just philosophical points rephrased? Is the language structure consistent? Is it different enough from human languages? Adjustment: Emphasize how the language structure forces this thinking, e.g., the lack of static nouns requires thinking in terms of processes and relationships. Make the link between grammar and insight explicit. Make the insights feel less like standard philosophy and more like consequences of a fundamentally different way of parsing reality. Output Okay, let’s invent an alien language designed to generate unusual insights. 
We’ll call the species the “K’tharr,” dwellers of a dense, perpetually twilight gas giant’s upper atmosphere, communicating through modulated bioluminescent patterns and subtle gravitational waves (perceived through specialized organs). Their reality is fluid, interconnected, and based on flows rather than static objects. ...

Things I Learned - 30 Mar 2025

This week, I learned: Discussion with Vedang: Recurse Center (Brooklyn, online) is a 6/12 week free self-driven programmer retreat. Runs every 6 weeks. You can do whatever you pick. There are daily standups for accountability. The groups are diverse. You can pair with them, pivot ideas, whatever. Principles: push yourself & learn. Western education techniques (e.g. spaced repetition, adaptive learning) are very much present in Indian coaching systems, though not known by those names. However, interventions are hard since class 12 students just don’t have enough time. Coaching classes are a social phenomenon. It’s not the smart students who are pulling in their friends. Smart students actually follow the popular students. (Coaching classes are below the typical smart students’ standards.) Monetizing coaching is hard. People don’t want to pay for advice, and welcome free advice only if they ask for it. Coupling with execution is necessary. Aider’s integrations make it more powerful than Cursor/Windsurf. It auto-lints and runs test cases. Allows different models for “architecting” (generating changes) vs “editing” (applying code). It reads from the screen logs. Context is manual, not automated. Uses an ai! comment to trigger changes and ai? to ask questions. Cline.bot is another Cursor-like open source AI code editor that’s a VS Code plugin. When coding with LLMs, a useful workflow is: data schema ➡️ interfaces ➡️ LLM-generated test cases ➡️ code. ShellSage is a tmux based LLM tool for the command line. It screen-grabs from tmux, which is powerful. Some MCPs that have proven useful: vega-lite, SQLite, sequential thinking, memory. make sucks but is hard to beat. just comes closest. CRDTs are powerful for more than just collaborative editing. They can power a peer-to-peer Internet (beginning with office tools). Versioning schema is still problematic. yjs is a good start but automerge (Rust, WASM) is faster and may be better. Loro is another. 
Fermyon hosts WASM serverless functions. If LLMs are most safely used where there’s no definitive “wrong” answer, here are low-risk industries and safe LLM use cases within each: Marketing and Advertising: Ad Copy and Campaign Content Generation, Personalized Marketing Messages, Creative Strategy Brainstorms, Automated Marketing Production (Everyday Wins) Customer Service and Support: AI-Powered Chatbots for Common Queries, Agent Assist and Email Drafting, Summarizing and Analyzing Customer Feedback, Interactive Troubleshooting and FAQs Retail and eCommerce: AI-generated Summary of Product Reviews, Product Description and Catalog Content Generation, Visual Content and Image Captions, Personalized Shopping Recommendations (Narrative Form) Human Resources and Talent Management: Job Description and Policy Writing, Resume Screening and Candidate Q&A, Employee Communications and Feedback, Training and Onboarding Content Education and E-Learning: Personalized Explanations and Tutoring, Content Creation: Stories, Examples, and Analogies, Practice Problems and Quiz Generation, Automated Grading and Feedback Media and Entertainment: Writing and Editing Assistance, Personalized Media Content, Localization and Dubbing Scripts, Content Moderation and Curation (Assistive) Finance and Banking: Market Commentary and Research Summaries, Client Communications and Explanations, Regulatory Compliance Summaries, Scenario Analysis and Planning Management Consulting and Strategy: Research and Insight Generation, Document and Slide Drafting, Brainstorming and Scenario Planning Legal Services: Drafting Contracts and Legal Documents, Legal Research Q&A and Summaries, Client Communications and Explanations Reflecting on Satya Nadella’s “SaaS is dead”, building or porting apps’ functionality into classic chatbots (e.g. via MCPs) would be an emerging market. E.g. “Create a HubSpot MCP. 
Do whatever you want on HubSpot, except via ChatGPT or your favorite LLM chatbot.” To be fair, such interfaces exist. HubSpot MCP with a vega-lite MCP and a few others could solve many common HubSpot UI tasks. DarwinBox MCP, ZenDesk MCP, etc. are emerging. 13 things I would have told myself before building an autorouter has a few interesting points: The A* algorithm finds the shortest path in a graph much quicker than others like Dijkstra’s algorithm by preferring nodes closer to the goal. Spatial hash indexes are O(1) and beat tree data structures, which are O(log n). Always prefer hashes when possible. There’s an actual convention for using emojis in Git commits: gitmoji. It even has a VS Code plugin, a changelog generator, and more. Emojis have a strong role in enhancing Markdown documents. The ones I use often are: 🔴🟡🟢 for low/medium/high priority ⭐️ or ❤️ or 👍 for ratings or emphasis ✅ for completed tasks 💡 for ideas ⚠️ or ❗️ for warnings / issues Technological innovations have always been changing art forms. For example, the perspective grid and the camera obscura led to major improvements in realistic paintings in the 15th and 17th centuries. regex is an officially recommended Python library with better regex support than re. Ref Notes from ThursdAI - Mar 27 Gemini 2.5 Pro has good instruction following despite long context. It automatically thinks for longer where required. Good at understanding large codebases. Very fast. You can upload a 2 hour audio to transcribe with timestamps. ai.dev is the shortcut to Google AI studio. ChatGPT native image generation is the best image generation model now. - Great character consistency AND prompt adherence thanks to autoregression and not using stable diffusion. - It tends to refuse image generation less than Dall-E. (While Ghibli-style is possible, Calvin and Hobbes strips are blocked.) 
“We added a refusal which triggers when a user attempts to generate an image in the style of a living artist.” Addendum to GPT-4o System Card - A neat personalization implication is that you could put your kids into their favourite cartoon as a cartoon character that looks like them. It’s weird that the latest GPT 4o is ahead of GPT 4.5 on LM Arena. The new DeepSeek V3 is about as good as GPT 4.5 and VERY cheap (27c), so is the obvious choice to run on OpenRouter. MCP news: Qwen.ai supports MCP in the UI! (But it’s marked as “coming soon” in my case.) Unlike tools, MCP uses servers that can remember the state or context. Tools are stateless. MCP app stores like Smithery, MCP.run, and Glama are mushrooming. Awesome MCP Servers is another good starting point. Azure lets you expose agents as MCP servers. ChatGPT now uses semantic VAD. It interrupts less, and typically only when you have meaningfully completed something. It responds a little slower as a result. AI generated images created from prompts cannot be copyrighted. News US Copyright Office LLMs are much better at GeoGuessr than humans. arXiv. Gemini leads the pack and is ~3x better at continents, 9x better at countries, and 37x better at cities. Gemini 2.5 Pro transcription has accurate timestamps and bounding boxes. Simon Willison Notes from Writing with AI Personal writing with connection won’t go away. AI can’t give you heartbreak. But the rest of non-fiction writing will vanish. What AI is extraordinary at is personalizing to each audience member’s interest. Outlier opinions will thrive among humans - since AI is trained on consensus. Managers tend to be good at working with LLMs because it’s mostly about delegation. LLMs are perfect for things that don’t have a wrong answer! – Benedict Evans. 💡 Explore arguing with AI. It’s a safe way to get into a confrontational emotional state (which has its own benefits.) 💡 Keep an LLM on in voice mode while reading and ask it any questions you have. What models are good for what? 
GPT 4.5 is great for creation - has a great sense of humor but a corporate style. Still, way better than GPT 4o. ChatGPT is good for voice transcription and note taking. (Increasingly we take notes for AI rather than ourselves.) Claude 3.7 has the best style of writing. It’s also great for drawing charts. O1 Pro and Deep Research are great for consumption - research. Grok is the least corporate, is able to argue with you, and has the latest knowledge cutoff. ElevenLabs for editing podcasts in your voice, making corrections. Playwright offers an MCP server. https://simonwillison.net/2025/Mar/25/playwright-mcp/ The new GPT-4o mini Transcribe model is a bit better than Whisper and costs half: ~18 cents per hour. It includes background noise cancellation and semantic chunking, which is useful. The new GPT-4o mini TTS is about 3-4 times cheaper than TTS-1 since it’s ~$12/MTok instead of $15/Mchar. It supports emotions with streaming. Cursor with Claude 3.7 Max seems surprisingly good at generating multi-page sites at one shot. Potentially, it can edit large repositories of code as well at one shot. If that’s the case, the way we write code will require higher order thinking skills: broad sweeping changes rather than micro edits. I tried Open WebUI with its Knowledge feature. In short, it sucks, due to the RAG technique as well as model quality. When I passed it my notes about Straive and asked who Straive’s clients were: Open WebUI with Gemma 3 found one - after multiple attempts. ChatGPT with o3-mini-high got 5 (missing nothing.) ChatGPT with GPT 4.5 got 4. Gemini with Gemini 2.0 Flash Thinking got 3. Gemini with Gemini 2.0 Flash got 3 (with a 4th wrong answer). I’ve settled on squoosh.app for image compression using WebP. I’m exploring FreeImage.host for image hosting instead of Imgur for WebP support. FreeImage.host also seems reliable, retains file sizes, and supports hotlinking. DeepFace currently seems the easiest option for face detection. Easy to install. 
Multiple back-ends. Gemini Codrawing is a popular Hugging Face space that lets you sketch something and prompt Gemini Flash to improve on it. Draw a dead man beside the pool of blood. Add an armor to the attacker. Significantly improve the quality of this picture. Add a red pool of blood next to the dead man. The armor looks like a frock. Make it more like an armor. Make this look like a professional drawing, even though it’s in stick figures. Draw it in the style of Picasso. Phi-4 multimodal (https://huggingface.co/microsoft/Phi-4-multimodal-instruct) processes speech better than Whisper V3 on HuggingFace OpenASR, and images better than Gemini Flash Lite. On any LLM project, BEGIN with evals. Always. The effort for evals may seem high. Use LLMs to reduce this effort. Include irrelevant questions because people WILL ask them. Be clear on how to handle that.
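The A* note from the autorouter post can be made concrete with a small grid search. This is a sketch under stated assumptions, not code from that post: a 4-connected grid with unit step costs, `#` for walls, and a Manhattan-distance heuristic. With the heuristic switched off, the same loop behaves like Dijkstra’s algorithm, which makes the two easy to compare.

```python
import heapq


def shortest_path(grid, start, goal, use_heuristic=True):
    """Return (path length, nodes expanded). use_heuristic=False gives Dijkstra."""

    def h(cell):
        # Manhattan distance never overestimates on a 4-connected unit grid.
        if not use_heuristic:
            return 0
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    rows, cols = len(grid), len(grid[0])
    best = {start: 0}                      # cheapest known cost to each cell
    frontier = [(h(start), 0, start)]      # (cost + heuristic, cost, cell)
    done = set()
    expanded = 0
    while frontier:
        _, cost, cell = heapq.heappop(frontier)
        if cell in done:
            continue
        done.add(cell)
        expanded += 1
        if cell == goal:
            return cost, expanded
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                new_cost = cost + 1
                if new_cost < best.get(nxt, float("inf")):
                    best[nxt] = new_cost
                    heapq.heappush(frontier, (new_cost + h(nxt), new_cost, nxt))
    return None, expanded                  # goal unreachable


grid = ["....."] * 5
a_star = shortest_path(grid, (2, 0), (2, 4))
dijkstra = shortest_path(grid, (2, 0), (2, 4), use_heuristic=False)
```

Both runs return the same path length. The difference shows up in the second number: the heuristic steers A* along the row toward the goal, so it expands fewer cells than the uninformed search.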

Voice Chat to Slides: My New AI-Powered Workflow

Here’s my new workflow for creating slide decks: ChatGPT interviews me and creates Markdown slides. I use Marp to convert Markdown to slides. LLMs create supporting images. I deploy on GitHub Pages. … and here are 2 decks created this way. Visualizing LLM Hallucinations LLMs in Education Let’s look at how I built the second example, step by step. ChatGPT interviews me and creates Markdown slides While walking 75 minutes from home to IIT Madras to deliver this talk, I had ChatGPT interview me in standard voice mode. ...

It's not what you know. It's how you learn

Simon Willison’s blog post mentioned MDN’s browser compatibility tables that list the earliest release date for each browser feature. I figured: let’s see which browsers release features fastest. I calculated average delay for each browser’s feature release. For each browser, I looked at how many days after the first release it took to add a feature, averaged it, and published an interactive, scrolly-telling data story. ...
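The averaging step described above can be sketched as follows. The input shape, feature names, and release dates are invented for illustration (the real analysis used MDN’s browser compatibility data). Features a browser never shipped are simply absent from its average, which is an assumption of this sketch.

```python
from datetime import date
from statistics import mean


def average_delay_days(releases):
    """releases: {feature: {browser: release date}} -> {browser: mean delay in days}."""
    delays = {}
    for by_browser in releases.values():
        first = min(by_browser.values())  # earliest release of this feature anywhere
        for browser, released in by_browser.items():
            delays.setdefault(browser, []).append((released - first).days)
    return {browser: mean(days) for browser, days in delays.items()}


# Made-up sample data, for illustration only.
sample = {
    "feature-a": {"chrome": date(2020, 1, 1), "firefox": date(2020, 1, 11)},
    "feature-b": {"chrome": date(2021, 1, 31), "firefox": date(2021, 1, 1)},
}
print(average_delay_days(sample))
```

A browser that is first to ship a feature contributes a delay of zero for it, so a lower average means the browser tends to ship closer to the front.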

Things I Learned - 23 Mar 2025

This week, I learned: If we can DESCRIBE what good looks like, training data is no gap. We can auto optimise models towards that. That’s RLF. DeepSeek R1 side-stepped the need for training data by creating reward functions and prompts. This tells the fine-tuning process how to correct itself as it goes along. This video is the first one that really helped me understand what’s going on. I was born in the Ananda year in the Tamil and Telugu calendars. ChatGPT Andrej Karpathy’s note taking mechanism is similar to mine, except I use Microsoft TODO. Ref I have 3 categories. Things I learnt, which I just note. Things to explore, which I can delegate, defer, drop, or do at any time. Things to do, which are the hardest and pile up. Alexander Doria shares an interesting perspective on the app space. Model is the product: Models are natively absorbing app capability and will become killer systems internalising workflows like Chat, Deep Research, Claude Code, Operator, etc. to wipe out the apps and workflow space. Models will “internalize” tool capabilities. Opinionated or focused training will be a lever and model providers will acqui-hire the successful trainers. API access from model providers will shrink. Selling tokens is not a viable business model given lowering costs. The huggingface_hub cache-system uses symlinks by default to efficiently store duplicated files. To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In Windows, you can enable offline files for any SMB share via: Control Panel → Sync Center → Manage offline files and turn on the feature. Then, in File Explorer, right‑click the mapped network folder or drive and select “Always available offline.” OpenAI now supports PDFs natively in the API. (Gemini has done so for a while.) Anger is a trigger for change. “Either change yourself or the environment, else you’ll be uncomfortable.” HocusPocus allows live collaboration e.g. 
editing together. BlockNote is a Notion-like library for editor components. Converts to Markdown. Oxidizr enables replacing Linux tools with Rust equivalents. Emoji Kitchen lets you create stickers from emoji combinations. Another way of scaling LLMs is generating multiple options and self-evaluating. Eric Zhao. duckdb -ui launches a DuckDB notebook. This is built into newer DuckDB releases. Monolith downloads web pages as a single HTML file by embedding content. Archgw is an LLM proxy/router from the makers of Envoy proxy. There’s an annotated Terry Pratchett! Gemini API allows YouTube videos as a part. Google agents.json is a proposal for discovery of agents on a site that enhances the Open API spec: wild-card-ai/agents-json Since Gemini Flash 2.0 is now an image GENERATION model, interactive VISUAL fiction is now a cool possibility. People are using it in interesting ways: Interleaved storytelling, Memes, Surrealism.
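The “generate multiple options and self-evaluate” idea noted above can be sketched as a best-of-N loop. The `generate` and `score` callables are placeholders: in practice both would be LLM calls (one sampling a draft, one grading it), and nothing here is taken from Eric Zhao’s work.

```python
def best_of_n(prompt, generate, score, n=5):
    """Sample n candidate answers and keep the one the scorer rates highest."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda answer: score(prompt, answer))


# Toy stand-ins so the sketch runs without a model.
drafts = iter(["ok", "a longer draft", "mid"])
winner = best_of_n(
    "Summarise the post",
    generate=lambda prompt: next(drafts),           # hypothetical: sample one LLM answer
    score=lambda prompt, answer: len(answer),       # hypothetical: LLM self-evaluation
    n=3,
)
```

The extra compute goes into sampling and grading rather than into a bigger model, which is what makes this a separate scaling axis.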

Things I Learned - 16 Mar 2025

This week, I learned: Here is a training program on open source corporate policy. htmlq and pup query HTML. They’re like jq for HTML. Here are time-tested and robust ways to leverage serendipity: ChatGPT Place. Be in places with high, diverse, talent density. Bell Labs (1950s), MIT (1970s), Pixar (1990s). People. Meet diverse, talented people. Da Vinci’s Renaissance circles, Lockheed Martin’s Skunk Works. Free time for unstructured work. 3M’s 15% rule, Google’s 20% time, Edison’s Invention Factory. Curiosity. Learn unrelated fields. Darwin’s earthworm research, Ben Franklin’s ocean currents work. Serendipity. Systematically add randomness. Brian Eno’s Oblique Strategies, IDEO’s Deep Dives. Reframe failure as opportunities. Penicillin, Velcro, Post-it Notes. Ceremonies. Hackathons, lightning talks, coffee trials. What makes client-side computing on the browser powerful: There’s nothing to install. Private by default: data stays with the client. Speed: no latency. SemGrep is a lot less open source than it used to be. ChatGPT. That’s a pity. It was a good tool. Site builders and headless CMSs are gently eating into the dominant market share of open source CMSs (via PretaGov). WordPress is pretty much the dominant CMS in the world, followed by Drupal. WordPress is now VC backed and is not growing, so they seem to be attacking their own community. Umbraco CMS is the only open source CMS that’s growing. Maybe because it’s the only .NET one. Craft CMS is the only proprietary CMS that’s growing. Site builders are growing as a category. SquareSpace is the leading one. Headless CMS is growing too: Statamic, Next.js, Nuxt.js, Contentful, Prismic, Storyblok, Gatsby, etc. Here’s a sample CI/CD pipeline with automated code review. Here is the script that generated it. Note the use of NVIDIA’s GPU Docker containers via nvcr.io. Things I learnt about robotics. SO-ARM100 is an open-source 3D printable robot arm. Takes ~20 hours to print, ~1 hour to assemble. Costs ~$120. 
LeKiwi is a mobile version of this arm. LeRobot is a set of HuggingFace models and datasets. The idea is, you can use one “control” robot to control the other. Do stuff manually, teach it ~50 times, and it learns how to do what you’re doing. Pi0 is an LLM equivalent for robotics that predicts actions. HuggingFace ported that to LeRobot. Most real robotics work is on SIMULATED “gym” environments, not costly/slow physical environments. PushT is a simple 2D version. ALOHA is a 3D one. ROS is a nightmare to install and run - on Windows and Mac. Robotics Academy is an open collection of easier ROS exercises. PSLab - Pocket Science Lab is a sensor kit for the phone / PC. Costs ~$100 but isn’t available anywhere. Getting it to work requires too much mucking around with USB drivers and it just doesn’t work. (BBC micro:bit may be more promising.) Getting stuff done with electronics is still really hard unless it’s well designed. It’s FASCINATING that robots can have arbitrary joints. Our intuitions (or even biomimicry) on how to move and do stuff are a POOR guide for how robots should act. MathML Core is a language and layout specification, distinct from MathML 2/3. It’s not fully compatible with JATS XML. latexmlmath converts TeX to MathML. m|math { font-family: "Noto Sans Math", "Noto Sans" } is a popular OpenType Math font. Browsers default to native fonts: e.g. Cambria Math on Windows. Explore at https://fred-wang.github.io/MathFonts/. The people working on this at arXiv are: Deyan Ginev, Fred Wang, and Norbert Preining. Their work is sponsored by NSF. There’s a PDF UA2 standard for accessibility but there aren’t enough tools to generate it. LibreOffice is now on WASM. ZetaJS provides office in the browser. Has a CDN (that was down from our IP). 35M packaged binary. 100M of in-memory file-system loaded. 
Useful for: Document conversion, Thumbnail generation, Text extraction, Merging / splitting documents. The Poincare Conjecture says that any finite 3D blob that has no holes can be deformed into a sphere. It took until 2003 to prove it because we didn’t have the tools to manipulate 3D shapes. Playbook driven agents are another approach to agentic workflows. Simon Willison. Twine (docs) is an open source interactive fiction / story writing tool. Snowman is a browser-based Twine 2 story template format. These enable behavioural experimentation. Cheaper than using tools like Gorilla.sc and Pavlovia for behavioral experiments. For example, you can present a social or political issue and see if people change their opinions more or less depending on the content/path they see. Or, if it varies by demographics. Or, check if repeated mentions or emotional hooks improve memory / retention. More research ideas. Techniques to reduce Docker image sizes: Native Linux mount supports overlaying directories! Lower layer is read-only. Edits (including deletions) affect upper layer only. Docker uses this. docker image inspect shows layers. Always run RUN apt-get update && apt-get install [packages] rather than on separate lines. Else RUN apt-get update gets cached with an OLD update cache. Defer COPY till as late as possible, and COPY minimally - since it typically invalidates the cache. Skip development dependencies and temporary caches. Docker Dive via dive [IMAGE] analyzes image details and shows the file system in each layer. Use multi-stage builds. A: Create an image using FROM some-image AS builder and do what you want. Then, after that, B: FROM scratch (or FROM node:22-slim) use COPY --from=builder what-you-want. Use distroless images from GCR. They don’t have shells, package managers, etc. Fewer vulnerabilities. Playwright seems to be the emerging standard for modern browser testing/automation, beating Cypress and Selenium. 
“Openwashing” is when something is labelled open source but is not. Photos from FOSSASIA are public. To publish images long-term: GitHub is an option. Likely to last long-term. Clone-able. Archive.org is a good option too but may suffer from bandwidth constraints. Imgur remains popular but it’s unclear if it will remain unrestricted. Flickr has had a flaky history with limits and commercialization. WikiMedia Commons deletes personal uploads by first-time contributors. Only files clearly useful for a large audience are retained. This table of LLM API data protection lists what use cases each provider’s terms of service allow from a security perspective. Unsloth might be one of the simplest ways of fine-tuning. For LLM UIs, Open Web UI seems most popular. Run via WEBUI_SECRET_KEY=... uvx --python 3.11 open-webui serve. Text generation Web UI is less so. KoboldAI, LMQL, LM Studio, GPT4All, etc. are far behind. GPT 4o Mini is probably an 8b parameter model. Ref “SRMs” are Small Reasoning Models - like Small Language Models. Phi-4 and DeepScaleR are SRMs. Gemma 3 is a multi-modal SLM. gemini-embedding-exp-03-07 leads the MTEB and is currently the top embedding model by a big margin. Apify is a cloud scraper platform. Here’s how they optimize their AI Web agent - Source: Remove redundant tags and attributes (e.g. accessibility, etc.). Explore readability. Add a unique gid to each element. Add the screenshot WITH a “Set of Marks” - “SoM” (read research paper) highlighting important clickable elements. Code output is brittle. Use tools / DSL - e.g. visit_url(url), click_element(text, gid, tagName), etc. GenAIScript increasingly looks like a promising way to automate LLM workflows in the browser. Ollama has a Windows download. Marp is my new favorite way to generate slides from Markdown. Reveal.js is not easy with Markdown (though HTML works well.) The VS Code plugin makes development very easy. Marp CLI makes deployment easy. I used it for my talk on LLM Hallucinations (source). 
Supports all bespoke features and plugins. Transitions. Requires OS animation effects to be enabled. Animated SVG backgrounds are a good add-on. A mental model to consider is: each chat conversation with an LLM is a person or a personality in itself. A day in the life of a model, where its personality evolves. Bots need structured content (e.g. Markdown, XML). Humans need rich content (e.g. HTML). Here are 4 ways to serve both, roughly in increasing order of sophistication: Different URLs. E.g. https://example.org/about/ vs https://example.org/about.md (this is how Jekyll or Hugo work). Use for static site generators. JavaScript. Inject after Markdown: <script src="https://cdn.jsdelivr.net/npm/marked/marked.min.js"></script><script>document.body.innerHTML = marked.parse(document.body.textContent);</script>. Use for dynamically generated static sites. URL query parameters. E.g. ?format=markdown vs ?format=html vs ?format=json. Use in APIs. Content Negotiation. Based on the user agent and Accept header, serve Markdown or HTML. Send Vary: Accept to indicate that the response depends on the Accept header. Use for dynamic web apps. Notes from The Knowledge Project: Josh Wolfe: Human Advantage in the World of AI. Agent optimization might become as popular as search engine optimization in the future. APIs are likely to be replaced by just chat requests that will do the same thing. APIs might be replaced by RPA, where somebody uses a chatbot to do the equivalent instead. Today, blue-collar workers may be more protected from AI than white-collar workers. Robots still can’t serve a meal well enough and aren’t progressing as fast as AI yet. There’s a lot of tacit knowledge in craftsmanship that will take a long time for machines to replace. Margins are fleeting. The only time you have large sustainable margins is when you truly have a monopoly. 
Cost is going down so quickly right now that all you have to do is wait, and stuff will become available for a very affordable or even a free price. The moat is really in the data. The models are not an advantage. Engineering and services on top of that are marginal. Machines will be doing science 24/7. All of the science data that we have will probably be the biggest leverage for humanity. The discoveries of penicillin, Viagra, and rubber were all serendipitous. Machines should run with a little bit of randomness to benefit from this. Tesla might have gotten away with accounting fraud on warranty claims. But short sellers are likely to be after Elon Musk. With LLMs, the value of our social network has gone up considerably. Remember: The reason we believe things is not because we have thought through and analyzed them. It’s because the people around us believe in those things. It is now practical for a person to live on forever by sharing all their thoughts into an LLM. Kids can have a “Dad AI”. One good use of meeting recordings is to see where there are biases in the conversations, where the engagement is not high enough, or where there are unproductive power imbalances. A great virtue of college is that it allows you to break free from your previous personality. For those four years, nobody knows who you are or cares what you wear. And you can be or grow into a very different person. The more content we put into AI or social media, the harder it is to change ourselves. People are reporting that Roo Code is better than Windsurf. Roo Code is open source. Available as a VS Code extension and runnable via git clone. Roo Code supports Computer Use. It can read files, take screenshots from a built-in browser, control it, and read browser console logs. Opinions are mixed. A team member reported that it takes 10 LLM queries to do what Cursor does in 2. Another reported that it does in 1 query what Cursor does in 2. 
Notes from Thursday AI, 6 Mar 2025: Google’s AI overviews now use Gemini 2.0. They’ve introduced an AI mode that functions like a mini deep research tool, incorporating planning and search. (A Perplexity-killer). It’s a fine-tuned model that is extra cautious with topics like healthcare and always verifies information. QwQ from Qwen competes with DeepSeek R1, but with only 32b parameters compared to R1’s several hundred billion. AI models are becoming less restrictive. Gemini and GPT-4.5 have relaxed some constraints, shifting more responsibility onto users, similar to Grok. What’s GPT-4.5 good for? It seems to excel in creativity, humor, education, emotional intelligence, and teaching. It follows instructions better and understands intent better. However, it’s not a major leap in coding or math. OpenAI’s Deep Research mode always uses O3, regardless of the model selected in the UI. Tencent has released a new video model available at https://aivideo.hunyuan.tencent.com/ and it appears to be quite good. Many clients now support Model Context Protocol (MCP), including Cursor, Claude Code, and Claude Desktop. The clients list is long. Some MCP uses include: Interacting with GitHub using the GitHub API. Using Knowledge Graph memory to remember previous conversations. Using the Cloudflare MCP server to perform Cloudflare actions. File retrieval and custom prompts – which MCP supports in addition to tools. Calling other MCPs or LLMs (conditionally) from an MCP, enabling the creation of full-fledged workflows. Composio offers a Hosted MCP service. Cloudflare lets you build remote MCP servers. Notagen is an open-source note generation engine that produces high-quality classical sheet music. Sesame has an open-source voice model worth exploring. DiffRhythm is a music generation model that appears to be quite good. 2 pass bounding box approach. Have an LLM generate bounding boxes. Then fix it. 
Ethan Mollick uv tool install and uv tool ensure-path are useful commands for installing and ensuring path for tools. Simon Willison
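
The content negotiation option above (serve Markdown or HTML based on the Accept header, and send Vary: Accept) can be sketched roughly like this. It is a minimal illustration, not tied to any framework: the function names parse_accept and negotiate are made up for this example, text/markdown is assumed to be the media type a bot would ask for, and HTML is assumed as the fallback.

```python
def parse_accept(header: str) -> list[tuple[str, float]]:
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    items = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        media_type, *params = [p.strip() for p in part.split(";")]
        q = 1.0  # q defaults to 1 when the client does not specify it
        for param in params:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        items.append((media_type.lower(), q))
    # sorted() is stable, so ties keep the client's original order
    return sorted(items, key=lambda item: item[1], reverse=True)


def negotiate(accept_header: str) -> tuple[str, dict[str, str]]:
    """Return (format, response_headers) for a request's Accept header."""
    chosen = "html"  # assumed default for browsers and missing/unknown headers
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type == "text/markdown":
            chosen = "markdown"
            break
        if media_type in ("text/html", "application/xhtml+xml"):
            chosen = "html"
            break
    content_type = "text/markdown" if chosen == "markdown" else "text/html"
    headers = {
        "Content-Type": f"{content_type}; charset=utf-8",
        # Tell caches that the response body depends on the Accept header
        "Vary": "Accept",
    }
    return chosen, headers


fmt, headers = negotiate("text/markdown, text/html;q=0.8")
print(fmt, headers)
```

A bot sending Accept: text/markdown gets Markdown, while a typical browser Accept header falls through to HTML. The Vary header keeps a shared cache from serving one variant to the other kind of client.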

ImageGen 3 is the top image model now

Gemini’s ImageGen 3 is rapidly evolving into a very powerful image editing model. In my opinion, it’s the best mainstream image generation model. Ever since it was released, it’s been the most realistic model I’ve used. I’ve been using it to imagine characters and scenes from The Way of Kings. For example, when I wanted to visualize Helaran’s first appearance, I just quoted the description: ...

I’m at an open Hyderabad meet-up, Thu 20 Mar 4 pm: “Analyzing data with AI agents”. It’s a public event by Hasgeek. Venue: Castlight Health, Sattva Knowledge Park. We know LLMs suck at number crunching but are good with code. I’ll share what we’ve learnt by getting it to write code to analyze data instead. Less lecturing, more interactive Q&A and demos in a cozy group. Mostly for analysts, data scientists, and programmers. Not so much for LLM researchers or managers. ...

Things I Learned - 09 Mar 2025

This week, I learned: In Jan 2025, ChatGPT included images as part of their chat data export. They also have a 30 second limit for the export. As an extensive user, my export is about 1GB which takes well over 30 seconds to download. As for many others, the export option pretty much doesn’t work for me any more. Bharathi said மெல்லத் தமிழினிச் சாகும் (“Tamil will slowly die”) in a poem that has been often quoted (and parodied). Here’s the context. The Zettelkasten note-taking method proposes that you: Capture: Write down every idea or piece of information on a separate note. Use your own words to ensure understanding. Organize: Consolidate fleeting notes into permanent ones. Assign unique identifiers to each note for easy reference. Connect: Link related notes to form a web of knowledge. This can be done with tags, references, or hyperlinks in digital systems. Review: Regularly revisit your notes to strengthen connections and discover new insights. I agree with almost every point on this LinkedIn post on scoring candidates for AI roles. Rob Balian Uses DeepSeek R1 or Claude 3.7 +5 points Uses Langchain -5 points Uses Langgraph +5 points (I don’t know enough to comment) Built a RAG in 2023 +3 points Built a RAG in 2025 -3 points “pinecone” -5 points (I don’t know enough to comment) “What is cursor” -50 points no coming back from this Uses Cursor composer +10 points “You don’t need a full agent for this” +5 points Did hackathons to learn AI outside of work +5 points “We probably need to fine tune for this” -3 points unless you can explain why “Gemini is making a comeback” +3 points (I have a soft spot for Gemini) +3 points each for mentioning reasoning trace, structured outputs, MCP, chain-of-thought, prompt caching, TPM limits “Export to prompt” can be a useful feature in apps (or even as a bookmarklet). It would let you export content in an LLM-friendly Markdown format. You can paste it into an LLM and ask questions. 
Here are things I would find useful: Copy an entire issue (with history) from GitHub, Gitlab, or JIRA Copy an entire PR (with code changes) from GitHub, Gitlab, or Bitbucket Copy CI/CD logs from GitHub Actions, Gitlab CI, Azure DevOps, etc. Copy an entire conversation thread from Gmail, Discourse, ServiceNow, etc. Copy product reviews from Amazon, Shopify, etc. Copy page(s) from wikis and content sites like Wikipedia, StackOverflow, etc. Copy survey responses from Google Forms, Typeform, etc. Copy all interactions with a contact (including interactions, proposal history) from HubSpot or Salesforce Copy transcripts from Zoom, Teams, Google Meet, etc. Copy as Markdown from Word, GDocs, PDF or HTML Copy the summary of an analysis as well as all key metrics from any dashboard Copy SAP invoices Copy JDs, CVs, and reviews from Workday, BambooHR, DarwinBox, etc. Copy design specs, component libraries, and style guides from Figma, Miro, etc. Generated with the help of ChatGPT – link not working. Ancient languages tend to have fewer words for hues than brightness, since they didn’t need them. So “Krishna was blue” or “the sea is wine-dark” is more an indication of darkness than shade of color. Ajit Narayanan Mistral released an impressive OCR model. Marker from DataLab seems comparable but is CC-BY-NC-SA. MinerU converts medical textbooks to Markdown well. Gemini Flash may be more cost effective and better. From How I Write with Tyler Cowen: Keep researching. Use LLMs as an alternative to books and other reading material. Keep publishing what you learn regularly. While reading a chapter, keep asking the LLM. What did you think of that? What just happened there? What should I focus more on? What’s puzzling about this? How do I connect this to something else later or earlier in the book? LLM is better used to support you rather than replace you in areas of your expertise. Where you are an expert it’s best for you to be yourself and have AI fill in the gaps. 
Ask the AI: “What is in my writing that some people might find obnoxious? Or cold / heartless? Explain it to me in great detail.” The first input is context setting and should be really long. Use voice dictation for that instead of typing. Send your blog post to an LLM. No need to explain it. Just let it be the reader and see what it understands and doesn’t understand. His PhD students don’t have a textbook, which saves them some money. But they are required to subscribe to a large language model which ends up costing less. Today, it makes sense to use the best models and pay $200 for it if required. The differences are large. But in some years in the future, the cost of these models may come down for the free versions. Humans know secrets. AI does not. So at least in some areas, humans will have an advantage. Secrets will matter a lot more in the future. Gossip will matter a lot more. How good are you at keeping and trading secrets? Travelling and meeting people will become more important. So will the value of social networks. Since everyone has access to better intelligence, the value of mobilization or being able to do things with people will have higher value. Leadership is an example. The value of your network therefore has gone up a lot. There’s more value in prompting one thing 10 times than 10 things one time. Follow up questions work better than long prompts. There are so many AI note-takers (and transcribers) these days that you are not just writing for an AI but speaking for AIs as well! Which model to use: O1 Pro is the best model. Claude does a decent job. DeepSeek is full of hallucinations but is interesting. It is more imaginative. Use O3 mini to write your prompt first, and then ask the model. Use DeepSeek and other somewhat wacky high-end models once a day so that you stay in touch with what models are capable of (beyond the conventional.) Perplexity has entirely replaced Google for many people. Anthropic’s models are the best writers. 
Gemini is good for long documents and hence for things like legal work. Gemini also has excellent YouTube integration and hence can directly read the transcripts. Grok is very good at fact checking tweets. Converting data into LLM consumable forms will be a huge project. A lot of knowledge is not in such a form and a huge human project will involve this conversion. Indians do not need a visa to enter Thailand. Ref Build apps (not just content) for agents. In the next 3 to 5 years, agents will surpass humans as the top product users. Reliably creating interactive tutorials is hard today. Claude 3.7 Sonnet ran out of tokens when I tried creating an interactive tutorial on diffraction. Cursor got the tokens but failed to get the application right after 3 attempts. This is not yet reliable, and when it does become reliable, education will change a fair bit. #IMPOSSIBLE Tools and solutions should fit within existing workflows. That means almost all capabilities need to be exposed as APIs. LLMs make many different kinds of errors that are useful to differentiate between. Here are a few: Model errors. The model itself makes a mistake. E.g. hallucinations, not following the prompt, etc. Context errors. The model makes a mistake because the question was out of context, or the context was missing. Input errors. The input to the model was parsed incorrectly, e.g. poor audio, poor image OCR, etc. Tool errors. The model’s tools are wrong or not good enough, e.g. retrieval errors. Most browsers are moving away from third-party cookies. Here’s Google’s recommendation on alternatives. The simplest of these is CHIPS, which requires adding a Partitioned cookie attribute. Notes from AI Engineering Summit, NY, Day 1: An agent requires 3 things: a router, tools or skills, and memory. Agents are often sequential, but sometimes parallel execution makes sense for independent tasks that you consolidate. Always allow LLMs the option of NOT answering a question if there is no good answer. 
Focus prompts on the happy path. Use guard rails for edge cases. Here are a few “tools” an agent would need to call: Clarification from user. Saving to memory. Google search. Edit a file introducing SPECIFIC changes. Search in codebase using embeddings. Run scripts on the shell or in a REPL (Python, Node, etc.). Run code in a new container for isolation. Automatically discover and read API documentation, and use it. Modify environment to enable logging and other system changes. When code is cheap, you can explore more ideas and hence design and product management need to approach things differently. We also need to rethink testing completely because AI makes very different kinds of mistakes and we often don’t have an intuition for them. You can have an agent explore all the issues, pull requests, and recent comments against the repository and summarise them for the project manager. Notes from AI Engineering Summit, NY. Session by Lux Capital. Agents make multiple LLM calls. Errors accumulate. So the quality of the model is key. What’s really critical: data + context + user preference. Set up evals for subjective responses by collecting signals continuously. Create scaffolding for agents where errors don’t accumulate. Better yet, make it FIX errors. UX is critical. We need lots more UX styles. YayText converts text to Unicode that has strikethrough, bold, italics, alternate fonts, and other interesting features. So do Unitextify, ConvertCase, and LingoJam. 10 red flags I look for as an angel investor is an interesting read. No real customers: A deck, a landing page, and a “vision” don’t impress me. Show me paying customers. Even better, show me customers coming back. No path to profitability: I don’t care if you raise $100M – if there’s no plan to make money, you’re just burning oxygen. Growth is great, but cash flow keeps you alive. Founders who won’t sell: If you’re scared to get on sales calls, that’s a red flag. 
The best founders sell in the early days – whether it’s to customers, employees, or investors. No differentiation: “Like X, but cheaper” isn’t a strategy. If your only edge is price, you’ll get crushed. What do you have that no one else does? No urgency: The best founders operate like time is running out. If you’re “exploring ideas” or “thinking about raising next year,” you’ve already lost. Raising money before proving anything: Too many founders try to fundraise their way out of bad ideas. If you need VC to get off the ground, you’re building the wrong business. No clear distribution strategy: Product alone doesn’t win. First-time founders obsess over features. Second-time founders obsess over distribution. How are you getting customers? No ownership mentality: If I hear “I need to hire someone to do that” too early, I’m out. Founders who win figure things out before they delegate. A CEO who can’t attract talent: Your first hires are everything. If great people aren’t willing to join, either the vision is weak – or you are. No skin in the game: If a founder won’t invest their own money or take a pay cut to make it work, why should I? By contrast, this OpenAI Deep Research report feels a lot less actionable. Inception Labs offers “Diffusion LLMs”. (No API yet.) They start with random text and refine it in parallel. The benefits are: It’s faster and cheaper due to parallelization and better GPU use. It doesn’t commit to tokens and can fix hallucinations, JSON structure errors, reasoning fallacies, etc. It’s better with multi-modal since images are diffusion based already.
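
On the CHIPS note earlier in this post: here is a rough sketch of what a Set-Cookie value with the Partitioned attribute could look like. The helper name partitioned_cookie is made up for illustration. Partitioned cookies must be set with Secure; the __Host- prefix, Path=/ and SameSite=None choices here are assumptions that suit a third-party embed, so adjust them to your case.

```python
from typing import Optional


def partitioned_cookie(name: str, value: str, max_age: Optional[int] = None) -> str:
    """Build a Set-Cookie header value for a CHIPS (partitioned) cookie."""
    attrs = [
        f"__Host-{name}={value}",  # assumed: __Host- prefix binds the cookie to this host
        "Secure",                  # required for Partitioned cookies
        "Path=/",                  # required by the __Host- prefix
        "SameSite=None",           # assumed: the cookie is used in a cross-site embed
        "Partitioned",             # opt in to per-top-level-site storage
    ]
    if max_age is not None:
        attrs.append(f"Max-Age={max_age}")
    return "; ".join(attrs)


print("Set-Cookie:", partitioned_cookie("session", "abc123", max_age=3600))
```

The string is built by hand because not every Python version's http.cookies module knows about the Partitioned attribute.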

How to Fake Data That Tells a Story

Fake data is usually boring if you analyze it. It’s usually uniform, with no outliers or interesting patterns. If I ask ChatGPT: Generate realistic fake tourism data using these columns: - Age - Nationality - Gender - Income - Booking_Channel - Month - Occupancy_Rate - Travel_Frequency - Spending Run the code and let me download the output as a CSV file. … the output is remarkably boring. Men & women from all countries and ages in every month visit equally. Income and spending are uniformly distributed - and the same pattern holds for all countries and ages. ...
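
To make the "boring" claim concrete, here is a small sketch contrasting independently sampled columns with data that has a relationship baked in. It only illustrates the contrast and is not the method the post goes on to describe. The function names, distributions, and numbers are all assumptions chosen for the example; only the Gender, Income, and Spending column names come from the prompt above.

```python
import random
from math import sqrt
from statistics import mean


def uniform_rows(n: int, seed: int = 0) -> list[dict]:
    """Every column sampled independently and uniformly: no story to find."""
    rng = random.Random(seed)
    return [
        {
            "Gender": rng.choice(["Male", "Female"]),
            "Income": rng.uniform(20_000, 120_000),
            "Spending": rng.uniform(100, 5_000),
        }
        for _ in range(n)
    ]


def patterned_rows(n: int, seed: int = 0) -> list[dict]:
    """Skewed income, spending driven by income, plus a few big spenders."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        income = rng.lognormvariate(11, 0.5)  # right-skewed, like real incomes
        spending = 0.03 * income * rng.uniform(0.5, 1.5)  # depends on income
        if rng.random() < 0.01:
            spending *= 5  # roughly 1% outliers
        rows.append({
            "Gender": rng.choice(["Male", "Female"]),
            "Income": income,
            "Spending": spending,
        })
    return rows


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


for label, rows in [("uniform", uniform_rows(5000)), ("patterned", patterned_rows(5000))]:
    r = pearson([row["Income"] for row in rows], [row["Spending"] for row in rows])
    print(f"{label}: income vs spending correlation = {r:.2f}")
```

In the uniform version, income tells you nothing about spending, so any chart of one against the other is flat. In the patterned version there is a relationship and a handful of outliers for an analyst to find.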

Read from LLMs but don't tell people

In meetings, I pass on questions to ChatGPT and I read out the response. But I’ve stopped saying “I’m reading that from ChatGPT.” (By “ChatGPT”, I mean ChatGPT, Claude, Grok, Gemini, Meta, etc. I happen to use ChatGPT with O3 Mini + Search.) Use ChatGPT in meetings It’s good to bring ChatGPT into conversations. (Or any activity where intelligence helps, actually.) In meetings (online or in person), I keep a ChatGPT window open. When asked: ...

A challenge of blog questions

Thejesh tagged me with these questions. Why did you start blogging in the first place? I started my website in 1997 on Geocities at https://www.geocities.com/root_node/, mostly talking about me. (A cousin once told me, “Anand’s site is like TN Seshan - talking only about himself.” 🙂) (As an aside, I didn’t know that searching for Geocities on Google renders the results in Comic Sans!) I wanted a place to share the interesting links I found. Robot Wisdom by John Barger and Scripting News by Dave Winer were great examples: collection of interesting links updated daily. ...

For a second, I thought I’d been hacked. If you search for “Geocities” on Google, it shows the results in a Comic Sans font. https://www.google.com/search?q=geocities Turns out it’s an easter egg by Google that’s about 8 months old. Are there any Geocities homesteaders here? Or is LinkedIn too new for the old-timers? LinkedIn

Things I Learned - 02 Mar 2025

This week, I learned: Proxmox Virtual Environment is an open-source alternative to VMWare, Hyper-V, Citrix XenServer, etc. (There’s nothing there that prompts me to explore it further.) With Podman on Windows (a Docker equivalent), many Docker-enabled tasks become easier. For example, running PostgreSQL is as easy as: podman run -d --name postgres -e POSTGRES_PASSWORD=postgres -p 5432:5432 postgres:latest podman exec -it postgres psql -U postgres -c "CREATE DATABASE mydb;" Bad deep research prompts are: vague/broad, under-specified or ambiguous. In short, the more you know what you want, the better. Iterate until then. What kind of reports do clients ask research companies to produce? I was curious to see if Deep Research can replace these. Here are a bunch of ideas. ChatGPT Strategy & Management Consulting Research (McKinsey & Company, Boston Consulting Group, Bain & Company, Strategy&, Accenture Strategy) Produce a comprehensive strategic transformation report for a Fortune 500 consumer goods company. Analyze global market trends, competitor strategies, and actionable growth recommendations, including case studies and source citations. Generate an in‐depth study on corporate restructuring trends in emerging markets. Focus on successful turnaround strategies, CEO leadership factors, and strategic pivots, with a comparative analysis of key players. Create a report on M&A trends in the technology sector over the past five years. Detail deal drivers, integration best practices, and forecast future acquisition opportunities, citing relevant data. IT & Technology Research Analysts (Gartner, Forrester Research, IDC, 451 Research, Ovum) Produce a market assessment report on emerging cloud computing platforms. Include vendor evaluations, adoption forecasts, and key technology drivers with supporting data and charts. Generate an in‐depth cybersecurity trends report for enterprise IT. 
Analyze recent threat vectors, defense strategies, and best practices for risk mitigation, providing actionable recommendations. Create a comprehensive study on the impact of artificial intelligence in enterprise software. Include competitive benchmarking, technology adoption rates, and forecasted market changes. Marketing & Consumer Research (Nielsen, Kantar Group, Ipsos, GfK, Euromonitor International) Produce a consumer behavior analysis report for a leading retail brand. Identify key demographic shifts, purchasing trends, and brand loyalty factors, and provide actionable insights with data visualizations. Generate a detailed report on digital media consumption trends among millennials, incorporating survey results, social media analytics, and case studies of successful campaigns. Create a market segmentation report for a new consumer electronics launch. Identify key consumer segments, behavioral drivers, and media usage patterns with clear recommendations. Financial Investment Research (Goldman Sachs, JPMorgan Chase, Morgan Stanley, Morningstar, Keefe Bruyette & Woods) Produce an equity research report on mid-cap technology stocks. Include detailed financial modeling, valuation analysis, and buy/sell/hold recommendations with supporting data and charts. Generate a fixed income analysis report for corporate bonds in the industrial sector. Assess credit risk, yield forecasts, and macroeconomic influences, citing key data sources. Create a comprehensive report on global market trends impacting investment banking. Analyze regulatory changes, market sentiment, and performance metrics of leading financial institutions. Healthcare Research (IQVIA, Frost & Sullivan, Evaluate Ltd, Deloitte Healthcare, IMS Health) Produce a market analysis report on emerging biotechnologies in oncology. Include competitive landscape, regulatory challenges, and growth forecasts with relevant case studies. 
Generate a comprehensive report on patient satisfaction and telemedicine adoption trends. Analyze survey data from leading healthcare providers and benchmark best practices. Create a detailed study on pharmaceutical market dynamics in emerging economies. Focus on pipeline developments, regulatory environments, and market potential with actionable insights. Legal Research Providers (LexisNexis, Westlaw, Bloomberg Law, Fastcase) Produce a legal risk assessment report on the impact of recent data privacy regulations for multinational corporations. Include case studies, trend analysis (2019–2024), and strategic recommendations. Generate a comprehensive report summarizing key federal and Supreme Court rulings on intellectual property rights over the past five years, highlighting trends and divergent interpretations. Create a detailed report on the evolution of securities law and its effect on investment research practices, incorporating analysis of recent litigation and regulatory updates. Media & News Research (Factiva, Kantar Media, Comscore, Cision) Produce a media consumption trends report that analyzes audience behavior shifts across digital, TV, and print platforms. Include data visualizations, key drivers, and forecasted trends. Generate a comprehensive report on the impact of social media on traditional news reporting, with case studies and a comparative analysis of engagement metrics. Create a detailed study on the effectiveness of multimedia advertising campaigns, evaluating ROI, consumer engagement, and best practices with actionable insights. Economic & Industry-Specific Research (Economist Intelligence Unit, BMI Research, IHS Markit, Consensus Economics) Produce a macroeconomic outlook report for emerging markets, including GDP, inflation, and employment forecasts, with detailed data analysis and visualizations. Generate an industry analysis report on the automotive sector, covering technological innovations, competitive dynamics, and consolidation trends. 
Create a comprehensive country risk assessment report for a target region, detailing political, economic, and regulatory factors with recommendations for investors. Human Resources & Employee Engagement Research (Gallup, Great Place to Work, Mercer) Produce an employee engagement report for a multinational firm based on recent survey data. Identify key drivers of satisfaction, retention challenges, and improvement recommendations. Generate a comprehensive study on the impact of remote and hybrid work models on employee productivity across industries, including best practices and benchmark data. Create a detailed report on workplace culture transformation, analyzing organizational behavior trends, employee feedback, and actionable strategies to boost engagement. Environmental, Social & Governance (ESG) Research (MSCI ESG Research, Sustainalytics, ISS ESG, Bloomberg ESG) Produce an ESG performance report for a portfolio of global companies. Include sustainability scores, risk assessments, and recommendations for improvement with data visualizations. Generate a comprehensive study on the impact of climate change regulations on the energy sector, including policy analysis, market forecasts, and strategic implications. Create a detailed report on corporate social responsibility trends in the consumer goods industry, incorporating qualitative and quantitative analyses with actionable recommendations. Education & Academic Research (RAND Corporation, National Center for Education Statistics, HolonIQ) Produce an analysis report on the future of online education, examining technological adoption, market growth projections, and student outcome trends with supporting data. Generate a comprehensive study on the effects of educational policy reforms on public school performance in the U.S., including trend analysis and actionable recommendations. 
Create a detailed international higher education trends report, covering tuition dynamics, international student mobility, and emerging academic programs with comparative data. Real Estate & Property Research (CBRE, JLL, CoStar Group, Cushman & Wakefield) Produce a commercial real estate market analysis report for major urban centers, including occupancy trends, rental rate forecasts, and investment opportunity assessments. Generate a comprehensive study on residential housing market dynamics in emerging economies, focusing on affordability, supply-demand gaps, and policy impacts. Create a detailed report on the impact of urban redevelopment projects on local real estate values, including case studies, forecasts, and strategic recommendations. Energy & Natural Resources Research (Wood Mackenzie, Rystad Energy, Bloomberg New Energy Finance) Produce an analysis report on global renewable energy trends, covering technology adoption, market forecasts, and key policy drivers, with detailed data and visuals. Generate a comprehensive commodity price forecasting report for oil, natural gas, and key metals, incorporating historical trends, risk assessments, and predictive modeling. Create a detailed report on energy transition strategies for traditional energy companies, focusing on clean technology investments and market adaptation strategies. Supply Chain & Logistics Research (ARC Advisory Group, Gartner Supply Chain Research, Supply Chain Insights) Produce a report on supply chain resilience for global manufacturers. Analyze risk factors, digital transformation impacts, and best practices for operational efficiency with supporting data. Generate a comprehensive study on the impact of technology on logistics networks, including case studies on digital optimization and cost reduction strategies. Create a detailed report on emerging last-mile delivery solutions, assessing innovations, consumer expectations, and scalability with actionable insights. 
Cybersecurity & Information Security Research (KuppingerCole, Forrester Security, IDC Cybersecurity, Cybersecurity Ventures) Produce an in-depth report on emerging cybersecurity threats for large enterprises, including detailed analysis of recent incidents, risk vectors, and defense strategies. Generate a comprehensive cybersecurity market landscape report, evaluating vendor performance, technology forecasts, and best practices for mitigating risks. Create a detailed report on regulatory compliance trends in information security within the financial services industry, with case studies and strategic recommendations. Social Media, Digital & Online Research (Comscore, SimilarWeb, Brandwatch) Produce a digital audience behavior report for a global brand, focusing on social media trends, engagement metrics, and platform performance with detailed data analysis. Generate a comprehensive analysis of influencer marketing effectiveness across digital channels, including ROI metrics, case studies, and best practices. Create a detailed report on online brand sentiment analysis, incorporating social listening data, trend forecasts, and actionable recommendations. Public Opinion & Political Research (Pew Research Center, Gallup, YouGov) Produce a public opinion polling report on voter sentiment ahead of a major election. Include demographic breakdowns, key issue analysis, and trend visualizations for the past five years. Generate a comprehensive study on political risk in emerging markets, analyzing historical data, current trends, and future projections, with policy recommendations. Create a detailed report on the influence of media on public policy, using survey data, social media analysis, and comparative case studies. Sports, Entertainment & Media Research (Nielsen Sports, Sportcal, Kantar Media Sports) Produce a market analysis report on sports sponsorship trends, detailing viewership metrics, brand engagement, and investment ROI with industry case studies. 
Generate a comprehensive report on audience behavior in the streaming media industry, including demographic insights, consumption trends, and competitive benchmarks. Create a detailed analysis of digital advertising effectiveness in the entertainment sector, including segmentation data, ROI analysis, and strategic recommendations. Innovation, R&D & Technology Trends Research (Innosight, Frost & Sullivan Innovation, CB Insights) Produce a global R&D investment trends report, analyzing technology spending, innovation indices, and the impact on market growth across key industries. Generate a comprehensive study on disruptive technologies in manufacturing, including competitive analysis, market potential forecasts, and adoption trends. Create a detailed report on emerging innovation hubs worldwide, focusing on startup ecosystems, funding trends, and collaborative opportunities in technology. Agriculture & Agribusiness Research (Rabobank Agribusiness Research, USDA Economic Research Service, AgFunder) Produce an analysis report on global agricultural market trends, including crop yield forecasts, trade dynamics, and policy impacts, with data visualizations. Generate a comprehensive study on agritech innovations such as precision farming and sustainable practices, including case studies and market forecasts. Create a detailed report on the impact of climate change on food production and supply chain stability in agribusiness, with risk assessments and strategic recommendations. Environmental & Climate Change Research (Carbon Trust, IHS Markit Energy Transition, Bloomberg New Energy Finance) Produce a report on the economic and social impacts of climate change on urban infrastructure, including forecasting models and policy recommendations. Generate a comprehensive study on national climate policies and their effects on industrial competitiveness, with detailed trend analysis and source citations. 
Create a detailed report on corporate sustainability initiatives, assessing environmental risk management practices and providing actionable recommendations for improvement. Customer Experience (CX) & User Experience (UX) Research (Forrester CX Research, Gartner CX Research, Qualtrics, Nielsen Norman Group) Produce a report on customer journey mapping for a leading retail brand, identifying key touchpoints, pain points, and actionable improvement strategies with data visualizations. Generate a comprehensive study on digital user experience trends for e-commerce platforms, including usability testing insights, design best practices, and conversion optimization recommendations. Create a detailed report on customer satisfaction and loyalty metrics across multiple industries, integrating survey data and actionable recommendations to enhance overall CX. Blockchain, Cryptocurrency & Fintech Research (Chainalysis, CoinDesk Research, Deloitte Fintech Research, CB Insights) Produce an analysis report on emerging blockchain technologies and their applications in financial services, including market trends, adoption forecasts, and case studies. Generate a comprehensive study on cryptocurrency market dynamics, analyzing regulatory developments, investor sentiment, and competitive landscapes with source citations. Create a detailed report on fintech disruption in traditional banking, with case studies on leading startups, technology adoption, and future market forecasts. Venture Capital, Startup & Private Equity Research (PitchBook, CB Insights, Crunchbase, Preqin) Produce a global venture capital investment trends report, including performance analysis of high-growth startups, sector benchmarks, and emerging market opportunities. Generate a comprehensive study on private equity market dynamics, covering deal flow analysis, exit strategies, and forecasted trends with supporting data. 
Create a detailed report on emerging startup ecosystems in key regions, highlighting funding trends, investor activity, and growth potential with actionable insights. Operations Research & Management Science Consulting (The Brattle Group, NERA Economic Consulting, CRA International) Produce a report on optimization techniques for operational efficiency in large-scale manufacturing, including quantitative analysis, simulation models, and case studies. Generate a comprehensive study on the application of predictive analytics in supply chain management, focusing on data modeling, process improvements, and actionable insights. Create a detailed report on advanced quantitative modeling approaches to solve complex business problems in logistics and operations, including scenario analysis and recommendations. Cultural & Social Research (Ethnographic/Sociocultural Studies) (Ipsos MORI, Kantar TNS, YouGov) Produce a qualitative ethnographic study on urban consumer lifestyle trends, incorporating field observations, interviews, and cultural analysis with actionable insights. Generate a comprehensive study on how cultural shifts influence global brand perception, including comparative case studies and trend analysis. Create a detailed report on sociocultural dynamics and consumer behavior in emerging economies, integrating in-depth field research and actionable recommendations. Economic & Demographic Research Firms (Oxford Economics, The Conference Board, CEIC Data) Produce a macroeconomic forecasting report for a specific region, including GDP, inflation, and employment trends with detailed data visualizations and source citations. Generate a detailed demographic analysis report for a target market, highlighting age distribution, income levels, and consumption patterns with actionable insights. Create a comprehensive report on the economic impact of demographic shifts on consumer markets, with policy recommendations and trend analysis. 
Academic & Think Tank Research Organizations (Brookings Institution, RAND Corporation, Carnegie Endowment for International Peace) Produce a policy research report on global governance challenges and their implications for economic development, including case studies, literature reviews, and expert interviews. Generate a comprehensive study on social inequality and its effects on public health and education outcomes, supported by empirical research and trend analysis. Create a detailed report on emerging trends in international relations and their impact on global trade and security, integrating academic research and data analytics. Market Research Technology & Software Providers (Qualtrics, SurveyMonkey, Confirmit) Produce a report on the latest innovations in survey technology and data analytics software for market research, including product comparisons, user case studies, and future trend forecasts. Generate a comprehensive study on the integration of AI and machine learning in consumer insights platforms, highlighting case studies, performance metrics, and industry benchmarks. Create a detailed report on digital transformation trends in market research technology, featuring analysis of leading software solutions, market share data, and recommendations for technology adoption. When evaluating inputs, models tend to prefer the first response, prefer their own response, and prefer longer responses. ThursdAI Real-time speech-to-text options for transcription: Deepgram has a MediaRecorder API, which is perfect. Whisper Streaming Web is a web app that can transcribe audio in real time from the browser. A good approach, but I wouldn’t use it for meeting transcription on my mid-range laptop. Streaming takes up the bulk of my GPU, leaving little for transcription. whisper-live runs as a Python console app and does something similar. Whisper WebGPU runs in the browser (only 200MB). Cool! But slow and still takes up GPU. 
Mini-omni is an open-source Qwen-based LLM that can hear and talk while thinking in real time. An interesting experiment, but not for prototyping. OpenAI shares a report with clients on what different professions search for. Doctors search for: Is my diagnosis right? How do I read this report? Is my prescription correct? Is there a cheaper medicine? What’s the life expectancy given these symptoms? Dataclasses in Python have a slight overhead over named tuples. The 2 main uses I see for them are: providing defaults and offering type hints. UVB 76 is a radio channel that has been broadcasting static (with occasional Russian conversation) since 1976. No one knows why. It’s live at https://m.youtube.com/watch?v=8h_D2P0iqMk Romans washed clothes in urine. The government taxed the purchase of urine for commercial purposes! That’s the origin of the phrase “Pecunia non olet” which means “money doesn’t stink”. Nix is a package manager that creates container-like environments. Like a cross between Docker and apt / venv. It has an immutable file system. DevBox is a higher-level tool built on top of Nix that streamlines developer workflows, e.g. common project environment setup. VS Code can be used to develop inside a Docker container via Podman, too. Set "dev.containers.dockerPath": "podman" in the settings. Ref Rill Data is an interesting BI tool based on DuckDB. It auto-generates a dashboard given a dataset. It’s possible to assign “variables” in SQL (notably in DuckDB) by defining each one as a single-row CTE. Here’s an example: WITH sessions AS (FROM events SELECT COUNT(DISTINCT session_id) AS value), pages AS (FROM events SELECT COUNT(*) AS value) FROM sessions, pages SELECT pages.value / sessions.value AS pages_per_session; DuckDB has a GROUP BY * (the documentation spells it GROUP BY ALL) that groups by every non-aggregated column in the SELECT list. SELECT x, y, COUNT(*) FROM t GROUP BY * is equivalent to SELECT x, y, COUNT(*) FROM t GROUP BY x, y. 
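The CTE-as-variable pattern above is not specific to DuckDB. Here is a minimal sketch using Python's built-in sqlite3 module, so it runs without installing anything. The events table and its four rows are made up for illustration, and the query uses the standard SELECT ... FROM ordering because SQLite does not accept DuckDB's FROM-first syntax. It computes pages per session as page events divided by distinct sessions.

```python
import sqlite3

# Hypothetical sample data: 4 page events across 2 sessions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (session_id TEXT, page TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("s1", "/home"), ("s1", "/pricing"), ("s1", "/docs"), ("s2", "/home")],
)

# Each CTE returns exactly one row, so it behaves like a named variable.
# The cross join of two single-row CTEs yields a single row.
query = """
WITH sessions AS (SELECT COUNT(DISTINCT session_id) AS value FROM events),
     pages    AS (SELECT COUNT(*) AS value FROM events)
SELECT 1.0 * pages.value / sessions.value AS pages_per_session
FROM sessions, pages
"""

pages_per_session = conn.execute(query).fetchone()[0]
print(pages_per_session)
```

The `1.0 *` forces floating-point division, since SQLite divides two integers as integers.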
VS Code can be used as a code executor by adding {"key": "shift+enter", "command": "workbench.action.terminal.runSelectedText", "when": "editorFocus"} to the keybindings.json file. Press Shift-Enter to run the selection in the terminal. Useful for DuckDB, SQLite, etc. Ref LLMs are excellent at database migration. They can convert schemas and queries across SQL dialects (e.g. BigQuery to DuckDB) at 90%+ accuracy. This is useful when clients want to migrate cloud providers, go from on-prem to cloud, or reduce cost by switching databases.
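Since the accuracy is 90%+ rather than 100%, a converted query is worth checking against the original. One rough way to do that, sketched below, is to run both on the same sample data and compare the rows. Everything here is an assumption for illustration: the function name same_result, the orders table, and all three queries are hypothetical, and sqlite3 stands in for both the source and target databases, which in a real migration would be two different engines.

```python
import sqlite3


def same_result(conn, original_sql, converted_sql):
    """Return True if both queries return the same rows, ignoring row order."""
    original_rows = conn.execute(original_sql).fetchall()
    converted_rows = conn.execute(converted_sql).fetchall()
    # Sort by repr so rows containing NULLs (None) can still be ordered.
    return sorted(original_rows, key=repr) == sorted(converted_rows, key=repr)


# Hypothetical sample data, including a NULL amount.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("a", 10), ("a", None), ("b", 5)],
)

original = "SELECT customer, SUM(IFNULL(amount, 0)) FROM orders GROUP BY customer"
# A faithful conversion: COALESCE replaces the dialect-specific IFNULL.
converted = "SELECT customer, SUM(COALESCE(amount, 0)) FROM orders GROUP BY customer"
# A plausible-looking but wrong conversion: counts rows instead of summing.
wrong = "SELECT customer, COUNT(amount) FROM orders GROUP BY customer"

print(same_result(conn, original, converted))
print(same_result(conn, original, wrong))
```

Matching results on sample data do not prove two queries are equivalent, but a mismatch is a cheap way to catch the conversions that went wrong.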