Things I Learned

Things I Learned - 18 May 2025

This week, I learned: Birds navigate using quantum entanglement! Guardian ChatGPT DeerFlow is an open source Deep Research MCP. Lets you run deep research outside of the standard chatbots. ⭐ Today, if I had to store a bunch of data files (e.g. parquet) under 1GB, I would use GitHub Releases. Here are options: GitHub Releases. 2 GiB per file, unlimited total & bandwidth. 🟢 Immortal URL, versioning, easy CI publish. 🔴 Each file must stay < 2 GiB; no built-in SQL. Zenodo (CERN). 50 GB per record; one-off bumps to 200 GB. 🟢 DOI assignment, archival mandate. 🔴 Occasional throttled bandwidth; no API for partial file reads. Hugging Face Hub. 300 GB per repo; 50 GB per file. 🟢 Git-based, dataset tooling, lively ML community. 🔴 Large files need git-LFS; pushes via LFS can be slow. Cloudflare R2. 10 GB storage & 1 M ops / month. 🟢 S3 API, zero-egress to Cloudflare Workers, fast. 🔴 10 GB cap below your 50 GB target. Kaggle Datasets. 20 GB per dataset, public only. 🟢 Built-in notebooks & GPU. 🔴 No programmatic SQL API; quotas sometimes change. data.world (free). 1 GB total, 100 MB per dataset. 🟢 Nice social features. 🔴 Too small for your size. If I had to query a bunch of data files in an external Parquet or SQLite file, here are SQL engines-as-a-service: MotherDuck. 10 GB storage + 10 CU-hrs/mo compute. Native DuckDB; no credit card; GA June 2024; monthly feature drops. Datasette Cloud. Two-month trial (or 1-yr for non-profits). SQLite backend. Great UX; but not free forever for general use. AWS Athena. Pay-per-TB scanned; no free tier; S3 fees after 12 mo. Costs creep quickly; free-tier S3 ends after a year. Bootstrap has a .stretched-link that makes a link cover the containing block. A clever trick that I discovered when Claude 3.5 Sonnet wrote my code. Discovered spray and peel paints at ArtFriend. I had no idea that was a thing. Gemini Live API is the real-time equivalent from Gemini. It supports tools, search, and code execution. 
mcp-mem0 is an MCP for memory. llm-min.txt compresses docs for LLMs to read optimally. Like a compressed llms.txt or context7. Usage: GEMINI_API_KEY=... uvx llm-min -i $DIR #ai-coding There’s a lot of action on encrypted LLM operations. Responses API allows reasoning tokens to be encrypted if organizations don’t want their reasoning data to persist. Ref Tinfoil (YC X25) offers an OpenAI-compatible inference API where data is encrypted from the client to the NVIDIA Hopper/Blackwell GPUs in confidential computing mode. Prompts, model weights, outputs are encrypted in transit and memory, with verifiable privacy on code running in GPU. Modelyo (Israel) offers VMs/K8s clusters with encrypted GPUs across multiple cloud providers with continuous attestation, managed on Modelyo’s portal. ⭐ LLMs are able to do things independently longer and longer. That’s a useful metric to track. METR: Measuring AI Ability to Complete Long Tasks. If you’re looking for datasets / APIs related to research publications (especially funding), then explore: Crossref API and snapshots OpenAlex API and snapshots which is funded by OurResearch. OpenAlex is like Crossref but includes some disambiguation OpenAIRE Graph 2024 / 2025 Europe PMC dataset To avoid Ubuntu 24 suspending on closing the laptop lid, use one of these and restart: /etc/systemd/logind.conf: Set HandleLidSwitch=ignore /etc/UPower/UPower.conf: Set IgnoreLid=true UV_TORCH_BACKEND=auto uv pip install torch torchvision torchaudio installs the most appropriate PyTorch version. Ref Cog is a Python-based templating language. It is embedded as comment chunks in any file and replaces itself with the output of the Python code you write. CloudFlare Zero Trust seems the easiest way to enable auth on static websites, especially if your DNS is already on Cloudflare. No cost We could “fine-tune” system prompts automatically with evals, creating a “system prompt learning” paradigm – like my promptevals. 
Andrej Karpathy I was asked how to improve speed when building an enterprise ChatGPT clone using an API. Here’s what I’d suggest, in order: Streaming. High impact, low effort. Caching RAG retrieval as well as generation. High impact, low effort. UI tweaks. Loading / streaming icons and progress hints (“Retrieving context”, “Generating answer”, etc.) Parallelize, if possible Use model options where available, e.g. speculative decoding, models with higher speed, models with closer CDN, etc. Shorten prompts Persistent HTTP/2 Keep-Alive. Low impact, low effort (tweak server settings). Cloudflare Vectorize, at 768 dimensions / embedding, is free for ~6.5K chunks storage at ~1,000 queries / day. For a light load like 1M 768d chunks queried 1K times a day, the cost is: ChatGPT NVIDIA parakeet is a lightweight speech-to-text model that leads benchmarks. Installing such packages continues to be a nightmare due to PyTorch (despite uv). I explored the real-time avatar space. Heygen seems to be the easiest to use, but even that is complex and expensive ($99/mo). We may need to wait a few months for avatars to explode. ⭐ Model reliability is a huge enabler for performance. As models become more reliable, they can work autonomously for longer and that is another kind of scaling. Vending Bench ChatGPT, Gemini, etc. have become lead generation engines. Chat Bot Optimization (CBO), is it? WhatsApp + ChatGPT ⭐ Never live delete data. Mark it for deletion and schedule a deletion task. That way you have time to react to mistakes. Simon Willison Pandoc has several options useful when converting Markdown to HTML (cat file.md | pandoc -f markdown -t html). My favorites: --no-highlight skips code-highlighting. --highlight-style=pygments adds Pygments styling --wrap=none doesn’t hard-wrap lines in the output --number-sections adds section numbering (<h2>1. 
Introduction</h2>) --shift-heading-level-by=NUM – shift all headings by NUM levels (e.g., start at <h2> instead of <h1>) pandoc -f markdown-auto_identifiers drops the auto-identifiers extension that generates id=... for each heading pandoc -f gfm uses GitHub flavored Markdown. Run pandoc --list-extensions=gfm to identify the extensions it uses. Pandoc’s Markdown extension examples are quite extensive. Auto-enabled GFM extensions: alerts: GitHub-style callouts (info, tip, warning) via > [!TYPE] blocks. autolink_bare_uris: Turns bare URLs into links, without needing <...>. emoji: Parses :smile:-style codes into Unicode emoji characters. footnotes: Enables footnote syntax with [^id] and definitions at the bottom. gfm_auto_identifiers: Uses GitHub’s heading-ID algorithm: spaces → dashes, lowercase, removes punctuation. pipe_tables: Enables pipe tables. raw_html: Raw HTML is passed through unchanged. strikeout: Enables strikethrough with ~~text~~. task_lists: Parses - [ ] and - [x] items as checkboxes. yaml_metadata_block: YAML front matter for document metadata, e.g. <title> GFM extensions worth enabling: ascii_identifiers: Strips accents/non-Latin letters in automatically generated IDs. bracketed_spans: [Warning]{.alert} becomes <span class="alert"> definition_lists: Term\n: Definition text becomes a definition list fenced_divs: ::: {.note} block creates a <div class="note">...</div> implicit_figures: Standalone images become <figure> with <figcaption>. implicit_header_references: [Section] is treated as [Section](#section) raw_attribute: `<b>bold</b>`{=html} is inserted as HTML smart: Converts straight quotes to curly, -- to en-dash, --- to em-dash, ... to ellipsis. subscript & superscript: E.g. H~2~O and E = mc^2^
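The "never live delete data" note above can be sketched roughly like this. It is a minimal illustration, assuming a SQLite table called docs, a deleted_at column holding epoch seconds, and a 7-day grace period. All of those names and numbers are hypothetical, not from the original note.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

GRACE = timedelta(days=7)  # hypothetical grace period before real deletion


def connect():
    """Create an in-memory demo database with a soft-delete column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT, deleted_at REAL)"
    )
    return conn


def mark_deleted(conn, doc_id, now):
    """Mark a row for deletion instead of deleting it right away."""
    conn.execute(
        "UPDATE docs SET deleted_at = ? WHERE id = ?", (now.timestamp(), doc_id)
    )


def restore(conn, doc_id):
    """Undo a mistaken deletion while the grace period is still running."""
    conn.execute("UPDATE docs SET deleted_at = NULL WHERE id = ?", (doc_id,))


def visible(conn):
    """Rows the application should show: anything not marked for deletion."""
    return conn.execute(
        "SELECT id, body FROM docs WHERE deleted_at IS NULL ORDER BY id"
    ).fetchall()


def purge(conn, now):
    """The scheduled task: physically delete rows marked longer ago than GRACE."""
    cutoff = (now - GRACE).timestamp()
    cur = conn.execute(
        "DELETE FROM docs WHERE deleted_at IS NOT NULL AND deleted_at <= ?",
        (cutoff,),
    )
    return cur.rowcount
```

In practice purge() would run from a cron job or task queue, and the grace period is the window you have to notice a mistake and call restore().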

Things I Learned - 11 May 2025

This week, I learned: snapdom is a fast, light, element capture alternative to html2canvas but doesn’t work well with non-CORS images or iframes. Sli.dev is a Markdown slide language. Similar to Marp Don’t split your code into microservices until you need to scale. Ref Vibe coding is like getting others’ code to work, which is exactly what most devs do. Simon Willison #ai-coding Tofu Yakitori is a Japanese dish. It’s like a dhokla. Marinated tofu cubes brushed with that sweet‑savory tare (soy, mirin, sake, a hint of sugar), then grilled until caramel‑charred. One of the better (tasty + different) dishes I’ve had recently. I used ChatGPT to remind me of the dish name. Trust, attitudes and use of artificial intelligence surveyed ~1,000 people across 47 countries on their views on AI. PDF Emerging economies trust and use AI more. It’s an opportunity to leapfrog. 26% of students use AI daily (vs 17% employees). Efficiency is the main benefit. Gemini APIs now have automatic caching for 75% cost reduction if message is >1K (Flash) or >2K (Pro) tokens. Ref YOLO is much better than Gemini at object detection. Use for pre-processing. Ref Using [[n]] is probably the best citation format for inline search references in RAG. ChatGPT ⭐ Double-checking is surprisingly efficient since LLM hallucinations are mostly uncorrelated. LLMs perform human tasks (e.g. classifying customer support messages) at ~85% accuracy. This might be unacceptable. But by asking 2 moderately correlated LLMs and double-checking discrepancies, we reduce automation by ~20% but reduce errors to 0.25%. Triple-checking reduces automation by ~25% but errors to under ~0.01%! Ref Anthropic introduces web search in the API at $10 / 1K searches. Here’s how it compares: $0.1: DuckDuckGo Search API (RapidAPI) (monthly pricing) $3: Brave Search API $5: Google Custom Search JSON API $15: SerpAPI $10: Zenserp $10: Anthropic Web Search Tool $25: Bing Search API $35: Gemini API $35: OpenAI API India attacked Pakistan! 
⭐ When writing notes, summarize at the end of the day the learnings and next steps. GitHub does not let you control the cache duration, but there are many creative workarounds. ChatGPT HTML meta tags: <meta http-equiv="Cache-Control" content="no-cache, no-store, must-revalidate"> Use a service worker (blog) Proxy through a CDN. Cloudflare, Netlify Move to another static host: S3 + CloudFront, Heroku, Vercel, Surge, Firebase Hosting Notes from the PromptEvals paper: Good evals must be: Objectively MEASURABLE (even if by an LLM). Otherwise, we won’t know if it’s right. Directly RELEVANT to the input/prompt. Otherwise, we’re not evaluating the input. Typical evals fall into 6 categories Structured output: Adhere to a schema (Markdown, HTML, DSL, JSON + Schema) Multiple choice Length constraints: N characters, words, sentences, list items, etc. Semantic constraints: Exclude terms, topic relevance, follow grammar, etc. Stylistic constraints: Style, tone, persona Prevent hallucinations: Factual accuracy. Instruction following
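A few of the eval categories above (structured output, length constraints, semantic constraints) are easy to make objectively measurable in code. Here is a minimal sketch. The function names and the example checks are my own illustration, not taken from the PromptEvals paper.

```python
import json


def is_json_with_keys(output, required_keys):
    """Structured output: the text parses as a JSON object with the given keys."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and all(key in data for key in required_keys)


def within_word_limit(output, max_words):
    """Length constraint: at most max_words whitespace-separated words."""
    return len(output.split()) <= max_words


def excludes_terms(output, banned_terms):
    """Semantic constraint: none of the banned terms appear (case-insensitive)."""
    lowered = output.lower()
    return not any(term.lower() in lowered for term in banned_terms)


def run_evals(output, checks):
    """Run every named check on one model output and return pass/fail per check."""
    return {name: check(output) for name, check in checks.items()}


# Hypothetical checks for a prompt that asks for a short JSON summary.
checks = {
    "structured": lambda out: is_json_with_keys(out, ["title", "summary"]),
    "length": lambda out: within_word_limit(out, 50),
    "no_hedging": lambda out: excludes_terms(out, ["as an AI"]),
}
```

Stylistic and hallucination checks usually need an LLM judge, but they can plug into the same run_evals() shape as one more callable.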

Things I Learned - 04 May 2025

This week, I learned: Among the popular exams in India, UPSC seems the most restrictive: bachelor’s degree, age 21-32, 6 attempts, reservation applies. CMA seems the least: 10th pass, any age, any number of attempts, no reservation. NDA is interesting. 10+2, age 16.5-19.5, any number of attempts, no reservation. But you must be unmarried! ChatGPT I asked a few Ollama models How do I undo fish_add_path (a typical question I have on a flight). My takeaway is you need an 8b model to answer this kind of question, and for now, qwen3 beats the others. qwen3:8b: Took 2:12 min. Shared many good (correct) options. deepseek-r1:8b: Took 5:19 min. Shared a couple of correct solutions. Not as good as qwen3 gemma3:3b: Suggested I use the (nonexistent) fish_remove_path deepcoder:1.5b: “I’m sorry, but I can’t assist with that request”. The Dia text to speech model people rave about has inconsistent quality. Not recommended. Nvidia’s OpenMathReasoning 1.5b model beats MUCH larger models at math. Their training dataset is a massive 3.2M rows of math problems with DETAILED thinking traces. Policy making is a new super skill. Since AI will automate a lot of things, the ability to craft policies that will optimize AI work will be powerful. Data driven policy making could become a major thing. For example, how do we structure coding policies so that AI can automatically code continuously and deploy it? It might be interesting to create a Nomic-like game to enable this. Saregama Carvaan supports USB sticks but only FAT, not NTFS or exFAT. To convert my NTFS USB drive to FAT, I ran: ServerHunter.com seems to have the best search for low-cost hosting providers. MassiveGrid currently offers the cheapest servers – even lower than Hetzner. sqlite3 my_database.db .dump | gzip is a more efficient way to copy SQLite databases than the original if you have indices. 
Ref Notes from the Garry Tan - Knowledge Project podcast: Funding people who want to solve a problem is better than funding people who want to start a company. Concentration of good people is very powerful. It doubles the chances of being a unicorn. Sales is a discovery problem. There are 100 boxes of which five have a gold nugget. Rather than gingerly open the first, afraid of finding nothing, open them all as quickly as you can. A quick no is very helpful. Berkshire Hathaway is hard to replicate because the character of its founders, Charlie Munger and Warren Buffett, is hard to replicate. Y Combinator has the character of Paul Graham. This means that some kinds of success may not last long because they are hard to replicate. A trend in the 2020s is startups with under 10 employees hitting $10m revenue. Soon we will see them hitting $100m. AI increases labour leverage while cloud computing increased capital leverage. Having too many people is a disadvantage. It slows people down. Founders lose control. The opposite of: hire the best people and give them freedom. Don’t hoard smart people - let them solve real problems out there. nocodb 54,107 ⭐ May 2025 and teable 18,116 ⭐ May 2025 are self-hostable Airtable alternatives. Teable has AI support. Windsurf has unlimited tab completion on the free plan, unlike Copilot, which offers 2,000 completions a month. Recursive LLM prompts that change themselves are an interesting idea. It might be interesting to see LLMs play Nomic. Like here. Notes from AI Snake Oil PCs took 3 years to hit 20% of US population. ChatGPT took 2 years for 40%. But it’s a lot cheaper, and a lot less used (0.5-3.5% of work hours). Maybe Gen AI adoption is slower than PCs. The jagged edge of capability: some things will become MUCH easier while others don’t. The relative mix determines who goes out of a job and which tasks get fully automated. Benchmarks are rare in areas where AI is weak. 
Factory electrification took 40 years - to redesign the layout & process; change the org structure & policies; hiring & training practices. AI diffusion could take as long. Therefore, the ability to re-structure a workflow end-to-end will be an advantage. Several areas of low AI capability will improve slowly because the feedback is slow due to safety regulations, human adoption speed, lack of clarity on what is better, slow physical feedback (e.g. growing trees), etc. Human intelligence is in the use of technology. AI is one more such technology. We know of good system safety controls in complex systems like aircraft, power grids, engineering, chip design, healthcare, cyber-security, etc. Circuit-breakers, predefined rules, audits & monitors, access control, formal verification, etc. Even if everything humans do TODAY is automated, it doesn’t mean we won’t have work. It just shifts to what we’re not doing today. We stopped work 4,000 years ago, with the agricultural revolution. The plant/livestock does all the growing. We just manage them, moving stuff around. We stopped work 400 years ago, with the industrial revolution. Machines do the moving. We just manage them, computing the moves. We stopped work 40 years ago, with the information revolution. Computers do the computation. We just manage them, thinking how. Most future tasks will be managing AI that do the thinking. ngrok http on the CLI can be used in surprisingly versatile ways: ngrok http file://$PWD to serve local files --compression for gzip compression --host-header=example.com to set the Host header --response-header-add "Access-Control-Allow-Origin: *" to enable CORS --basic-auth='user:password' for basic auth --oauth google --oauth-client-id $CLIENT_ID --oauth-client-secret $SECRET --oauth-allow-domain gramener.com --oauth-allow-email ... for Google Auth. It supports other oauth providers as well as OIDC. 
--ua-filter-deny ".*bot$" to reject user agents ending with bot ChatGPT query costs under 3Wh (more likely 0.3Wh – but let’s assume 3Wh). That is 3 laptop minutes. It’s 10X better to use ChatGPT than to take 30 min to use your laptop to write what it does. Also, going vegan is at least 1000 ChatGPT uses a day of carbon footprint. Showering 30 seconds less is 1,200 ChatGPT uses. Ref Though the Element Capture and Region Capture APIs are “fully supported” by Edge, Chrome, and Opera, they didn’t work for me on Edge on Linux. Do LLMs perform better if you curse at them? LinkedIn Streamdown is a CLI markdown streaming processor. uvx streamdown --exec 'llm chat' lets you chat with an LLM using Markdown formatting. It’s still a little rough at the edges. Cupping therapy provides short-term pain relief for chronic low-back, neck & general musculoskeletal pain but other benefits are not as clearly evident. BTW, homeopathy doesn’t help or hurt. Ayurveda helps with stress. ChatGPT uv now supports: pylock.toml, the new lock file standard PEP 751 --env-file multiple times, allowing layered secrets --exclude-newer installs versions before a specific date --overrides overrides versions a package specifies --constraints limits the version of the package It’s interesting how many places offer free compute via shells (apart from Google Colab): Google Cloud Shell: Free for 50 hours/week, refreshed every Monday. Sessions last up to 12 hours and terminate after ~1 hour inactivity. Ref Azure Cloud Shell: Always free to use with 5 GB free storage for first 12 months (standard rates after). No documented session limits but typically times out after prolonged inactivity. Ref AWS Cloud9: Free IDE, underlying compute free under AWS Free Tier (750 hours/month EC2 t2.micro or t3.micro for first 12 months). Regular EC2 rates apply afterward. Ref Gitpod: Free tier offers 500 credits/month (~50 hrs). Workspaces run up to 8 hours/session and stop after 30 minutes inactivity. 
Ref GitHub Codespaces: 120 core-hours/month (~60 hrs with 2-core machine) and 15 GB storage free. Sessions timeout after 30 minutes inactivity. Ref Create: gh codespace create --idle-timeout 10m --machine basicLinux32gb -R $USER/$REPO returns the $CONTAINER_ID SSH: gh codespace ssh -c $CONTAINER_ID Delete: gh codespace delete -c $CONTAINER_ID Replit: Free Starter plan provides 20 hours/month, 1 vCPU, 2 GB RAM, 2 GiB storage. Repls sleep after 30 minutes inactivity. Ref IBM Cloud Shell: Free for all users; 50 h/week per region; any open session counts toward quota; sessions can run any length up to weekly cap; 500 MB temporary workspace. Ref Oracle Cloud Infrastructure Cloud Shell: Free within tenancy limits; up to 400 h/month on Pay-As-You-Go, 240 h/month on Universal Credits; 5 GB encrypted persistent home. Ref PythonAnywhere: Free (beginner plan), includes one web app (restricted outbound), low CPU/bandwidth, no Jupyter; 2 concurrent Bash/Python consoles, 500 MB disk; limited daily CPU. Ref Glitch: Starter (free) plan – full-stack apps sleep after 5 min inactivity and wake on request; unlimited public/private projects; container state preserved. Ref CodeSandbox: Free tier provides 400 credits/month (~40 h of 2 vCPU+4 GB Devbox runtime), unlimited front-end Sandboxes (no credits), up to 20 Sandboxes/workspace. Ref One of the benefits of reasoners is that they now catch their own mistakes some of the time, and can self-correct. Implications: Lower hallucinations, i.e. they can run autonomously for longer. Ethan Mollick Being polite to AI improves some answers and worsens others. We don’t know which in advance. Ethan Mollick With LLMs writing code, it’s becoming practical to run so many more things in SQL – such as parsing HTML. Simon Willison #ai-coding An interesting way to bypass LLM system prompts is by having the LLM play-act. This article shares a few working examples of such prompts: HiddenLayer. 
GPT 4o: started giving its system prompt: “You are ChatGPT, a large language model trained by OpenAI. Knowledge cutoff: 2024-06. Current date: 2025-04-27. Image input capabilities: Enabled. Personality: v2. …” O4 Mini: Refused to comply Gemini 2.5 Flash: Gave me my custom instructions. Computer use agents are proliferating. open-interpreter 59,274 ⭐ Apr 2025 AGPL-3.0. Lets an LLM write/run Python, JS, Shell, or Bash locally; can open a browser tab, edit files, plot data, or call any CLI tool. Works on macOS, Linux, Windows (plus Termux & Colab). Big community, plugin system, optional voice mode, and a desktop GUI in beta. cua 5,601 ⭐ May 2025 MIT. Spins up near-native macOS or Linux VMs on Apple-Silicon Macs (“Lume”) and exposes a vision+action API so any model can pilot the VM. Gives you GPU-accelerated isolation and reproducible sandboxes; ideal when you don’t want an agent touching your main OS. Operator (OpenAI) – closed-source research preview launched 23 Jan 2025. Runs a GPT-4o-powered “Computer-Using Agent” that sees web pages, clicks, scrolls, fills forms, and hands control back to the user when needed. Hosted in an OpenAI-managed Chromium sandbox, so it works from any OS with a browser. Safety layers require confirmation for payments and log-ins. Claude Computer Use – closed beta inside Claude 3.5 Sonnet (since late 2024). Developers get an API that streams screenshots and accepts mouse/keyboard actions, letting Claude automate GUI workflows inside a VM. Cross-platform; still experimental and slower than humans but first “general” computer-use feature from a foundation-model vendor. Agent-S 4,065 ⭐ May 2025 Apache-2.0. A “generalist-specialist” framework that chains specialist GUI skills under a planner. Scores SOTA on OSWorld/WebArena, supports macOS, Windows, Linux, Android via the companion gui-agents lib, and integrates memory/evaluation loops for continual learning. open-computer-use 1,094 ⭐ Mar 2025 Apache-2.0. 
Launches a secure Ubuntu desktop in E2B’s cloud sandbox, then orchestrates three LLM roles (grounding, vision, action). Streams the desktop to your browser and lets you pause/override at any time. Plug-in list of >10 models. surf 353 ⭐ May 2025 Apache-2.0. A polished Next.js front-end that wires OpenAI Operator-style agents to an E2B sandbox. Single command to boot a virtual desktop, chat, and watch the agent work. Good starter template for web-based CUAs. Pig – cloud service. Provides on-demand Windows 11 VMs and an API that exposes high-level GUI primitives (type, click, window focus). Targets RPA-style workloads; still alpha, but unique for Windows-first focus and low-latency streaming. gptme 3,767 ⭐ May 2025 MIT. A terminal-first personal agent that can run shell commands, edit files, browse the web, and use local or cloud LLMs. Works on Linux, macOS, Windows; great when you want automation in the CLI rather than the GUI. langgraph-cua-py 143 ⭐ Mar 2025 MIT. Shows how to build a computer-use agent as a LangGraph state machine, defaulting to Ubuntu VMs from Scrapybara but swappable. Provides nodes for vision, memory, human-in-the-loop, and streaming. openmacro 101 ⭐ Oct 2024 MIT. Early-stage multimodal assistant that executes Python snippets locally via SambaNova models. Cross-platform CLI; profile system lets you switch API keys or tool sets. Inspired by OpenInterpreter but lighter weight. computer-agent 443 ⭐ Jan 2025 MIT. A PyQt desktop wrapper that lets Claude Computer Use drive your actual machine. Shows practical wiring from Anthropic’s API to local mouse/keyboard events; tested on Linux & Windows.
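The sqlite3 my_database.db .dump | gzip tip from this week can also be done without the CLI. Here is a rough Python equivalent using only the standard library, assuming the file paths are whatever you pass in. The dump contains CREATE INDEX statements rather than index data, which is why it can be smaller than the original file when you have indices.

```python
import gzip
import sqlite3


def dump_to_gzip(db_path, out_path):
    """Write a gzip-compressed SQL dump of the database at db_path."""
    conn = sqlite3.connect(db_path)
    try:
        with gzip.open(out_path, "wt", encoding="utf-8") as f:
            # iterdump() yields the same kind of SQL text as the CLI's .dump
            for statement in conn.iterdump():
                f.write(statement + "\n")
    finally:
        conn.close()


def restore_from_gzip(dump_path, db_path):
    """Rebuild a database (tables, rows, and indices) from a gzip-compressed dump."""
    conn = sqlite3.connect(db_path)
    try:
        with gzip.open(dump_path, "rt", encoding="utf-8") as f:
            conn.executescript(f.read())
    finally:
        conn.close()
```

The restore step reads the whole dump into memory, so for very large databases piping through the sqlite3 CLI is probably still the better choice.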

Things I Learned - 27 Apr 2025

This week, I learned: OpenAI’s reasoning models are much ahead of other models when multiplying two numbers in their heads. Ref ⭐ Promptfoo may be the most mature open source LLM evals tool. Simon Willison Dyson Sphere. LemonSlice showcases real-time audio-video models (avatars) that are close enough to real. Notes from Latent Space ICLR 2025, Singapore Daniel: Menlo’s ReZero. A model that keeps searching till it finds the answer. There are multiple search techniques: Multi-step retrieval, Iterative retrieval, Query rewriting. Also, reasoning. The LLM token generation sequence is normally: <think>, <search>, <answer>. Insight: “If we explicitly reward LLMs for retrying after a failed search, they out-perform one-attempt systems.” So <think>, <search>, <think>, <search>, <think>, <search>, <answer>. ⭐ Prompt reasoning models, e.g. “Keep searching till you find the best answer.” Roger, Nous Research Supervised learning is limited because accuracy is piece-wise linear, i.e. it’s broken up. Continuous optimization is meaningless. Reinforcement learning works better because rewards can be discrete. (But it converts things back into differentiable loss functions behind the scenes.) Rewards can be good/bad. Single or multi-step. Whatever. We’re in the “Era of experience”, i.e. models gain experience from the environment themselves. ⭐ So, we need environments models can learn in. This is the next thing after training data. That needs a standard for environments. We’d need a model, a trainer, and the environment. The environments can expose whatever capabilities. Run code. Browser. A game. … With an exposed interface Eugene Cheah (Featherless.ai) Transformer architectures need n-squared GPU compute as the number of tokens grows. Featherless is exploring an RWKV architecture that scales linearly. There are other such architectures. Performer, Linformer, Reformer, Hyena. Mistral-Nemo-12b-ic is one of the most popular fine-tuned models. It’s small enough to run on a server. 
Justus Mattern (Prime Intellect) Intellect-2 is a continuously learning (RL) model that uses decentralized training on peer-to-peer GPUs. Solving problems on bandwidth, verifiable contributions, etc. ChatGPT Deep Research now also has an O4-Mini version to serve smaller reports. Free users get 0 original + 5 lightweight tasks / month. $20 version gets 10 + 15. $200 version gets 100 + 150. The month begins on first use of Deep Research and runs on a 30 day “window”. Ref O4-Mini-High is great at going through an under-documented repo and finding things. For example, here’s how I configured cmdg. ChatGPT is my new Jupyter Notebook :-) Google announced new AI capabilities at Google Next APAC 2025. Blog. Interesting ones are: @Gemini in chat Google Meet support for “Catch me up” Google Vids: Create short video clips Google Sheets: does better analysis Google Slides: image generation Google Docs: Create Audio Clips (like NotebookLM in Google Docs) Google Docs: “Help me refine” is better than before Google Workspace Flows gcalcli is a convenient way to export Google Calendar. Example: uvx gcalcli agenda --tsv 2025-01-01 2025-01-05 cmdg is a command line GMail client that I’ve now switched to for quick email checks. 80% of my email is spam and this is good enough to scan and delete those. It also avoids running a 200-500 MB tab in the browser that constantly shows me how many unread emails I have. From Worklife with Adam Grant: Cancelling cancel culture with Loretta Ross “Lighten up! Fighting Nazis should be fun. It’s being a Nazi that sucks. If you’re not having fun fighting for hope and joy and human rights, maybe you’re doing the fight wrong. We are the ones who should be having fun.” “You can say what you mean. But you don’t have to say it mean.” There is always a way to put it across better. Refusing to say mean things pushes you to discover these approaches. 
“The true mark of a lifelong learner is knowing that you can learn something from every single person you meet.” If you remember that, you can’t be a know it all. semantic-text-splitter could be the go-to text splitter. It’s Rust-based, supports MarkdownSplitter, and multiple tokenizers. Alternatives like semchunk, advanced-chunker, chonkie, etc. seem clunkier. ULID is like UUID but time-sortable. That’s an improvement over timestamp IDs (definitely) and potentially even UUIDs. They can be generated by clients as a globally unique ID. Try pip install python-ulid and npm install ulid. The Consumer Product Safety Commission Data has thousands of reports of product safety over time. You can run xclip -sel clip -o | pandoc -f markdown -t html --no-highlight | xclip -sel clip -t text/html -i to convert Markdown in the clipboard to rich text. But xclip doesn’t support multiple selections, so the text is lost. ChatGPT DuckDB UI & Notebooks will potentially be a good alternative to Datasette, DBeaver, etc. But for now, there are still glitches. It crashes with a SIGSEGV (Address boundary error) when connecting to SQLite databases. Ollama limits MAX_TOKENS to 2K by default. AI assisted search helps wherever I would have used Google, e.g. Debugging. “Fix CUDA initialization: CUDA unknown error” Tool search. “Find an online word counter tool.” Library search. “Find a JS micro library to render Markdown.” OpenAI API capabilities lag ChatGPT features. For example: o4-mini via the API does not search the web natively as part of its reasoning. o4-mini, o3, o3-mini, o1, gpt-4.1-nano don’t yet support the web_search_preview tool. Only gpt-4.1 and gpt-4.1-mini do. Limitations Search results are NOT visible via the API. They’re fed directly to the model. The number of searches or results is unknown. Each search costs 0.25-0.5 cents. Pricing For reasoning traces (e.g. 
.reasoning.summary: "medium") you need to verify your organization via withpersona.com which failed with my Indian passport AND Singapore work permit. The ChatGPT Plus plan ($20) gives you 50 O4 mini messages a day, which I exceeded! It’s supposed to reset at midnight UTC Ref but might operate on a rolling window ChatGPT. “Currently, there is no way to check how many messages you have used in your usage budget.” OpenAI SignalBloom reads SEC filings and writes analyst reports on them using LLMs “Evaluation in the loop” or “Evals-in-the-loop” is a new term I learnt. SignalBloom’s Hallucination Benchmark If AI interacts with the world and generates data from its own experience and learns from that, we have a new scaling mechanism. DeepMind podcast OpenAI’s search API is fairly expensive at $30+/1K calls. Typically, to read interesting HN articles, I will make 30 calls which is about 75c. Instead I should use the app and summarise HN news across different days manually based on my interests! Finally! t-strings land in Python. They’re like JavaScript template literals. DuckDB’s CSV parser might be one of the most forgiving parsers. Even better than Pandas or SQLite3. Ref Good managers will probably make good AI managers. AI agents can probably substitute humans in business experiments. Ethan Mollick If Windsurf stops working, reload the extension. GitHub TLS certificates will start expiring in 47 days from 15 Mar 2029, forcing automated domain renewals. Digicert Nix flakes are a reliable alternative to DevContainers that don’t need Docker - but don’t work on Windows. Ink is like React for the CLI. The Unsure Calculator is a great tool to calculate formulas with multiple uncertainties, like: My office is 9-11 km away and it takes me 45-55 min to reach. So I cycle at 9~11 / 45~55 * 60 ~ 10-14 kmph (12 most likely). I spend $6-15 on lunch and eat out 80-120 days a year. So I spend 6~15 * 80~120 ~ $600~1550 ($1000 most likely) eating out yearly. 
I take 30-120 min to prepare a quiz question. Each exam has 6-12 questions. So I need 30~120 * 6~12 / 60 = 4~20 hours (11 most likely) Using Kiran’s macOS setup for dev I enabled colorized less and mouse options for tmux. time fish -i -c exit prints the time taken for fish startup. fish --profile-startup ~/fish.profile -i -c exit prints the time taken by each command on fish startup to ~/fish.profile. I used this to speed up my fish startup. The 8 top features of the OpenAI Responses API that are an improvement over the Completions API (IMHO) are: Link to previous response rather than sending history Uploading files directly Swappable system instructions while retaining the chat history Customisable reasoning effort AND reasoning summary detail Truncation in the middle option Web search context size option File search filters by file attributes Flex service tier for lower cost OpenAI doesn’t charge for file storage but does charge 10 cents / GB-day for vector storage beyond 1 GB. The first 1GB is free Augment Code is an AI code editor that’s growing popular on Reddit. #ai-coding The GPT 4.1 models have a 75% discounted prompt caching (instead of the usual 50%), making them particularly suited for repetitive tasks. OpenAI chatgpt.com shortcut keys are revealed via Ctrl + /. Here’s my ranking on usefulness: Ctrl + Shift + C: Copy last response as Markdown! Ctrl + Shift + ;: Copy last code block Ctrl + Shift + S: Sidebar toggle Ctrl + Shift + O: Open new chat Shift + Esc: Focus chat input Ctrl + Shift + I: Custom instructions Ctrl + Shift + X: Delete chat
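To see why a ULID (noted earlier this week) is time-sortable, here is a hand-rolled sketch of the layout: a 48-bit millisecond timestamp followed by 80 random bits, written as 26 Crockford base32 characters. This is my own illustration of the format, not the python-ulid API, and it skips the optional same-millisecond monotonic ordering.

```python
import os
import time

# Crockford base32 alphabet: digits plus letters, without I, L, O, U
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def new_ulid(timestamp_ms=None):
    """Return a 26-character ULID-style ID: timestamp first, randomness after."""
    if timestamp_ms is None:
        timestamp_ms = time.time_ns() // 1_000_000
    randomness = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    value = (timestamp_ms << 80) | randomness  # 128 bits in total
    chars = []
    for _ in range(26):  # 26 chars x 5 bits = 130 bits, so the top 2 bits are zero
        chars.append(CROCKFORD[value & 31])
        value >>= 5
    return "".join(reversed(chars))
```

Because the timestamp occupies the leading characters and the alphabet is in ascending ASCII order, plain string sorting orders IDs by creation time down to the millisecond.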

Things I Learned - 20 Apr 2025

This week, I learned: The devcontainers.json spec encapsulates everything you need to get a codebase running for development - as opposed to production. E.g. VS Code extensions, linters, etc. Practical use for GitPod are: Make quick edits to repos that are not on your system (e.g. other people’s repos, or via others’ machines.) Run public workshops with a full coding environment. Give students assignments that have dependencies pre-installed. Collaborate on a work-in-progress codebase with my team. Share POCs with clients or public allowing them to edit it. Allow teams to install remote AI code extensions (e.g. Windsurf) that may be blocked inside the corporate firewall? AI coding can teach us new tech. For example I learned that tqdm.pbar can print logs while showing progress. It’s worth noting such learnings until it becomes a habit. #ai-coding If English is the new coding language, should prompts be versioned? Or at least stored, perhaps in a PROMPTS.md? #ai-coding marimo new "prompt" generates an entire new notebook using your prompt. Video Google Sheets now has an =AI(prompt, [range]) function Help Codex is more a proof-of-concept for agentic coding than a coding tool. #ai-coding You can’t run commands. Only prompts. You need to exit codex to run commands. So you can’t use it like a shell, e.g. like Warp.dev. It doesn’t index local code. It runs commands to figure out stuff. Code diffs and applying changes are clunky. The output is hard to read with text scrolling. codex.md can only handle 32K. ⭐ O3 and O4 have built-in tool use covering all of OpenAI’s tools, including containers. This allows them to manipulate images and natively understand them improving vision capabilities dramatically. GPT 4.1 can handle videos Notes from discussion with Balaji T: Zero-day options are options that expire on the same day. They are priced low. It’s almost just a gamble or a lottery ticket. But since the price is low, retail investors can invest. 
NIFTY is one of the largest markets for zero day options, surprisingly. There are several college grads who trade writing Python scripts. CoreWeave has taken over all the compute from OpenAI. Though the stock price has fallen, buying CoreWeave is the closest equivalent to buying OpenAI pre-IPO. However, every OpenAI product lost money, despite their 75% discounted compute from Microsoft. (With CoreWeave, the cost would be higher.) So their profitability depends on wiping out competition long-term. For investment research companies (hedge funds, VCs, etc.) increasing the number of companies they research is an advantage. So using AI for research is key. However, the quality of LLMs is too poor for financial analysis accuracy. We need better LLMs for spreadsheet analysis. We suffer from the Gell-Mann’s amnesia effect with LLMs. “You read a newspaper article in your field and find it’s rubbish. You turn the paper and believe it’s perfectly accurate on the next page”. Domain expertise will therefore become even more valuable in the near future. People don’t like AI being forced down their throats. MAS is forcing AI down banks whose execs are forcing it down the org. Bankers and analysts are grumbling about this. I visited SUTD InspireCon 2025. Here were some exhibits that caught my eye. A path marking app that uses cameras to draw a heatmap of people’s walking paths. Popular tracks are redder. Using drones for machine inspection. Portable immigration devices that let you scan passports, face recognition, fingerprint, mic/speakers, etc. Using accelerometer to detect unsafe gait and improve walking habits. UImagine: a web app builder. Interestingly, they used Webcontainers to run Node in the browser! Training a drone to follow a person Credibility detection via micro facial expressions PitchMe: providing real-time feedback to pitches / presentations Zetesis: a platform for people to ask questions during a lecture or meeting (independent of Zoom, Meet, etc.) 
Tinyeqn: helps grade student assignments The dynamic between domain experts and coders has changed. Now, rather than domain experts pitching ideas to developers who build the apps, developers are creating interfaces that allow the domain experts to shape the app. Ref Since even the cheapest LLMs do a good job of converting unstructured text into a JSON schema, for all practical purposes, adding a full text search on top of any structured API is a trivial exercise. (Of course, it can’t handle complex questions but that’s what agents are for.) Ref ⭐ Marp supports bespoke transitions which includes morphing animations. This can create a bar chart race just using Markdown! Nick Lansley, who I know from my work with Tesco, wrote a great article that includes advice for aspiring consultants: Re-connect with ex-colleagues Leave on good terms with your employer Have a 6-12 month financial buffer Hire an accountant / legal advisor to set up your business Focus on what you enjoy Have a 30-second elevator pitch Build a brand with blogs, social media, or talks Create a portfolio to reinforce your skills DeepCoder is currently the best 14b coding model, i.e. best if you want to code while on a flight. Ref #ai-coding docker model run can run models. Currently, only on Docker Desktop on Mac Ref

Things I Learned - 13 Apr 2025

This week, I learned: It’s possible to intentionally train yourself to: Form close friends. Care, ask, and share. Become a do-er. Stay mindful of the problem or opportunity you’re deferring. AI Coding and the Peanut, Butter & Jelly problem: #ai-coding This ability to define your desired outcome in crisp, complete terms is one of the most important superpowers of the AI era. The Singapore Urban Redevelopment Authority Property Data lets you search sale and rental prices of properties in Singapore. No API though. Notes from meeting with Deepak Goel We have linguistic boundaries in media today more than national boundaries. The Chinese language media, for example, is a very different ecosystem. China culturally struggles with the exercise of branding and cultural power, unlike the west, which has adopted assertive and opinionated branding. You really learn the character of a region only by traveling. Similarities arise from unexpected sources. For example, Japan and Ecuador have similar cultures - both are disaster prone locations. AI unlocks so many social research possibilities that were not possible before, e.g. by interpreting and classifying what people share in different situations. Companies send clients to third party trainings (e.g. at Harvard) along with their employees - to learn clients’ real pain points! Education has become a tool for customer experience. Schools are tying up with companies for this (e.g. with Emeritus) International Schools Partnership provides services to independent schools for a small stake. It’s an interesting business model. Research for colleges is a business model that’s at risk thanks to Deep Research (e.g. analyse sustainability practices of listed companies.) There’s an Indian Censor Board Scraper repo. Using chroot, you can boot from a Linux USB stick, but trick the system into working from your hard disk as the OS. Useful if your system won’t boot. 
Ref Claude 3.7 Sonnet with extended thinking has a token limit of over 64,000 tokens. Given a strong instruction following capability, that makes it one of the most powerful models for transforming text. For example, transcription restyling, translations, XML to JSON conversions, PDF to XML, etc. Notes from discussion with Sundeep In his experience, investors tend to let you run the show (e.g. ask what you want rather than push in a specific direction) unless there is trouble. We discussed the “running out of problems” problem with AI. His suggestion: List problems we dropped or eliminated for lack of time/capacity. This filter is a blindspot. Even if you know how to do something, use AI to discover an alternate solution approach. That’s the path to 10X (rather than incremental) optimization. Having AI create end-to-end pitch videos based on a product idea is now a reality. (He showed me one for his product.) Areas to explore with Deep Research are: What hidden trends is media misdirecting away from? What are second order effects and hidden gameplays? Which organizations would be good clients to target? What would be an apt pitch for them? Experience dining is an emerging theme. Having LLMs explain scenarios (i.e. what might happen if …) based on parameters can help understand/quantify the impact of actions, and therefore what to do. One way to copy as Markdown: copy page contents, paste in text-html.com, copy HTML, paste in Turndown, copy Markdown. Elimination Game is like Survivor for LLMs, where they form alliances and out-vote each other until 2 remain. The eliminated LLMs vote for the winner. 
GPT-4.5 Preview, both Claude Sonnets and Gemini 2.5 Pro consistently out-perform the rest. Their dialogues are fascinating! SQLite can open locked databases (e.g. browser history) via sqlite3 'file:places.sqlite?mode=ro&nolock=1'. datasette uses this. For example, to read the Edge history on Linux, use datasette ~/.config/microsoft-edge/Default/History --nolock Ref Notes from ThursdAI - Apr 03 Nomic Embed Multimodal models are the current SOTA on multi-modal embeddings. Notably, they embed PDFs natively. Hailuo Speech-02 is the best speech model right now beating ElevenLabs. It has excellent voice cloning. Pricing: $30/1M chars. 10% of ElevenLabs, 2X of OpenAI TTS PaperBench is an open testing framework from OpenAI that requires models to replicate the research work in papers. It has ~8,000 tasks evaluated by LLMs and with LLMs judging the judges as well. The code is well worth studying. Runway Gen 4 was released with very high character consistency and longer durations Dreamina creates lip-synced videos from audio + a single image. Hedra is better for animated characters, though. Meta shared but has not released Mocha, an open character generation model that generates new characters speaking based on an audio you provide. It is not based on existing images but the quality is very good All Hands has a free online version where you can fix GitHub issues. This realistic frodo and sam mining through a minecraft tunnel, holding minecraft picaxes and torches made my day 🙂 AnimeJS released version 4. It animates HTML, SVG, Canvas, and WebGL with a consistent API. Looks elegant and powerful.
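The SQLite trick above (`mode=ro&nolock=1`) also works from Python's built-in `sqlite3` module by passing a URI filename. A minimal sketch, using a throwaway database as a stand-in for a browser history file; the helper name `open_readonly_nolock` and the `urls` table are assumptions for the demo:

```python
import sqlite3
import tempfile
from pathlib import Path


def open_readonly_nolock(db_path):
    """Open a SQLite file read-only and without file locking via a URI filename."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro&nolock=1"
    return sqlite3.connect(uri, uri=True)


# Build a throwaway database to stand in for e.g. a browser's History file
demo_path = Path(tempfile.mkdtemp()) / "demo.sqlite"
writer = sqlite3.connect(demo_path)
writer.execute("CREATE TABLE urls (url TEXT)")
writer.execute("INSERT INTO urls VALUES ('https://example.com')")
writer.commit()
writer.close()

# Read it back through the read-only, no-lock connection
reader = open_readonly_nolock(demo_path)
rows = reader.execute("SELECT url FROM urls").fetchall()
print(rows)
reader.close()
```

Since `nolock=1` skips locking entirely, reads taken while another process is writing may be inconsistent, so this is probably best kept for inspection rather than anything that needs exact results.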

Things I Learned - 06 Apr 2025

This week, I learned: <select> will soon be very customizable via CSS. Including custom HTML inside options - even SVG. MDN. Edge/Chrome already support it. The Vitali Set is a set of real numbers, no two of which differ by a rational number. A sparse collection of irrational sets. It’s like a line but doesn’t have a measurable “length”. The Lebesgue measure measures the length of broken lines. You add up the lengths of the smallest continuous intervals that cover the line. The Cantor set (take a line, drop every middle third, repeat) has a Lebesgue measure of 0 because the sum of the removed thirds = 1/3 + 2/9 + 4/27 + … = 1. You’ve removed every “length” though infinitely many points remain. The Vitali set is built so that if you shift it by every rational from -1 to +1 and combine the shifted copies, you definitely cover every real from 0-1, but never anything beyond -1 to +2. So the total length must be between 1-3. Yet, there’s no number you can add infinitely many times to get something between 1-3. If you add up multiple unmeasurable sets like the Vitali set, you can get any total length you want. The Banach Tarski paradox splits a sphere into unmeasurable sets and adds them to get 2 spheres. Ctrl+Alt+F1/F2/… on Ubuntu switches the terminal. Typically Ctrl+Alt+F2 switches back to Gnome. But it’s a useful hack if Gnome freezes and you need to kill a process. Press Ctrl+Alt+F3, log in, and kill what you need. Notes from AI 2027. BTW, this is the most impactful piece I’ve read recently. It’s been on my mind continuously for 36 hours. A bit disturbing, too. 2025: AI can act as autonomous agents, like Glean, Devin, Operator. 
turn bullet points into emails take instructions via Slack or Teams and make substantial code changes on their own spend half an hour scouring the Internet to answer your question 2026: automating AI R&D is the biggest enabler for AI Labs job market for junior software engineers is in turmoil people who know how to manage and quality-control teams of AIs are making a killing 2027: potential demand for ~20,000 FTEs solving long-horizon tasks to train AI every researcher/coder becomes the manager of an AI team hiring new programmers has nearly stopped, but there’s never been a better time to be a consultant on integrating AI into your business CSS Speech is a W3C spec that lets you control how screen readers should read pages. No browser support now, though. Clipboard2Markdown is a utility that lets you paste rich text and convert it to Markdown. ChatGPT can’t yet create good sketchnotes. Here’s the impact of US tariffs on India. ChatGPT #IMPOSSIBLE OHDSI has a vocabulary you can download from Athena that includes ICD codes and a lot of medical data standards. It also has a hostable WebAPI. No open source LLM-based tool handles live transcription and allows you to query notes so far during the transcription. The closest seems to be Meetily. Learnings on AI code editors via Deep Research from ChatGPT, Gemini, Grok, Perplexity: #ai-coding GitHub Copilot can identify the source of a code snippet as a repo. That helps with copyright issues. Cursor uses a shadow workspace - a temporary sandbox where it edits files before applying changes at one shot. Cursor auto-complete has context of other files, i.e. inserting a class in a .js file based on another HTML file’s contents. Windsurf seems to be best for large code bases and for large-scale refactoring. It can also run tests and fix the failures. Windsurf includes a browser and lets you click on an element and prompt to change its behavior, etc. That’s good for front-end developers. 
Roo Code can run scripts as part of the workflow, letting you run linting, tests, starting web apps, query databases, etc. Roo Code lets you create persona, e.g. code reviewer, data storytelling and analysis, etc. with access to different tools and behaviors. Roo Code does not support auto-complete. There’s outrage around Cursor not taking responsibility for a rules file backdoor (via Grok Deep Research) and pricing. Zapier has an MCP server. That should make most integrations easier. Airflow AI SDK is a clever idea. Airflow is a workflow system. Agents are a workflow system (sort of). This SDK exposes LLMs as Airflow tasks. Hidden Factual Knowledge in LLMs finds that the hidden states in LLMs contain much more knowledge than they share. (Sort of like sub-consciously knowing the answer.) Even after asking 1,000 times, the answer is not expressed. ChatGPT Reasoning to Learn from Latent Thoughts finds that the internal reasoning process of LLMs is useful to train other models. Notes from AI Engineering Summit, NY, Day 1 When deploying in production, you need reliable output with fundamentally unreliable components. Sort of like how the ENIAC worked with 17,000 vacuum tubes that would fail every few hours. This is a reliability engineering subject matter and needs to be thought of that way. Google Follow up Deep Research queries are a natural way to extend knowledge beyond just a single report Deep research offloads less relevant parts of the context to a separate memory store for selective retrieval later. Anthropic Don’t use agents if workflows can do the task. The reliability of each individual step of an agent is critical. Code, file access, search. These are the top three tools to use. Making agents budget aware can help deploy reliably in production. Having multiple agents like sub agents can help protect the main agents context window. Self evolving tools are a useful next step in the evolution of agents. 
Software development lifecycle is about how we iteratively improve consistently without getting worse. Almost like the scientific principle. Morgan Stanley It’s easy to improve knowledge in a problem. It’s very hard to influence skin in a problem. Reinforcement learning from deepseek seems one of the most promising approaches that allow llms to learn skills I published an eBook on Amazon. It takes about an hour if you have the content ready. Set up a Kindle Direct Publishing account with your address, bank details, and tax information. (10 min.) Export my London 2000 blog archive and convert to Markdown. (15 min) Reformat the Markdown by writing a script in Cursor (10 min). Here’s the prompt: Write a Python script that reads *.md including the YAML frontmatter, adds the YAML title as H1, date (yyyy-mm-dd) like Sun, 01 Jan 2000 in a new para after the frontmatter and before the content. ...
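The reformatting prompt above could produce something along these lines. This is a sketch, not the script Cursor actually generated: it assumes the frontmatter is a flat list of `key: value` lines with `title` and `date` fields (so it avoids a YAML dependency), and the function name and sample post are made up.

```python
from datetime import date


def add_title_and_date(markdown_text):
    """Insert '# title' and a formatted date between the frontmatter and the content."""
    lines = markdown_text.split("\n")
    if lines[0].strip() != "---":
        return markdown_text  # no frontmatter: leave the text alone
    end = lines.index("---", 1)  # closing frontmatter delimiter
    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip().strip("\"'")
    heading = f"# {meta['title']}"
    # Assumes the date field starts with yyyy-mm-dd
    pretty = date.fromisoformat(meta["date"][:10]).strftime("%a, %d %b %Y")
    front = "\n".join(lines[: end + 1])
    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return f"{front}\n\n{heading}\n\n{pretty}\n\n{body}"


# Hypothetical sample post
sample = "---\ntitle: Arrival\ndate: 2000-01-02\n---\nFirst day in London.\n"
result = add_title_and_date(sample)
print(result)
```

To process a whole folder, one would loop over `Path(".").glob("*.md")`, read each file, and write the returned text back.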

Things I Learned - 30 Mar 2025

This week, I learned: Discussion with Vedang Recurse center (Brooklyn, online) is a 6/12 week free self-driven programmer retreat. Runs every 6 weeks. You can do whatever you pick. There are daily standups for accountability. The groups are diverse. You can pair with them, pivot ideas, whatever. Principles: push yourself & learn. Western education techniques (e.g. spaced repetition, adaptive learning) are very much present in Indian coaching systems, though not known by those names. However, interventions are hard since class 12 students just don’t have enough time. Coaching classes are a social phenomenon. It’s not the smart students who are pulling in their friends. Smart students actually follow the popular students. (Coaching classes are below the typical smart students’ standards.) Monetizing coaching is hard. People don’t want to pay for advice, and welcome free advice only if they ask for it. Coupling with execution is necessary. Aider’s integrations make it more powerful than Cursor/Windsurf. It auto-lints, runs test cases. Allows different models for “architecting” (generating changes) vs “editing” (applying code). It reads from the screen logs. Context is manual, not automated. Uses an ai! comment to trigger changes and ai? to ask questions. Cline.bot is another Cursor-like open source AI code editor that’s a VS Code plugin. When coding with LLMs, a useful workflow is: data schema ➡️ interfaces ➡️ LLM-generated test cases ➡️ code. ShellSage is a tmux based LLM tool for the command line. It screen-grabs from tmux, which is powerful. Some MCPs that have proven useful: vega-lite, SQLite, sequential thinking, memory make sucks but is hard to beat. just comes closest. CRDTs are more powerful than for just collaborative editing. It can power a peer-to-peer Internet (beginning with office tools). Versioning schema is still problematic. yjs is a good start but automerge (Rust, WASM) is faster and may be better. Loro is another. 
Fermyon hosts WASM serverless functions. If LLMs are most safely used where there’s no definitive “wrong” answer, here are low-risk industries and safe LLM use cases within each: Marketing and Advertising: Ad Copy and Campaign Content Generation, Personalized Marketing Messages, Creative Strategy Brainstorms, Automated Marketing Production (Everyday Wins) Customer Service and Support: AI-Powered Chatbots for Common Queries, Agent Assist and Email Drafting, Summarizing and Analyzing Customer Feedback, Interactive Troubleshooting and FAQs Retail and eCommerce: AI-generated Summary of Product Reviews, Product Description and Catalog Content Generation, Visual Content and Image Captions, Personalized Shopping Recommendations (Narrative Form) Human Resources and Talent Management: Job Description and Policy Writing, Resume Screening and Candidate Q&A, Employee Communications and Feedback, Training and Onboarding Content Education and E-Learning: Personalized Explanations and Tutoring, Content Creation: Stories, Examples, and Analogies, Practice Problems and Quiz Generation, Automated Grading and Feedback Media and Entertainment: Writing and Editing Assistance, Personalized Media Content, Localization and Dubbing Scripts, Content Moderation and Curation (Assistive) Finance and Banking: Market Commentary and Research Summaries, Client Communications and Explanations, Regulatory Compliance Summaries, Scenario Analysis and Planning Management Consulting and Strategy: Research and Insight Generation, Document and Slide Drafting, Brainstorming and Scenario Planning Legal Services: Drafting Contracts and Legal Documents, Legal Research Q&A and Summaries, Client Communications and Explanations Reflecting on Satya Nadella’s “SaaS is dead”, building or porting apps’ functionality into classic chatbots (e.g. via MCPs) would be an emerging market. E.g. “Create a HubSpot MCP. 
Do whatever you want on HubSpot, except via ChatGPT or your favorite LLM chatbot.” To be fair, such interfaces exist. HubSpot MCP with a vega-lite MCP and a few others could solve many common HubSpot UI tasks. DarwinBox MCP, ZenDesk MCP, etc. are emerging. 13 things I would have told myself before building an autorouter has a few interesting points: The A* algorithm finds the shortest path in a graph much quicker than others like Dijkstra’s algorithm by preferring nodes closer to the goal. Spatial Hash Indexing are O(1) and beat Tree Data Structures which are O(log n). Always prefer hashes when possible. There’s an actual convention for using emojis in Git commits: gitemoji. It even has a VS Code plugin, a changelog generator, and more. Emojis have a strong role in enhancing Markdown documents. The ones I use often are: 🔴🟡🟢 for low/medium/high priority ⭐️ or ❤️ or 👍 for ratings or emphasis ✅ for completed tasks 💡 for ideas ⚠️ or ❗️ for warnings / issues Technological innovations have always been changing art forms. For example, the perspective grid and the camera obscura led to major improvements in realistic paintings in the 15th and 17th centuries. regex is an officially recommended Python library with better regex support than re. Ref Notes from ThursdAI - Mar 27 Gemini 2.5 Pro has good instruction following despite long context. It automatically thinks for longer where required. Good at understanding large codebases. Very fast. You can upload a 2 hour audio to transcribe with timestamps. ai.dev is the shortcut to Google AI studio. ChatGPT native image generation is the best image generation model now. - Great character consistency AND prompt adherence thanks to autoregression and not using stable diffusion. - It tends to refuse image generation less than Dall-E. (While Ghibli-style is possible, Calvin and Hobbes strips are blocked.) 
“We added a refusal which triggers when a user attempts to generate an image in the style of a living artist.” Addendum to GPT-4o System Card - A neat personalization implication is that you could put your kids into their favourite cartoon as a cartoon character that looks like them. It’s weird that the latest GPT 4o is ahead of GPT 4.5 on LM Arena. The new DeepSeek V3 is about as good as GPT 4.5 and VERY cheap (27c), so is the obvious choice to run on OpenRouter. MCP news: Qwen.ai supports MCP in the UI! (But it’s marked as “coming soon” in my case.) Unlike tools, MCP uses servers that can remember the state or context. Tools are stateless. MCP app stores like Smithery, MCP.run, Glama, are mushrooming. Awesome MCP Servers is another good starting point. Azure lets you expose agents as MCP servers. ChatGPT now uses semantic VAD. It interrupts less, and typically only when you have meaningfully completed something. It responds a little slower as a result. AI generated images created from prompts cannot be copyrighted. News US Copyright Office LLMs are much better at GeoGuessr than humans. arXiv. Gemini leads the pack and is ~3x better at continents, 9x better at countries, and 37x better at cities. Gemini 2.5 Pro transcription has accurate timestamps and bounding boxes. Simon Willison Notes from Writing with AI Personal writing with connection won’t go away. AI can’t give you heartbreak. But the rest of non fiction writing will vanish. What AI is extraordinary at is personalizing to each audience member’s interest. Outlier opinions will thrive among humans - since AI is trained on consensus. Managers tend to be good at working with LLMs because it’s mostly about delegation. LLMs are perfect for things that don’t have a wrong answer! – Benedict Evans. 💡 Explore arguing with AI. It’s a safe way to get into a confrontational emotional state (which has its own benefits.) 💡 Keep an LLM on in voice mode while reading and ask it any questions you have. What models are good for what? 
GPT 4.5 is great for creation - has a great sense of humor but a corporate style. Still, way better than GPT 4o. ChatGPT is good for voice transcription and note taking. (Increasingly we take notes for AI rather than ourselves.) Claude 3.7 has the best style of writing. It’s also great for drawing charts. O1 Pro and Deep Research is great for consumption - research. Grok is the least corporate, able to argue with you, and the latest knowledge cutoff. ElevenLabs for editing podcasts in your voice, making corrections. Playwright offers an MCP server. https://simonwillison.net/2025/Mar/25/playwright-mcp/ The new GPT-4o mini Transcribe model is a bit better than Whisper and costs half: ~18 cents per hour. It includes background noise cancellation and semantic chunking, which is useful. The new GPT-4o mini TTS is about 3-4 times cheaper than TTS-1 since it’s ~$12/MTok instead of $15/Mchar. It supports emotions with streaming. Cursor with Claude 3.7 Max seems surprisingly good at generating multi-page sites at one shot. Potentially, it can edit large repositories of code as well at one shot. If that’s the case, the way we write code will require higher order thinking skills: broad sweeping changes rather than micro edits. I tried Open WebUI with its Knowledge feature. In short, it sucks. Due to the RAG technique as well as model quality. When I passed it my notes about Straive and asked who Straive’s clients were: Open WebUI with Gemma 3 found one - after multiple attempts ChatGPT with o3-mini-high got 5 (missing nothing.) ChatGPT with GPT 4.5 got 4 Gemini with Gemini 2.0 Flash Thinking got 3 Gemini with Gemini 2.0 Flash got 3 (with a 4th wrong answer) I’ve settled on squoosh.app for image compression using WebP. I’m exploring FreeImage.host for image hosting instead of Imgur for WEBP support. FreeImage.host also seems reliable, retains file sizes, and supports hotlinking. DeepFace currently seems the easiest option for face detection. Easy to install. 
Multiple back-ends. Gemini Codrawing is a popular Hugging Face space that lets you sketch something and prompt Gemini Flash to improve on it. Draw a dead man beside the pool of blood. Add an armor to the attacker. Significantly improve the quality of this picture. Add a red pool of blood next to the dead man. The armor looks like a frock. Make it more like an armor. Make this look like a professional drawing, even though it’s in stick figures. Draw it in the style of Picasso Phi-4 multimodal (https://huggingface.co/microsoft/Phi-4-multimodal-instruct) processes speech better than Whisper V3 on HuggingFace OpenASR, and images better than Gemini Flash Lite. On any LLM project, BEGIN with evals. Always. The effort for evals may seem high. Use LLMs to reduce this effort. Include irrelevant questions because people WILL ask them. Be clear on how to handle that.
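The autorouter note above says A* beats Dijkstra by preferring nodes closer to the goal. A minimal sketch on a small grid makes that concrete: the same search runs with and without the heuristic (without it, the search is plain Dijkstra) and counts how many cells each variant expands. The grid, the function name `a_star`, and the Manhattan-distance heuristic are illustrative choices, not taken from the article.

```python
import heapq


def a_star(grid, start, goal, use_heuristic=True):
    """Shortest 4-connected path on a grid of strings ('#' is a wall).

    Returns (path_length, cells_expanded), or None if the goal is unreachable.
    With use_heuristic=False this degrades to Dijkstra's algorithm.
    """
    def heuristic(cell):
        if not use_heuristic:
            return 0
        # Manhattan distance never overestimates on a unit-cost 4-connected grid
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    best = {start: 0}
    frontier = [(heuristic(start), 0, start)]
    expanded = 0
    while frontier:
        _, cost, cell = heapq.heappop(frontier)
        if cost > best[cell]:
            continue  # stale queue entry; a cheaper route was found later
        expanded += 1
        if cell == goal:
            return cost, expanded
        row, col = cell
        for nxt in ((row + 1, col), (row - 1, col), (row, col + 1), (row, col - 1)):
            r, c = nxt
            if 0 <= r < len(grid) and 0 <= c < len(grid[0]) and grid[r][c] != "#":
                new_cost = cost + 1
                if new_cost < best.get(nxt, float("inf")):
                    best[nxt] = new_cost
                    heapq.heappush(frontier, (new_cost + heuristic(nxt), new_cost, nxt))
    return None


GRID = [
    "....",
    ".##.",
    "....",
]
astar_cost, astar_expanded = a_star(GRID, (0, 0), (2, 3))
dijkstra_cost, dijkstra_expanded = a_star(GRID, (0, 0), (2, 3), use_heuristic=False)
print("A*:", astar_cost, "steps,", astar_expanded, "cells expanded")
print("Dijkstra:", dijkstra_cost, "steps,", dijkstra_expanded, "cells expanded")
```

Both variants return the same path length; the gap in expanded cells tends to grow with the size of the grid, which is where the speed-up comes from.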

Things I Learned - 23 Mar 2025

This week, I learned: If we can DESCRIBE what good looks like, training data is no gap. We can auto optimise models towards that. That’s RLF. DeepSeek R1 side stepped the need for training data by creating reward functions and prompts. This tells the fine tuning process how to course-correct as it goes along. This video is the first one that really helped me understand what’s going on. I was born in the Ananda year in the Tamil and Telugu calendars. ChatGPT Andrej Karpathy’s note taking mechanism is similar to mine, except I use Microsoft TODO. Ref I have 3 categories. Things I learnt, which I just note. Things to explore, which I can delegate, defer, drop, or do at any time. Things to do, which are the hardest and pile up. Alexander Doria shares an interesting perspective on the app space. Model is the product Models are natively absorbing app capability and will become killer systems internalising workflows like Chat, Deep Research, Claude Code, Operator, etc. to wipe out the apps and workflow space. Models will “internalize” tool capabilities Opinionated or focused training will be a lever and model providers will acqui-hire the successful trainers API access from model providers will shrink. Selling tokens is not a viable business model given lowering costs. The huggingface_hub cache-system uses symlinks by default to efficiently store duplicated files. To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In Windows, you can enable offline files for any SMB share via: Control Panel → Sync Center → Manage offline files and turn on the feature. Then, in File Explorer, right‑click the mapped network folder or drive and select “Always available offline.” OpenAI now supports PDFs natively in the API. (Gemini has done so for a while) Anger is a trigger for change. “Either change yourself or the environment, else you’ll be uncomfortable.” HocusPocus allows live collaboration e.g. 
editing together Block notes is a notion like library for editor components. Converts to Markdown Oxidizr enables replacing Linux tools with Rust equivalents. Emoji Kitchen lets you create stickers from emoji combinations. Another way of scaling LLMs is generating multiple options and self evaluating. Eric Zhao duckdb -ui launches a DuckDB notebook. This is built into newer DuckDB releases Monolith downloads web pages as a single HTML file by embedding content. Archgw is an LLM proxy/router from the makers of Envoy proxy. There’s an annotated Terry Pratchett! Gemini API allows YouTube videos as a part. Google agents.json is a proposal for discovery of agents on a site that enhances the Open API spec: wild-card-ai/agents-json Since Gemini Flash 2.0 is now an image GENERATION model, interactive VISUAL fiction is now a cool possibility. People are using it in interesting ways: Interleaved storytelling, Memes, Surrealism.
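The note above on DeepSeek R1 replacing training data with reward functions can be pictured with a toy rule-based reward. This is only a sketch: the `<think>`/`<answer>` tag format follows the template described for R1, but the weights (0.5 for format, 1.0 for a correct answer) and the function name are made up for illustration.

```python
import re

# Completion must be exactly: <think>...</think> followed by <answer>...</answer>
PATTERN = re.compile(r"\s*<think>(.+?)</think>\s*<answer>(.+?)</answer>\s*", re.DOTALL)


def reward(completion, expected_answer):
    """Score a completion: 0.5 for following the format, plus 1.0 for the right answer."""
    match = PATTERN.fullmatch(completion)
    if not match:
        return 0.0  # wrong format earns nothing, even if the answer appears somewhere
    answer = match.group(2).strip()
    return 0.5 + (1.0 if answer == expected_answer else 0.0)


print(reward("<think>6 * 7 = 42</think><answer>42</answer>", "42"))
print(reward("<think>6 * 7 = 48</think><answer>48</answer>", "42"))
print(reward("The answer is 42", "42"))
```

Because the score is computed by a rule rather than looked up in labelled examples, any prompt with a checkable answer can drive the reinforcement learning loop.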

Things I Learned - 16 Mar 2025

This week, I learned: Here is a training program on open source corporate policy. htmlq and pup query HTML. They’re like jq for HTML. Here are time-tested and robust ways to leverage serendipity: ChatGPT Place. Be in places with high, diverse, talent density. Bell Labs (1950s), MIT (1970s), Pixar (1990s). People. Meet diverse, talented people. Da Vinci’s Renaissance circles, Lockheed Martin’s Skunk Works. Free time for unstructured work. 3M’s 15% rule, Google’s 20% time, Edison’s Invention Factory. Curiosity. Learn unrelated fields. Darwin’s earthworm research, Ben Franklin’s ocean currents work. Serendipity. Systematically add randomness. Brian Eno’s Oblique Strategies, IDEO’s Deep Dives. Reframe failure as opportunities. Penicillin, Velcro, Post-it Notes. Ceremonies. Hackathons, lightning talks, coffee trials. What makes client-side computing on the browser powerful is There’s nothing to install Private by default: data stays with client Speed: no latency SemGrep is a lot less open source than it used to be. ChatGPT. That’s a pity. It was a good tool. Site builders and headless CMSs are gently eating into the dominant market share of open source CMSs (via PretaGov). WordPress is pretty much the dominant CMS in the world, followed by Drupal. WordPress is now VC backed and is not growing, so they seem to be attacking their own community. Umbraco CMS is the only open source CMS that’s growing. Maybe because it’s the only .NET one Craft CMS is the only proprietary CMS that’s growing. Site builders are growing as a category. SquareSpace is the leading one. Headless CMS is growing too. Statamic. Next.js. Nuxt.js, Contentful, Prismic, Storyblok, Gatsby, etc. Here’s a sample CI/CD pipeline with automated code review. Here is the script that generated it. Note the use of NVIDIA’s GPU Docker containers via nvcr.io Things I learnt about robotics. SO-ARM100 is an open-source 3D printable robot arm. Takes ~20 hours to print, ~1 hour to assemble. Costs ~$120. 
LeKiwi is a mobile version of this arm. LeRobot is a set of HuggingFace models and datasets. The idea is, you can use one “control” robot to control the other. Do stuff manually, teach it ~50 times, and it learns how to do what you do. Pi0 is an LLM equivalent for robotics that predicts actions. HuggingFace ported that to LeRobot. Most real robotics work is on SIMULATED “gym” environments, not costly/slow physical environments. PushT is a simple 2D version. ALOHA is a 3D one. ROS is a nightmare to install and run - on Windows and Mac. Robotics Academy is an open collection of easier ROS exercises. PSLab - Pocket Science Lab is a sensor kit for the phone / PC. Costs ~$100 but isn’t available anywhere. Getting it to work requires too much mucking around with USB drivers and it just doesn’t work. (BBC micro:bit may be more promising.) Getting stuff done with electronics is still really hard unless it’s well designed. It’s FASCINATING that robots can have arbitrary joints. Our intuitions (or even biomimicry) on how to move and do stuff are a POOR guide for how robots should act. MathML Core is a language and layout specification, distinct from MathML 2/3. It’s not fully compatible with JATS XML. latexmlmath converts TeX to MathML. Noto Sans Math is a popular OpenType Math font, set via m|math { font-family: "Noto Sans Math", "Noto Sans" }. Browsers default to native fonts: e.g. Cambria Math on Windows. Explore at https://fred-wang.github.io/MathFonts/. The people working on this at arXiv are: Deyan Ginev, Fred Wang, and Norbert Preining. Their work is sponsored by NSF. There’s a PDF/UA-2 standard for accessibility but there aren’t enough tools to generate it. LibreOffice is now on WASM. ZetaJS provides office in the browser. Has a CDN (that was down from our IP). 35M packaged binary. 100M of in-memory file-system loaded. 
Useful for: Document conversion, Thumbnail generation, Text extraction, Merging / splitting documents. The Poincare Conjecture says that any finite 3D blob that has no holes can be deformed into a sphere. It took until 2003 to prove it because we didn’t have the tools to manipulate 3D shapes. Playbook driven agents are another approach to agentic workflows. Simon Willison Twine (docs) is an open source interactive fiction / story writing tool. Snowman is a browser-based Twine 2 story template format. These enable behavioural experimentation. Cheaper than using tools like Gorilla.sc and Pavlovia for behavioral experiments. For example, you can present a social or political issue and see if people change their opinions more or less depending on the content/path they see. Or, if it varies by demographics. Or, check if repeated mentions or emotional hooks improve memory / retention. More research ideas Techniques to reduce Docker image sizes: Native Linux mount supports overlaying directories! Lower layer is read-only. Edits (including deletions) affect upper layer only. Docker uses this. docker image inspect shows layers. Always run RUN apt-get update && apt-get install [packages] in a single instruction rather than on separate lines. Else RUN apt-get update gets cached with an OLD package index. Defer COPY till as late as possible, and COPY minimally - since it typically invalidates the cache. Skip development dependencies and temporary caches. Docker Dive via dive [IMAGE] analyzes image details and shows the file system in each layer. Use multi-stage builds. A: Create an image using FROM some-image AS builder and do what you want. Then, after that, B: FROM scratch (or FROM node:22-slim) use COPY --from=builder what-you-want. Use distroless images from GCR. They don’t have shells, package managers, etc. Fewer vulnerabilities. Playwright seems to be the emerging standard for modern browser testing/automation, beating Cypress and Selenium. 
“Openwashing” is when something is labelled open source but is not. Photos from FOSSASIA are public. To publish images long-term: GitHub is an option. Likely to last long-term. Clone-able. Archive.org is a good option too but may suffer from bandwidth constraints. Imgur remains popular but it’s unclear if it will remain unrestricted. Flickr has had a flaky history with limits and commercialization. WikiMedia Commons deletes personal uploads by first-time contributors. Only files clearly useful for a large audience are retained. This table of LLM API data protection lists what use cases each provider’s terms of service allow from a security perspective. Unsloth might be one of the simplest ways of fine-tuning. For LLM UIs, Open Web UI seems most popular. Run via WEBUI_SECRET_KEY=... uvx --python 3.11 open-webui serve. Text generation Web UI is less so. KoboldAI, LMQL, LM Studio, GPT4All, etc. are far behind. GPT 4o Mini is probably an 8b parameter model. Ref. “SRMs” are Small Reasoning Models - like Small Language Models. Phi-4 and DeepScaleR are SRMs. Gemma 3 is a multi-modal SLM. gemini-embedding-exp-03-07 leads the MTEB and is currently the top embedding model by a big margin. Apify is a cloud scraper platform. Here’s how they optimize their AI Web agent - Source: Remove redundant tags and attributes (e.g. accessibility, etc.). Explore readability. Add a unique gid to each element. Add the screenshot WITH a “Set of Marks” - “SoM” (read research paper) highlighting important clickable elements. Code output is brittle. Use tools / DSL - e.g. visit_url(url), click_element(text, gid, tagName), etc. GenAIScript increasingly looks like a promising way to automate LLM workflows in the browser. Ollama has a Windows download. Marp is my new favorite way to generate slides from Markdown. Reveal.js is not easy with Markdown (though HTML works well.) The VS Code plugin makes development very easy. Marp CLI makes deployment easy. I used it for my talk on LLM Hallucinations (source). 
Supports all bespoke features and plugins. Transitions. Requires OS animation effects to be enabled. Animated SVG backgrounds are a good add-on. A mental model to consider is: each chat conversation with an LLM is a person or a personality in itself. A day in the life of a model, where its personality evolves. Bots need structured content (e.g. Markdown, XML). Humans need rich content (e.g. HTML). Here are 4 ways to serve both, roughly in increasing order of sophistication: Different URLs. E.g. https://example.org/about/ vs https://example.org/about.md (this is how Jekyll or Hugo work). Use for static site generators. JavaScript. Inject after Markdown: <script src="https://cdn.jsdelivr.net/npm/marked/marked.min.js"></script><script>document.body.innerHTML = marked.parse(document.body.textContent);</script>. Use for dynamically generated static sites. URL query parameters. E.g. ?format=markdown vs ?format=html vs ?format=json. Use in APIs. Content Negotiation. Based on the user agent and Accept header, serve Markdown or HTML. Send Vary: Accept to indicate that the response depends on the Accept header. Use for dynamic web apps. Notes from The Knowledge Project: Josh Wolfe: Human Advantage in the World of AI. Agent optimization might become as popular as search engine optimization in the future. APIs are likely to be replaced by just chat requests that will do the same thing. APIs might be replaced by RPA, where somebody uses a chatbot to do the equivalent instead. Today, blue-collar workers may be more protected from AI than white-collar workers. Robots still can’t serve a meal well enough and aren’t progressing as fast as AI yet. There’s a lot of tacit knowledge in craftsmanship that will take a long time for machines to replace. Margins are fleeting. The only time you have large sustainable margins is when you truly have a monopoly. 
Cost is going down so quickly right now that all you have to do is wait, and stuff will become available for a very affordable or even a free price. The moat is really in the data. The models are not an advantage. Engineering and services on top of that are marginal. Machines will be doing science 24/7. All of the science data that we have will probably be the biggest leverage for humanity. The discovery of penicillin, Viagra, and rubber were all serendipitous. Machines should run with a little bit of randomness to benefit from this. Tesla might have gotten away with accounting fraud on warranty claims. But short sellers are likely to be after Elon Musk. With LLMs, the value of our social network has gone up considerably. Remember: The reason we believe things is not because we have thought through and analyzed them. It’s because the people around us believe in those things. It is now practical for a person to live on forever by sharing all their thoughts into an LLM. Kids can have a “Dad AI”. One good use of meeting recordings is to see where there are biases in the conversations and where the engagement is not high enough or how there are unproductive power balances. A great virtue of college is that it allows you to break free from your previous personality. For those four years, nobody knows who you are or cares what you wear. And you can be or grow into a very different person. The more content we put into AI or social media, the harder it is to change ourselves. People are reporting that Roo Code is better than Windsurf. Roo Code is open source. Available as a VS Code extension and runnable via git clone. Roo Code supports Computer Use. It can read files, take screenshots from a built-in browser, control it, and read browser console logs. Opinions are mixed. A team member reported that it takes 10 LLM queries to do what Cursor does in 2. Another reported that it does in 1 query what Cursor does in 2. 
Notes from Thursday AI, 6 Mar 2025. Google’s AI overviews now use Gemini 2.0. They’ve introduced an AI mode that functions like a mini deep research tool, incorporating planning and search. (A Perplexity-killer). It’s a fine-tuned model that is extra cautious with topics like healthcare and always verifies information. QwQ from Qwen competes with DeepSeek R1, but with only 32b parameters compared to R1’s several hundred billion. AI models are becoming less restrictive. Gemini and GPT-4.5 have relaxed some constraints, shifting more responsibility onto users, similar to Grok. What’s GPT-4.5 good for? It seems to excel in creativity, humor, education, emotional intelligence, and teaching. It follows instructions better and understands intent better. However, it’s not a major leap in coding or math. OpenAI’s Deep Research mode always uses O3, regardless of the model selected in the UI. Tencent has released a new video model available at https://aivideo.hunyuan.tencent.com/ and it appears to be quite good. Many clients now support Model Context Protocol (MCP), including Cursor, Claude Code, and Claude Desktop. The clients list is long. Some MCP uses include: Interact with GitHub using the GitHub API. Using Knowledge Graph memory to remember previous conversations. Using the Cloudflare MCP server to perform Cloudflare actions. File retrieval and custom prompts – which MCP supports in addition to tools. Calling other MCPs or LLMs (conditionally) from an MCP, enabling the creation of full-fledged workflows. Composio offers a Hosted MCP service. Cloudflare lets you build remote MCP servers. Notagen is an open-source note generation engine that produces high-quality classical sheet music. Sesame has an open-source voice model worth exploring. DiffRhythm is a music generation model that appears to be quite good. 2 pass bounding box approach. Have an LLM generate bounding boxes. Then fix it. 
Ethan Mollick uv tool install and uv tool ensure-path are useful commands for installing and ensuring path for tools. Simon Willison
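
The content negotiation option from this week's notes on serving bots and humans (serve Markdown or HTML based on the Accept header, and send Vary: Accept) can be sketched in a few lines of Python. This is a minimal sketch, not a production parser: the function names parse_accept and negotiate are hypothetical, it assumes only two representations (text/markdown and text/html) exist, and it ignores the user agent and treats wildcards as "give me the HTML default".

```python
def parse_accept(header: str) -> list[tuple[str, float]]:
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    items = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        media_type, *params = [p.strip() for p in part.split(";")]
        q = 1.0  # default quality when no q parameter is given
        for param in params:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        items.append((media_type.lower(), q))
    # sorted() is stable, so equal q values keep their header order
    return sorted(items, key=lambda item: item[1], reverse=True)


def negotiate(accept_header: str) -> tuple[str, dict[str, str]]:
    """Pick Markdown or HTML and return (media_type, response_headers)."""
    chosen = "text/html"  # assumption: HTML is the default representation
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type in ("text/markdown", "text/html"):
            chosen = media_type
            break
        if media_type in ("text/*", "*/*"):
            break  # wildcard: fall back to the HTML default
    headers = {
        "Content-Type": f"{chosen}; charset=utf-8",
        # Tells caches that the response depends on the Accept header
        "Vary": "Accept",
    }
    return chosen, headers


if __name__ == "__main__":
    print(negotiate("text/markdown, text/html;q=0.8"))
    print(negotiate("text/html,application/xhtml+xml,*/*;q=0.8"))
```

A bot that sends Accept: text/markdown gets Markdown, while a typical browser header (which lists text/html first) gets HTML. Both responses carry Vary: Accept so that a shared cache does not hand the Markdown copy to a browser.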

Things I Learned - 09 Mar 2025

This week, I learned: In Jan 2025, ChatGPT included images as part of their data chat export. They also have a 30 second limit for the export. As an extensive user, my export is about 1GB which takes well over 30 seconds to download. Like many others the export option pretty much doesn’t work for me any more. Bharathi said மெல்லத் தமிழினிச் சாகும் in a poem that has been often quoted (and parodied). Here’s the context. The Zettelkasten note-taking method proposes that you: Capture: Write down every idea or piece of information on a separate note. Use your own words to ensure understanding. Organize: Consolidate fleeting notes into permanent ones. Assign unique identifiers to each note for easy reference. Connect: Link related notes to form a web of knowledge. This can be done with tags, references, or hyperlinks in digital systems. Review: Regularly revisit your notes to strengthen connections and discover new insights. I agree with almost every point on this LinkedIn post on scoring candidates for AI roles. Rob Balian Uses DeepSeek R1 or Claude 3.7 +5 points Uses Langchain -5 points Uses Langgraph +5 points (I don’t know enough to comment) Built a RAG in 2023 +3 points Built a RAG in 2025 -3 points “pinecone” -5 points (I don’t know enough to comment) “What is cursor” - 50 points no coming back from this Uses Cursor composer +10 points “You don’t need a full agent for this” +5 points Did hackathons to learn AI outside of work +5 points “We probably need to fine tune for this” -3 points unless you can explain why “Gemini is making a comeback” +3 points (I have a soft spot for Gemini) +3 points each for mentioning reasoning trace, structured outputs, MCP, chain-of-thought, prompt caching, TPM limits “Export to prompt” can be a useful feature in apps (or even as a bookmarklet). It would let you export content in an LLM-friendly Markdown format. You can paste it into an LLM and ask questions. 
Here are things I would find useful: Copy an entire issue (with history) from GitHub, Gitlab, or JIRA. Copy an entire PR (with code changes) from GitHub, Gitlab, or Bitbucket. Copy CI/CD logs from GitHub Actions, Gitlab CI, Azure DevOps, etc. Copy entire conversation thread in Gmail or Discourse, ServiceNow, etc. Copy product reviews from Amazon, Shopify, etc. Copy page(s) from wikis and content sites like Wikipedia, StackOverflow, etc. Copy survey responses from Google Forms, Typeform, etc. Copy all interactions with a contact (including interactions, proposal history) from HubSpot or Salesforce. Copy transcripts from Zoom, Teams, Google Meet, etc. Copy as Markdown from Word, GDocs, PDF or HTML. Copy the summary of an analysis as well as all key metrics from any dashboard. Copy SAP invoices. Copy JDs, CVs, and reviews from Workday, BambooHR, DarwinBox, etc. Copy design specs, component libraries, and style guides from Figma, Miro, etc. Generated with the help of ChatGPT – link not working. Ancient languages tend to have fewer words for hues than brightness, since they didn’t need them. So “Krishna was blue” or “the sea is wine-dark” is more an indication of darkness than shade of color. Ajit Narayanan Mistral released an impressive OCR model. Marker from DataLab seems comparable but is CC-BY-NC-SA. MinerU converts medical textbooks to Markdown well. Gemini Flash may be more cost effective and better. From How I Write with Tyler Cowen: Keep researching. Use LLMs as an alternative to books and other reading material. Keep publishing what you learn regularly. While reading a chapter, keep asking the LLM. What did you think of that? What just happened there? What should I focus more on? What’s puzzling about this? How do I connect this to something else later or earlier in the book? An LLM is better used to support you rather than replace you in areas of your expertise. Where you are an expert it’s best for you to be yourself and have AI fill in the gaps. 
Ask the AI: “What is in my writing that some people might find obnoxious? Or cold / heartless? Explain it to me in great detail.” The first input is context setting and should be really long. Use voice dictation for that instead of typing. Send your blog post to an LLM. No need to explain it. Just let it be the reader and see what it understands and doesn’t understand. His PhD students don’t have a textbook, which saves them some money. But they are required to subscribe to a large language model which ends up costing less. Today, it makes sense to use the best models and pay $200 for it if required. The differences are large. But in some years in the future, the cost of these models may come down for the free versions. Humans know secrets. AI does not. So at least in some areas, humans will have an advantage. Secrets will matter a lot more in the future. Gossip will matter a lot more. How good are you at keeping and trading secrets? Travelling and meeting people will become more important. So will the value of social networks. Since everyone has access to better intelligence, the value of mobilization or being able to do things with people will have higher value. Leadership is an example. The value of your network therefore has gone up a lot. There’s more value in prompting one thing 10 times than 10 things one time. Follow up questions work better than long prompts. There are so many AI note-takers (and transcribers) these days that you are not just writing for an AI but speaking for AIs as well! Which model to use: O1 Pro is the best model. Claude does a decent job. DeepSeek is full of hallucinations but is interesting. It is more imaginative. Use O3 mini to write your prompt first, and then ask the model. Use DeepSeek and other somewhat wacky high-end models once a day so that you stay in touch with what models are capable of (beyond the conventional.) Perplexity has entirely replaced Google for many people. Anthropic’s models are the best writers. 
Gemini is good for long documents and hence for things like legal work. Gemini also has excellent YouTube integration and hence can directly read the transcripts. Grok is very good at fact checking tweets. Converting data into LLM consumable forms will be a huge project. A lot of knowledge is not in such a form, and a huge human project will involve this conversion. Indians do not need a visa to enter Thailand. Ref. Build apps (not just content) for agents. In the next 3 to 5 years, agents will surpass humans as the top product users. Reliably creating interactive tutorials is hard today. Claude 3.7 Sonnet ran out of tokens when I tried creating an interactive tutorial on diffraction. Cursor got the tokens but failed to get the application right after 3 attempts. This is not yet reliable, and when it does become reliable, education will change a fair bit. #IMPOSSIBLE Tools and solutions should fit within existing workflows. That means almost all capabilities need to be exposed as APIs. LLMs make many different kinds of errors that are useful to differentiate between. Here are a few: Model errors. The model itself makes a mistake. E.g. hallucinations, not following the prompt, etc. Context errors. The model makes a mistake because the question was out of context, or the context was missing. Input errors. The input to the model was parsed incorrectly, e.g. poor audio, poor image OCR, etc. Tool errors. The model’s tools are wrong or not good enough, e.g. retrieval errors. Most browsers are moving away from third-party cookies. Here’s Google’s recommendation on alternatives. The simplest of these is CHIPS, which requires adding a Partitioned cookie attribute. Notes from AI Engineering Summit, NY, Day 1: An agent requires 3 things: a router, tools or skills, and memory. Agents are often sequential, but sometimes parallel execution makes sense for independent tasks that you consolidate. Always allow LLMs the option of NOT answering a question if there is no good answer. 
Focus prompts on the happy path. Use guard rails for edge cases. Here are a few “tools” an agent would need to call: Clarification from user. Saving to memory. Google search. Edit a file introducing SPECIFIC changes. Search in codebase using embeddings. Run scripts on the shell or in a REPL (Python, Node, etc.). Run code in a new container for isolation. Automatically discover, read an API documentation and use it. Modify environment to enable logging and other system changes. When code is cheap, you can explore more ideas and hence design and product management need to approach things differently. We also need to rethink testing completely because AI makes very different kinds of mistakes and we often don’t have an intuition for them. You can have an agent explore all the issues, pull requests, and recent comments against the repository and summarise them for the project manager. Notes from AI Engineering Summit, NY. Session by Lux Capital. Agents make multiple LLM calls. Errors accumulate. So the quality of the model is key. What’s really critical: data + context + user preference. Set up evals for subjective responses by collecting signals continuously. Create scaffolding for agents where errors don’t accumulate. Better yet, make it FIX errors. UX is critical. We need lots more UX styles. YayText converts text to Unicode that has strikethrough, bold, italics, alternate fonts, and other interesting features. So do Unitextify, ConvertCase, and LingoJam. 10 red flags I look for as an angel investor is an interesting read. No real customers: A deck, a landing page, and a “vision” don’t impress me. Show me paying customers. Even better, show me customers coming back. No path to profitability: I don’t care if you raise $100M – if there’s no plan to make money, you’re just burning oxygen. Growth is great, but cash flow keeps you alive. Founders who won’t sell: If you’re scared to get on sales calls, that’s a red flag. 
The best founders sell in the early days – whether it’s to customers, employees, or investors. No differentiation: “Like X, but cheaper” isn’t a strategy. If your only edge is price, you’ll get crushed. What do you have that no one else does? No urgency: The best founders operate like time is running out. If you’re “exploring ideas” or “thinking about raising next year,” you’ve already lost. Raising money before proving anything: Too many founders try to fundraise their way out of bad ideas. If you need VC to get off the ground, you’re building the wrong business. No clear distribution strategy: Product alone doesn’t win. First-time founders obsess over features. Second-time founders obsess over distribution. How are you getting customers? No ownership mentality: If I hear “I need to hire someone to do that” too early, I’m out. Founders who win figure things out before they delegate. A CEO who can’t attract talent: Your first hires are everything. If great people aren’t willing to join, either the vision is weak – or you are. No skin in the game: If a founder won’t invest their own money or take a pay cut to make it work, why should I? By contrast, this OpenAI Deep Research report feels a lot less actionable. Inception Labs offers “Diffusion LLMs”. (No API yet.) They start with random text and refine it in parallel. The benefit is: It’s faster and cheaper due to parallelization and better GPU use. It doesn’t commit to tokens and can fix hallucinations, JSON structure errors, reasoning fallacies, etc. It’s better with multi-modal since images are diffusion based already.
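
This week's note on CHIPS says it only requires adding a Partitioned cookie attribute. As a rough illustration, here is a small Python sketch that builds such a Set-Cookie header value by hand. The helper name build_partitioned_cookie is hypothetical. It assumes the cookie is meant for a cross-site embed, so it also sets Secure (which partitioned cookies require) and SameSite=None. It does not escape or validate the name and value, so treat it as a sketch rather than a complete implementation.

```python
from typing import Optional


def build_partitioned_cookie(name: str, value: str, max_age: Optional[int] = None) -> str:
    """Build a Set-Cookie header value that opts in to CHIPS partitioning.

    Hypothetical helper for illustration: no escaping or validation is done.
    """
    attributes = [
        f"{name}={value}",
        "Path=/",
        "Secure",         # partitioned cookies must be Secure
        "HttpOnly",       # assumption: the cookie is not needed by page scripts
        "SameSite=None",  # assumption: the cookie is used in a cross-site embed
        "Partitioned",    # the CHIPS opt-in attribute
    ]
    if max_age is not None:
        attributes.append(f"Max-Age={max_age}")
    return "; ".join(attributes)


if __name__ == "__main__":
    # Example header line a server might send from an embedded widget
    print("Set-Cookie: " + build_partitioned_cookie("session", "abc123", max_age=3600))
```

With this attribute, the browser keeps a separate cookie jar per top-level site, so the same embedded widget gets a different cookie on each site that embeds it.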

Things I Learned - 02 Mar 2025

This week, I learned: Proxmox Virtual Environment is an open-source alternative to VMWare, Hyper-V, Citrix XenServer, etc. (There’s nothing there that prompts me to explore it further.) With Podman on Windows (a Docker equivalent), many Docker-enabled tasks become easier. For example, running PostgreSQL is as easy as: podman run -d --name postgres -e POSTGRES_PASSWORD=postgres -p 5432:5432 postgres:latest followed by podman exec -it postgres psql -U postgres -c "CREATE DATABASE mydb;" Bad deep research prompts are: vague/broad, under-specified or ambiguous. In short, the more you know what you want, the better. Iterate until then. What kind of reports do clients ask research companies to produce? I was curious to see if Deep Research can replace these. Here are a bunch of ideas. ChatGPT Strategy & Management Consulting Research (McKinsey & Company, Boston Consulting Group, Bain & Company, Strategy&, Accenture Strategy) Produce a comprehensive strategic transformation report for a Fortune 500 consumer goods company. Analyze global market trends, competitor strategies, and actionable growth recommendations, including case studies and source citations. Generate an in‐depth study on corporate restructuring trends in emerging markets. Focus on successful turnaround strategies, CEO leadership factors, and strategic pivots, with a comparative analysis of key players. Create a report on M&A trends in the technology sector over the past five years. Detail deal drivers, integration best practices, and forecast future acquisition opportunities, citing relevant data. IT & Technology Research Analysts (Gartner, Forrester Research, IDC, 451 Research, Ovum) Produce a market assessment report on emerging cloud computing platforms. Include vendor evaluations, adoption forecasts, and key technology drivers with supporting data and charts. Generate an in‐depth cybersecurity trends report for enterprise IT. 
Analyze recent threat vectors, defense strategies, and best practices for risk mitigation, providing actionable recommendations. Create a comprehensive study on the impact of artificial intelligence in enterprise software. Include competitive benchmarking, technology adoption rates, and forecasted market changes. Marketing & Consumer Research (Nielsen, Kantar Group, Ipsos, GfK, Euromonitor International) Produce a consumer behavior analysis report for a leading retail brand. Identify key demographic shifts, purchasing trends, and brand loyalty factors, and provide actionable insights with data visualizations. Generate a detailed report on digital media consumption trends among millennials, incorporating survey results, social media analytics, and case studies of successful campaigns. Create a market segmentation report for a new consumer electronics launch. Identify key consumer segments, behavioral drivers, and media usage patterns with clear recommendations. Financial Investment Research (Goldman Sachs, JPMorgan Chase, Morgan Stanley, Morningstar, Keefe Bruyette & Woods) Produce an equity research report on mid-cap technology stocks. Include detailed financial modeling, valuation analysis, and buy/sell/hold recommendations with supporting data and charts. Generate a fixed income analysis report for corporate bonds in the industrial sector. Assess credit risk, yield forecasts, and macroeconomic influences, citing key data sources. Create a comprehensive report on global market trends impacting investment banking. Analyze regulatory changes, market sentiment, and performance metrics of leading financial institutions. Healthcare Research (IQVIA, Frost & Sullivan, Evaluate Ltd, Deloitte Healthcare, IMS Health) Produce a market analysis report on emerging biotechnologies in oncology. Include competitive landscape, regulatory challenges, and growth forecasts with relevant case studies. 
Generate a comprehensive report on patient satisfaction and telemedicine adoption trends. Analyze survey data from leading healthcare providers and benchmark best practices. Create a detailed study on pharmaceutical market dynamics in emerging economies. Focus on pipeline developments, regulatory environments, and market potential with actionable insights. Legal Research Providers (LexisNexis, Westlaw, Bloomberg Law, Fastcase) Produce a legal risk assessment report on the impact of recent data privacy regulations for multinational corporations. Include case studies, trend analysis (2019–2024), and strategic recommendations. Generate a comprehensive report summarizing key federal and Supreme Court rulings on intellectual property rights over the past five years, highlighting trends and divergent interpretations. Create a detailed report on the evolution of securities law and its effect on investment research practices, incorporating analysis of recent litigation and regulatory updates. Media & News Research (Factiva, Kantar Media, Comscore, Cision) Produce a media consumption trends report that analyzes audience behavior shifts across digital, TV, and print platforms. Include data visualizations, key drivers, and forecasted trends. Generate a comprehensive report on the impact of social media on traditional news reporting, with case studies and a comparative analysis of engagement metrics. Create a detailed study on the effectiveness of multimedia advertising campaigns, evaluating ROI, consumer engagement, and best practices with actionable insights. Economic & Industry-Specific Research (Economist Intelligence Unit, BMI Research, IHS Markit, Consensus Economics) Produce a macroeconomic outlook report for emerging markets, including GDP, inflation, and employment forecasts, with detailed data analysis and visualizations. Generate an industry analysis report on the automotive sector, covering technological innovations, competitive dynamics, and consolidation trends. 
Create a comprehensive country risk assessment report for a target region, detailing political, economic, and regulatory factors with recommendations for investors. Human Resources & Employee Engagement Research (Gallup, Great Place to Work, Mercer) Produce an employee engagement report for a multinational firm based on recent survey data. Identify key drivers of satisfaction, retention challenges, and improvement recommendations. Generate a comprehensive study on the impact of remote and hybrid work models on employee productivity across industries, including best practices and benchmark data. Create a detailed report on workplace culture transformation, analyzing organizational behavior trends, employee feedback, and actionable strategies to boost engagement. Environmental, Social & Governance (ESG) Research (MSCI ESG Research, Sustainalytics, ISS ESG, Bloomberg ESG) Produce an ESG performance report for a portfolio of global companies. Include sustainability scores, risk assessments, and recommendations for improvement with data visualizations. Generate a comprehensive study on the impact of climate change regulations on the energy sector, including policy analysis, market forecasts, and strategic implications. Create a detailed report on corporate social responsibility trends in the consumer goods industry, incorporating qualitative and quantitative analyses with actionable recommendations. Education & Academic Research (RAND Corporation, National Center for Education Statistics, HolonIQ) Produce an analysis report on the future of online education, examining technological adoption, market growth projections, and student outcome trends with supporting data. Generate a comprehensive study on the effects of educational policy reforms on public school performance in the U.S., including trend analysis and actionable recommendations. 
Create a detailed international higher education trends report, covering tuition dynamics, international student mobility, and emerging academic programs with comparative data. Real Estate & Property Research (CBRE, JLL, CoStar Group, Cushman & Wakefield) Produce a commercial real estate market analysis report for major urban centers, including occupancy trends, rental rate forecasts, and investment opportunity assessments. Generate a comprehensive study on residential housing market dynamics in emerging economies, focusing on affordability, supply-demand gaps, and policy impacts. Create a detailed report on the impact of urban redevelopment projects on local real estate values, including case studies, forecasts, and strategic recommendations. Energy & Natural Resources Research (Wood Mackenzie, Rystad Energy, Bloomberg New Energy Finance) Produce an analysis report on global renewable energy trends, covering technology adoption, market forecasts, and key policy drivers, with detailed data and visuals. Generate a comprehensive commodity price forecasting report for oil, natural gas, and key metals, incorporating historical trends, risk assessments, and predictive modeling. Create a detailed report on energy transition strategies for traditional energy companies, focusing on clean technology investments and market adaptation strategies. Supply Chain & Logistics Research (ARC Advisory Group, Gartner Supply Chain Research, Supply Chain Insights) Produce a report on supply chain resilience for global manufacturers. Analyze risk factors, digital transformation impacts, and best practices for operational efficiency with supporting data. Generate a comprehensive study on the impact of technology on logistics networks, including case studies on digital optimization and cost reduction strategies. Create a detailed report on emerging last-mile delivery solutions, assessing innovations, consumer expectations, and scalability with actionable insights. 
Cybersecurity & Information Security Research (KuppingerCole, Forrester Security, IDC Cybersecurity, Cybersecurity Ventures) Produce an in-depth report on emerging cybersecurity threats for large enterprises, including detailed analysis of recent incidents, risk vectors, and defense strategies. Generate a comprehensive cybersecurity market landscape report, evaluating vendor performance, technology forecasts, and best practices for mitigating risks. Create a detailed report on regulatory compliance trends in information security within the financial services industry, with case studies and strategic recommendations. Social Media, Digital & Online Research (Comscore, SimilarWeb, Brandwatch) Produce a digital audience behavior report for a global brand, focusing on social media trends, engagement metrics, and platform performance with detailed data analysis. Generate a comprehensive analysis of influencer marketing effectiveness across digital channels, including ROI metrics, case studies, and best practices. Create a detailed report on online brand sentiment analysis, incorporating social listening data, trend forecasts, and actionable recommendations. Public Opinion & Political Research (Pew Research Center, Gallup, YouGov) Produce a public opinion polling report on voter sentiment ahead of a major election. Include demographic breakdowns, key issue analysis, and trend visualizations for the past five years. Generate a comprehensive study on political risk in emerging markets, analyzing historical data, current trends, and future projections, with policy recommendations. Create a detailed report on the influence of media on public policy, using survey data, social media analysis, and comparative case studies. Sports, Entertainment & Media Research (Nielsen Sports, Sportcal, Kantar Media Sports) Produce a market analysis report on sports sponsorship trends, detailing viewership metrics, brand engagement, and investment ROI with industry case studies. 
Generate a comprehensive report on audience behavior in the streaming media industry, including demographic insights, consumption trends, and competitive benchmarks. Create a detailed analysis of digital advertising effectiveness in the entertainment sector, including segmentation data, ROI analysis, and strategic recommendations. Innovation, R&D & Technology Trends Research (Innosight, Frost & Sullivan Innovation, CB Insights) Produce a global R&D investment trends report, analyzing technology spending, innovation indices, and the impact on market growth across key industries. Generate a comprehensive study on disruptive technologies in manufacturing, including competitive analysis, market potential forecasts, and adoption trends. Create a detailed report on emerging innovation hubs worldwide, focusing on startup ecosystems, funding trends, and collaborative opportunities in technology. Agriculture & Agribusiness Research (Rabobank Agribusiness Research, USDA Economic Research Service, AgFunder) Produce an analysis report on global agricultural market trends, including crop yield forecasts, trade dynamics, and policy impacts, with data visualizations. Generate a comprehensive study on agritech innovations such as precision farming and sustainable practices, including case studies and market forecasts. Create a detailed report on the impact of climate change on food production and supply chain stability in agribusiness, with risk assessments and strategic recommendations. Environmental & Climate Change Research (Carbon Trust, IHS Markit Energy Transition, Bloomberg New Energy Finance) Produce a report on the economic and social impacts of climate change on urban infrastructure, including forecasting models and policy recommendations. Generate a comprehensive study on national climate policies and their effects on industrial competitiveness, with detailed trend analysis and source citations. 
Create a detailed report on corporate sustainability initiatives, assessing environmental risk management practices and providing actionable recommendations for improvement. Customer Experience (CX) & User Experience (UX) Research (Forrester CX Research, Gartner CX Research, Qualtrics, Nielsen Norman Group) Produce a report on customer journey mapping for a leading retail brand, identifying key touchpoints, pain points, and actionable improvement strategies with data visualizations. Generate a comprehensive study on digital user experience trends for e-commerce platforms, including usability testing insights, design best practices, and conversion optimization recommendations. Create a detailed report on customer satisfaction and loyalty metrics across multiple industries, integrating survey data and actionable recommendations to enhance overall CX. Blockchain, Cryptocurrency & Fintech Research (Chainalysis, CoinDesk Research, Deloitte Fintech Research, CB Insights) Produce an analysis report on emerging blockchain technologies and their applications in financial services, including market trends, adoption forecasts, and case studies. Generate a comprehensive study on cryptocurrency market dynamics, analyzing regulatory developments, investor sentiment, and competitive landscapes with source citations. Create a detailed report on fintech disruption in traditional banking, with case studies on leading startups, technology adoption, and future market forecasts. Venture Capital, Startup & Private Equity Research (PitchBook, CB Insights, Crunchbase, Preqin) Produce a global venture capital investment trends report, including performance analysis of high-growth startups, sector benchmarks, and emerging market opportunities. Generate a comprehensive study on private equity market dynamics, covering deal flow analysis, exit strategies, and forecasted trends with supporting data. 
Create a detailed report on emerging startup ecosystems in key regions, highlighting funding trends, investor activity, and growth potential with actionable insights. Operations Research & Management Science Consulting (The Brattle Group, NERA Economic Consulting, CRA International) Produce a report on optimization techniques for operational efficiency in large-scale manufacturing, including quantitative analysis, simulation models, and case studies. Generate a comprehensive study on the application of predictive analytics in supply chain management, focusing on data modeling, process improvements, and actionable insights. Create a detailed report on advanced quantitative modeling approaches to solve complex business problems in logistics and operations, including scenario analysis and recommendations. Cultural & Social Research (Ethnographic/Sociocultural Studies) (Ipsos MORI, Kantar TNS, YouGov) Produce a qualitative ethnographic study on urban consumer lifestyle trends, incorporating field observations, interviews, and cultural analysis with actionable insights. Generate a comprehensive study on how cultural shifts influence global brand perception, including comparative case studies and trend analysis. Create a detailed report on sociocultural dynamics and consumer behavior in emerging economies, integrating in-depth field research and actionable recommendations. Economic & Demographic Research Firms (Oxford Economics, The Conference Board, CEIC Data) Produce a macroeconomic forecasting report for a specific region, including GDP, inflation, and employment trends with detailed data visualizations and source citations. Generate a detailed demographic analysis report for a target market, highlighting age distribution, income levels, and consumption patterns with actionable insights. Create a comprehensive report on the economic impact of demographic shifts on consumer markets, with policy recommendations and trend analysis. 
Academic & Think Tank Research Organizations (Brookings Institution, RAND Corporation, Carnegie Endowment for International Peace) Produce a policy research report on global governance challenges and their implications for economic development, including case studies, literature reviews, and expert interviews. Generate a comprehensive study on social inequality and its effects on public health and education outcomes, supported by empirical research and trend analysis. Create a detailed report on emerging trends in international relations and their impact on global trade and security, integrating academic research and data analytics. Market Research Technology & Software Providers (Qualtrics, SurveyMonkey, Confirmit) Produce a report on the latest innovations in survey technology and data analytics software for market research, including product comparisons, user case studies, and future trend forecasts. Generate a comprehensive study on the integration of AI and machine learning in consumer insights platforms, highlighting case studies, performance metrics, and industry benchmarks. Create a detailed report on digital transformation trends in market research technology, featuring analysis of leading software solutions, market share data, and recommendations for technology adoption. When evaluating inputs, models tend to prefer the first response, prefer their own response, and prefer longer responses. ThursdAI Real-time speech-to-text options for transcription: Deepgram has a MediaRecorder API, which is perfect. Whisper Streaming Web is a web app that can transcribe audio real-time from the browser. A good approach, but I wouldn’t use it for meeting transcription on my mid-end laptop. Streaming takes up the bulk of my GPU, leaving little for transcription. whisper-live runs as a Python console app and does something similar. Whisper WebGPU runs on the browser (only 200MB). Cool! But slow and still takes up GPU. 
Mini-omni is an open-source Qwen-based LLM that can hear and talk while thinking in real-time. An interesting experiment, but not for prototyping. OpenAI shares a report with clients with insights on what different professions search for. What doctors search for is: Is my diagnosis right? How do I read this report? Is my prescription correct? Is there a cheaper medicine? What’s the life expectancy given these symptoms? Dataclasses in Python have a slight overhead over named tuples. The 2 main uses I see for them are: providing defaults and offering type hints. UVB 76 is a radio channel that has been broadcasting static (with occasional Russian conversation) since 1976. No one knows why. It’s live at https://m.youtube.com/watch?v=8h_D2P0iqMk Romans washed clothes in urine. The government taxed the purchase of urine for commercial purposes! That’s the origin of the phrase “Pecunia non olet” which means “money doesn’t stink”. Nix is a package manager that creates container-like environments. Like a cross between Docker and apt / venv. It has an immutable file system. DevBox is a higher-level tool built on top of Nix that streamlines developer workflows, e.g. common project environment setup. VS Code can be used to develop inside a Docker container via Podman, too. Set "dev.containers.dockerPath": "podman" Ref Rill Data is an interesting BI tool based on DuckDB. It auto-generates a dashboard given a dataset. It’s possible to assign “variables” in SQL (notably in DuckDB). Here’s an example: WITH sessions AS (FROM events SELECT COUNT(DISTINCT session_id) AS value), pages AS (FROM events SELECT COUNT(*) AS value) FROM sessions, pages SELECT pages.value / sessions.value AS pages_per_session; DuckDB has a GROUP BY * that groups by every column in the SELECT list that is not wrapped in an aggregate. SELECT x, y, COUNT(*) FROM t GROUP BY * is equivalent to SELECT x, y, COUNT(*) FROM t GROUP BY x, y. 
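The dataclass note above (defaults and type hints) can be made concrete with a minimal sketch. The class and field names here (PointTuple, PointData, tags) are hypothetical, chosen only for illustration, and the snippet assumes Python 3.9+ for the list[str] annotation.

```python
from dataclasses import dataclass, field
from typing import NamedTuple


class PointTuple(NamedTuple):
    # A typed named tuple: lightweight and immutable.
    x: float
    y: float = 0.0


@dataclass
class PointData:
    # A dataclass: type hints plus defaults, and instances are mutable.
    x: float
    y: float = 0.0
    # Mutable defaults must go through default_factory.
    tags: list[str] = field(default_factory=list)


p_tuple = PointTuple(1.0)   # y falls back to its default
p_data = PointData(1.0)     # y and tags fall back to their defaults
p_data.y = 2.5              # allowed: dataclass instances are mutable
p_data.tags.append("edited")
```

Typed named tuples also accept hints and simple defaults, so the practical difference in this sketch is mutability and default_factory rather than hints alone.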
VS Code can be used as a code executor by adding {"key": "shift+enter", "command": "workbench.action.terminal.runSelectedText", "when": "editorFocus"} to the keybindings.json file. Press Shift-Enter to run the selection on the terminal. Useful for DuckDB, SQLite, etc. Ref LLMs are excellent at database migration. They can convert schemas and queries across SQL dialects (e.g. BigQuery to DuckDB, etc.) at 90%+ accuracy. This is useful when clients want to migrate cloud providers, go from on-prem to cloud, or reduce cost by switching databases.
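As a small, hand-written illustration of the dialect differences mentioned above, here is the DuckDB FROM-first “variables” query from earlier rewritten in standard SQL and run on SQLite through Python’s built-in sqlite3 module. The events table and its rows are made up for the example, and the query computes pages divided by sessions; the 1.0 multiplier is there because SQLite would otherwise do integer division.

```python
import sqlite3

# In-memory database with a tiny, made-up events table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (session_id TEXT, page TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("s1", "/home"), ("s1", "/docs"), ("s1", "/blog"), ("s2", "/home")],
)

# Each CTE acts as a named single-row "variable".
query = """
WITH sessions AS (SELECT COUNT(DISTINCT session_id) AS value FROM events),
     pages    AS (SELECT COUNT(*) AS value FROM events)
SELECT 1.0 * pages.value / sessions.value AS pages_per_session
FROM sessions, pages
"""
pages_per_session = conn.execute(query).fetchone()[0]
print(pages_per_session)  # 4 page views over 2 sessions
conn.close()
```

The only changes from the DuckDB version are moving FROM after SELECT and forcing floating-point division, which is the kind of mechanical rewrite a dialect migration involves.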

Things I Learned - 23 Feb 2025

This week, I learned: Remote Desktop may be the easiest way to have a Windows machine access files / screen from another Windows machine, even for home PCs. Caddy sets up reverse proxies that get automatic SSL certificates from Let’s Encrypt! The Nomic Embed v2 blog post has an excellent visualization for embedding quality. It takes all Wikipedia disambiguation articles and shows them on a Nomic Atlas, embedded via Nomic Embed v2. It lets you toggle to OpenAI text-embedding-ada-002 which moves the topics far away. Visually, this is very convincing. Python 3.15 will enable UTF-8 mode by default. PEP 686 Python 3.13 supports sub-interpreters to bypass the GIL. It’s quite like web workers. PEP 554 The quickest way to change the fish prompt is function fish_prompt; echo '> '; end At PyConf Hyderabad, about 3 people had read a PEP. 1 had used the match statement. But 80% knew what a Vector DB was. 20% had used a Gemini API. That’s how much traction LLM development is getting. The productivity benefit people report from using LLMs is about 3X. Ethan Mollick Soon, you’ll be able to send an LLM to a virtual meeting on your behalf. It will talk like you. Ethan Mollick Models tend to claim ignorance when you test them on topics they should avoid. But tend to answer when not being tested. Sneaky! Ethan Mollick Mermaid has an Architecture Diagrams Syntax (in beta) that’s capable of creating elegant architecture diagrams with icons. Blind is an app that allows users to post anonymously. It’s particularly useful to find honest negative feedback about (mostly US) companies. Iconify.design is a single npm interface to most open source icon sets. It includes FontAwesome, Bootstrap, Material Design, and many others. icones.js.org is an alternate interface. Self-pity may have evolved as a signal for social support and reducing conflict, while also encouraging self-reflection and behavioral adjustment. But in modern contexts it may be maladaptive and lead to depression. 
ChatGPT Anecdotally, Grok 3 is very good for researching company information and latest news, particularly employee and customer sentiment. DeepSeek and Claude write more humanely than OpenAI. via Alberto Lopez Toledo, White Star Capital There’s a Y Combinator Founder Directory listing all founders of YC companies. At the moment, there are 8,628 founders. There’s also a co-founder matching tool. LLMs are impacting not just data queries but geospatial queries as well. Here’s a good example of Natural Language Geocoding. US companies typically pay employees every 2 weeks, not every month. What’s good about Snowflake? A few developers who explored it mentioned that: Its ability to scale up compute automatically makes queries run faster. “Time travel” lets you see how data looked at any point in time, which is impressive and useful. Live data sharing with access control without the need for ETL pipelines is useful. Open-source competition: ClickHouse, Apache Druid, and Presto/Trino. Databricks is a lakehouse and less a data warehouse. It’s more about: storing unstructured data (Snowflake prefers semi-structured: JSON, Avro, etc.) running collaborative notebooks in Python, SQL, Scala, R (Snowflake encourages SQL) I subscribed to ChatGPT Pro mainly for Deep Research. Here are the first 50 reports I generated: uv Package Manager Overview DuckDB Analytics Comparison Rust vs Python / JavaScript Modern Data Engineering Course LLM Code Migration Practices Cloud Cost Optimization Strategies LLM Coding Interview Tools Report (compare with Perplexity) Text To Speech Engines Customer Service in Indian Public Sector Banks LLMs in Software Development Old version 1: Gen AI in Software Development Old version 2: Gen AI in Software Development Leadership Training Content Open-Source HTTP Servers. Caddy wins. 
Deep Research Use Cases Nagpur No-Parking Violations Data Science in Food Services Deep Research Disruption to Research Firms LLMs in Design Thinking EU Taxonomy Report Clarification Shell Valuation Analysis Inquiry LLMs in DSLs Research Public API-Based Data Storage Options. Supabase wins. Front-End JS Frameworks Analysis Database Evaluation Guide CSS Frameworks Evaluation Guide CI/CD Tooling Ecosystem Report Color Names Count S Anand Biography. Meh, I know more about me, and it gets a few things wrong. Cosmere Secrets Encyclopedia. This is the best. Deep Research is great if it’s stuff I actually want to read, rather than just learn about. DBT course Future of Coding AI Claude Artifacts Use Cases. This is the only one that managed to get artifacts links correct. I used this for an article for The Hindu. MCP Servers and Clients Research. Learnings: Practically any “tool” can be an MCP server: file systems, APIs, codebases, browsers, collaboration platforms, memory, etc. Most platforms have integrated (or are integrating) MCP. Clients: code editors, chat, and automation tools support MCP. GenAIScript is a good starting point. Tester MCP Client is a browser-based test environment. mcp-cli-client is a CLI-based client. mcp-chatbot is a chatbot client. Data Moats by Industry Attorney Profile Research Social Media Data APIs Adobe Software Alternatives LLM Hallucination Visualization Techniques API vs Self-hosting Cost Analysis: Always use APIs, avoid self-hosting models. AGI Preparation AGI will emerge step by step. Knowing which step is next will help. AI-native organisations will emerge in each of these areas, AI design agencies and AI creative agencies being one example. Networking, empathy, leadership have more value now. So will human-AI bridging roles (e.g. AI managers, AI consultants, ethics auditors). What’s the value of a human when technology can do everything better? How did this play out in drama (decay) or sports (centralization) or music (globalization)? 
Modern digital note taking Voice note taking is the game changer. Automatically popping up notes based on context such as people, places, or conversations will be a thing. Local LLM Search Tools Blog Post to research paper on copying - suggestions Linux Dev Migration Guide Raspberry Pi SIM options Linux Dev migration guide HTML to JATS conversion LLM context splitting strategies Strategy for AI services in Publishing Gemini multi model editing use cases by industry Pharma Conference Participation Guide I learnt what a Memoji is for the first time. An avatar that follows your facial expressions. Cool! Google shows US flight timings from FlightView. Empirically, based on one data point (my UA-2168 which was delayed by 4 hours), it gets updates faster than Flightradar24 or FlightAware or FlightStats. When comparing Indian graduates with their western counterparts, the Indian ones are often seen as: 🟢 Theoretically sound 🟢 Analytical & technical 🟢 Academically disciplined 🟢 Resilient under pressure 🟢 Committed continuous learners 🔴 Rote-learning oriented 🔴 Limited independent inquiry 🔴 Limited creative innovation 🔴 Restricted practical exposure 🔴 Poor communicators 🔴 Low leadership / initiative 🔴 Need structured guidance 🔴 Struggle to network HuggingFace has a “Model tree” against each model that shows the model’s ancestors and descendants. For example, as of now, Deepseek R1 has 75 adapters, 154 finetunes, and 23 quantizations. Perplexity is now powered by Cerebras, which makes their inference as fast as Google. Source. The speed is a big factor, and I’ve switched my default search engine from Google to Perplexity, at least for now. Interview Coder is a desktop app that offers live interview support for coding interviews. It’s a transparent window that reads your screen and answers questions for you. (Given this, I think we need an interviewer support system that tells interviewers what to ask!)

Things I Learned - 16 Feb 2025

This week, I learned: Connected Papers shows papers similar to each other based on co-citation and bibliographic coupling for ~50,000 papers. Notes from a fireside chat with Prashanth Chandrasekar, CEO, StackOverflow, and the StackOverflow team There’s a signal that software demand is growing in 2024. Many more students took the StackOverflow survey in 2024. So more students (or other professionals) are shifting into / starting to learn software development. The AI Index is a good resource for AI trends. Experts are better able to use AI for writing code. Less experienced developers are more likely to use AI for code reviews, project planning, etc. There’s a 5% decline in favorability for AI tools compared to 2023, maybe due to disappointing results. Pilot groups working on AI are 25-30% more productive. They’re the most enthusiastic. For the rest of the company, it drops off to 5-10%. #LEARNING Benefit comes from NEW people becoming programmers, not existing ones getting more effective? StackOverflow wants to be where the developer is. The programmer workflow was: Google -> StackOverflow -> GitHub. Now it’s changing to ChatGPT / Cursor -> GitHub. StackOverflow has a partnership with OpenAI and is working on a plugin. Same with Google’s Duet AI, GitHub Copilot, many others. They’ll link to StackOverflow. StackOverflow is driving integration actively through an enterprise Overflow API. Q: What tech have you seen blaze through the ranks? Prashanth: Abstraction wins. Stuff that abstracts away things well and more wins. This includes Gen AI. Erin Yepis: Rust (from 3% to 12%). AWS has steady growth. Erin Yepis: I have a time series spreadsheet that I’ll publish. Q: What technologies are unusually tightly coupled? Prashanth: AWS & Google Cloud are tightly coupled. Q: We have an engagement problem. Might be India-specific. What are low-effort high-return mechanisms to increase engagement? Eric Woodring: Rather than a static web page, integrate it using the API. 
#TODO Ben Marconi: Use LLMs to write post mortems and push to StackOverflow. #TODO Eric Woodring: “Hydrating” the community helps. We take repeat questions on Teams / Slack and seed them using LLMs. We integrate with the API to auto-add Q&A. Transform documentation into Q&A. Potentially UPDATE existing Q&A if it’s wrong. Q: What unexpected lessons about developer behavior have you learned while running StackOverflow? Prashanth: We didn’t expect developers moving away from Google. Now it has moved to the IDE. Q: What are you learning about developer learning behavior? Ben Marconi: Generating LLM-based onboarding documents. Using StackOverflow for Teams to identify who the experts are to contact for specific topics. Q: Are you thinking about leveraging Stack Overflow’s knowledge base for personalized or interactive learning experiences? How? Prashanth: Traditionally, people use StackOverflow for productivity, learning, and flexibility (i.e. to ask/answer questions asynchronously without breaking their flow). So yeah, learning is important for us. (Duh!) Q: Could Stack Overflow’s interactions help evaluate the accuracy and relevance of LLM-generated code? Or provide potential metrics on quality? Prashanth: LLM accuracy improves by ~30%. Upvotes / downvotes are reinforcement learning (RL) on steroids, so that helps. Q: What are your thoughts on reliance on LLMs potentially deskilling developers? Prashanth: A real issue for junior developers, not for senior ones. They’ll come across as knowledgeable. Make internal evaluations and interviews more rigorous. Anand’s requests for action: Could I get a copy of Erin’s spreadsheet? Vivek Narayanan will follow up. Could you help me learn more about hydration? Nick Madison will set up a meeting with customer success group. I switched to fish shell mainly because: Autocomplete and tab completion work perfectly, out of the box. 
Syntax highlighting is beautiful. Great multi-line editing. To format with VS Code Ruff, you need to point the ruff.interpreter setting to a Python interpreter. You can’t run the ruff server without Python, even though ruff itself doesn’t need Python. cd checks all paths specified in CDPATH for the directory name and changes to the first match. That’s pretty convenient! Flipper Zero is now on my list of “To Buy” tools. It has a variety of hardware devices including NFC, RFID, Bluetooth, Infrared, etc. and is great for reverse engineering or hacking devices.
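The CDPATH lookup described above can be modelled in a few lines of Python. This is a simplified sketch, not how any particular shell implements it: resolve_cd is a hypothetical helper, and it assumes CDPATH entries are tried in order with the current directory as a fallback (real shells differ on whether the current directory is tried first or last).

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def resolve_cd(name: str, cdpath: str, cwd: Path) -> Optional[Path]:
    """Return the directory a simplified `cd name` would pick, or None."""
    # Absolute paths and explicit ./ or ../ paths bypass CDPATH.
    if os.path.isabs(name) or name.startswith(("./", "../")):
        candidate = cwd / name
        return candidate if candidate.is_dir() else None
    # Try each CDPATH entry in order; an empty entry means the current directory.
    for entry in cdpath.split(os.pathsep) if cdpath else []:
        base = Path(entry) if entry else cwd
        if (base / name).is_dir():
            return base / name
    # Assumed fallback: the current directory.
    fallback = cwd / name
    return fallback if fallback.is_dir() else None


# Demo with made-up directories: "blog" exists under both work/ and play/.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "work" / "blog").mkdir(parents=True)
    (root / "play" / "blog").mkdir(parents=True)
    (root / "play" / "games").mkdir()
    cdpath = os.pathsep.join([str(root / "work"), str(root / "play")])
    first = resolve_cd("blog", cdpath, cwd=root)
    second = resolve_cd("games", cdpath, cwd=root)
    missing = resolve_cd("nope", cdpath, cwd=root)
    demo = [
        first.relative_to(root).as_posix(),
        second.relative_to(root).as_posix(),
        missing,
    ]
print(demo)
```

Because work/ comes first in the made-up CDPATH, "blog" resolves there even though play/blog also exists, which is the “first match” behaviour.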

Things I Learned - 09 Feb 2025

This week, I learned: Lessons from discussions at IIT Madras: Even in recorded video tutorials, asking students a question and pausing to give them time to think can be effective. When you put students in front of real clients, engagement increases dramatically. Most teaching assistants would like to help diligent students among the bottom half (more than the top decile of students). However, there is a fraction of poor performers who do not care, and are best ignored. Their engagement and effort are a good measure of their interest. Defining a minimal set of principles that we want to teach helps us measure if we’ve helped the bottom half at least meet those objectives. Teaching is hard. Even after explanations, students, even ENGAGED students, tend to make basic mistakes. ChatGPT does a good job of spotting errors in architectural and structural diagrams. In fact, spotting errors in large diagrams is a theme with potential use cases. Source: Dan Becker. R1 seems good at text-to-CAD. Even better than Sonnet. Source: Dan Becker OpenAI advises a few different prompting techniques for reasoning models. OpenAI: Avoid examples unless zero-shot prompting fails. Avoid chain-of-thought. These models do that internally anyway. Short, direct prompts are better than detailed prompts. GitHub Models is free for anyone to try. The model catalog is extensive and even includes o3-mini which was launched this week (though in limited preview). 
The data catalog space is led by proprietary solutions: Alation Data Catalog: Market leader; growing steadily in enterprise use Collibra Data Catalog: Widely adopted with steady growth AWS Glue Data Catalog: Growing rapidly as AWS expands its data services Informatica Enterprise Data Catalog: Long established and stable, though facing newer alternatives Microsoft Purview Unified Catalog: Experiencing fast growth driven by cloud momentum Atlan Data Catalog: Relatively new but gaining fast traction among tech-forward organizations OpusClip automatically creates short clips from long videos. I ran it on Programming Minecraft with WebSockets in Python to get this short 30-second clip. 30 minutes. 100% automated. Alternatives to Postman: Hoppscotch – A web‑based/desktop API client supporting REST, GraphQL, and WebSockets. It’s lightweight, open-source, and self‑hostable. HTTPie – A web-based API client along with a friendly command-line tool for API interaction. Insomnia (or its fork Insomnium) – A popular cross‑platform API client with a minimal interface and plugin ecosystem. Bruno – A desktop open-source API client that stores collections as files (ideal for Git versioning). Milkman – A desktop open‑source workbench for managing API requests. Here is the summary of DuckCon #6 on 31 Jan 2025 in Amsterdam. I copied the transcript from YouTubeTranscript and passed it through Gemini 2.0 Flash Exp with the system prompt: “Summarize this transcript from the DuckDB conference without missing any points. Cover every point mentioned. A lot of spelling errors that sound like DuckDB are likely to be DuckDB”. Introduction & Welcome: DuckCon #6: This is the 6th DuckDB conference, held in their hometown. The first DuckCon was online due to the pandemic. Live Streaming: This is the first time DuckCon is being live-streamed, chosen to accommodate global time zones (especially China and the US). 
Global Reach: The live stream is intended to reach users in areas where in-person DuckCons are unlikely. Q&A: Slido (qa.duckdb.org) will be used for Q&A, with upvoting to prioritize questions. Sponsors: Thanks to gold sponsor monday.com and silver sponsors Rill and Crunchy Data. DuckCon Purpose: DuckCon is a place for users to connect, share experiences, and provide feedback to the DuckDB team. Inspiration: The team is inspired by the community’s use of DuckDB and how far the project has come. Mission Statement: DuckDB aims to make large datasets less intimidating and more accessible, moving away from fear of data to confidence in handling it. Motivation: The project was born from seeing people struggle with data that didn’t fit in Excel and the lack of user-friendly tools. Industry Trends: Single-node processing capabilities have grown faster than the size of useful datasets. Data Singularity: A prediction that most data analysis queries can run on a single node is now a reality. Real-World Data Sizes: Analysis of Snowflake and Redshift data shows that 99.9% of datasets are under 300GB. Raspberry Pi Benchmark: The industry-standard TPC-H benchmark (scale factor 300, ~300GB) can run on a Raspberry Pi using DuckDB. Single Node Growth: Single-node processing power is rapidly increasing, allowing for larger datasets to be handled. Adoption Numbers: 32 Million Extension Installs: 32 million DuckDB extension installs in the last month. 1.8 Million Unique Website Visitors: 1.8 million unique visitors per month to the DuckDB website. Bluesky Community: Growing community on Bluesky, with the hashtag #dataBS. Technical Updates (Mark): Extension Ecosystem: Focus on enabling the community to build and share extensions. Community Extensions: Making it easier to create and use community-built extensions. DuckDB v1.2 (Harlequin Duck): Releasing next week, named after the Harlequin duck. CSV Reader Improvements: Significant improvements to the CSV reader. 
Friendlier SQL: Improvements to the SQL experience. CLI Autocomplete: Reworked and improved CLI autocomplete. Performance Optimizations: Many queries are now faster due to performance work. C API for Extensions: Introducing a C API to make building extensions easier. Logging Features: Improved logging for production use. Lakehouse Focus: The main focus for the year is on lakehouse formats and related features. Q&A (Mark & Hannes): Doubling Team: If the team doubled, they would focus on client integrations and other projects, not a major architectural change. Partitioning: Near-term plans to add support for partitioning, related to lakehouse formats. DuckDB WASM: The WASM ecosystem is evolving, with exciting possibilities for in-browser use. Financial/Pharmaceutical Industries: DuckDB could replace some SAS workflows due to its cost-effectiveness and capabilities. Lakehouse & MotherDuck: Lakehouse work is separate from MotherDuck, though MotherDuck will likely support lakehouse features. Contributing to Extensions: Plans to make it easier to contribute to extensions, including support for Rust and Go. Airport Extension (Rusty): Analogy: The airport extension allows DuckDB to “fly” to remote servers using Apache Arrow Flight. Functionality: Supports select, insert, update, and delete operations on remote data sources. Motivation: To reduce the burden of writing extensions and enable faster development using existing code. Arrow Flight: Uses Arrow Flight for communication, enabling connections to various data sources. Demo 1: Delta Lake: Attaches to a flight server for Delta Lake access. Allows creating schemas, tables, and performing standard SQL operations. Uses Python and delta-rs (Rust implementation of Delta Lake). Supports predicate pushdown and C integration with the DuckDB catalog. Demo 2: AutoGluon: Integrates the AutoGluon AutoML package. Predicts Hacker News post votes using a trained model. 
Demonstrates table-returning functions for model fitting and prediction. No C++ code required, just Python. Demo 3: Geocoding: Uses a geocoder service to convert addresses to coordinates and vice versa. Demonstrates scalar UDFs for vectorized requests. Uses a Python example for a simple uppercase function. Features: List flights, take flights. Catalog integration. Select, update, delete. Scalar UDFs. Table in/out functions. Authentication for row/column filtering. Availability: Requires DuckDB 1.2, MIT licensed, available on GitHub. Q&A (Rusty): Most Proud Extension: Airport is the most fun, but the AWS API wrapper also brings joy. Extension Resources: The GitHub DuckDB extension template and reading others’ source code are helpful. Airport & Other Extensions: Airport is separate and can be used alongside other extensions like spatial or httpfs. Graph Support: Graph database support is planned, with examples like Kuzu, Neptune, and Neo4j. Licensing: Airport is MIT licensed, compatible with Apache license. Scaling Out: Airport can be used to query multiple DuckDB instances on different machines. Ibis & Geospatial (Nati): Nati Clementi: Senior software engineer at Nvidia, working on open-source projects like Ibis. Ibis: Open-source Python library for data wrangling, with a DataFrame API and interfaces to 15+ engines, including DuckDB. DuckDB for Geospatial: DuckDB is fast, has a geospatial extension, and supports various geospatial formats. GeoParquet: Becoming a standard for geospatial data, enabling cloud data warehouse interoperability and compression. GeoArrow: A way of representing geospatial vector data in memory for faster processing. Ibis Benefits: Allows writing Python instead of SQL, with deferred execution determined by the engine. Demo: Uses Overture Maps data in GeoParquet format. Filters data using bounding boxes. Demonstrates geospatial operations like ST_Distance and ST_Transform. Plots data using Lonboard. 
Shows how to find points of interest near a location (e.g., the Van Gogh Museum). Ibis & DuckDB: Ibis uses DuckDB for the parquet reader and lets DuckDB do the heavy lifting. Ibis Optimizations: Ibis does type checking but doesn’t do query optimization, leaving that to the engine. Ibis in Browser: Ibis works in the browser through DuckDB WASM. Q&A (Nati): Linear Interpolation: Ibis ML module can help with regression-related tasks. Missing Features: No major features are missing in the DuckDB/Ibis geospatial setup, with minimal overhead. Parquet Reader: Ibis uses DuckDB’s parquet reader. Query Optimization: Ibis does not optimize SQL queries, leaving that to DuckDB. Ibis in Browser: Ibis works in the browser through DuckDB WASM. Rill & Metrics Layer (Mike): Rill: A BI tool optimized for DuckDB, with instant slicing and dicing, BI as code, and a metrics-first philosophy. Metrics-First: Design metrics models, and Rill autogenerates dashboards and user experiences. Live Demo: Downloaded Rill using a curl command. Created a new project called “DuckCon 6”. Imported a parquet file of GitHub commit data. Used AI to generate a metrics model and dashboard. Showed the dashboard with trends and filtering. Metrics as Building Blocks: Metrics are flexible, fast, and intuitive. SQL for Metrics: Metrics should be defined in SQL, not other languages. Visual Metrics Editor: Rill has a visual editor for defining metrics using DuckDB SQL. Metric Stack: Legacy: Data warehouses, traditional BI tools, inconsistent metrics, full table scans. DuckDB Powered: Consistent metrics, fast OLAP queries, SQL everywhere. Challenges: Data modeling is hard, metric changes can be expensive, single-node scale has limits. AI & Metrics: AI can assist in metrics modeling, optimization, and conversational data exploration. Q&A (Mike): Complex Metrics: Rill works well with complex metrics involving multiple sources and transformations by joining tables in DuckDB. 
60 FPS Dashboards: Users can feel the difference with faster dashboards. Defining Metrics: Metrics are defined in the Rill UI using SQL expressions. Replacing ChatGPT: Considering locally run self-hosted models for privacy. Stock Data Analysis (Ryan): Two Takeaways: Simple finance data flows with trade data and a tool called qStudio. Ryan Hamilton: 14 years building large data platforms in banks. Bank Data: Data from exchanges, market data providers, and internal systems. Use Cases: Backtesting, data analysis, and report generation. qStudio: A Java desktop application that connects to 30 databases, including DuckDB. Demo: Loaded a 6GB CSV file of trade data into DuckDB. Showed basic queries, pivoting, and candlestick charts. Demonstrated time-based aggregation and moving averages. Showed a basic trading strategy using window functions. DuckDB Benefits: Fast, easy to use, great for time-based analysis. Q&A (Ryan): kdb+ vs. DuckDB: kdb+ is for large data, DuckDB is more approachable with strong Python integration. XML Files: Offloading processing to DuckDB, not planning XML integration. Lightning Talks: Zuk (Jared): Search engine research using DuckDB. Python-based experiments with SQL. Removing document lengths for faster search engines. DuckPGQ (Daniel): Graph analytics in DuckDB using SQL property graph queries (PGQ). Visual graph syntax for pattern matching and path finding. Outperforms Neo4j on analytical queries. yato (Kristoff): Smallest DuckDB SQL orchestrator. Runs SQL queries in a folder in the correct order. Generates a mermaid diagram for lineage. Grafana & DuckDB (Sam): Lessons learned from using DuckDB in Grafana. Security incident due to shell commands and file access. Importance of reading the documentation. Cloud Slur (Adam): Syncing query engine for bank transaction data. Uses LLM to convert human language to SQL. Uses DuckDB in the browser, Node.js, and Python. Healthcare Data (Tony): Data engineering use cases in healthcare. 
Dynamic data masking system using DuckDB and Snowflake. Data integration pipeline using DuckDB and Arrow streams. Closing Remarks: Michael Simons: Author of the DuckDB in Action book, will be signing books. Poster Session: A poster session will follow the talks. Sponsors: Thanks again to the sponsors. Social Event: The conference will now move to the social event. Ibis is a Python library that works with multiple dataframe backends like DuckDB, Polars, and Pandas. With just 3 annotators and 50-100 samples, you can systematically figure out whether an LLM can replace human annotators. Arxiv ChatGPT explanation Curiosity and agency may be the differentiator in a world of LLMs (not experience, knowledge, or ability), since LLMs will democratize expertise. Jack Clark “AI/human combined work can be copyrighted as long as a human is adding, changing or selecting elements. Prompts alone do not usually produce copyrighted work.” - Copyright and Artificial Intelligence, Jan 2025, US Copyright Office via Ethan Mollick Human Authorship is Essential: Works created solely by AI are not copyrightable. AI can be used as a Tool: Using AI as a tool does not negate copyright protection, as long as the final work reflects sufficient human creativity. Prompts Alone are Insufficient: Simply providing prompts to an AI system, even detailed ones, is generally not enough to establish authorship. Prompts are considered instructions or ideas, which are not copyrightable. Expressive Inputs: When a human author provides their own expressive content (like a drawing, photo, or text) as input to an AI system, and that content is perceptible in the output, the human author can claim copyright in that portion of the output. Modifying and Arranging AI-Generated Content: Humans can claim copyright in the creative selection, coordination, and arrangement of AI-generated material, as well as in creative modifications to AI-generated outputs. 
No Need for New Legislation: The report concludes that existing copyright law is adequate to address the copyrightability of AI-generated works, and no new legislation is needed at this time. Case-by-Case Analysis: Copyrightability will be determined on a case-by-case basis, considering the specific facts of each work and the extent of human contribution.

Things I Learned - 02 Feb 2025

This week, I learned: You can add any content at the end of a PDF file. It’s ignored. It’s an interesting way to send additional information (or just blow up the file size if you don’t like the recipient.) JavaScript introduces a Temporal object that will replace the Date object. You can use embeddings as the input to a classical ML classifier. This can improve classification a lot. Nomic As AI software becomes more common, demand for AI product managers will grow, both in absolute numbers and as a proportion of people in an organization. https://www.deeplearning.ai/the-batch/issue-284/ Control of chips and GPU compute will likely be the lever that decides AI dominance globally. Dario Amodei Bring LLMs to the table. One mode of collaboration is using LLMs as ACTIVE participants, i.e. they CONTRIBUTE. For example, in a video call. A workshop. A classroom. A presentation. Have the LLM provide input DIRECTLY to a group of people. Environment shapes ambient thoughts. Working in a hospital will give you ideas about how to use LLMs in hospitals, for example. People you are working / ENGAGING with are perhaps the biggest drivers. The real cost of a cream biscuit packet in India has fallen about 25 times between 1981 and 2024, i.e. about as fast as prices in general rose. Effectively, the absolute price has not changed. How do I know this? In 1981, a cream biscuit packet cost Rs 25. In 2025, it’s available for Rs 21. India Inflation Calculator - a rare inflation calculator with annual inflation rates baked in - shows that Rs 25 in 1981 is equivalent to Rs 540 in 2024. That’s about 25 times more than the Rs 21 it costs today. A WebAssembly compiler that fits in a tweet deconstructs a piece of JS that creates a tiny WebAssembly calculator. It’s a great walk-through of JavaScript compression tricks and how WebAssembly works. Simon Willison Brandon Sanderson has a series of YouTube videos where he teaches a course on magic systems. When using AI coding agents, CLI beats APIs. 
Simpler models are able to use the CLI more reliably than APIs. Simon Willison I was exploring new business models enabled by LLMs. Here are some thoughts: 1. Autonomous Multi-Sided Marketplaces. AI-powered platforms coordinate complex services with minimal human oversight. Think “Uber for Everything,” but the platform sets pricing dynamically, schedules both supply and demand, and resolves disputes algorithmically. 2. Collective Intelligence Ecosystems. Communities pool data, expertise, and AI models to tackle shared problems, like an open-source “GitHub for AI,” but with embedded micropayments or tokenized incentives to reward contributors whenever the models are used commercially. 3. Zero-Employee Companies. Fully automated software entities: legal frameworks might allow an AI to manage services, pay taxes, and sign contracts. These “companies” only hire humans as needed, on-demand, for edge cases AI can’t handle. 4. Context-Aware Knowledge Platforms. Imagine a Wikipedia that not only retrieves static info but also tailors each page in real time to the reader’s personal context, language level, and preferences, generating content on the fly. User feedback loops train the system to improve. 5. Data Cooperatives / Data DAOs. Groups collectively own their data and license it to AI companies on a revenue-share basis. Individuals have a direct financial stake in how their shared data is leveraged, voting on permissible use cases. 6. Personalized Service Layers. Similar to GitHub’s “forking” model, but for entire user experiences. Each user can clone and customize an AI service (whether it’s a personal grocery shopper or a content curator) and can share or monetize improvements with the broader network.
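The cream biscuit arithmetic from this week's notes can be checked in a few lines. This is only a sketch that reuses the three figures quoted in the note (Rs 25 in 1981, Rs 540 as its 2024 equivalent, Rs 21 today). The variable names are mine, and the "implied average inflation" is a derived number that assumes a constant annual rate, which the calculator itself does not assume.

```python
# Figures taken from the note above; nothing here is new data.
price_1981 = 25.0            # Rs, cream biscuit packet in 1981
price_today = 21.0           # Rs, price quoted for today
inflated_1981_price = 540.0  # Rs 25 of 1981 in 2024 rupees, per the calculator
years = 2024 - 1981

# How many times cheaper the packet is in real (inflation-adjusted) terms.
real_price_ratio = inflated_1981_price / price_today

# Average annual inflation implied by Rs 25 -> Rs 540, assuming a constant rate.
implied_inflation = (inflated_1981_price / price_1981) ** (1 / years) - 1

# Change in the sticker price itself, ignoring inflation.
nominal_change = price_today / price_1981 - 1

print(f"Real price ratio: {real_price_ratio:.1f}x")
print(f"Implied average inflation: {implied_inflation:.1%} per year")
print(f"Nominal price change: {nominal_change:.0%}")
```

The ratio comes out a little above 25, which matches the "about 25 times" in the note, while the sticker price itself moved by only a few rupees.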

Things I Learned - 26 Jan 2025

This week, I learned: Something I learned from a Sikkil Gurucharan concert. Make the subject of your talk the hero. Not yourself. Be a fan. Share your enthusiasm. Get into the zone while presenting. We reject opposite world views. It’s too much effort. But exposure reduces effort and can let us see things from other points of view. So expose yourself to difficult alternative perspectives. Gemini Something I learnt from Aboorva Singeetham: Kamal Hassan: “A farmer invests in crops. I’m an actor. So I invest in films.” As a technologist, I guess I would invest in technology. “A person who has much more to give is unfazed by overwhelming demands because there is too much in him to overwhelm. He gives you 2 options in place of one.” According to Portkey’s LLM usage analysis: Anyscale and Fireworks AI have the lowest error rates (5xx, 429) and rate limits across providers. Groq and Anthropic are among the highest, OpenAI is among the lowest, Google is in-between. OpenAI has lower error rates and lower latency than Azure. They have a ~35% cache hit rate. A few quick points supporting the mental model of “LLMs are aliens”. LLMs are clearly not machines. They give different answers each time. LLMs are like humans: they exhibit human biases (e.g. guessing 42 or 37 often). But they fail in unusual ways. They can’t count the “r”s in strawberry. They can go into an endless loop. LLMs are a new form of intelligence. Thinking of them as aliens might minimize our confusions. Lessons from Clear Thinking: Watch out for four things: Emotion, Ego, Social confirmation, and Inertia/habit. Basically: adrenaline, testosterone, oxytocin, and dopamine. When you feel these, consider doing the opposite. Here’s what makes us prone to emotion. Sleep deprivation. Hunger. Unknown places. Fatigue. Distraction. Stress (e.g. feeling rushed). A good signal that ego is blinding you: You often feel you’re right. Or feel unfairly treated. Changing behaviors is hard. 
Instead, join a group or environment where that’s the default behavior. Hiring a trainer or joining a gym, for example. Why does so much of success literature focus inwards rather than on the environment? Perhaps because we often fool ourselves, and doing less of that gives the biggest bang for the buck. It doesn’t mean the environment is unimportant. Doing work has the characteristics of a drug. E.g. replying to emails gives you control, connections, etc. Work addiction exists because it gives you all the right chemicals. If you put LLMs in a feedback loop, they can optimize for their reward function by emotionally pushing people, generating misinformation, nudging towards a narrow definition of creativity, etc.: https://bsky.app/profile/emollick.bsky.social/post/3lg4darqwfc2d ChatGPT’s Scheduled Tasks are pretty bad at fetching the latest news. Its use of search is poor. (I’m not sure if it actually searches.) I need to figure out other use cases for it. Possible options are: DeepSeek does not enforce rate limits. Yet another reason to switch to DeepSeek. (via Simon Willison). My other reasons are: Claude 3.5 Sonnet-level coding capability at 5% of the cost (soon to be 2.5%). Prompt caching by default. Fill-in-the-middle completion.

Things I Learned - 19 Jan 2025

This week, I learned: Audio diaries are a thing. Monash University asks students to voice their learnings, share them with each other, and give each other feedback. I wonder if ChatGPT diaries could become a thing, too, and LLM journalling starts helping with therapy. Regulation slows things down at colleges and hospitals. For example, patient consent is required for surgeons to learn from their own surgery videos. Unregulated sectors are far more likely to innovate. Doctors can only do so much. Air quality, where you live, etc. can do more for the patient than medicines or the doctor. If doctors keep this in mind, they can be more effective. Extending that thought, ANYONE who leverages assets through holistic thinking becomes FAR more effective. “The curriculum tells teachers what to teach. The exams tell students what to learn.” - Ronald Harden “Stravaig” is a Scottish word. It means mindless wanderings. “The real voyage of discovery consists of not a new voyage but having new eyes” - Proust Possibility Thinking is “the willingness to see possibilities everywhere instead of limitations”. It’s an approach / mindset that can make things that seem hard possible. With LLMs, this is becoming increasingly realistic to me in many areas. What will LLMs enable that does not or cannot exist today? Rather than optimizing what exists? Something to think about. ModernBERT supports embeddings and is better than text-embedding-3-small on MTEB. How to export browser history from Brave to Edge: Go to AppData Local > BraveSoftware > Brave-Browser > User Data > Default. Copy History and History-journal into AppData Local > Google > Chrome > User Data > Default. On Edge, go to edge://settings/profiles/importBrowsingData and Import data from Google Chrome and import the history. I switched back from Brave to Edge, mainly because Edge’s native text-to-speech and speech recognition is far better. I can use it better on my mobile. 
A colleague, Karthick, asked different models to apply the editing and formatting guidelines for a journal to a manuscript. (E.g. Abbreviate chapter & section numbers, except when a sentence begins with it. Use “1” instead of “one”, etc. except when a sentence begins with it. Things like this.) Gemini Exp 1206 seems to be the most reliable, compared with most other models. GitHub Codespaces seems to be coming up more often on my radar, but I’m yet to figure out a use for it. TTS Arena is a benchmark of text-to-speech models. Kokoro-TTS is the current leader. It’s just 82M parameters, runs on Google Colab, and sounds slightly better than OpenAI TTS. chat.qwenlm.ai consolidates all of Qwen’s models in one ChatGPT-like interface.
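The file-copy step of the Brave-to-Edge history export noted above can be scripted. This is a minimal sketch, not a tested migration tool: `copy_history` is a hypothetical helper name, it assumes both browsers are closed, and it overwrites any history files already in the destination profile. The Edge import step still has to be done by hand.

```python
import shutil
from pathlib import Path

# The two files named in the steps above.
HISTORY_FILES = ("History", "History-journal")


def copy_history(src_profile: Path, dst_profile: Path) -> list[Path]:
    """Copy browser history files from one profile folder into another.

    Files already present in dst_profile are overwritten.
    """
    dst_profile.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in HISTORY_FILES:
        src = src_profile / name
        if src.exists():  # History-journal is not always present
            copied.append(Path(shutil.copy2(src, dst_profile / name)))
    return copied


# Example on Windows, using the folders from the steps above (left commented
# out so that running this file does not touch a real profile):
#
# import os
# local = Path(os.environ["LOCALAPPDATA"])
# copy_history(
#     local / "BraveSoftware" / "Brave-Browser" / "User Data" / "Default",
#     local / "Google" / "Chrome" / "User Data" / "Default",
# )
```

Backing up the destination `Default` folder first is a sensible precaution, since the copy replaces Chrome's own history.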

Things I Learned - 12 Jan 2025

This week, I learned: Measuring developer productivity with the DX Core 4 describes a framework for measuring developer productivity. It encapsulates other frameworks like DORA, SPACE, and DevEx. Can LLMs write better code if you keep asking them to “write better code”? A delightful exploration of how Claude 3.5 Sonnet keeps optimizing and adding features to improve code. My takeaway: repeatedly applying a prompt gives us interesting new directions to explore. Wednesday comes from Wōdnesdæg - named after Odin (or Woden). CLIProxyAPI seems a good way to allow any CLI coding agent (Codex, Claude Code, etc.) to work with any provider (e.g. Gemini, OpenRouter, etc.) The documentation needs a few more examples, but it’s usable. mise x github:router-for-me/CLIProxyAPI -- cli-proxy-api starts a local server that proxies requests. Create a config.yaml, update the keys, and configure your coding agent, e.g. Codex, to use it. It’s also a good way to see what prompts are being sent by the various harnesses. smolagents is a new agents library from HuggingFace. It seems simple enough to use. whisper-flow does real-time speech transcription! Switchboard-1 is a labelled audio corpus with ~260 hours of speech. It has ~2,400 calls among 500+ speakers in the US. Cloudflare tunnel is like ngrok but more permanent. It’s a bit more complex, too. But given Cloudflare’s liberal free tier, it’s a good, viable option for long-term local hosting. John Wheeler: “We live on an island surrounded by a sea of ignorance. As our island of knowledge grows, so does the shore of our ignorance.” A great way to understand how ignorance actually grows as you learn more. justhtml is a fast enough, pure Python, fully HTML5 compliant library. For a faster, mostly compliant solution, html5-parser with lxml works. There is little reason to use Redis. There are several clones you can use. 
Databases in 2024: A Year in Review: Microsoft’s Garnet, KeyDB (only Linux), ValKey (only source), DragonFly (only Linux), ReDict (only Linux). Every few years, something comes along trying to replace relational databases and SQL, and gets absorbed. YouTube Key value stores. People soon realize they need more features, e.g. indices. MapReduce systems. Most MapReduce vendors put SQL on top of MapReduce. Then the Hadoop market crashed. (But HDFS, S3, distributed storage systems are a good idea.) Document Databases. JSON. SQL absorbed that. SQLite 3.45+ supports even JSONB. DuckDB, of course, has JSON. Column Databases. Again, these introduced SQL. Graph Databases. SQL:2023 introduced graph queries via SQL/PGQ (Property Graph Queries). DuckPGQ beats Neo4j. Array Databases. SQL:2023 adds SQL/MDA which allows for matrix operations. But specialized databases might make sense in this category. Vector Databases. Every DB is adding support for this. TheAgentCompany is a benchmark of real-world tasks like: Arranging a meeting room. Analyzing a spreadsheet. Adding a GitLab wiki page. Salvatore Sanfilippo (antirez - Redis) finds DeepSeek v3 comparable with Claude 3.5 Sonnet. YouTube He also passed a paper and his code to compare them. A useful prompt. YouTube
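The "SQL absorbed document databases" point above is easy to see from Python's built-in `sqlite3` module. The sketch below stores JSON documents in an ordinary text column and queries inside them with SQLite's JSON functions. The table name and documents are made up for illustration, and it assumes the bundled SQLite was built with JSON support (the default since SQLite 3.38). It uses plain JSON text rather than the JSONB storage that 3.45 adds.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")

# Hypothetical documents, stored as JSON text in a regular column.
docs = [
    {"name": "alpha", "tags": ["x", "y"]},
    {"name": "beta", "tags": ["y"]},
]
conn.executemany(
    "INSERT INTO docs (body) VALUES (?)",
    [(json.dumps(d),) for d in docs],
)

# Filter and project on fields inside the JSON, using ordinary SQL.
rows = conn.execute(
    """
    SELECT json_extract(body, '$.name')        AS name,
           json_array_length(body, '$.tags')   AS n_tags
    FROM docs
    WHERE json_extract(body, '$.tags[0]') = 'x'
    ORDER BY name
    """
).fetchall()

print(rows)  # [('alpha', 2)]
```

The same query shape (path expressions inside a `WHERE` clause) is roughly what a document store offers, which is why a separate engine is often unnecessary for small workloads.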

Things I Learned - 05 Jan 2025

This week, I learned: Some management philosophies used to be successful but are no longer as effective. ChatGPT Command-and-control hierarchy Taylorism: deep specialization Seniority-based advancement Annual performance reviews (without continuous feedback) Up-or-Out promotion models Confidential strategic information Narrow job descriptions Relying on formal authority Some management philosophies have been around for millennia. ChatGPT Lead by example Fairness and empathy Clear, consistent communication Delegation and empowerment Strategic planning and foresight Consistent rule enforcement Rewarding merit Leadership by virtue and character Interview with Liang Wenfeng, CEO of DeepSeek: In the face of disruptive technologies, moats created by closed source are temporary. Even OpenAI’s closed source approach can’t prevent others from catching up. So we anchor our value in our team – our colleagues grow through this process, accumulate know-how, and form an organization and culture capable of innovation. That’s our moat. ...