This week, I learned:

  • Birds navigate using quantum entanglement! Guardian ChatGPT
  • DeerFlow is an open source Deep Research MCP. Lets you run deep research outside of the standard chatbots.
  • ⭐ Today, if I had to store a bunch of data files (e.g. Parquet) under 1 GB each, I would use GitHub Releases. Here are the options:
    • GitHub Releases. 2 GiB per file, unlimited total & bandwidth. 🟢 Immortal URL, versioning, easy CI publish. 🔴 Each file must stay < 2 GiB; no built-in SQL.
    • Zenodo (CERN). 50 GB per record; one-off bumps to 200 GB. 🟢 DOI assignment, archival mandate. 🔴 Occasional throttled bandwidth; no API for partial file reads.
    • Hugging Face Hub. 300 GB per repo; 50 GB per file. 🟢 Git-based, dataset tooling, lively ML community. 🔴 Large files need git-LFS; pushes via LFS can be slow.
    • Cloudflare R2. 10 GB storage & 1 M ops / month. 🟢 S3 API, zero-egress to Cloudflare Workers, fast. 🔴 Free storage is capped at 10 GB.
    • Kaggle Datasets. 20 GB per dataset, public only. 🟢 Built-in notebooks & GPU. 🔴 No programmatic SQL API; quotas sometimes change.
    • data.world (free). 1 GB total, 100 MB per dataset. 🟢 Nice social features. 🔴 Too small for most datasets.
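A minimal sketch of what "immortal URL" means in practice for GitHub Releases: every file attached to a release gets a predictable download URL. The owner, repo, tag, and file names below are hypothetical, and the 2 GiB limit is the figure quoted in the list above.

```python
from urllib.parse import quote

GIB = 1024 ** 3
MAX_RELEASE_ASSET_BYTES = 2 * GIB  # per-file limit quoted in the list above


def release_asset_url(owner: str, repo: str, tag: str, filename: str) -> str:
    """Build the download URL for a file attached to a GitHub release."""
    return (
        f"https://github.com/{owner}/{repo}/releases/download/"
        f"{quote(tag, safe='')}/{quote(filename, safe='')}"
    )


def fits_in_release(size_bytes: int) -> bool:
    """True if a file is under the per-file limit for release assets."""
    return 0 <= size_bytes < MAX_RELEASE_ASSET_BYTES


if __name__ == "__main__":
    # Hypothetical owner, repo, tag and file name.
    print(release_asset_url("example-user", "example-data", "v1.0", "trips.parquet"))
    print(fits_in_release(900 * 1024 ** 2))  # a 900 MiB file
```

The resulting URL can be handed to any tool that reads files over HTTPS, and publishing from CI is typically a single `gh release upload <tag> <files>` step.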
  • If I had to query data sitting in external Parquet or SQLite files, here are SQL engines-as-a-service:
    • MotherDuck. 10 GB storage + 10 CU-hrs/mo compute. Native DuckDB; no credit card; GA June 2024; monthly feature drops.
    • Datasette Cloud. Two-month trial (or 1-yr for non-profits). SQLite backend. Great UX; but not free forever for general use.
    • AWS Athena. Pay-per-TB scanned; no free tier. Free-tier S3 ends after 12 months, so costs creep up quickly.
  • Bootstrap has a .stretched-link that makes a link cover the containing block. A clever trick that I discovered when Claude 3.5 Sonnet wrote my code.
  • Discovered spray and peel paints at ArtFriend. I had no idea that was a thing.
  • Gemini Live API is the real-time equivalent from Gemini. It supports tools, search, and code execution.
  • mcp-mem0 is an MCP for memory
  • llm-min.txt compresses docs for LLMs to read optimally. Like a compressed llms.txt or context7. Usage: GEMINI_API_KEY=... uvx llm-min -i $DIR #ai-coding
  • There’s a lot of action on encrypted LLM operations.
    • Responses API allows reasoning tokens to be encrypted if organizations don’t want their reasoning data to persist. Ref
    • Tinfoil (YC X25) offers an OpenAI-compatible inference API where data is encrypted from the client to the NVIDIA Hopper/Blackwell GPUs in confidential computing mode. Prompts, model weights, outputs are encrypted in transit and memory, with verifiable privacy on code running in GPU.
    • Modelyo (Israel) offers VMs and Kubernetes clusters with encrypted GPUs across multiple cloud providers, with continuous attestation, managed on Modelyo’s portal.
  • ⭐ LLMs can work independently on longer and longer tasks. That’s a useful metric to track. METR: Measuring AI Ability to Complete Long Tasks.
  • If you’re looking for datasets / APIs related to research publications (especially funding), then explore:
  • To stop Ubuntu 24 from suspending when the laptop lid closes, use one of these and restart:
    • /etc/systemd/logind.conf: Set HandleLidSwitch=ignore
    • /etc/UPower/UPower.conf: Set IgnoreLid=true
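A minimal sketch of making that edit from a script rather than by hand. It assumes the file is a simple `key=value` config with a single section (so appending a missing key at the end is safe) and works on the text only; `set_conf_key` is a hypothetical helper, and writing the result back to the file would need root.

```python
import re


def set_conf_key(text: str, key: str, value: str) -> str:
    """Return config text with `key=value` set, uncommenting or appending as needed."""
    # Match the key at the start of a line, optionally commented out with # or ;
    pattern = re.compile(
        rf"^[#;]?[ \t]*{re.escape(key)}[ \t]*=.*$", re.MULTILINE
    )
    line = f"{key}={value}"
    if pattern.search(text):
        # Replace only the first match; a function avoids backslash escapes in `line`.
        return pattern.sub(lambda m: line, text, count=1)
    # Key not present: append it (assumes the file has a single section).
    return text.rstrip("\n") + "\n" + line + "\n"


if __name__ == "__main__":
    sample = "[Login]\n#HandleLidSwitch=suspend\n#HandleLidSwitchDocked=ignore\n"
    print(set_conf_key(sample, "HandleLidSwitch", "ignore"))
```

The same helper covers the UPower variant with `set_conf_key(text, "IgnoreLid", "true")`.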
  • UV_TORCH_BACKEND=auto uv pip install torch torchvision torchaudio installs the most appropriate PyTorch version. Ref
  • Cog is a Python-based templating tool. You embed Python as comment chunks in any file, and Cog writes each chunk’s output into the file.
  • Cloudflare Zero Trust seems the easiest way to enable auth on static websites, especially if your DNS is already on Cloudflare. No cost.
  • We could “fine-tune” system prompts automatically with evals, creating a “system prompt learning” paradigm – like my promptevals. Andrej Karpathy
  • I was asked how to improve speed when building an enterprise ChatGPT clone using an API. Here’s what I’d suggest, in order:
    • Streaming. High impact, low effort.
    • Caching RAG retrieval as well as generation. High impact, low effort.
    • UI tweaks. Loading / streaming icons and progress hints (“Retrieving context”, “Generating answer”, etc.)
    • Parallelize, if possible
    • Use model options where available, e.g. speculative decoding, models with higher speed, models with closer CDN, etc.
    • Shorten prompts
    • Persistent HTTP/2 Keep-Alive. Low impact, low effort (tweak server settings).
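The first three suggestions (streaming, caching retrieval, progress hints) can be sketched together. This is a toy under stated assumptions: `retrieve` and `stream_answer` are hypothetical stand-ins for a real vector search and a real streaming LLM call, and the sleep only simulates latency.

```python
import time
from functools import lru_cache
from typing import Iterator

CALLS = {"retrieve": 0}  # counts how often the slow retrieval actually runs


@lru_cache(maxsize=1024)
def retrieve(query: str) -> tuple[str, ...]:
    """Hypothetical slow retrieval step; lru_cache skips it for repeated queries."""
    CALLS["retrieve"] += 1
    time.sleep(0.05)  # stand-in for a vector search round trip
    return (f"context for: {query}",)


def stream_answer(query: str) -> Iterator[str]:
    """Yield progress hints first, then the answer piece by piece."""
    yield "[Retrieving context]\n"
    context = retrieve(query)
    yield "[Generating answer]\n"
    # Stand-in for a streaming LLM call: emit one word at a time.
    for word in f"Answer based on {context[0]}".split():
        yield word + " "


if __name__ == "__main__":
    for _ in range(2):  # the second pass hits the retrieval cache
        for chunk in stream_answer("refund policy"):
            print(chunk, end="", flush=True)
        print()
    print("retrieval calls:", CALLS["retrieve"])
```

The user sees the first hint immediately instead of waiting for the whole answer, and a repeated query skips the retrieval round trip entirely.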
  • Cloudflare Vectorize, at 768 dimensions / embedding, is free for ~6.5K chunks storage at ~1,000 queries / day. For a light load like 1M 768d chunks queried 1K times a day, the cost is: ChatGPT
  • NVIDIA parakeet is a lightweight speech to text model that leads benchmarks. Installing such packages continues to be a nightmare due to PyTorch (despite uv).
  • I explored the real-time avatar space. Heygen seems to be the easiest to use, but even that is complex and expensive ($99/mo). We may need to wait a few months for avatars to explode.
  • ⭐ Model reliability is a huge enabler for performance. As models become more reliable, they can work autonomously for longer and that is another kind of scaling. Vending Bench
  • ChatGPT, Gemini, etc. have become lead generation engines. Chat Bot Optimization (CBO), is it? WhatsApp + ChatGPT
  • ⭐ Never live delete data. Mark it for deletion and schedule a deletion task. That way you have time to react to mistakes. Simon Willison
  • Pandoc has several options useful when converting Markdown to HTML (cat file.md | pandoc -f markdown -t html). My favorites:
    • --no-highlight skips code-highlighting. --highlight-style=pygments adds Pygments styling
    • --wrap=none turns off line wrapping in the output, so each paragraph stays on a single line
    • --number-sections adds section numbering (<h2>1. Introduction</h2>)
    • --shift-heading-level-by=NUM – shift all headings by NUM levels (e.g., start at <h2> instead of <h1>)
    • pandoc -f markdown-auto_identifiers drops the auto-identifiers extension that generates id=... for each heading
    • pandoc -f gfm uses GitHub flavored Markdown. Run pandoc --list-extensions=gfm to identify the extensions it uses.
    • Pandoc’s Markdown extension examples are quite extensive.
    • Auto-enabled GFM extensions:
      • alerts: GitHub-style callouts (info, tip, warning) via > [!TYPE] blocks.
      • autolink_bare_uris: Turns bare URLs into links, without needing <...>.
      • emoji: Parses :smile:-style codes into Unicode emoji characters.
      • footnotes: Enables footnote syntax with [^id] and definitions at the bottom.
      • gfm_auto_identifiers: Uses GitHub’s heading-ID algorithm: spaces → dashes, lowercase, removes punctuation.
      • pipe_tables: Enables tables written with | column separators.
      • raw_html: Raw HTML is passed through unchanged.
      • strikeout: Enables strikethrough with ~~text~~.
      • task_lists: Parses - [ ] and - [x] items as checkboxes.
      • yaml_metadata_block: YAML front matter for document metadata, e.g. <title>
    • GFM extensions worth enabling:
      • ascii_identifiers: Strips accents/non-Latin letters in automatically generated IDs.
      • bracketed_spans: [Warning]{.alert} becomes <span class="alert">
      • definition_lists: Term\n: Definition text becomes a definition list
      • fenced_divs: ::: {.note} block creates a <div class="note">...</div>
      • implicit_figures: Standalone images become <figure> with <figcaption>.
      • implicit_header_references: [Section] is treated as [Section](#section)
      • raw_attribute: `<b>bold</b>`{=html} is inserted as raw HTML
      • smart: Converts straight quotes to curly, -- to en-dash, --- to em-dash, ... to ellipsis.
      • subscript & superscript: E.g. H~2~O and E = mc^2^
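A small sketch of how the options and extensions above combine into one command line. `pandoc_args` and `to_html` are hypothetical helpers; the extension names are the ones listed above, and `to_html` assumes pandoc is installed and on the PATH.

```python
import shutil
import subprocess


def pandoc_args(base="gfm", enable=(), disable=(), shift=0, wrap="none"):
    """Build a pandoc command: extensions are appended to the format as +ext / -ext."""
    fmt = base + "".join(f"+{e}" for e in enable) + "".join(f"-{d}" for d in disable)
    args = ["pandoc", "-f", fmt, "-t", "html", f"--wrap={wrap}"]
    if shift:
        args.append(f"--shift-heading-level-by={shift}")
    return args


def to_html(markdown: str, **kwargs) -> str:
    """Run pandoc on a Markdown string and return the HTML it writes to stdout."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc not found on PATH")
    result = subprocess.run(
        pandoc_args(**kwargs),
        input=markdown,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Only prints the command, so it runs even without pandoc installed.
    print(" ".join(pandoc_args(enable=("smart", "fenced_divs"), shift=1)))
```

With pandoc available, `to_html("# Title", enable=("smart",), shift=1)` would convert a string using GFM plus the smart extension and shift the headings down one level.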