June 2024

Things I Learned - 30 Jun 2024

This week, I learned:
- Amara’s law: “We tend to overestimate the effect of a technology in the short run and underestimate the effect in the long run.”
- LLM Patterns include Evals, RAG, Fine-tuning, Caching, Guardrails, Defensive UX, and Collect feedback. Notably:
  - Defensive UX: Microsoft, Google, and Apple have guidelines for Human-AI interactions
  - Collect feedback: Explicit and implicit
- ROUGE and Context Precision are metrics to evaluate LLM responses. They serve as a starting point but are usually not sufficient
- Any word with the letters izehsglbo can be spelt on a calculator. That includes Hobbes (538804)! Via Calculator spelling
- Tor Browser + DuckDuckGo is good for torrent searches. Maybe the Dark Web IS the original Internet. The ad-free hacker web

Hobbes on a calculator

I just learned that any word made of just the letters beighlosz can be spelt on a calculator. That includes Hobbes! Type 538804 and read the display upside-down. I’m surprised I never knew that. The longest, by far, appears to be hillbillies: 53177187714
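As a rough sketch of the rule above: map each letter to the digit that resembles it upside-down, then type the word backwards, because flipping the display reverses the reading order. The function name `calculator_digits` is made up for illustration, and mapping g to 6 is an assumption (some conventions use 9).

```python
from typing import Optional

# Digit that resembles each letter when the display is read upside-down.
# Assumption: g maps to 6 (some conventions use 9 instead).
UPSIDE_DOWN = {
    "o": "0", "i": "1", "z": "2", "e": "3", "h": "4",
    "s": "5", "g": "6", "l": "7", "b": "8",
}


def calculator_digits(word: str) -> Optional[str]:
    """Return the digits to type so `word` shows upside-down, or None if impossible."""
    letters = word.lower()
    if any(ch not in UPSIDE_DOWN for ch in letters):
        return None
    # Flipping the calculator reverses reading order, so emit the word backwards.
    return "".join(UPSIDE_DOWN[ch] for ch in reversed(letters))


print(calculator_digits("Hobbes"))       # 538804
print(calculator_digits("hillbillies"))  # 53177187714
print(calculator_digits("Calvin"))       # None
```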

Things I Learned - 23 Jun 2024

This week, I learned:
- Luma Labs Dream Machine generates videos. It’s free and is of reasonable quality. Update: 6 Jun 2025. Costs $10/month
- LLM DataHub has LLM training datasets, regularly updated
- From Dan Becker on running a workshop:
  - Answer questions at the end, not in parallel in a chat, to avoid distraction
  - Have fewer words in slides when presenting. It’s less distracting
- Morgan Housel on the Shane Parrish podcast:
  - Risk is what stops you from achieving YOUR goals. What’s risky for me may not be risky for you
  - The lesson from compounding is that you want to optimize for duration, not return. That’s what does the heavy lifting. Survival, consistency, long term - these matter. The performance does NOT matter.

The psychology of peer reviews

We asked the ~500 students in my Tools in Data Science course in Jan 2024 to create data visualizations. They then evaluated each other’s work. Each person’s work was evaluated by 3 peers. The evaluation was on 3 criteria: Insight, Visual Clarity, and Accuracy (with clear details on how to evaluate). I was curious to see what we can learn about student personas from their evaluations. ...

Embeddings in DuckDB

This article on Using DuckDB for Embeddings and Vector Search by Sören Brunk shows a number of DuckDB features I wasn’t aware of.
- DuckDB can read directly from Huggingface datasets
- DuckDB can read just the parts of a .parquet file it needs, even over HTTP
- DuckDB lets you write custom functions in Python
- DuckDB now has a vector similarity search extension
I’ve recently become a DuckDB fan and continue to be impressed.

Things I Learned - 09 Jun 2024

This week, I learned:
- httpretty can mock ALL Python HTTP libraries
- Japanese pray to dead parents instead of gods. The dead are preserved in plates by priests. Japanese are generally non-religious
- Looks like GPT-4o is using CNNs to create vector embeddings of images, with images gridded into a 1x1, 2x2, etc. PLUS OCR. Ref
- The sum of a sinusoidal series is like a Spirograph: a spinning circle linked to another, and so on. https://www.andreinc.net/2024/04/24/from-the-circle-to-epicycles
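The spinning-circles picture can be sketched in a few lines: each circle is a rotating vector, and the pen position is the sum of them. The `epicycle_point` helper and the (radius, frequency, phase) tuple layout are assumptions made for this illustration, not taken from the linked article.

```python
import cmath
import math


def epicycle_point(circles, t):
    """Pen position at time t for a chain of spinning circles.

    circles: list of (radius, frequency, phase) tuples. Each circle's centre
    rides on the rim of the previous one, so the pen position is simply the
    sum of the rotating vectors, returned as a complex number x + iy.
    """
    return sum(r * cmath.exp(1j * (f * t + p)) for r, f, p in circles)


# First three terms of a square wave's Fourier series, expressed as circles.
# The vertical (imaginary) coordinate traces the partial sum of the sine series.
square = [(4 / (math.pi * n), n, 0.0) for n in (1, 3, 5)]

for t in (0.0, math.pi / 2, math.pi):
    z = epicycle_point(square, t)
    print(f"t={t:.2f}  x={z.real:+.3f}  y={z.imag:+.3f}")
```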

Things I Learned - 02 Jun 2024

This week, I learned:
- Modal.com seems to offer reasonably priced GPUs
- Combining vector search and keyword search with reciprocal rank fusion seems to work well for RAG. Ref
- Knowledge Project podcast. Morgan Housel
  - Differences of opinion exist because of different stories arising from origins and experiences. We are not debating facts. We are debating life lessons!
  - Solution: hear their anecdotes. The stories that taught them their lessons.
- AI reporting templates are a trend. Domain expertise comes in via structuring the report template and associated prompts.
- Some audio embedding models: unoti/voice-embeddings, retkowsky/audio_embeddings, pyannote/embedding (for speaker similarity), and more.
- Hidden Brain podcast: Innovation 2.0: The power of less
  - Subtraction is hard because we are biologically and economically wired against it.
  - It’s also hard because there are fewer markers of subtraction. Additions are natural markers / triggers.
  - Marie Kondo suggests keeping only what sparks joy
- #POST I tried Undermind.ai - an agent that researches for you. It guides you to ask a detailed question, spends 2-3 minutes finding the answer, and provides detailed results. But it’s worth the wait. It’s a good alternative to quick validations on SciSpace.
- For popular entities, retrieval (search) can actually make LLM answers worse! When not to trust language models
- Perceived fluency and usefulness are NEGATIVELY correlated with citation accuracy in LLM search engines! Evaluating Verifiability in Generative Search Engines
- GPTs are now available to non-paying users. Apparently for a few weeks! Everyone also has limited access to GPT-4o.
- Discussion with Anand
  - Explore BBC Microbit
  - Everyone should get a Raspberry Pi!
  - Watch Two Minute Papers on YouTube
- More LLM routers:
  - LiteLLM: Open source, OpenAI compatible, 100+ LLMs
  - RouteLLM: Open source, OpenAI compatible, automatically routes based on cost
  - OpenRouter: OpenAI compatible API, several models
  - Unify: Supports many models
  - Portkey: Supports popular providers
  - Martian: Limited set of models
- d-id and Heygen can modify videos of a person.
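For the reciprocal rank fusion note above, here is a minimal sketch of how two ranked result lists (say, one from vector search and one from keyword search) can be merged. Each document scores the sum of 1 / (k + rank) over the lists it appears in. The document IDs are made up, and k = 60 is the commonly used default, treated here as an assumption rather than a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one fused ranking.

    rankings: list of lists, each ordered best-first.
    k: smoothing constant; larger values flatten the gap between ranks.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical results from two retrievers for the same query.
vector_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_b", "doc_c", "doc_a"]

print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
# ['doc_b', 'doc_a', 'doc_c']
```

Because only ranks are used, the two retrievers’ raw scores never need to be put on a common scale.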