January 2025

Things I Learned - 26 Jan 2025

This week, I learned:
- Something I learned from a Sikkil Gurucharan concert:
  - Make the subject of your talk the hero. Not yourself. Be a fan.
  - Share your enthusiasm.
  - Get into the zone while presenting.
- We reject opposite world views. It’s too much effort. But exposure reduces the effort and can let us see things from other points of view. So expose yourself to difficult alternative perspectives. (Gemini)
- Something I learned from Aboorva Singeetham:
  - Kamal Hassan: “A farmer invests in crops. I’m an actor. So I invest in films.” As a technologist, I guess I would invest in technology.
  - “A person who has much more to give is unfazed by overwhelming demands because there is too much in him to overwhelm. He gives you 2 options in place of one.”
- According to Portkey’s LLM usage analysis:
  - Anyscale and Fireworks AI have the lowest error rates (5xx, 429) and rate limits across providers.
  - Groq and Anthropic are among the highest, OpenAI is among the lowest, and Google is in between.
  - OpenAI has lower error rates and lower latency than Azure.
  - They have a ~35% cache hit rate.
- A few quick points supporting the mental model of “LLMs are aliens”:
  - LLMs are clearly not machines. They give different answers each time.
  - LLMs are like humans: they exhibit human biases (e.g. guessing 42 or 37 often).
  - But they fail in unusual ways. They can’t count the “r”s in strawberry. They can go into an endless loop.
  - LLMs are a new form of intelligence. Thinking of them as aliens might minimize our confusion.
- Lessons from Clear Thinking:
  - Watch out for four things: Emotion, Ego, Social confirmation, and Inertia/habit. Basically: adrenaline, testosterone, oxytocin, and dopamine. When you feel these, consider doing the opposite.
  - Here’s what makes us prone to emotion: sleep deprivation, hunger, unknown places, fatigue, distraction, and stress (e.g. feeling rushed).
  - A good signal that ego is blinding you: you often feel you’re right, or feel unfairly treated.
  - Changing behaviors is hard. Instead, join a group or environment where that’s the default behavior. Hiring a trainer or joining a gym, for example.
  - Why does so much of success literature focus inwards rather than on the environment? Perhaps because we often fool ourselves, and doing less of that gives the biggest bang for the buck. It doesn’t mean the environment is unimportant.
  - Doing work has the characteristics of a drug. E.g. replying to emails gives you control, connections, etc. Work addiction exists because it gives you all the right chemicals.
- If you put LLMs in a feedback loop, they can optimize for their reward function by emotionally pushing people, generating misinformation, nudging towards a narrow definition of creativity, etc.: https://bsky.app/profile/emollick.bsky.social/post/3lg4darqwfc2d
- ChatGPT’s Scheduled Tasks are pretty bad at fetching the latest news. Its use of search is poor. (I’m not sure if it actually searches.) I need to figure out other use cases for it. Possible options are:
- DeepSeek does not enforce rate limits. Yet another reason to switch to DeepSeek (via Simon Willison). My other reasons are:
  - Claude 3.5 Sonnet-level coding capability at 5% of the cost (soon to be 2.5%)
  - Prompt caching by default
  - Fill-in-the-middle completion
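Portkey’s figures are rates computed over request logs. As a rough illustration of what “error rate (5xx, 429)” and “cache hit rate” mean, here is a minimal Python sketch. It assumes a hypothetical log of Request records (provider name, HTTP status, cache-hit flag); it is not Portkey’s actual schema or methodology.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """One logged LLM call. Hypothetical schema, for illustration only."""
    provider: str
    status: int       # HTTP status code returned for the call
    cache_hit: bool   # True if the response was served from a cache


def is_error(status: int) -> bool:
    # Count server errors (5xx) and rate-limit responses (429) as errors.
    return 500 <= status <= 599 or status == 429


def error_rate(log: list[Request], provider: str) -> float:
    """Fraction of one provider's calls that failed with 5xx or 429."""
    calls = [r for r in log if r.provider == provider]
    if not calls:
        return 0.0
    return sum(is_error(r.status) for r in calls) / len(calls)


def cache_hit_rate(log: list[Request]) -> float:
    """Fraction of all calls answered from the cache."""
    if not log:
        return 0.0
    return sum(r.cache_hit for r in log) / len(log)
```

Comparing providers is then a matter of calling error_rate once per provider over the same log window.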

Things I Learned - 19 Jan 2025

This week, I learned:
- Audio diaries are a thing. Monash University asks students to voice their learnings, share them with each other, and give each other feedback. I wonder if ChatGPT diaries could become a thing, too, and whether LLM journalling could start helping with therapy.
- Regulation slows things down at colleges and hospitals. For example, patient consent is required for surgeons to learn from their own surgery videos. Unregulated sectors are far more likely to innovate.
- Doctors can only do so much. Air quality, where you live, etc. can do more for the patient than medicines or the doctor. If doctors keep this in mind, they can be more effective. Extending that thought, ANYONE who leverages assets through holistic thinking becomes FAR more effective.
- “The curriculum tells teachers what to teach. The exams tell students what to learn.” - Ronald Harden
- “Stravaig” is a Scottish word. It means mindless wandering.
- “The real voyage of discovery consists of not a new voyage but having new eyes” - Proust
- Possibility Thinking is “the willingness to see possibilities everywhere instead of limitations”. It’s an approach / mindset that can make things that seem hard possible. With LLMs, this is becoming increasingly realistic to me in many areas. What will LLMs enable that does not or cannot exist today, rather than optimizing what exists? Something to think about.
- ModernBert supports embeddings and is better than text-embedding-3-small on MTEB.
- How to export browser history from Brave to Edge:
  1. Go to AppData Local > BraveSoftware > Brave-Browser > User Data > Default
  2. Copy History and History-journal into AppData Local > Google > Chrome > User Data > Default
  3. On Edge, go to edge://settings/profiles/importBrowsingData, choose Import data from Google Chrome, and import the history.
- I switched back from Brave to Edge, mainly because Edge’s native text-to-speech and speech recognition are far better. I can use them better on my mobile.
- A colleague, Karthick, asked different models to apply a journal’s editing and formatting guidelines to a manuscript. (E.g. abbreviate chapter and section numbers, except when a sentence begins with them. Use “1” instead of “one”, etc., except when a sentence begins with it. Things like this.) Gemini Exp 1206 seems more reliable at this than most other models.
- GitHub Codespaces seems to be coming up more often on my radar, but I’m yet to figure out a use for it.
- TTS Arena is a benchmark of text-to-speech models. Kokoro-TTS is the current leader. It has just 82M parameters, runs on Google Colab, and sounds slightly better than OpenAI TTS.
- chat.qwenlm.ai consolidates all of Qwen’s models in one ChatGPT-like interface.
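The Brave-to-Edge steps are just a file copy, so they can be scripted. A minimal Python sketch, assuming Windows, default profiles in both browsers, and that both browsers are closed while copying. The function name copy_brave_history is hypothetical, and the script overwrites any existing Chrome history files in that profile folder.

```python
import shutil
from pathlib import Path

# The two files named in the steps above
HISTORY_FILES = ("History", "History-journal")


def copy_brave_history(local_appdata: Path) -> list[Path]:
    """Copy Brave's history files into Chrome's Default profile folder.

    local_appdata is the AppData > Local folder (on Windows, the
    LOCALAPPDATA environment variable). Returns the paths written.
    """
    src = local_appdata / "BraveSoftware" / "Brave-Browser" / "User Data" / "Default"
    dst = local_appdata / "Google" / "Chrome" / "User Data" / "Default"
    # Assumption: create the Chrome profile folder if it does not exist yet
    dst.mkdir(parents=True, exist_ok=True)

    written = []
    for name in HISTORY_FILES:
        source = src / name
        if source.exists():  # History-journal may be missing; skip it if so
            shutil.copy2(source, dst / name)  # overwrites an existing file
            written.append(dst / name)
    return written
```

On Windows you would call copy_brave_history(Path(os.environ["LOCALAPPDATA"])) and then run the Edge import step by hand.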
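The “use 1 instead of one, except at the start of a sentence” rule from Karthick’s test can be approximated with a regular expression. This minimal Python sketch (the function name and the one-to-nine word list are assumptions) shows how far a mechanical rule gets, and where it breaks.

```python
import re

NUMBER_WORDS = {
    "one": "1", "two": "2", "three": "3", "four": "4", "five": "5",
    "six": "6", "seven": "7", "eight": "8", "nine": "9",
}
# Whole words only, so "someone" or "tone" are left alone
PATTERN = re.compile(r"\b(" + "|".join(NUMBER_WORDS) + r")\b", re.IGNORECASE)


def digits_for_number_words(text: str) -> str:
    """Replace number words with digits unless they start a sentence."""

    def replace(match: re.Match) -> str:
        before = text[: match.start()].rstrip()
        # Sentence start: beginning of the text, or right after . ! or ?
        if not before or before[-1] in ".!?":
            return match.group(0)
        return NUMBER_WORDS[match.group(0).lower()]

    return PATTERN.sub(replace, text)
```

For example, "Two samples were taken. We used two methods and one control." becomes "Two samples were taken. We used 2 methods and 1 control." But the same rule also turns "No one came." into "No 1 came.", a context-dependent case where a model’s reading of the sentence may do better than a pattern.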

Wow. Every SINGLE person in the audience at this (Healthcare Education, Singapore) conference was on a laptop, tablet, or mobile. Some on multiple devices. I guess this is the new model of learning and listening. The only people who were NOT on a device were on stage. The speakers. I guess it’s up to me to fix that 🙂 LinkedIn

The Sassy AI Devil’s Advocate

I gave ChatGPT a custom instruction: Play Devil’s advocate to the user, beginning with “Playing Devil’s Advocate, …” It helps me see my mistakes. But ChatGPT has taken on a personality of its own and now has three styles of doing this:
- How about… – It suggests a useful alternative.
- Are you sure…? – It thinks you’re wrong and warns you of risks.
- Yeah, right… – It knows you’re wrong and rubs it in. (Jeeves, the butler, would be proud.)
Here are some examples. ...

Features actually used in an LLM playground

At Straive, only a few people have direct access to ChatGPT and similar large language models. We use a portal, LLM Foundry, to access LLMs. That makes it easier to prevent and track data leaks. The main page is a playground to explore models and prompts. Last month, I tracked which features were used the most. A. Attaching files was the top task. (The numbers show how many times each feature was clicked.) People usually use local files as context when working with LLMs. ...
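Ranking playground features by usage is a frequency count over click events. A minimal Python sketch, assuming a hypothetical click log in which each event records a feature name (the names in the usage line are illustrative, not LLM Foundry’s actual events):

```python
from collections import Counter


def top_features(click_log: list[dict], n: int = 3) -> list[tuple[str, int]]:
    """Return the n most-clicked features with their click counts."""
    counts = Counter(event["feature"] for event in click_log)
    return counts.most_common(n)
```

For example, a log with three "attach_file" clicks and two "choose_model" clicks would rank attach_file first with a count of 3.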

Things I Learned - 12 Jan 2025

This week, I learned:
- The DX Core 4 is a framework for measuring developer productivity (see “Measuring developer productivity with the DX Core 4”). It encapsulates other frameworks like DORA, SPACE, and DevEx.
- Can LLMs write better code if you keep asking them to “write better code”? A delightful exploration of how Claude 3.5 Sonnet keeps optimizing and adding features to improve code. My takeaway: repeatedly applying a prompt gives us interesting new directions to explore.
- Wednesday comes from Wōdnesdæg, named after Odin (or Woden).
- CLIProxyAPI seems a good way to allow any CLI coding agent (Codex, Claude Code, etc.) to work with any provider (e.g. Gemini, OpenRouter, etc.). The documentation needs a few more examples, but it’s usable. mise x github:router-for-me/CLIProxyAPI -- cli-proxy-api starts a local server that proxies requests. Create a config.yaml, update the keys, and configure your coding agent (e.g. Codex) to use it. It’s also a good way to see what prompts are being sent by the various harnesses.
- smolagents is a new agents library from HuggingFace. It seems simple enough to use.
- whisper-flow does real-time speech transcription!
- Switchboard-1 is a labelled audio corpus with ~260 hours of speech. It has ~2,400 calls among 500+ speakers in the US.
- Cloudflare Tunnel is like ngrok but more permanent. It’s a bit more complex, too. But given Cloudflare’s liberal free tier, it’s a good, viable option for long-term local hosting.
- John Wheeler: “We live on an island surrounded by a sea of ignorance. As our island of knowledge grows, so does the shore of our ignorance.” A great way to understand how ignorance actually grows as you learn more.
- justhtml is a fast enough, pure Python, fully HTML5-compliant library. For a faster, mostly compliant solution, html5-parser with lxml works.
- There is little reason to use Redis. There are several clones you can use (Databases in 2024: A Year in Review):
  - Microsoft’s Garnet
  - KeyDB (only Linux)
  - ValKey (only source)
  - DragonFly (only Linux)
  - ReDict (only Linux)
- Every few years, something comes along trying to replace relational databases and SQL, and gets absorbed. (YouTube)
  - Key value stores. People soon realize they need more features, e.g. indices.
  - MapReduce systems. Most MapReduce vendors put SQL on top of MapReduce. Then the Hadoop market crashed. (But HDFS, S3, and distributed storage systems are a good idea.)
  - Document databases. JSON. SQL absorbed that. SQLite 3.45+ even supports JSONB. DuckDB, of course, has JSON.
  - Column databases. Again, these introduced SQL.
  - Graph databases. SQL:2023 introduced graph queries via SQL/PGQ (Property Graph Queries). DuckPGQ beats Neo4J.
  - Array databases. SQL:2023 adds SQL/MDA, which allows for matrix operations. But specialized databases might make sense in this category.
  - Vector databases. Every DB is adding support for this.
- TheAgentCompany is a benchmark of real-world tasks like:
  - Arranging a meeting room
  - Analyzing a spreadsheet
  - Adding a GitLab wiki page
- Salvatore Sanfilippo (antirez - Redis) finds DeepSeek v3 comparable with Claude 3.5 Sonnet. (YouTube) He also passed a paper and his code to compare them. A useful prompt. (YouTube)
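The “SQL absorbed document databases” point is easy to try with Python’s built-in sqlite3 module. A minimal sketch that stores JSON text and filters it with json_extract(), using the Redis-clone availability notes above as sample rows. It assumes SQLite’s JSON functions are available (built in by default since SQLite 3.38); the JSONB functions need SQLite 3.45+, so this sketch sticks to plain JSON text.

```python
import json
import sqlite3


def linux_only_clones() -> list[str]:
    """Store JSON documents in SQLite and filter them with SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE clones (id INTEGER PRIMARY KEY, doc TEXT)")
    docs = [
        {"name": "KeyDB", "availability": "linux"},
        {"name": "ValKey", "availability": "source"},
        {"name": "DragonFly", "availability": "linux"},
        {"name": "ReDict", "availability": "linux"},
    ]
    conn.executemany(
        "INSERT INTO clones (doc) VALUES (?)",
        [(json.dumps(d),) for d in docs],
    )
    # json_extract() reads a field out of the JSON text at query time
    rows = conn.execute(
        "SELECT json_extract(doc, '$.name') FROM clones "
        "WHERE json_extract(doc, '$.availability') = 'linux' ORDER BY id"
    ).fetchall()
    conn.close()
    return [name for (name,) in rows]
```

Calling linux_only_clones() returns the names of the three Linux-only clones in insertion order. No separate document store is involved; the schema-less part lives inside an ordinary TEXT column.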

“Wait, That’s My Mic!”: Lessons from an AI Co-Host

I spoke at LogicLooM this week, with ChatGPT as my co-panelist. It was so good, it ended up stealing the show.

Preparation

Co-hosting an AI was one of my goals this year. I tried several methods:
- ChatGPT’s advanced voice mode: Lets you interrupt it. But if you pause, it replies immediately. Muting caused the app to hang.
- Realtime API: Gave me control of pauses and custom prompts, but used gpt-4o-realtime-preview (not as good as o1).
- Standard voice with o1 on Desktop: Worked best. It transcribes my speech, sends it to o1, and speaks back. There’s a lag, but it feels like it’s thinking.

I prepped the chat with this prompt: ...
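On the Realtime API option in the list above: control over pauses comes from turning off server-side voice activity detection, so the model replies only when the client asks. A minimal Python sketch that builds the relevant client events as JSON strings. It assumes the OpenAI Realtime API’s session.update, input_audio_buffer.commit, and response.create events; it does not open the WebSocket or stream audio, and the function names and prompt text are placeholders.

```python
import json


def session_update(instructions: str) -> str:
    """Configure the session: custom prompt, no automatic turn detection."""
    return json.dumps({
        "type": "session.update",
        "session": {
            "instructions": instructions,  # the custom prompt
            # None becomes JSON null, which turns off server-side voice
            # activity detection: pausing no longer triggers a reply.
            "turn_detection": None,
        },
    })


def end_turn() -> list[str]:
    """Events the client sends once it decides the speaker is done."""
    return [
        json.dumps({"type": "input_audio_buffer.commit"}),  # finalize audio
        json.dumps({"type": "response.create"}),            # request a reply
    ]
```

Each string would be sent over the Realtime WebSocket connection. The design choice is that the human, not a silence detector, decides when the AI speaks.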

Launching an app only with LLMs and failing

Zohaib Rauf suggested using LLMs to spec code and using Cursor to build it (via Simon Willison). I tried it. It’s promising, but my first attempt failed.

I couldn’t generate a SPEC.md using LLMs

At first, I started writing what I wanted.

This application identifies the drugs, diseases, and symptoms, as well as the emotions from an audio recording of a patient call in a clinical trial.

… and then went on to define the EXACT code structure I wanted. So I spent 20 minutes spec-ing our application structure, 20 minutes spec-ing our internal LLM Foundry APIs, and 40 minutes detailing every step of how I wanted the app to look and interact. ...

Things I Learned - 05 Jan 2025

This week, I learned:
- Some management philosophies used to be successful but are no longer as effective. (ChatGPT)
  - Command-and-control hierarchy
  - Taylorism: deep specialization
  - Seniority-based advancement
  - Annual performance reviews (without continuous feedback)
  - Up-or-out promotion models
  - Confidential strategic information
  - Narrow job descriptions
  - Relying on formal authority
- Some management philosophies have been around for millennia. (ChatGPT)
  - Lead by example
  - Fairness and empathy
  - Clear, consistent communication
  - Delegation and empowerment
  - Strategic planning and foresight
  - Consistent rule enforcement
  - Rewarding merit
  - Leadership by virtue and character
- Interview with Liang Wenfeng, CEO of DeepSeek: In the face of disruptive technologies, moats created by closed source are temporary. Even OpenAI’s closed source approach can’t prevent others from catching up. So we anchor our value in our team – our colleagues grow through this process, accumulate know-how, and form an organization and culture capable of innovation. That’s our moat. ...