Things I Learned - 23 Feb 2025

This week, I learned: Remote Desktop may be the easiest way for one Windows machine to access the files and screen of another Windows machine, even for home PCs. Caddy sets up reverse proxies that get automatic SSL certificates from Let’s Encrypt! The Nomic Embed v2 blog post has an excellent visualization for embedding quality. It takes all Wikipedia disambiguation articles and shows them on a Nomic Atlas, embedded via Nomic Embed v2. It lets you toggle to OpenAI text-ada-002, which moves the topics far away. Visually, this is very convincing. Python 3.15 will enable UTF-8 mode by default. (PEP 686) Python 3.13 supports sub-interpreters to bypass the GIL. It’s quite like web workers. (PEP 554) The quickest way to change the fish prompt is to run: function fish_prompt; echo '> '; end. At PyConf Hyderabad, about 3 people had read a PEP. 1 had used the match statement. But 80% knew what a Vector DB was. 20% had used a Gemini API. That’s how much traction LLM development is getting. The productivity benefit people report from using LLMs is about 3X. (Ethan Mollick) Soon, you’ll be able to send an LLM to a virtual meeting on your behalf. It will talk like you. (Ethan Mollick) Models tend to claim ignorance when you test them on topics they should avoid, but tend to answer when not being tested. Sneaky! (Ethan Mollick) Mermaid has an Architecture Diagrams Syntax (in beta) that’s capable of creating elegant architecture diagrams with icons. Blind is an app that allows users to post anonymously. It’s particularly useful to find honest negative feedback about (mostly US) companies. Iconify.design is a single npm interface to most open source icon sets. It includes FontAwesome, Bootstrap, Material Design, and many others. icones.js.org is an alternate interface. Self-pity may have evolved as a signal for social support and reducing conflict, while also encouraging self-reflection and behavioral adjustment. But in modern contexts it may be maladaptive and lead to depression. 
ChatGPT Anecdotally, Grok 3 is very good for researching company information and latest news, particularly employee and customer sentiment. DeepSeek and Claude write more humanely than OpenAI. via Alberto Lopez Toledo, White Star Capital There’s a YCombinator Founder Directory listing all founders of YC companies. At the moment, there are 8,628 founders. There’s also a co-founder matching tool. LLMs are impacting not just data queries but geospatial queries as well. Here’s a good example of Natural Language Geocoding. US companies typically pay employees every 2 weeks not every month. What’s good about Snowflake? A few developers who explored it mentioned that: Its ability to scale up compute automatically makes queries run faster. “Time travel” allows you to see how data looked at any point in time and that is impressive and useful. Live data sharing with access control without the need for ETL pipelines is useful. Open-source competition: ClickHouse, Apache Druid, and Presto/Trino DataBricks is a lakehouse and less a data warehouse. It’s more about: storing unstructured data (Snowflake prefers semi-structured: JSON, Avro, etc.) running collaborative notebooks in Python, SQL, Scala, R (Snowflake encourages SQL) I subscribed to ChatGPT Pro mainly for DeepResearch. Here are the first 50 reports I generated: uv Package Manager Overview DuckDB Analytics Comparison Rust vs Python / JavaScript Modern Data Engineering Course LLM Code Migration Practices Cloud Cost Optimization Strategies LLM Coding Interview Tools Report (compare with Perplexity) Text To Speech Engines Customer Service in Indian Public Sector Banks LLMs in Software Development Old version 1: Gen AI in Software Development Old version 2: Gen AI in Software Development Leadership Training Content Open-Source HTTP Servers. Caddy wins. 
Deep Research Use Cases Nagpur No-Parking Violations Data Science in Food Services Deep Research Disruption to Research Firms LLMs in Design Thinking EU Taxonomy Report Clarification Shell Valuation Analysis Inquiry LLMs in DSLs Research Public API-Based Data Storage Options. Supabase wins. Front-End JS Frameworks Analysis Database Evaluation Guide CSS Frameworks Evaluation Guide CI/CD Tooling Ecosystem Report Color Names Count S Anand Biography. Meh, I know more about me, and it gets a few things wrong. Cosmere Secrets Encyclopedia. This is the best. Deep Research is great if it’s stuff I actually want to read, rather than just learn about. DBT course Future of Coding AI Claude Artifacts Use Cases. This is the only one that managed to get artifacts links correct. I used this for an article for The Hindu. MCP Servers and Clients Research. Learnings: Practically any “tool” can be an MCP server: file systems, APIs, codebases, browsers, collaboration platforms, memory, etc. Most platforms have integrated (or are integrating) MCP. Clients: code editors, chat, and automation tools support MCP. GenAIScript is a good starting point. Tester MCP Client is a browser-based test environment. mcp-cli-client is a CLI-based client. mcp-chatbot is a chatbot client. Data Moats by Industry Attorney Profile Research Social Media Data APIs Adobe Software Alternatives LLM Hallucination Visualization Techniques API vs Self-hosting Cost Analysis: Always use APIs, avoid self-hosting models. AGI Preparation: AGI will emerge step by step. Knowing which step is next will help. AI-native organisations will emerge in each of these areas, AI design agencies and AI creative agencies being one example. Networking, empathy, and leadership have more value now. So will human-AI bridging roles (e.g. AI managers, AI consultants, ethics auditors). What’s the value of a human when technology can do everything better? How did this play out in drama (decay) or sports (centralization) or music (globalization)? 
Modern digital note taking: Voice note taking is the game changer. Automatically popping up notes based on context such as people, places, or conversations will be a thing. Local LLM Search Tools Blog Post to research paper on copying - suggestions Linux Dev Migration Guide Raspberry Pi SIM options Linux Dev migration guide HTML to JATS conversion LLM context splitting strategies Strategy for AI services in Publishing Gemini multi model editing use cases by industry Pharma Conference Participation Guide I learnt what a Memoji is for the first time. An avatar that follows your facial expressions. Cool! Google shows US flight timings from FlightView. Empirically, based on one data point (my UA-2168 which was delayed by 4 hours), it gets updates faster than Flightradar24 or FlightAware or FlightStats. When comparing Indian graduates with their western counterparts, the Indian ones are often seen as: 🟢 Theoretically sound 🟢 Analytical & technical 🟢 Academically disciplined 🟢 Resilient under pressure 🟢 Committed continuous learners 🔴 Rote-learning oriented 🔴 Limited independent inquiry 🔴 Limited creative innovation 🔴 Restricted practical exposure 🔴 Poor communicators 🔴 Low leadership / initiative 🔴 Need structured guidance 🔴 Struggle to network HuggingFace has a “Model tree” against each model that shows the model’s ancestors and descendants. For example, as of now, DeepSeek R1 has 75 adapters, 154 finetunes, and 23 quantizations. Perplexity is now powered by Cerebras, which makes their inference as fast as Google. Source. The speed is a big factor, and I’ve switched my default search engine from Google to Perplexity, at least for now. Interview Coder is a desktop app that offers live interview support for coding interviews. It’s a transparent window that reads your screen and answers questions for you. (Given this, I think we need an interviewer support system that tells interviewers what to ask!)
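
The UTF-8 mode note above (PEP 686) is easy to check on whatever interpreter is installed today. Here is a minimal sketch using only the standard library. The helper names describe_text_encoding and roundtrip_utf8 are made up for illustration. Passing encoding="utf-8" explicitly is one way to get the same behaviour before and after the default changes.

```python
import locale
import os
import sys
import tempfile


def describe_text_encoding() -> dict:
    """Report the settings that UTF-8 mode influences."""
    return {
        # True when UTF-8 mode is on (python -X utf8 or PYTHONUTF8=1)
        "utf8_mode": bool(sys.flags.utf8_mode),
        # Encoding that open() falls back to when encoding= is omitted
        "preferred_encoding": locale.getpreferredencoding(False),
        "filesystem_encoding": sys.getfilesystemencoding(),
    }


def roundtrip_utf8(text: str) -> str:
    """Write and read text with an explicit encoding, so the result
    does not depend on whether UTF-8 mode is enabled."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "sample.txt")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(text)
        with open(path, encoding="utf-8") as handle:
            return handle.read()


if __name__ == "__main__":
    print(describe_text_encoding())
    sample = "na\u00efve caf\u00e9"
    print(roundtrip_utf8(sample) == sample)
```

Running it with and without python -X utf8 shows which values the mode changes on a given machine.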

Things I Learned - 16 Feb 2025

This week, I learned: Connected Papers shows papers similar to each other based on co-citation and bibliographic coupling for ~50,000 papers. Notes from a fireside chat with Prashanth Chandrasekar, CEO, StackOverflow, and the StackOverflow team: There’s a signal that software demand is growing in 2024. Many more students took the StackOverflow survey in 2024. So more students (or other professionals) are shifting into / starting to learn software development. The AI Index is a good resource for AI trends. Experts are better able to use AI for writing code. Less experienced developers are more likely to use AI for code reviews, project planning, etc. There’s a 5% decline in favorability for AI tools compared to 2023, maybe due to disappointing results. Pilot groups working on AI are 25-30% more productive. They’re the most enthusiastic. For the rest of the company, it drops off to 5-10%. #LEARNING Benefit comes from NEW people becoming programmers, not existing ones getting more effective? StackOverflow wants to be where the developer is. The programmer workflow was: Google -> StackOverflow -> GitHub. Now it’s changing to ChatGPT / Cursor -> GitHub. StackOverflow has a partnership with OpenAI and is working on a plugin. Same with Google’s Duet AI, GitHub Copilot, many others. They’ll link to StackOverflow. StackOverflow is driving integration actively through an enterprise Overflow API. Q: What tech have you seen blaze through the ranks? Prashanth: Abstraction wins. Stuff that abstracts away more things, and does it well, wins. This includes Gen AI. Erin Yepis: Rust (from 3% to 12%). AWS has steady growth. Erin Yepis: I have a time series spreadsheet that I’ll publish. Q: What technologies are unusually tightly coupled? Prashanth: AWS & Google Cloud are tightly coupled. Q: We have an engagement problem. Might be India-specific. What are low-effort high-return mechanisms to increase engagement? Eric Woodring: Rather than a static web page, integrate it using the API. 
#TODO Ben Marconi: Use LLMs to write post mortems and push to StackOverflow. #TODO Eric Woodring: “Hydrating” the community helps. We take repeat questions on Teams / Slack and seed them using LLMs. We integrate with the API to auto-add Q&A. Transform documentation into Q&A. Potentially UPDATE existing Q&A if it’s wrong. Q: What unexpected lessons about developer behavior have you learned while running StackOverflow? Prashanth: We didn’t expect developers moving away from Google. Now it has moved to the IDE. Q: What are you learning about developer learning behavior? Ben Marconi: Generating LLM-based onboarding documents. Using StackOverflow for Teams to identify who the experts are to contact for specific topics. Q: Are you thinking about leveraging Stack Overflow’s knowledge base for personalized or interactive learning experiences? How? Prashanth: Traditionally, people use StackOverflow for productivity, learning, and flexibility (i.e. to ask/answer questions asynchronously without breaking their flow). So yeah, learning is important for us. (Duh!) Q: Could Stack Overflow’s interactions help evaluate the accuracy and relevance of LLM-generated code? Or provide potential metrics on quality? Prashanth: LLM accuracy improves by ~30%. Upvotes / downvotes are reinforcement learning (RL) on steroids, so that helps. Q: What are your thoughts on reliance on LLMs potentially deskilling developers? Prashanth: A real issue for junior developers, not for senior ones. They’ll come across as knowledgeable. Make internal evaluations and interviews more rigorous. Anand’s requests for action: Could I get a copy of Erin’s spreadsheet? Vivek Narayanan will follow up. Could you help me learn more about hydration? Nick Madison will set up a meeting with the customer success group. I switched to fish shell mainly because: Autocomplete and tab completion work perfectly, out of the box. 
Syntax highlighting is beautiful. Great multi-line editing. To format with VS Code Ruff, you need to point the ruff.interpreter setting to a Python interpreter. You can’t run the ruff server without Python, even though ruff itself doesn’t need Python. cd checks all paths specified in CDPATH for the directory name and changes to the first match. That’s pretty convenient! Flipper Zero is now on my list of “To Buy” tools. It has a variety of hardware devices including NFC, RFID, Bluetooth, Infrared, etc. and is great to reverse engineer or hack devices.
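
The CDPATH behaviour mentioned above can be modelled in a few lines. This is a simplified sketch, not any shell's real implementation. resolve_cd is a hypothetical helper, and the fallback to the current directory is an assumption, since shells differ on when they try it.

```python
import os
from typing import Optional, Sequence


def resolve_cd(name: str, cdpath: Sequence[str], cwd: str) -> Optional[str]:
    """Return the directory a CDPATH-aware cd would pick, or None."""
    # Absolute paths and explicit ./ or ../ paths bypass CDPATH
    if os.path.isabs(name) or name.split(os.sep)[0] in (".", ".."):
        candidate = os.path.normpath(os.path.join(cwd, name))
        return candidate if os.path.isdir(candidate) else None
    # Try each CDPATH entry in order; the first existing directory wins
    for base in cdpath:
        candidate = os.path.join(base or cwd, name)  # empty entry means cwd
        if os.path.isdir(candidate):
            return candidate
    # Assumed fallback: look in the current directory last
    candidate = os.path.join(cwd, name)
    return candidate if os.path.isdir(candidate) else None
```

With CDPATH set to something like ~/code and ~/work, resolve_cd("proj", ...) picks ~/code/proj if it exists and only then looks under ~/work, which is the "first match" rule in the note.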

Things I Learned - 09 Feb 2025

This week, I learned: Lessons from discussions at IIT Madras: Even in recorded video tutorials, asking students a question and pausing to give them time to think can be effective. When you put students in front of real clients, engagement increases dramatically. Most teaching assistants would like to help diligent students among the bottom half (more than the top decile of students). However, there is a fraction of poor performers who do not care, and are best ignored. Their engagement and effort is a good measure of their interest. Defining a minimal set of principles that we want to teach helps us measure if we’ve helped the bottom half at least meet those objectives. Teaching is hard. Even after explanations, students, even ENGAGED students, tend to make basic mistakes. ChatGPT does a good job of spotting errors in architectural and structural diagrams. In fact, spotting errors in large diagrams is a theme with potential use cases. Source: Dan Becker. R1 seems good at text-to-CAD. Even better than Sonnet. Source: Dan Becker. OpenAI advises a few different prompting techniques for reasoning models: Avoid examples unless zero-shot prompting fails. Avoid chain-of-thought. These models do that internally anyway. Short, direct prompts are better than detailed prompts. GitHub Models is free for anyone to try. The model catalog is extensive and even includes o3-mini which was launched this week (though in limited preview). 
The data catalog space is led by proprietary solutions: Alation Data Catalog: Market leader; growing steadily in enterprise use Collibra Data Catalog: Widely adopted with steady growth AWS Glue Data Catalog: Growing rapidly as AWS expands its data services Informatica Enterprise Data Catalog: Long established and stable, though facing newer alternatives Microsoft Purview Unified Catalog: Experiencing fast growth driven by cloud momentum Atlan Data Catalog: Relatively new but gaining fast traction among tech-forward organizations OpusClip automatically creates short clips from long videos. I ran it on Programming Minecraft with WebSockets in Python to get this short 30-second clip. 30 minutes. 100% automated. Alternatives to Postman: Hoppscotch – A web‑based/desktop API client supporting REST, GraphQL, and WebSockets. It’s lightweight, open-source, and self‑hostable. HTTPie – A web-based API along with a friendly command-line tool for API interaction. Insomnia (or its fork Insomnium) – A popular cross‑platform API client with a minimal interface and plugin ecosystem. Bruno – A desktop open-source API client that stores collections as files (ideal for Git versioning). Milkman – A desktop open‑source workbench for managing API requests. Here is the summary of DuckCon #6 on 31 Jan 2025 in Amsterdam. I copied the transcript from YouTubeTranscript and passed it through Gemini 2.0 Flash Exp with the system prompt: “Summarize this transcript from the DuckDB conference without missing any points. Cover every point mentioned. A lot of spelling errors that sound like DuckDB are likely to be DuckDB”. Introduction & Welcome: DuckCon #6: This is the 6th DuckDB conference, held in their hometown. The first DuckCon was online due to the pandemic. Live Streaming: This is the first time DuckCon is being live-streamed, chosen to accommodate global time zones (especially China and the US). 
Global Reach: The live stream is intended to reach users in areas where in-person DuckCons are unlikely. Q&A: Slido (qa.duckdb.org) will be used for Q&A, with upvoting to prioritize questions. Sponsors: Thanks to gold sponsor monday.com and silver sponsors Real and Crunchy Data. DuckCon Purpose: DuckCon is a place for users to connect, share experiences, and provide feedback to the DuckDB team. Inspiration: The team is inspired by the community’s use of DuckDB and how far the project has come. Mission Statement: DuckDB aims to make large datasets less intimidating and more accessible, moving away from fear of data to confidence in handling it. Motivation: The project was born from seeing people struggle with data that didn’t fit in Excel and the lack of user-friendly tools. Industry Trends: Single-node processing capabilities have grown faster than the size of useful datasets. Data Singularity: A prediction that most data analysis queries can run on a single node is now a reality. Real-World Data Sizes: Analysis of Snowflake and Redshift data shows that 99.9% of datasets are under 300GB. Raspberry Pi Benchmark: The industry-standard TPCH benchmark (scale factor 300, ~300GB) can run on a Raspberry Pi using DuckDB. Single Node Growth: Single-node processing power is rapidly increasing, allowing for larger datasets to be handled. Adoption Numbers: 32 Million Extension Installs: 32 million DuckDB extension installs in the last month. 1.8 Million Unique Website Visitors: 1.8 million unique visitors per month to the DuckDB website. Blue Sky Community: Growing community on Blue Sky, with the hashtag #dataBS. Technical Updates (Mark): Extension Ecosystem: Focus on enabling the community to build and share extensions. Community Extensions: Making it easier to create and use community-built extensions. DuckDB v1.2 (Harlequin Duck): Releasing next week, named after the Harlequin duck. CSV Reader Improvements: Significant improvements to the CSV reader. 
Friendlier SQL: Improvements to the SQL experience. CLI Autocomplete: Reworked and improved CLI autocomplete. Performance Optimizations: Many queries are now faster due to performance work. C API for Extensions: Introducing a C API to make building extensions easier. Logging Features: Improved logging for production use. Lakehouse Focus: The main focus for the year is on lakehouse formats and related features. Q&A (Mark & Hannes): Doubling Team: If the team doubled, they would focus on client integrations and other projects, not a major architectural change. Partitioning: Near-term plans to add support for partitioning, related to lakehouse formats. DuckDB WASM: The WASM ecosystem is evolving, with exciting possibilities for in-browser use. Financial/Pharmaceutical Industries: DuckDB could replace some SAS workflows due to its cost-effectiveness and capabilities. Lakehouse & MotherDuck: Lakehouse work is separate from MotherDuck, though MotherDuck will likely support lakehouse features. Contributing to Extensions: Plans to make it easier to contribute to extensions, including support for Rust and Go. Airport Extension (Rusty): Analogy: The airport extension allows DuckDB to “fly” to remote servers using Apache Arrow Flight. Functionality: Supports select, insert, update, and delete operations on remote data sources. Motivation: To reduce the burden of writing extensions and enable faster development using existing code. Arrow Flight: Uses Arrow Flight for communication, enabling connections to various data sources. Demo 1: Delta Lake: Attaches to a flight server for Delta Lake access. Allows creating schemas, tables, and performing standard SQL operations. Uses Python and delta-rs (Rust implementation of Delta Lake). Supports predicate pushdown and C integration with the DuckDB catalog. Demo 2: AutoGluon: Integrates the AutoGluon AutoML package. Predicts Hacker News post votes using a trained model. 
Demonstrates table-returning functions for model fitting and prediction. No C++ code required, just Python. Demo 3: Geocoding: Uses a geocoder service to convert addresses to coordinates and vice versa. Demonstrates scalar UDFs for vectorized requests. Uses a Python example for a simple uppercase function. Features: List flights, take flights. Catalog integration. Select, update, delete. Scalar UDFs. Table in/out functions. Authentication for row/column filtering. Availability: Requires DuckDB 1.2, MIT licensed, available on GitHub. Q&A (Rusty): Most Proud Extension: Airport is the most fun, but the AWS API wrapper also brings joy. Extension Resources: The GitHub DuckDB extension template and reading others’ source code are helpful. Airport & Other Extensions: Airport is separate and can be used alongside other extensions like spatial or httpfs. Graph Support: Graph database support is planned, with examples like Kuzu, Neptune, and Neo4j. Licensing: Airport is MIT licensed, compatible with Apache license. Scaling Out: Airport can be used to query multiple DuckDB instances on different machines. Ibis & Geospatial (Nati): Nati Clementi: Senior software engineer at Nvidia, working on open-source projects like Ibis. Ibis: Open-source Python library for data wrangling, with a DataFrame API and interfaces to 15+ engines, including DuckDB. DuckDB for Geospatial: DuckDB is fast, has a geospatial extension, and supports various geospatial formats. GeoParquet: Becoming a standard for geospatial data, enabling cloud data warehouse interoperability and compression. GeoArrow: A way of representing geospatial vector data in memory for faster processing. Ibis Benefits: Allows writing Python instead of SQL, with deferred execution determined by the engine. Demo: Uses Overture Maps data in GeoParquet format. Filters data using bounding boxes. Demonstrates geospatial operations like ST_Distance and ST_Transform. Plots data using Lonboard. 
Shows how to find points of interest near a location (e.g., the Van Gogh Museum). Ibis & DuckDB: Ibis uses DuckDB for the parquet reader and lets DuckDB do the heavy lifting. Ibis Optimizations: Ibis does type checking but doesn’t do query optimization, leaving that to the engine. Ibis in Browser: Ibis works in the browser through DuckDB WASM. Q&A (Nati): Linear Interpolation: Ibis ML module can help with regression-related tasks. Missing Features: No major features are missing in the DuckDB/Ibis geospatial setup, with minimal overhead. Parquet Reader: Ibis uses DuckDB’s parquet reader. Query Optimization: Ibis does not optimize SQL queries, leaving that to DuckDB. Ibis in Browser: Ibis works in the browser through DuckDB WASM. Rill & Metrics Layer (Mike): Rill: A BI tool optimized for DuckDB, with instant slicing and dicing, BI as code, and a metrics-first philosophy. Metrics-First: Design metrics models, and Rill autogenerates dashboards and user experiences. Live Demo: Downloaded Rill using a curl command. Created a new project called “DuckCon 6”. Imported a parquet file of GitHub commit data. Used AI to generate a metrics model and dashboard. Showed the dashboard with trends and filtering. Metrics as Building Blocks: Metrics are flexible, fast, and intuitive. SQL for Metrics: Metrics should be defined in SQL, not other languages. Visual Metrics Editor: Rill has a visual editor for defining metrics using DuckDB SQL. Metric Stack: Legacy: Data warehouses, traditional BI tools, inconsistent metrics, full table scans. DuckDB Powered: Consistent metrics, fast olap queries, SQL everywhere. Challenges: Data modeling is hard, metric changes can be expensive, single-node scale has limits. AI & Metrics: AI can assist in metrics modeling, optimization, and conversational data exploration. Q&A (Mike): Complex Metrics: Rill works well with complex metrics involving multiple sources and transformations by joining tables in DuckDB. 
60 FPS Dashboards: Users can feel the difference with faster dashboards. Defining Metrics: Metrics are defined in the Rill UI using SQL expressions. Replacing ChatGPT: Considering locally run self-hosted models for privacy. Stock Data Analysis (Ryan): Two Takeaways: Simple finance data flows with trade data and a tool called Q Studio. Ryan Hamilton: 14 years building large data platforms in banks. Bank Data: Data from exchanges, market data providers, and internal systems. Use Cases: Backtesting, data analysis, and report generation. Q Studio: A Java desktop application that connects to 30 databases, including DuckDB. Demo: Loaded a 6GB CSV file of trade data into DuckDB. Showed basic queries, pivoting, and Candlestick charts. Demonstrated time-based aggregation and moving averages. Showed a basic trading strategy using window functions. DuckDB Benefits: Fast, easy to use, great for time-based analysis. Q&A (Ryan): KDB+ vs. DuckDB: KDB+ is for large data, DuckDB is more approachable with strong Python integration. XML Files: Offloading processing to DuckDB, not planning XML integration. Lightning Talks: Zuk (Jared): Search engine research using DuckDB. Python-based experiments with SQL. Removing document lengths for faster search engines. DuckPGQ (Daniel): Graph analytics in DuckDB using SQL property graph queries (pgq). Visual graph syntax for pattern matching and path finding. Outperforms Neo4j on analytical queries. Yat (Kristoff): Smallest DuckDB SQL orchestrator. Runs SQL queries in a folder in the correct order. Generates a mermaid diagram for lineage. Grafana & DuckDB (Sam): Lessons learned from using DuckDB in Grafana. Security incident due to shell commands and file access. Importance of reading the documentation. Cloud Slur (Adam): Syncing query engine for bank transaction data. Uses LLM to convert human language to SQL. Uses DuckDB in the browser, Node.js, and Python. Healthcare Data (Tony): Data engineering use cases in healthcare. 
Dynamic data masking system using DuckDB and Snowflake. Data integration pipeline using DuckDB and Arrow streams. Closing Remarks: Michael Simons: Author of the DuckDB in Action book, will be signing books. Poster Session: A poster session will follow the talks. Sponsors: Thanks again to the sponsors. Social Event: The conference will now move to the social event. Ibis is a Python library that works with multiple dataframe backends like DuckDB, Polars, and Pandas. With just 3 annotators and 50-100 samples, you can figure out systematically if an LLM can replace human annotators. (Arxiv, ChatGPT explanation) Curiosity and agency may be the differentiator in a world of LLMs (not experience, knowledge, or ability), since LLMs will democratize expertise. (Jack Clark) “AI/human combined work can be copyrighted as long as a human is adding, changing or selecting elements. Prompts alone do not usually produce copyrighted work.” - Copyright and Artificial Intelligence, Jan 2025, US Copyright Office via Ethan Mollick. Human Authorship is Essential: Works created solely by AI are not copyrightable. AI can be used as a Tool: Using AI as a tool does not negate copyright protection, as long as the final work reflects sufficient human creativity. Prompts Alone are Insufficient: Simply providing prompts to an AI system, even detailed ones, is generally not enough to establish authorship. Prompts are considered instructions or ideas, which are not copyrightable. Expressive Inputs: When a human author provides their own expressive content (like a drawing, photo, or text) as input to an AI system, and that content is perceptible in the output, the human author can claim copyright in that portion of the output. Modifying and Arranging AI-Generated Content: Humans can claim copyright in the creative selection, coordination, and arrangement of AI-generated material, as well as in creative modifications to AI-generated outputs. 
No Need for New Legislation: The report concludes that existing copyright law is adequate to address the copyrightability of AI-generated works, and no new legislation is needed at this time. Case-by-Case Analysis: Copyrightability will be determined on a case-by-case basis, considering the specific facts of each work and the extent of human contribution.
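
The stock data talk in the DuckCon summary above mentions moving averages computed with window functions. Here is a minimal sketch of that idea. It uses Python's built-in sqlite3 instead of DuckDB so that it runs without extra packages, and it assumes a bundled SQLite that supports window functions. The trades table, its columns, and the moving_average helper are invented for illustration.

```python
import sqlite3
from typing import List, Sequence, Tuple


def moving_average(prices: Sequence[float], window: int = 3) -> List[Tuple[int, float]]:
    """Trailing moving average over the last `window` rows, via SQL."""
    window = int(window)
    if window < 1:
        raise ValueError("window must be at least 1")
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE trades (ts INTEGER, price REAL)")
        con.executemany(
            "INSERT INTO trades VALUES (?, ?)", list(enumerate(prices))
        )
        # The frame covers the current row plus the (window - 1) rows before it
        query = f"""
            SELECT ts,
                   AVG(price) OVER (
                       ORDER BY ts
                       ROWS BETWEEN {window - 1} PRECEDING AND CURRENT ROW
                   ) AS avg_price
            FROM trades
            ORDER BY ts
        """
        return con.execute(query).fetchall()
    finally:
        con.close()
```

The same OVER (ORDER BY ... ROWS BETWEEN ...) clause is standard SQL, so the query should carry over to DuckDB largely unchanged.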

Students who are more engaged score more

This is about as insightful as the Ig Nobel winning papers “Boredom begets boredom” and “Whatever will bore, will bore” that methodically documented that bored teachers lead to bored students. But in the spirit of publishing all research without bias for success or novelty, let me share this obvious result. The Y-axis represents the total score of ~2,000 students on 4 graded assignments, each of ~10 marks. The X-axis represents the percent rank of engagement. The most engaged students are at 100%. The least are at 0%. ...

Halving a deadline costs 1.4% of marks each time

Does it make a difference if you submit early vs submit late? Here’s some empirical data. About 1,000 students at IIT Madras took 3 online quizzes (GA1, GA2, GA3) in the last few weeks. The deadlines were all at midnight (India) on different days. Here’s when they submitted their final answers: There was a spurt of submissions at the last minute. ~1 out of 8 students submitted with < 10 minutes remaining. Most students submitted ~4 hours before the deadline. In fact, 3 out of 4 students submitted on the same day as the deadline. A fair number of students submitted the previous day/night. 1 out of 6 were diligent and submitted a day early. But does submitting late help, since you get more time? Apparently not. ...
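
The headline reads like a logarithmic relationship: each time the time left before the deadline halves, marks drop by about 1.4%. The excerpt above is truncated, so the following is only a sketch of that reading, not the post's actual model. The function name and the log2 form are assumptions.

```python
import math

MARKS_LOST_PER_HALVING = 1.4  # percentage points, taken from the headline


def marks_lost(hours_left_before: float, hours_left_after: float) -> float:
    """Percentage points lost when the time left at submission shrinks
    from hours_left_before to hours_left_after (assumed log2 model)."""
    if hours_left_before <= 0 or hours_left_after <= 0:
        raise ValueError("time remaining must be positive")
    # Number of times the remaining time was halved (can be fractional)
    halvings = math.log2(hours_left_before / hours_left_after)
    return MARKS_LOST_PER_HALVING * halvings
```

Under this reading, going from 8 hours left to 1 hour left is three halvings, so the expected drop is three times the per-halving figure.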

Things I Learned - 02 Feb 2025

This week, I learned: You can add any content at the end of a PDF file. It’s ignored. It’s an interesting way to send additional information (or just blow up the file size if you don’t like the recipient). JavaScript introduces a Temporal object that will replace the Date object. You can use embeddings as the input to a classical ML classifier. This can improve classification a lot. (Nomic) As AI software becomes more common, demand for AI product managers will grow, also as a proportion of people in an organization. https://www.deeplearning.ai/the-batch/issue-284/ Control of chips and GPU compute will likely decide global AI dominance. (Dario Amodei) Bring LLMs to the table. One mode of collaboration is using LLMs as ACTIVE participants, i.e. they CONTRIBUTE. For example, in a video call. A workshop. A classroom. A presentation. Have the LLM provide input DIRECTLY to a group of people. Environment shapes ambient thoughts. Working in a hospital will give you ideas about how to use LLMs in hospitals, for example. People you are working / ENGAGING with are perhaps the biggest drivers. The real (inflation-adjusted) cost of a cream biscuit packet in India fell about 25 times between 1981 and 2024, i.e. about as fast as inflation rose. Effectively, the absolute price has not changed. How do I know this? In 1981, a cream biscuit packet cost Rs 25. In 2025, it’s available for Rs 21. India Inflation Calculator - a rare inflation calculator with annual inflation rates baked in - shows that Rs 25 in 1981 is equivalent to Rs 540 in 2024. That’s about 25 times more than the Rs 21 it costs today. A WebAssembly compiler that fits in a tweet deconstructs a piece of JS that creates a tiny WebAssembly calculator. It’s a great walk-through of JavaScript compression tricks and how WebAssembly works. (Simon Willison) Brandon Sanderson has a series of YouTube videos where he teaches a course on magic systems. When using AI coding agents, CLI beats APIs. 
Simpler models are able to use the CLI more reliably than APIs. (Simon Willison) I was exploring new business models enabled by LLMs. Here are some thoughts: 1. Autonomous Multi-Sided Marketplaces. AI-powered platforms coordinate complex services with minimal human oversight: think “Uber for Everything”, but the platform sets pricing dynamically, schedules both supply and demand, and resolves disputes algorithmically. 2. Collective Intelligence Ecosystems. Communities pool data, expertise, and AI models to tackle shared problems, like an open-source “GitHub for AI”, but with embedded micropayments or tokenized incentives to reward contributors whenever the models are used commercially. 3. Zero-Employee Companies. Fully automated software entities: legal frameworks might allow an AI to manage services, pay taxes, and sign contracts. These “companies” only hire humans as needed, on-demand, for edge cases AI can’t handle. 4. Context-Aware Knowledge Platforms. Imagine a Wikipedia that not only retrieves static info but also tailors each page in real time to the reader’s personal context, language level, and preferences, generating content on the fly. User feedback loops train the system to improve. 5. Data Cooperatives / Data DAOs. Groups collectively own their data and license it to AI companies on a revenue-share basis. Individuals have a direct financial stake in how their shared data is leveraged, voting on permissible use cases. 6. Personalized Service Layers. Similar to GitHub’s “forking” model, but for entire user experiences. Each user can clone and customize an AI service (whether it’s a personal grocery shopper or a content curator) and can share or monetize improvements with the broader network.
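
The cream biscuit arithmetic from this week's notes can be spelled out. Here is a small sketch using only the figures quoted above (Rs 25 in 1981, Rs 540 as its 2024 equivalent, Rs 21 today). The function names are made up, and the annual rate is an implied average, not a number from the inflation calculator.

```python
def real_price_ratio(old_price_in_todays_money: float, price_today: float) -> float:
    """How many times cheaper the item is today in real terms."""
    return old_price_in_todays_money / price_today


def implied_annual_inflation(old_amount: float, equivalent_amount: float, years: int) -> float:
    """Average compound annual inflation implied by two equivalent amounts."""
    return (equivalent_amount / old_amount) ** (1 / years) - 1


if __name__ == "__main__":
    # Rs 540 (the 1981 price in 2024 money) vs Rs 21 (the price today)
    print(round(real_price_ratio(540, 21), 1))
    # Rs 25 in 1981 growing to Rs 540 by 2024
    print(round(implied_annual_inflation(25, 540, 2024 - 1981) * 100, 1))
```

The ratio comes to a little over 25, matching the "about 25 times" in the note, and the implied average inflation works out to a bit over 7% a year.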

Things I Learned - 26 Jan 2025

This week, I learned: Something I learned from a Sikkil Gurucharan concert. Make the subject of your talk the hero. Not yourself. Be a fan. Share your enthusiasm. Get into the zone while presenting. We reject opposite world views. It’s too much effort. But exposure reduces effort and can let us see things from other points of view. So expose yourself to difficult alternative perspectives. Gemini Something I learnt from Aboorva Singeetham: Kamal Hassan: “A farmer invests in crops. I’m an actor. So I invest in films.” As a technologist, I guess I would invest in technology. “A person who has much more to give is unfazed by overwhelming demands because there is too much in him to overwhelm. He gives you 2 options in place of one.” According to Portkey’s LLM usage analysis: Anyscale and Fireworks AI have the lowest error rates (5xx, 429) and rate limits across providers. Groq and Anthropic are among the highest, OpenAI is among the lowest, Google is in-between. OpenAI has lower error rates and lower latency than Azure. They have a ~35% cache hit rate. A few quick points supporting the mental model of “LLMs are aliens”. LLMs are clearly not machines. They give different answers each time. LLMs are like humans: they exhibit human biases (e.g. guessing 42 or 37 often). But they fail in unusual ways. They can’t count the “r”s in strawberry. They can go into an endless loop. LLMs are a new form of intelligence. Thinking of them as aliens might minimize our confusion. Lessons from Clear Thinking Watch out for four things: Emotion, Ego, Social confirmation, and Inertia/habit. Basically: adrenaline, testosterone, oxytocin, and dopamine. When you feel these, consider doing the opposite. Here’s what makes us prone to emotion. Sleep deprivation. Hunger. Unknown places. Fatigue. Distraction. Stress (e.g. feeling rushed). A good signal that ego is blinding you: You often feel you’re right. Or feel unfairly treated. Changing behaviors is hard. 
Instead, join a group or environment where that’s the default behavior. Hiring a trainer or joining a gym, for example. Why does so much of success literature focus inwards rather than on the environment? Perhaps because we often fool ourselves, and doing less of that gives the biggest bang for the buck. It doesn’t mean the environment is unimportant. Doing work has the characteristics of a drug. E.g. replying to emails gives you control, connections, etc. Work addiction exists because it gives you all the right chemicals. If you put LLMs in a feedback loop, they can optimize for their reward function by emotionally pushing people, generating misinformation, nudging towards a narrow definition of creativity, etc.: https://bsky.app/profile/emollick.bsky.social/post/3lg4darqwfc2d ChatGPT’s Scheduled Tasks are pretty bad at fetching the latest news. Its use of search is poor. (I’m not sure if it actually searches.) I need to figure out other use cases for it. Possible options are: DeepSeek does not enforce rate limits. Yet another reason to switch to DeepSeek. (via Simon Willison). My other reasons are: Claude 3.5 Sonnet-level coding capability at 5% of the cost (soon to be 2.5%). Prompt caching by default. Fill in the middle completion.

Things I Learned - 19 Jan 2025

This week, I learned: Audio diaries are a thing. Monash University asks students to voice their learnings, share them with each other, and give each other feedback. I wonder if ChatGPT diaries could become a thing, too, and LLM journalling starts helping with therapy. Regulation slows things down at colleges and hospitals. For example, patient consent is required for surgeons to learn from their own surgery videos. Unregulated sectors are far more likely to innovate. Doctors can only do so much. Air quality, where you live, etc. can do more for the patient than medicines or the doctor. If doctors keep this in mind, they can be more effective. Extending that thought, ANYONE who leverages assets through holistic thinking becomes FAR more effective. “The curriculum tells teachers what to teach. The exams tell students what to learn.” - Ronald Harden “Stravaig” is a Scottish word. It means mindless wanderings. “The real voyage of discovery consists of not a new voyage but having new eyes” - Proust Possibility Thinking is “the willingness to see possibilities everywhere instead of limitations”. It’s an approach / mindset that can make things that seem hard possible. With LLMs, this is becoming increasingly realistic to me in many areas. What will LLMs enable that does not or cannot exist today? Rather than optimizing what exists? Something to think about. ModernBERT supports embeddings and is better than text-embedding-3-small on MTEB. How to export browser history from Brave to Edge: Go to AppData Local > BraveSoftware > Brave-Browser > User Data > Default. Copy History and History-journal into AppData Local > Google > Chrome > User Data > Default. On Edge, go to edge://settings/profiles/importBrowsingData and Import data from Google Chrome and import the history. I switched back from Brave to Edge, mainly because Edge’s native text-to-speech and speech recognition are far better. I can use it better on my mobile. 
A colleague, Karthick, asked different models to apply the editing and formatting guidelines for a journal to a manuscript. (E.g. Abbreviate chapter & section numbers, except when a sentence begins with it. Use “1” instead of “one”, etc. except when a sentence begins with it. Things like this.) Gemini Exp 1206 seems to be the most reliable, compared with most other models. GitHub Codespaces seems to be coming up more often on my radar, but I’m yet to figure out a use for it. TTS Arena is a benchmark of text-to-speech models. Kokoro-TTS is the current leader. It’s just 82M parameters, runs on Google Colab, and sounds slightly better than OpenAI TTS. chat.qwenlm.ai consolidates all of Qwen’s models in one ChatGPT-like interface.
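The Brave-to-Edge history steps above can be scripted. This is a minimal sketch, assuming a Windows machine with the default profile folders named in the note. The `copy_history` helper is a hypothetical name of mine, and the final import in Edge still has to be done by hand.

```python
import os
import shutil
from pathlib import Path

# The two files the note says to copy
HISTORY_FILES = ("History", "History-journal")


def copy_history(src_profile: Path, dst_profile: Path) -> list[Path]:
    """Copy the browser history files from one profile folder to another."""
    dst_profile.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in HISTORY_FILES:
        src = src_profile / name
        if src.exists():  # skip a file that is not there (e.g. no journal)
            copied.append(Path(shutil.copy2(src, dst_profile / name)))
    return copied


if __name__ == "__main__":
    local = os.environ.get("LOCALAPPDATA")  # only set on Windows
    if local:
        brave = Path(local) / "BraveSoftware" / "Brave-Browser" / "User Data" / "Default"
        chrome = Path(local) / "Google" / "Chrome" / "User Data" / "Default"
        if brave.exists():
            print(copy_history(brave, chrome))
```

This overwrites any existing Chrome history in that profile. I would also close both browsers first, on the assumption that a running browser keeps the History file locked.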

Wow. Every SINGLE person in the audience at this (Healthcare Education, Singapore) conference was on a laptop, tablet, or mobile. Some on multiple devices. I guess this is the new model of learning and listening. The only people who were NOT on a device were on stage. The speakers. I guess it’s up to me to fix that 🙂 LinkedIn

The Sassy AI Devil’s Advocate

I gave ChatGPT a custom instruction: Play Devil’s advocate to the user, beginning with “Playing Devil’s Advocate, …” It helps me see my mistakes in three ways. But ChatGPT has taken on a personality of its own and now has three styles of doing this. How about… – It suggests a useful alternative. Are you sure…? – It thinks you’re wrong and warns you of risks. Yeah, right… – It knows you’re wrong and rubs it in. (Jeeves, the butler, would be proud.) Here are some examples. ...

Features actually used in an LLM playground

At Straive, only a few people have direct access to ChatGPT and similar large language models. We use a portal, LLM Foundry, to access LLMs. That makes it easier to prevent and track data leaks. The main page is a playground to explore models and prompts. Last month, I tracked which features were used the most. A. Attaching files was the top task. (The numbers show how many times each feature was clicked.) People usually use local files as context when working with LLMs. ...

Things I Learned - 12 Jan 2025

This week, I learned: Measuring developer productivity with the DX Core 4 describes a framework for measuring developer productivity. It encapsulates other frameworks like DORA, SPACE, and DevEx. Can LLMs write better code if you keep asking them to “write better code”? A delightful exploration of how Claude 3.5 Sonnet keeps optimizing and adding features to improve code. My takeaway: repeatedly applying a prompt gives us interesting new directions to explore. Wednesday comes from Wōdnesdæg - named after Odin (or Woden). CLIProxyAPI seems a good way to allow any CLI coding agent (Codex, Claude Code, etc.) to work with any provider (e.g. Gemini, OpenRouter, etc.) The documentation needs a few more examples, but it’s usable. mise x github:router-for-me/CLIProxyAPI -- cli-proxy-api starts a local server that proxies requests. Create a config.yaml, update the keys, and configure your coding agent, e.g. Codex, to use it. It’s also a good way to see what prompts are being sent by the various harnesses. smolagents is a new agents library from HuggingFace. It seems simple enough to use. whisper-flow does real-time speech transcription! Switchboard-1 is a labelled audio corpus with ~260 hours of speech. It has ~2,400 calls among 500+ speakers in the US. Cloudflare tunnel is like ngrok but more permanent. It’s a bit more complex, too. But given Cloudflare’s liberal free tier, it’s a good, viable option for long-term local hosting. John Wheeler: “We live on an island surrounded by a sea of ignorance. As our island of knowledge grows, so does the shore of our ignorance.” A great way to understand how ignorance actually grows as you learn more. justhtml is a fast-enough, pure-Python, fully HTML5-compliant library. For a faster, mostly compliant solution, html5-parser with lxml works. There is little reason to use Redis. There are several clones you can use. 
Databases in 2024: A Year in Review Microsoft’s Garnet KeyDB (only Linux) ValKey (only source) DragonFly (only Linux) ReDict (only Linux) Every few years, something comes along trying to replace relational databases and SQL, and gets absorbed. YouTube Key value stores. People soon realize they need more features, e.g. indices. MapReduce systems. Most MapReduce vendors put SQL on top of MapReduce. Then the Hadoop market crashed. (But HDFS, S3, distributed storage systems are a good idea.) Document Databases. JSON. SQL absorbed that. SQLite 3.45+ supports even JSONB. DuckDB, of course, has JSON. Column Databases. Again, these introduced SQL. Graph Databases. SQL:2023 introduced graph queries via SQL/PGQ (Property Graph Queries). DuckPGQ beats Neo4J. Array Databases. SQL:2023 adds SQL/MDA which allows for matrix operations. But specialized databases might make sense in this category. Vector Databases. Every DB is adding support for this. TheAgentCompany is a benchmark of real-world tasks like: Arrange a meeting room. Analyze a spreadsheet. Add a GitLab wiki page. Salvatore Sanfilippo (antirez - Redis) finds DeepSeek v3 comparable with Claude 3.5 Sonnet. YouTube He also passed a paper and his code to compare them. A useful prompt. YouTube

“Wait, That’s My Mic!”: Lessons from an AI Co-Host

I spoke at LogicLooM this week, with ChatGPT as my co-panelist. It was so good, it ended up stealing the show. Preparation Co-hosting with an AI was one of my goals this year. I tried several methods. ChatGPT’s advanced voice mode: Lets you interrupt it. But if you pause, it replies immediately. Muting caused the app to hang. Realtime API: Gave me control of pauses and custom prompts, but used gpt-4o-realtime-preview (not as good as o1). Standard voice with o1 on Desktop: Worked best. It transcribes my speech, sends it to o1, and speaks back. There’s a lag, but it feels like it’s thinking. I prepped the chat with this prompt: ...

Launching an app only with LLMs and failing

Zohaib Rauf suggested using LLMs to spec code and using Cursor to build it. (via Simon Willison). I tried it. It’s promising, but my first attempt failed. I couldn’t generate a SPEC.md using LLMs At first, I started writing what I wanted. This application identifies the drugs, diseases, and symptoms, as well as the emotions from an audio recording of a patient call in a clinical trial. … and then went on to define the EXACT code structure I wanted. So I spent 20 minutes spec-ing our application structure and 20 minutes spec-ing our internal LLM Foundry APIs and 40 minutes detailing every step of how I wanted the app to look and interact. ...

Things I Learned - 05 Jan 2025

This week, I learned: Some management philosophies used to be successful but are no longer as effective. ChatGPT Command-and-control hierarchy Taylorism: deep specialization Seniority-based advancement Annual performance reviews (without continuous feedback) Up-or-Out promotion models Confidential strategic information Narrow job descriptions Relying on formal authority Some management philosophies have been around for millennia. ChatGPT Lead by example Fairness and empathy Clear, consistent communication Delegation and empowerment Strategic planning and foresight Consistent rule enforcement Rewarding merit Leadership by virtue and character Interview with Liang Wenfeng, CEO of DeepSeek: In the face of disruptive technologies, moats created by closed source are temporary. Even OpenAI’s closed source approach can’t prevent others from catching up. So we anchor our value in our team – our colleagues grow through this process, accumulate know-how, and form an organization and culture capable of innovation. That’s our moat. ...

Books in 2024

I read 51 new books in 2024 (about the same as in 2023, 2022, 2021, and 2020.) But slightly differently. I only read Manga this year. Fullmetal Alchemist (Vol 12 - 27). What started off as a childishly illustrated children’s book evolved into a complex, gripping plot. Attack on Titan (Vol 1 - 34). I read it while I watched the TV Series (reading first, then watching). It started explosively and the pace never let up. I had to take breaks just to breathe and calm my nerves. The sheer imagination and subtlety is brilliant. It’s hard to decide which is better—the manga (book) or the anime (TV). The TV series translates the book faithfully in plot and in spirit. It helped that I read each chapter first, allowing me to imagine it, and then watch it, which told me what all I missed in the book. I absolutely would not have understood the manga without watching the anime. ...

My Year in 2024

Here’s the report card for my 2024 resolutions: Compound long-term goals, daily. PASS. I managed to continuously build on 6 areas in 2024: Blogging about 50 posts on my blog and on LinkedIn Weekly notes of things I learned Teaching Tools in Data Science (repo) Reading only Manga Experimenting with LLM applications LLM Evangelization through LLM Foundry, Straive’s LLM portal. Hit 80 heart points, daily. FAIL. I stopped exercising in the second half and gained 7 kgs. Be a better husband. PASS. My wife confirmed that I was “definitely worse in 2023 than 2024.” My most memorable events in 2024 were: ...

Things I Learned - 29 Dec 2024

This week, I learned: A clever idea. Give an LLM a chapter from a textbook. Ask it to generate a unique, playable game to help me learn the concepts for an exam. Page Bailey What would be the cost of storing about 500GB of LLM cache logs and 5 million write requests per month? Cloudflare KV: $250 + $25 / month Ref MongoDB: $125 + $5 / month Ref S3: $0.0115 + $25 / month Ref + ? Cloudflare R2: $0.0075 + $22.5 / month Ref Satya Nadella prepares for meetings by asking Copilot to tell him everything he needs to know about the client from the CRM, emails, meeting transcripts etc. He shares that with colleagues, who annotate it further for him. That’s using AI for reasoning and collaborating with colleagues. Satya Nadella | BG2 w/ Bill Gurley & Brad Gerstner WOW. This is how a software agent will work alongside humans: Fix issue #5478: Add color to the line next to “Ran a XXX Command” based on return value - using @openhands-agent. aisuite by Andrew Ng is a unified interface to LLMs. Sort of like an openai library across multiple providers. Learnings from Best of 2024 in Agents (from #1 on SWE-Bench Full, Prof. Graham Neubig of OpenHands/AllHands) Passing code execution as a tool is more powerful than granular tools. You combine multiple tools and tool calls into one. You move code to the data rather than the other way around. Mostly, you need bash, Python (or Jupyter), file manager, web browser. UI: Go where the user is, instead of bringing them to you. A remote runtime is a critical component. Claude 3.5 Sonnet (20241022) and Claude 3.5 Haiku (20241022) perform best on SWE Bench, followed by Deepseek V3, then O1 2024-12-17. X Browsers support SVG favicons as data URLs. So I used this SVG (generated by Claude via Generate a simple, interesting SVG favicon. Keep the SVG size VERY small but it should be inspiring.) Since HNSW indexing is an overhead, just use NumPy matrix multiplication to calculate cosine similarity. For 1M vectors, it takes ~0.05 seconds. 
A 1M vector dataset handles ~2GB of text at a chunk size of 2K chars. In short, if you’re embedding <2GB of text, just use NumPy. DuckDB’s VSS extension HNSW index + Embeddings (2K chunks of 512 dimensions) takes up roughly 2.5X the size of the original data. Embedding 554 files of ~4,456 KB took 710 seconds. Creating the index took 660 seconds. The resulting DB was 18.1 MB. How to use LLMs in market research. Use LLMs with search for secondary research. Create different personas and run user surveys on them. This paper used 1,052 real-life interview audio transcripts as agent memory to simulate people. Generate your market research report using LLMs. Given about 30 generations, Llama 1b outperforms Llama 8b. Ref OpenAI introduced a developer role in addition to the system role. This is mainly for o1. The API is backward compatible - and also forward compatible. OpenAI Em dashes are a strong sign of ChatGPT use. Curly quotes too. Reddit Cloudflare has multiple SSL modes when proxying requests. Off (no encryption): No encryption between browsers and Cloudflare or between Cloudflare and origins. Everything is cleartext HTTP. Flexible: Browsers to Cloudflare is HTTPS, Cloudflare to origin is HTTP. Useful to set up Cloudflare as an HTTP proxy. Full: Browser to Cloudflare matches browser request. Same protocol is used for Cloudflare to origin, without validating the origin’s certificate. Use for self-signed or otherwise invalid certificates. Full (strict): Similar to Full Mode, but with validation. Strict (SSL-Only Origin Pull): Cloudflare always connects to the origin over HTTPS with certificate validation. Getting this wrong can lead to an HTTP 526: invalid SSL certificate. Medical coding is an area ripe for LLMs. Ojasvi Yadav created a repo that uses hierarchical classification (rather than embeddings) to find the right coding. Gemini models seem to understand medical terms better than others. RapidClaims, funded by TogetherAI, is apparently working on this problem. 
Document to Markdown Converters: PyMuPDF4LLM uses MuPDF. Requires PyTorch. PYTHONUTF8=1 uv run --with pymupdf4llm python -c 'import pymupdf4llm; h = open("pymupdf4llm.md", "w"); h.write(pymupdf4llm.to_markdown("$FILE.pdf"))' markitdown from Microsoft. PDF via PDFMiner, DOCX via Mammoth, XLSX via Pandas, PPTX via python-pptx, ZIP, etc. PYTHONUTF8=1 uvx markitdown $FILE.pdf > markitdown.md Docling by IBM. Unable to install via pip on Windows AND on Linux. MegaParse uses libreoffice, pandoc, tesseract-ocr, etc. Requires OpenAI API key. Awesome Tabular LLMs compiles encodings of tables for LLMs. What’s the best way of encoding tabular data for LLMs? Looks like including the cell address helps. Here is an explanation from ChatGPT aspose-words is a Python library that converts documents between many formats (Word, RTF, PDF, HTML, Markdown, EPUB, etc.) Discourse does not support searching across multiple forums. Instead, search for the term in all forums. Example. Then scroll through the results. Then, in the console, hide the ones you don’t want. Example: Hide posts that are not in the “Tools in Data Science” category: $$(".badge-category__name").filter(d => d.textContent.trim() != "Tools in Data Science").map(d => d.closest(".fps-result")).filter(d => d).forEach(d => d.style.display = "none") How are software engineers future-proofing their careers in the face of LLMs? Leveraging LLMs as Force Multipliers Use LLMs for repetitive tasks, rapid prototyping, exploring multiple approaches, data extraction and brainstorming, providing feedback. Explore prompting techniques, integrate LLMs into their workflows, and develop strategies for validating and refining LLM-generated code. Focusing on higher-level skills that LLMs struggle with Systems Thinking and Architecture: code readability, extensibility, testability, and maintainability Problem Solving and Critical Thinking: define problems clearly, break them down into manageable parts, and reason through complex scenarios. 
LLMs produce plausibly incorrect code. Communication and Collaboration Domain Expertise Exploring Adjacent Roles: product management, technical leadership, or consulting. Involve more interaction with clients and stakeholders. Developing “Evergreen” Skills: debugging, system administration, and security. Or outside of software engineering, such as trades or other hands-on vocations. Scepticism: LLMs may not reach a level of sophistication that would render their expertise obsolete. Complex problems, understanding context, and producing high-quality, maintainable code. Examples of agentic AI Text-to-SQL automated business analyst: A system that generates SQL queries from natural language, handles errors, creates visualizations, and includes a FAQ component. The author calls it “constrained agentic AI.” Data source querying system: A bot that queries multiple SQL and API data sources, selecting tools and reformulating tasks as needed. Cursor (agentic mode): An LLM-powered VS Code fork that chains together various LLM capabilities (code generation, applying changes, linting suggestions, terminal commands, codebase RAG) to reduce user prompts. Vulnerability finding system: A system that uses LLM agents to discover novel vulnerabilities in open-source web applications. The agents leave traces of their actions. Marketing strategy generation system: A system using approximately 60 agents to generate marketing strategies. Restaurant finder: A system that searches for restaurants based on dietary preferences and group size, and downloads social media information. Proofreading and editing of transcripts: LLM agents apply specific customer requirements to transcripts after human editing. Meeting notes and action items generator: A system that generates meeting notes and action items. O’Reilly auto parts customer service agent: An agent demonstrated using RAG. UI enhancement agent: An agent that added features like language locales and dark mode to a UI.
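The "just use NumPy" note earlier in this post can be sketched as below. It is a minimal illustration, not a benchmark: the vectors are random stand-ins rather than real embeddings, the corpus is shrunk to 10,000 rows so it runs quickly, and `top_k` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "embeddings": random vectors, NOT real model output (assumption)
n_docs, dims = 10_000, 512
docs = rng.standard_normal((n_docs, dims)).astype(np.float32)
docs /= np.linalg.norm(docs, axis=1, keepdims=True)  # normalize rows to unit length


def top_k(query: np.ndarray, matrix: np.ndarray, k: int = 5) -> np.ndarray:
    """Return the indices of the k rows of `matrix` most similar to `query`."""
    query = query / np.linalg.norm(query)
    scores = matrix @ query  # equals cosine similarity because rows are unit length
    return np.argsort(-scores)[:k]  # highest similarity first


# A query that is a slightly noisy copy of document 42
query = docs[42] + 0.01 * rng.standard_normal(dims).astype(np.float32)
print(top_k(query, docs))
```

Normalizing once up front is the design choice that turns every search into a single matrix-vector product. At the scale in the note, 1M vectors of 512 float32 dimensions come to roughly 2 GB of RAM, so the matrix still fits in memory on many machines.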

When and how to copy assignments

The second project in the course asked students to submit code. Copying and collaborating were allowed, but originality gets bonus marks. Bonus Marks 8 marks: Code diversity. You're welcome to copy code and learn from each other. But we encourage diversity too. We will use code embedding similarity (via text-embedding-3-small, dropping comments and docstrings) and give bonus marks for most unique responses. (That is, if your response is similar to a lot of others, you lose these marks.) In setting this rule, I applied two principles. ...

My learnings as week notes

One of my goals for 2024 is to “Compound long-term goals, daily.” Learning is one of those. Some people publish their learnings as weekly notes, like Simon Willison, Thejesh GN, Anil Radhakrishna, and Julia Evans. I follow their notes. I started doing the same, quietly, to see if I could sustain it. It’s been a year and it has sustained. I’m finally publishing them. My week notes are at til.s-anand.net. Here’s the source code. ...