Things I Learned - 18 Feb 2024

This week, I learned:

- Fine-tuning makes economic sense only if the input tokens SAVED are at least twice the output tokens on each call.
- Docker container memory usage on WSL2, measured with docker stats:
  - frolvlad/alpine-glibc:alpine-3.17: 540KB
  - ubuntu: 1MB (python3: +5MB)
  - nikolaik/python-nodejs:python3.10-nodejs18-bullseye: 1.4MB (python3: +5MB)
  - python:3-alpine: 612KB (python3: +7.5MB)
  - python:3: 500KB (python3: +11.2MB)
  - continuumio/miniconda3: 7.6MB (+6.5MB)
- Discussion with Vinu Yamunan
  - Databuck by FirstEigen. Autolysis plus monitoring
  - The quality council brings the data stewards (the maintainers of each dataset) together with the users every week to understand what quality problems the users are facing. Data owners join at a lower frequency to get an understanding
  - #TODO Automate rules for data quality in our projects and intranet
  - Convert a config rule into business language. Explain SQL. These are good use cases for LLMs
  - Graph DBs are powerful for flexible data structures, but query generation needs AI or expertise. Check Cypher, the Neo4j query language
  - Explore storing the SAME data in relational DBs AND in graph DBs / document DBs for different use cases
  - Dallas rocketry challenge: build a rocket that can take an egg to exactly 800 feet and land without breaking it
- Discussion with Karthik A
  - #TODO Ask IIT students to do internship tasks. Use Advent of Code as a qualifying criterion
  - Tata Motors unionized DB admins for longevity. No one can take their jobs. Hires people who LIKE their jobs
  - Rust gives me typing. It's very efficient. Pola.rs is interesting but Pandas is good enough.
  - Explore alerts from CCTV feeds. Karthik sends email alerts with pictures for:
    - "Is the machine on or off?" for productivity
    - "Are people not wearing helmets?" for safety at Cummins
  - #TODO Integrate with WhatsApp. Use LLMs with function calling for responses
  - Use expiring links (to pictures or content). It increases engagement
  - Check Deno licensing. Is there a commercial clause? #ANS No. It's under the MIT license
  - Centre of excellence for zero emission tech at IIT. Karthik is part of it
  - Explore Auth0. The first 7,000 users are free
  - tomllib, a TOML parser, is part of the Python 3.11 standard library!
  - If Copilot writes code we don't understand, we are screwed. Hence expertise matters
- Discussion with Vikas Kedia
  - #TODO Plan an AMA
  - The mind becomes lazy with financial success. Vikas is treating his podcast as a startup
  - Hire a professional videographer for your content
  - Financial RoI in financial markets is the highest. Programming is high too but FS is even better
  - "Performative power": when you're forced to perform, you get better ideas
- Observable 2.0 is an open source static site generator for data
- Python dataclasses
- Sora is OpenAI's video generation model, and is stunning!
- If Appa comes to Singapore even for a week, he will feel better and can boast to his friends.
- At over 90, it may be better to move Appa to where I am, since many of his friends would be no more, and shops, doctors, etc. can be managed. Getting an independent house nearby is not hard.
- There is an SEZ in Gujarat where Indians can invest like in Mauritius without forex restrictions
- Shubha: Media sites are moving away from Vickrey (second-price) auctions to first-price auctions for ads. That's because they send the auction's winning price forward to a search engine, and that second-price value can lose there even though the winner was willing to pay more. Second-price auctions don't work unless ALL bidders are in the SAME auction. Ad networks are a hierarchy of auctions!
- Gemini 1.5 launched.
- Fly.io offers GPU hosting, and auto-stops machines when they have nothing to do.
- Embeddings fed into a random forest are very effective at classification, much better than dot product.
- To deploy apps with OAuth + templating support in a small Docker container, use Caddy
- Deno has native TypeScript, browser APIs, and compiles to multiple OSs
- Ruff is a MUCH faster flake8
- Two-pass generation is a clever technique to get multiple SEQUENTIAL answers in a single API request. For example, the schema {'code', 'optimized_code'} will generate code and then optimize it.
- Unions in function calling allow flexible multi-step prompts in a single API request.
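
Here is a rough sketch of the last two ideas, written as plain JSON Schema dictionaries in Python. Everything named here is my own assumption for illustration: the function name write_code, the descriptions, the sql / question union, and the sample arguments string (it stands in for model output and is not a real response). No API is called, and whether a given provider accepts anyOf in function-calling schemas needs checking.

```python
import json

# Two-pass schema: "code" is listed before "optimized_code", so a model that
# fills fields in order would typically write the draft first, then improve on it.
TWO_PASS_SCHEMA = {
    "name": "write_code",  # hypothetical function name
    "description": "Write code, then an optimized version of it.",
    "parameters": {
        "type": "object",
        "properties": {
            "code": {"type": "string", "description": "First draft."},
            "optimized_code": {"type": "string", "description": "Optimized rewrite of the draft."},
        },
        "required": ["code", "optimized_code"],
    },
}

# Union schema (hypothetical): the model either returns SQL or asks a question back.
UNION_SCHEMA = {
    "anyOf": [
        {"type": "object", "properties": {"sql": {"type": "string"}}, "required": ["sql"]},
        {"type": "object", "properties": {"question": {"type": "string"}}, "required": ["question"]},
    ]
}


def parse_two_pass(arguments_json: str) -> tuple[str, str]:
    """Split function-call arguments (a JSON string) into (draft, optimized)."""
    args = json.loads(arguments_json)
    return args["code"], args["optimized_code"]


def handle_step(args: dict) -> str:
    """Branch on which arm of the union came back."""
    if "sql" in args:
        return "run: " + args["sql"]
    return "ask: " + args["question"]


# Stand-in for the arguments string a model might return; not real output.
sample = json.dumps({"code": "sum([x for x in xs])", "optimized_code": "sum(xs)"})
draft, optimized = parse_two_pass(sample)
```

The point of the ordering is that the second field is generated with the first already in context, which is what makes the answers sequential rather than independent.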