Things I Learned - 10 Nov 2024
This week, I learned: OpenFreeMap is a free embeddable OpenStreetMap tile server. You can use MapLibre GL (more features) or Leaflet (simpler) to render it. It offers styling and self-hosting. Zapier Actions are an easy way to set up custom actions like GMail / Google Calendar APIs for GPTs, since GPTs’ callback URLs keep changing. But they fail often, and don’t work on mobile. At least for me. LLM Vision Use Cases in manufacturing and earth sciences (via Shivku) Automated geoscience image descriptions Ref Interpret Wind Turbine photos and charts, construction monitoring, equipment maintenance & charts Ref Forecast weather based on cloud photos! Ref Analyze thermal image of solar panels, electroluminescence images for warranty claims, ROI estimates from Google Sunroof rooftop images Ref Corrosion detection in electricity towers, turbines, storage tanks, penstock. Interpret non-destructive test images Ref Google counts auto-completion when saying “25% of all the code is written by AI at Google”. “It’s a helpful productivity tool but it’s not doing any engineering at all. It’s probably about as good, maybe slightly worse, than Copilot.” YCombinator Workflow for AI video creation: Use Meshcapade (meshcapade.com) to generate body movement of a 3D-rendered character. Pass that video to Runway’s video-to-video model to generate any visual. Add music from Suno Ref Someone sorted the X and Y columns independently for regression. Ref Android keyboard learning only sends model changes back to server and not local keywords. Model changes are aggregated! Ref Here is a prompt for audio transcription using Gemini. Ref Transcription: Accurately transcribe the audio clip in the original language. Include all spoken words, fillers, slang, colloquialisms, and any code-switching instances. Pay attention to dialects and regional variations common among immigrant communities. Do your best to capture the speech accurately, and flag any unintelligible portions with [inaudible]. Translation: Translate the transcription into English. Preserve the original meaning, context, idiomatic expressions, and cultural references. Ensure that nuances and subtleties are accurately conveyed. Capture Vocal Nuances: Note vocal cues such as tone, pitch, pacing, emphasis, and emotional expressions that may influence the message. These cues are critical for understanding intent and potential impact. Here are some approaches to large-scale classification of medical codes. ChatGPT Fine-Tuning LLMs on Medical Data: Enhance LLMs by training them on medical datasets, such as clinical notes and discharge summaries, to improve their understanding of medical terminology and context. Multi-Agent Frameworks: Implement a multi-agent system that simulates real-world coding processes with distinct roles (e.g., patient, physician, coder, reviewer, adjuster). Each agent utilizes an LLM to perform specific functions, enhancing interpretability and reliability. ArXiv Retrieve-Rank Systems: Develop a two-stage system where the LLM first retrieves potential ICD-10 codes and then ranks them based on relevance, improving precision in code assignment. ArXiv Embedding-Based Approaches: Use LLMs to generate embeddings for ICD-10 codes and medical texts, facilitating the matching of texts to appropriate codes through similarity measures. GitHub Hierarchical Classification: Leverage the hierarchical structure of ICD-10 codes by first classifying texts into broader categories before assigning specific codes, reducing complexity and improving accuracy. ArXiv Two-Stage Verification Models: Combine LLMs with verification models, such as Long Short-Term Memory (LSTM) networks, to validate and refine the codes suggested by the LLM, balancing recall and precision. ArXiv Also, a mixture of models approach might work. Feed any existing NLP model / rules as a second opinion. GraphRAG is better if data is naturally graph-structured. Else, it’s slow and fills up the context window with even vaguely related stuff. Vigneshbabu, AMAT. ChatGPT for Windows desktop supports real-time voice and a global shortcut (Alt Space). uithub converts GitHub repos to Markdown. Just replace “g” in “github.com/…” with “u”. Example WebContainers are a thing and Bolt.new uses them! Docling by IBM converts PDF, DOCX, etc. to Markdown. Like PyMuPDF4LLM but better. Check out Loom and Cleanshot are the recommended tools for screen recording and screenshotting. But Loom is paid and Cleanshot is Mac only. The Rubik’s cube has a Hamiltonian cycle through every one of its 43 quintillion states. Ref OmniParser is great at parsing screenshots and identifying bounding boxes. Recraft.ai is currently SOTA in text to image. It’s fairly impressive and could be a good alternative to Figma. Zed.dev is an AI code editor by the creators of Atom. It’s written in Rust and is blazing fast. It has native AI integration. Artificial Analysis has a bunch of new leaderboards and arenas. Open AI TTS leads the TTS Leaderboard. ElevenLabs is a bit behind. Recraft V3 > Flux 1.1 leads Text to Image Leaderboard Hertz-Dev is an open source realtime voice chat model. But it doesn’t fit in Google Colab T4’s RAM Chain of Thought reduces performance where thinking makes humans worse. Ref. Specifically: Artificial grammar learning Facial recognition Classifying data that has exceptions Creating a LLM-as-a-Judge That Drives Business Results by Hamel Husain. Get THE domain expert (or approver) as the tester. Create a dataset that is DIVERSE. Covers EACH combination of: Features Scenarios: e.g. multiple matches, no match, ambiguous request, invalid/incomplete input, unsupported feature, system error Persona: e.g. new user, expert user, non-native speaker, busy professional, technophobe, elderly user Generate data using existing data + synthetic data for each SPECIFIC combination of the above Evaluate based only on PASS/FAIL with a CRITIQUE detailed enough for a new employee. Include: Nuances: Something a failed response did well or a passed response didn’t quite do well Improvements: Suggest how model can improve Build an SPA to make it easy for the domain expert to review LLMs can be made to unlearn (copyright material) better by identifying components related to the knowledge to unlearn and applying a larger learning rate to these while leaving other parts unchanged. As opposed to low learning rates for all components. Ref