Things I Learned - 10 Nov 2024

This week, I learned:

OpenFreeMap is a free embeddable OpenStreetMap tile server. You can use MapLibre GL (more features) or Leaflet (simpler) to render it. It offers styling and self-hosting.
Zapier Actions are an easy way to set up custom actions like Gmail / Google Calendar APIs for GPTs, since GPTs’ callback URLs keep changing. But they fail often and don’t work on mobile, at least for me.
LLM Vision Use Cases in manufacturing and earth sciences (via Shivku)
- Automated geoscience image descriptions Ref
- Interpret wind turbine photos and charts for construction monitoring and equipment maintenance Ref
- Forecast weather based on cloud photos! Ref
- Analyze thermal image of solar panels, electroluminescence images for warranty claims, ROI estimates from Google Sunroof rooftop images Ref
- Corrosion detection in electricity towers, turbines, storage tanks, penstock. Interpret non-destructive test images Ref
Google counts auto-completion when saying “25% of all the code is written by AI at Google”. “It’s a helpful productivity tool but it’s not doing any engineering at all. It’s probably about as good, maybe slightly worse, than Copilot.” YCombinator
Workflow for AI video creation: Use Meshcapade (meshcapade.com) to generate body movement of a 3D-rendered character. Pass that video to Runway’s video-to-video model to generate any visual. Add music from Suno Ref
Someone sorted the X and Y columns independently for regression. Ref
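Why this is so wrong: sorting each column on its own destroys the row pairing and manufactures a near-perfect relationship out of nothing. A minimal sketch, assuming two columns of independent random noise (the data, seed and `pearson` helper are made up for illustration):

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable
x = [rng.gauss(0, 1) for _ in range(1000)]
y = [rng.gauss(0, 1) for _ in range(1000)]  # independent of x by construction

r_paired = pearson(x, y)                  # rows kept together
r_sorted = pearson(sorted(x), sorted(y))  # each column sorted on its own

print(f"paired rows:          r = {r_paired:.3f}")
print(f"independently sorted: r = {r_sorted:.3f}")
```

The paired correlation should come out near zero and the sorted one close to 1, even though X tells you nothing about Y.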
Android keyboard learning only sends model changes back to server and not local keywords. Model changes are aggregated! Ref
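This pattern is usually called federated averaging. Here is a rough sketch of the idea, not the keyboard's actual implementation: each device computes a weight change from its private data, and the server only ever sees the average of those changes. The function names, gradients and learning rate are hypothetical.

```python
def device_delta(local_gradient, lr=0.1):
    """Weight change one device computes locally; the raw typed text never leaves it."""
    return [-lr * g for g in local_gradient]

def federated_average(global_weights, deltas):
    """Server side: apply the mean of the per-device deltas to the global weights."""
    n = len(deltas)
    mean_delta = [sum(d[i] for d in deltas) / n for i in range(len(global_weights))]
    return [w + m for w, m in zip(global_weights, mean_delta)]

global_weights = [0.0, 0.0]
# Stand-ins for gradients computed on three devices (made-up numbers).
device_gradients = [[1.0, -2.0], [3.0, 0.0], [2.0, 2.0]]

deltas = [device_delta(g) for g in device_gradients]
new_weights = federated_average(global_weights, deltas)
print(new_weights)
```

A production system would likely add more on top (for example, hiding individual deltas from the server), which this sketch leaves out.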
Here is a prompt for audio transcription using Gemini. Ref
- Transcription: Accurately transcribe the audio clip in the original language. Include all spoken words, fillers, slang, colloquialisms, and any code-switching instances. Pay attention to dialects and regional variations common among immigrant communities. Do your best to capture the speech accurately, and flag any unintelligible portions with [inaudible].
- Translation: Translate the transcription into English. Preserve the original meaning, context, idiomatic expressions, and cultural references. Ensure that nuances and subtleties are accurately conveyed.
- Capture Vocal Nuances: Note vocal cues such as tone, pitch, pacing, emphasis, and emotional expressions that may influence the message. These cues are critical for understanding intent and potential impact.
Here are some approaches to large-scale classification of medical codes. ChatGPT
- Fine-Tuning LLMs on Medical Data: Enhance LLMs by training them on medical datasets, such as clinical notes and discharge summaries, to improve their understanding of medical terminology and context.
- Multi-Agent Frameworks: Implement a multi-agent system that simulates real-world coding processes with distinct roles (e.g., patient, physician, coder, reviewer, adjuster). Each agent utilizes an LLM to perform specific functions, enhancing interpretability and reliability. ArXiv
- Retrieve-Rank Systems: Develop a two-stage system where the LLM first retrieves potential ICD-10 codes and then ranks them based on relevance, improving precision in code assignment. ArXiv
- Embedding-Based Approaches: Use LLMs to generate embeddings for ICD-10 codes and medical texts, facilitating the matching of texts to appropriate codes through similarity measures. GitHub
- Hierarchical Classification: Leverage the hierarchical structure of ICD-10 codes by first classifying texts into broader categories before assigning specific codes, reducing complexity and improving accuracy. ArXiv
- Two-Stage Verification Models: Combine LLMs with verification models, such as Long Short-Term Memory (LSTM) networks, to validate and refine the codes suggested by the LLM, balancing recall and precision. ArXiv
- Also, a mixture of models approach might work. Feed any existing NLP model / rules as a second opinion.
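To make the embedding-based and retrieve-rank ideas concrete, here is a toy sketch. It assumes a bag-of-words count as the "embedding" (a real system would call an embedding model) and a three-entry code table chosen only for illustration. `embed`, `cosine` and `rank_codes` are hypothetical helper names.

```python
import math
import re
from collections import Counter

# Tiny illustrative code table; a real one has tens of thousands of entries.
CODES = {
    "E11.9": "type 2 diabetes mellitus without complications",
    "I10": "essential primary hypertension",
    "J45.909": "unspecified asthma uncomplicated",
}

def embed(text):
    """Toy embedding: word counts. Swap in a real embedding model here."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank_codes(note, top_k=2):
    """Score every code description against the note and return the top_k codes."""
    note_vec = embed(note)
    scored = [(cosine(note_vec, embed(desc)), code) for code, desc in CODES.items()]
    scored.sort(reverse=True)
    return [code for _, code in scored[:top_k]]

print(rank_codes("Patient has type 2 diabetes, no complications noted"))
```

The shortlist from `rank_codes` is what you would hand to an LLM or a verification model for the final pick.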
GraphRAG is better if data is naturally graph-structured. Else, it’s slow and fills up the context window with even vaguely related stuff. Vigneshbabu, AMAT.
ChatGPT for Windows desktop supports real-time voice and a global shortcut (Alt Space).
uithub converts GitHub repos to Markdown. Just replace “g” in “github.com/…” with “u”. Example
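If you want to do the swap in a script, a small helper like this works. It is only a sketch: `to_uithub` is a made-up name, and `owner/repo` is a placeholder path.

```python
from urllib.parse import urlsplit, urlunsplit

def to_uithub(url):
    """Swap the github.com host for uithub.com, keeping the path and query as-is."""
    parts = urlsplit(url)
    if parts.netloc.lower() != "github.com":
        raise ValueError(f"not a github.com URL: {url}")
    return urlunsplit(parts._replace(netloc="uithub.com"))

print(to_uithub("https://github.com/owner/repo"))
```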
WebContainers are a thing and Bolt.new uses them!
Docling by IBM converts PDF, DOCX, etc. to Markdown. Like PyMuPDF4LLM but better.
Loom and Cleanshot are the recommended tools for screen recording and screenshotting. But Loom is paid and Cleanshot is Mac only.
The Rubik’s cube has a Hamiltonian cycle through every one of its 43 quintillion states. Ref
OmniParser is great at parsing screenshots and identifying bounding boxes.
Recraft.ai is currently SOTA in text to image. It’s fairly impressive and could be a good alternative to Figma.
Zed.dev is an AI code editor by the creators of Atom. It’s written in Rust and is blazing fast. It has native AI integration.
Artificial Analysis has a bunch of new leaderboards and arenas.
- OpenAI TTS leads the TTS Leaderboard. ElevenLabs is a bit behind.
- Recraft V3 leads the Text to Image Leaderboard, ahead of Flux 1.1.
Hertz-Dev is an open source realtime voice chat model. But it doesn’t fit in Google Colab T4’s RAM.
Chain of Thought reduces performance on tasks where deliberate thinking also makes humans worse. Ref. Specifically:
- Artificial grammar learning
- Facial recognition
- Classifying data that has exceptions
Creating a LLM-as-a-Judge That Drives Business Results by Hamel Husain.
- Get THE domain expert (or approver) as the tester.
- Create a DIVERSE dataset that covers EACH combination of:
  - Features
  - Scenarios: e.g. multiple matches, no match, ambiguous request, invalid/incomplete input, unsupported feature, system error
  - Persona: e.g. new user, expert user, non-native speaker, busy professional, technophobe, elderly user
- Generate data using existing data + synthetic data for each SPECIFIC combination of the above
- Evaluate based only on PASS/FAIL with a CRITIQUE detailed enough for a new employee. Include:
  - Nuances: Something a failed response did well or a passed response didn’t quite do well
  - Improvements: Suggest how model can improve
- Build an SPA to make it easy for the domain expert to review
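The "EACH combination" step is just a cross product. Here is a minimal sketch that builds the grid and leaves a PASS/FAIL plus critique slot for the domain expert to fill in. The two feature names are hypothetical, the scenarios and personas are a subset of the ones listed above, and `Review` is a made-up record type.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional

FEATURES = ["order tracking", "refund request"]  # hypothetical features
SCENARIOS = ["multiple matches", "no match", "ambiguous request"]
PERSONAS = ["new user", "expert user", "non-native speaker"]

@dataclass
class Review:
    feature: str
    scenario: str
    persona: str
    passed: Optional[bool] = None  # PASS/FAIL only, no 1-5 scores
    critique: str = ""             # detailed enough for a new employee

# One row per (feature, scenario, persona) combination.
grid = [Review(f, s, p) for f, s, p in product(FEATURES, SCENARIOS, PERSONAS)]

print(len(grid), "combinations to generate data for")
print(grid[0])
```

Each row is then a prompt for generating (or finding) an example, and later a row for the expert to mark.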
LLMs can be made to unlearn (copyright material) better by identifying the components related to the knowledge to unlearn and applying a larger learning rate to these while leaving other parts unchanged, as opposed to using low learning rates for all components. Ref
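A toy sketch of the selective update, under loud assumptions: the forgetting step is gradient ascent on a "forget" loss (my assumption, not necessarily the paper's method), the targeted components are already known, and the component names, weights and gradients are made up.

```python
def unlearning_step(params, grads, target_components, high_lr=0.1):
    """Update only the targeted components with a large learning rate.

    Assumes gradient ascent on the forget loss; all other components are copied unchanged.
    """
    updated = {}
    for name, weights in params.items():
        if name in target_components:
            updated[name] = [w + high_lr * g for w, g in zip(weights, grads[name])]
        else:
            updated[name] = list(weights)  # left exactly as it was
    return updated

# Hypothetical components and forget-loss gradients.
params = {"layer3.mlp": [1.0, 2.0], "layer7.attn": [0.5]}
grads = {"layer3.mlp": [1.0, -1.0], "layer7.attn": [4.0]}

new_params = unlearning_step(params, grads, target_components={"layer3.mlp"})
print(new_params)
```

How to find the right components is the hard part, and this sketch does not cover it.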
