This week, I learned:
- OpenFreeMap is a free embeddable OpenStreetMap tile server. You can use MapLibre GL (more features) or Leaflet (simpler) to render it. It offers styling and self-hosting.
- Zapier Actions are an easy way to set up custom actions like Gmail / Google Calendar APIs for GPTs, since GPTs’ callback URLs keep changing. But they fail often and don’t work on mobile, at least for me.
- LLM Vision Use Cases in manufacturing and earth sciences (via Shivku)
- Automated geoscience image descriptions Ref
- Interpret Wind Turbine photos and charts, construction monitoring, equipment maintenance & charts Ref
- Forecast weather based on cloud photos! Ref
- Analyze thermal image of solar panels, electroluminescence images for warranty claims, ROI estimates from Google Sunroof rooftop images Ref
- Corrosion detection in electricity towers, turbines, storage tanks, penstock. Interpret non-destructive test images Ref
- Google counts auto-completion when saying “25% of all the code is written by AI at Google”. “It’s a helpful productivity tool but it’s not doing any engineering at all. It’s probably about as good, maybe slightly worse, than Copilot.” YCombinator
- Workflow for AI video creation: Use Meshcapade (meshcapade.com) to generate body movement of a 3D-rendered character. Pass that video to Runway’s video-to-video model to generate any visual. Add music from Suno Ref
- Someone sorted the X and Y columns independently for regression. Ref
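Why this is a problem: sorting each column on its own destroys the row pairing and manufactures a near-perfect correlation out of nothing. A minimal sketch, assuming two independent random columns (the data and the `pearson` helper are made up for illustration):

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable
x = [rng.gauss(0, 1) for _ in range(1000)]
y = [rng.gauss(0, 1) for _ in range(1000)]  # independent of x by construction

r_paired = pearson(x, y)                  # rows kept intact
r_sorted = pearson(sorted(x), sorted(y))  # each column sorted on its own

print(f"paired: {r_paired:.3f}, sorted independently: {r_sorted:.3f}")
```

The paired correlation stays near zero, while the independently sorted one lands close to 1, so any regression fitted on the sorted columns looks excellent and means nothing.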
- Android keyboard learning only sends model changes back to server and not local keywords. Model changes are aggregated! Ref
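This pattern is usually called federated learning. A rough sketch of the idea, not the actual keyboard implementation: each device computes a weight delta locally, and the server only ever sees the average of those deltas. The function names, the learning rate and the toy gradients are all assumptions.

```python
def local_delta(local_gradient, lr=0.1):
    """Computed on-device: a weight change, never the typed text itself."""
    return [-lr * g for g in local_gradient]

def aggregate(global_weights, deltas):
    """Server side: average the deltas from all devices and apply them."""
    n = len(deltas)
    mean_delta = [sum(d[i] for d in deltas) / n for i in range(len(global_weights))]
    return [w + d for w, d in zip(global_weights, mean_delta)]

global_weights = [0.0, 0.0]
# Toy gradients standing in for what two phones computed on their own data.
device_gradients = [[1.0, -2.0], [3.0, 0.0]]

deltas = [local_delta(g) for g in device_gradients]
new_weights = aggregate(global_weights, deltas)
print(new_weights)
```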
- Here is a prompt for audio transcription using Gemini. Ref
- Transcription: Accurately transcribe the audio clip in the original language. Include all spoken words, fillers, slang, colloquialisms, and any code-switching instances. Pay attention to dialects and regional variations common among immigrant communities. Do your best to capture the speech accurately, and flag any unintelligible portions with [inaudible].
- Translation: Translate the transcription into English. Preserve the original meaning, context, idiomatic expressions, and cultural references. Ensure that nuances and subtleties are accurately conveyed.
- Capture Vocal Nuances: Note vocal cues such as tone, pitch, pacing, emphasis, and emotional expressions that may influence the message. These cues are critical for understanding intent and potential impact.
- Here are some approaches to large-scale classification of medical codes. ChatGPT
- Fine-Tuning LLMs on Medical Data: Enhance LLMs by training them on medical datasets, such as clinical notes and discharge summaries, to improve their understanding of medical terminology and context.
- Multi-Agent Frameworks: Implement a multi-agent system that simulates real-world coding processes with distinct roles (e.g., patient, physician, coder, reviewer, adjuster). Each agent utilizes an LLM to perform specific functions, enhancing interpretability and reliability. ArXiv
- Retrieve-Rank Systems: Develop a two-stage system where the LLM first retrieves potential ICD-10 codes and then ranks them based on relevance, improving precision in code assignment. ArXiv
- Embedding-Based Approaches: Use LLMs to generate embeddings for ICD-10 codes and medical texts, facilitating the matching of texts to appropriate codes through similarity measures. GitHub
- Hierarchical Classification: Leverage the hierarchical structure of ICD-10 codes by first classifying texts into broader categories before assigning specific codes, reducing complexity and improving accuracy. ArXiv
- Two-Stage Verification Models: Combine LLMs with verification models, such as Long Short-Term Memory (LSTM) networks, to validate and refine the codes suggested by the LLM, balancing recall and precision. ArXiv
- Also, a mixture of models approach might work. Feed any existing NLP model / rules as a second opinion.
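To make the retrieve-rank idea concrete, here is a toy sketch. It swaps the LLM for plain token overlap and bag-of-words cosine similarity, so it only shows the two-stage shape, not a realistic coder. The four-code catalogue and the sample note are illustrative.

```python
import math
import re
from collections import Counter

# Illustrative catalogue; a real system would load the full ICD-10 code set.
ICD10 = {
    "E10.9": "Type 1 diabetes mellitus without complications",
    "E11.9": "Type 2 diabetes mellitus without complications",
    "I10": "Essential (primary) hypertension",
    "J45.909": "Unspecified asthma, uncomplicated",
}

def tokens(text):
    """Lowercased word counts, used as a crude stand-in for an embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(note, catalogue):
    """Stage 1: high-recall filter. Keep any code sharing a token with the note."""
    note_tokens = set(tokens(note))
    return [code for code, desc in catalogue.items() if note_tokens & set(tokens(desc))]

def rank(note, candidates, catalogue):
    """Stage 2: order candidates by similarity to the note (stand-in for an LLM ranker)."""
    note_vec = tokens(note)
    return sorted(candidates, key=lambda c: cosine(note_vec, tokens(catalogue[c])), reverse=True)

note = "Patient has type 2 diabetes mellitus, no complications noted."
candidates = retrieve(note, ICD10)
ranked = rank(note, candidates, ICD10)
print(ranked)
```

In a real pipeline, stage 1 would be an embedding or LLM retrieval step over the full code set, and stage 2 an LLM (or a second-opinion model) scoring only the short list.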
- GraphRAG is better if data is naturally graph-structured. Else, it’s slow and fills up the context window with even vaguely related stuff. Vigneshbabu, AMAT.
- ChatGPT for Windows desktop supports real-time voice and a global shortcut (Alt+Space).
- uithub converts GitHub repos to Markdown. Just replace “g” in “github.com/…” with “u”. Example
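If you want to script the swap, a small helper like this should do. The function name `to_uithub` and the `owner/repo` path are placeholders.

```python
from urllib.parse import urlsplit, urlunsplit

def to_uithub(url):
    """Swap the github.com host for uithub.com, leaving the path untouched."""
    parts = urlsplit(url)
    if parts.netloc.lower() != "github.com":
        raise ValueError(f"not a github.com URL: {url}")
    return urlunsplit(parts._replace(netloc="uithub.com"))

print(to_uithub("https://github.com/owner/repo"))
```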
- WebContainers are a thing and Bolt.new uses them!
- Docling by IBM converts PDF, DOCX, etc. to Markdown. Like PyMuPDF4LLM but better.
- Loom and Cleanshot are the recommended tools for screen recording and screenshotting. But Loom is paid and Cleanshot is Mac-only.
- The Rubik’s cube has a Hamiltonian cycle through every one of its 43 quintillion states. Ref
- OmniParser is great at parsing screenshots and identifying bounding boxes.
- Recraft.ai is currently SOTA in text to image. It’s fairly impressive and could be a good alternative to Figma.
- Zed.dev is an AI code editor by the creators of Atom. It’s written in Rust and is blazing fast. It has native AI integration.
- Artificial Analysis has a bunch of new leaderboards and arenas.
- OpenAI TTS leads the TTS Leaderboard. ElevenLabs is a bit behind.
- Recraft V3 leads the Text to Image Leaderboard, ahead of Flux 1.1.
- Hertz-Dev is an open source realtime voice chat model. But it doesn’t fit in Google Colab T4’s RAM
- Chain of Thought reduces performance where thinking makes humans worse. Ref. Specifically:
- Artificial grammar learning
- Facial recognition
- Classifying data that has exceptions
- Creating a LLM-as-a-Judge That Drives Business Results by Hamel Husain.
- Get THE domain expert (or approver) as the tester.
- Create a dataset that is DIVERSE.
- Covers EACH combination of:
- Features
- Scenarios: e.g. multiple matches, no match, ambiguous request, invalid/incomplete input, unsupported feature, system error
- Persona: e.g. new user, expert user, non-native speaker, busy professional, technophobe, elderly user
- Generate data using existing data + synthetic data for each SPECIFIC combination of the above
- Evaluate based only on PASS/FAIL with a CRITIQUE detailed enough for a new employee. Include:
- Nuances: Something a failed response did well or a passed response didn’t quite do well
- Improvements: Suggest how the model can improve
- Build an SPA to make it easy for the domain expert to review
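The "EACH combination" step above is easy to sketch as a grid. This is only one way to lay it out: the feature names are hypothetical, the scenario and persona lists are shortened, and the `Judgement` record is an assumed shape for what the domain expert fills in.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional

features = ["search", "booking"]  # hypothetical product features
scenarios = ["multiple matches", "no match", "ambiguous request"]
personas = ["new user", "expert user", "non-native speaker"]

@dataclass
class Judgement:
    feature: str
    scenario: str
    persona: str
    passed: Optional[bool] = None  # PASS/FAIL only; None until reviewed
    critique: str = ""             # written by the domain expert

# One test case per combination, so no cell of the grid is left uncovered.
cases = [Judgement(f, s, p) for f, s, p in product(features, scenarios, personas)]
print(len(cases), "cases to generate data for and review")
```

Each case would then get existing or synthetic inputs for that specific combination, and the review SPA would write back `passed` and `critique`.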
- LLMs can be made to unlearn (copyrighted material) better by identifying the components related to the knowledge to unlearn and applying a larger learning rate to those while leaving other parts unchanged, as opposed to using a low learning rate for all components. Ref
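A toy illustration of the selective update, not the paper's method: it assumes the relevant components have already been identified, that `grads` holds the gradient of whatever unlearning objective is being minimized, and that parameters are plain lists keyed by a made-up component name.

```python
def selective_step(params, grads, target_components, target_lr=0.1):
    """Take a large step on the targeted components; leave the rest unchanged."""
    updated = {}
    for name, values in params.items():
        if name in target_components:
            updated[name] = [v - target_lr * g for v, g in zip(values, grads[name])]
        else:
            updated[name] = list(values)  # untouched, regardless of its gradient
    return updated

params = {"layer1": [1.0, 2.0], "layer2": [3.0, 4.0]}
grads = {"layer1": [1.0, 1.0], "layer2": [10.0, 10.0]}

new_params = selective_step(params, grads, target_components={"layer2"})
print(new_params)
```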