This week, I learned:
- Bard supports extensions that include @Gmail – i.e. converse with your email.
- llama-cpp-python works with other GGUF models like Mistral and allows constrained output - JSON, function calling, etc. Ref
- 12 Tuning Strategies for RAG
- Llama Datasets are RAG datasets created mostly using GPT-4. Mostly small datasets.
- ⭐ Intuitions about large language models
  - Bigger models (70b) are much better at learning from few-shot examples. They really learn.
  - Bigger models will keep getting better!
  - Chain of Thought prompting is a way of giving the model more compute for complex problems that need it.
  - Models will show emergent (completely new) behaviors that can’t be predicted from extrapolation. These may not be intentional.
- CodeAnt.ai is a VS Code plugin to detect code smells, refactor for modularity, and write docstrings and unit tests
- Anyscale prices the 7b Llama2, Zephyr, Mistral models at 15 cents per 1M tokens, roughly 1/10th of GPT-3.5 Turbo’s ~$1.50 per 1M tokens.
- Tools to identify personally identifiable information:
  - galactic can use LLMs to detect PII
  - Presidio by Microsoft
  - Sherlock is a generic semantic type matching DL model
  - pii-extractor-llm was trained on Indian names
  - GLiNER is a Lightweight Generalist model for NER
- Tools to explore
  - ElevenLabs speaks in your voice
  - Cutout Pro removes backgrounds and parts of images
  - Vocal Remover removes vocals from songs
  - CapCut video editor
- TheBloke’s $35/month Patreon might be one of the least expensive ways to set up quantized LLMs in production.
- Microsoft released table-transformer to extract tables from PDFs. Sample usage
- Convert PDF to markdown with marker - an improvement over nougat.
- JupyterLab has a `%%ai` magic to use LLMs within notebooks. Ref
- Telling ChatGPT that the year is 2123 makes it bypass copyright. Ref
- Meta released SeamlessExpressive which preserves emotions in speech-to-speech translations
- Unsloth offers faster, lower-memory LLM QLoRA finetuning
- DeepSeek is an open-source high-quality LLM
- The paper Scalable Extraction of Training Data from (Production) Language Models extracts training data by prompting the model to repeat a single token forever.
- SkyPilot lets you run LLMs on any cloud provider.
- vLLM lets you deploy LLMs with a single command.
- llamafile lets you run LLMs locally as a single-file executable!
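
To make the Anyscale vs GPT-3.5 Turbo pricing comparison above concrete, here is a minimal sketch of the arithmetic. The prices are the ones quoted in the list and may well be out of date; the model keys, the function names and the 50M-token monthly volume are hypothetical, chosen only for illustration.

```python
# Prices in USD per 1M tokens, as quoted in the list above.
# Assumption: these are flat per-token prices and may no longer be current.
PRICE_PER_MILLION = {
    "anyscale-7b": 0.15,    # 7b Llama2 / Zephyr / Mistral on Anyscale
    "gpt-3.5-turbo": 1.50,  # approximate figure quoted above
}


def cost_usd(model: str, tokens: int) -> float:
    """Return the cost in USD of processing `tokens` tokens with `model`."""
    return PRICE_PER_MILLION[model] * tokens / 1_000_000


def price_ratio(cheap: str, expensive: str) -> float:
    """Return how many times cheaper `cheap` is, as a fraction of `expensive`."""
    return PRICE_PER_MILLION[cheap] / PRICE_PER_MILLION[expensive]


if __name__ == "__main__":
    monthly_tokens = 50_000_000  # hypothetical workload
    for model in PRICE_PER_MILLION:
        print(f"{model}: ${cost_usd(model, monthly_tokens):.2f} per month")
    ratio = price_ratio("anyscale-7b", "gpt-3.5-turbo")
    print(f"price ratio: {ratio:.2f}")
```

At these quoted prices the gap only matters at volume: the per-token ratio is fixed, so the absolute saving scales linearly with the number of tokens processed.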