This week, I learned:

  • **/*.md can search for all Markdown files. Julia Evans
  • Windows 11 2024 Update features: Ref
    • Live captions (via the tray) can transcribe audio and microphone.
    • Cocreator in Paint lets you draw crudely and enhances it with AI. The neat UI is a slider that lets you control how close it should be to your drawing.
    • Voice Clarity automatically cancels echo, reduces background noise, and minimizes reverb.
    • Studio Effects (via the tray) lets you apply camera effects on all apps. Eye contact feature is CLEVER!
    • sudo lets you run commands with admin privileges from the command line. source
  • Roaming RAG is an alternative to RAG without the vector database.
    • Applicable to well structured documents, e.g. technical books, manuals, etc.
    • Create a hierarchical outline of the document. Code
      • Keep the top-level headings.
      • Preserve the first ~100 characters of opening text from each section.
      • Present the second-level headings, but without any subsidiary content.
      • Provide each section a unique 8 digit hex identifier.
      • Each section heading is followed by a guiding comment for the model: Section collapsed - expand with expand_section("{identifier}").
    • Then read the relevant sections as context to answer the question. Code
  • Traffic to StackOverflow has fallen considerably. Especially from young and Indian developers. StackOverflow revenue is down. Via Prashanth. They’re exploring:
    • Licensing their content. (Meta says high quality content improves LLM performance by 30% on HumanEval)
    • Enterprise StackOverflow for system integration
    • Fine-tuned versions of Enterprise Stackoverflow for enterprises
    • Integrate StackOverflow within your IDE. Ask questions, post directly
  • I surveyed the Gramener QA team on how they were using LLMs.
    • 7 used it for code generation (e.g. date extraction, regex generation)
    • 4 used it for learning (e.g. Robot Framework, how to define test cases, API usage)
    • 3 used it for formula generation (e.g. Excel)
    • 2 used it for test scenario identification
    • 2 used it for test data generation
    • 2 used it for comparing expected vs actual datasets
    • 1 used it for data type identification (e.g. given sample values, identify the data type).
    • 1 used it for evaluating resulting (LLM as a judge)
  • I asked the Straive Digitalized Operations team what management techniques they would apply to manage LLMs. Here are the responses:
    • Ask better questions. (Prompt engineering.)
    • Create templates or step-by-step instructions. (Chain of Thought.)
    • Ask for multiple options and pick from the best options. (Agentic approach?)
    • Training. (Fine tuning.)
    • Price weaker responses lower. (Stratified model pricing?)
  • “LLM hallucinations are a good thing. They are a sign of diversity, allowing us to improve the answer by exploring multiple paths.” – A colleague from Straive.
  • Hyperbrowser is a cloud based puppeteer service.
  • Bedrock Llama models can’t be directly called with their model names. You need to use their inference profile names, e.g. us.meta.llama3-2-11b-instruct-v1:0 if the model is in a US region.
  • Hacker News RSS is a good way to get RSS feeds from Hacker News. It’s also a good way to understand how to convert a news source into RSS feeds. BlueSky has RSS feeds too
  • When embedding using a SentenceTransformer.encode(docs) it’s best if we embed with smaller docs and call it multiple times (rather than embedding more at once). On Colab T4, for gte-base-en-v1.5, when embedding 1,000 docs of up to 8K chars each, here is the TOTAL time it took, based on batch sizes (lower is better)
    • 1 doc per call: 10s
    • 2 docs per call: 13s
    • 4 docs per call: 19s
    • 8 docs per call: 23s
    • 16 docs per call: 32s
    • 32 docs per call: 40s
  • Running embeddings without a GPU is extremely slow. It takes ~2.4 seconds per string.