Notes on Large Language Models


  • (Very) High Level Notes on LLM Execution
    • LLMs take a prompt and then calculate probabilities for the words (more precisely, tokens) that could follow it
    • For the prompt “Go is...” the LLM may generate these words in descending order of probability:
      1. a
      2. programming
      3. language
    • NOTE: this is very similar to autocomplete
    • LLMs calculate probabilities for every token in their vocabulary (DA: the scale seems massive)
    • LLM temperature configuration setting: injects variability into the output so that the model does not always pick the single most probable token (which would be boring and tend to lack creativity); see the sketch below
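
    A minimal Go sketch of that idea, using made-up logits rather than a real model: temperature rescales the raw token scores before they become probabilities, so a low temperature almost always selects the top token while a higher temperature lets less likely tokens through.

      package main

      import (
        "fmt"
        "math"
        "math/rand"
      )

      // candidate pairs a possible next token with a raw model score (logit).
      type candidate struct {
        token string
        logit float64
      }

      // softmaxWithTemperature turns logits into probabilities. A low temperature
      // sharpens the distribution; a high temperature flattens it.
      func softmaxWithTemperature(cands []candidate, temperature float64) []float64 {
        probs := make([]float64, len(cands))
        sum := 0.0
        for i, c := range cands {
          probs[i] = math.Exp(c.logit / temperature)
          sum += probs[i]
        }
        for i := range probs {
          probs[i] /= sum
        }
        return probs
      }

      // sampleIndex picks an index according to the probability distribution.
      func sampleIndex(probs []float64) int {
        r := rand.Float64()
        cumulative := 0.0
        for i, p := range probs {
          cumulative += p
          if r < cumulative {
            return i
          }
        }
        return len(probs) - 1
      }

      func main() {
        // Toy candidates for the prompt "Go is..." with made-up logits.
        cands := []candidate{
          {"a", 2.0},
          {"programming", 1.5},
          {"language", 1.0},
        }
        for _, temp := range []float64{0.2, 1.0, 2.0} {
          probs := softmaxWithTemperature(cands, temp)
          next := cands[sampleIndex(probs)].token
          fmt.Printf("temperature %.1f -> probs %v, sampled %q\n", temp, probs, next)
        }
      }
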
  • Enterprises are gradually migrating from closed LLMs (e.g. OpenAI [ChatGPT], Anthropic [Claude]) to open LLMs (e.g. Llama 3, DeepSeek, Mistral)
  • Daniel Whitenack’s spectrum of AI complexity.
    • Basic Prompting
    • Prompt Engineering (CoT, templates, parameters)
    • Augmentation, Retrieval
    • Agents, Chaining
    • Fine-tuning via a closed API
    • Fine-tuning an open model
    • Training a model from scratch
  • Accuracy
    • Autocomplete-style LLMs optimize for coherence, so their answers may be coherent but not necessarily accurate. PredictionGuard uses factual consistency checking models to confirm accuracy (see the sketch below)
      • For example: “The White House is painted pink.” (the sentence is coherent but not accurate)
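
    A rough sketch of what such a factual consistency check could look like: a claim is scored against a trusted reference passage. The endpoint URL and JSON field names below are hypothetical placeholders, not the actual PredictionGuard API.

      package main

      import (
        "bytes"
        "encoding/json"
        "fmt"
        "net/http"
      )

      // factualityRequest pairs a trusted reference passage with a claim to check.
      type factualityRequest struct {
        Reference string `json:"reference"`
        Text      string `json:"text"`
      }

      // factualityResponse carries the consistency score; closer to 1.0 means the
      // claim agrees with the reference.
      type factualityResponse struct {
        Score float64 `json:"score"`
      }

      func checkFactualConsistency(apiURL, reference, text string) (float64, error) {
        body, err := json.Marshal(factualityRequest{Reference: reference, Text: text})
        if err != nil {
          return 0, err
        }
        resp, err := http.Post(apiURL, "application/json", bytes.NewReader(body))
        if err != nil {
          return 0, err
        }
        defer resp.Body.Close()

        var result factualityResponse
        if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
          return 0, err
        }
        return result.Score, nil
      }

      func main() {
        // "The White House is painted pink." is coherent but should score low
        // against a reference stating the building is white.
        score, err := checkFactualConsistency(
          "https://example.com/factuality", // placeholder URL
          "The White House is a white-painted residence in Washington, D.C.",
          "The White House is painted pink.",
        )
        if err != nil {
          fmt.Println("request failed:", err)
          return
        }
        fmt.Printf("factual consistency score: %.2f\n", score)
      }
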
  • Training Models
    • DW: “You should never, ever, ever have to train a model for the rest of your life”
      • You should…
        • Use an open model and inject your data into the prompt (see the sketch after this list)
        • At most, you may need to fine-tune a model
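
    A minimal sketch of the “inject your data into the prompt” idea. The snippets here are hard-coded; in a real setup they would be retrieved, e.g. via a vector search.

      package main

      import (
        "fmt"
        "strings"
      )

      // buildAugmentedPrompt injects caller-supplied documents into the prompt so an
      // open model can answer from that data without any training or fine-tuning.
      func buildAugmentedPrompt(question string, snippets []string) string {
        return fmt.Sprintf(
          "Answer the question using only the context below.\n\nContext:\n%s\n\nQuestion: %s\nAnswer:",
          strings.Join(snippets, "\n"),
          question,
        )
      }

      func main() {
        snippets := []string{
          "- Orders over $50 ship free within the US.",
          "- Standard shipping takes 3-5 business days.",
        }
        fmt.Println(buildAugmentedPrompt("How long does shipping take?", snippets))
      }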

LLM API Behavior


  • Daniel Whitenack: most APIs, including PredictionGuard (Daniel’s company), start streaming completion text back immediately, emitting it serially as it is generated
    • NOTE: in the PredictionGuard Go code, a channel is used to receive that stream, so your Go program can start printing output before the full completion has arrived (see the sketch below)
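
    A minimal sketch of that channel pattern. streamCompletion below fakes the model output by splitting a canned answer into chunks; it is a hypothetical stand-in, not the actual PredictionGuard SDK call.

      package main

      import (
        "fmt"
        "strings"
        "time"
      )

      // streamCompletion stands in for an SDK call that streams a completion.
      // A real client would push chunks from the API response onto the channel
      // as they arrive; here the prompt is ignored and a canned answer is used.
      func streamCompletion(prompt string) <-chan string {
        out := make(chan string)
        go func() {
          defer close(out)
          for _, word := range strings.Fields("Go is a programming language designed at Google.") {
            out <- word + " "
            time.Sleep(50 * time.Millisecond) // simulate network latency between chunks
          }
        }()
        return out
      }

      func main() {
        // The caller can start printing as soon as the first chunk arrives,
        // rather than waiting for the whole completion.
        for chunk := range streamCompletion("Go is...") {
          fmt.Print(chunk)
        }
        fmt.Println()
      }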

Embeddings / Vector Representations


Prompts


  • Prompt Formatting: various models are trained on prompts with a specific text format; structuring your prompts to match that format should produce better completions

ChatML Format

  • the placeholder text inside the curly braces is replaced with the actual content

    <|im_start|>system
    {prompt}<|im_end|>
    <|im_start|>user
    {context or user message}<|im_end|>
    <|im_start|>assistant<|im_end|>
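
    A small Go helper (my own sketch, not taken from any SDK) that fills the ChatML template above:

      package main

      import "fmt"

      // buildChatMLPrompt fills the ChatML template with a system prompt and a user
      // message. Note: some chat templates instead end with a bare
      // "<|im_start|>assistant" line so the model's completion fills in that turn.
      func buildChatMLPrompt(system, user string) string {
        return fmt.Sprintf(
          "<|im_start|>system\n%s<|im_end|>\n"+
            "<|im_start|>user\n%s<|im_end|>\n"+
            "<|im_start|>assistant<|im_end|>",
          system, user,
        )
      }

      func main() {
        fmt.Println(buildChatMLPrompt(
          "You are a helpful assistant that answers questions about Go.",
          "What is Go?",
        ))
      }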
  

Large Language vs. Foundation Models


From a ChatGPT-generated summary of the differences between Large Language Models and Foundation Models:

Feature       | Large Language Models (LLMs)                   | Foundation Models (FMs)
Scope         | Focused on text-based tasks                    | Can handle text, images, video, and more
Training Data | Large text datasets                            | Multimodal datasets (text, images, video, audio)
Use Cases     | Chatbots, code generation, text summarization  | Image generation, speech recognition, robotics, multimodal AI
Examples      | GPT-4, BERT, LLaMA                             | GPT-4V, CLIP, Gemini, DALL·E

Summary

  • LLMs are a subset of Foundation Models that specialize in language processing.
  • Foundation Models are more general-purpose and can handle multiple types of data (text, images, audio, etc.).
  • Many LLMs (like GPT-4) can also be considered Foundation Models because they serve as a base for fine-tuning.
