Large Language Model: Challenges and Solutions

AGI Discussion and Social Impact

OpenAI's Roadmap and Products

OpenAI's roadmap

  • The Timeline of the OpenAI's Founder Journeys [15 Oct 2024]
  • Humanloop Interview 2023 : doc [29 May 2023]
  • OpenAI’s CEO Says the Age of Giant AI Models Is Already Over ref [17 Apr 2023]
  • Q* (pronounced as Q-Star): The model, called Q* was able to solve basic maths problems it had not seen before, according to the tech news site the Information. ref [23 Nov 2023]
  • Sam Altman reveals in an interview with Bill Gates what's coming up in GPT-4.5 (or GPT-5): Potential integration with other modes of information beyond text, better logic and analysis capabilities, and consistency in performance over the next two years. ref [12 Jan 2024]

OpenAI o1

  • A new series of reasoning models: The complex reasoning-specialized model, OpenAI o1 series, excels in math, coding, and science, outperforming GPT-4o on key benchmarks. [12 Sep 2024] / ref: Awesome LLM Strawberry (OpenAI o1)
  • A Comparative Study on Reasoning Patterns of OpenAI's o1 Model: Six types of o1 reasoning patterns (i.e., Systematic Analysis (SA), Method Reuse (MR), Divide and Conquer (DC), Self-Refinement (SR), Context Identification (CI), and Emphasizing Constraints (EC)). The most commonly used reasoning patterns in o1 are DC and SR. [17 Oct 2024]
  • OpenAI o1 system card [5 Dec 2024]

GPT-4 details leaked unverified

  • GPT-4V(ision) system card: ref [25 Sep 2023] / ref
  • The Dawn of LMMs: [cnt]: Preliminary Explorations with GPT-4V(ision) [29 Sep 2023]
  • GPT-4 details leaked
    • GPT-4 is a language model with approximately 1.8 trillion parameters across 120 layers, 10x larger than GPT-3. It uses a Mixture of Experts (MoE) model with 16 experts, each having about 111 billion parameters. Utilizing MoE allows for more efficient use of resources during inference, needing only about 280 billion parameters and 560 TFLOPs, compared to the 1.8 trillion parameters and 3,700 TFLOPs required for a purely dense model.
    • The model is trained on approximately 13 trillion tokens from various sources, including internet data, books, and research papers. To reduce training costs, OpenAI employs tensor and pipeline parallelism, and a large batch size of 60 million. The estimated training cost for GPT-4 is around $63 million. ref [Jul 2023]
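The figures in the leaked description above can be cross-checked with a little arithmetic. The sketch below only restates the unverified numbers quoted in this section (16 experts, about 111 billion parameters each, 280 billion active parameters, 560 vs. 3,700 TFLOPs); the constant and function names are made up for illustration.

```python
# Figures quoted in the leaked, unverified GPT-4 description.
EXPERTS = 16
PARAMS_PER_EXPERT = 111e9   # about 111 billion parameters per expert
ACTIVE_PARAMS = 280e9       # parameters used per inference pass
ACTIVE_TFLOPS = 560
DENSE_PARAMS = 1.8e12       # quoted total for a purely dense model
DENSE_TFLOPS = 3700


def moe_summary():
    """Return the expert parameter total and the active fractions at inference."""
    return {
        "expert_total_params": EXPERTS * PARAMS_PER_EXPERT,
        "active_param_fraction": ACTIVE_PARAMS / DENSE_PARAMS,
        "active_flop_fraction": ACTIVE_TFLOPS / DENSE_TFLOPS,
    }


summary = moe_summary()
for name, value in summary.items():
    print(f"{name}: {value:.4g}")
```

The expert total comes to about 1.78 trillion, consistent with the quoted "approximately 1.8 trillion", and both the parameter and compute figures suggest that roughly 15% of the dense cost is paid per inference pass.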

OpenAI Products

  • ChatGPT can now see, hear, and speak: It has recently been updated to support multimodal capabilities, including voice and image. [25 Sep 2023] Whisper / CLIP
  • ChatGPT Plugin [23 Mar 2023]
  • ChatGPT Function calling [Jun 2023] > Azure OpenAI supports function calling. ref
  • Custom instructions: In a nutshell, the Custom Instructions feature is a cross-session memory that allows ChatGPT to retain key instructions across chat sessions. [20 Jul 2023]
  • GPT-3.5 Turbo Fine-tuning: Fine-tuning for GPT-3.5 Turbo is now available, with fine-tuning for GPT-4 coming this fall. [22 Aug 2023]
  • ChatGPT Enterprise: Removes GPT-4 usage caps, and performs up to two times faster ref [28 Aug 2023]
  • DALL·E 3: In September 2023, OpenAI announced its latest image model, DALL·E 3. git [Sep 2023]
  • OpenAI DevDay 2023: GPT-4 Turbo with 128K context, Assistants API (Code interpreter, Retrieval, and function calling), GPTs (Custom versions of ChatGPT: ref), Copyright Shield, Parallel Function Calling, JSON Mode, Reproducible outputs [6 Nov 2023]
  • Introducing the GPT Store: Roll out the GPT Store to ChatGPT Plus, Team and Enterprise users GPTs [10 Jan 2024]
  • New embedding models: text-embedding-3-small (embedding sizes: 512, 1536); text-embedding-3-large (embedding sizes: 256, 1024, 3072) [25 Jan 2024]
  • Sora: Text-to-video model. Sora can generate videos up to a minute long while maintaining visual quality and adherence to the user’s prompt. [15 Feb 2024]
  • ChatGPT Memory: Remembering things you discuss across all chats saves you from having to repeat information and makes future conversations more helpful. [Apr 2024]
  • CriticGPT: a version of GPT-4 fine-tuned to critique code generated by ChatGPT [27 Jun 2024]
  • SearchGPT: AI search [25 Jul 2024] > ChatGPT Search [31 Oct 2024]
  • Structured Outputs in the API: a new feature designed to ensure model-generated outputs will exactly match JSON Schemas provided by developers. [6 Aug 2024]
  • OpenAI DevDay 2024: Real-time API (speech-to-speech), Vision Fine-Tuning, Prompt Caching, and Distillation (fine-tuning a small language model using a large language model). ref [1 Oct 2024]
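The multiple embedding sizes listed for text-embedding-3-small and text-embedding-3-large above mean the same model can yield shorter vectors. A minimal sketch, assuming a shortened embedding can be approximated by keeping the leading coordinates and re-normalizing to unit length; the helper name `shorten_embedding` and the example vector are hypothetical, and no API call is made.

```python
import math


def shorten_embedding(vector, dims):
    """Keep the first `dims` coordinates and rescale to unit L2 norm."""
    if not 0 < dims <= len(vector):
        raise ValueError("dims must be between 1 and len(vector)")
    head = vector[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # nothing to rescale for an all-zero prefix
    return [x / norm for x in head]


# Made-up 4-dimensional "embedding", shortened to 2 dimensions.
full = [3.0, 4.0, 12.0, 0.0]
short = shorten_embedding(full, 2)
print(short)
```

Re-normalizing matters because cosine similarity and dot-product search assume comparable vector lengths after truncation.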

GPT series release date

  • GPT 1: Decoder-only model. 117 million parameters. [Jun 2018] git
  • GPT 2: Increased model size and parameters. 1.5 billion. [14 Feb 2019] git
  • GPT 3: Introduced few-shot learning. 175B. [11 Jun 2020] git
  • GPT 3.5: 3 variants each with 1.3B, 6B, and 175B parameters. [15 Mar 2022] Estimate the embedding size of OpenAI's gpt-3.5-turbo to be about 4,096
  • ChatGPT: GPT-3 fine-tuned with RLHF. 20B or 175B. unverified ref [30 Nov 2022]
  • GPT 4: Mixture of Experts (MoE). 8 models with 220 billion parameters each, for a total of about 1.76 trillion parameters. unverified ref [14 Mar 2023]
  • GPT-4o: o stands for Omni. 50% cheaper. 2x faster. Multimodal input and output capabilities (text, audio, vision). Supports 50 languages. [13 May 2024] / GPT-4o mini: 15 cents per million input tokens, 60 cents per million output tokens, MMLU of 82%, and fast. [18 Jul 2024]
  • OpenAI o1 [12 Sep 2024]

Context constraints

  • Sparse Attention: Generating Long Sequences with Sparse Transformer: 💡Sparse attention computes scores for a subset of pairs, selected via a fixed or learned sparsity pattern, reducing calculation costs. Strided attention: image, audio / Fixed attention: text. ref / git [23 Apr 2019]
  • Rotary Positional Embedding (RoPE):💡[cnt] / ref / doc [20 Apr 2021]
    • How is this different from the sinusoidal embeddings used in "Attention is All You Need"?
      1. Sinusoidal embeddings apply to each coordinate individually, while rotary embeddings mix pairs of coordinates
      2. Sinusoidal embeddings add a cos or sin term, while rotary embeddings use a multiplicative factor.
      3. Rotary embeddings are applied to the queries (Q) and keys (K) inside attention, not to the input embeddings.
  • Structured Prompting: Scaling In-Context Learning to 1,000 Examples: [cnt] [13 Dec 2022]
    1. Microsoft's Structured Prompting allows thousands of examples, by first concatenating examples into groups, then inputting each group into the LM. The hidden key and value vectors of the LM's attention modules are cached. Finally, when the user's unaltered input prompt is passed to the LM, the cached attention vectors are injected into the hidden layers of the LM.
    2. This approach wouldn't work with OpenAI's closed models, because it needs access to the [keys] and [values] in the transformer internals, which they do not expose. You could implement it yourself on open-source models. cite [07 Feb 2023]
  • Introducing 100K Context Windows: Hundreds of pages, around 75,000 words. [11 May 2023] demo Anthropic Claude
  • Lost in the Middle: How Language Models Use Long Contexts:💡[cnt] [6 Jul 2023]
    1. Best performance when relevant information is at the beginning
    2. Too many retrieved documents will harm performance
    3. Performance decreases as context length increases
  • Ring Attention: [cnt] git [3 Oct 2023]
    1. Ring Attention leverages blockwise computation of self-attention to distribute long sequences across multiple devices while overlapping the communication of key-value blocks with the computation of blockwise attention.
    2. Ring Attention can reduce the memory requirements of Transformers, enabling training on sequences more than 500 times longer than prior memory-efficient state-of-the-art methods, and on sequences that exceed 100 million in length, without making approximations to attention.
    3. It is proposed as an enhancement to the Blockwise Parallel Transformer (BPT) framework.
  • “Needle in a Haystack” Analysis [21 Nov 2023]: Context Window Benchmarks; Claude 2.1 (200K Context Window) vs GPT-4; Long context prompting for Claude 2.1: adding just one sentence, “Here is the most relevant sentence in the context:”, to the prompt resulted in near complete fidelity throughout Claude 2.1’s 200K context window. [6 Dec 2023]
  • LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning. With only four lines of code modification, the proposed method can effortlessly extend existing LLMs' context window without any fine-tuning. [2 Jan 2024]
  • Giraffe: Adventures in Expanding Context Lengths in LLMs. A new truncation strategy for modifying the basis for the position encoding. ref [2 Jan 2024]
  • Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention. Infini-attention incorporates a compressive memory into the vanilla attention mechanism and combines local and global attention. [10 Apr 2024]
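The Rotary Positional Embedding (RoPE) entry in the list above contrasts rotary with sinusoidal embeddings: coordinates are mixed in pairs, and the position enters as a multiplicative rotation rather than an added term. A minimal pure-Python sketch of that idea, assuming adjacent coordinates are paired and the usual base of 10000 is used; the function names and the example vectors are made up for illustration, and real implementations operate on batched tensors.

```python
import math


def rope(vec, position, base=10000.0):
    """Rotate each adjacent coordinate pair of `vec` by a position-dependent angle."""
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, dim, 2):
        # One frequency per pair; later pairs rotate more slowly.
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        # Multiplicative 2-D rotation that mixes the pair (x, y).
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Made-up query and key vectors.
q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.3, -0.7, 1.0]

# Same relative offset (2) at different absolute positions.
score_a = dot(rope(q, 3), rope(k, 1))
score_b = dot(rope(q, 10), rope(k, 8))
print(score_a, score_b)
```

Because each pair is only rotated, vector norms are unchanged, and the query-key score depends on the relative offset between positions rather than on the absolute positions, which is why the two scores agree up to floating-point error.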

Numbers LLM

Trustworthy, Safe and Secure LLM

  • NIST AI Risk Management Framework: NIST released the first complete version of the NIST AI RMF Playbook on March 30, 2023
  • Guardrails Hub: Guardrails for common LLM validation use cases
  • NeMo Guardrails: Building Trustworthy, Safe and Secure LLM Conversational Systems [Apr 2023]
  • Political biases of LLMs: [cnt]: From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models. [15 May 2023]
  • Trustworthy LLMs: [cnt]: Comprehensive overview for assessing LLM trustworthiness; Reliability, safety, fairness, resistance to misuse, explainability and reasoning, adherence to social norms, and robustness. [10 Aug 2023]
  • Red Teaming: The term red teaming has historically described systematic adversarial attacks for testing security vulnerabilities. LLM red teamers should be a mix of people with diverse social and professional backgrounds, demographic groups, and interdisciplinary expertise that fits the deployment context of your AI system. ref
  • The Foundation Model Transparency Index: [cnt]: A comprehensive assessment of the transparency of foundation model developers ref [19 Oct 2023]
  • Hallucinations: [cnt]: A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions [9 Nov 2023]
  • Hallucination Leaderboard: Evaluate how often an LLM introduces hallucinations when summarizing a document. [Nov 2023]
  • Hallucination Index: w.r.t. RAG, Testing LLMs with short (≤5k), medium (5k–25k), and long (40k–100k) contexts to evaluate improved RAG performance [Nov 2023]
  • FactTune: A procedure that enhances the factuality of LLMs without the need for human feedback. The process involves fine-tuning a separate LLM using methods such as DPO and RLAIF, guided by preferences generated by FActScore. [14 Nov 2023] FActScore works by breaking down a generation into a series of atomic facts and then computing the percentage of these atomic facts supported by a reliable knowledge source.
  • OpenAI Weak-to-strong generalization: 💡In the superalignment problem, humans must supervise models that are much smarter than them. The paper discusses supervising a GPT-4 or 3.5-level model using a GPT-2-level model. It finds that while strong models supervised by weak models can outperform the weak models, they still don’t perform as well as when supervised by ground truth. git [14 Dec 2023]
  • A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models: A comprehensive survey of over thirty-two techniques developed to mitigate hallucination in LLMs [2 Jan 2024]
  • Anthropic Many-shot jailbreaking: A simple long-context attack that bypasses safety guardrails by bombarding the model with unsafe or harmful questions and answers. [3 Apr 2024]
  • The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions. OpenAI highlights the need for instruction privileges in LLMs to prevent attacks and proposes training models to conditionally follow lower-level instructions based on their alignment with higher-level instructions. [19 Apr 2024]
  • Frontier Safety Framework: Google DeepMind's set of protocols designed to identify and mitigate potential harms from future AI systems. [17 May 2024]
  • Mapping the Mind of a Large Language Model: Anthropic. A technique called "dictionary learning" can help understand model behavior by identifying which features respond to a particular input, thus providing insight into the model's "reasoning." ref [21 May 2024]
  • Extracting Concepts from GPT-4: Sparse Autoencoders identify key features, enhancing the interpretability of language models like GPT-4. They extract 16 million interpretable features using GPT-4's outputs as input for training. [6 Jun 2024]
  • AI models collapse when trained on recursively generated data: Model Collapse. We find that indiscriminate use of model-generated content in training causes irreversible defects in the resulting models, in which tails of the original content distribution disappear. [24 Jul 2024]
  • LLMs Will Always Hallucinate, and We Need to Live With This: LLMs cannot completely eliminate hallucinations through architectural improvements, dataset enhancements, or fact-checking mechanisms due to fundamental mathematical and logical limitations. [9 Sep 2024]
  • Large Language Models Reflect the Ideology of their Creators: When prompted in Chinese, all LLMs favor pro-Chinese figures; Western LLMs similarly align more with Western values, even in English prompts. [24 Oct 2024]
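The FActScore description in the FactTune entry above (split a generation into atomic facts, then report the share supported by a knowledge source) reduces to a simple ratio. A minimal sketch, assuming the atomic facts are already extracted and that support can be checked by a caller-supplied function; the `fact_score` helper, the toy knowledge set, and the example facts are all hypothetical, and the real decomposition and verification steps are not shown.

```python
def fact_score(atomic_facts, is_supported):
    """Fraction of atomic facts that the knowledge source supports."""
    if not atomic_facts:
        return 0.0
    supported = sum(1 for fact in atomic_facts if is_supported(fact))
    return supported / len(atomic_facts)


# Toy stand-in for a reliable knowledge source (assumption: exact string match).
KNOWLEDGE = {
    "Paris is the capital of France",
    "France is in Europe",
}

# Made-up atomic facts, as if extracted from one model generation.
facts = [
    "Paris is the capital of France",
    "France is in Europe",
    "Paris has 40 million residents",
]

score = fact_score(facts, lambda fact: fact in KNOWLEDGE)
print(f"{score:.2%} of atomic facts supported")
```

In a preference-tuning setup such as the one FactTune describes, scores like this could rank two candidate generations, with the higher-scoring one treated as the preferred response.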

Large Language Model Is: Abilities