diff --git a/.DS_Store b/.DS_Store
index 0ab78ff5..c4574f1e 100644
Binary files a/.DS_Store and b/.DS_Store differ
diff --git a/Lectures/.DS_Store b/Lectures/.DS_Store
index fe476a35..980a055b 100644
Binary files a/Lectures/.DS_Store and b/Lectures/.DS_Store differ
diff --git a/Lectures/S0-L20/.DS_Store b/Lectures/S0-L20/.DS_Store
index b73a5977..172046b9 100644
Binary files a/Lectures/S0-L20/.DS_Store and b/Lectures/S0-L20/.DS_Store differ
diff --git a/Lectures/S0-L20/images/.DS_Store b/Lectures/S0-L20/images/.DS_Store
index df2dde5c..db3fd20a 100644
Binary files a/Lectures/S0-L20/images/.DS_Store and b/Lectures/S0-L20/images/.DS_Store differ
diff --git a/Lectures/S0-L20/images/Reasoning/Slide10.png b/Lectures/S0-L20/images/Reasoning/Slide10.png
new file mode 100644
index 00000000..13cfb383
Binary files /dev/null and b/Lectures/S0-L20/images/Reasoning/Slide10.png differ
diff --git a/Lectures/S0-L20/images/Reasoning/Slide11.png b/Lectures/S0-L20/images/Reasoning/Slide11.png
new file mode 100644
index 00000000..4f85d2d3
Binary files /dev/null and b/Lectures/S0-L20/images/Reasoning/Slide11.png differ
diff --git a/Lectures/S0-L20/images/Reasoning/Slide12.png b/Lectures/S0-L20/images/Reasoning/Slide12.png
new file mode 100644
index 00000000..8c313149
Binary files /dev/null and b/Lectures/S0-L20/images/Reasoning/Slide12.png differ
diff --git a/Lectures/S0-L20/images/Reasoning/Slide13.png b/Lectures/S0-L20/images/Reasoning/Slide13.png
new file mode 100644
index 00000000..9e30c13a
Binary files /dev/null and b/Lectures/S0-L20/images/Reasoning/Slide13.png differ
diff --git a/Lectures/S0-L20/images/Reasoning/Slide14.png b/Lectures/S0-L20/images/Reasoning/Slide14.png
new file mode 100644
index 00000000..7676a95f
Binary files /dev/null and b/Lectures/S0-L20/images/Reasoning/Slide14.png differ
diff --git a/Lectures/S0-L20/images/Reasoning/Slide15.png b/Lectures/S0-L20/images/Reasoning/Slide15.png
new file mode 100644
index 00000000..fdd38dd3
Binary files /dev/null and b/Lectures/S0-L20/images/Reasoning/Slide15.png differ
diff --git a/Lectures/S0-L20/images/Reasoning/Slide16.png b/Lectures/S0-L20/images/Reasoning/Slide16.png
new file mode 100644
index 00000000..80c5cea2
Binary files /dev/null and b/Lectures/S0-L20/images/Reasoning/Slide16.png differ
diff --git a/Lectures/S0-L20/images/Reasoning/Slide17.png b/Lectures/S0-L20/images/Reasoning/Slide17.png
new file mode 100644
index 00000000..d95f45b8
Binary files /dev/null and b/Lectures/S0-L20/images/Reasoning/Slide17.png differ
diff --git a/Lectures/S0-L20/images/Reasoning/Slide18.png b/Lectures/S0-L20/images/Reasoning/Slide18.png
new file mode 100644
index 00000000..2352a443
Binary files /dev/null and b/Lectures/S0-L20/images/Reasoning/Slide18.png differ
diff --git a/Lectures/S0-L20/images/Reasoning/Slide19.png b/Lectures/S0-L20/images/Reasoning/Slide19.png
new file mode 100644
index 00000000..2c01d21b
Binary files /dev/null and b/Lectures/S0-L20/images/Reasoning/Slide19.png differ
diff --git a/Lectures/S0-L20/images/Reasoning/Slide2.png b/Lectures/S0-L20/images/Reasoning/Slide2.png
new file mode 100644
index 00000000..681fcfa0
Binary files /dev/null and b/Lectures/S0-L20/images/Reasoning/Slide2.png differ
diff --git a/Lectures/S0-L20/images/Reasoning/Slide3.png b/Lectures/S0-L20/images/Reasoning/Slide3.png
new file mode 100644
index 00000000..e58dd54c
Binary files /dev/null and b/Lectures/S0-L20/images/Reasoning/Slide3.png differ
diff --git a/Lectures/S0-L20/images/Reasoning/Slide4.png b/Lectures/S0-L20/images/Reasoning/Slide4.png
new file mode 100644
index 00000000..65b30c21
Binary files /dev/null and b/Lectures/S0-L20/images/Reasoning/Slide4.png differ
diff --git a/Lectures/S0-L20/images/Reasoning/Slide5.png b/Lectures/S0-L20/images/Reasoning/Slide5.png
new file mode 100644
index 00000000..0034f095
Binary files /dev/null and b/Lectures/S0-L20/images/Reasoning/Slide5.png differ
diff --git a/Lectures/S0-L20/images/Reasoning/Slide6.png b/Lectures/S0-L20/images/Reasoning/Slide6.png
new file mode 100644
index 00000000..1f25056e
Binary files /dev/null and b/Lectures/S0-L20/images/Reasoning/Slide6.png differ
diff --git a/Lectures/S0-L20/images/Reasoning/Slide7.png b/Lectures/S0-L20/images/Reasoning/Slide7.png
new file mode 100644
index 00000000..83427247
Binary files /dev/null and b/Lectures/S0-L20/images/Reasoning/Slide7.png differ
diff --git a/Lectures/S0-L20/images/Reasoning/Slide8.png b/Lectures/S0-L20/images/Reasoning/Slide8.png
new file mode 100644
index 00000000..19ade72d
Binary files /dev/null and b/Lectures/S0-L20/images/Reasoning/Slide8.png differ
diff --git a/Lectures/S0-L20/images/Reasoning/Slide9.png b/Lectures/S0-L20/images/Reasoning/Slide9.png
new file mode 100644
index 00000000..c010878e
Binary files /dev/null and b/Lectures/S0-L20/images/Reasoning/Slide9.png differ
diff --git a/Lectures/S0-L20/images/Reasoning/img_01.png b/Lectures/S0-L20/images/Reasoning/img_01.png
new file mode 100644
index 00000000..1a69559a
Binary files /dev/null and b/Lectures/S0-L20/images/Reasoning/img_01.png differ
diff --git a/Lectures/S0-L20/images/Reasoning/img_02.png b/Lectures/S0-L20/images/Reasoning/img_02.png
new file mode 100644
index 00000000..4c2e8abe
Binary files /dev/null and b/Lectures/S0-L20/images/Reasoning/img_02.png differ
diff --git a/Lectures/S0-L20/images/Reasoning/img_03.png b/Lectures/S0-L20/images/Reasoning/img_03.png
new file mode 100644
index 00000000..9757afc8
Binary files /dev/null and b/Lectures/S0-L20/images/Reasoning/img_03.png differ
diff --git a/Lectures/S0-L20/images/Reasoning/img_04.png b/Lectures/S0-L20/images/Reasoning/img_04.png
new file mode 100644
index 00000000..548d7a72
Binary files /dev/null and b/Lectures/S0-L20/images/Reasoning/img_04.png differ
diff --git a/Lectures/S0-L20/images/Reasoning/img_05.png b/Lectures/S0-L20/images/Reasoning/img_05.png
new file mode 100644
index 00000000..be9fa8c6
Binary files /dev/null and b/Lectures/S0-L20/images/Reasoning/img_05.png differ
diff --git a/Lectures/S0-L20/images/Reasoning/img_06.png b/Lectures/S0-L20/images/Reasoning/img_06.png
new file mode 100644
index 00000000..9f50ced1
Binary files /dev/null and b/Lectures/S0-L20/images/Reasoning/img_06.png differ
diff --git a/Lectures/S0-L20/images/Reasoning/img_07.png b/Lectures/S0-L20/images/Reasoning/img_07.png
new file mode 100644
index 00000000..7923c0a4
Binary files /dev/null and b/Lectures/S0-L20/images/Reasoning/img_07.png differ
diff --git a/Lectures/S0-L20/images/Reasoning/img_08.png b/Lectures/S0-L20/images/Reasoning/img_08.png
new file mode 100644
index 00000000..cd2d934d
Binary files /dev/null and b/Lectures/S0-L20/images/Reasoning/img_08.png differ
diff --git a/Lectures/S0-L20/images/Reasoning/img_09.png b/Lectures/S0-L20/images/Reasoning/img_09.png
new file mode 100644
index 00000000..66ea30bf
Binary files /dev/null and b/Lectures/S0-L20/images/Reasoning/img_09.png differ
diff --git a/Lectures/S0-L20/images/Reasoning/img_10.png b/Lectures/S0-L20/images/Reasoning/img_10.png
new file mode 100644
index 00000000..0c8dd8d9
Binary files /dev/null and b/Lectures/S0-L20/images/Reasoning/img_10.png differ
diff --git a/Lectures/S0-L20/images/Reasoning/img_11.png b/Lectures/S0-L20/images/Reasoning/img_11.png
new file mode 100644
index 00000000..c6cc917a
Binary files /dev/null and b/Lectures/S0-L20/images/Reasoning/img_11.png differ
diff --git a/Lectures/S0-L20/images/Reasoning/img_12.png b/Lectures/S0-L20/images/Reasoning/img_12.png
new file mode 100644
index 00000000..b66bdfec
Binary files /dev/null and b/Lectures/S0-L20/images/Reasoning/img_12.png differ
diff --git a/_contents/.Rhistory b/_contents/.Rhistory
new file mode 100644
index 00000000..e69de29b
diff --git a/_contents/S0-L20.md b/_contents/S0-L20.md
index dff757e9..3de65e6a 100755
--- a/_contents/S0-L20.md
+++ b/_contents/S0-L20.md
@@ -36,7 +36,7 @@ In this session, our readings cover:
 # Unleashing the potential of prompt engineering in Large Language Models: a comprehensive review

 ### Introduction
-Models that are built on Large Language Model (LLM) as the backbone are capable of extracting meaningful information that can assist medical diagnosis or creating engaging contents. These models are also referred to as Artificial Intelligence-Generated Content (AIGC). Once the AIGC model is trained, by changing the way we compose the prompts as input to the model, the quality of the models' output can change. In this paper, we focus on techniques of engineering the prompts to achieve higher quality model output from the same AIGC model.
+Models that are built on a Large Language Model (LLM) backbone are capable of extracting meaningful information that can assist medical diagnosis, or of creating engaging content. These models are also referred to as Artificial Intelligence-Generated Content (AIGC) models. Once an AIGC model is trained, changing the way we compose the prompts given to it as input can change the quality of the model's output. In this paper, we focus on techniques for engineering the prompts to achieve higher-quality output from the same AIGC model.

 ### Basic of Prompt Engineering
@@ -46,15 +46,15 @@ One basic technique to improve the model output is to **be clear and precise** i

-**Few Shot prompting** is also a common prompt engineering technique, where the model is given a few examples with answers in addition to the original question. This relies on the few shot learning ability that is an emergent in large language models, which is can be understood as a form of meta learning.
+**Few Shot prompting** is also a common prompt engineering technique, where the model is given a few examples with answers in addition to the original question. This relies on the few-shot learning ability that is emergent in large language models, which can be understood as a form of meta-learning.

-Authors of the paper also note that **adjusting the temperature and top-p** is essential for the prompt engineering. For code generation where standard pattern is valued, a smaller temperature and top-p is preferred, whereas in creative writing, a larger temperature and top-p may help the model produce original responses.
+Authors of the paper also note that **adjusting the temperature and top-p** is essential for prompt engineering. For code generation, where standard patterns are valued, smaller temperature and top-p values are preferred, whereas in creative writing, larger temperature and top-p values may help the model produce original responses.
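+As a rough illustration of this trade-off, here is a minimal sketch (ours, not from the paper) using the OpenAI Python client; the model name and prompts are placeholders:
+
+```python
+# Illustrative only: decoding parameters for two different task types.
+# Assumes the OpenAI Python client (openai>=1.0); the model name is a placeholder.
+from openai import OpenAI
+
+client = OpenAI()  # reads OPENAI_API_KEY from the environment
+
+def complete(prompt: str, temperature: float, top_p: float) -> str:
+    resp = client.chat.completions.create(
+        model="gpt-4o-mini",  # placeholder model name
+        messages=[{"role": "user", "content": prompt}],
+        temperature=temperature,
+        top_p=top_p,
+    )
+    return resp.choices[0].message.content
+
+# Code generation: favor standard patterns with low randomness.
+code = complete("Write a Python function that reverses a string.",
+                temperature=0.1, top_p=0.5)
+
+# Creative writing: allow more diverse, original continuations.
+story = complete("Write an opening line for a mystery novel.",
+                 temperature=1.0, top_p=0.95)
+```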
 ### Advanced Prompt Engineering

-Chain of Thought prompting induce the model to respond with step by step reasoning, which not only improves the quality of the output, but also shows correct intermediate steps for high stake applications such as medical reasoning. **Zero-shot chain of thought** is a simple yet effective technique, where we only need to include the phrase "Let's think step by step" to the input. **Golden chain of thought** is a technique that utilizes few-shot prompting for chain of thought prompting, by providing ground truth chain of thoughts solutions as examples to the input of the model. Golden chain of thoughts can boost the solve rate from 38% to 83% in the case of GPT-4, but the method is limited by the requirement of ground truth chain of thoughts examples.
+Chain of Thought prompting induces the model to respond with step-by-step reasoning, which not only improves the quality of the output, but also exposes the intermediate steps, which matters for high-stakes applications such as medical reasoning. **Zero-shot chain of thought** is a simple yet effective technique, where we only need to append the phrase "Let's think step by step" to the input. **Golden chain of thought** is a technique that applies few-shot prompting to chain of thought prompting, by providing ground-truth chain-of-thought solutions as examples in the model's input. The golden chain of thought can boost the solve rate from 38% to 83% in the case of GPT-4, but the method is limited by its requirement for ground-truth chain-of-thought examples.

 **Self-Consistency** is an extension to chain of thought prompting. After chain of thought prompting, by sampling from the language model decoder and choosing the most self-consistent response, Self-Consistency achieves better performance in rigorous reasoning tasks such as doing proofs.

@@ -62,13 +62,13 @@ Chain of Thought prompting induce the model to respond with step by step reasoni

-**Knowledge Generation** break down the content generation into two step generations: in the first step generation, the model is only prompted to output pertinent information (knowledge) of the original query, then the knowledge is included as prompt in the second step generation.
+**Knowledge Generation** breaks content generation down into two generation steps: in the first step, the model is prompted only to output information (knowledge) pertinent to the original query; the knowledge is then included in the prompt for the second generation step.

-**Least-to-most prompting** also take a multi-step generation approach similar to knowledge generation. A given problem is decomposed into numerous sub-problems, and the model will output responses for each sub-problem. These responses will be included in the prompt to help the model answer the original problem.
+**Least-to-most prompting** also takes a multi-step generation approach similar to knowledge generation. A given problem is decomposed into several sub-problems, and the model outputs a response for each sub-problem. These responses are then included in the prompt to help the model answer the original problem (see the sketch below).
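+Below is a minimal sketch of least-to-most prompting (our own illustration, not code from the paper); `llm` stands in for any function that maps a prompt string to a model response:
+
+```python
+# Hypothetical least-to-most prompting loop; `llm` is a placeholder for any
+# prompt-in, text-out model call.
+def least_to_most(llm, problem: str) -> str:
+    # Step 1: ask the model to decompose the problem into sub-problems.
+    decomposition = llm(
+        "Break the following problem into simpler sub-problems, one per line:\n"
+        + problem
+    )
+    sub_problems = [s.strip() for s in decomposition.splitlines() if s.strip()]
+
+    # Step 2: solve the sub-problems in order, feeding earlier answers back in.
+    context = ""
+    for sub in sub_problems:
+        answer = llm(f"{context}Q: {sub}\nA:")
+        context += f"Q: {sub}\nA: {answer}\n"
+
+    # Step 3: answer the original problem with all sub-answers in the prompt.
+    return llm(f"{context}Q: {problem}\nA:")
+```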
-**Tree of Thoughts reasoning** construct the steps of reasoning in a tree structure. This is particularly helpful when we need to break down a problem into steps, and further break down of each steps into more steps. **Graph of Thoughts** is a generalization of tree of thought structure, where each each contains the relation between each node. Graph of thoughts may be helpful for problems requiring intricate multifaceted resolutions.
+**Tree of Thoughts reasoning** constructs the steps of reasoning in a tree structure. This is particularly helpful when we need to break a problem down into steps, and further break each step down into more steps. **Graph of Thoughts** is a generalization of the tree-of-thought structure, where each edge captures the relation between the nodes it connects. Graph of thoughts may be helpful for problems requiring intricate, multifaceted resolutions.

@@ -76,7 +76,7 @@ Chain of Thought prompting induce the model to respond with step by step reasoni

 **Chain of Verification** corrects a response that may contain false information, by prompting the LLM to ask verification questions for the response. LLM may correct the false information by answering the verification questions. These answers will help LLM to generate a more accurate response for the original query.

-In addition to the specific techniques mentioned above, there also exists **Plug-ins** of ChatGPT such as Prompt Enhancer that automatically enhance the prompt for the user.
+In addition to the specific techniques mentioned above, there also exist ChatGPT **Plug-ins** such as Prompt Enhancer that automatically enhance the prompt for the user.

@@ -84,19 +84,11 @@ In addition to the specific techniques mentioned above, there also exists **Plug

 Benchmarking the prompt methods requires evaluating the quality of response from LLM, which can be performed by human or by other metrics.

-**Subjective evaluations** requires human evaluators, which has the following pros and cons
-Pros: Fluency, Accuracy, Novelty, and Relevance
-Cons: Inconsistency Problem, Expensive, Time Consuming
+**Subjective evaluations** require human evaluators, who can judge fluency, accuracy, novelty, and relevance; their disadvantages are the inconsistency problem and being expensive and time consuming.

-**Objective evaluations** relies on metrics to evaluate the response. Some examples includes
- - BLEU: BiLingual Evaluation Understudy
- - ROUGE: Recall-Oriented Understudy for Gisting Evaluation
- - METEOR: Metric for Evaluation of Translation with Explicit ORdering
- - BERTScore: BERT Model used for metric
+**Objective evaluations** rely on metrics to evaluate the response. Examples include BLEU (BiLingual Evaluation Understudy) and BERTScore, which uses a BERT model to compute the metric.

-Objective evaluations has the following pros and cons
-Pros: Automatic Evaluation, Cheap, Quick
-Cons: Alignment Problem
+Objective evaluations have the advantage of being automatic, cheap, and quick; their main drawback is the alignment problem.

 Evaluation results from InstructEval shows that in few shot settings, once the examples are specified, providing additional prompt harms the performance, while in zero shot settings, the expert written prompt improves performance.

@@ -116,7 +108,7 @@ Prompt engineering can help **Assessment in teaching and learning**, where tailo
-
+
 ### Long context prompting for Claude 2.1
+ https://www.anthropic.com/news/claude-2-1-prompting

@@ -176,7 +168,7 @@ The authors use parallel point expanding to achieve speed-up than normal decodin

 For the evaluation, we can assess it from various perspectives.

-- **Evaluation Process:** 
+- **Evaluation Process:**
   - Present a question and a pair of answers to an LLM judge.
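+This pairwise judging step can be sketched as follows (our own illustration, not code from the paper; `llm` is again a placeholder for a model call):
+
+```python
+# Hypothetical pairwise LLM-as-judge step; `llm` maps a prompt to a response.
+JUDGE_TEMPLATE = """You are an impartial judge. Given a question and two
+candidate answers, reply with exactly "A" or "B" for the better answer.
+
+Question: {question}
+
+Answer A: {answer_a}
+
+Answer B: {answer_b}
+
+Better answer:"""
+
+def judge_pair(llm, question: str, answer_a: str, answer_b: str) -> str:
+    verdict = llm(JUDGE_TEMPLATE.format(
+        question=question, answer_a=answer_a, answer_b=answer_b))
+    # Take the first character of the reply; fall back to "A" on malformed output.
+    first = verdict.strip()[:1].upper()
+    return first if first in ("A", "B") else "A"
+```
+
+In practice, evaluations of this kind are often run twice with the answer order swapped, to reduce the judge's position bias.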
@@ -256,7 +248,7 @@ In summary, some strong models have very high-quality answers that are hard to b

   - Ask the RoBERTa to **classify** if the SoT is suitable for the desired answer.

-## SoT-R – Evaluation 
+## SoT-R – Evaluation

 Based on the provided figures, we can understand:

@@ -294,3 +286,310 @@ Having thoroughly reviewed the paper, we've gained significant insights into the
 - **Eliciting or improving LLMs’ ability:**
   - Graph-of-Thoughts
+
+
+
+# Topologies of Reasoning: Demystifying Chains, Trees, and Graphs of Thoughts
+## Evolving into Chains of Thought
+In the exploration of reasoning and cognitive processes, the paper delves into the intricacies of how thoughts are structured, leading to the conceptualization of reasoning topologies. These topologies provide a framework for understanding the organization and flow of thoughts as individuals tackle various tasks.
+
+
+This figure presents an evolution of reasoning topologies in large language model (LLM) prompting methodologies, showing an increasing complexity in how LLMs process and generate output based on a given input.
+
+- **Input-Output (IO) prompting**: This is the most basic method, where an LLM provides a final reply immediately after receiving the initial prompt from the user, with no intermediate steps in the reasoning process.
+- **Chain of Thought (CoT)**: Introduced by Wei et al., this method improves upon IO by incorporating explicit intermediate steps of reasoning, known as "chains of thought," which lead to the final output.
+- **Chain-of-Thought with Self-Consistency (CoT-SC)**: Improving upon CoT, CoT-SC introduces several independent reasoning chains originating from the same initial input. The model then selects the best outcome from these final thoughts based on a predefined scoring function. The idea is to utilize the randomness within the LLM to generate multiple possible outcomes.
+- **Tree of Thoughts (ToT)**: This method further advances CoT by allowing branches at any point within the chain of thoughts. This branching allows for the exploration of different paths and options during the reasoning process. Each node in the tree represents a partial solution, and based on any given node, the thought generator can create a number of new nodes. Scores are then assigned to these new nodes either by an LLM or by human evaluation. The method of extending the tree is determined by the search algorithm used, such as Breadth-First Search (BFS) or Depth-First Search (DFS).
+- **Graph of Thoughts (GoT)**: GoT enables complex reasoning dependencies between generated thoughts, allowing any thought to generate multiple child thoughts and also to have multiple parent thoughts, forming an aggregation operation. This method incorporates both branching (where thoughts can generate multiple outcomes) and aggregation (where multiple thoughts can contribute to a single new thought).
+
+The progression of these topologies indicates a move from linear, single-step reasoning to complex, multi-step, and multi-path reasoning structures, improving the depth and robustness of the reasoning process within LLMs.
+
+### Thoughts and Reasoning Topologies
+
+**What is a Thought?**
+
+- In CoT, a thought refers to **a statement within a paragraph** that contains a **part of the reasoning process** aimed at **solving the input task**.
+- In ToT, in some tasks, such as Game of 24, a thought means **an intermediate or a final solution** to the **initial question**.
+- In GoT, a thought contains a **solution of the input task (or of its subtask)**.
+
+Therefore, the paper proposes a thought to be a "semantic unit of task resolution, i.e., a step in the process of solving a given task".
+
+**What is a Reasoning Topology?**
+
+The authors model thoughts as nodes; edges between nodes correspond to dependencies between these thoughts, and a topology can be defined as a graph G = (V, E).
+
+### Taxonomy of Reasoning Schemes
+
+**Topology Class**
+
+
+- This section presents three different classes of topological structures used to represent reasoning steps: Chain, Tree, and Graph.
+- **Chain:** Depicted as a linear sequence of nodes connected vertically from an "Input" node at the top to an "Output" node at the bottom, suggesting a step-by-step, sequential reasoning process.
+- **Tree:** Shown as a branching structure that starts with a single "Input" node which then divides into multiple pathways, eventually leading to one "Output" node. This illustrates a decision-making process that considers various paths or options before concluding.
+- **Graph:** Illustrated as a network of interconnected nodes with one "Input" node and one "Output" node. Unlike the chain or tree, the graph shows multiple connections between the nodes, indicating a complex reasoning process with interdependencies and possible loops.
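+To make the G = (V, E) formalization concrete, the following sketch (ours, not from the paper) encodes the three topology classes as node and edge sets:
+
+```python
+# Hypothetical encoding of a reasoning topology as G = (V, E): nodes are
+# thought identifiers and edges are dependencies between thoughts.
+from dataclasses import dataclass, field
+
+@dataclass
+class Topology:
+    nodes: set[str] = field(default_factory=set)
+    edges: set[tuple[str, str]] = field(default_factory=set)
+
+    def add_dependency(self, parent: str, child: str) -> None:
+        self.nodes.update({parent, child})
+        self.edges.add((parent, child))
+
+# Chain: every thought has exactly one successor.
+chain = Topology()
+for a, b in [("input", "t1"), ("t1", "t2"), ("t2", "output")]:
+    chain.add_dependency(a, b)
+
+# Tree: a thought may branch into several children.
+tree = Topology()
+for a, b in [("input", "t1"), ("input", "t2"), ("t1", "output")]:
+    tree.add_dependency(a, b)
+
+# Graph: a thought may also aggregate several parents (here t3 has two).
+graph = Topology()
+for a, b in [("input", "t1"), ("input", "t2"), ("t1", "t3"),
+             ("t2", "t3"), ("t3", "output")]:
+    graph.add_dependency(a, b)
+```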
+
+**Topology Scope**: "Can the topology extend beyond a single prompt?"
+
+- **Single-prompt**
+
+  - Describes a structure contained within a single prompt/reply interaction.
+
+  - The visual represents a tree topology where all reasoning nodes are part of one complete exchange, suggesting a condensed reasoning process that occurs in one step.
+
+- **Multi-prompt**
+
+  - Indicates that one prompt/reply can contain multiple reasoning nodes.
+
+  - The visual here expands the tree topology to show that individual prompts or replies may encompass multiple nodes, which implies a more extensive reasoning process involving several interactions.
+
+**Topology Representation**
+
+
+- The question is, "How is the topology structure represented?", indicating a focus on the manner in which the reasoning processes are visually and conceptually depicted.
+- **Tree Diagram**
+  - A tree diagram is shown with a root node labeled "0" at the top, branching out to nodes "1," "2," and "3," which further branch out to nodes "4" through "9". This diagram is a representation of the reasoning structure, likely meant to illustrate the hierarchical and branching nature of thought processes.
+
+- **Implicit vs. Explicit Representation**
+
+  - On the left, under the heading "Implicit," there is a statement suggesting a less direct method of describing the reasoning process: "The first preliminary solution should be enhanced three times. Each of these three enhanced solutions should be further augmented in two attempts."
+
+  - On the right, under the heading "Explicit," there is a more direct and detailed explanation of the connections between the nodes: "