
Commit

standardize the blog heading across posts...
qiyanjun committed Apr 28, 2024
1 parent 9fddfb7 commit 245bace
Showing 25 changed files with 118 additions and 16 deletions.
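
The edit pattern is the same in every post: drop the old ad-hoc top heading, insert an `<!--excerpt.start-->` marker, and restart the body with a heading prefixed by `Blog: `. A minimal sketch of how that standardization could be scripted over the `_contents/S0-L*.md` posts is shown below; the `standardize_heading` helper and the regex are illustrative assumptions, not tooling taken from this commit.

```python
import re
from pathlib import Path

# First top-level heading that is not already prefixed with "Blog: ".
HEADING = re.compile(r"^# (?!Blog: )(.+)$", flags=re.MULTILINE)

def standardize_heading(text: str) -> str:
    """Prefix the first top-level heading with 'Blog: ' and place an
    <!--excerpt.start--> marker just above it (sketch, not the actual tool)."""
    match = HEADING.search(text)
    if match is None:
        return text  # nothing to standardize
    new_heading = "<!--excerpt.start-->\n\n\n# Blog: " + match.group(1)
    return text[: match.start()] + new_heading + text[match.end():]

for post in sorted(Path("_contents").glob("S0-L*.md")):
    original = post.read_text(encoding="utf-8")
    updated = standardize_heading(original)
    if updated != original:
        post.write_text(updated, encoding="utf-8")
        print(f"standardized {post}")
```

The marker gives every post a uniform excerpt boundary, and the `Blog: ` prefix makes the post-level heading consistent across sessions.
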
7 changes: 6 additions & 1 deletion _contents/S0-L03.md
@@ -53,7 +53,12 @@ In this session, our readings cover:
+ https://arxiv.org/abs/2401.07103


# Evaluating Large Language Models<a id="evaluating-large-language-models"></a>


<!--excerpt.start-->


# Blog: Evaluating Large Language Models<a id="evaluating-large-language-models"></a>

## Section 1: Benchmarking in AI<a id="section-1-benchmarking-in-ai"></a>

5 changes: 4 additions & 1 deletion _contents/S0-L04.md
@@ -43,7 +43,10 @@ In this session, our readings cover:

<br /><br /><br />

# In this session, our blog covers:
<!--excerpt.start-->


# Blog: In this session, our blog covers:


## Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations
4 changes: 4 additions & 0 deletions _contents/S0-L05.md
@@ -50,6 +50,10 @@ In this session, our readings cover:
+ "explore goals defined in terms of (non-expert) human preferences between pairs of trajectory segments. We show that this approach can effectively solve complex RL tasks without access to the reward function" |


<!--excerpt.start-->


# Blog:
## Aligning Language Models with Human Preferences

### Human Alignment in LLM
5 changes: 4 additions & 1 deletion _contents/S0-L06.md
@@ -46,7 +46,10 @@ Language models (LMs) have become ubiquitous in both NLP research and in commerc
- https://arxiv.org/abs/2101.00027
- Recent work has demonstrated that increased training dataset diversity improves general cross-domain knowledge and downstream generalization capability for large-scale language models. With this in mind, we present \textit{the Pile}: an 825 GiB English text corpus targeted at training large-scale language models. The Pile is constructed from 22 diverse high-quality subsets -- both existing and newly constructed -- many of which derive from academic or professional sources. Our evaluation of the untuned performance of GPT-2 and GPT-3 on the Pile shows that these models struggle on many of its components, such as academic writing. Conversely, models trained on the Pile improve significantly over both Raw CC and CC-100 on all components of the Pile, while improving performance on downstream evaluations. Through an in-depth exploratory analysis, we document potentially concerning aspects of the data for prospective users. We make publicly available the code used in its construction.

# Section 1: The Pile
<!--excerpt.start-->


# Blog: Section 1: The Pile

In this section, we are going to introduce a paper: The pile, an open source dataset for diverse text for language modeling.

6 changes: 5 additions & 1 deletion _contents/S0-L07.md
@@ -68,9 +68,13 @@ In this session, our readings cover:
+ EU AI Act / GDPR



<br /><br /><br />

# AI Risk Framework Blog
<!--excerpt.start-->


# Blog: AI Risk Framework Blog

## Introduction and Background
+ Large language models have revolutionized natural language understanding and generation.
4 changes: 4 additions & 0 deletions _contents/S0-L08.md
@@ -63,6 +63,10 @@ https://aclanthology.org/2023.findings-acl.719/

<br /><br /><br />

<!--excerpt.start-->


# Blog:
## In this session, our blog covers papers related to foundation models copyright infringement, founding over five-fold topics.
1. Foundation Models and Fair Use
2. Copyright Plug-in Market for The Text-to-Image Copyright Protection
5 changes: 4 additions & 1 deletion _contents/S0-L09.md
@@ -43,7 +43,10 @@ The rapid advancement and widespread use of large language models (LLMs) have ra



# FM Privacy Leakage Issues<a id="fm-privacy-leakage-issues"></a>
<!--excerpt.start-->


# Blog: FM Privacy Leakage Issues<a id="fm-privacy-leakage-issues"></a>

## Section 1 Background and Introduction<a id="section-1-background-and-introduction"></a>

8 changes: 7 additions & 1 deletion _contents/S0-L10.md
@@ -43,7 +43,13 @@ In this session, our readings cover:
+ https://arxiv.org/abs/2308.10149
+ Large language models (LLMs) have shown powerful performance and development prospect and are widely deployed in the real world. However, LLMs can capture social biases from unprocessed training data and propagate the biases to downstream tasks. Unfair LLM systems have undesirable social impacts and potential harms. In this paper, we provide a comprehensive review of related research on fairness in LLMs. First, for medium-scale LLMs, we introduce evaluation metrics and debiasing methods from the perspectives of intrinsic bias and extrinsic bias, respectively. Then, for large-scale LLMs, we introduce recent fairness research, including fairness evaluation, reasons for bias, and debiasing methods. Finally, we discuss and provide insight on the challenges and future directions for the development of fairness in LLMs.

# In this session, our blog covers:



<!--excerpt.start-->


# Blog: In this session, our blog covers:
## Bias and Fairness in Large Language Model

### 1 &nbsp; &nbsp; Formal Definition of Bias and Fairness (LLM context)
4 changes: 4 additions & 0 deletions _contents/S0-L11.md
@@ -49,6 +49,10 @@ In this session, our readings cover:
+ https://arxiv.org/abs/2310.09624


<!--excerpt.start-->


# Blog:
## HarmBench

### Background
5 changes: 4 additions & 1 deletion _contents/S0-L12.md
@@ -52,7 +52,10 @@ In this session, our readings cover:

<br /><br /><br />

# LLM Multimodal/Multilingual Harm Responses Blog
<!--excerpt.start-->


# Blog: LLM Multimodal/Multilingual Harm Responses Blog

## A Pilot Study of Query-Free Adversarial Attack against Stable Diffusion
Section based on the [paper of the same name](https://ieeexplore.ieee.org/document/10208563)
3 changes: 3 additions & 0 deletions _contents/S0-L13.md
@@ -62,7 +62,10 @@ In this session, our readings cover:
+ https://www.csis.org/analysis/managing-existential-risk-ai-without-undercutting-innovation


<!--excerpt.start-->


# Blog:
## FM Risk
In this blog, we will cover FM risks of large language model (LLM). In context of LLM, Feature Mimicking (FM) risk refers to the vulnerability of Language Model-based AI systems to adversarial attacks that exploit mimicry of specific features in the input data. It is important to understand and mitigate FM Risk because it ensures the robustness and reliability of Language Models in various applications (e.g., sentiment analysis, content generation, etc,). In this blog post, we present three recent works: $(i)$ On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?, $(ii)$ Low-Resource Languages Jailbreak GPT-4, and $(iii)$ A Survey of Safety and Trustworthiness of Large Language Models through the Lens of Verification and Validation.

5 changes: 4 additions & 1 deletion _contents/S0-L14.md
@@ -48,7 +48,10 @@ In this session, our readings cover:
- https://arxiv.org/abs/2402.08259
- Table reasoning, which aims to generate the corresponding answer to the question following the user requirement according to the provided table, and optionally a text description of the table, effectively improving the efficiency of obtaining information. Recently, using Large Language Models (LLMs) has become the mainstream method for table reasoning, because it not only significantly reduces the annotation cost but also exceeds the performance of previous methods. However, existing research still lacks a summary of LLM-based table reasoning works. Due to the existing lack of research, questions about which techniques can improve table reasoning performance in the era of LLMs, why LLMs excel at table reasoning, and how to enhance table reasoning abilities in the future, remain largely unexplored. This gap significantly limits progress in research. To answer the above questions and advance table reasoning research with LLMs, we present this survey to analyze existing research, inspiring future work. In this paper, we analyze the mainstream techniques used to improve table reasoning performance in the LLM era, and the advantages of LLMs compared to pre-LLMs for solving table reasoning. We provide research directions from both the improvement of existing methods and the expansion of practical applications to inspire future research.

# Retrieval-Augmented Generation for​ AI-Generated Content: A Survey​
<!--excerpt.start-->


# Blog: Retrieval-Augmented Generation for​ AI-Generated Content: A Survey​

### Motivation and the RAG Process
Artificial Intelligence Generated Content(AIGC) refers to the texts and code generated by Large Language Model, the images generated by DALL-E and Stable-Diffusion, and video generated by Sora. Besides the recent success of AIGC, it continues to face a number of challenges. For example, it is difficult to maintain up-to-date knowledge for these models, because model training is required in order for the model to generate answers based on new knowledge. In addition, these models suffer from the inability to provide long-tail knowledge, and they are at risk of leaking private training data. Retrieval-Augmented Generation(RAG) serves as a mitigation to these problems, because it has an adaptive data repository. With such data repository, when the new knowledge or long-tail knowledge is included, or when the sensitive private data is encoded, the above challenge can be straightforwardly allievated.
5 changes: 4 additions & 1 deletion _contents/S0-L15.md
@@ -50,7 +50,10 @@ In this session, our readings cover:



# LLM Hallucination<a id="llm-hallucination"></a>
<!--excerpt.start-->


# Blog: LLM Hallucination<a id="llm-hallucination"></a>



5 changes: 4 additions & 1 deletion _contents/S0-L16.md
@@ -77,7 +77,10 @@ In this session, our readings cover:



# In this session, our blog covers:
<!--excerpt.start-->


# Blog: In this session, our blog covers:
## Large Language Models for Software Engineering: A Systematic Literature Review

### 1 &nbsp; &nbsp; Overview
6 changes: 6 additions & 0 deletions _contents/S0-L17.md
@@ -44,6 +44,12 @@ Comments: EMNLP 2023. Updated with new experiments
+ Alessandro Achille, Michael Kearns, Carson Klingenberg, Stefano Soatto
Responsible use of data is an indispensable part of any machine learning (ML) implementation. ML developers must carefully collect and curate their datasets, and document their provenance. They must also make sure to respect intellectual property rights, preserve individual privacy, and use data in an ethical way. Over the past few years, ML models have significantly increased in size and complexity. These models require a very large amount of data and compute capacity to train, to the extent that any defects in the training corpus cannot be trivially remedied by retraining the model from scratch. Despite sophisticated controls on training data and a significant amount of effort dedicated to ensuring that training corpora are properly composed, the sheer volume of data required for the models makes it challenging to manually inspect each datum comprising a training corpus. One potential fix for training corpus data defects is model disgorgement -- the elimination of not just the improperly used data, but also the effects of improperly used data on any component of an ML model. Model disgorgement techniques can be used to address a wide range of issues, such as reducing bias or toxicity, increasing fidelity, and ensuring responsible usage of intellectual property. In this paper, we introduce a taxonomy of possible disgorgement methods that are applicable to modern ML systems. In particular, we investigate the meaning of "removing the effects" of data in the trained model in a way that does not require retraining from scratch.



<!--excerpt.start-->


# Blog:
### Outline
<img src="{{ site.baseurl }}/Lectures/S0-L17/Slide2.PNG" width="80%" height="80%">

5 changes: 4 additions & 1 deletion _contents/S0-L18.md
@@ -67,7 +67,10 @@ Mechanistic interpretability takes a bottom-up approach to understanding ML mode
+ https://openai.com/research/language-models-can-explain-neurons-in-language-models
+ Language models have become more capable and more widely deployed, but we do not understand how they work. Recent work has made progress on understanding a small number of circuits and narrow behaviors,[1][2] but to fully understand a language model, we'll need to analyze millions of neurons. This paper applies automation to the problem of scaling an interpretability technique to all the neurons in a large language model. Our hope is that building on this approach of automating interpretability [3][4][5] will enable us to comprehensively audit the safety of models before deployment.

# Session Blog
<!--excerpt.start-->


# Blog: Session Blog
## Rethinking Interpretability in the Era of Large Language Models
Section based on the paper [Rethinking Interpretability in the Era of Large Language Models](https://arxiv.org/abs/2402.01761)
+ In traditional ML interpretability,
5 changes: 4 additions & 1 deletion _contents/S0-L19.md
@@ -51,7 +51,10 @@ In this session, our readings cover:
+ Large language models are trained in two stages: (1) unsupervised pretraining from raw text, to learn general-purpose representations, and (2) large scale instruction tuning and reinforcement learning, to better align to end tasks and user preferences. We measure the relative importance of these two stages by training LIMA, a 65B parameter LLaMa language model fine-tuned with the standard supervised loss on only 1,000 carefully curated prompts and responses, without any reinforcement learning or human preference modeling. LIMA demonstrates remarkably strong performance, learning to follow specific response formats from only a handful of examples in the training data, including complex queries that range from planning trip itineraries to speculating about alternate history. Moreover, the model tends to generalize well to unseen tasks that did not appear in the training data. In a controlled human study, responses from LIMA are either equivalent or strictly preferred to GPT-4 in 43% of cases; this statistic is as high as 58% when compared to Bard and 65% versus DaVinci003, which was trained with human feedback. Taken together, these results strongly suggest that almost all knowledge in large language models is learned during pretraining, and only limited instruction tuning data is necessary to teach models to produce high quality output.


# Blog Start
<!--excerpt.start-->


# Blog: Blog Start

### Paper 1: Efficient Large Language Models: A Survey

5 changes: 4 additions & 1 deletion _contents/S0-L20.md
@@ -33,7 +33,10 @@ In this session, our readings cover:
+ The field of natural language processing (NLP) has witnessed significant progress in recent years, with a notable focus on improving large language models' (LLM) performance through innovative prompting techniques. Among these, prompt engineering coupled with structures has emerged as a promising paradigm, with designs such as Chain-of-Thought, Tree of Thoughts, or Graph of Thoughts, in which the overall LLM reasoning is guided by a structure such as a graph. As illustrated with numerous examples, this paradigm significantly enhances the LLM's capability to solve numerous tasks, ranging from logical or mathematical reasoning to planning or creative writing. To facilitate the understanding of this growing field and pave the way for future developments, we devise a general blueprint for effective and efficient LLM reasoning schemes. For this, we conduct an in-depth analysis of the prompt execution pipeline, clarifying and clearly defining different concepts. We then build the first taxonomy of structure-enhanced LLM reasoning schemes. We focus on identifying fundamental classes of harnessed structures, and we analyze the representations of these structures, algorithms executed with these structures, and many others. We refer to these structures as reasoning topologies, because their representation becomes to a degree spatial, as they are contained within the LLM context. Our study compares existing prompting schemes using the proposed taxonomy, discussing how certain design choices lead to different patterns in performance and cost. We also outline theoretical underpinnings, relationships between prompting and others parts of the LLM ecosystem such as knowledge bases, and the associated research challenges. Our work will help to advance future prompt engineering techniques.


# Unleashing the potential of prompt engineering in Large Language Models: a comprehensive review
<!--excerpt.start-->


# Blog: Unleashing the potential of prompt engineering in Large Language Models: a comprehensive review

### Introduction
Models that are built on Large Language Model (LLM) as the backbone are capable of extracting meaningful information that can assist medical diagnosis or creating engaging contents. These models are also referred to as Artificial Intelligence-Generated Content (AIGC). Once the AIGC model is trained, by changing the way we compose the prompts as input to the model, the quality of the model's output can change. In this paper, we focus on techniques of engineering the prompts to achieve higher quality model output from the same AIGC model.
5 changes: 4 additions & 1 deletion _contents/S0-L21.md
@@ -70,7 +70,10 @@ Comments: ACL 2023 Findings, 15 pages
+ Orca 1 learns from rich signals, such as explanation traces, allowing it to outperform conventional instruction-tuned models on benchmarks like BigBench Hard and AGIEval. In Orca 2, we continue exploring how improved training signals can enhance smaller LMs' reasoning abilities. Research on training small LMs has often relied on imitation learning to replicate the output of more capable models. We contend that excessive emphasis on imitation may restrict the potential of smaller models. We seek to teach small LMs to employ different solution strategies for different tasks, potentially different from the one used by the larger model. For example, while larger models might provide a direct answer to a complex task, smaller models may not have the same capacity. In Orca 2, we teach the model various reasoning techniques (step-by-step, recall then generate, recall-reason-generate, direct answer, etc.). More crucially, we aim to help the model learn to determine the most effective solution strategy for each task. We evaluate Orca 2 using a comprehensive set of 15 diverse benchmarks (corresponding to approximately 100 tasks and over 36,000 unique prompts). Orca 2 significantly surpasses models of similar size and attains performance levels similar or better to those of models 5-10x larger, as assessed on complex tasks that test advanced reasoning abilities in zero-shot settings. make Orca 2 weights publicly available at this http URL to support research on the development, evaluation, and alignment of smaller LMs


# Self-Exam LLM and Reasoning<a id="self-exam-llm-and-reasoning"></a>
<!--excerpt.start-->


# Blog: Self-Exam LLM and Reasoning<a id="self-exam-llm-and-reasoning"></a>

## Self-Consistency Improves Chain of Thought Reasoning in Language Models<a id="self-consistency-improves-chain-of-thought-reasoning-in-language-models"></a>

5 changes: 4 additions & 1 deletion _contents/S0-L22.md
@@ -52,7 +52,10 @@ categories:
+ https://huggingface.co/blog/dialog-agents


# In this session, our blog covers:
<!--excerpt.start-->


# Blog: In this session, our blog covers:

## Position Paper: Agent AI Towards a Holistic Intelligence
### 1 &nbsp; &nbsp; Introduction
5 changes: 5 additions & 0 deletions _contents/S0-L23.md
@@ -53,6 +53,11 @@ In this session, our readings cover:
+ Ilija Radosavovic, Bike Zhang, Baifeng Shi, Jathushan Rajasegaran, Sarthak Kamat, Trevor Darrell, Koushil Sreenath, Jitendra Malik
+ We cast real-world humanoid control as a next token prediction problem, akin to predicting the next word in language. Our model is a causal transformer trained via autoregressive prediction of sensorimotor trajectories. To account for the multi-modal nature of the data, we perform prediction in a modality-aligned way, and for each input token predict the next token from the same modality. This general formulation enables us to leverage data with missing modalities, like video trajectories without actions. We train our model on a collection of simulated trajectories coming from prior neural network policies, model-based controllers, motion capture data, and YouTube videos of humans. We show that our model enables a full-sized humanoid to walk in San Francisco zero-shot. Our model can transfer to the real world even when trained on only 27 hours of walking data, and can generalize to commands not seen during training like walking backward. These findings suggest a promising path toward learning challenging real-world control tasks by generative modeling of sensorimotor trajectories.


<!--excerpt.start-->


# Blog:
### Outline
<img src="{{ site.baseurl }}/Lectures/S0-L23/images/Slide3.png" width="80%" height="80%">

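
After the change, every updated post should contain both the `<!--excerpt.start-->` marker and a `Blog:`-prefixed heading. A quick hypothetical check (an assumption for illustration, not part of the commit) could confirm this:

```python
from pathlib import Path

# Hypothetical sanity check: report posts that still lack the standardized heading.
for post in sorted(Path("_contents").glob("S0-L*.md")):
    text = post.read_text(encoding="utf-8")
    has_marker = "<!--excerpt.start-->" in text
    has_heading = "# Blog:" in text
    if not (has_marker and has_heading):
        print(f"{post}: marker={has_marker}, heading={has_heading}")
```
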
