Skip to content

Commit

Permalink
adding more readings
Browse files Browse the repository at this point in the history
  • Loading branch information
qiyanjun committed Feb 6, 2024
1 parent bdc1726 commit 06463c0
Show file tree
Hide file tree
Showing 8 changed files with 83 additions and 41 deletions.
24 changes: 15 additions & 9 deletions _contents/S0-L06.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
layout: post
title: Open Source LLM - Mistral Data preparation
lecture: S0-Intro
lectureVersion: next
lectureVersion: current
extraContent:
notes: team-4
video: team-6
Expand All @@ -20,19 +20,25 @@ In this session, our readings cover:


## More Readings:
### - Llama 2: Open Foundation and Fine-Tuned Chat Models
+ In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, may be a suitable substitute for closed-source models. We provide a detailed description of our approach to fine-tuning and safety improvements of Llama 2-Chat in order to enable the community to build on our work and contribute to the responsible development of LLMs.

### The Pile: An 800GB Dataset of Diverse Text for Language Modeling
+ https://arxiv.org/abs/2101.00027
+ Recent work has demonstrated that increased training dataset diversity improves general cross-domain knowledge and downstream generalization capability for large-scale language models. With this in mind, we present \textit{the Pile}: an 825 GiB English text corpus targeted at training large-scale language models. The Pile is constructed from 22 diverse high-quality subsets -- both existing and newly constructed -- many of which derive from academic or professional sources. Our evaluation of the untuned performance of GPT-2 and GPT-3 on the Pile shows that these models struggle on many of its components, such as academic writing. Conversely, models trained on the Pile improve significantly over both Raw CC and CC-100 on all components of the Pile, while improving performance on downstream evaluations. Through an in-depth exploratory analysis, we document potentially concerning aspects of the data for prospective users. We make publicly available the code used in its construction.



### OLMo: Accelerating the Science of Language Models
+ https://arxiv.org/abs/2402.00838

Language models (LMs) have become ubiquitous in both NLP research and in commercial product offerings. As their commercial importance has surged, the most powerful models have become closed off, gated behind proprietary interfaces, with important details of their training data, architectures, and development undisclosed. Given the importance of these details in scientifically studying these models, including their biases and potential risks, we believe it is essential for the research community to have access to powerful, truly open LMs. To this end, this technical report details the first release of OLMo, a state-of-the-art, truly Open Language Model and its framework to build and study the science of language modeling. Unlike most prior efforts that have only released model weights and inference code, we release OLMo and the whole framework, including training data and training and evaluation code. We hope this release will empower and strengthen the open research community and inspire a new wave of innovation.

### Mixtral of Experts
+ https://arxiv.org/abs/2401.04088
+ We introduce Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model. Mixtral has the same architecture as Mistral 7B, with the difference that each layer is composed of 8 feedforward blocks (i.e. experts). For every token, at each layer, a router network selects two experts to process the current state and combine their outputs. Even though each token only sees two experts, the selected experts can be different at each timestep. As a result, each token has access to 47B parameters, but only uses 13B active parameters during inference. Mixtral was trained with a context size of 32k tokens and it outperforms or matches Llama 2 70B and GPT-3.5 across all evaluated benchmarks. In particular, Mixtral vastly outperforms Llama 2 70B on mathematics, code generation, and multilingual benchmarks. We also provide a model fine-tuned to follow instructions, Mixtral 8x7B - Instruct, that surpasses GPT-3.5 Turbo, Claude-2.1, Gemini Pro, and Llama 2 70B - chat model on human benchmarks. Both the base and instruct models are released under the Apache 2.0 license.




### - Llama 2: Open Foundation and Fine-Tuned Chat Models
+ In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, may be a suitable substitute for closed-source models. We provide a detailed description of our approach to fine-tuning and safety improvements of Llama 2-Chat in order to enable the community to build on our work and contribute to the responsible development of LLMs.

### The Pile: An 800GB Dataset of Diverse Text for Language Modeling
+ https://arxiv.org/abs/2101.00027
+ Recent work has demonstrated that increased training dataset diversity improves general cross-domain knowledge and downstream generalization capability for large-scale language models. With this in mind, we present \textit{the Pile}: an 825 GiB English text corpus targeted at training large-scale language models. The Pile is constructed from 22 diverse high-quality subsets -- both existing and newly constructed -- many of which derive from academic or professional sources. Our evaluation of the untuned performance of GPT-2 and GPT-3 on the Pile shows that these models struggle on many of its components, such as academic writing. Conversely, models trained on the Pile improve significantly over both Raw CC and CC-100 on all components of the Pile, while improving performance on downstream evaluations. Through an in-depth exploratory analysis, we document potentially concerning aspects of the data for prospective users. We make publicly available the code used in its construction.



16 changes: 10 additions & 6 deletions _contents/S0-L07.md
Original file line number Diff line number Diff line change
Expand Up @@ -23,17 +23,23 @@ In this session, our readings cover:

## More Readings:

### Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs through a Global Scale Prompt Hacking Competition
+ https://arxiv.org/abs/2311.16119
+ Large Language Models (LLMs) are deployed in interactive contexts with direct user engagement, such as chatbots and writing assistants. These deployments are vulnerable to prompt injection and jailbreaking (collectively, prompt hacking), in which models are manipulated to ignore their original instructions and follow potentially malicious ones. Although widely acknowledged as a significant security threat, there is a dearth of large-scale resources and quantitative studies on prompt hacking. To address this lacuna, we launch a global prompt hacking competition, which allows for free-form human input attacks. We elicit 600K+ adversarial prompts against three state-of-the-art LLMs. We describe the dataset, which empirically verifies that current LLMs can indeed be manipulated via prompt hacking. We also present a comprehensive taxonomical ontology of the types of adversarial prompts.


### Even More:

### ACL 2024 Tutorial: Vulnerabilities of Large Language Models to Adversarial Attacks

### https://llm-vulnerability.github.io/
+ https://llm-vulnerability.github.io/


### Generative AI and ChatGPT: Applications, challenges, and AI-human collaboration
+ https://www.tandfonline.com/doi/full/10.1080/15228053.2023.2233814
+


### https://huggingface.co/blog?tag=ethics
+ https://huggingface.co/blog?tag=ethics
+ https://huggingface.co/blog/ethics-diffusers
+ https://huggingface.co/blog/model-cards
+ https://huggingface.co/blog/us-national-ai-research-resource
Expand All @@ -43,9 +49,7 @@ In this session, our readings cover:
+ https://www.nist.gov/itl/ai-risk-management-framework
+ https://airc.nist.gov/AI_RMF_Knowledge_Base/Playbook
+ https://airc.nist.gov/AI_RMF_Knowledge_Base/Roadmap
+

### EU AI Act / GDPR
+ EU AI Act / GDPR



14 changes: 12 additions & 2 deletions _contents/S0-L08.md
Original file line number Diff line number Diff line change
Expand Up @@ -20,9 +20,19 @@ In this session, our readings cover:

## More Readings:

### Copyright Plug-in Market for The Text-to-Image Copyright Protection
+ https://openreview.net/forum?id=pSf8rrn49H

### Audio Deepfake Detection: A Survey
+ https://arxiv.org/abs/2308.14970
+ Audio deepfake detection is an emerging active topic. A growing number of literatures have aimed to study deepfake detection algorithms and achieved effective performance, the problem of which is far from being solved. Although there are some review literatures, there has been no comprehensive survey that provides researchers with a systematic overview of these developments with a unified evaluation. Accordingly, in this survey paper, we first highlight the key differences across various types of deepfake audio, then outline and analyse competitions, datasets, features, classifications, and evaluation of state-of-the-art approaches. For each aspect, the basic techniques, advanced developments and major challenges are discussed. In addition, we perform a unified comparison of representative features and classifiers on ASVspoof 2021, ADD 2023 and In-the-Wild datasets for audio deepfake detection, respectively. The survey shows that future research should address the lack of large scale datasets in the wild, poor generalization of existing detection methods to unknown fake attacks, as well as interpretability of detection results.

### Membership Inference Attacks against Language Models via Neighbourhood Comparison
https://aclanthology.org/2023.findings-acl.719/

### Copyright Plug-in Market for The Text-to-Image Copyright Protection
https://openreview.net/forum?id=pSf8rrn49H

### Deepfake Taylor Swift event:
+ https://www.cbsnews.com/news/taylor-swift-artificial-intellignence-ai-4chan/



16 changes: 11 additions & 5 deletions _contents/S0-L09.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,20 +14,26 @@ In this session, our readings cover:

## Required Readings:

### Are Large Pre-Trained Language Models Leaking Your Personal Information?
+ https://arxiv.org/abs/2205.12628
+ Jie Huang, Hanyin Shao, Kevin Chen-Chuan Chang
Are Large Pre-Trained Language Models Leaking Your Personal Information? In this paper, we analyze whether Pre-Trained Language Models (PLMs) are prone to leaking personal information. Specifically, we query PLMs for email addresses with contexts of the email address or prompts containing the owner's name. We find that PLMs do leak personal information due to memorization. However, since the models are weak at association, the risk of specific personal information being extracted by attackers is low. We hope this work could help the community to better understand the privacy risk of PLMs and bring new insights to make PLMs safe.

### Privacy Risks of General-Purpose Language Models
+ https://ieeexplore.ieee.org/abstract/document/9152761

+ We find the text embeddings from general-purpose language models would capture much sensitive information from the plain text. Once being accessed by the adversary, the embeddings can be reverse-engineered to disclose sensitive information of the victims for further harassment. Although such a privacy risk can impose a real threat to the future leverage of these promising NLP tools, there are neither published attacks nor systematic evaluations by far for the mainstream industry-level language models. To bridge this gap, we present the first systematic study on the privacy risks of 8 state-of-the-art language models with 4 diverse case studies. By constructing 2 novel attack classes, our study demonstrates the aforementioned privacy risks do exist and can impose practical threats to the application of general-purpose language models on sensitive data covering identity, genome, healthcare and location. For example, we show the adversary with nearly no prior knowledge can achieve about 75% accuracy when inferring the precise disease site from Bert embeddings of patients’ medical descriptions. As possible countermeasures, we propose 4 different defenses (via rounding, different...

## More Readings:

### Privacy in Large Language Models: Attacks, Defenses and Future Directions
+ https://arxiv.org/abs/2310.10383
+ The advancement of large language models (LLMs) has significantly enhanced the ability to effectively tackle various downstream NLP tasks and unify these tasks into generative pipelines. On the one hand, powerful language models, trained on massive textual data, have brought unparalleled accessibility and usability for both models and users. On the other hand, unrestricted access to these models can also introduce potential malicious and unintentional privacy risks. Despite ongoing efforts to address the safety and privacy concerns associated with LLMs, the problem remains unresolved. In this paper, we provide a comprehensive analysis of the current privacy attacks targeting LLMs and categorize them according to the adversary's assumed capabilities to shed light on the potential vulnerabilities present in LLMs. Then, we present a detailed overview of prominent defense strategies that have been developed to counter these privacy attacks. Beyond existing works, we identify upcoming privacy concerns as LLMs evolve. Lastly, we point out several potential avenues for future exploration.

### ProPILE: Probing Privacy Leakage in Large Language Models
+ https://arxiv.org/abs/2307.01881
+ Siwon Kim, Sangdoo Yun, Hwaran Lee, Martin Gubri, Sungroh Yoon, Seong Joon Oh
The rapid advancement and widespread use of large language models (LLMs) have raised significant concerns regarding the potential leakage of personally identifiable information (PII). These models are often trained on vast quantities of web-collected data, which may inadvertently include sensitive personal data. This paper presents ProPILE, a novel probing tool designed to empower data subjects, or the owners of the PII, with awareness of potential PII leakage in LLM-based services. ProPILE lets data subjects formulate prompts based on their own PII to evaluate the level of privacy intrusion in LLMs. We demonstrate its application on the OPT-1.3B model trained on the publicly available Pile dataset. We show how hypothetical data subjects may assess the likelihood of their PII being included in the Pile dataset being revealed. ProPILE can also be leveraged by LLM service providers to effectively evaluate their own levels of PII leakage with more powerful prompts specifically tuned for their in-house models. This tool represents a pioneering step towards empowering the data subjects for their awareness and control over their own data on the web.

### Are Large Pre-Trained Language Models Leaking Your Personal Information?
+ https://arxiv.org/abs/2205.12628
+ Jie Huang, Hanyin Shao, Kevin Chen-Chuan Chang
Are Large Pre-Trained Language Models Leaking Your Personal Information? In this paper, we analyze whether Pre-Trained Language Models (PLMs) are prone to leaking personal information. Specifically, we query PLMs for email addresses with contexts of the email address or prompts containing the owner's name. We find that PLMs do leak personal information due to memorization. However, since the models are weak at association, the risk of specific personal information being extracted by attackers is low. We hope this work could help the community to better understand the privacy risk of PLMs and bring new insights to make PLMs safe.



16 changes: 13 additions & 3 deletions _contents/S0-L11.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,11 +15,14 @@ In this session, our readings cover:
## Required Readings:


### A Survey of Safety and Trustworthiness of Large Language Models through the Lens of Verification and Validation
+ https://arxiv.org/abs/2305.11391
+ https://huggingface.co/blog/red-teaming
### Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!
+ https://arxiv.org/abs/2310.03693


### Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
+ https://www.anthropic.com/news/sleeper-agents-training-deceptive-llms-that-persist-through-safety-training
+ Humans are capable of strategically deceptive behavior: behaving helpfully in most situations, but then behaving very differently in order to pursue alternative objectives when given the opportunity. If an AI system learned such a deceptive strategy, could we detect it and remove it using current state-of-the-art safety training techniques? To study this question, we construct proof-of-concept examples of deceptive behavior in large language models (LLMs). For example, we train models that write secure code when the prompt states that the year is 2023, but insert exploitable code when the stated year is 2024. We find that such backdoor behavior can be made persistent, so that it is not removed by standard safety training techniques, including supervised fine-tuning, reinforcement learning, and adversarial training (eliciting unsafe behavior and then training to remove it). The backdoor behavior is most persistent in the largest models and in models trained to produce chain-of-thought reasoning about deceiving the training process, with the persistence remaining even when the chain-of-thought is distilled away. Furthermore, rather than removing backdoors, we find that adversarial training can teach models to better recognize their backdoor triggers, effectively hiding the unsafe behavior. Our results suggest that, once a model exhibits deceptive behavior, standard techniques could fail to remove such deception and create a false impression of safety.

## More Readings:

### SafeText: A Benchmark for Exploring Physical Safety in Language Models
Expand All @@ -29,6 +32,12 @@ In this session, our readings cover:
### ToxicChat: Unveiling Hidden Challenges of Toxicity Detection in Real-World User-AI Conversation / EMNLP2023



### A Survey of Safety and Trustworthiness of Large Language Models through the Lens of Verification and Validation
+ https://arxiv.org/abs/2305.11391
+ https://huggingface.co/blog/red-teaming


### Lessons learned on language model safety and misuse
+ https://openai.com/research/language-model-safety-and-misuse

Expand All @@ -37,6 +46,7 @@ In this session, our readings cover:




### Tracing Model Outputs to the Training Data
+ https://www.anthropic.com/news/influence-functions

Expand Down
Loading

0 comments on commit 06463c0

Please sign in to comment.