
Commit

deploy: 18e6c54
MikeySaw committed May 13, 2024
1 parent fb0ad88 commit c6c7b9f
Showing 6 changed files with 3 additions and 115 deletions.
12 changes: 0 additions & 12 deletions chapters/09_rlhf/index.html
@@ -86,18 +86,6 @@ <h3 id="additional-resources">Additional Resources</h3>
<div class="chapter_overview">
<ul class="list-unstyled">


<li>
<a class="title" href="/dl4nlp/chapters/09_rlhf/rlhf/">Chapter 9.1: RLHF</a>


<p>Here we cover the basics of RLHF and its related application.
</p>


</li>


</ul>
</div>

2 changes: 1 addition & 1 deletion chapters/09_rlhf/index.xml
@@ -1 +1 @@
<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Chapter 9: Reinforcement Learning from Human Feedback (RLHF) on Deep Learning for Natural Language Processing (DL4NLP)</title><link>https://slds-lmu.github.io/dl4nlp/chapters/09_rlhf/</link><description>Recent content in Chapter 9: Reinforcement Learning from Human Feedback (RLHF) on Deep Learning for Natural Language Processing (DL4NLP)</description><generator>Hugo</generator><language>en-us</language><atom:link href="https://slds-lmu.github.io/dl4nlp/chapters/09_rlhf/index.xml" rel="self" type="application/rss+xml"/><item><title>Chapter 9.1: RLHF</title><link>https://slds-lmu.github.io/dl4nlp/chapters/09_rlhf/rlhf/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://slds-lmu.github.io/dl4nlp/chapters/09_rlhf/rlhf/</guid><description>&lt;p>Here we cover the basics of RLHF and its related application.&lt;/p></description></item></channel></rss>
<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Chapter 9: Reinforcement Learning from Human Feedback (RLHF) on Deep Learning for Natural Language Processing (DL4NLP)</title><link>https://slds-lmu.github.io/dl4nlp/chapters/09_rlhf/</link><description>Recent content in Chapter 9: Reinforcement Learning from Human Feedback (RLHF) on Deep Learning for Natural Language Processing (DL4NLP)</description><generator>Hugo</generator><language>en-us</language><atom:link href="https://slds-lmu.github.io/dl4nlp/chapters/09_rlhf/index.xml" rel="self" type="application/rss+xml"/></channel></rss>
94 changes: 0 additions & 94 deletions chapters/09_rlhf/rlhf/index.html

This file was deleted.

6 changes: 0 additions & 6 deletions index.html
@@ -232,12 +232,6 @@ <h1>Deep Learning for NLP (DL4NLP)</h1>

<li><a class="title" href="/dl4nlp/chapters/09_rlhf/">Chapter 9: Reinforcement Learning from Human Feedback (RLHF)</a>

<ul>

<li><a class="title" href="/dl4nlp/chapters/09_rlhf/rlhf/">Chapter 9.1: RLHF</a></li>

</ul>

</li>

</ul>
2 changes: 1 addition & 1 deletion index.xml
@@ -26,4 +26,4 @@ For XLNet, the basic idea is to overcome the limitations of unidirectional and b
This approach addresses shortcomings of BERT&amp;rsquo;s original design, where different tasks required different output layers and training objectives, leading to a complex multitask learning setup. By unifying tasks under a single text-to-text framework, models can be trained more efficiently and generalize better across diverse tasks and domains.&lt;/p></description></item><item><title>Chapter 06.03: Text-to-Text Transfer Transformer</title><link>https://slds-lmu.github.io/dl4nlp/chapters/06_post_bert_t5/06_03_t5/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://slds-lmu.github.io/dl4nlp/chapters/06_post_bert_t5/06_03_t5/</guid><description>&lt;p>T5 (Text-To-Text Transfer Transformer) [1] aims to unify various natural language processing tasks by framing them all as text-to-text transformations, simplifying model architectures and enabling flexible training across diverse tasks.
It achieves this by formulating input-output pairs for different tasks as text sequences, allowing the model to learn to generate target text from source text regardless of the specific task, facilitating multitask learning and transfer learning across tasks with a single, unified architecture.&lt;/p></description></item><item><title>Chapter 07.01: GPT-1 (2018)</title><link>https://slds-lmu.github.io/dl4nlp/chapters/07_gpt/07_01_gpt/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://slds-lmu.github.io/dl4nlp/chapters/07_gpt/07_01_gpt/</guid><description>&lt;p>GPT-1 [1] introduces a novel approach to natural language processing by employing a generative transformer architecture pre-trained on a vast corpus of text data, where task-specific input transformations are performed to adapt the model to different tasks.
By fine-tuning the model on task-specific data with minimal changes to the architecture, GPT-1 demonstrates the effectiveness of transfer learning and showcases the potential of generative transformers in a wide range of natural language understanding and generation tasks.&lt;/p></description></item><item><title>Chapter 07.02: GPT-2 (2019)</title><link>https://slds-lmu.github.io/dl4nlp/chapters/07_gpt/07_02_gpt2/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://slds-lmu.github.io/dl4nlp/chapters/07_gpt/07_02_gpt2/</guid><description>&lt;p>GPT-2 [1] builds upon its predecessor with a larger model size, more training data, and improved architecture. Like GPT-1, GPT-2 utilizes a generative transformer architecture but features a significantly increased number of parameters, leading to enhanced performance in language understanding and generation tasks. Additionally, GPT-2 introduces a scaled-up version of the training data and fine-tuning techniques to further refine its language capabilities.&lt;/p></description></item><item><title>Chapter 07.03: GPT-3 (2020) &amp; X-shot learning</title><link>https://slds-lmu.github.io/dl4nlp/chapters/07_gpt/07_03_gpt3xshot/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://slds-lmu.github.io/dl4nlp/chapters/07_gpt/07_03_gpt3xshot/</guid><description>&lt;p>In this chapter, we&amp;rsquo;ll explore GPT-3 [1]. GPT-3 builds on the successes of its predecessors, boasting a massive architecture and extensive pre-training on diverse text data. Unlike previous models, GPT-3 introduces a few-shot learning approach, allowing it to perform tasks with minimal task-specific training data. 
With its remarkable scale and versatility, GPT-3 represents a significant advancement in natural language processing, showcasing the potential of large-scale transformer architectures in various applications.&lt;/p></description></item><item><title>Chapter 07.04: Tasks &amp; Performance</title><link>https://slds-lmu.github.io/dl4nlp/chapters/07_gpt/07_04_tasks/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://slds-lmu.github.io/dl4nlp/chapters/07_gpt/07_04_tasks/</guid><description>&lt;p>GPT-3 has X-shot abilities, meaning it is able to perform tasks with minimal or even no task-specific training data. This chapter provides an overview over various different tasks and illustrates the X-shot capabilities of GPT-3. Additionally you will be introduced to relevant benchmarks.&lt;/p></description></item><item><title>Chapter 07.05: Discussion: Ethics and Cost</title><link>https://slds-lmu.github.io/dl4nlp/chapters/07_gpt/07_05_discussion/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://slds-lmu.github.io/dl4nlp/chapters/07_gpt/07_05_discussion/</guid><description>&lt;p>In discussing GPT-3&amp;rsquo;s ethical implications, it is crucial to consider its potential societal impact, including issues surrounding bias, misinformation, and data privacy. With its vast language generation capabilities, GPT-3 has the potential to disseminate misinformation at scale, posing risks to public trust and safety. Additionally, the model&amp;rsquo;s reliance on large-scale pretraining data raises concerns about reinforcing existing biases present in the data, perpetuating societal inequalities. Furthermore, the use of GPT-3 in sensitive applications such as content generation, automated customer service, and decision-making systems raises questions about accountability, transparency, and unintended consequences. 
As such, responsible deployment of GPT-3 requires careful consideration of ethical guidelines, regulatory frameworks, and robust mitigation strategies to address these challenges and ensure the model&amp;rsquo;s ethical use in society.&lt;/p></description></item><item><title>Chapter 08.01: Instruction Fine-Tuning</title><link>https://slds-lmu.github.io/dl4nlp/chapters/08_llm/08_01_instruction_tuning/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://slds-lmu.github.io/dl4nlp/chapters/08_llm/08_01_instruction_tuning/</guid><description>&lt;p>Instruction fine-tuning aims to enhance the adaptability of large language models (LLMs) by providing explicit instructions or task descriptions, enabling more precise control over model behavior and adaptation to diverse contexts.
This approach involves fine-tuning LLMs on task-specific instructions or prompts, guiding the model to generate outputs that align with the given instructions. By conditioning the model on explicit instructions, instruction fine-tuning facilitates more accurate and tailored responses, making LLMs more versatile and effective in various applications such as language translation, text summarization, and question answering.&lt;/p></description></item><item><title>Chapter 08.02: Chain-of-thought Prompting</title><link>https://slds-lmu.github.io/dl4nlp/chapters/08_llm/08_02_cot/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://slds-lmu.github.io/dl4nlp/chapters/08_llm/08_02_cot/</guid><description>&lt;p>Chain of thought (CoT) prompting [1] is a prompting method that encourage Large Language Models (LLMs) to explain their reasoning. This method contrasts with standard prompting by not only seeking an answer but also requiring the model to explain its steps to arrive at that answer. By guiding the model through a logical chain of thought, chain of thought prompting encourages the generation of more structured and cohesive text, enabling LLMs to produce more accurate and informative outputs across various tasks and domains.&lt;/p></description></item><item><title>Chapter 08.03: Emergent Abilities</title><link>https://slds-lmu.github.io/dl4nlp/chapters/08_llm/08_03_emerging/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://slds-lmu.github.io/dl4nlp/chapters/08_llm/08_03_emerging/</guid><description>&lt;p>Various researchers have reported that LLMs seem to have emergent abilities. These are sudden appearances of new abilities when Large Language Models (LLMs) are scaled up. 
In this section we introduce the concept of emergent abilities and discuss a potential counter argument for the concept of emergence.&lt;/p></description></item><item><title>Chapter 9.1: RLHF</title><link>https://slds-lmu.github.io/dl4nlp/chapters/09_rlhf/rlhf/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://slds-lmu.github.io/dl4nlp/chapters/09_rlhf/rlhf/</guid><description>&lt;p>Here we cover the basics of RLHF and its related application.&lt;/p></description></item><item><title/><link>https://slds-lmu.github.io/dl4nlp/exercises/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://slds-lmu.github.io/dl4nlp/exercises/</guid><description>Exercises Exercise Chapter 1 Exercise Chapter 2 Exercise Chapter 3 Exercise Chapter 4 Exercise Chapter 5 Exercise Chapter 6 Exercise Chapter 7 Exercise Chapter 8 Exercise Chapter 9 Exercise Chapter 10</description></item><item><title/><link>https://slds-lmu.github.io/dl4nlp/references/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://slds-lmu.github.io/dl4nlp/references/</guid><description>References Your markdown comes here!</description></item><item><title>Cheat Sheets</title><link>https://slds-lmu.github.io/dl4nlp/appendix/01_cheat_sheets/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://slds-lmu.github.io/dl4nlp/appendix/01_cheat_sheets/</guid><description>possible coming in the future ..</description></item><item><title>Errata</title><link>https://slds-lmu.github.io/dl4nlp/appendix/02_errata/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://slds-lmu.github.io/dl4nlp/appendix/02_errata/</guid><description>Errata in the slides shown in the videos to be added once videos + updated slides thereafter are available 😉</description></item><item><title>Related Courses</title><link>https://slds-lmu.github.io/dl4nlp/appendix/03_related/</link><pubDate>Mon, 01 Jan 0001 00:00:00 
+0000</pubDate><guid>https://slds-lmu.github.io/dl4nlp/appendix/03_related/</guid><description>Other ML courses Introduction to Machine Learning (I2ML) Introduction to Deep Learning (I2DL)</description></item></channel></rss>
This approach involves fine-tuning LLMs on task-specific instructions or prompts, guiding the model to generate outputs that align with the given instructions. By conditioning the model on explicit instructions, instruction fine-tuning facilitates more accurate and tailored responses, making LLMs more versatile and effective in various applications such as language translation, text summarization, and question answering.&lt;/p></description></item><item><title>Chapter 08.02: Chain-of-thought Prompting</title><link>https://slds-lmu.github.io/dl4nlp/chapters/08_llm/08_02_cot/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://slds-lmu.github.io/dl4nlp/chapters/08_llm/08_02_cot/</guid><description>&lt;p>Chain of thought (CoT) prompting [1] is a prompting method that encourage Large Language Models (LLMs) to explain their reasoning. This method contrasts with standard prompting by not only seeking an answer but also requiring the model to explain its steps to arrive at that answer. By guiding the model through a logical chain of thought, chain of thought prompting encourages the generation of more structured and cohesive text, enabling LLMs to produce more accurate and informative outputs across various tasks and domains.&lt;/p></description></item><item><title>Chapter 08.03: Emergent Abilities</title><link>https://slds-lmu.github.io/dl4nlp/chapters/08_llm/08_03_emerging/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://slds-lmu.github.io/dl4nlp/chapters/08_llm/08_03_emerging/</guid><description>&lt;p>Various researchers have reported that LLMs seem to have emergent abilities. These are sudden appearances of new abilities when Large Language Models (LLMs) are scaled up. 
In this section we introduce the concept of emergent abilities and discuss a potential counter argument for the concept of emergence.&lt;/p></description></item><item><title/><link>https://slds-lmu.github.io/dl4nlp/exercises/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://slds-lmu.github.io/dl4nlp/exercises/</guid><description>Exercises Exercise Chapter 1 Exercise Chapter 2 Exercise Chapter 3 Exercise Chapter 4 Exercise Chapter 5 Exercise Chapter 6 Exercise Chapter 7 Exercise Chapter 8 Exercise Chapter 9 Exercise Chapter 10</description></item><item><title/><link>https://slds-lmu.github.io/dl4nlp/references/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://slds-lmu.github.io/dl4nlp/references/</guid><description>References Your markdown comes here!</description></item><item><title>Cheat Sheets</title><link>https://slds-lmu.github.io/dl4nlp/appendix/01_cheat_sheets/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://slds-lmu.github.io/dl4nlp/appendix/01_cheat_sheets/</guid><description>possible coming in the future ..</description></item><item><title>Errata</title><link>https://slds-lmu.github.io/dl4nlp/appendix/02_errata/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://slds-lmu.github.io/dl4nlp/appendix/02_errata/</guid><description>Errata in the slides shown in the videos to be added once videos + updated slides thereafter are available 😉</description></item><item><title>Related Courses</title><link>https://slds-lmu.github.io/dl4nlp/appendix/03_related/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://slds-lmu.github.io/dl4nlp/appendix/03_related/</guid><description>Other ML courses Introduction to Machine Learning (I2ML) Introduction to Deep Learning (I2DL)</description></item></channel></rss>
