Slides: ./lecture_chatgpt.pdf
Videos (Russian): lecture and optional lecture on task-driven chatbots
Videos (English):
- Hugging Face tutorial on RLHF - https://www.youtube.com/watch?v=2MBJOuVq380
- Optional lecture on conversation systems
Practice assignment: ./practice.ipynb
Extra materials:
- https://github.com/CarperAI/trlx - an alternative to trl designed for larger models
- A more detailed explanation of the reinforcement learning algorithms used in RLHF: part 1 and part 2
- Anthropic's take on aligning LLMs - Constitutional AI
- Earlier works on reinforcement learning for natural language generation:
  - task-oriented conversation system
  - generating dialogues
  - sequential adversarial networks (a.k.a. SeqGAN)
- A large overview for machine translation (touching on RL, including RL failures) - arxiv
- As usual, there are dozens more links in the lecture slides (linked at the top of this readme)