
new chapter
Language Models at Inference Time in Syllabus
Madjakul committed Oct 30, 2024
1 parent 02bed52 commit 1807947
Showing 13 changed files with 37 additions and 930 deletions.
26 changes: 18 additions & 8 deletions README.md
@@ -3,55 +3,65 @@
![Banner](static/github_anlp_banner.png)

## Sessions

1. Recap on Deep Learning & basic NLP ([slides](https://github.com/NathanGodey/AdvancedNLP/raw/main/slides/pdf/course1_recap.pdf) / [lab session](https://colab.research.google.com/drive/1_QzQBdP289benS8Uo3yPQmtXoM-f80-n?usp=sharing))
2. Tokenization ([slides](https://github.com/NathanGodey/AdvancedNLP/raw/main/slides/pdf/course2_tokenization.pdf) / [lab session](https://colab.research.google.com/drive/1xEKz_1LcnkfcEenukIGCrk-Nf_5Hb19s?usp=sharing))
3. Language Modeling ([slides](https://github.com/NathanGodey/AdvancedNLP/raw/main/slides/pdf/course3_lm.pdf) / lab session)
4. NLP without 2048 GPUs ([slides](https://github.com/NathanGodey/AdvancedNLP/raw/main/slides/pdf/course4_efficiency.pdf) / lab session)
5. Language Models at Inference Time (slides / lab session)
6. Handling the Risks of Language Models ([slides](https://github.com/NathanGodey/AdvancedNLP/raw/main/slides/pdf/course5_risks.pdf) / lab session)
7. Advanced NLP tasks ([slides](https://github.com/NathanGodey/AdvancedNLP/raw/main/slides/pdf/course6_advanced.pdf) / lab session)
8. Domain-specific NLP ([slides](https://github.com/NathanGodey/AdvancedNLP/raw/main/slides/pdf/course7_specific.pdf) / lab session)
9. Multilingual NLP ([slides](https://github.com/NathanGodey/AdvancedNLP/raw/main/slides/Course%205%20-%20Multilingual%20NLP.pdf) / lab session)
10. Multimodal NLP ([slides](https://docs.google.com/presentation/d/1K2DgnPSOGXB1hQ4FZoUU-5ppJ4dn_sLC41Ecwmxi2Zk/edit?usp=sharing) / lab session)

## Evaluation

The evaluation consists of a team project (3-5 people). The choice of subject is **free** but needs to follow some basic rules:

- Obviously, the project must be strongly related to NLP, and especially to the notions we will cover in the course
- You can only use open-source LLMs that _you serve yourself_. In other words, no APIs / ChatGPT-like services may be used, except for a final comparison with your model.
- You must identify and address a <ins>challenging</ins> problem (e.g. not only _can an LLM do X?_, but _can an LLM <ins>that runs on a CPU</ins> do X?_, or _can I make an LLM <ins>better</ins> at X?_)
- It must be reasonably doable: you will not be able to fine-tune (or even use) a 405B-parameter model, or to train a model from scratch. That's fine: there are plenty of smaller models that should be good enough, like [the Pythia models](https://huggingface.co/collections/EleutherAI/pythia-scaling-suite-64fb5dfa8c21ebb3db7ad2e1), [TinyLlama](https://huggingface.co/collections/TinyLlama/tinyllama-11b-v1-660bb5bfabd8bd25eebbb1ef) or the 1B-parameter [OLMo](https://huggingface.co/collections/allenai/olmo-suite-65aeaae8fe5b6b2122b46778).
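A back-of-the-envelope check makes the "reasonably doable" rule concrete: weight storage alone rules out the largest models on student hardware. This is a rough sketch that only counts weights at 2 bytes per parameter (fp16/bf16), ignoring activations, the optimizer and the KV cache:

```python
def fp16_memory_gb(n_params: float) -> float:
    """GB needed just to store the weights at 2 bytes (fp16/bf16) per parameter."""
    return n_params * 2 / 1e9

# A 405B-parameter model needs ~810 GB for its weights alone,
# while a 1B-parameter model fits in ~2 GB and runs on a single consumer GPU.
print(fp16_memory_gb(405e9))  # 810.0
print(fp16_memory_gb(1e9))    # 2.0
```

Fine-tuning needs several times more memory than this (gradients and optimizer states), which is why the smaller models listed above are the practical choice.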

:alarm_clock: The project has 3 deadlines:

- **Project announcement (before 25/10/24)**: send an email to `[email protected]` with cc's `[email protected]` and `[email protected]` explaining
- The team members (also cc'ed)
- A rough description of the project (it can change later on)
- **Project proposal (25% of final grade, before 15/11/24)**: following [this template](https://docs.google.com/document/d/1rCWr6p5N0ip7fpNv9e5wjX7gez4oaFGioatYXRRKGR8/edit?usp=sharing), produce a project proposal explaining your first attempts (e.g. an alpha version), how they failed or succeeded, and what you plan to do before the delivery.
- **Project delivery (75% of final grade, 13/12/24)**: delivery of a GitHub repo with an explanatory README + oral presentation on **December 13th**

## Inspiring articles

### Tokenization

- A Vocabulary-Free Multilingual Neural Tokenizer for End-to-End Task Learning (https://arxiv.org/abs/2204.10815)
- BPE-Dropout: Simple and Effective Subword Regularization (https://aclanthology.org/2020.acl-main.170/)
- FOCUS: Effective Embedding Initialization for Monolingual Specialization of Multilingual Models (https://aclanthology.org/2023.emnlp-main.829/)

### Fast inference

- Efficient Streaming Language Models with Attention Sinks (https://arxiv.org/abs/2309.17453)
- Lookahead decoding (https://lmsys.org/blog/2023-11-21-lookahead-decoding/)
- Efficient Memory Management for Large Language Model Serving with PagedAttention (https://arxiv.org/pdf/2309.06180.pdf)

### Inference-time scaling (OpenAI's o1 model)

- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (https://arxiv.org/abs/2201.11903)
- Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters (https://arxiv.org/abs/2408.03314v1)

### LLM detection

- Detecting Pretraining Data from Large Language Models (https://arxiv.org/abs/2310.16789)
- Proving Test Set Contamination in Black Box Language Models (https://arxiv.org/abs/2310.17623)

### SSMs (off-program)

- Mamba: Linear-Time Sequence Modeling with Selective State Spaces (https://arxiv.org/abs/2312.00752)

### Alignment & Safety

- Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection (https://aclanthology.org/2020.acl-main.647/)
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model (https://arxiv.org/abs/2305.18290)
- Text Embeddings Reveal (Almost) As Much As Text (https://arxiv.org/abs/2310.06816)

Binary file removed imgs/course6/dragon_sampling.PNG
Binary file not shown.
Binary file removed imgs/course6/dragon_training.PNG
Binary file not shown.
19 changes: 19 additions & 0 deletions markdown/course5_inference.md
@@ -0,0 +1,19 @@
---
theme: gaia
_class: lead
paginate: true
title: "Course 5: Language Models at Inference Time"
backgroundColor: #fff
marp: true
---

# **Course 5: Language Models at Inference Time**


---
<!--footer: 'Course 5: LMs at Inference Time' -->

### Content

1. Background
2. Decoding Methods & Motivation
File renamed without changes.
File renamed without changes.
File renamed without changes.
331 changes: 0 additions & 331 deletions slides/course5_risks.html

This file was deleted.

281 changes: 0 additions & 281 deletions slides/course6_advanced.html

This file was deleted.

310 changes: 0 additions & 310 deletions slides/course7_specific.html

This file was deleted.

Binary file removed slides/pdf/course5_risks.pdf
Binary file not shown.
Binary file removed slides/pdf/course6_advanced.pdf
Binary file not shown.
Binary file removed slides/pdf/course7_specific.pdf
Binary file not shown.
