Author: Archit Vasan, including materials on LLMs by Varuni Sastri and Carlo Graziani at Argonne, and discussion/editorial work by Taylor Childers, Bethany Lusch, and Venkat Vishwanath (Argonne)
Inspiration from the blog posts "The Illustrated Transformer" and "The Illustrated GPT2" by Jay Alammar, highly recommended reading.
This tutorial covers some fundamental concepts necessary to study large language models (LLMs):
- Scientific applications for language models
- General overview of Transformers
- Tokenization
- Model Architecture
- Pipeline using HuggingFace
- Model loading
- If you are using ALCF, first log in. From a terminal run the following command:
- Although we already cloned the repo before, you'll want the updated version. To be reminded of the instructions for syncing your fork, click here.
- Now that we have the updated notebooks, we can open them. If you are using ALCF JupyterHub or Google Colab, you can be reminded of the steps here.
- Reminder: Change the notebook's kernel to datascience/conda-2024-08-08 (you may need to change the kernel the first time you open each notebook):
  - select Kernel in the menu bar
  - select Change kernel...
  - select datascience/conda-2024-08-08 from the drop-down menu
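If you are unsure whether the kernel switch took effect, one rough check is to print the interpreter path from a notebook cell. This is a minimal sketch, assuming the environment's install path contains the date string "2024-08-08". That substring and the helper name `kernel_matches` are hypothetical, not part of the ALCF setup, so adjust them to whatever path your kernel actually reports.

```python
import sys

# Assumption: the datascience kernel's install path contains this date string.
# Change EXPECTED_ENV if your interpreter path looks different.
EXPECTED_ENV = "2024-08-08"


def kernel_matches(expected: str = EXPECTED_ENV) -> bool:
    """Hypothetical helper: True if the running interpreter lives under a path containing `expected`."""
    return expected in sys.executable or expected in sys.prefix


# Show where the current kernel's Python lives and whether it looks right.
print("Interpreter:", sys.executable)
print("Expected kernel active:", kernel_matches())
```

If the second line prints False, repeat the Change kernel steps and rerun the cell.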
In case you have trouble accessing Sophia, all notebook material can be run in Google Colab. Just:
- Go to this link: Colab
- Click on File/Open notebook
- Navigate to the GitHub tab and find argonne-lcf/ai-science-training-series
- Click on 04_intro_to_llms/IntroLLMs.ipynb
I strongly recommend reading "The Illustrated Transformer" by Jay Alammar. Alammar also has a useful post dedicated more generally to sequence-to-sequence modeling, "Visualizing A Neural Machine Translation Model (Mechanics of Seq2seq Models With Attention)", which illustrates the attention mechanism in the context of a more generic language translation model.
Solutions to homework problems are posted in IntroLLMHWSols.ipynb. To see the BertViz attention visualizations, open the notebook in Google Colab.