I am reading these papers:
✅ LLaMA: Open and Efficient Foundation Language Models
✅ Llama 2: Open Foundation and Fine-Tuned Chat Models
✔️ OPT: Open Pre-trained Transformer Language Models
✅ Attention Is All You Need
✅ Root Mean Square Layer Normalization
✅ GLU Variants Improve Transformer
✅ RoFormer: Enhanced Transformer with Rotary Position Embedding
✅ Self-Attention with Relative Position Representations
✔️ BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
✔️ To Fold or Not to Fold: a Necessary and Sufficient Condition on Batch-Normalization Layers Folding
✅ Fast Transformer Decoding: One Write-Head is All You Need
✅ GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
✔️ PaLM: Scaling Language Modeling with Pathways
✅ Understand the concept of the dot product of two matrices.
✅ Understand the concept of autoregressive language models.
✅ Understand the concept of attention computation.
✅ Understand the workings of the Byte-Pair Encoding (BPE) algorithm and tokenizer (sketch below).
✅ Read and implement the workings of the SentencePiece library and tokenizer.
✅ Understand the concept of tokenization, input ids and embedding vectors.
✅ Understand & implement the concept of positional encoding (sketch below).
✅ Understand the concept of single-head self-attention.
✅ Understand the concept of scaled dot-product attention (sketch below).
✅ Understand & implement the concept of multi-head attention (sketch below).
✅ Understand & implement the concept of layer normalization.
✅ Understand the concept of masked multi-head attention & the softmax layer.
✅ Understand and implement the concept of RMSNorm and its difference from LayerNorm (sketch below).
✅ Understand the concept of internal covariate shift.
✅ Understand the concept and implementation of a feed-forward network with ReLU activation.
✅ Understand the concept and implementation of a feed-forward network with SwiGLU activation (sketch below).
✅ Understand the concept of absolute positional encoding.
✅ Understand the concept of relative positional encoding.
✅ Understand and implement the rotary positional embedding (sketch below).
✅ Understand and implement the transformer architecture.
✅ Understand and implement the original Llama (1) architecture.
✅ Understand the concept of multi-query attention with a single KV projection.
✅ Understand and implement grouped query attention from scratch (sketch below).
✅ Understand and implement the concept of the KV cache (sketch below).
✅ Understand and implement the Llama2 architecture.
✅ Test the Llama2 implementation using the checkpoints from Meta.
✅ Download the Llama2 checkpoints and inspect the inference code and how it works.
✔️ Documentation of the Llama2 implementation and repo.
✅ Work on the implementation of enabling and disabling the KV cache.
✅ Add the attention mask when disabling the KV cache in Llama2.
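Below are a few minimal sketches of the concepts marked "(sketch below)" above. They are illustrative only; the names, shapes, and hyper-parameters are my own assumptions, not the actual repo code. First, one training step of the BPE algorithm: count adjacent symbol pairs over a frequency-weighted toy corpus and pick the most frequent pair, which would be merged into a new token.

```python
from collections import Counter

def most_frequent_pair(word_freqs):
    # Count adjacent symbol pairs across the corpus, weighted by word frequency.
    pairs = Counter()
    for word, freq in word_freqs.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0]

# Toy corpus: words pre-split into characters, with an end-of-word marker </w>.
corpus = {"l o w </w>": 5, "l o w e r </w>": 2, "n e w e s t </w>": 6}
print(most_frequent_pair(corpus))  # (('w', 'e'), 8) -> merge "w" + "e" into a new symbol "we"
```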
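A sketch of the absolute sinusoidal positional encoding from "Attention Is All You Need" (the original Transformer's scheme; Llama replaces this with RoPE, sketched further below).

```python
import math
import torch

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    # PE[pos, 2i] = sin(pos / 10000^(2i/d_model)), PE[pos, 2i+1] = cos(...)
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)
    div = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                    * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe  # added to the token embeddings before the first layer

print(sinusoidal_positional_encoding(10, 16).shape)  # torch.Size([10, 16])
```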
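Scaled dot-product attention for a single head, with an optional causal mask (the mask convention, 0 = blocked, is an assumption of this sketch).

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, seq_len, d_k). Scores are scaled by sqrt(d_k) so the
    # softmax does not saturate as the head dimension grows.
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        # Positions where mask == 0 are blocked (set to -inf before the softmax).
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v

# Causal (masked) self-attention: each position attends only to itself and earlier positions.
batch, seq_len, d_model = 2, 8, 16
x = torch.randn(batch, seq_len, d_model)
causal_mask = torch.tril(torch.ones(seq_len, seq_len))
print(scaled_dot_product_attention(x, x, x, mask=causal_mask).shape)  # torch.Size([2, 8, 16])
```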
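Multi-head attention: project into several smaller heads, attend in parallel, concatenate, and project back. Module names and sizes are illustrative.

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    """Multi-head self-attention sketch: split d_model into n_heads heads of size d_head."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.wq = nn.Linear(d_model, d_model, bias=False)
        self.wk = nn.Linear(d_model, d_model, bias=False)
        self.wv = nn.Linear(d_model, d_model, bias=False)
        self.wo = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x, mask=None):
        b, t, _ = x.shape
        # (b, t, d_model) -> (b, n_heads, t, d_head)
        q = self.wq(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = self.wk(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        v = self.wv(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float("-inf"))
        out = torch.softmax(scores, dim=-1) @ v        # (b, n_heads, t, d_head)
        out = out.transpose(1, 2).reshape(b, t, -1)    # concatenate the heads
        return self.wo(out)

mha = MultiHeadAttention(d_model=64, n_heads=8)
print(mha(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```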
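RMSNorm next to LayerNorm: RMSNorm only rescales by the root mean square, with no mean subtraction and no bias term, which is what Llama applies before each sub-layer.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """RMSNorm (Zhang & Sennrich, 2019): rescale by the root mean square only."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learnable gain, like LayerNorm's

    def forward(self, x):
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)

x = torch.randn(2, 5, 16)
print(nn.LayerNorm(16)(x).shape)  # torch.Size([2, 5, 16])
print(RMSNorm(16)(x).shape)       # torch.Size([2, 5, 16])
```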
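The position-wise feed-forward network in two variants: the original ReLU form and the SwiGLU form used by Llama, where silu(W1 x) gates a parallel projection W3 x element-wise.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeedForwardReLU(nn.Module):
    """Position-wise FFN from the original Transformer: Linear -> ReLU -> Linear."""

    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.w1 = nn.Linear(d_model, d_ff)
        self.w2 = nn.Linear(d_ff, d_model)

    def forward(self, x):
        return self.w2(F.relu(self.w1(x)))

class FeedForwardSwiGLU(nn.Module):
    """SwiGLU FFN ("GLU Variants Improve Transformer"): gated hidden activation."""

    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.w1 = nn.Linear(d_model, d_ff, bias=False)  # gate projection
        self.w3 = nn.Linear(d_model, d_ff, bias=False)  # up projection
        self.w2 = nn.Linear(d_ff, d_model, bias=False)  # down projection

    def forward(self, x):
        return self.w2(F.silu(self.w1(x)) * self.w3(x))

x = torch.randn(2, 4, 32)
print(FeedForwardReLU(32, 128)(x).shape)    # torch.Size([2, 4, 32])
print(FeedForwardSwiGLU(32, 128)(x).shape)  # torch.Size([2, 4, 32])
```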
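Rotary positional embeddings via complex multiplication, in the spirit of Meta's reference code (the function names and shapes here are my own sketch).

```python
import torch

def precompute_freqs_cis(head_dim: int, seq_len: int, theta: float = 10000.0):
    # One rotation angle m * theta_j per position m and per pair of dimensions j.
    freqs = 1.0 / (theta ** (torch.arange(0, head_dim, 2).float() / head_dim))
    m = torch.arange(seq_len, dtype=torch.float32)
    angles = torch.outer(m, freqs)                       # (seq_len, head_dim // 2)
    return torch.polar(torch.ones_like(angles), angles)  # complex e^{i * angle}

def apply_rope(x: torch.Tensor, freqs_cis: torch.Tensor) -> torch.Tensor:
    # x: (batch, seq_len, n_heads, head_dim). Consecutive pairs of dimensions are
    # treated as complex numbers and rotated by a position-dependent angle.
    x_complex = torch.view_as_complex(x.float().reshape(*x.shape[:-1], -1, 2))
    rotated = x_complex * freqs_cis[None, :, None, :]    # broadcast over batch & heads
    return torch.view_as_real(rotated).flatten(-2).type_as(x)

q = torch.randn(2, 8, 4, 16)  # (batch, seq_len, n_heads, head_dim)
freqs_cis = precompute_freqs_cis(head_dim=16, seq_len=8)
print(apply_rope(q, freqs_cis).shape)  # torch.Size([2, 8, 4, 16])
```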
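Grouped query attention: fewer KV heads than query heads, with each KV head shared by a group of query heads. n_kv_heads = 1 recovers multi-query attention; n_kv_heads = n_heads recovers standard multi-head attention. This is a self-attention-only sketch without RoPE or a KV cache.

```python
import torch
import torch.nn as nn

class GroupedQueryAttention(nn.Module):
    """GQA sketch: n_heads query heads share n_kv_heads key/value heads."""

    def __init__(self, d_model: int, n_heads: int, n_kv_heads: int):
        super().__init__()
        assert n_heads % n_kv_heads == 0
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.d_head = d_model // n_heads
        self.wq = nn.Linear(d_model, n_heads * self.d_head, bias=False)
        self.wk = nn.Linear(d_model, n_kv_heads * self.d_head, bias=False)
        self.wv = nn.Linear(d_model, n_kv_heads * self.d_head, bias=False)
        self.wo = nn.Linear(n_heads * self.d_head, d_model, bias=False)

    def forward(self, x):
        b, t, _ = x.shape
        q = self.wq(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = self.wk(x).view(b, t, self.n_kv_heads, self.d_head).transpose(1, 2)
        v = self.wv(x).view(b, t, self.n_kv_heads, self.d_head).transpose(1, 2)
        # Repeat each KV head so it lines up with its group of query heads.
        group = self.n_heads // self.n_kv_heads
        k = k.repeat_interleave(group, dim=1)
        v = v.repeat_interleave(group, dim=1)
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5
        out = torch.softmax(scores, dim=-1) @ v
        return self.wo(out.transpose(1, 2).reshape(b, t, -1))

gqa = GroupedQueryAttention(d_model=64, n_heads=8, n_kv_heads=2)
print(gqa(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```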
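A KV-cache sketch for autoregressive decoding: keys and values of already-processed tokens are kept around, so each step only projects the new token and attends against the cached sequence. The class and method names are illustrative, not the repo's API.

```python
import torch

class KVCache:
    """Minimal per-layer KV cache: preallocated buffers plus a running length."""

    def __init__(self, max_seq_len: int, n_heads: int, head_dim: int, batch_size: int = 1):
        self.k = torch.zeros(batch_size, n_heads, max_seq_len, head_dim)
        self.v = torch.zeros(batch_size, n_heads, max_seq_len, head_dim)
        self.len = 0

    def update(self, k_new, v_new):
        # k_new, v_new: (batch, n_heads, t_new, head_dim) for the newly generated token(s).
        t_new = k_new.size(2)
        self.k[:, :, self.len:self.len + t_new] = k_new
        self.v[:, :, self.len:self.len + t_new] = v_new
        self.len += t_new
        # Return everything cached so far; attention for the new token runs against this.
        return self.k[:, :, :self.len], self.v[:, :, :self.len]

cache = KVCache(max_seq_len=32, n_heads=4, head_dim=8)
for step in range(3):                   # decode one token at a time
    k_t = torch.randn(1, 4, 1, 8)
    v_t = torch.randn(1, 4, 1, 8)
    k_all, v_all = cache.update(k_t, v_t)
    print(step, k_all.shape)            # the sequence dimension grows: 1, 2, 3
```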
✅ LLAMA: OPEN AND EFFICIENT LLM NOTES
✅ UNDERSTANDING KV CACHE
✅ GROUPED QUERY ATTENTION (GQA)
🔗 pytorch-llama - PyTorch implementation of LLaMA by Umar Jamil.
🔗 pytorch-transformer - PyTorch implementation of the Transformer by Umar Jamil.
🔗 llama - Facebook's LLaMA implementation.
🔗 tensor2tensor - Google's Transformer implementation.
🔗 rmsnorm - RMSNorm implementation.
🔗 roformer - Rotary Transformer (RoFormer) implementation.
🔗 xformers - Facebook's library of optimized Transformer building blocks.
✅ Understanding SentencePiece ([Under][Standing][_Sentence][Piece])
✅ SwiGLU: GLU Variants Improve Transformer (2020)