
Bigram-Level Language Model: Nano_GPT

This model is trained on a Shakespeare dataset using a decoder-only transformer architecture, implemented in the PyTorch framework, and generates random Shakespeare-style text. The project is for educational purposes: it offers a close look at the inner workings of the transformer architecture used in GPT-3.5 and other LLMs.
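Below is a minimal sketch of what one such decoder block looks like in PyTorch: masked (causal) self-attention followed by a feed-forward network, each wrapped in a residual connection. The class and parameter names (`DecoderBlock`, `n_embd`, `n_head`) are illustrative assumptions, not taken from this repository's code.

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    """One decoder block: causal self-attention + feed-forward, with residuals."""

    def __init__(self, n_embd, n_head, block_size):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = nn.MultiheadAttention(n_embd, n_head, batch_first=True)
        self.ln2 = nn.LayerNorm(n_embd)
        self.ffwd = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.ReLU(),
            nn.Linear(4 * n_embd, n_embd),
        )
        # Causal mask: True above the diagonal means "position j is in the
        # future of position i, do not attend to it".
        mask = torch.triu(torch.ones(block_size, block_size, dtype=torch.bool), diagonal=1)
        self.register_buffer("mask", mask)

    def forward(self, x):
        T = x.size(1)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=self.mask[:T, :T])
        x = x + attn_out                 # residual around attention
        x = x + self.ffwd(self.ln2(x))   # residual around feed-forward
        return x
```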

Model Specifications:

  • Parameters: 408,897
  • Training dataset size: 1.06 MB text file
  • Context length used for predictions in the self-attention block: 32
  • Multi-Head Attention blocks: 16
  • Layers: 8 (decoder blocks)
  • Learning rate: 0.02
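As a rough usage example, the specifications above can be wired into the `DecoderBlock` sketch as follows. The embedding width `n_embd = 64` is an assumption (chosen because it puts the parameter count near the ~409k listed), and "16" is read here as the number of attention heads; neither detail is spelled out above.

```python
block_size = 32       # context length from the spec list
n_head = 16           # assumed: reading "16" as heads per attention block
n_layer = 8           # decoder blocks
learning_rate = 0.02  # learning rate from the spec list

n_embd = 64           # assumed embedding dimension; must divide evenly by n_head

# Stack the decoder blocks and push a dummy batch through them.
blocks = nn.Sequential(*[DecoderBlock(n_embd, n_head, block_size) for _ in range(n_layer)])
x = torch.randn(1, block_size, n_embd)  # (batch, time, channels)
print(blocks(x).shape)                  # torch.Size([1, 32, 64])
```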