This is intended to be a step-by-step guide to implementing any neural network architecture between linear regression and the transformer. The repo walks you through incremental changes to the code, building up to complex architectures. You are free to experiment with them as much as you want, and I encourage you to do that.
Some compromises have been made in the interest of speed, but I tried to make this as useful and as intuitive as I could.
I think it goes without saying that you shouldn't use this code in production. It's optimized for understanding, not stability or performance.