This is a repo illustrating a comparative study of various optimizers available in PyTorch and TensorFlow. The code here runs on TF 1.2. It will soon be migrated to TF 2.x and PyTorch.
The acclaimed AMSGrad variant did not perform better than traditional Adam. SGD with Momentum and Adam with a randomized learning rate provided the best solutions in this study.
Findings: SGD is a time-tested solution for convergence. Adam's generalization after its steep convergence is not great. AMSGrad's performance was not up to the standard projected.
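As a rough illustration of the optimizers compared, here is a minimal PyTorch sketch (an assumption for illustration, not the notebook's TF 1.2 code; the model and the learning-rate range for the randomized variant are placeholders):

```python
import random
import torch

# Placeholder model purely for illustration.
model = torch.nn.Linear(10, 1)

optimizers = {
    # Time-tested baseline: SGD with momentum.
    "sgd_momentum": torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9),
    # Traditional Adam.
    "adam": torch.optim.Adam(model.parameters(), lr=0.001),
    # AMSGrad is enabled via the `amsgrad` flag of Adam.
    "amsgrad": torch.optim.Adam(model.parameters(), lr=0.001, amsgrad=True),
    # "Randomized learning rate" sketched here by sampling the LR from a range
    # (an assumption about how that variant might be set up).
    "adam_random_lr": torch.optim.Adam(
        model.parameters(), lr=random.uniform(1e-4, 1e-2)
    ),
}
```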
The results can be replicated by running the notebook. Changing the data and other hyperparameters can alter the results obtained.
It worked great and converged far better than most of the other optimizers.
One of the time-tested optimizers.