14.12.2020 Presentation KickOff
All coding was done in pair programming, with at least two people working at the same time. For this we used Visual Studio Code with Live Share, so everyone could participate and write simultaneously. Because at least two people (sometimes three or even all four) were coding together, everyone has a basic understanding of A2C and PPO. As requested, we divided the individual topics among us, assigning one expert to each. The division is shown in the following table:
Topic | Name | Info |
---|---|---|
A2C | ||
Split- & Multihead NN | Sofie | |
Activation | Balthasar | Sigmoid, Softplus, Softmax, Tanh, ReLU |
Min-Max-Clamping | Balthasar | |
Loss & Entropy | Balthasar | |
Advantages | Sofie | A2C, TD, 3-Step, REINFORCE |
Return | Sofie | |
A2C vs A3C | Sofie | |
----- | ----- | ----- |
PPO | ||
Actor & Critic NN | Lukas | |
Memory, Buffer, Batches | Lukas | |
Hyperparameters | Denny | |
Reward | Denny | |
log_prob & prob_ratio | Denny | |
weighted_probs & clipping | Lukas | |
----- | ----- | ----- |
Slurm | Denny | Slurm Runner |
Parameter Search | Sofie | Grid Search, Evolutionary Algorithm |
Environments + Unity | Lukas | |
MLflow | Balthasar | Metrics, Artifacts |
Save and Load Models | Balthasar | |