Improve gamemode weighting #34
base: master
Conversation
# change weights from percentage of experience desired to percentage of gamemodes necessary (approx)
for k in self.gamemode_weights.keys():
    b, o = k.split("v")
    self.gamemode_weights[k] /= int(b)
weights_sum = sum(self.gamemode_weights.values())
self.gamemode_weights = {k: self.gamemode_weights[k] / weights_sum for k in self.gamemode_weights.keys()}
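For reference, the conversion above can be read as a standalone helper (a sketch; the function name and the example weights are mine, not from the PR): a mode like "3v3" produces roughly three times the experience per episode of a "1v1", so its sampling weight is divided by the team size before renormalizing.

def experience_to_gamemode_weights(weights):
    # Convert desired experience shares (keyed like "3v3") into per-gamemode
    # sampling probabilities by dividing out the team size, then renormalizing.
    adjusted = {k: w / int(k.split("v")[0]) for k, w in weights.items()}
    total = sum(adjusted.values())
    return {k: v / total for k, v in adjusted.items()}

# Equal experience from each mode means sampling 1v1 three times as often as 3v3.
print(experience_to_gamemode_weights({"1v1": 1 / 3, "2v2": 1 / 3, "3v3": 1 / 3}))
# {'1v1': 0.545..., '2v2': 0.272..., '3v3': 0.181...}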
I don't think it's a good idea to reuse a public attribute named after a constructor argument to store values different from the ones provided. In other words, I think self.gamemode_weights should stay read-only for the sake of clarity and consistency.
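A minimal sketch of the separation being suggested, with a hypothetical class name and attribute name (the real constructor takes more arguments than this):

class WeightedGamemodeSetter:
    def __init__(self, gamemode_weights):
        # Keep the public attribute exactly as the caller provided it...
        self.gamemode_weights = dict(gamemode_weights)
        # ...and keep the derived sampling probabilities in a separate,
        # internal attribute instead of overwriting the original.
        adjusted = {k: w / int(k.split("v")[0]) for k, w in gamemode_weights.items()}
        total = sum(adjusted.values())
        self._gamemode_probs = {k: v / total for k, v in adjusted.items()}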
self.gamemode_weights = {k: max(self.gamemode_weights[k] + diff[k], 0) for k in self.gamemode_weights.keys()}
new_sum = sum(self.gamemode_weights.values())
self.gamemode_weights = {k: self.gamemode_weights[k] / new_sum for k in self.gamemode_weights.keys()}
mode = np.random.choice(list(self.gamemode_weights.keys()), p=list(self.gamemode_weights.values()))
Although this method improves on the previous one in several ways (it uses random sampling and reduces worker correlation), both still suffer from the same downfall: the correction is proportional to the error between the target and the current distribution.
This means that as the current distribution gets closer to the target, the error goes to zero and so does the correction term, which lets the error drift back up.
Nevertheless, I can see how this new algorithm is more robust than the previous one when multiple workers are sampling in parallel, but the empirical distribution will probably never settle on the target one.
You can use this algorithm to get more stable sampling probabilities over time:
For step 1 you can use an EMA initialized based on agent count, or anything other than 0, e.g.
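Since the referenced steps aren't reproduced in this thread, the sketch below is only one possible reading of the EMA idea; the class name, the correction rule, and the alpha value are assumptions rather than the exact algorithm being referenced. It keeps a per-worker EMA of the experience actually produced per mode and scales the base probabilities by the ratio of target share to current share, so the correction settles at 1 instead of decaying to 0.

import numpy as np


class EMAGamemodeSampler:
    # Sketch only: per-worker EMA of generated experience per mode, sampled so
    # the experience share drifts toward the target share.
    def __init__(self, target, alpha=0.02):
        self.target = dict(target)   # desired share of experience per mode
        self.alpha = alpha           # EMA smoothing factor (assumed value)
        self.agents = {k: int(k.split("v")[0]) + int(k.split("v")[1]) for k in target}
        self.ema = dict(target)      # initialize away from 0 so ratios are defined

    def sample(self):
        ema_total = sum(self.ema.values())
        share = {k: self.ema[k] / ema_total for k in self.target}
        # Base rate converts experience share to gamemode share; the ratio
        # target/share boosts under-represented modes. As share approaches the
        # target the correction tends to 1, not 0, so it never vanishes.
        raw = {k: (self.target[k] / max(self.agents[k], 1))
                  * (self.target[k] / max(share[k], 1e-6))
               for k in self.target}
        total = sum(raw.values())
        mode = np.random.choice(list(self.target), p=[raw[k] / total for k in self.target])
        # Update the EMA with the experience produced by the chosen mode,
        # taken as roughly proportional to the number of agents in it.
        for k in self.target:
            contrib = float(self.agents[k]) if k == mode else 0.0
            self.ema[k] = (1 - self.alpha) * self.ema[k] + self.alpha * contrib
        return mode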
What's the conclusion here?
Conclusion is I got busy and didn't finish it, but it's still on my list.
I'm going to take the suggestions, just haven't finished yet.
Ok, this is ready and tested. It uses the EMA for the weights, per worker. Generated experience is counted per actual agent, which means that if you're using pretrained agents or past models, those percentages naturally come out of the generated experience, which I think is ideal.
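As a rough usage illustration of the per-worker idea (reusing the hypothetical sampler sketched earlier, not the code in this PR): each worker owns its own sampler, so picking the next gamemode needs no shared state or Redis round-trip.

# Each rollout worker keeps its own sampler instance (hypothetical usage).
sampler = EMAGamemodeSampler({"1v1": 1 / 3, "2v2": 1 / 3, "3v3": 1 / 3})
counts = {}
for _ in range(10_000):
    mode = str(sampler.sample())
    counts[mode] = counts.get(mode, 0) + 1
# Experience share per mode (episodes * agents per episode) should end up
# close to one third each.
experience = {k: counts[k] * (int(k.split("v")[0]) + int(k.split("v")[1])) for k in counts}
total = sum(experience.values())
print({k: round(v / total, 3) for k, v in experience.items()})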
Added one commit for the 1v0 fixes that is related to this.
This changes the gamemode weighting to be faster (no Redis calls) and more stable, so it doesn't swing around as much. Each training step should have a very similar mix of gamemodes, very close to the desired weights, even if the training steps are very short (I've tested with 100k steps).