Add DIAMBRA Bonus Unit #540
base: main
Thanks for this new Unit! I would need the toctree change to be able to visualize the course version and make a complete review.
@@ -1,11 +1,64 @@
- # Introduction
+ # DIAMBRA Arena Overview
I would add an introduction, for example:

Welcome to this new bonus unit, where you'll learn to use DIAMBRA and train agents to play fighting games.
At the end of the unit you'll get: (illustration of the agent playing)
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Sound fun? Let's get started 🔥
All environments are episodic Reinforcement Learning tasks, with discrete actions (gamepad buttons) and observations composed by screen pixels plus additional numerical data (RAM states like characters health bars or characters stage side).
Suggested change:
All environments are **episodic Reinforcement Learning tasks**, with **discrete actions** (gamepad buttons) and observations composed by screen pixels plus additional numerical data (RAM states like characters health bars or characters stage side).
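To make these spaces concrete, here is a minimal sketch (assuming the standard `diambra.arena.make` entry point and the `sfiii3n` game id for Street Fighter III; exact names may differ across DIAMBRA Arena versions):

```python
import diambra.arena

# Create the Street Fighter III environment (game id assumed to be "sfiii3n").
env = diambra.arena.make("sfiii3n")

print(env.action_space)       # discrete actions mapping to gamepad buttons
print(env.observation_space)  # dict space: screen pixels plus RAM states (health bars, stage side, ...)

env.close()
```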
Interfaced games have been selected among the most popular fighting retro-games. While sharing the same fundamental mechanics, they provide different challenges, with specific features such as different types and numbers of characters, how to perform combos, health bar recharging, etc. Whenever possible, games are released with all hidden/bonus characters unlocked.
In this unit we will focus on Street Fighter III, but other historic games are also available, and the list will continue to grow. Switching between them is very straightforward, so at the end of the unit you will be able to easily target additional titles.
Suggested change:
In this unit **we will focus on Street Fighter III, but other historic games are also available**, and the list will continue to grow. Switching between them is very straightforward, so at the end of the unit you will be able to easily target additional titles.
## Preliminary Steps: Download Game ROM(s) and Check Validity
After completing the installation, you will need to obtain the Game ROM(s) of your interest and check their validity according to the following steps.
I suppose we can't provide a link to find them?
### Interaction Basics
DIAMBRA Arena Environments usage follows the standard RL interaction framework: the agent sends an action to the environment, which processes it and performs a transition accordingly, from the starting state to the new state, returning the observation and the reward to the agent to close the interaction loop. The figure below shows this typical interaction scheme and data flow.
Suggested change:
DIAMBRA Arena Environments usage **follows the standard RL interaction framework**: the agent sends an action to the environment, which processes it and performs a transition accordingly, from the starting state to the new state, returning the observation and the reward to the agent to close the interaction loop. The figure below shows this typical interaction scheme and data flow.
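As an illustration, a minimal sketch of this interaction loop (assuming `diambra.arena.make` and a gymnasium-style `reset`/`step` interface; exact return signatures depend on the installed DIAMBRA Arena version):

```python
import diambra.arena

env = diambra.arena.make("sfiii3n")

# Starting state: the environment returns the initial observation.
observation, info = env.reset()

episode_over = False
while not episode_over:
    # The agent sends an action (random here) to the environment...
    action = env.action_space.sample()
    # ...which performs the transition and returns the new observation and the reward.
    observation, reward, terminated, truncated, info = env.step(action)
    episode_over = terminated or truncated

env.close()
```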
The default reward is defined as a function of the characters' health values so that, qualitatively, damage suffered by the agent corresponds to a negative reward, and damage inflicted to the opponent corresponds to a positive reward. The quantitative, general and formal reward function definition is as follows:
R_t = \sum_i^{0,N_c}\left(\bar{H_i}^{t^-} - \bar{H_i}^{t} - \left(\hat{H_i}^{t^-} - \hat{H_i}^{t}\right)\right)
Suggested change:
\\(R_t = \sum_i^{0,N_c}\left(\bar{H_i}^{t^-} - \bar{H_i}^{t} - \left(\hat{H_i}^{t^-} - \hat{H_i}^{t}\right)\right)\\)
Where:
- \bar{H} and \hat{H} are health values for opponent’s character(s) and agent’s one(s) respectively;
Suggested change:
- \\(\bar{H}\\) and \\(\hat{H}\\) are health values for opponent’s character(s) and agent’s one(s) respectively;
- t^- and t are used to indicate conditions at “state-time” and at “new state-time” (i.e. before and after environment step);
Suggested change:
- \\(t^-\\) and \\(t\\) are used to indicate conditions at “state-time” and at “new state-time” (i.e. before and after environment step);
- N_c is the number of characters taking part in a round. Usually N_c = 1, but there are some games where multiple characters are used, with the additional possible option of alternating them during gameplay, like Tekken Tag Tournament where 2 characters have to be selected and two opponents are faced every round (thus N_c = 2);
Suggested change:
- \\(N_c\\) is the number of characters taking part in a round. Usually \\(N_c = 1\\), but there are some games where multiple characters are used, with the additional possible option of alternating them during gameplay, like Tekken Tag Tournament where 2 characters have to be selected and two opponents are faced every round (thus \\(N_c = 2\\));
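A small worked example of the reward above (hypothetical health values, single character so \\(N_c = 1\\)):

```python
# Hypothetical health values before (t^-) and after (t) one environment step.
opponent_health_before, opponent_health_after = 100, 90  # \bar{H}
agent_health_before, agent_health_after = 80, 75         # \hat{H}

# Damage inflicted to the opponent minus damage suffered by the agent.
reward = (opponent_health_before - opponent_health_after) - (agent_health_before - agent_health_after)
print(reward)  # (100 - 90) - (80 - 75) = 5 -> positive: more damage dealt than taken
```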
DIAMBRA Arena comes with a large number of ready-to-use wrappers and examples showing how to apply them. They cover a wide spectrum of use cases, and also provide reference templates to develop custom ones.
Environment wrappers are widely used to tweak the observation and action spaces. In order to activate them, one needs to properly set the `WrapperSettings` class attributes and provide them as input to the environment creation method, as shown in the next code block.
Suggested change:
Environment wrappers are **widely used to tweak the observation and action spaces**. In order to activate them, one needs to properly set the `WrapperSettings` class attributes and provide them as input to the environment creation method, as shown in the next code block.
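For illustration, a hedged sketch of activating a few wrappers (the class name follows the text above, though some DIAMBRA Arena versions call it `WrappersSettings`; the attribute names here are illustrative placeholders, so check the documentation for the exact fields available in your version):

```python
import diambra.arena
from diambra.arena import WrappersSettings  # the text refers to WrapperSettings; newer versions expose WrappersSettings

wrappers_settings = WrappersSettings()
# Illustrative attributes: rescale the health-based reward, stack frames, resize/grayscale the screen.
wrappers_settings.normalize_reward = True
wrappers_settings.stack_frames = 4
wrappers_settings.frame_shape = (84, 84, 1)

env = diambra.arena.make("sfiii3n", wrappers_settings=wrappers_settings)
```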
<img src="https://github.com/diambra/agents/blob/main/img/agents.jpg?raw=true" alt="DIAMBRA Agents"/>
For training our model, we will rely on already implemented RL algorithms, leveraging state-of-the-art Reinforcement Learning libraries. There are multiple advantages in doing so: these libraries provide high quality algorithms, efficiently implemented and continuously tested, they allow to focus efforts on higher level aspects such as policy network architecture, features selection, and hyper-parameters tuning, they provide native solutions to parallelize environment execution, and, in some cases, they even support distributed training using multiple GPUs in a single workstation or in cluster contexts.
Suggested change:
For training our model, we will rely on already implemented RL algorithms, leveraging state-of-the-art Reinforcement Learning libraries.
There are multiple advantages in doing so:
- These libraries provide **high quality algorithms**, efficiently implemented and continuously tested, and they allow you to focus efforts on higher level aspects such as policy network architecture, features selection, and hyper-parameters tuning.
- They provide **native solutions to parallelize environment execution**, and, in some cases, they even support distributed training using multiple GPUs in a single workstation or in cluster contexts.
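For example, a hedged sketch of training with Stable-Baselines3's PPO on a DIAMBRA environment (assuming a gymnasium-compatible env; depending on your DIAMBRA Arena version, additional wrappers or the dedicated Stable-Baselines3 integration may be needed to make the observation space compatible):

```python
import diambra.arena
from stable_baselines3 import PPO

env = diambra.arena.make("sfiii3n")

# "MultiInputPolicy" handles dictionary observations (screen pixels + RAM states).
model = PPO("MultiInputPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)
model.save("ppo_sfiii3n")

env.close()
```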
### Getting Ready
We highly recommend using virtual environments to isolate your python installs, especially to avoid conflicts in dependencies. In what follows we use Conda but any other tool should work too.
Suggested change:
We highly **recommend using virtual environments to isolate your python installs, especially to avoid conflicts in dependencies**. In what follows we use Conda but any other tool should work too.
DIAMBRA Competition Platform allows you to submit your agents and compete with other coders around the globe in epic video games tournaments!
It features a public global leaderboard where users are ranked by the best score achieved by their agents in the different environments. It also offers you the possibility to unlock cool achievements depending on the performances of your agent. Submitted agents are evaluated and their episodes are streamed on DIAMBRA Twitch channel.
Suggested change:
It features a public global leaderboard where users are ranked by the best score achieved by their agents in the different environments. It also offers you the possibility to unlock cool achievements depending on the performances of your agent. **Submitted agents are evaluated and their episodes are streamed on DIAMBRA Twitch channel**.
Add Twitch channel link
This will automatically retrieve the Hugging Face token you saved earlier and will create a new submission on [DIAMBRA Competition Platform](https://diambra.ai).
You will be able to see it on your dashboard just by logging in with your credentials, and watch it being streamed on Twitch!
Add a link to the Twitch channel.
Hi, thanks for your work! I added some updates, change requests, and advice 🤗