Louis Caubet, Firas Ben Jedidia, Long Van Tran Ha, Léo Feliers, Inès Vignal
2023 Project for the INF581 Advanced Machine Learning course at Ecole Polytechnique.
We recommend using Python 3.9 in a virtual environment to run this project.
- Install Java JDK 8 (AdoptOpenJDK)
- Clone this repository:
git clone https://github.com/LouisCaubet/RLMinecraftParkour.git
cd RLMinecraftParkour
- Install python dependencies:
pip install -r requirements.txt
- Install Malmo & MalmoEnv:
git clone https://github.com/Microsoft/malmo.git
cd malmo/Minecraft
(echo -n "malmomod.version=" && cat ../VERSION) > ./src/main/resources/version.properties
Start Minecraft with Malmo in a terminal by running
cd malmo/Minecraft
launchClient.bat -port 9000 -env
Open another terminal to run our code.
You can then run the desired Python script. Make sure it is executed from the root of the project.
- python src/test_parkour_env.py will simply open the parkour_env in Minecraft.
- python src/sb3_training.py will run the training using Stable-Baselines3.
- python src/sb3_testing.py will run the SB3-trained model in inference mode.
Use the .env file for configuration. Here's a list of environment variables we use:
- MINERL_PARKOUR_MAP: Path to the CSV defining the map.
- MALMO_PORT: Port on which Malmo is running (default: 9000).
- SB3_ALGO: Algorithm to use for training. Possible values: DQN, PPO, A2C.
- SB3_TIMESTEPS: Number of training timesteps.
- S3_TRAINED_MODEL_NAME: Name under which to save the model after training.
- SB3_INFERENCE_MODEL_NAME: Model to use for inference in the sb3_predict script.
- SB3_INFERENCE_STEPS: Number of steps to run inference for.
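As an illustration, the variables above could be read into a typed config object. This is a minimal sketch, not the project's own loader: `ParkourConfig`, `parse_dotenv` and `load_config` are hypothetical names, the parser only handles plain `KEY=VALUE` lines, and the only default taken from this README is `MALMO_PORT=9000`. The other settings are left as `None` when absent.

```python
import os
from dataclasses import dataclass
from typing import Dict, Mapping, Optional

# Values accepted for SB3_ALGO, as listed above.
SUPPORTED_ALGOS = {"DQN", "PPO", "A2C"}


@dataclass
class ParkourConfig:
    """Hypothetical typed view of the .env variables."""
    parkour_map: str
    malmo_port: int
    algo: Optional[str]
    timesteps: Optional[int]
    trained_model_name: Optional[str]
    inference_model_name: Optional[str]
    inference_steps: Optional[int]


def parse_dotenv(text: str) -> Dict[str, str]:
    """Parse plain KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")
    return values


def _opt_int(env: Mapping[str, str], key: str) -> Optional[int]:
    """Return env[key] as an int, or None if the key is missing."""
    value = env.get(key)
    return int(value) if value is not None else None


def load_config(env: Optional[Mapping[str, str]] = None) -> ParkourConfig:
    """Build a config from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env
    algo = env.get("SB3_ALGO")
    if algo is not None:
        algo = algo.upper()
        if algo not in SUPPORTED_ALGOS:
            raise ValueError(f"Unsupported SB3_ALGO: {algo}")
    return ParkourConfig(
        parkour_map=env["MINERL_PARKOUR_MAP"],  # required: KeyError if missing
        malmo_port=int(env.get("MALMO_PORT", "9000")),
        algo=algo,
        timesteps=_opt_int(env, "SB3_TIMESTEPS"),
        trained_model_name=env.get("S3_TRAINED_MODEL_NAME"),
        inference_model_name=env.get("SB3_INFERENCE_MODEL_NAME"),
        inference_steps=_opt_int(env, "SB3_INFERENCE_STEPS"),
    )
```

A script could then call `load_config({**os.environ, **parse_dotenv(open(".env").read())})`, so that values from `.env` override the shell environment.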
Level 1
Trained using PPO with 10k timesteps.
Action space: Move, Strafe
Rewards:
- +100 for reaching the diamond block
- +10 for each (gold) block towards the goal
- -100 and end of episode when touching the bedrock
Level1.mp4
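The reward scheme above can be expressed as a small stateful function. This is a hedged sketch rather than the project's code: it assumes the environment reports the block under the agent's feet by its Minecraft id (`gold_block`, `diamond_block`, `bedrock`) and, for gold blocks, their index along the path. It also assumes that reaching the diamond ends the episode and that skipping over gold blocks earns +10 for each one. `ParkourReward` is a hypothetical name.

```python
from typing import Optional, Tuple

GOAL_REWARD = 100.0     # reaching the diamond block
PROGRESS_REWARD = 10.0  # per gold block closer to the goal
FALL_PENALTY = -100.0   # touching the bedrock


class ParkourReward:
    """Hypothetical reward tracker for the parkour rewards listed above."""

    def __init__(self) -> None:
        self.best_index = 0  # furthest gold block reached so far (start = 0)

    def step(self, block_below: str,
             block_index: Optional[int] = None) -> Tuple[float, bool]:
        """Return (reward, done) given the block under the agent's feet."""
        if block_below == "diamond_block":
            return GOAL_REWARD, True  # assumption: goal ends the episode
        if block_below == "bedrock":
            return FALL_PENALTY, True
        if (block_below == "gold_block" and block_index is not None
                and block_index > self.best_index):
            # Only new progress is rewarded, so standing still earns nothing.
            gained = block_index - self.best_index
            self.best_index = block_index
            return PROGRESS_REWARD * gained, False
        return 0.0, False
```

Tracking the furthest block reached prevents the agent from farming +10 by hopping back and forth between the same two gold blocks.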
Level 2
Action space: Move, Strafe, JumpStrafe
Rewards:
- +100 for reaching the diamond block
- +10 for each (gold) block towards the goal
- -100 and end of episode when touching the bedrock
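One way to picture the two action spaces is as discrete lookup tables mapping an action index to Malmo continuous-movement commands. This is only a sketch: the tables, the split of Strafe into two directions, and the reading of JumpStrafe as "jump while strafing" are all assumptions. In Malmo, continuous commands such as `move 1` stay in effect until the matching `move 0` is sent, so a real environment also has to stop movement between actions.

```python
from typing import Dict, List

# Hypothetical action tables: index -> Malmo-style command strings.
LEVEL1_ACTIONS: Dict[int, List[str]] = {
    0: ["move 1"],     # Move forward
    1: ["strafe 1"],   # Strafe in one direction
    2: ["strafe -1"],  # Strafe in the other direction
}

LEVEL2_ACTIONS: Dict[int, List[str]] = {
    **LEVEL1_ACTIONS,
    3: ["jump 1", "strafe 1"],   # JumpStrafe (assumed: jump while strafing)
    4: ["jump 1", "strafe -1"],  # JumpStrafe, other direction
}


def commands_for(action: int, table: Dict[int, List[str]]) -> List[str]:
    """Return a copy of the commands for a discrete action index."""
    if action not in table:
        raise ValueError(f"Unknown action {action}; expected 0..{len(table) - 1}")
    return list(table[action])
```

With this layout, `len(table)` gives the size of the discrete action space passed to the RL algorithm.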
When trained with PPO for 10k timesteps, the agent hacks the game: it manages to jump much farther than should be possible!
Level2-hack.mp4
To prevent this, we add a minimum delay of 0.1 s between actions. To adapt the agent to this new environment, we fine-tune the previous model for 2k more timesteps. Now it works!
Level2-success.mp4
Note: due to the time.sleep, the +100 reward is sometimes not given even though the agent is on the diamond block.
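The minimum delay between actions can be sketched as a thin wrapper around the environment's `step`. This is an illustrative version under assumptions, not the project's implementation. `MinDelayWrapper` is a hypothetical name, and the wrapped object is assumed to expose `reset()` and `step(action)`. Unlike a fixed `time.sleep(0.1)`, it sleeps only for whatever part of the 0.1 s has not already elapsed. The clock and sleep functions are injectable so the timing logic can be checked without waiting.

```python
import time
from typing import Any, Callable, Optional


class MinDelayWrapper:
    """Hypothetical wrapper enforcing a minimum gap between consecutive actions."""

    def __init__(self, env: Any, min_delay: float = 0.1,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.env = env
        self.min_delay = min_delay
        self._clock = clock
        self._sleep = sleep
        self._last_action: Optional[float] = None  # time of the previous step

    def reset(self, *args: Any, **kwargs: Any) -> Any:
        # A new episode starts without any pending delay.
        self._last_action = None
        return self.env.reset(*args, **kwargs)

    def step(self, action: Any) -> Any:
        if self._last_action is not None:
            remaining = self.min_delay - (self._clock() - self._last_action)
            if remaining > 0:
                self._sleep(remaining)  # wait out the rest of the minimum gap
        self._last_action = self._clock()
        return self.env.step(action)
```

Measuring the gap with `time.monotonic` rather than wall-clock time keeps the delay correct even if the system clock is adjusted during training.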