IANNWTF Project WS22/23
Eosandra Grund and Hanna Kirsch
The project was created using Python 3.10.8. The package requirements can be found in requirements.txt. It is our final project for the course 'Implementing Artificial Neural Networks in Tensorflow' given in the winter semester 22/23 at Osnabrück University.
We trained DQN agents to play Tic-Tac-Toe and Connect Four using self-play, and then used them as the basis for an adapting agent that creates a more balanced game and, hopefully, a better playing experience. The adapting agents were created by changing the action decision formula in the method action_choice of Agent.
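To illustrate the idea of adapting the action decision, here is a minimal sketch. It assumes the standard greedy DQN rule (argmax over Q-values) is swapped for a rule that picks the legal action whose Q-value is closest to a target value, steering the game toward balance. The function name, signature, and `target` parameter are illustrative assumptions, not the actual interface of `Agent.action_choice`.

```python
import numpy as np

def action_choice_adapting(q_values, available_actions, target=0.0):
    """Hypothetical adapting action decision.

    Instead of greedily playing argmax Q (standard DQN behaviour),
    choose the available action whose Q-value lies closest to
    `target`, so the agent neither crushes nor throws the game.
    """
    qs = np.asarray([q_values[a] for a in available_actions])
    idx = int(np.argmin(np.abs(qs - target)))
    return available_actions[idx]
```

With `target=0.0` the agent aims for a draw-like evaluation; raising or lowering the target would make it play stronger or weaker.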
As you can see in the following plots, our agents learned well.
Training progress of our best Tic-Tac-Toe agent in loss
Training progress of our best Tic-Tac-Toe agent in ratio of rewards. The red line shows which agent we chose as the best one.
The plots for Connect Four can be found here.
The adapting agents worked well, but are not perfect yet. The following plot shows one method for Tic-Tac-Toe: the x-axis lists opponents ordered from strong to weak, and the y-axis shows the percentage of each reward.
- Plots: folder with plots from different runs and tests
- best_agents_plots: contains plots of loss and reward ratio from testing during training
- class_diagrams: contains UML class diagrams for our code
- connectfour_testing: contains plots from testing the DQNAgent and adapting agents against opponents of different strengths for Connect Four
- tiktaktoe_testing: contains plots from testing the DQNAgent and adapting agents against opponents of different strengths for Tic-Tac-Toe
- Video: contains the script, the slides and the presentation video
- logs: log files created during the runs
- 3agents_linear/20230328-212250: training 3 agents to play Tic-Tac-Toe using epsilon = 0.995
- AgentTanH/20230327-192241: training Tic-Tac-Toe with a linear activation function using epsilon = 0.998
- ConnectFour_linear/20230330-174353/: training Connect Four with a linear activation function and epsilon = 0.995
- agent_ConnectFour_tanh/20230328-094318: training Connect Four with a tanh activation function (best performance for Connect Four) and epsilon = 0.995
- agent_linear_decay099/20230327-185908: training Tic-Tac-Toe with a linear activation function and epsilon = 0.99 (best performance for Tic-Tac-Toe)
- best_agent_tiktaktoe_0998/20230327-191103: training Tic-Tac-Toe with a tanh activation function and epsilon = 0.998
- model: saved best models from different runs; for a description, see the logs description above
- Report: contains the finished report which was part of the project
- selfplaydqn: contains all Python files
- agentmodule
- agent.py: contains all Agent subclasses and the Agent class
- buffer.py: contains the buffer class and all its functionalities
- model.py: contains the convolutional neural network used as Q-network
- testing.py: contains functions for testing the agents performance
- training.py: contains the training algorithm
- envs
- envwrapper2.py: the wrapper that turns the environments into self-play environments
- keras_gym_env.py: Contains Connect Four
- keras_gym_env_2wins.py: Connect Four, but with 2 horizontal wins, and only wins
- keras_gym_env_novertical.py: Connect Four, but you cannot win vertically
- sampler.py: The sampling algorithm as a class that remembers the opponent and all the envs to reuse them
- tiktaktoe_env.py: Contains Tic-Tac-Toe
- main_testing.py: tests the performance of one or, by default, several agents against a random agent
- main_testing_adapting.py: tests one adapting agent (or a DQNAgent) against opponents of different strengths
- main_train_best.py: trains a DQN
- play_vs_agent.py: lets you play against a trained agent
- play_vs_agent_adapting.py: lets you play against a trained agent using an adapting action decision formula
- play_vs_person.py: lets you play against yourself or another person to test the environments
- plot_tensorboard_data.py: plots csv data from tensorboard, either loss or reward ratio
- text_info_runs: data created by testing different agents, as well as information about the hyperparameters used to train them
- ConnectFour.txt: contains information about hyperparameters for all Connect Four training runs
- best_agent_ConncetFour_comparison.txt: contains results of the comparison of all best Connect Four agents
- best_agent_comparison.txt: contains results of the comparison of all best Tic-Tac-Toe agents
- best_model_tiktaktoe2703.txt: contains information about hyperparameters for all Tic-Tac-Toe training runs
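The run descriptions above list epsilon values between 0.99 and 0.998, which read like per-episode decay factors for epsilon-greedy exploration. As a sketch of how such a schedule behaves (the start value, floor, and function name here are assumptions, not taken from the project code):

```python
def epsilon_schedule(epsilon_decay, n_episodes, epsilon_start=1.0, epsilon_min=0.01):
    """Multiplicative epsilon decay, as suggested by decay factors
    like 0.99, 0.995 and 0.998 in the run names.

    epsilon_start and epsilon_min are assumed values for illustration.
    """
    eps = epsilon_start
    schedule = []
    for _ in range(n_episodes):
        schedule.append(eps)
        eps = max(epsilon_min, eps * epsilon_decay)  # decay but never below the floor
    return schedule
```

A slower decay (0.998) keeps the agent exploring much longer than 0.99: after 1000 episodes, epsilon is roughly 0.13 versus the 0.01 floor.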