3. Genetic Algorithm
Because reinforcement learning requires carefully designed loss functions and very specific reward functions to work effectively, we will instead train agents with a genetic algorithm. This approach is also a good fit because the AI relies on relatively few constants to make decisions (there are exactly 30 trained parameters).
This training environment is identical to the HTML5 game playable on the repository website, except with a few key differences:
- The game is seeded, meaning the random number generator produces identical outputs for the same seed. In fact, asteroids and saucers spawn in the same locations and split the same way when broken, because each danger also has its own individual seeded random number generator.
- The hitboxes are replaced with the targeting and danger radii that the AI assumes (this makes the game more difficult, and makes simulating the game less computationally intensive).
- The AI has only 1 life and can't gain extra lives through the 10000-point bonus.
- Destroying saucers awards no points, since awarding them can erroneously benefit agents that don't move and simply "farm" saucers.
Other than these changes, the game settings match the playable version on the website.
A central aspect of genetic algorithms is defining a fitness function, which quantifies how well a specific agent performs in a given generation. The fitness function has the following tunable parameters:
- Trial Count: The number of seeds (game instances) we test an agent on (a higher number of seeds means training is slower, but makes the fitness function more accurate)
- Flee Weight: How much we add to the fitness score based on how long the agent spends in flee mode. (I found that setting this value negative for the first few generations helps avoid a locally optimal solution where the AI just keeps spinning and moving erratically.)
- Time Weight: This represents how much we add to the fitness score based on how long the agent survives.
- Score Weight: This represents how much we add to the fitness score based on the score our agent achieves.
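The three weights and the trial count combine into a single scalar as sketched below. This is an illustrative reading of the description above, not the project's actual code: the `simulate` helper, the `Result` fields, and the averaging over trials are all assumptions.

```python
import random
from collections import namedtuple

# Hypothetical per-trial outputs; a real run would come from the seeded game.
Result = namedtuple("Result", "flee_time survival_time score")

def simulate(agent, seed):
    """Stand-in for playing one seeded game instance to completion."""
    rng = random.Random(seed)  # seeded, so the same seed gives the same trial
    return Result(rng.uniform(0, 10), rng.uniform(0, 60), rng.randrange(0, 5000))

def fitness(agent, seeds, flee_weight, time_weight, score_weight):
    """Weighted sum of flee time, survival time, and score, averaged over trials."""
    total = 0.0
    for seed in seeds:  # Trial Count = len(seeds)
        r = simulate(agent, seed)
        total += (flee_weight * r.flee_time
                  + time_weight * r.survival_time
                  + score_weight * r.score)
    return total / len(seeds)
```

Because every trial is seeded, evaluating the same agent on the same seed list is deterministic, which makes fitness comparisons between agents in one generation fair.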
Each generation has many individuals, and each generation is tested on a new set of seeds to measure how well each agent's strategy generalizes. After a generation completes, we analyze its results and create the next generation through the crossover system. The algorithm stops after either a maximum number of generations or once a certain fitness threshold is reached.
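The generational loop can be sketched as follows. Here `fresh_seeds`, `evaluate`, and `next_generation` are hypothetical stand-ins for the steps this section describes, with trivial placeholder bodies so the sketch runs:

```python
import random

def fresh_seeds(generation, trials=5):
    """Each generation gets a new set of seeds (stand-in scheme)."""
    return [generation * trials + i for i in range(trials)]

def evaluate(agent, seeds):
    """Stand-in fitness; a real version would run the seeded game."""
    return sum(agent) / len(agent)

def next_generation(population, scores, rng):
    """Stand-in for selection + crossover; just copies the best agent forward."""
    best = population[scores.index(max(scores))]
    return [list(best) for _ in population]

def evolve(population, max_generations, fitness_threshold, rng):
    """Run generations until the cap or the fitness threshold is hit."""
    for gen in range(max_generations):
        seeds = fresh_seeds(gen)
        scores = [evaluate(agent, seeds) for agent in population]
        if max(scores) >= fitness_threshold:  # fitness stopping condition
            break
        population = next_generation(population, scores, rng)
    return population
```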
This section shows how we combine the genes of previous generations to create new generations intended to improve on them (roughly simulating the process of evolution in real life).
First off, we will define a variable that represents how many agents are copied unchanged from the previous generation into the current one.
We will create a "weight" for each individual using a modified softmax function. This requires an "inclusion limit" parameter. We use the Z-scores of the fitness values to standardize the process.
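A minimal sketch of the weighting step is below. The exact form of the modified softmax and the role of the inclusion limit are not fully specified above, so this is one plausible reading: z-scores standardize the fitnesses, and individuals whose z-score falls below the negated inclusion limit are excluded from selection entirely.

```python
import math

def selection_weights(fitnesses, inclusion_limit):
    """Softmax over fitness z-scores, zeroing out badly underperforming agents
    (assumed interpretation of the 'inclusion limit')."""
    n = len(fitnesses)
    mean = sum(fitnesses) / n
    std = math.sqrt(sum((f - mean) ** 2 for f in fitnesses) / n) or 1.0
    z = [(f - mean) / std for f in fitnesses]
    raw = [math.exp(zi) if zi >= -inclusion_limit else 0.0 for zi in z]
    total = sum(raw)
    if total == 0.0:  # everyone excluded: fall back to uniform weights
        return [1.0 / n] * n
    return [r / total for r in raw]  # normalized probability distribution
```

Standardizing with z-scores keeps the weights well-behaved regardless of the raw fitness scale, which varies as the score and time weights change between runs.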
For the first generation, we either load agents from a save file or pick traits randomly (or with some predefined constant values). For subsequent generations, we pick two random agents from the probability distribution defined by the weight values in the previous step, and for each gene, one of the two parents is chosen to provide its constant value. With a certain probability, a gene is also mutated rather than copied directly.
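The crossover-and-mutation step can be sketched as below. Uniform per-gene crossover follows the description above; the Gaussian perturbation is an illustrative choice of mutation operator, since the actual operator isn't specified here.

```python
import random

def crossover(parent_a, parent_b, mutation_rate, rng):
    """Build a child from two parents: each of the 30 genes comes from one
    parent at random, and is occasionally mutated (Gaussian noise is assumed)."""
    child = []
    for gene_a, gene_b in zip(parent_a, parent_b):
        gene = gene_a if rng.random() < 0.5 else gene_b  # uniform crossover
        if rng.random() < mutation_rate:                 # per-gene mutation
            gene += rng.gauss(0.0, 0.1 * (abs(gene) + 1.0))
        child.append(gene)
    return child
```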
If the maximum fitness of a generation is close to its average fitness, we increase the mutation rate for the next generation, since a small gap between the mean and maximum is a potential sign of premature convergence.
If a generation's average fitness is much lower than that of previous generations, we stop the seeds from changing for subsequent generations until the average fitness returns to within a certain margin of the maximum average fitness over all generations.
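These two adaptive controls can be expressed as small helper functions. The threshold and rate values are hypothetical; only the comparisons mirror the rules above.

```python
def next_mutation_rate(fitnesses, base_rate, margin, boosted_rate):
    """Raise the mutation rate when max fitness sits close to the mean,
    a potential sign of premature convergence (values are illustrative)."""
    mean = sum(fitnesses) / len(fitnesses)
    if max(fitnesses) - mean < margin:
        return boosted_rate
    return base_rate

def should_reroll_seeds(avg_fitness, best_avg_so_far, recovery_margin):
    """Only draw new seeds once the generation average has recovered to
    within a margin of the best generation average seen so far."""
    return avg_fitness >= best_avg_so_far - recovery_margin
```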