3. Genetic Algorithm
Because reinforcement learning requires carefully designed loss functions and very specific reward functions to work effectively, we will instead train agents with a genetic algorithm. This approach is also a good fit because the AI relies on relatively few constants to make decisions (there are exactly 30 trained parameters).
This training environment is identical to the HTML5 game playable on the repository website, except with a few key differences:
- The game is seeded, meaning the random number generator produces identical outputs for the same seed. In fact, asteroids and saucers spawn in the same locations and split the same way when broken, because each danger also has its own individual seeded random number generator.
- The hitboxes are replaced with the targeting and danger radii that the AI assumes (this makes the game more difficult, and makes simulating the game less computationally intensive).
- The AI has only 1 life and can't gain extra lives through the 10000-point bonus.
- Destroying saucers awards no points, since awarding them can erroneously benefit agents that don't move and simply "farm" saucers.
Other than these changes, the game settings match the playable version on the website.
A central aspect of genetic algorithms is defining a fitness function, which quantifies how well a specific agent performs in a given generation. The fitness function has the following tunable parameters:
- Trial Count: The number of seeds (game instances) we test an agent on (a higher number of seeds means training is slower, but makes the fitness function more accurate)
- Flee Weight: How much we add to the fitness score based on how long the agent spends in flee mode. (I found that setting this value negative for the first few generations helps avoid a locally optimal solution where the AI just keeps spinning and moving erratically.)
- Time Weight: This represents how much we add to the fitness score based on how long the agent survives.
- Score Weight: This represents how much we add to the fitness score based on the score our agent achieves.
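The three weights and the trial count combine into a single scalar as sketched below. This is an illustrative reading of the description above, not the project's actual code: the `simulate` helper, the `Result` fields, and the averaging over trials are all assumptions.

```python
import random
from collections import namedtuple

# Hypothetical per-trial outputs; a real run would come from the seeded game.
Result = namedtuple("Result", "flee_time survival_time score")

def simulate(agent, seed):
    """Stand-in for playing one seeded game instance to completion."""
    rng = random.Random(seed)  # seeded, so the same seed gives the same trial
    return Result(rng.uniform(0, 10), rng.uniform(0, 60), rng.randrange(0, 5000))

def fitness(agent, seeds, flee_weight, time_weight, score_weight):
    """Weighted sum of flee time, survival time, and score, averaged over trials."""
    total = 0.0
    for seed in seeds:  # Trial Count = len(seeds)
        r = simulate(agent, seed)
        total += (flee_weight * r.flee_time
                  + time_weight * r.survival_time
                  + score_weight * r.score)
    return total / len(seeds)
```

Because every trial is seeded, evaluating the same agent on the same seed list is deterministic, which makes fitness comparisons between agents in one generation fair.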
Each generation has many individuals, and each generation is tested on a new set of seeds to measure how well each agent's strategy generalizes. After a generation completes, we analyze its results and create the next generation through the crossover system. The algorithm stops after either a maximum number of generations or once a certain fitness threshold is reached.
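The generational loop can be sketched as follows. Here `fresh_seeds`, `evaluate`, and `next_generation` are hypothetical stand-ins for the steps this section describes, with trivial placeholder bodies so the sketch runs:

```python
import random

def fresh_seeds(generation, trials=5):
    """Each generation gets a new set of seeds (stand-in scheme)."""
    return [generation * trials + i for i in range(trials)]

def evaluate(agent, seeds):
    """Stand-in fitness; a real version would run the seeded game."""
    return sum(agent) / len(agent)

def next_generation(population, scores, rng):
    """Stand-in for selection + crossover; just copies the best agent forward."""
    best = population[scores.index(max(scores))]
    return [list(best) for _ in population]

def evolve(population, max_generations, fitness_threshold, rng):
    """Run generations until the cap or the fitness threshold is hit."""
    for gen in range(max_generations):
        seeds = fresh_seeds(gen)
        scores = [evaluate(agent, seeds) for agent in population]
        if max(scores) >= fitness_threshold:  # fitness stopping condition
            break
        population = next_generation(population, scores, rng)
    return population
```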
This section shows how we combine the genes of previous generations to create new generations intended to improve on them (roughly simulating the process of evolution in real life).
First off, we will define a variable that represents how many agents are copied unchanged from the previous generation into the current one.
We will create a "weight" for each individual using a modified softmax function. This requires an "inclusion limit" parameter. We use the Z-scores of the fitness values to standardize the process.
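A minimal sketch of the weighting step is below. The exact form of the modified softmax and the role of the inclusion limit are not fully specified above, so this is one plausible reading: z-scores standardize the fitnesses, and individuals whose z-score falls below the negated inclusion limit are excluded from selection entirely.

```python
import math

def selection_weights(fitnesses, inclusion_limit):
    """Softmax over fitness z-scores, zeroing out badly underperforming agents
    (assumed interpretation of the 'inclusion limit')."""
    n = len(fitnesses)
    mean = sum(fitnesses) / n
    std = math.sqrt(sum((f - mean) ** 2 for f in fitnesses) / n) or 1.0
    z = [(f - mean) / std for f in fitnesses]
    raw = [math.exp(zi) if zi >= -inclusion_limit else 0.0 for zi in z]
    total = sum(raw)
    if total == 0.0:  # everyone excluded: fall back to uniform weights
        return [1.0 / n] * n
    return [r / total for r in raw]  # normalized probability distribution
```

Standardizing with z-scores keeps the weights well-behaved regardless of the raw fitness scale, which varies as the score and time weights change between runs.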
For the first generation, we either load agents from a save file or pick traits randomly (or with some predefined constant values). For subsequent generations, we pick two random agents from the probability distribution defined by the weight values in the previous step, and for each gene, one of the two parents is chosen to provide its constant value. With a certain probability, a gene is also mutated rather than copied directly.
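The crossover-and-mutation step can be sketched as below. Uniform per-gene crossover follows the description above; the Gaussian perturbation is an illustrative choice of mutation operator, since the actual operator isn't specified here.

```python
import random

def crossover(parent_a, parent_b, mutation_rate, rng):
    """Build a child from two parents: each of the 30 genes comes from one
    parent at random, and is occasionally mutated (Gaussian noise is assumed)."""
    child = []
    for gene_a, gene_b in zip(parent_a, parent_b):
        gene = gene_a if rng.random() < 0.5 else gene_b  # uniform crossover
        if rng.random() < mutation_rate:                 # per-gene mutation
            gene += rng.gauss(0.0, 0.1 * (abs(gene) + 1.0))
        child.append(gene)
    return child
```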
If the maximum fitness of a generation is close to its average fitness, we increase the mutation rate for the next generation, since a small gap between the mean and maximum is a potential sign of premature convergence.
If a generation's average fitness is much lower than that of previous generations, we stop the seeds from changing for subsequent generations until the average fitness returns to within a certain margin of the maximum average fitness over all generations.
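These two adaptive controls can be expressed as small helper functions. The threshold and rate values are hypothetical; only the comparisons mirror the rules above.

```python
def next_mutation_rate(fitnesses, base_rate, margin, boosted_rate):
    """Raise the mutation rate when max fitness sits close to the mean,
    a potential sign of premature convergence (values are illustrative)."""
    mean = sum(fitnesses) / len(fitnesses)
    if max(fitnesses) - mean < margin:
        return boosted_rate
    return base_rate

def should_reroll_seeds(avg_fitness, best_avg_so_far, recovery_margin):
    """Only draw new seeds once the generation average has recovered to
    within a margin of the best generation average seen so far."""
    return avg_fitness >= best_avg_so_far - recovery_margin
```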