This project aims to practice Reinforcement Learning following this reference. In addition, the following tasks are required:
- Turn this code into a module of functions that can use multiple environments.
- Tune alpha, gamma, and/or epsilon using a decay over episodes.
- Implement a grid search to discover the best hyperparameters.
# Setup
First, install the following libraries (on Windows):
!pip install cmake "gym[atari]" scipy
Then import the required libraries:
import gym
from IPython.display import clear_output
from time import sleep
import random
import numpy as np
import pandas as pd
# Training
Training starts by passing the environment to the train function, which returns the Q-table containing the learned knowledge.
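A minimal sketch of such a train function, assuming the classic Gym step API and the signature train(env, episodes, alpha, gamma, epsilon) used below; the second return value is a placeholder frame log to match Q, f = train(...):

```python
def train(env, episodes, alpha, gamma, epsilon):
    """Tabular Q-learning. Returns the Q-table and a frame log."""
    q_table = np.zeros([env.observation_space.n, env.action_space.n])
    frames = []  # placeholder: can hold rendered frames for replay
    for episode in range(1, episodes + 1):
        state = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection
            if random.uniform(0, 1) < epsilon:
                action = env.action_space.sample()   # explore
            else:
                action = np.argmax(q_table[state])   # exploit
            next_state, reward, done, info = env.step(action)
            # Q-learning update
            next_max = np.max(q_table[next_state])
            q_table[state, action] = (1 - alpha) * q_table[state, action] \
                + alpha * (reward + gamma * next_max)
            state = next_state
        if episode % 100 == 0:
            clear_output(wait=True)
            print(f"Episode: {episode}")
    print("Training finished.")
    return q_table, frames
```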
# Evaluation
We evaluate the model by providing its environment and the Q-table generated during training, and we get back the penalty score and the timesteps.
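A matching evaluate sketch, assuming a reward of -10 marks an illegal pick-up/drop-off as in Taxi-v3 (the function name and signature are illustrative):

```python
def evaluate(env, q_table, episodes=100):
    """Run greedy episodes; return average penalties and timesteps."""
    total_penalties, total_timesteps = 0, 0
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            action = np.argmax(q_table[state])        # always act greedily
            state, reward, done, info = env.step(action)
            if reward == -10:                         # illegal pick-up/drop-off
                total_penalties += 1
            total_timesteps += 1
    print(f"Results after {episodes} episodes:")
    print(f"Average timesteps per episode: {total_timesteps / episodes}")
    print(f"Average penalties per episode: {total_penalties / episodes}")
    return total_penalties / episodes, total_timesteps / episodes
```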
# "Taxi-v3" Environment
Q, f = train(env, 10000, 0.8, 0.9, 0.8)
##########################################
# Episode: 10000
# Training finished.
#
# Results after 100 episodes:
# Average timesteps per episode: 13.24
# Average penalties per episode: 0.0
It's required to change the hyperparameters while training, so we decay them every 5000 episodes:
alpha = alpha * (1 - 0.01)
gamma = gamma * (1 - 0.01)
epsilon = epsilon * (1 - 0.6)
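These updates sit inside the training loop; a sketch of the placement, assuming the episode loop variable from the train sketch above:

```python
# every 5000 episodes, shrink the hyperparameters
if episode % 5000 == 0:
    alpha *= 1 - 0.01
    gamma *= 1 - 0.01
    epsilon *= 1 - 0.6
```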
It's required to implement a grid search to find the combination of hyperparameter values that gives the minimum penalty and the minimum timesteps.
With alphas, gammas, and epsilons as lists of candidate values:
grid(env, alphas, gammas, epsilons)
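A sketch of how such a grid function might look, built on the hypothetical train and evaluate sketches above (all names are illustrative):

```python
import itertools

def grid(env, alphas, gammas, epsilons, episodes=10000):
    """Grid search: train on every combination and record its scores."""
    all_params = []
    for alpha, gamma, epsilon in itertools.product(alphas, gammas, epsilons):
        q_table, _ = train(env, episodes, alpha, gamma, epsilon)
        penalty, timesteps = evaluate(env, q_table)
        all_params.append({'alpha': alpha, 'gamma': gamma, 'epsilon': epsilon,
                           'penalty': penalty, 'time step': timesteps})
    return all_params
```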
#########################
After getting all parameter combinations, we append them into a list of dictionaries and sort them, as shown below.
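For example, the ranking could look like this, assuming the all_params list built by the grid sketch above:

```python
# rank combinations: lowest penalty first, then fewest average timesteps
all_params.sort(key=lambda p: (p['penalty'], p['time step']))
print('Best parameters are:', all_params[0])
```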
#Best parameters are: {'alpha': 0.6, 'gamma': 0.6, 'epsilon': 0.7, 'penalty': 0.0, 'time step': 12.89}
##########################################
# Episode: 100000
# Training finished.
#
# Results after 1000 episodes:
# Average timesteps per episode: 13.04
# Average penalties per episode: 0.0
##########################################
# Available Environments
1-Taxi-v3
2-FrozenLake-v1
3-CliffWalking-v0
You can choose any of them; the default one is Taxi-v3.
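A hypothetical helper showing how the module's functions could accept any of these environments, with Taxi-v3 as the default:

```python
def make_env(name="Taxi-v3"):
    """Any of the environment ids listed above can be passed in."""
    return gym.make(name)

env = make_env()                     # default: Taxi-v3
# env = make_env("FrozenLake-v1")
# env = make_env("CliffWalking-v0")
```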
# Problem definition
There are 4 locations (labeled by different letters), and our job is to pick up the passenger at one location and drop them off at another. We receive +20 points for a successful drop-off and lose 1 point for every timestep it takes. There is also a 10-point penalty for illegal pick-up and drop-off actions.
# Introduction
In this project we implement the Q-learning algorithm and see how decaying hyperparameters such as the learning rate, the discount factor, and epsilon affects the results. We also implement a grid search to select the best parameters.
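For reference, alpha (learning rate) and gamma (discount factor) enter the standard Q-learning update, while epsilon controls how often a random exploratory action is taken:

Q(s, a) ← Q(s, a) + alpha * (reward + gamma * max_a' Q(s', a') - Q(s, a))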
The following libraries are required to run the project:
!pip install gym
!pip install numpy
The job is to pick up the passenger at one location and drop them off in another. Here are a few things that we'd love our taxi to take care of:
- Drop off the passenger at the right location.
- Save the passenger's time by taking the minimum time possible for the drop-off.
- Take care of the passenger's safety and the traffic rules.
Trying random actions to see how the agent moves:
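A minimal sketch, assuming the classic Gym API (env.reset() returns a state and env.step() returns a 4-tuple):

```python
state = env.reset()
done = False
while not done:
    action = env.action_space.sample()   # pick a random action each step
    state, reward, done, info = env.step(action)
    env.render()
    sleep(0.1)                           # slow the animation down
```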
After the agent has been trained, we can notice the difference in how it behaves:
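A sketch of a greedy rollout, assuming Q is the Q-table returned by train:

```python
state = env.reset()
done = False
while not done:
    action = np.argmax(Q[state])         # follow the learned policy greedily
    state, reward, done, info = env.step(action)
    env.render()
    sleep(0.1)
```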
We used a brute-force algorithm (grid search) to find the best hyperparameters.
Best hyperparameters:
alpha=0.9, gamma=0.9, epsilon=0.9