Smart Car Approach
Our next major goal is to implement Smart versions of the aforementioned vehicles. We hypothesize that primary and secondary congestion avoidance will significantly increase car throughput. Our intent is to use Q-Learning, a common machine learning technique, to teach the vehicles how to behave in specific conditions. Q-Learning is a form of Reinforcement Learning.
An example of Reinforcement Learning is a child learning to walk. The child takes a big step forward and falls. The next time, it takes a smaller step and keeps its balance. The child tries variations like this many times; eventually it learns the right size of step to take and walks steadily. It has succeeded.
There are three basic concepts in reinforcement learning: state, action, and reward. In our case, the car will learn the best routes to get from point A to point B as fast as possible; it will also learn which lane is best to travel in under certain conditions.
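As a rough illustration, the state, action, and reward for a single car might be encoded along the following lines. This is only a sketch; the field names, action list, and reward signal here are assumptions, not our final design.

```python
from dataclasses import dataclass

# Illustrative encoding of state, action, and reward for one simulated car.
# The fields and actions below are assumptions for the sake of example.
@dataclass(frozen=True)
class CarState:
    road_segment: int    # which segment of the planned route the car is on
    lane: int            # current lane index
    congestion: int      # coarse congestion level ahead (0 = clear, 2 = heavy)

ACTIONS = ["stay_in_lane", "change_left", "change_right", "take_alternate_route"]

def reward(travel_time_step: float) -> float:
    # Penalize time spent traveling, so minimizing travel time maximizes reward.
    return -travel_time_step
```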
Reinforcement Learning is currently used for playing the board game Go (finding the optimal move), robot control (finding the optimal path, changing directions, etc.), and many other applications.
The Q in Q-Learning stands for the long-term value of an action, and Q-Learning is about learning these Q-values through observation. The procedure starts by initializing the Q-value of every state-action pair to zero: Q(s,a) = 0 for all states s and actions a. Once the car starts learning, it takes an action a in state s, receives a reward r, and observes that it has moved to a new state s'. The car then updates Q(s,a) using the reward it just received plus the maximum Q-value available from the new state s', so each stored value carries memory of past experience while taking future steps into account. Repeatedly following the action with the highest Q-value ultimately results in the vehicle taking the optimal path.
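In code, a single tabular Q-Learning update might look like the following sketch. The learning rate and discount factor values here are illustrative assumptions, not tuned parameters.

```python
from collections import defaultdict

ALPHA = 0.1   # learning rate: how strongly a new observation overwrites the old value
GAMMA = 0.9   # discount factor: how much future rewards count toward Q

Q = defaultdict(float)  # Q[(state, action)] starts at 0 for every pair, as described above

def q_update(s, a, r, s_prime, actions):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    best_next = max(Q[(s_prime, a_prime)] for a_prime in actions)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
```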
Below we can see two different approaches to the Q-Learning algorithm:
Inputs:
The current state s' and reward signal r'
Persistent Values:
Q, a table of action values indexed by state and action, initially zero
Nsa, a table of frequencies for state-action pairs, initially zero
s, a, r, the previous state, action, and reward, initially null
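A rough sketch of the agent loop implied by these inputs and persistent values is shown below. The fixed learning rate, the exploration function f, and its optimism constant R_PLUS are assumptions rather than the exact formulation we will use.

```python
from collections import defaultdict

ALPHA = 0.1       # learning rate (assumed constant here; it could also decay with Nsa)
GAMMA = 0.9       # discount factor
N_EXPLORE = 5     # try each state-action pair at least this many times
R_PLUS = 1.0      # optimistic reward estimate for under-explored pairs

Q = defaultdict(float)    # Q[(s, a)], initially zero
Nsa = defaultdict(int)    # visit counts for state-action pairs, initially zero
prev = {"s": None, "a": None, "r": None}   # previous state, action, reward, initially null

def f(q_value, n):
    # Exploration function: stay optimistic until a pair has been tried enough times.
    return R_PLUS if n < N_EXPLORE else q_value

def q_learning_agent(s_prime, r_prime, actions):
    s, a, r = prev["s"], prev["a"], prev["r"]
    if s is not None:
        # Update the previous state-action pair from the observed transition.
        Nsa[(s, a)] += 1
        best_next = max(Q[(s_prime, a2)] for a2 in actions)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
    # Choose the next action using the exploration function over the new state.
    next_a = max(actions, key=lambda a2: f(Q[(s_prime, a2)], Nsa[(s_prime, a2)]))
    prev.update(s=s_prime, a=next_a, r=r_prime)
    return next_a
```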