# AI

This repository contains the code for the **artificial intelligence** concepts I have learned.

## Artificial Intelligence A-Z

- [x] [01-bellman-equation](./archive/01/01-bellman-equation/README.md)
- [x] [02-mdp](./archive/01/02-mdp/README.md)
- [x] [03-plan-policy](./archive/01/03-plan-policy/README.md)
- [x] [04-penalty](./archive/01/04-penalty/README.md)

### Resources

* Artificial Intelligence A-Z - [Udemy](https://www.udemy.com/course/artificial-intelligence-az/)
# 01 - Bellman Equation

The Bellman equation is a fundamental equation in dynamic programming, used to solve the optimal control problem:

$$V^*(s) = \max_a \big[ R(s,a) + \gamma \, V^*(s') \big]$$

where

> $V^*$ is the value function,
>
> $R(s,a)$ is the reward function,
>
> $s$ is the state,
>
> $a$ is the action,
>
> $s'$ is the next state, and
>
> $\gamma$ is the discount factor.
### Maze Example

| [ - ] | [ - ] | [ - ] | 🟢    |
| ----- | ----- | ----- | ----- |
| [ - ] |       | [ - ] | 🔴    |
| [ - ] | [ - ] | [ - ] | [ - ] |

Considering $\gamma = 0.9$, the value function for the maze will be

| 0.81 | 0.9  | 1    | 🟢   |
| ---- | ---- | ---- | ---- |
| 0.73 |      | 0.9  | 🔴   |
| 0.66 | 0.73 | 0.81 | 0.73 |
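
To make the update concrete, here is a minimal value-iteration sketch that reproduces the table above. The reward scheme is an assumption on my part (the text does not spell it out): stepping onto the green goal pays $R = 1$ and ends the episode, the red square is never entered, and every other move pays nothing.

```python
# Deterministic value iteration for the 3x4 maze above.
# Assumptions: moving onto the green goal pays R = 1 and ends the
# episode; the red square is never entered; all other moves pay 0.
GAMMA = 0.9
ROWS, COLS = 3, 4
WALL, GOAL, FIRE = (1, 1), (0, 3), (1, 3)

def neighbors(s):
    # cells reachable in one move: stay on the grid, skip the wall
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        s2 = (s[0] + dr, s[1] + dc)
        if 0 <= s2[0] < ROWS and 0 <= s2[1] < COLS and s2 != WALL:
            yield s2

V = {(r, c): 0.0 for r in range(ROWS) for c in range(COLS)}
for _ in range(100):  # sweep until the values stop changing
    for s in V:
        if s in (WALL, GOAL, FIRE):
            continue
        # V(s) = max_a [ R(s,a) + gamma * V(s') ]
        V[s] = max((1.0 if s2 == GOAL else 0.0) + GAMMA * V[s2]
                   for s2 in neighbors(s) if s2 != FIRE)

for r in range(ROWS):
    print([round(V[(r, c)], 2) for c in range(COLS)])
```

Running it prints the same three rows as the table, with 0.0 in the wall and terminal cells.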
---

**[Home](../../../README.md)** | [Next](../02-mdp/README.md)
# 02 - Markov Decision Processes

This is a collection of resources for learning about Markov decision processes (MDPs). MDPs are a powerful tool for modeling sequential decision-making problems and are used in a wide variety of applications, including robotics, game theory, and reinforcement learning.

### Markov Property

A Markov process is a stochastic process that satisfies the Markov property: the future is conditionally independent of the past given the present. The Markov property is a fundamental assumption in Markov decision processes.

### Markov Decision Process

A Markov decision process (MDP) is a Markov process that includes decisions: the decision maker chooses an action at each time step. The goal is to find the optimal policy, a mapping from states to actions that maximizes the expected value of the cumulative reward.
### Bellman Equation

The Bellman equation from [01-bellman-equation](../01-bellman-equation/README.md) is modified to account for stochastic transitions:

$$V^*(s) = \max_a \Big[ R(s,a) + \gamma \sum_{s'} P(s' \mid s, a) \, V^*(s') \Big]$$

where

> $P(s' \mid s, a)$ is the probability of transitioning from state $s$ to next state $s'$ given action $a$.

Recall the deterministic value function:

| 0.81 | 0.9  | 1    | 🟢   |
| ---- | ---- | ---- | ---- |
| 0.73 |      | 0.9  | 🔴   |
| 0.66 | 0.73 | 0.81 | 0.73 |
With stochastic transitions, the new value function will be

| 0.71 | 0.74 | 0.86 | 🟢   |
| ---- | ---- | ---- | ---- |
| 0.66 |      | 0.39 | 🔴   |
| 0.55 | 0.46 | 0.36 | 0.22 |
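
As a sketch of how such values can be computed, the deterministic update from before is replaced by an expectation over next states. The 80/10/10 "slip" model below (the agent moves as intended with probability 0.8 and slips to each perpendicular side with probability 0.1) and the terminal rewards of +1 and -1 are assumptions on my part, so the numbers it prints illustrate the effect of stochasticity rather than reproduce the table above exactly.

```python
# Stochastic value iteration for the same maze, under an assumed
# 80/10/10 slip model. Bumping into the wall or the border leaves
# the state unchanged.
GAMMA = 0.9
ROWS, COLS = 3, 4
WALL, GOAL, FIRE = (1, 1), (0, 3), (1, 3)
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
PERP = {"up": ("left", "right"), "down": ("left", "right"),
        "left": ("up", "down"), "right": ("up", "down")}

def step(s, a):
    # intended next cell; stay put if the move is blocked
    r, c = s[0] + ACTIONS[a][0], s[1] + ACTIONS[a][1]
    return (r, c) if 0 <= r < ROWS and 0 <= c < COLS and (r, c) != WALL else s

def q_value(s, a, V):
    # sum over s' of P(s'|s,a) * (R(s') + gamma * V(s'))
    q = 0.0
    for move, p in ((a, 0.8), (PERP[a][0], 0.1), (PERP[a][1], 0.1)):
        s2 = step(s, move)
        r = {GOAL: 1.0, FIRE: -1.0}.get(s2, 0.0)   # assumed terminal rewards
        q += p * (r + GAMMA * (0.0 if s2 in (GOAL, FIRE) else V[s2]))
    return q

V = {(r, c): 0.0 for r in range(ROWS) for c in range(COLS)}
for _ in range(200):   # sweep until the values settle
    for s in V:
        if s not in (WALL, GOAL, FIRE):
            V[s] = max(q_value(s, a, V) for a in ACTIONS)

for r in range(ROWS):
    print([round(V[(r, c)], 2) for c in range(COLS)])
```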
---

[Prev](../01-bellman-equation/README.md) | **[Home](../../../README.md)** | [Next](../03-plan-policy/README.md)
# 03 - Policy and Plan

### Policy

A policy is a function that maps the current state of the environment to an action.

It is based on the **Bellman equation** and deals with **getting to the reward**.

| ➡️  | ➡️  | ➡️  | 🟢  |
| --- | --- | --- | --- |
| ⬆️  |     | ⬆️  | 🔴  |
| ⬆️  | ➡️  | ⬆️  | ⬅️  |
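
One way to see where the arrows come from: the greedy policy picks, in each cell, the action $\pi(s) = \arg\max_a \big[ R(s,a) + \gamma \, V(s') \big]$. The sketch below hard-codes the value table from [01-bellman-equation](../01-bellman-equation/README.md) and recovers the arrows shown above; the reward-on-goal convention is the same assumption as before.

```python
# Extract the greedy policy from the deterministic value table.
GAMMA = 0.9
GOAL = (0, 3)   # wall (1,1) and fire (1,3) are simply absent from V
V = {(0, 0): 0.81, (0, 1): 0.9, (0, 2): 1.0,
     (1, 0): 0.73, (1, 2): 0.9,
     (2, 0): 0.66, (2, 1): 0.73, (2, 2): 0.81, (2, 3): 0.73}
ARROWS = {(-1, 0): "^", (1, 0): "v", (0, -1): "<", (0, 1): ">"}

def greedy_action(s):
    best_q, best = float("-inf"), None
    for d, arrow in ARROWS.items():
        s2 = (s[0] + d[0], s[1] + d[1])
        if s2 == GOAL:          # stepping onto the goal pays R = 1
            q = 1.0
        elif s2 in V:           # ordinary cell: discounted value, no reward
            q = GAMMA * V[s2]
        else:                   # wall, fire, or off-grid: not a legal move
            continue
        if q > best_q:
            best_q, best = q, arrow
    return best

for r in range(3):
    print(" ".join(greedy_action((r, c)) if (r, c) in V else "."
                   for c in range(4)))
```

Its output matches the arrow table cell for cell (with `.` marking the wall and the terminal squares).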
### Plan

A plan is a sequence of actions to be executed by the agent.

It is based on the **Markov decision process** and deals with **finding the optimal plan**.

| ➡️  | ➡️  | ➡️  | 🟢  |
| --- | --- | --- | --- |
| ⬆️  |     | ⬅️  | 🔴  |
| ⬆️  | ⬅️  | ⬅️  | ⬇️  |
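
To make the distinction concrete: a plan is just the action sequence obtained by following a policy from a fixed start state. A tiny sketch, assuming deterministic moves and using the arrows copied from the policy table in the previous section:

```python
# A plan is the action sequence produced by rolling out a policy
# from a start cell (deterministic moves assumed).
POLICY = {(0, 0): ">", (0, 1): ">", (0, 2): ">",
          (1, 0): "^", (1, 2): "^",
          (2, 0): "^", (2, 1): ">", (2, 2): "^", (2, 3): "<"}
MOVES = {"^": (-1, 0), "v": (1, 0), "<": (0, -1), ">": (0, 1)}
GOAL = (0, 3)

def make_plan(s):
    plan = []
    while s != GOAL:
        a = POLICY[s]
        plan.append(a)
        s = (s[0] + MOVES[a][0], s[1] + MOVES[a][1])
    return plan

print(make_plan((2, 0)))  # ['^', '^', '>', '>', '>']
```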
---

[Prev](../02-mdp/README.md) | **[Home](../../../README.md)** | [Next](../04-penalty/README.md)
# 04 - Living Penalty

Here every move incurs a penalty. The goal is to *reach the end of the maze*, i.e. *get to the reward*, as fast as possible.

Based on the [optimal plan](../03-plan-policy/README.md):

| ➡️  | ➡️  | ➡️  | 🟢  |
| --- | --- | --- | --- |
| ⬆️  |     | ⬅️  | 🔴  |
| ⬆️  | ⬅️  | ⬅️  | ⬇️  |
Setting a penalty of -0.1 for every move, the reward distribution will be

| -0.1 | -0.1 | -0.1 | 🟢   |
| ---- | ---- | ---- | ---- |
| -0.1 |      | -0.1 | 🔴   |
| -0.1 | -0.1 | -0.1 | -0.1 |
So the optimal policy will become

| ➡️  | ➡️  | ➡️  | 🟢  |
| --- | --- | --- | --- |
| ⬆️  |     | ⬆️  | 🔴  |
| ⬆️  | ⬅️  | ⬆️  | ⬅️  |
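
As a sketch of the mechanism: re-running value iteration with a per-move penalty makes long detours expensive, so the greedy arrows shift toward shorter paths. The deterministic dynamics and the -0.1 value below are assumptions on my part; the arrows above come from the course's stochastic setting, so the output illustrates the effect rather than reproducing the table exactly.

```python
# Value iteration with a living penalty (assumed deterministic moves;
# -0.1 per move and +1 on the goal are illustrative choices).
GAMMA, PENALTY = 0.9, -0.1
ROWS, COLS = 3, 4
WALL, GOAL, FIRE = (1, 1), (0, 3), (1, 3)
ARROWS = {(-1, 0): "^", (1, 0): "v", (0, -1): "<", (0, 1): ">"}

def moves(s):
    # legal moves: stay on the grid, never step into the wall or fire
    for d, arrow in ARROWS.items():
        s2 = (s[0] + d[0], s[1] + d[1])
        if 0 <= s2[0] < ROWS and 0 <= s2[1] < COLS and s2 not in (WALL, FIRE):
            yield arrow, s2, (1.0 if s2 == GOAL else PENALTY)

def best(s, V):
    # greedy choice: max over a of [ R(s,a) + gamma * V(s') ]
    return max(moves(s),
               key=lambda m: m[2] + GAMMA * (0.0 if m[1] == GOAL else V[m[1]]))

V = {(r, c): 0.0 for r in range(ROWS) for c in range(COLS)}
for _ in range(100):
    for s in V:
        if s not in (WALL, GOAL, FIRE):
            arrow, s2, r = best(s, V)
            V[s] = r + GAMMA * (0.0 if s2 == GOAL else V[s2])

for row in range(ROWS):
    print(" ".join(best((row, c), V)[0] if (row, c) not in (WALL, GOAL, FIRE)
                   else "." for c in range(COLS)))
```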
---

[Prev](../03-plan-policy/README.md) | **[Home](../../../README.md)**
# Artificial Intelligence A-Z

- [x] [01-bellman-equation](./01-bellman-equation/README.md)
- [x] [02-mdp](./02-mdp/README.md)
- [x] [03-plan-policy](./03-plan-policy/README.md)
- [x] [04-penalty](./04-penalty/README.md)

### Resources

* Artificial Intelligence A-Z - [Udemy](https://www.udemy.com/course/artificial-intelligence-az/)