# AI

This repository contains the code for the **artificial intelligence** concepts I have learned.

## Artificial Intelligence A-Z

- [x] [01-bellman-equation](./archive/01/01-bellman-equation/README.md)
- [x] [02-mdp](./archive/01/02-mdp/README.md)
- [x] [03-plan-policy](./archive/01/03-plan-policy/README.md)
- [x] [04-penalty](./archive/01/04-penalty/README.md)

### Resources

* Artificial Intelligence A-Z - [Udemy](https://www.udemy.com/course/artificial-intelligence-az/)
# 01 - Bellman Equation

The Bellman equation is a fundamental equation in dynamic programming, used to solve the optimal control problem:

$$V^*(s) = \max_a \big[ R(s,a) + \gamma \, V^*(s') \big]$$

where

> $V^*$ is the value function,
>
> $R(s,a)$ is the reward function,
>
> $s$ is the state,
>
> $a$ is the action,
>
> $s'$ is the next state, and
>
> $\gamma$ is the discount factor.
### Maze Example

| [ - ] | [ - ] | [ - ] | 🟢    |
| ----- | ----- | ----- | ----- |
| [ - ] |       | [ - ] | 🔴    |
| [ - ] | [ - ] | [ - ] | [ - ] |

Considering $\gamma = 0.9$, the value function for the maze will be

| 0.81 | 0.9  | 1    | 🟢   |
| ---- | ---- | ---- | ---- |
| 0.73 |      | 0.9  | 🔴   |
| 0.66 | 0.73 | 0.81 | 0.73 |
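
To make the update concrete, here is a minimal value-iteration sketch that reproduces the table above. The reward scheme is an assumption on my part (the text does not spell it out): stepping onto the green goal pays $R = 1$ and ends the episode, the red square is never entered, and every other move pays nothing.

```python
# Deterministic value iteration for the 3x4 maze above.
# Assumptions: moving onto the green goal pays R = 1 and ends the
# episode; the red square is never entered; all other moves pay 0.
GAMMA = 0.9
ROWS, COLS = 3, 4
WALL, GOAL, FIRE = (1, 1), (0, 3), (1, 3)

def neighbors(s):
    # cells reachable in one move: stay on the grid, skip the wall
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        s2 = (s[0] + dr, s[1] + dc)
        if 0 <= s2[0] < ROWS and 0 <= s2[1] < COLS and s2 != WALL:
            yield s2

V = {(r, c): 0.0 for r in range(ROWS) for c in range(COLS)}
for _ in range(100):  # sweep until the values stop changing
    for s in V:
        if s in (WALL, GOAL, FIRE):
            continue
        # V(s) = max_a [ R(s,a) + gamma * V(s') ]
        V[s] = max((1.0 if s2 == GOAL else 0.0) + GAMMA * V[s2]
                   for s2 in neighbors(s) if s2 != FIRE)

for r in range(ROWS):
    print([round(V[(r, c)], 2) for c in range(COLS)])
```

Running it prints the same three rows as the table, with 0.0 in the wall and terminal cells.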
---

**[Home](../../../README.md)** | [Next](../02-mdp/README.md)
# 02 - Markov Decision Processes

This is a collection of resources for learning about Markov decision processes (MDPs). MDPs are a powerful tool for modeling sequential decision-making problems and are used in a wide variety of applications, including robotics, game theory, and reinforcement learning.

### Markov Property

A Markov process is a stochastic process that satisfies the Markov property: the future is conditionally independent of the past given the present. The Markov property is a fundamental assumption in Markov decision processes.

### Markov Decision Process

A Markov decision process (MDP) is a Markov process that includes decisions: the decision maker chooses an action at each time step. The goal is to find the optimal policy, a mapping from states to actions that maximizes the expected value of the cumulative reward.
### Bellman Equation

The Bellman equation from [01-bellman-equation](../01-bellman-equation/README.md) is modified to account for stochastic transitions:

$$V^*(s) = \max_a \Big[ R(s,a) + \gamma \sum_{s'} P(s' \mid s, a) \, V^*(s') \Big]$$

where

> $P(s' \mid s, a)$ is the probability of transitioning from state $s$ to next state $s'$ given action $a$.

Recall the deterministic value function:

| 0.81 | 0.9  | 1    | 🟢   |
| ---- | ---- | ---- | ---- |
| 0.73 |      | 0.9  | 🔴   |
| 0.66 | 0.73 | 0.81 | 0.73 |
With stochastic transitions, the new value function will be

| 0.71 | 0.74 | 0.86 | 🟢   |
| ---- | ---- | ---- | ---- |
| 0.66 |      | 0.39 | 🔴   |
| 0.55 | 0.46 | 0.36 | 0.22 |
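
As a sketch of how such values can be computed, the deterministic update from before is replaced by an expectation over next states. The 80/10/10 "slip" model below (the agent moves as intended with probability 0.8 and slips to each perpendicular side with probability 0.1) and the terminal rewards of +1 and -1 are assumptions on my part, so the numbers it prints illustrate the effect of stochasticity rather than reproduce the table above exactly.

```python
# Stochastic value iteration for the same maze, under an assumed
# 80/10/10 slip model. Bumping into the wall or the border leaves
# the state unchanged.
GAMMA = 0.9
ROWS, COLS = 3, 4
WALL, GOAL, FIRE = (1, 1), (0, 3), (1, 3)
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
PERP = {"up": ("left", "right"), "down": ("left", "right"),
        "left": ("up", "down"), "right": ("up", "down")}

def step(s, a):
    # intended next cell; stay put if the move is blocked
    r, c = s[0] + ACTIONS[a][0], s[1] + ACTIONS[a][1]
    return (r, c) if 0 <= r < ROWS and 0 <= c < COLS and (r, c) != WALL else s

def q_value(s, a, V):
    # sum over s' of P(s'|s,a) * (R(s') + gamma * V(s'))
    q = 0.0
    for move, p in ((a, 0.8), (PERP[a][0], 0.1), (PERP[a][1], 0.1)):
        s2 = step(s, move)
        r = {GOAL: 1.0, FIRE: -1.0}.get(s2, 0.0)   # assumed terminal rewards
        q += p * (r + GAMMA * (0.0 if s2 in (GOAL, FIRE) else V[s2]))
    return q

V = {(r, c): 0.0 for r in range(ROWS) for c in range(COLS)}
for _ in range(200):   # sweep until the values settle
    for s in V:
        if s not in (WALL, GOAL, FIRE):
            V[s] = max(q_value(s, a, V) for a in ACTIONS)

for r in range(ROWS):
    print([round(V[(r, c)], 2) for c in range(COLS)])
```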
---

[Prev](../01-bellman-equation/README.md) | **[Home](../../../README.md)** | [Next](../03-plan-policy/README.md)
# 03 - Policy and Plan

### Policy

A policy is a function that maps the current state of the environment to an action.

It is based on the **Bellman equation** and deals with **getting to the reward**.

| ➡️  | ➡️  | ➡️  | 🟢  |
| --- | --- | --- | --- |
| ⬆️  |     | ⬆️  | 🔴  |
| ⬆️  | ➡️  | ⬆️  | ⬅️  |
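
One way to see where the arrows come from: the greedy policy picks, in each cell, the action $\pi(s) = \arg\max_a \big[ R(s,a) + \gamma \, V(s') \big]$. The sketch below hard-codes the value table from [01-bellman-equation](../01-bellman-equation/README.md) and recovers the arrows shown above; the reward-on-goal convention is the same assumption as before.

```python
# Extract the greedy policy from the deterministic value table.
GAMMA = 0.9
GOAL = (0, 3)   # wall (1,1) and fire (1,3) are simply absent from V
V = {(0, 0): 0.81, (0, 1): 0.9, (0, 2): 1.0,
     (1, 0): 0.73, (1, 2): 0.9,
     (2, 0): 0.66, (2, 1): 0.73, (2, 2): 0.81, (2, 3): 0.73}
ARROWS = {(-1, 0): "^", (1, 0): "v", (0, -1): "<", (0, 1): ">"}

def greedy_action(s):
    best_q, best = float("-inf"), None
    for d, arrow in ARROWS.items():
        s2 = (s[0] + d[0], s[1] + d[1])
        if s2 == GOAL:          # stepping onto the goal pays R = 1
            q = 1.0
        elif s2 in V:           # ordinary cell: discounted value, no reward
            q = GAMMA * V[s2]
        else:                   # wall, fire, or off-grid: not a legal move
            continue
        if q > best_q:
            best_q, best = q, arrow
    return best

for r in range(3):
    print(" ".join(greedy_action((r, c)) if (r, c) in V else "."
                   for c in range(4)))
```

Its output matches the arrow table cell for cell (with `.` marking the wall and the terminal squares).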
### Plan

A plan is a sequence of actions to be executed by the agent.

It is based on the **Markov decision process** and deals with **finding the optimal plan**.

| ➡️  | ➡️  | ➡️  | 🟢  |
| --- | --- | --- | --- |
| ⬆️  |     | ⬅️  | 🔴  |
| ⬆️  | ⬅️  | ⬅️  | ⬇️  |
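
To make the distinction concrete: a plan is just the action sequence obtained by following a policy from a fixed start state. A tiny sketch, assuming deterministic moves and using the arrows copied from the policy table in the previous section:

```python
# A plan is the action sequence produced by rolling out a policy
# from a start cell (deterministic moves assumed).
POLICY = {(0, 0): ">", (0, 1): ">", (0, 2): ">",
          (1, 0): "^", (1, 2): "^",
          (2, 0): "^", (2, 1): ">", (2, 2): "^", (2, 3): "<"}
MOVES = {"^": (-1, 0), "v": (1, 0), "<": (0, -1), ">": (0, 1)}
GOAL = (0, 3)

def make_plan(s):
    plan = []
    while s != GOAL:
        a = POLICY[s]
        plan.append(a)
        s = (s[0] + MOVES[a][0], s[1] + MOVES[a][1])
    return plan

print(make_plan((2, 0)))  # ['^', '^', '>', '>', '>']
```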
---

[Prev](../02-mdp/README.md) | **[Home](../../../README.md)** | [Next](../04-penalty/README.md)
# 04 - Living Penalty

Here every move incurs a penalty. The goal is to *reach the end of the maze*, i.e. *get to the reward*, as fast as possible.

Based on the [optimal plan](../03-plan-policy/README.md):

| ➡️  | ➡️  | ➡️  | 🟢  |
| --- | --- | --- | --- |
| ⬆️  |     | ⬅️  | 🔴  |
| ⬆️  | ⬅️  | ⬅️  | ⬇️  |
Setting a penalty of -0.1 for every move, the reward distribution will be

| -0.1 | -0.1 | -0.1 | 🟢   |
| ---- | ---- | ---- | ---- |
| -0.1 |      | -0.1 | 🔴   |
| -0.1 | -0.1 | -0.1 | -0.1 |
So the optimal policy will become

| ➡️  | ➡️  | ➡️  | 🟢  |
| --- | --- | --- | --- |
| ⬆️  |     | ⬆️  | 🔴  |
| ⬆️  | ⬅️  | ⬆️  | ⬅️  |
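
As a sketch of the mechanism: re-running value iteration with a per-move penalty makes long detours expensive, so the greedy arrows shift toward shorter paths. The deterministic dynamics and the -0.1 value below are assumptions on my part; the arrows above come from the course's stochastic setting, so the output illustrates the effect rather than reproducing the table exactly.

```python
# Value iteration with a living penalty (assumed deterministic moves;
# -0.1 per move and +1 on the goal are illustrative choices).
GAMMA, PENALTY = 0.9, -0.1
ROWS, COLS = 3, 4
WALL, GOAL, FIRE = (1, 1), (0, 3), (1, 3)
ARROWS = {(-1, 0): "^", (1, 0): "v", (0, -1): "<", (0, 1): ">"}

def moves(s):
    # legal moves: stay on the grid, never step into the wall or fire
    for d, arrow in ARROWS.items():
        s2 = (s[0] + d[0], s[1] + d[1])
        if 0 <= s2[0] < ROWS and 0 <= s2[1] < COLS and s2 not in (WALL, FIRE):
            yield arrow, s2, (1.0 if s2 == GOAL else PENALTY)

def best(s, V):
    # greedy choice: max over a of [ R(s,a) + gamma * V(s') ]
    return max(moves(s),
               key=lambda m: m[2] + GAMMA * (0.0 if m[1] == GOAL else V[m[1]]))

V = {(r, c): 0.0 for r in range(ROWS) for c in range(COLS)}
for _ in range(100):
    for s in V:
        if s not in (WALL, GOAL, FIRE):
            arrow, s2, r = best(s, V)
            V[s] = r + GAMMA * (0.0 if s2 == GOAL else V[s2])

for row in range(ROWS):
    print(" ".join(best((row, c), V)[0] if (row, c) not in (WALL, GOAL, FIRE)
                   else "." for c in range(COLS)))
```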
---

[Prev](../03-plan-policy/README.md) | **[Home](../../../README.md)**
# Artificial Intelligence A-Z

- [x] [01-bellman-equation](./01-bellman-equation/README.md)
- [x] [02-mdp](./02-mdp/README.md)
- [x] [03-plan-policy](./03-plan-policy/README.md)
- [x] [04-penalty](./04-penalty/README.md)

### Resources

* Artificial Intelligence A-Z - [Udemy](https://www.udemy.com/course/artificial-intelligence-az/)