Commit 78d65ba: 101 to 104
pratikkabade committed Jan 6, 2023
Showing 6 changed files with 170 additions and 0 deletions.
14 changes: 14 additions & 0 deletions README.md
# AI
This repository contains the code and notes for the **artificial intelligence** topics I have learned.

## Artificial Intelligence A-Z

- [x] [01-bellman-equation](./archive/01/01-bellman-equation/README.md)
- [x] [02-mdp](./archive/01/02-mdp/README.md)
- [x] [03-plan-policy](./archive/01/03-plan-policy/README.md)
- [x] [04-penalty](./archive/01/04-penalty/README.md)


### Resources

* Artificial Intelligence A-Z - [Udemy](https://www.udemy.com/course/artificial-intelligence-az/)
39 changes: 39 additions & 0 deletions archive/01/01-bellman-equation/README.md
# 01 - Bellman Equation

The Bellman equation is a fundamental recursive equation in dynamic programming, used to solve the optimal control problem. For a deterministic environment it is

$$V^*(s) = \max_a \big[\, R(s,a) + \gamma \, V^*(s') \,\big]$$

where

> $V^*(s)$ is the optimal value function,
>
> $R(s,a)$ is the reward function,
>
> $s$ is the state,
>
> $a$ is the action,
>
> $s'$ is the next state,
>
> $\gamma$ is the discount factor.

### Maze Example

Consider a simple grid maze, where 🟢 is the goal (reward) state, 🔴 is the losing state, and the blank cell is a wall:
| [ - ] | [ - ] | [ - ] | 🟢 |
| ----- | ----- | ----- | ----- |
| [ - ] | | [ - ] | 🔴 |
| [ - ] | [ - ] | [ - ] | [ - ] |


With $\gamma = 0.9$, and working backwards from the goal, each additional step away from 🟢 multiplies the value by $\gamma$ (1, 0.9, 0.81, ...), so the value function for the maze is

| 0.81 | 0.9 | 1 | 🟢 |
| ---- | ---- | ---- | ---- |
| 0.73 | | 0.9 | 🔴 |
| 0.66 | 0.73 | 0.81 | 0.73 |
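
To make this concrete, here is a minimal value-iteration sketch in Python. The grid encoding (`'.'` free cell, `'X'` wall, `'G'` goal, `'R'` losing state), the helper names, and the choice of a +1 reward only for moving into 🟢 are assumptions made for this illustration; under those assumptions it reproduces the table above.

```python
import numpy as np

# Maze laid out as in the table above (row 0 is the top row).
GRID = [
    ['.', '.', '.', 'G'],
    ['.', 'X', '.', 'R'],
    ['.', '.', '.', '.'],
]
GAMMA = 0.9
ACTIONS = {'up': (-1, 0), 'down': (1, 0), 'left': (0, -1), 'right': (0, 1)}

def step(grid, r, c, action):
    """Cell reached by taking `action` from (r, c); hitting a wall or the border keeps you in place."""
    dr, dc = ACTIONS[action]
    nr, nc = r + dr, c + dc
    if not (0 <= nr < len(grid) and 0 <= nc < len(grid[0])) or grid[nr][nc] == 'X':
        return r, c
    return nr, nc

def value_iteration(grid, gamma, sweeps=100):
    """Deterministic Bellman backup: V(s) = max_a [ R(s,a) + gamma * V(s') ]."""
    V = np.zeros((len(grid), len(grid[0])))
    for _ in range(sweeps):
        for r in range(len(grid)):
            for c in range(len(grid[0])):
                if grid[r][c] != '.':              # walls and terminal cells keep V = 0
                    continue
                backups = []
                for a in ACTIONS:
                    nr, nc = step(grid, r, c, a)
                    reward = 1.0 if grid[nr][nc] == 'G' else 0.0   # +1 only for reaching the goal
                    backups.append(reward + gamma * V[nr][nc])
                V[r][c] = max(backups)
    return V

print(np.round(value_iteration(GRID, GAMMA), 2))
# Non-terminal cells match the table above:
# 0.81  0.90  1.00  (goal)
# 0.73  (wall) 0.90 (red)
# 0.66  0.73  0.81  0.73
```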

---

**[Home](../../../README.md)** | [Next](../02-mdp/README.md)
45 changes: 45 additions & 0 deletions archive/01/02-mdp/README.md
# 02 - Markov Decision Processes

This is a collection of resources for learning about Markov Decision Processes (MDPs). MDPs are a powerful tool for modeling sequential decision-making problems and are used in a wide variety of applications, including robotics, game theory, and reinforcement learning.


### Markov Property

A Markov process is a stochastic process that satisfies the Markov property: the future is conditionally independent of the past given the present. This property is a fundamental assumption in Markov decision processes.


### Markov Decision Process

A Markov decision process (MDP) is a Markov process that includes decisions: at each time step the decision maker chooses an action. The goal is to find the optimal policy, a mapping from states to actions that maximizes the expected cumulative reward.


### Bellman Equation

The [Bellman equation](../01-bellman-equation/README.md) is modified to take the transition probabilities into account:

$$V^*(s) = \max_a \Big[\, R(s,a) + \gamma \sum_{s'} P(s' \mid s, a) \, V^*(s') \,\Big]$$


where

> $P(s' \mid s, a)$ is the probability of transitioning from state $s$ to the next state $s'$ given action $a$.

Recall the deterministic value function from the [previous section](../01-bellman-equation/README.md):

| 0.81 | 0.9 | 1 | 🟢 |
| ---- | ---- | ---- | ---- |
| 0.73 | | 0.9 | 🔴 |
| 0.66 | 0.73 | 0.81 | 0.73 |

Taking the transition probabilities into account, the value function becomes

| 0.71 | 0.74 | 0.86 | 🟢 |
| ---- | ---- | ---- | ---- |
| 0.66 | | 0.39 | 🔴 |
| 0.55 | 0.46 | 0.36 | 0.22 |
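
As a sketch of how the expectation changes the backup, the snippet below reuses `GRID`, `GAMMA`, `ACTIONS` and `step` from the sketch in [01-bellman-equation](../01-bellman-equation/README.md). The slip model (80% intended move, 10% to each side) is an assumption chosen for illustration, and the 🔴 state is simply left as a zero-value terminal cell, so the resulting numbers will not exactly match the table above.

```python
import numpy as np

# Assumed slip model: the intended move happens 80% of the time,
# otherwise the agent slips to one of the two perpendicular directions.
SLIP = {'up': ('left', 'right'), 'down': ('left', 'right'),
        'left': ('up', 'down'), 'right': ('up', 'down')}

def mdp_value_iteration(grid, gamma, sweeps=100):
    """Stochastic Bellman backup: V(s) = max_a sum_{s'} P(s'|s,a) [ R(s,a) + gamma * V(s') ]."""
    V = np.zeros((len(grid), len(grid[0])))
    for _ in range(sweeps):
        for r in range(len(grid)):
            for c in range(len(grid[0])):
                if grid[r][c] != '.':
                    continue
                backups = []
                for a in ACTIONS:
                    side1, side2 = SLIP[a]
                    expected = 0.0
                    for prob, outcome in [(0.8, a), (0.1, side1), (0.1, side2)]:
                        nr, nc = step(grid, r, c, outcome)
                        reward = 1.0 if grid[nr][nc] == 'G' else 0.0
                        expected += prob * (reward + gamma * V[nr][nc])
                    backups.append(expected)
                V[r][c] = max(backups)
    return V

print(np.round(mdp_value_iteration(GRID, GAMMA), 2))
```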


---
[Prev](../01-bellman-equation/README.md) | **[Home](../../../README.md)** | [Next](../03-plan-policy/README.md)
33 changes: 33 additions & 0 deletions archive/01/03-plan-policy/README.md
# 03 - Policy and Plan

### Policy

It is a function that maps the current state of the environment to an action.

It is based on the **Bellman equation**.

It deals with **getting to the reward**.

| ➡️ | ➡️ | ➡️ | 🟢 |
| --- | --- | --- | --- |
| ⬆️ | | ⬆️ | 🔴 |
| ⬆️ | ➡️ | ⬆️ | ⬅️ |



### Plan

It is a sequence of actions that will be executed by the agent.

It is based on the **Markov Decision Process**.

It deals with **finding the optimal plan**.

| ➡️ | ➡️ | ➡️ | 🟢 |
| --- | --- | --- | --- |
| ⬆️ | | ⬅️ | 🔴 |
| ⬆️ | ⬅️ | ⬅️ | ⬇️ |
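
A small sketch of both ideas, reusing `GRID`, `GAMMA`, `ACTIONS`, `step` and `value_iteration` from the earlier sketches (all illustrative assumptions): the policy maps every state to an action, and the plan is the action sequence obtained by following that policy from a start cell.

```python
def greedy_policy(grid, V, gamma):
    """Policy: for every free cell, pick the action with the best one-step backup."""
    policy = {}
    for r in range(len(grid)):
        for c in range(len(grid[0])):
            if grid[r][c] != '.':
                continue
            def backup(a):
                nr, nc = step(grid, r, c, a)
                reward = 1.0 if grid[nr][nc] == 'G' else 0.0
                return reward + gamma * V[nr][nc]
            policy[(r, c)] = max(ACTIONS, key=backup)
    return policy

def plan_from(policy, grid, start, max_steps=20):
    """Plan: unroll the policy from `start` into a sequence of actions."""
    plan, (r, c) = [], start
    for _ in range(max_steps):
        if grid[r][c] != '.':          # stop at the goal or the losing state
            break
        action = policy[(r, c)]
        plan.append(action)
        r, c = step(grid, r, c, action)
    return plan

V = value_iteration(GRID, GAMMA)
plan = plan_from(greedy_policy(GRID, V, GAMMA), GRID, start=(2, 0))
print(plan)   # from the bottom-left cell: ['up', 'up', 'right', 'right', 'right']
```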


---
[Prev](../02-mdp/README.md) | **[Home](../../../README.md)** | [Next](../04-penalty/README.md)
28 changes: 28 additions & 0 deletions archive/01/04-penalty/README.md
# 04 - Leaving a Penalty

Here, every move incurs a penalty, so the goal is to *reach the end of the maze*, i.e. *get to the reward*, as fast as possible.

Start from the [optimal plan](../03-plan-policy/README.md) found earlier:

| ➡️ | ➡️ | ➡️ | 🟢 |
| --- | --- | --- | --- |
| ⬆️ | | ⬅️ | 🔴 |
| ⬆️ | ⬅️ | ⬅️ | ⬇️ |

Setting a penalty of -0.1 for every move, the reward distribution becomes

| -0.1 | -0.1 | -0.1 | 🟢 |
| ---- | ---- | ---- | ---- |
| -0.1 | | -0.1 | 🔴 |
| -0.1 | -0.1 | -0.1 | -0.1 |

So the optimal policy becomes

| ➡️ | ➡️ | ➡️ | 🟢 |
| --- | --- | --- | --- |
| ⬆️ | | ⬆️ | 🔴 |
| ⬆️ | ⬅️ | ⬆️ | ⬅️ |
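
Here is a sketch of how the penalty can be folded into the backup, reusing `GRID`, `GAMMA`, `ACTIONS`, `step`, `greedy_policy` and `plan_from` from the earlier sketches. The -0.1 living cost is taken from the table above, but the transition model and the reward of the 🔴 state are still illustrative assumptions, so the resulting policy will not necessarily reproduce the arrows shown here.

```python
import numpy as np

MOVE_PENALTY = -0.1   # cost paid on every move, as in the table above

def penalised_value_iteration(grid, gamma, sweeps=100):
    """Bellman backup with a living penalty: V(s) = max_a [ R(s,a) + penalty + gamma * V(s') ]."""
    V = np.zeros((len(grid), len(grid[0])))
    for _ in range(sweeps):
        for r in range(len(grid)):
            for c in range(len(grid[0])):
                if grid[r][c] != '.':
                    continue
                backups = []
                for a in ACTIONS:
                    nr, nc = step(grid, r, c, a)
                    reward = 1.0 if grid[nr][nc] == 'G' else 0.0
                    backups.append(reward + MOVE_PENALTY + gamma * V[nr][nc])
                V[r][c] = max(backups)
    return V

V = penalised_value_iteration(GRID, GAMMA)
# greedy_policy still picks the same argmax, since the penalty is identical for every action.
print(plan_from(greedy_policy(GRID, V, GAMMA), GRID, start=(2, 0)))
```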


---
[Prev](../03-plan-policy/README.md) | **[Home](../../../README.md)**
11 changes: 11 additions & 0 deletions archive/01/README.md
# Artificial Intelligence A-Z

- [x] [01-bellman-equation](./01-bellman-equation/README.md)
- [x] [02-mdp](./02-mdp/README.md)
- [x] [03-plan-policy](./03-plan-policy/README.md)
- [x] [04-penalty](./04-penalty/README.md)


### Resources

* Artificial Intelligence A-Z - [Udemy](https://www.udemy.com/course/artificial-intelligence-az/)
