After about 100 days/games, the agent achieves the maximum reward,
e.g. trashing no burgers and making no customer wait.
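The README does not spell out the reward function. The following is a minimal sketch, assuming a penalty of 1 per trashed burger and 1 per customer waiting too long (both weights are hypothetical). It shows why "no trash, no waiting" is the maximum: every step scores at most 0.

```python
# Hypothetical penalty weights: the README only says that the best score
# means no trashed burgers and no customer kept waiting.
TRASH_PENALTY = 1.0
WAIT_PENALTY = 1.0


def step_reward(trashed_burgers: int, late_customers: int) -> float:
    """Reward for one 10-minute step; 0.0 is the best possible value."""
    if trashed_burgers < 0 or late_customers < 0:
        raise ValueError("counts must be non-negative")
    return -(TRASH_PENALTY * trashed_burgers + WAIT_PENALTY * late_customers)


def day_reward(steps):
    """Total reward for one day (one "game"), given (trashed, late) per step."""
    return sum(step_reward(trashed, late) for trashed, late in steps)


print(day_reward([(0, 0), (1, 2)]))  # -3.0
```

With this shape, any trashed burger or late customer lowers the total, so a day that reaches 0 is a perfect day.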
- A random number of customers arrive at the burger shop every 10 minutes
- Each customer orders one burger, chosen at random
- It's lunch or dinner time
- It's a holiday
- It's sunny
- There is low traffic
- During lunch, customers tend to order more Big Macs
- On holidays, they tend to order more specials
- The burger shop can make between 3 and 30 burgers every 10 minutes
- Only one kind of burger is made at a time
- Burgers are trashed after 30 minutes on the shelf
- No customer should wait more than 10 minutes for their burger
- There should be no trashed burgers
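The rules above can be sketched as a small simulation of one 10-minute step. This is an illustration, not the project's code: the menu (apart from "bigmac" and "special"), the arrival range, the order weights, and the discretisation (a burger is trashed once its age exceeds 30 minutes, a customer counts as late once their wait exceeds 10 minutes) are all assumptions. The sunny and traffic conditions are left out because the README does not say how they affect demand.

```python
import random
from dataclasses import dataclass, field

# Hypothetical menu: only "bigmac" and "special" are mentioned in the README.
MENU = ["bigmac", "cheeseburger", "special"]
STEP_MINUTES = 10   # one simulation step = 10 minutes
SHELF_LIFE = 30     # burgers are trashed after 30 minutes on the shelf
MAX_WAIT = 10       # no customer should wait more than 10 minutes


@dataclass
class Shop:
    shelf: list = field(default_factory=list)  # [kind, age_in_minutes], oldest first
    queue: list = field(default_factory=list)  # [kind, wait_in_minutes], arrival order


def sample_orders(rng, lunch, holiday, max_customers=20):
    """Draw this step's orders: a random number of customers, one burger each.

    The weights only encode the README's tendencies (more Big Macs at lunch,
    more specials on holidays); the values and max_customers are assumptions.
    """
    weights = {kind: 1.0 for kind in MENU}
    if lunch:
        weights["bigmac"] += 1.0
    if holiday:
        weights["special"] += 1.0
    n_customers = rng.randint(0, max_customers)
    return rng.choices(MENU, weights=[weights[k] for k in MENU], k=n_customers)


def advance(shop, kind, quantity, orders):
    """Cook `quantity` burgers of one `kind`, serve orders, then let 10 minutes pass.

    Returns (trashed_burgers, late_customers) for this step.
    """
    if kind not in MENU:
        raise ValueError(f"unknown burger kind: {kind}")
    if not 3 <= quantity <= 30:
        raise ValueError("the shop makes between 3 and 30 burgers per step")
    shop.queue.extend([order, 0] for order in orders)
    shop.shelf.extend([kind, 0] for _ in range(quantity))

    # Serve waiting customers in arrival order, oldest matching burger first.
    still_waiting = []
    for customer in shop.queue:
        burger = next((b for b in shop.shelf if b[0] == customer[0]), None)
        if burger is None:
            still_waiting.append(customer)
        else:
            shop.shelf.remove(burger)
    shop.queue = still_waiting

    # Ten minutes pass for everything still on the shelf or in the queue.
    for item in shop.shelf + shop.queue:
        item[1] += STEP_MINUTES
    trashed = sum(1 for b in shop.shelf if b[1] > SHELF_LIFE)
    shop.shelf = [b for b in shop.shelf if b[1] <= SHELF_LIFE]
    late = sum(1 for c in shop.queue if c[1] > MAX_WAIT)
    return trashed, late
```

An agent would call `sample_orders` once per step, pick a `kind` and `quantity`, and treat the returned counts as the penalties it is trying to drive to zero. Keeping the random sampling separate from `advance` makes the shop rules deterministic and easy to check.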
The agent manages to make perfect judgements after 100 days/games, using 3 convolutional layers and 2 hidden layers to make its decisions.
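To make the "3 convolutional layers and 2 hidden layers" description concrete, here is a dependency-free sketch of such a network's forward pass. Everything beyond the layer counts is an assumption: the input is taken to be a fixed-length window of recent observations, each convolutional layer has a single kernel, the output is one score per action, and the decision is a greedy argmax as in a value-based agent. The class `TinyQNetwork` is hypothetical and its weights are random and untrained; training is not shown.

```python
import random


def relu(values):
    return [max(0.0, v) for v in values]


def conv1d(signal, kernel):
    """'Valid' 1-D convolution (cross-correlation, as most DL libraries compute it)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def dense(values, weights, biases):
    """Fully connected layer: one weight row per output unit."""
    return [sum(w * v for w, v in zip(row, values)) + b
            for row, b in zip(weights, biases)]


class TinyQNetwork:
    """Hypothetical network: 3 conv layers, 2 hidden layers, one score per action."""

    def __init__(self, input_len, n_actions, hidden=16, kernel_size=3, seed=0):
        rng = random.Random(seed)
        # Each valid convolution shortens the input by kernel_size - 1.
        conv_out = input_len - 3 * (kernel_size - 1)
        if conv_out < 1:
            raise ValueError("input too short for 3 convolutional layers")
        self.input_len = input_len
        self.kernels = [[rng.uniform(-0.5, 0.5) for _ in range(kernel_size)]
                        for _ in range(3)]

        def layer(n_in, n_out):
            weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                       for _ in range(n_out)]
            return weights, [0.0] * n_out

        self.hidden1 = layer(conv_out, hidden)
        self.hidden2 = layer(hidden, hidden)
        self.output = layer(hidden, n_actions)

    def forward(self, observation):
        if len(observation) != self.input_len:
            raise ValueError(f"expected {self.input_len} inputs")
        x = list(observation)
        for kernel in self.kernels:
            x = relu(conv1d(x, kernel))
        x = relu(dense(x, *self.hidden1))
        x = relu(dense(x, *self.hidden2))
        return dense(x, *self.output)  # one (untrained) score per action

    def act(self, observation):
        """Greedy decision: index of the highest-scoring action."""
        scores = self.forward(observation)
        return max(range(len(scores)), key=scores.__getitem__)


net = TinyQNetwork(input_len=12, n_actions=3)
print(net.act([0.5] * 12))
```

The convolutions look for patterns across neighbouring time steps in the observation window, and the two dense layers combine them into a score per action. A real implementation would more likely use a deep-learning library and learn the weights from the rewards.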
$ conda create --name mcdo --file requirements_conda.txt
$ conda activate mcdo
$ python ntest.py
$ python htest.py
$ python rtest.py