Simple RL-based selfplay experiments framework

Description

A framework designed for simple selfplay experiments that use only reinforcement learning (no planning, i.e. no MCTS or similar). Both discrete and continuous action spaces are currently implemented, although the discrete action space has been tested more extensively. The framework is still quite raw and the hyperparameters are largely untuned; nevertheless, it converges to sane results on simple tests such as CartPole, Acrobot, and Pendulum (the last of which has a continuous action space). In the custom TicTacToe selfplay environment, the agent learns to play (albeit quite naively) within 100k iterations.
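To illustrate what "selfplay without planning" means for TicTacToe, here is a minimal sketch in which one policy controls both players and the collected transitions could feed a policy-gradient update. It is not the code in env_tictactoe.py: the board encoding, the mover-relative observation (`canonical`), the reward scheme (+1/-1 only on each player's final move) and all function names are assumptions chosen for illustration, and `random_policy` stands in for the learned network.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical board encoding: 9 cells, 0 = empty, +1 = player X, -1 = player O.
Board = List[int]
Obs = Tuple[int, ...]
# A policy maps (observation, legal moves) to a chosen cell index.
Policy = Callable[[Obs, List[int]], int]

WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def winner(board: Board) -> int:
    """Return +1 or -1 if that player has three in a row, else 0."""
    for a, b, c in WIN_LINES:
        line_sum = board[a] + board[b] + board[c]
        if line_sum == 3:
            return 1
        if line_sum == -3:
            return -1
    return 0


def canonical(board: Board, player: int) -> Obs:
    """Mover-relative view: own stones are +1, opponent stones are -1,
    so a single network can play either side."""
    return tuple(cell * player for cell in board)


def random_policy(obs: Obs, legal: List[int]) -> int:
    """Placeholder for the learned policy: a uniformly random legal move."""
    return random.choice(legal)


def selfplay_episode(policy: Policy) -> Tuple[Dict[int, list], int]:
    """Play one game in which the same policy controls both players.

    Returns per-player trajectories of [obs, action, reward] and the winner
    (+1, -1, or 0 for a draw). Every reward is 0 except each player's last
    move, which gets +1 (win) or -1 (loss).
    """
    board: Board = [0] * 9
    player = 1  # X moves first
    trajs: Dict[int, list] = {1: [], -1: []}
    while True:
        legal = [i for i, cell in enumerate(board) if cell == 0]
        obs = canonical(board, player)
        action = policy(obs, legal)
        board[action] = player
        trajs[player].append([obs, action, 0.0])
        w = winner(board)
        if w != 0:
            trajs[w][-1][2] = 1.0    # the winning move
            trajs[-w][-1][2] = -1.0  # the loser's last move
            return trajs, w
        if 0 not in board:  # board full with no winner: draw
            return trajs, 0
        player = -player


if __name__ == "__main__":
    random.seed(0)
    results = [selfplay_episode(random_policy)[1] for _ in range(1000)]
    print({r: results.count(r) for r in (1, 0, -1)})
```

Because both sides see the board from their own perspective, the trajectories of X and O can be pooled into one training batch for a single network.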

TODO:

Extensive debugging
Multithreaded rollout generation
More agents (only clipped-surrogate PPO is implemented at the moment)
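For reference, the clipped-surrogate objective that the only current agent is named after can be written as a short batch loss. This is a sketch of the standard PPO-Clip term, not the contents of ppo.py: the function name, the default `clip_eps=0.2`, and the use of plain Python lists are assumptions, and a full PPO update typically also adds a value-function loss and an entropy bonus, which the README does not describe.

```python
import math
from typing import Sequence


def ppo_clip_loss(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,
) -> float:
    """Negated clipped-surrogate objective, averaged over a batch.

    L = -mean( min(r * A, clip(r, 1 - eps, 1 + eps) * A) ),
    where r = exp(logp_new - logp_old) is the probability ratio between the
    current policy and the policy that collected the data.
    """
    if not (len(logp_new) == len(logp_old) == len(advantages)) or not advantages:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: keep the smaller of the unclipped and clipped terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return -total / len(advantages)
```

Taking the minimum means the policy gains nothing from pushing the ratio beyond 1 ± eps in the direction the advantage favors, while moves in the unfavorable direction are still fully penalized.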

License

Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License

Folders and files

utils
.gitignore
LICENSE.md
actorcritic_net_fc.py
advantage.py
common_config.py
env_openai_gym.py
env_tictactoe.py
env_wrapper.py
policy_play.py
policy_play_tictac.py
policy_train.py
ppo.py
presets.py
readme.md
render2d.py
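The repository includes advantage.py, but the README does not say which advantage estimator it implements. Generalized Advantage Estimation (GAE) is a common companion to PPO, so the following is a sketch assuming GAE; the function name, the `dones` convention (True means the episode ended after that step) and the default `gamma`/`lam` values are illustrative assumptions, not taken from the repository.

```python
from typing import List, Sequence, Tuple


def gae(
    rewards: Sequence[float],
    values: Sequence[float],
    dones: Sequence[bool],
    last_value: float,
    gamma: float = 0.99,
    lam: float = 0.95,
) -> Tuple[List[float], List[float]]:
    """Generalized Advantage Estimation over one rollout segment.

    delta_t = r_t + gamma * V(s_{t+1}) * (1 - done_t) - V(s_t)
    A_t     = delta_t + gamma * lam * (1 - done_t) * A_{t+1}
    Returns (advantages, value targets), where target_t = A_t + V(s_t).
    `last_value` bootstraps the state after the final step of the segment.
    """
    n = len(rewards)
    advantages = [0.0] * n
    next_adv = 0.0
    next_value = last_value
    for t in reversed(range(n)):
        nonterminal = 0.0 if dones[t] else 1.0  # cut bootstrapping at episode ends
        delta = rewards[t] + gamma * next_value * nonterminal - values[t]
        next_adv = delta + gamma * lam * nonterminal * next_adv
        advantages[t] = next_adv
        next_value = values[t]
    targets = [a + v for a, v in zip(advantages, values)]
    return advantages, targets
```

With `lam=0` this reduces to one-step TD errors, and with `lam=1` to Monte Carlo returns minus the value baseline; intermediate values trade bias against variance.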

Repository: avoroshilov/rl-selfplay