
Advantage and State-Value functions learning


rohey/avlearn

🇦 🇻 🇱 🇪 🇦 🇷 🇳 🤖

Advantage (A) and State-Value (V) functions learning

This is an alternative method to Q-learning, where the state-action value function Q is no longer learned directly; instead, it is represented as the sum of an advantage function A and a state-value function V (i.e., Q = A + V). Some may think this is superfluous, but we will experiment first and see where it gets us.

Also, please note that the only similarity with that work is in representing Q as A + V; the update (learning) algorithm here is different.

The code will be here soon!
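Until then, here is a minimal tabular sketch of the Q = A + V decomposition. Everything in it is hypothetical: the update rule shown is ordinary Q-learning reparametrized over separate V and A tables, which is an assumption for illustration only, not this repository's (different) learning algorithm.

```python
# Hypothetical tabular sketch: instead of one Q-table, keep a state-value
# table V(s) and an advantage table A(s, a); the implied action value is
# Q(s, a) = V(s) + A(s, a).

n_states, n_actions = 4, 2
V = [0.0] * n_states
A = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma = 0.1, 0.99

def q(s, a):
    # The decomposition the README describes: Q = A + V.
    return V[s] + A[s][a]

def update(s, a, r, s_next, done):
    # Assumed update rule (plain TD / Q-learning target), NOT the repo's:
    target = r if done else r + gamma * max(q(s_next, b) for b in range(n_actions))
    td_error = target - q(s, a)
    A[s][a] += alpha * td_error
    # Recentre A so that max_a A(s, a) = 0, shifting the offset into V.
    # This keeps the split identifiable while leaving Q(s, .) unchanged.
    m = max(A[s])
    A[s] = [x - m for x in A[s]]
    V[s] += m
```

Note that the recentering step changes only how Q is split between V and A, never the Q values themselves, so this sketch behaves exactly like tabular Q-learning on the implied Q.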

gymnasium [MuJoCo]

Ant-v4

Trained for 3 000 000 timesteps

| AV (Ant-v4) | SAC (Ant-v1) | TD3 (Ant-v1) |
|:---:|:---:|:---:|
| 7200 | 6000 | 6000 |
Ant-v4.mp4

HalfCheetah-v4

Trained for 3 000 000 timesteps

| AV (HalfCheetah-v4) | SAC (HalfCheetah-v1) | TD3 (HalfCheetah-v1) |
|:---:|:---:|:---:|
| 17000 | 16000 | 12000 |
HalfCheetah-v4.mp4
HalfCheetah-v4-r2.mp4

Swimmer-v4

Trained for 300 000 timesteps

| AV (Swimmer-v4) | SAC (Swimmer-v0) | TD3 (Swimmer-v0) |
|:---:|:---:|:---:|
| 270 | 40 | 40 |
Swimmer-v4.mp4
