This is an alternative to Q-learning in which the state-action value function Q is not learned directly; instead, it is represented as the sum of an advantage function A and a state value function V (i.e., Q(s, a) = A(s, a) + V(s)). Some may consider this decomposition superfluous, but we will experiment first and see where it gets us.
Note that the only similarity with that work is the representation of Q as A + V; the update (learning) algorithm here is different.
The code will be here soon!
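Until the code is published, here is a minimal sketch of the Q = A + V decomposition as a critic network. It assumes a PyTorch implementation; the class name, layer sizes, and example dimensions are hypothetical and only illustrate how Q is assembled from the two heads, not the update rule used in this repository.

```python
import torch
import torch.nn as nn


class AVCritic(nn.Module):
    """Critic that represents Q(s, a) as A(s, a) + V(s). Hypothetical sketch."""

    def __init__(self, state_dim: int, action_dim: int, hidden_dim: int = 256):
        super().__init__()
        # Advantage branch A(s, a): conditioned on both state and action.
        self.advantage_net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )
        # State-value branch V(s): conditioned on the state only.
        self.value_net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        advantage = self.advantage_net(torch.cat([state, action], dim=-1))
        value = self.value_net(state)
        return advantage + value  # Q(s, a) = A(s, a) + V(s)


if __name__ == "__main__":
    # Dimensions chosen to match HalfCheetah-v4 (17-dim observation, 6-dim action).
    critic = AVCritic(state_dim=17, action_dim=6)
    state = torch.randn(32, 17)
    action = torch.randn(32, 6)
    q_values = critic(state, action)
    print(q_values.shape)  # torch.Size([32, 1])
```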
Trained for 3 000 000 timesteps
AV (Ant-v4) | SAC (Ant-v1) | TD3 (Ant-v1) |
---|---|---|
7200 | 6000 | 6000 |
Ant-v4.mp4
Trained for 3 000 000 timesteps
AV (HalfCheetah-v4) | SAC (HalfCheetah-v1) | TD3 (HalfCheetah-v1) |
---|---|---|
17000 | 16000 | 12000 |
HalfCheetah-v4.mp4
HalfCheetah-v4-r2.mp4
Trained for 300 000 timesteps
AV (Swimmer-v4) | SAC (Swimmer-v0) | TD3 (Swimmer-v0) |
---|---|---|
270 | 40 | 40 |