🇦 🇻 🇱 🇪 🇦 🇷 🇳 🤖

Advantage (A) and state-value (V) function learning

This is an alternative to Q-learning: instead of learning the state-action value function Q directly, Q is represented as the sum of an advantage function A and a state-value function V (i.e., Q(s, a) = V(s) + A(s, a)). Some may consider this decomposition superfluous, but we will experiment first and see where it gets us.

Also, please note that the only similarity with that work is in representing Q as A + V; the update (learning) algorithm here is different.

The code will be here soon!
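
Until then, here is a minimal illustrative sketch of one way to realize the Q(s, a) = V(s) + A(s, a) decomposition with two network heads. This is not the repository's implementation; the class name, layer sizes, and the choice of PyTorch are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

class AVNetwork(nn.Module):
    """Illustrative only: Q(s, a) represented as V(s) + A(s, a).

    Not the repository's implementation; names and sizes are placeholders.
    """

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        # State-value head V(s): depends only on the state.
        self.v = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )
        # Advantage head A(s, a): depends on the state and the action.
        self.a = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        v = self.v(obs)                            # V(s)
        a = self.a(torch.cat([obs, act], dim=-1))  # A(s, a)
        return v + a                               # Q(s, a) = V(s) + A(s, a)
```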

gymnasium [MuJoCo]
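
The benchmarks below use the standard gymnasium MuJoCo tasks. For reference, a minimal interaction loop looks like this (requires `gymnasium[mujoco]`; the random action is only a stand-in for a trained AV policy):

```python
import gymnasium as gym

# Any of the tasks reported below: "Ant-v4", "HalfCheetah-v4", "Swimmer-v4".
env = gym.make("Ant-v4")

obs, info = env.reset(seed=0)
for _ in range(1000):
    action = env.action_space.sample()  # stand-in for a trained AV policy
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```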

Ant-v4

Trained for 3 000 000 timesteps

| AV (Ant-v4) | SAC (Ant-v1) | TD3 (Ant-v1) |
| --- | --- | --- |
| 7200 | 6000 | 6000 |
Ant-v4.mp4

HalfCheetah-v4

Trained for 3 000 000 timesteps

| AV (HalfCheetah-v4) | SAC (HalfCheetah-v1) | TD3 (HalfCheetah-v1) |
| --- | --- | --- |
| 17000 | 16000 | 12000 |
HalfCheetah-v4.mp4
HalfCheetah-v4-r2.mp4

Swimmer-v4

Trained for 300 000 timesteps

| AV (Swimmer-v4) | SAC (Swimmer-v0) | TD3 (Swimmer-v0) |
| --- | --- | --- |
| 270 | 40 | 40 |
Swimmer-v4.mp4