GitHub - sn2727/sac-lunar-lander: Implementation of the SAC algorithm and training of an actor to land lunar lander from gymnasium environments

Soft Actor Critic

[Figure: untrained actor vs. trained actor]

This repository contains an implementation of Soft Actor Critic (SAC), an off-policy reinforcement learning algorithm designed for continuous action spaces. SAC introduces a soft value function and adds an entropy regularization term to the objective to encourage exploration. It optimizes the policy and the value function simultaneously, which is intended to give stable and efficient learning in complex environments with high-dimensional state and action spaces.
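To make the "soft value function" more concrete, here is a minimal sketch of how an entropy-aware value estimate and the corresponding critic target are commonly computed in SAC. The function names, the one-sample estimate, and the default values of `gamma` and `alpha` are illustrative assumptions and are not taken from `sac.ipynb`.

```python
def soft_value(q_value, log_prob, alpha=0.2):
    """One-sample estimate of V(s) = E_a[Q(s, a) - alpha * log pi(a|s)].

    q_value  : critic estimate Q(s, a) for an action a sampled from the policy
    log_prob : log pi(a|s) of that sampled action
    alpha    : entropy temperature (assumed value, not from the notebook)
    """
    return q_value - alpha * log_prob


def soft_bellman_target(reward, done, next_q, next_log_prob,
                        gamma=0.99, alpha=0.2):
    """Target for the critic: r + gamma * (1 - done) * V_soft(s')."""
    not_done = 0.0 if done else 1.0
    return reward + gamma * not_done * soft_value(next_q, next_log_prob, alpha)


if __name__ == "__main__":
    # A low log-probability (an "unlikely" action) raises the soft value,
    # which is how the entropy term rewards exploratory behavior.
    print(soft_bellman_target(1.0, False, next_q=2.0, next_log_prob=-1.0,
                              gamma=0.5, alpha=0.5))
```

Full implementations typically replace the scalar arguments with batched tensors and often take the minimum of two critics for `next_q`; that detail is omitted here to keep the sketch short.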

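In entropy-regularized reinforcement learning, the agent receives a bonus reward at each time step proportional to the entropy of the policy at that time step. Mathematically, this changes the RL problem to:

$$\pi^* = \arg\max_{\pi} \; \mathbb{E}_{\tau \sim \pi}\left[ \sum_{t=0}^{\infty} \gamma^t \Big( R(s_t, a_t, s_{t+1}) + \alpha H\big(\pi(\cdot \mid s_t)\big) \Big) \right]$$

where $\alpha > 0$ is the temperature that weights the entropy bonus and $\gamma$ is the discount factor. This is explained in more detail in this derivation.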
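As a rough numerical illustration of the entropy bonus described above, the following sketch computes the discounted, entropy-regularized return of a finite trajectory. It assumes a diagonal Gaussian policy (a common choice for continuous actions, not confirmed by this README); the helper names and the default `gamma` and `alpha` values are hypothetical.

```python
import math


def gaussian_entropy(log_stds):
    """Differential entropy of a diagonal Gaussian policy.

    H = 0.5 * d * (1 + log(2 * pi)) + sum(log sigma_i),
    where d is the action dimension.
    """
    d = len(log_stds)
    return 0.5 * d * (1.0 + math.log(2.0 * math.pi)) + sum(log_stds)


def entropy_regularized_return(rewards, entropies, gamma=0.99, alpha=0.2):
    """Sum over t of gamma^t * (r_t + alpha * H_t) for one trajectory."""
    if len(rewards) != len(entropies):
        raise ValueError("rewards and entropies must have the same length")
    total, discount = 0.0, 1.0
    for reward, entropy in zip(rewards, entropies):
        total += discount * (reward + alpha * entropy)
        discount *= gamma
    return total


if __name__ == "__main__":
    # Two-step trajectory with a 2-dimensional action and unit standard deviations.
    h = gaussian_entropy([0.0, 0.0])
    print(entropy_regularized_return([1.0, 1.0], [h, h]))
```

Setting `alpha=0` recovers the ordinary discounted return, so `alpha` directly controls how much the agent is paid for keeping its policy stochastic.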

The notebook sac.ipynb implements SAC and explains each step. The resulting agent is then trained to safely land the Lunar Lander from the Gymnasium environments.

