Skip to content

Implementation of the SAC algorithm and training of an actor to land lunar lander from gymnasium environments

License

Notifications You must be signed in to change notification settings

sn2727/sac-lunar-lander

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Soft Actor Critic

Untrained Trained Actor

This repository contains an implementation of Soft Actor Critic (SAC) which is an off-policy reinforcement learning algorithm designed for continuous action spaces. It introduces a soft value function and utilizes an entropy regularization term to encourage exploration. SAC aims to optimize both the policy and the value function simultaneously, providing stability and efficient learning in complex environments with high-dimensional state and action spaces.

In entropy-regularized reinforcement learning, the agent gets a bonus reward at each time step proportional to the entropy of the policy at that timestep. Mathematically this renders the RL problem to:

original MNIST images

which is explained in more detail in this derivation.

In sac.ipynb SAC is implemented and well explained. It is then used to safely land a lunarlander from Gymnasium environments.

About

Implementation of the SAC algorithm and training of an actor to land lunar lander from gymnasium environments

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published