@@ -12,7 +12,7 @@ This repository contains multiple projects related to Reinforcement Learning (RL
 | [Hands-on RL](./动手学强化学习/) | ![Status](https://img.shields.io/badge/status-reference-informational) | ![Completion](https://img.shields.io/badge/completion-100%25-brightgreen) | ![Tech](https://img.shields.io/badge/tech-DQN%20to%20DDPG-blue) | [README](./动手学强化学习/README.md) |
 | [MADDPG_Continous](./MADDPG_Continous/) | ![Status](https://img.shields.io/badge/status-completed-success) | ![Completion](https://img.shields.io/badge/completion-100%25-brightgreen) | ![Tech](https://img.shields.io/badge/tech-continuous%20MADDPG-blue) | [README](./MADDPG_Continous/README_EN.md) |
 | [MATD3_Continous](./MATD3_Continous/) | ![Status](https://img.shields.io/badge/status-completed-success) | ![Completion](https://img.shields.io/badge/completion-100%25-brightgreen) | ![Tech](https://img.shields.io/badge/tech-continuous%20MATD3-blue) | [README](./MATD3_Continous/readme_en.md) |
-
+| [HAPPO-MAPPO_Continous_Heterogeneous](./HAPPO-MAPPO_Continous_Heterogeneous/) | ![Status](https://img.shields.io/badge/status-completed-success) | ![Completion](https://img.shields.io/badge/completion-95%25-brightgreen) | ![Tech](https://img.shields.io/badge/tech-PPO%20Heterogeneous-blue) | [Documentation](./HAPPO-MAPPO_Continous_Heterogeneous/Readme_en.md) |
 ## Learning Path and Project Connections
 
 The projects in this repository form a complete learning path from basic reinforcement learning to multi-agent reinforcement learning:
@@ -52,8 +52,7 @@ Reproduction of Professor Shiyu Zhao's reinforcement learning course code from W
 - [Professor Zhao's Reinforcement Learning Course](https://www.bilibili.com/video/BV1sd4y167NS)
 - [Mathematical Foundation of Reinforcement Learning](https://github.com/MathFoundationRL/Book-Mathematical-Foundation-of-Reinforcement-Learning)
 
-#### Code Location
-[Professor Zhao's RL Code Repository: ./RL_Learning-main](./RL_Learning-main/scripts)
+#### Code Location [Professor Zhao's RL Code Repository: ./RL_Learning-main](./RL_Learning-main/scripts)
 
 #### Update Log
 **2024.6.7**
@@ -78,8 +77,7 @@ Reproduction and expansion of the code from the book "Hands-on Reinforcement Lea
 #### Learning Path
 This section demonstrates the learning path from basic DQN to DDPG, and then to MADDPG, laying the foundation for understanding multi-agent reinforcement learning.
 
-#### Code Location
-[./动手学强化学习](./动手学强化学习/)
+#### Code Location [./动手学强化学习](./动手学强化学习/)
 
 #### References
 - [Hands-on Reinforcement Learning](https://hrl.boyuai.com/chapter/2/dqn%E7%AE%97%E6%B3%95)
@@ -107,15 +105,14 @@ Personal implementation of the MADDPG algorithm based on the latest version of t
   <p><strong>Reward convergence curve of the MADDPG algorithm in the simple_tag_v3 environment</strong></p>
 </div>
 
-#### Implementation Progress
+##### Implementation Progress
 | Algorithm | Status | Location | Core Components |
 |----------------|--------|----------------------|----------------------------------|
 | MADDPG | ✅ 1.0 | `agents/maddpg/` | MADDPG_agent, DDPG_agent, buffer |
 | Independent RL | ⏳ Planned | `agents/independent/` | IndependentRL (planned) |
 | Centralized RL | ⏳ Planned | `agents/centralized/` | CentralizedRL (planned) |
 
-#### Code Location
-[./MADDPG_Continous](./MADDPG_Continous)
+##### Code Location [./MADDPG_Continous](./MADDPG_Continous)
 
 #### 3.2 MATD3_Continous: Multi-Agent Twin Delayed Deep Deterministic Policy Gradient Algorithm
 
@@ -134,16 +131,37 @@ Multi-agent extension version of the TD3 algorithm (MATD3: Twin Delayed Deep Det
   <p><strong>Reward convergence curve of the MATD3 algorithm in the simple_tag_v3 environment</strong></p>
 </div>
 
-#### MATD3 vs MADDPG
+##### MATD3 vs MADDPG
 MATD3 enhances standard MADDPG with these key improvements:
 
 1. **Double Q-Network Design**: Reduces overestimation of action values
 2. **Delayed Policy Updates**: Improves training stability
 3. **Target Policy Smoothing**: Prevents overfitting by adding noise to target actions
 4. **Adaptive Noise Adjustment**: Dynamically adjusts exploration noise based on training progress
 
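Improvements 1–3 above are small, mechanical changes to the DDPG update. A minimal numpy sketch for illustration only (this is not the repository's code, which lives under `./MATD3_Continous`; the hyperparameter values are typical defaults, not taken from the repo):

```python
import numpy as np

rng = np.random.default_rng(0)

def smooth_target_action(action, noise_std=0.2, noise_clip=0.5, act_limit=1.0):
    """Target policy smoothing (improvement 3): add clipped Gaussian
    noise to the target action, then clip to the action bounds."""
    noise = np.clip(rng.normal(0.0, noise_std, size=np.shape(action)),
                    -noise_clip, noise_clip)
    return np.clip(action + noise, -act_limit, act_limit)

def matd3_td_target(reward, done, q1_next, q2_next, gamma=0.95):
    """Double Q-network design (improvement 1): bootstrap from the
    minimum of the twin critics to curb value overestimation."""
    return reward + gamma * (1.0 - done) * np.minimum(q1_next, q2_next)

# Delayed policy updates (improvement 2): the actor and target networks
# are refreshed only once every `policy_delay` critic updates.
policy_delay = 2
actor_updates = sum(1 for step in range(1, 11) if step % policy_delay == 0)
```

The same three tricks apply per-agent in the multi-agent setting, with each agent's twin critics conditioned on the joint observations and actions.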
-#### Code Location
-[./MATD3_Continous](./MATD3_Continous)
+##### Code Location [./MATD3_Continous](./MATD3_Continous)
+
+
+#### 3.3 HAPPO-MAPPO: Supporting Heterogeneous Agents in Multi-Agent Proximal Policy Optimization
+
+Implementation of two PPO-based multi-agent algorithms: MAPPO (Multi-Agent Proximal Policy Optimization) and HAPPO (Heterogeneous-Agent Proximal Policy Optimization), providing solutions for continuous action spaces and heterogeneous agent environments.
+
+<div align="center">
+  <img src="./HAPPO-MAPPO_Continous_Heterogeneous/data/happo_learning_curve_simple_tag_v3_s23.png" alt="HAPPO Algorithm Performance" width="45%" />
+  <p><strong>HAPPO algorithm features: supports heterogeneous agent cooperation and competition, where each agent can have different observation dimensions</strong></p>
+</div>
+
+##### Advantages of HAPPO/MAPPO
+
+1. **No Need for Deterministic Policies**: Based on PPO with stochastic policies, reducing overfitting
+2. **Heterogeneous Agent Support**: HAPPO specifically supports heterogeneous agents with different observation dimensions and capabilities
+3. **Training Stability**: PPO's clipping mechanism provides a more stable training process
+4. **Sample Efficiency**: Improves sample utilization through multi-epoch updates
+5. **Hyperparameter Robustness**: Less sensitive to hyperparameter selection
+
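The clipping mechanism behind points 3 and 4 is a single expression. A minimal numpy sketch of PPO's clipped surrogate objective, for illustration (not the repository's implementation):

```python
import numpy as np

def ppo_clip_surrogate(ratio, advantage, clip_eps=0.2):
    """PPO clipped surrogate: the elementwise minimum of the unclipped
    and clipped ratio-weighted advantages. Clipping removes any incentive
    to push the policy ratio outside [1 - eps, 1 + eps], which is what
    stabilizes training."""
    ratio = np.asarray(ratio, dtype=float)
    advantage = np.asarray(advantage, dtype=float)
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return np.minimum(ratio * advantage, clipped * advantage)
```

Because the objective is bounded in this way, the same batch can safely be reused for several gradient epochs, which is the source of the sample-efficiency gain in point 4.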
+##### Code Location [`./MAPPO_Continous_Homogeneous`](./MAPPO_Continous_Homogeneous)
+##### Code Location [`./HAPPO-MAPPO_Continous_Heterogeneous`](./HAPPO-MAPPO_Continous_Heterogeneous)
+
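What "heterogeneous" means in practice is that each agent keeps its own actor sized to its own observation. A toy sketch, where the agent names and dimensions are hypothetical examples rather than values taken from the repository or from simple_tag_v3:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical heterogeneous team: observation sizes differ per agent,
# so each agent gets its own parameter matrix instead of shared weights.
obs_dims = {"adversary_0": 16, "adversary_1": 16, "agent_0": 14}
act_dim = 5

actors = {name: rng.normal(0.0, 0.1, size=(dim, act_dim))
          for name, dim in obs_dims.items()}

def act(name, obs):
    """Tiny linear 'actor': mean action = tanh(obs @ W). A real HAPPO
    actor would be an MLP parameterizing a Gaussian over actions."""
    return np.tanh(obs @ actors[name])

actions = {name: act(name, rng.normal(size=dim))
           for name, dim in obs_dims.items()}
```

Homogeneous MAPPO can instead share one network across agents; HAPPO's sequential per-agent updates make the per-agent parameterization natural.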
 
 ## Ongoing Projects
 - **MARL**: Multi-agent cooperation and coordination based on deep reinforcement learning