Virtual Microgrid Segmentation

Stanford CS234 Final Project, Winter 2019

Instructor: Prof. Emma Brunskill

Class website:

Project team: Bennet Meyers and Siobhan Powell

Contact the authors: bennetm or siobhan.powell at stanford dot edu


Recent work has shown that microgrids can increase both grid flexibility and grid resiliency to unanticipated outages caused by events such as cyber attacks or extreme weather. A subclass of microgrids, known as “virtual islands”, occurs when sections of a grid operate in isolation, with no power flow between them and the larger grid, despite remaining physically connected. If a grid can partition into virtual islands in anticipation of an incoming resiliency event, customers in those islands will be less likely to experience outages.

The goal of this project is to train a deep reinforcement learning (RL) agent to create and maintain as many small virtual islands as possible by operating a grid's storage resources. The agent is rewarded for separating nodes from the external grid connection and for splitting the graph into as many segments as possible.
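To make the two reward terms concrete, here is a minimal sketch of how the segmentation of a grid could be scored. It is not the project's reward function: the names `virtual_islands` and `segmentation_reward`, the flow tolerance `tol`, and the equal weighting of the two terms are all assumptions made for illustration. A line is treated as virtually open when the magnitude of its power flow is below the tolerance.

```python
from collections import defaultdict


def virtual_islands(n_nodes, line_flows, tol=1e-3):
    """Group nodes into components joined by lines carrying non-negligible power.

    line_flows maps (from_node, to_node) -> power flow on that line.
    Lines with |flow| <= tol are treated as virtually open (assumption).
    """
    adjacency = defaultdict(set)
    for (u, v), flow in line_flows.items():
        if abs(flow) > tol:
            adjacency[u].add(v)
            adjacency[v].add(u)

    seen, islands = set(), []
    for start in range(n_nodes):
        if start in seen:
            continue
        # Depth-first search to collect every node reachable from `start`.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        islands.append(component)
    return islands


def segmentation_reward(n_nodes, line_flows, ext_grid_node=0, tol=1e-3):
    """Hypothetical reward: separated nodes plus number of segments."""
    islands = virtual_islands(n_nodes, line_flows, tol)
    # Count nodes whose island does not contain the external grid connection.
    separated = sum(len(c) for c in islands if ext_grid_node not in c)
    # Equal weighting of the two terms is an illustrative choice.
    return separated + len(islands)
```

For a five-node feeder with flows `{(0, 1): 1.0, (1, 2): 0.0, (2, 3): 0.5, (3, 4): 0.0}`, this sketch finds three segments, three of whose nodes are separated from the external grid at node 0.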

As our environment is deterministic, we implement PG (policy gradient) and DDPG (deep deterministic policy gradient) algorithms to train the agent, and apply it to a small test network. We find that DDPG performs best: it successfully maintains microgrids even when the loads are time-varying and change between episodes.

The DDPG algorithm

The DDPG algorithm was introduced by Lillicrap et al. in "Continuous control with deep reinforcement learning", available on arXiv.

This algorithm builds on the DPG deterministic actor-critic approach proposed by Silver et al. in "Deterministic Policy Gradient Algorithms". DDPG combines this approach with the successes of deep learning from DQN. It is model-free and off-policy, and has been shown to learn complex, high-dimensional continuous control tasks well.

Standard stochastic PG computes the gradient step by taking an expectation over the distribution of actions. DDPG instead moves the policy in the direction of the gradient of Q, which removes the need for an integral over the action space and makes it much more efficient at learning in our environment.
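The difference can be written out using the standard forms from the policy gradient literature. The notation is an assumption, not taken from this project: π_θ is a stochastic policy, μ_θ a deterministic policy, ρ the discounted state distribution, and Q the state-action value function.

```latex
% Stochastic policy gradient: expectation over states and actions
\nabla_\theta J(\pi_\theta) = \mathbb{E}_{s \sim \rho^{\pi},\, a \sim \pi_\theta}\left[ \nabla_\theta \log \pi_\theta(a \mid s)\, Q^{\pi}(s, a) \right]

% Deterministic policy gradient: expectation over states only
\nabla_\theta J(\mu_\theta) = \mathbb{E}_{s \sim \rho^{\mu}}\left[ \nabla_\theta \mu_\theta(s)\, \nabla_a Q^{\mu}(s, a) \big|_{a = \mu_\theta(s)} \right]
```

In the second expression the action is fixed at a = μ_θ(s), so only states need to be sampled. In DDPG, which is off-policy, those states are drawn in practice from the replay buffer filled by the behaviour policy.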

In DDPG, a critic network estimates the state-action value function, Q(s,a), and an actor network learns a behaviour from the critic's estimate. The algorithm learns a deterministic policy but follows a stochastic behaviour policy, adding noise to the chosen action so that it properly explores the solution space. The tuning and scheduling of this exploration noise term is crucial to the success of the algorithm.
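As an illustration of a scheduled exploration noise term, the sketch below adds Gaussian noise with a linearly decaying scale to the actor's deterministic output. The function names, the linear schedule, its start and end values, and the choice of Gaussian noise are assumptions for this example; they are not the schedules implemented in this repository, and the original DDPG paper uses an Ornstein-Uhlenbeck process instead.

```python
import numpy as np


def linear_noise_scale(step, n_steps, start=1.0, end=0.05):
    """Linearly anneal the noise scale from `start` to `end` over n_steps."""
    frac = min(max(step / n_steps, 0.0), 1.0)
    return start + frac * (end - start)


def behaviour_action(mu, step, n_steps, low, high, rng):
    """Perturb the deterministic action `mu` and clip it to the action bounds."""
    sigma = linear_noise_scale(step, n_steps)
    noisy = mu + sigma * rng.standard_normal(mu.shape)
    return np.clip(noisy, low, high)


# Example: the same actor output explored early and late in training.
rng = np.random.default_rng(0)
mu = np.array([0.2, -0.4])
early = behaviour_action(mu, step=0, n_steps=1000, low=-1.0, high=1.0, rng=rng)
late = behaviour_action(mu, step=1000, n_steps=1000, low=-1.0, high=1.0, rng=rng)
```

Early in training the noise dominates the action, and late in training the behaviour policy stays close to the learned deterministic policy.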

To help with convergence and stability, the algorithm is implemented with experience replay and with semi-stationary target networks. For more information on the theory and the algorithm applied, please refer to the papers.
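The two stabilising pieces can be sketched independently of any deep learning framework. The class and function names below are hypothetical, and the weights are plain NumPy arrays rather than the project's TensorFlow variables. The soft update rule, in which the target weights slowly track the online weights with rate `tau`, follows the form given by Lillicrap et al.

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Fixed-size store of past transitions, sampled uniformly at random."""

    def __init__(self, capacity, seed=None):
        self.buffer = deque(maxlen=capacity)  # oldest transitions drop out first
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        batch = self.rng.sample(list(self.buffer), batch_size)
        # Transpose the list of transitions into one array per field.
        states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)


def soft_update(target_weights, online_weights, tau=0.001):
    """Return new target weights: tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_weights, target_weights)]
```

Sampling uniformly from the buffer breaks the correlation between consecutive transitions, and a small `tau` keeps the targets used in the critic loss nearly stationary between updates.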

Structure of the Code

There are two main sides to the code: the network and the agents.

The network is generated using Pandapower.

The NetModel class in powerflow/ maintains the network object throughout the simulation. It controls how the agent interacts with the network and with the power flow simulations, through methods that step in time, calculate the reward, reset the network, report the state to the agent, and update the network devices. There are two kinds of devices: loads and static generators are uncontrollable and are set by an unknown feed, while the powers of storage and diesel generators are controlled by the agent.
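The roles of these methods can be illustrated with a toy stand-in. `ToyNetModel` below is hypothetical and is not the project's NetModel: it runs no power flow, has a single storage unit and a single load, assumes a time step of one unit so that power and energy are interchangeable, and its method names and reward are chosen only to mirror the responsibilities listed above.

```python
import numpy as np


class ToyNetModel:
    """Hypothetical stand-in for an environment wrapper; no power flow is run."""

    def __init__(self, loads, capacity=2.0):
        self.loads = np.asarray(loads, dtype=float)  # uncontrollable load feed
        self.capacity = capacity                     # storage energy capacity
        self.reset()

    def reset(self):
        self.t = 0
        self.soc = self.capacity  # storage starts full
        self.storage_power = 0.0
        return self.get_state()

    def get_state(self):
        # State reported to the agent: current load and storage state of charge.
        return np.array([self.loads[self.t], self.soc])

    def update_devices(self, storage_power):
        # Discharge only, limited by the energy left in storage.
        self.storage_power = float(np.clip(storage_power, 0.0, self.soc))
        self.soc -= self.storage_power

    def calculate_reward(self):
        # Penalise any power that would have to cross the external grid connection.
        return -abs(self.loads[self.t] - self.storage_power)

    def step(self, storage_power):
        self.update_devices(storage_power)
        reward = self.calculate_reward()
        self.t += 1
        done = self.t >= len(self.loads)
        state = None if done else self.get_state()
        return state, reward, done


def run_episode(env, policy):
    """Roll out one episode and return the total reward."""
    state, done, total = env.reset(), False, 0.0
    while not done:
        state, reward, done = env.step(policy(state))
        total += reward
    return total
```

A naive policy that tries to cover the whole load from storage, `run_episode(ToyNetModel([1.0, 0.5, 1.0]), lambda s: s[0])`, is penalised only in the last step, when the storage runs out.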

The initial network is generated by functions in powerflow/ using configurations stored in configs. Each config defines all the parameters behind one test setup, including those of the network and some elements of the agent setup.

The ActorNetwork and CriticNetwork objects are created in agents/ and agents/, and the DDPG object uses them to learn the optimal policy. DDPG manages the training of the actor/critic networks and controls the interactions with the grid network model.

Code organization

The main folder contains scratch notebooks for testing, developing, and interacting with the environments.

The scripts folder contains scripts to run the algorithms. For example, change the environment name or config name in and then run


to start the simulation.

The virtual_microgrids folder contains all the pieces of the simulation. You do not need to change anything here to run the simulation, but to change parameters or the algorithm you will need to work with these files.

  • The subfolder agents contains the classes to build the actor and critic network objects.
  • The algorithms subfolder contains the classes which run the PG and DDPG implementations.
  • The configs subfolder contains the configuration files for each test case and network. To create a new or altered test case, create a new config file in the style of, for example.
  • The powerflow subfolder contains a class to manage the power network and functions to create the networks from the config files.
  • The utils subfolder contains tools used throughout the other methods and functions, including the schedules used to generate the noise.

The results folder contains the outputs from running the algorithm. Running the command

tensorboard --logdir [path to results folder]

and then visiting


in your browser will let you inspect the TensorFlow setup and see plots of the results.