Configuration_Layer
The Configuration Layer in the RL-ADN framework is crucial for setting up the environment in which the DRL agents operate. It integrates key components such as the Data Manager, Data Augmentation module, and Distribution Network Simulator, each playing a significant role in ensuring the robustness and efficiency of the training process.
The Data Manager handles time-series data such as active and reactive power demand (( p^D_{i,t} ), ( q^D_{i,t} )), electricity price (( \rho_t )), and renewable power generation (( p^R_{i,t} ), ( q^R_{i,t} )) for each time step ( t \in \mathcal{T} ) of an epoch ( \mathcal{T} ). Previous approaches to data management have been case-specific and labor-intensive, adding complexity and potential data quality issues. RL-ADN adopts a streamlined approach that standardizes the various data preprocessing tasks and ensures data integrity and efficient handling. The workflow of the Data Manager is detailed in Appendix~\ref{sec_data_manager_workflow}.
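As a rough illustration of these preprocessing tasks, the sketch below standardizes a time-series file to a fixed resolution and slices out one episode. The class name `TimeSeriesDataManager`, the file `timeseries.csv`, the 15-minute resolution, and the column layout are assumptions made for this example, not the framework's actual API.

```python
import pandas as pd

class TimeSeriesDataManager:
    """Illustrative stand-in for the framework's Data Manager (names are assumptions)."""

    def __init__(self, csv_path: str, freq: str = "15min"):
        # Expect columns such as per-node active/reactive demand, price, and renewable generation.
        self.data = pd.read_csv(csv_path, index_col="timestamp", parse_dates=True)
        # Standardize the time index and fill small gaps so every episode is complete.
        self.data = self.data.asfreq(freq).interpolate(limit=4)

    def sample_episode(self, start: str, steps: int) -> pd.DataFrame:
        """Return the slice [t, t + steps) used as one training episode."""
        start_idx = self.data.index.get_loc(pd.Timestamp(start))
        return self.data.iloc[start_idx:start_idx + steps]

# Example: one day of 15-minute data (96 steps), assuming a file named "timeseries.csv".
# dm = TimeSeriesDataManager("timeseries.csv")
# episode = dm.sample_episode("2021-06-01 00:00", steps=96)
```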
In RL-ADN, the Data Augmentation module plays a pivotal role in enhancing the robustness and generalizability of the trained policy by artificially expanding the diversity of the historical time-series data. With data augmentation, RL-ADN exposes the model to a broader set of scenarios, promoting adaptability and performance in varied and unforeseen situations.
The Data Augmentation module is designed to generate synthetic time-series data, capturing the stochastic nature of load in the power system and reflecting realistic operational conditions. The module interacts with the Data Manager to retrieve the necessary preprocessed data and then applies its augmentation algorithms to produce an augmented dataset. The output is a synthetic yet realistic dataset that reflects the variability and unpredictability inherent in distribution network systems. This enriched dataset is crucial for training RL agents, providing them with a diverse range of scenarios to learn from and ultimately resulting in a more adaptable and robust decision-making policy. The workflow of the Data Augmentation module is described in Appendix~\ref{sec_data_augmentation_workflow}.
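The sketch below shows the kind of interface such a module exposes, using a simple bootstrap-plus-noise scheme purely for illustration. The function name `augment_profiles` and its parameters are assumptions, and the framework's own module may rely on learned generative models rather than this simplified noise approach.

```python
import numpy as np
import pandas as pd

def augment_profiles(history: pd.DataFrame, n_samples: int = 10,
                     noise_std: float = 0.05, seed: int = 0) -> list[pd.DataFrame]:
    """Create n_samples synthetic daily profiles by resampling historical days
    and perturbing them with multiplicative Gaussian noise.

    Minimal illustration of the augmentation interface only; not the framework's
    actual augmentation algorithm.
    """
    rng = np.random.default_rng(seed)
    days = [group for _, group in history.groupby(history.index.date)]
    synthetic = []
    for _ in range(n_samples):
        base = days[rng.integers(len(days))]                    # bootstrap one historical day
        noisy = base * (1.0 + rng.normal(0.0, noise_std, size=base.shape))
        synthetic.append(noisy)
    return synthetic
```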
For a distribution network, the node set ( \mathcal{N} ) and the line set ( \mathcal{L} ) define the topology. Each node ( i \in \mathcal{N} ) and each line ( l_{i,j} \in \mathcal{L} ) carries its own attributes. A subset ( \mathcal{B} \subset \mathcal{N} ) identifies the nodes with ESSs connected to them. Importantly, the number of ESSs determines the dimensions of the resulting state space ( \mathcal{S} ) and action space ( \mathcal{A} ).
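A minimal way to encode this configuration is sketched below; the `NetworkConfig` dataclass and its field names are illustrative assumptions rather than the framework's actual data structures.

```python
from dataclasses import dataclass, field

@dataclass
class NetworkConfig:
    """Minimal sketch of a network description (field names are assumptions)."""
    nodes: list[int]                               # node set N
    lines: list[tuple[int, int]]                   # line set L as (i, j) pairs
    ess_nodes: list[int] = field(default_factory=list)  # subset B of N with ESSs attached

    def __post_init__(self):
        # Every ESS must sit on a node that exists in the network.
        assert set(self.ess_nodes) <= set(self.nodes), "ESS placed on unknown node"

# A 4-node feeder with ESSs at nodes 2 and 3; the state and action dimensions
# scale with len(ess_nodes).
config = NetworkConfig(nodes=[0, 1, 2, 3], lines=[(0, 1), (1, 2), (2, 3)], ess_nodes=[2, 3])
```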
The main function of the Distribution Network Simulator is to calculate the power flow whenever a new scenario is fed into the environment, acting as the core of the state transition function for the formulated MDP task. Based on the provided distribution network configuration data, we offer two modules, PandaPower and GridTensor, to create the Distribution Network Simulator. PandaPower provides traditional iterative methods, while GridTensor integrates a fast Laurent power flow. Both compute the distribution network state, represented by the voltage magnitudes, line currents, and power flows in the lines.
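The snippet below shows a standalone power-flow calculation with the PandaPower library on a small feeder whose parameters are invented for the example; in the framework, the simulator is instead constructed from the supplied distribution network configuration.

```python
import pandapower as pp

# Build a tiny radial feeder and run an iterative (Newton-Raphson) power flow with PandaPower.
net = pp.create_empty_network()
b0 = pp.create_bus(net, vn_kv=12.66)
b1 = pp.create_bus(net, vn_kv=12.66)
b2 = pp.create_bus(net, vn_kv=12.66)
pp.create_ext_grid(net, bus=b0, vm_pu=1.0)            # slack bus at node 0
pp.create_line_from_parameters(net, from_bus=b0, to_bus=b1, length_km=1.0,
                               r_ohm_per_km=0.3, x_ohm_per_km=0.1,
                               c_nf_per_km=0.0, max_i_ka=0.4)
pp.create_line_from_parameters(net, from_bus=b1, to_bus=b2, length_km=1.0,
                               r_ohm_per_km=0.3, x_ohm_per_km=0.1,
                               c_nf_per_km=0.0, max_i_ka=0.4)
pp.create_load(net, bus=b1, p_mw=0.2, q_mvar=0.05)    # demand p^D, q^D at node 1
pp.create_load(net, bus=b2, p_mw=0.1, q_mvar=0.02)    # demand p^D, q^D at node 2

pp.runpp(net)                                          # run the power flow
print(net.res_bus.vm_pu)                               # voltage magnitude per bus
print(net.res_line.loading_percent)                    # line loading
```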
For more detailed information, refer to the related paper: A fixed-point current injection power flow for electric distribution systems using Laurent series.
For each time step ( t ) in an episode, the agent observes the current state ( s_t ) and determines an action ( a_t ) to be executed in the environment. Once ( a_t ) is received, the environment executes the step function to run the power flow and update the status of the ESSs and the distribution network. The consequence of this action is observed at the current time step ( t ). Based on these resulting observations, the reward ( r_t ) is computed by the designed reward calculation block. Next, the Data Manager in the environment samples external time-series data for the next time step ( t+1 ), including demand, renewable energy generation, and price, emulating the stochastic fluctuations of the environment. These external variables are combined with the updated internal observations to form the next state ( s_{t+1} ), completing the state transition of the environment.
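The loop below sketches this interaction in a Gym-style interface. The names `env` and `agent`, the constructor comment, and the 4-tuple returned by `step` follow common RL conventions and are assumptions about the exact signatures, not a verbatim use of the framework's API.

```python
# Illustrative interaction loop; "PowerNetEnv" and its constructor arguments are
# assumptions standing in for the environment class provided by the framework.
# env = PowerNetEnv(network_config=config, data_manager=dm)

def run_episode(env, agent, horizon: int = 96) -> float:
    """Roll out one episode: observe s_t, apply a_t, receive r_t and s_{t+1}."""
    total_reward = 0.0
    state = env.reset()
    for t in range(horizon):
        action = agent.act(state)                           # ESS charging/discharging decisions
        next_state, reward, done, info = env.step(action)   # power flow + ESS update inside step()
        total_reward += reward
        state = next_state
        if done:
            break
    return total_reward
```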
Users can freely design the build-state block to explore how different states influence the performance of algorithms on various tasks. Similarly, the cal-reward block can be tailored to different optimization tasks. For convenience, the framework provides a default state pattern and a default reward calculation method.
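As a sketch of such customization, the snippet below overrides hypothetical `build_state` and `cal_reward` hooks. The method names, the observation keys, and the penalty weights are all illustrative assumptions, not the framework's defaults.

```python
import numpy as np

class CustomEnv:  # in practice this would subclass the framework's environment
    def build_state(self, obs: dict) -> np.ndarray:
        # Example state: ESS energy levels, nodal voltage magnitudes, and the current price.
        return np.concatenate([obs["ess_soc"], obs["voltages"], [obs["price"]]])

    def cal_reward(self, obs: dict) -> float:
        # Example reward: negative energy cost minus a penalty on voltage-band violations.
        cost = obs["price"] * obs["grid_import_power"]
        violation = np.sum(np.clip(np.abs(obs["voltages"] - 1.0) - 0.05, 0.0, None))
        return -cost - 100.0 * violation
```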