This work aims at using different machine learning techniques in detecting anomalies (including hardware failures, sabotage and cyber-attacks) in SCADA water infrastructure.
The dataset used is published here
If you want to cite the paper please use the following format;
@InProceedings{10.1007/978-3-030-12786-2_1,
author="Hindy, Hanan and Brosset, David and Bayne, Ethan and Seeam, Amar and Bellekens, Xavier",
editor="Katsikas, Sokratis K. and Cuppens, Fr{\'e}d{\'e}ric and Cuppens, Nora and Lambrinoudakis, Costas and Ant{\'o}n, Annie and Gritzalis, Stefanos and Mylopoulos, John and Kalloniatis, Christos",
title="Improving SIEM for Critical SCADA Water Infrastructures Using Machine Learning",
booktitle="Computer Security",
year="2019",
publisher="Springer International Publishing",
address="Cham",
pages="3--19"
}
- Logistic Regression
- Gaussian Naive Bayes
- k-Nearest Neighbours
- Support Vector Machine
- Decision Trees
- Random Forests
Clone this repository
run preprocessing.py [dataset log path]
run classification.py [data processed file path]
run classification-with-confidence.py [data processed file path]
- The output of preprocessing will be saved in the cloned directory as 'dataset_processed.csv'
- The classification outputs is separated in folders for each output (anomaly, affected component, scenario, etc.). Each folder contains a csv for each algorithm having its confusion matrix and a 'CrossValidation.csv' file with the combined results.