Method for model prediction and evaluation of DRAM Uncorrected Errors (UEs).
Supplemental code for the SC20 paper "Cost-Aware Prediction of Uncorrected DRAM Errors in the Field".
The project is structured as follows:
- The folder ue_predict is a module containing the necessary files and methods for training and evaluation.
- The scripts train_test.py and evaluation.py provide a way of executing the 'ue_predict' module by allowing the configuration of specific parameters, such as the length of the prediction window or the prediction frequency.
- The folder data contains the files needed for training and evaluation.
- The folder synthetic_ues contains a synthetic log file of UEs based on the production MareNostrum 3 error logs. We provide this log file in order to enable future studies to quantify the real-world impact of DRAM uncorrected errors and any proposed resiliency techniques.
The UEPREDICT code is released under the BSD-3 License.
Scripts can be executed directly from the shell as Python 3 files. First, execute train_test.py to compute the predictions using walk-forward validation. The scripts' arguments are described below.
python3 train_test.py --verbose
If executed with the verbose argument, it prints information at each training/testing step, such as the confusion matrix values for the train and test sets or the best hyperparameters.
After its execution, it will generate a file with the probabilities calculated by the model of each instance belonging to class 1 (i.e. having an Uncorrected Error in the next prediction window), alongside the correct class label. This file is stored as data/predictions.csv by default.
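The predictions file can be inspected with pandas. The snippet below is a minimal sketch using illustrative inline data in place of data/predictions.csv; the column names "prob" and "label" are assumptions, so check the header of the file actually generated by train_test.py.

```python
import pandas as pd

# Illustrative stand-in for pd.read_csv("data/predictions.csv").
# "prob" is the model probability of class 1 (a UE in the next
# prediction window); "label" is the ground-truth class.
preds = pd.DataFrame({
    "prob": [0.05, 0.80, 0.30, 0.95],
    "label": [0, 1, 0, 1],
})

# Turn probabilities into hard predictions with a 0.5 threshold and
# count true/false positives as a quick sanity check.
pred_label = (preds["prob"] >= 0.5).astype(int)
tp = int(((pred_label == 1) & (preds["label"] == 1)).sum())
fp = int(((pred_label == 1) & (preds["label"] == 0)).sum())
print(tp, fp)  # → 2 0
```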
Once the predictions file is generated, it can be evaluated by executing the evaluation.py script, which reports information such as the number of impact mitigations performed or the number of correctly predicted UEs.
python3 evaluation.py --verbose
Switch | Long switch | Description |
---|---|---|
-pw | --prediction-window | Specifies the prediction window. It must be given as a pandas time offset alias; see the pandas offset aliases documentation for more information. |
-pf | --prediction-frequency | Specifies the prediction frequency. It must be given as a pandas time offset alias; see the pandas offset aliases documentation for more information. |
-tf | --train-frequency | Specifies the training frequency. It must be given as a pandas time offset alias; see the pandas offset aliases documentation for more information. |
-i-fts | --input-features | Path to the input features file, a CSV containing the features data. |
-i-ues | --input-ues | Path to the input Uncorrected Errors file, a CSV containing the UEs data. |
-o | --output-predictions | Path of the output file, a CSV containing the predictions data after the train/test iterations. |
-clf | --classifier | Classifier to use for training and predicting. Choices: RF, GBDT, LR, GNB, SVM or NN. |
-ru | --random-undersampling | Ratio of minority-class samples to majority-class samples after resampling, expressed as Nm/NrM, where Nm is the number of minority-class samples and NrM is the number of majority-class samples after resampling. If the ratio is 0, no resampling is performed. |
 | --verbose | If specified, shows information during the execution, such as performance, training times, etc. at each train/test split. |
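The `-pw`, `-pf` and `-tf` values are pandas time offset aliases, e.g. `1d` for one day or `6h` for six hours (example values, not script defaults). A quick way to check a value before passing it to the scripts:

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

# to_offset parses a pandas time offset alias, as accepted by
# --prediction-window, --prediction-frequency and --train-frequency,
# and raises a ValueError for invalid strings, so it can be used to
# validate an argument up front.
pw = to_offset("1d")  # a one-day prediction window (example value)
pf = to_offset("6h")  # a six-hour prediction frequency (example value)

# How many prediction points fit in one prediction window.
print(pd.Timedelta(pw) // pd.Timedelta(pf))  # → 4
```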
Switch | Long switch | Description |
---|---|---|
-pw | --prediction-window | Specifies the prediction window. It must be given as a pandas time offset alias; see the pandas offset aliases documentation for more information. |
 | --mitigation-time | Specifies the time needed for performing an impact mitigation. It must be given as a pandas time offset alias; see the pandas offset aliases documentation for more information. |
-i-preds | --input-predictions | Path to the input predictions file, a CSV containing the predictions data generated by the train_test script. |
-i-ues | --input-ues | Path to the input Uncorrected Errors file, a CSV containing the UEs data. |
-o | --output-evaluations | Path of the output file, a CSV containing the evaluations data. |
 | --verbose | If specified, shows information about the evaluation of the predictions at each split. |
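For intuition, a prediction can only lead to an impact mitigation if it arrives at least the mitigation time ahead of the UE. The sketch below illustrates that timing relation with made-up timestamps and a hypothetical 2-minute mitigation time; it is not the evaluation script's actual logic.

```python
import pandas as pd

# Hypothetical values: the time needed to perform a mitigation
# (--mitigation-time), the moment a prediction is made, and the
# moment the UE actually occurs.
mitigation_time = pd.Timedelta("2min")
prediction_time = pd.Timestamp("2020-01-01 10:00:00")
ue_time = pd.Timestamp("2020-01-01 10:05:00")

# The prediction is useful only if its lead time covers the
# mitigation time.
lead_time = ue_time - prediction_time
print(lead_time >= mitigation_time)  # → True
```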
This work was supported by the Spanish Ministry of Science and Technology (project PID2019-107255GB), Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272) and the European Union’s Horizon 2020 research and innovation programme and EuroEXA project (grant agreement No 754337). Paul Carpenter and Marc Casas hold the Ramon y Cajal fellowship under contracts RYC2018-025628-I and RYC2017-23269, respectively, of the Ministry of Economy and Competitiveness of Spain.