It will be helpful to read the accompanying file "Detecting Infected Hosts Using Machine Learning.pdf".
Description
During our work we find, through manual analysis, that several of the hosts we are responsible for are infected with malware. However, we are responsible for hundreds of hosts, and the threat is too pressing to review each one manually. We therefore construct an automated solution that estimates the probability that a host is infected from data about its CPU and memory usage. We test a small sample of hosts manually and record the results in the file “CPU and Memory Data.csv”; all other data is derived from this initial file.
The solution consists of three parts. Part 1 is the data extraction and mapping phase. Part 2 is a binary solution implemented in Excel using a simple machine learning algorithm. Part 3 is a probabilistic solution implemented in Python using logistic regression.
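To make the idea behind Part 3 concrete, the following is a minimal sketch of logistic regression trained by gradient descent on two features, CPU and memory usage. It is not the project's actual script. The sample values, the 0-to-1 scaling of the features, the learning rate, and the function names (sigmoid, train, predict_proba) are all assumptions made for illustration, and no data is read from “CPU and Memory Data.csv”.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train(rows, labels, lr=0.5, epochs=5000):
    """Fit weights and a bias by batch gradient descent on the log loss."""
    w = [0.0] * len(rows[0])
    b = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            # Prediction error for this host: predicted probability minus label.
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        # Step against the average gradient.
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Estimated probability that a host with features x is infected."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Hypothetical training sample: (cpu_usage, memory_usage) as fractions of
# capacity, with label 1 for a host found infected and 0 for a clean host.
samples = [
    (0.10, 0.20), (0.15, 0.25), (0.20, 0.15), (0.25, 0.30),
    (0.80, 0.85), (0.90, 0.75), (0.85, 0.95), (0.75, 0.80),
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train(samples, labels)

# Score two unseen, equally hypothetical hosts.
p_low = predict_proba(w, b, (0.12, 0.18))
p_high = predict_proba(w, b, (0.88, 0.90))
print("low-usage host:", p_low)
print("high-usage host:", p_high)
```

Unlike the binary result of Part 2, the output here is a probability for each host, so hosts can be ranked and the most suspicious ones reviewed first.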