Data Analysers
Modaclouds-SDAWeka provides two kinds of statistical data analysers (SDAs): Forecasting SDAs and Correlation SDAs.
Forecasting SDAs are another important component of the MODAClouds Monitoring Platform. At runtime, the system faces a varying workload, including bursts, and should react to it automatically, for example by starting a new virtual machine to balance the requests. The system therefore needs to predict the incoming workload early enough to leave time to actuate.
Besides the workload, other metrics can also be predicted. For instance, if the CPU utilization follows a seasonal pattern, future usage can be predicted from that pattern. The same applies to every monitored metric: whenever a future value of a metric is needed, time series forecasting can be employed.
We have implemented a number of forecasting methods, including both time series forecasting and machine learning based forecasting algorithms. Modaclouds-SDAWeka provides three machine learning algorithms from the Weka library: Linear Regression, Gaussian Process and SMO regression.
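As a rough illustration of regression-based forecasting, the sketch below fits a linear regression that predicts each value from the previous one and then iterates it forward. It is plain Python rather than Weka, and the single-lag setup and the names `fit_line` and `forecast` are assumptions made for this example, not the SDA's actual implementation.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        # Constant input: fall back to predicting the mean of the targets.
        return mean_y, 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def forecast(series, steps):
    """Forecast `steps` future values from a one-lag linear regression."""
    if len(series) < 3:
        raise ValueError("need at least three observations")
    # Training pairs: (value at t-1) -> (value at t).
    intercept, slope = fit_line(series[:-1], series[1:])
    predictions = []
    last = series[-1]
    for _ in range(steps):
        # Feed each prediction back in as the next lagged input.
        last = intercept + slope * last
        predictions.append(last)
    return predictions


# Hypothetical workload samples (requests per interval), growing steadily.
workload = [10, 20, 30, 40, 50]
print(forecast(workload, 2))  # [60.0, 70.0]
```

A single lag cannot capture a seasonal pattern such as the CPU utilization example; that would need additional lagged inputs spanning at least one season.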
Correlation SDAs are another significant component of the Monitoring Platform. They are mainly used to estimate metrics that are not monitored directly. For instance, obtaining the response time at runtime requires recording both the arrival and departure timestamps of each request, which adds overhead to the system. Instead, a machine learning model can be trained offline on a benchmark to correlate metrics such as CPU utilization and throughput with the response time and to capture the patterns between them. Besides the value of the response time, the Correlation SDA also supports estimating classes. Continuing the example, the response time can be classified as "violation" or "success" based on a predefined threshold. At runtime, the trained model then reports whether the response time violates the threshold, given only the CPU utilization and throughput.
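The value-estimation case can be sketched as a multiple linear regression trained offline on benchmark samples and then queried with only the cheap metrics. The code below is a stdlib-only Python illustration, not Weka's LinearRegression; the function names and the benchmark numbers are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_linear(features, targets):
    """Least-squares weights [bias, w1, w2, ...] via the normal equations."""
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights, feature):
    """Estimate the target metric for one feature vector."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], feature))


# Hypothetical offline benchmark: (CPU utilization, throughput in req/s)
# paired with the measured response time in seconds.
benchmark = [(0.2, 100), (0.4, 150), (0.6, 300), (0.8, 250), (0.5, 200)]
response_times = [0.25, 0.40, 0.65, 0.70, 0.50]

weights = fit_linear(benchmark, response_times)
# At runtime only CPU utilization and throughput are needed.
print(predict(weights, (0.7, 280)))
```

Once the weights are fitted, the estimate at runtime costs one multiply-add per metric, which is the point of avoiding per-request timestamp collection.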
We use Weka to implement the machine learning methods. The correlation methods offer two functionalities: one correlates the values of a metric, the other correlates its classes. For instance, the response time can be classified as true or false depending on whether it is above a predefined threshold. Different algorithms are provided for each functionality:
- Correlate classes: Naive Bayes and SMO methods are provided.
- Correlate values: Linear Regression and SMO regression are provided.
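The class-correlation case can be sketched with a small Gaussian Naive Bayes classifier: response times from a benchmark are first labelled against a threshold, and the model then predicts the label from CPU utilization and throughput alone. This is a plain Python illustration under the assumption of Gaussian feature likelihoods; it does not use Weka's API, and all names and sample values are hypothetical.

```python
import math


def label(response_times, threshold):
    """Turn measured response times into class labels."""
    return ["violation" if rt > threshold else "success" for rt in response_times]


def train_gaussian_nb(features, labels):
    """Return {class: (prior, [(mean, variance) per feature])}."""
    model = {}
    total = len(labels)
    for cls in set(labels):
        rows = [f for f, l in zip(features, labels) if l == cls]
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            var = sum((v - mean) ** 2 for v in column) / len(column)
            # Small floor so a constant feature does not give zero variance.
            stats.append((mean, var + 1e-9))
        model[cls] = (len(rows) / total, stats)
    return model


def classify(model, feature):
    """Pick the class with the highest log posterior (up to a constant)."""
    def log_score(prior, stats):
        score = math.log(prior)
        for value, (mean, var) in zip(feature, stats):
            score += -0.5 * math.log(2 * math.pi * var)
            score -= (value - mean) ** 2 / (2 * var)
        return score

    return max(model, key=lambda cls: log_score(*model[cls]))


# Hypothetical benchmark: (CPU utilization, throughput) and response time (s).
samples = [(0.20, 100), (0.30, 120), (0.25, 110),
           (0.80, 300), (0.90, 320), (0.85, 310)]
measured = [0.10, 0.15, 0.12, 0.90, 1.10, 1.00]

model = train_gaussian_nb(samples, label(measured, threshold=0.5))
print(classify(model, (0.88, 315)))
```

The threshold is applied only when preparing the training labels; at runtime the classifier never sees a response time.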