Classification

Simone Maurizio La Cava edited this page Jul 23, 2020 · 4 revisions

The classification

Classification is the machine learning problem of identifying the class (category) to which a new observation belongs, on the basis of a set of observations whose class membership is known (the training set).

Since the class of each observation in the training set is known, classification is considered a supervised learning technique.

A classifier is an algorithm that implements classification: after a fit phase in which it is trained on the training set, it can classify unknown data by assigning each observation a category.

An observation is called a sample (or example). It consists of a set of features, the variables that together make up its pattern, and has a label that identifies the class to which it belongs.
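
The fit and classify phases described above can be illustrated with a toy example. The sketch below is not Athena's implementation: it uses a nearest-centroid rule, chosen only because it is short, and the class name `NearestCentroidClassifier` and the sample values are hypothetical.

```python
from collections import defaultdict


class NearestCentroidClassifier:
    """Toy classifier: assigns a sample to the class with the closest mean pattern."""

    def fit(self, samples, labels):
        # Group the training samples by their known class label.
        grouped = defaultdict(list)
        for features, label in zip(samples, labels):
            grouped[label].append(features)
        # Store the per-feature mean (centroid) of every class.
        self.centroids_ = {
            label: [sum(column) / len(rows) for column in zip(*rows)]
            for label, rows in grouped.items()
        }
        return self

    def predict(self, samples):
        # For each sample, pick the class whose centroid is nearest
        # (squared Euclidean distance).
        def squared_distance(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b))

        return [
            min(
                self.centroids_,
                key=lambda label: squared_distance(features, self.centroids_[label]),
            )
            for features in samples
        ]


# Training set: each sample is a list of features, each label is its known class.
train_samples = [[0.10, 0.45], [0.12, 0.40], [0.90, 0.10], [0.95, 0.15]]
train_labels = [1, 1, 0, 0]

classifier = NearestCentroidClassifier().fit(train_samples, train_labels)
predictions = classifier.predict([[0.11, 0.42], [0.92, 0.12]])
print(predictions)
```

The training set is used only in `fit`; `predict` receives samples without labels and returns one class per sample.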


The performance of a classifier can be evaluated with various methods and described by various parameters.

For an overview of these techniques and parameters, see the Performance page of the wiki.





The dataset

Athena builds the dataset used for classification from a combination of the extracted measures. Each sample corresponds to a subject, while the classes correspond to the different groups of subjects.

Before selecting the classifier type, you have to enter the main data directory (if you have not already entered it in a previous step) and select the desired measures and their spatial parameters. The toolbox will automatically use the previously extracted measures that correspond to these choices and will create the dataset as a table in which the first column holds the class label of each sample and each of the other columns holds a different feature.

So each row has the class label as its first element, while the remaining elements, taken together, form the pattern:

group    feature 1    feature 2    ...    feature n
1        0.101032     0.450403     ...    0.342202
0        0.213402     0.481221     ...    0.483222
0        0.983832     0.130399     ...    0.102393
...      ...          ...          ...    ...
1        0.100193     0.091128     ...    0.109132
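
As a minimal sketch of how such a table separates into labels and patterns, the snippet below copies only the values visible in the table above and leaves out the elided columns and rows. The variable names are hypothetical, and this is not how Athena stores the dataset internally.

```python
# Rows in the layout described above: class label first, then the features.
# Only the columns shown in the table are included; the elided ones are omitted.
header = ["group", "feature 1", "feature 2", "feature n"]
rows = [
    [1, 0.101032, 0.450403, 0.342202],
    [0, 0.213402, 0.481221, 0.483222],
    [0, 0.983832, 0.130399, 0.102393],
    [1, 0.100193, 0.091128, 0.109132],
]

# The first column is the class label of each sample (subject).
labels = [int(row[0]) for row in rows]
# The remaining columns, taken together, form the pattern of the sample.
patterns = [row[1:] for row in rows]
feature_names = header[1:]

for label, pattern in zip(labels, patterns):
    print(label, pattern)
```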

You can also choose whether to use all the values of all the selected measures or only those found to be significant in a previous statistical analysis, since the latter generally show better discriminant capability.
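
Restricting the dataset to significant features could look like the sketch below. It assumes that the statistical analysis produced one p-value per feature and that significance means falling below a threshold `alpha`; the function `keep_significant`, the p-values, and the threshold are all hypothetical and are not taken from Athena.

```python
def keep_significant(feature_names, patterns, p_values, alpha=0.05):
    """Keep only the feature columns whose p-value is below alpha (assumed criterion)."""
    keep = [index for index, p in enumerate(p_values) if p < alpha]
    names = [feature_names[index] for index in keep]
    reduced = [[row[index] for index in keep] for row in patterns]
    return names, reduced


# Hypothetical feature names, patterns, and p-values, for illustration only.
names = ["feature 1", "feature 2", "feature 3"]
patterns = [[0.10, 0.45, 0.34], [0.21, 0.48, 0.48]]
p_values = [0.01, 0.30, 0.04]

significant_names, significant_patterns = keep_significant(names, patterns, p_values)
print(significant_names)
print(significant_patterns)
```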

Now you can RUN the creation of the dataset and, optionally, perform a PCA (Principal Component Analysis) on it to reduce the number of features. Athena will show you the resulting graph, and you can then select a percentage of the total variability of the dataset: enough features are selected to reach that percentage, and the final dataset is created from them.
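
The selection step can be sketched as follows, assuming the variance explained by each principal component has already been computed. The function `components_for_variance` and the example variances are hypothetical; this shows the general idea of a cumulative-variance threshold, not Athena's own code.

```python
def components_for_variance(explained_variances, target_percentage):
    """Return how many components are needed to reach the target share of variance."""
    total = sum(explained_variances)
    if total <= 0:
        raise ValueError("total variance must be positive")
    # Components are taken from the most to the least informative.
    ordered = sorted(explained_variances, reverse=True)
    cumulative = 0.0
    for count, variance in enumerate(ordered, start=1):
        cumulative += variance
        if 100.0 * cumulative / total >= target_percentage:
            return count
    return len(ordered)


# Hypothetical variances of four principal components.
variances = [4.0, 3.0, 2.0, 1.0]
print(components_for_variance(variances, 90))
```

With these values, the first three components together hold 9 of the 10 units of variance, so three components are enough to reach 90%.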

After this operation (or if you skipped it), you can press the Classification > button to proceed and select the desired classifier.





The classifiers

After you click the Classification > button, Athena will open another interface where you can select the desired classifier.

Currently, you can select:

  • The Random Forest classifier, an ensemble classifier obtained by aggregating multiple decision tree classifiers (here, you can also use a single decision tree classifier)
  • The Neural Network classifier, a classifier based on a multilayer artificial neural network, built from a set of perceptrons.
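
The two building blocks named in the list above can be sketched in a few lines. Majority voting is one common way to aggregate the predictions of several decision trees, and a step-activation unit is the classic form of a perceptron; both functions and all the values below are hypothetical illustrations, not Athena's implementation.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Aggregate the class predicted by each tree into one ensemble decision."""
    return Counter(tree_predictions).most_common(1)[0][0]


def perceptron_output(features, weights, bias):
    """Single perceptron with a step activation: 1 if the weighted sum is positive."""
    weighted_sum = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if weighted_sum > 0 else 0


# Three hypothetical trees vote on the class of one sample.
forest_decision = majority_vote([1, 0, 1])

# One hypothetical perceptron applied to two different samples.
first_output = perceptron_output([0.5, 0.2], weights=[1.0, -1.0], bias=-0.1)
second_output = perceptron_output([0.1, 0.4], weights=[1.0, -1.0], bias=-0.1)
print(forest_decision, first_output, second_output)
```

A multilayer network chains many such units, usually with smoother activation functions, so that the outputs of one layer become the inputs of the next.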

Finally, you can experiment with them by changing their parameters, or you can return to the analysis list.
