Here I have in mind those metrics that compare a clustering model with independent ground truth (as opposed to "internal" measures of quality, such as the Calinski-Harabasz index). The following look like good candidates:
Rand index
Hubert & Arabie Adjusted Rand index
Mirkin's index
Hubert's index
variation of information
V-measure
mutual information
The Clustering.jl package already has implementations, which assume the clusters are labelled with integers. The first four are combined into one function that returns a tuple instead of a single measurement, which deviates from the StatisticalMeasures.jl idiom. These could either be separate measures, or we could add a field for the desired variation.
Given that the definitions of these measures are pretty simple, I think it's more trouble than it's worth to write and maintain interfaces for the existing code, which would also require making Clustering.jl a (conditional) dependency. I therefore propose new implementations here. The vanilla Rand index would make a great start.
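To give a sense of how little code the vanilla Rand index needs, here is a minimal sketch. The name `rand_index` is hypothetical and not an existing function in StatisticalMeasures.jl or Clustering.jl. The sketch assumes two label vectors of equal length with no `missing` entries, and compares labels only with `==`, so they need not be integers.

```julia
# Hypothetical helper for illustration only; not part of any existing package.
# The Rand index is the fraction of observation pairs on which the two
# labelings agree: both put the pair in the same cluster, or both put it
# in different clusters.
function rand_index(yhat::AbstractVector, y::AbstractVector)
    n = length(y)
    length(yhat) == n ||
        throw(DimensionMismatch("label vectors differ in length"))
    n > 1 || throw(ArgumentError("need at least two observations"))
    agreements = 0
    for i in 1:n-1, j in i+1:n
        same_predicted = yhat[i] == yhat[j]  # pair co-clustered by the model?
        same_truth = y[i] == y[j]            # pair co-clustered in ground truth?
        agreements += (same_predicted == same_truth)
    end
    return agreements / binomial(n, 2)
end

# The label names differ but the partitions are identical, so every pair agrees.
rand_index([1, 1, 2, 2], ["a", "a", "b", "b"])  # 1.0
```

The double loop is quadratic in the number of observations. A real implementation would more likely work from the contingency table of the two labelings, which also gives the adjusted Rand, Mirkin and Hubert variants with little extra work.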
Here's what traits would look like for these measures:
consumes_multiple_observations = true
kind_of_proxy = LearnAPI.LabelAmbiguous()
observation_scitype = Union{Missing, ScientificTypesBase.Finite}
orientation = StatisticalMeasuresBase.Score() # all except variation of information
orientation = StatisticalMeasuresBase.Loss() # variation of information
human_name = ... <string>
For traits not mentioned above, the fallbacks suffice.