[Tracking] Add clustering metrics #26

ablaom · 2024-04-29T21:22:00Z

Here I have in mind those metrics that compare a clustering model with independent ground truth (as opposed to "internal" measures of quality, such as the Calinski-Harabasz index). The following look like good candidates:

The Clustering.jl package already has implementations, which assume the clusters are labelled with integers. The first four are combined into one function that returns a tuple instead of a single measurement, which deviates from the StatisticalMeasures.jl idiom. Here, these could either be separate measures, or a single measure with a field selecting the desired variation.

Given that the definitions of these measures are pretty simple, I think it's more trouble than it's worth to write and maintain interfaces for the existing code, which would also require making Clustering.jl a (conditional) dependency. I therefore propose new implementations here. The vanilla Rand index would make a great start.
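To give a sense of how small such an implementation can be, here is a minimal sketch of the vanilla Rand index in plain Julia. The function name `rand_index` is hypothetical and not part of StatisticalMeasures.jl or Clustering.jl. The sketch assumes 1-based vectors with no `missing` labels, and it does not attempt to follow the measure interface or traits discussed in this issue.

```julia
# Hypothetical helper, for illustration only.
# Rand index: the fraction of observation pairs on which the two labelings
# agree, i.e. both put the pair in the same cluster or both separate it.
# Labels can be any values supporting `==`, not only integers.
function rand_index(yhat::AbstractVector, y::AbstractVector)
    n = length(y)
    length(yhat) == n ||
        throw(DimensionMismatch("label vectors must have equal length"))
    n > 1 || throw(ArgumentError("need at least two observations"))
    agreements = 0
    for i in 1:n-1, j in i+1:n
        same_pred = yhat[i] == yhat[j]   # pair co-clustered by the model?
        same_true = y[i] == y[j]         # pair co-clustered in ground truth?
        agreements += (same_pred == same_true)
    end
    return agreements / (n * (n - 1) ÷ 2)   # divide by the number of pairs
end

rand_index([1, 1, 2, 2], [:a, :a, :b, :b])  # identical partitions, different label names
```

This pairwise loop is quadratic in the number of observations. A real implementation would more likely work from a contingency table of the two labelings, and would need to decide how to handle `missing`.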

Here's what traits would look like for these measures:

consumes_multiple_observations = true
kind_of_proxy = LearnAPI.LabelAmbiguous()
observation_scitype = Union{Missing, ScientificTypesBase.Finite}
orientation = StatisticalMeasuresBase.Score() # all except variation of information
orientation = StatisticalMeasuresBase.Loss() # variation of information
human_name = ... <string>

For others not mentioned above, the fallbacks suffice.


ablaom added the "good first issue" and "bucket list" labels on Jun 25, 2024
