[Tracking] Add clustering metrics #26

Open
7 tasks
ablaom opened this issue Apr 29, 2024 · 0 comments
ablaom commented Apr 29, 2024

Here I have in mind those metrics that compare a clustering model with independent ground truth (as opposed to "internal" measures of quality, such as the Calinski-Harabasz index). The following look like good candidates:

  • Rand index
  • Hubert & Arabie Adjusted Rand index
  • Mirkin's index
  • Hubert's index
  • variation of information
  • V-measure
  • mutual information

The Clustering.jl package already has implementations, which assume the clusters are labelled with integers. The first four are combined into one function that returns a tuple rather than a single measurement, which deviates from the StatisticalMeasures.jl idiom. Here they could either be separate measures, or a single measure with a field selecting the desired variation.

Given that the definitions of these measures are pretty simple, I think it's more trouble than it's worth to write and maintain interfaces for the existing code, which would also require making Clustering.jl a (conditional) dependency. I therefore propose new implementations here. The vanilla Rand index would make a great start.
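To give a sense of how little code a fresh implementation needs, here is a rough sketch of the vanilla Rand index computed from a contingency table. The function name `rand_index` and the helper `npairs` are hypothetical, not part of StatisticalMeasures.jl or Clustering.jl. The sketch accepts arbitrary label types, not just integers. It does not handle `missing` and is not wrapped as a StatisticalMeasures.jl measure.

```julia
# Hypothetical sketch: fraction of observation pairs on which two
# clusterings agree (same cluster in both, or different clusters in both).
function rand_index(yhat::AbstractVector, y::AbstractVector)
    n = length(y)
    length(yhat) == n ||
        throw(DimensionMismatch("label vectors must have equal length"))
    n >= 2 || throw(ArgumentError("need at least two observations"))

    # Contingency counts: joint[(a, b)] is the number of observations
    # labelled `a` in `yhat` and `b` in `y`; rows/cols are the marginals.
    joint = Dict{Tuple{Any,Any},Int}()
    rows = Dict{Any,Int}()
    cols = Dict{Any,Int}()
    for (a, b) in zip(yhat, y)
        joint[(a, b)] = get(joint, (a, b), 0) + 1
        rows[a] = get(rows, a, 0) + 1
        cols[b] = get(cols, b, 0) + 1
    end

    npairs(m) = m * (m - 1) ÷ 2  # unordered pairs among m items

    together_both = sum(npairs, values(joint))  # same cluster in both
    together_yhat = sum(npairs, values(rows))   # same cluster in yhat
    together_y = sum(npairs, values(cols))      # same cluster in y
    total = npairs(n)

    # Inclusion-exclusion gives the pairs separated in both clusterings.
    apart_both = total - together_yhat - together_y + together_both

    return (together_both + apart_both) / total
end

# Label names do not matter, only the induced partition:
rand_index([1, 1, 2, 2], ["a", "a", "b", "b"])  # 1.0
```

The same three pair counts are the ingredients for the adjusted Rand, Mirkin and Hubert variants, which is one argument for sharing the contingency step if they become separate measures.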

Here's what traits would look like for these measures:

consumes_multiple_observations = true
kind_of_proxy = LearnAPI.LabelAmbiguous()
observation_scitype = Union{Missing, ScientificTypesBase.Finite}
orientation = StatisticalMeasuresBase.Score() # all except variation of information
orientation = StatisticalMeasuresBase.Loss() # variation of information
human_name = ... <string>

For traits not mentioned above, the fallbacks suffice.
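The `Loss()` orientation for variation of information follows from its definition: it is a distance between partitions, equal to zero when the two clusterings coincide and growing as they diverge. A minimal sketch, using the identity VI = 2H(U,V) - H(U) - H(V) with natural logarithms, is below. The name `variation_of_information` is hypothetical, and `missing` labels are not handled.

```julia
# Hypothetical sketch: variation of information between two clusterings,
# computed from joint and marginal label counts (natural log).
function variation_of_information(yhat::AbstractVector, y::AbstractVector)
    n = length(y)
    length(yhat) == n ||
        throw(DimensionMismatch("label vectors must have equal length"))
    n >= 1 || throw(ArgumentError("need at least one observation"))

    joint = Dict{Tuple{Any,Any},Int}()
    rows = Dict{Any,Int}()
    cols = Dict{Any,Int}()
    for (a, b) in zip(yhat, y)
        joint[(a, b)] = get(joint, (a, b), 0) + 1
        rows[a] = get(rows, a, 0) + 1
        cols[b] = get(cols, b, 0) + 1
    end

    # Shannon entropy of the empirical distribution given by `counts`.
    entropy(counts) = -sum(c / n * log(c / n) for c in counts)

    # VI = H(U) + H(V) - 2 I(U; V) = 2 H(U, V) - H(U) - H(V)
    return 2 * entropy(values(joint)) -
           entropy(values(rows)) - entropy(values(cols))
end

# Identical partitions (up to relabelling) are at distance zero:
variation_of_information([1, 1, 2, 2], ["a", "a", "b", "b"])
```

Lower is better here, whereas the Rand-type indices, V-measure and mutual information all increase with agreement, hence `Score()` for those.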

ablaom added the "good first issue" and "bucket list" labels on Jun 25, 2024