[Main & MIEB] potential issues for multi-label classification #1835

Open · gowitheflow-1998 (Contributor) opened this issue Jan 19, 2025 · 1 comment
@isaac-chung and I were debugging the weird scores for VOC2007 in MIEB (#1792) and found the following potential issues affecting both mieb and main.

  1. lrap computation. LRAP is supposed to operate on continuous scores rather than on discrete predicted labels, which I fixed for MIEB in #1834 ([mieb] fixing lrap computation for multi-label classification). The change largely smooths out the large performance range across models when samples_per_label is small; a minimal sketch of the difference is right below.
     I tried to apply the same fix to main but hit a chain of failing tests. Note one difference: main uses a bare KNeighborsClassifier() without a MultiOutputClassifier, which I am not sure behaves as expected either; mieb instead uses MultiOutputClassifier(estimator=LogisticRegression()), which treats each label as a separate binary classification problem. It would be great if someone could confirm the same for main! Also, why KNeighbors?
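Sketch of why the distinction matters (toy data from make_multilabel_classification standing in for the actual task embeddings; this is not the evaluator code, just the MultiOutputClassifier(LogisticRegression()) setup mieb uses):

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import label_ranking_average_precision_score
from sklearn.multioutput import MultiOutputClassifier

# Toy stand-in for the task's embeddings and multi-hot labels.
X, Y = make_multilabel_classification(n_samples=200, n_classes=5, random_state=0)
X_train, X_test, Y_train, Y_test = X[:150], X[150:], Y[:150], Y[150:]

clf = MultiOutputClassifier(LogisticRegression(max_iter=1000)).fit(X_train, Y_train)

# Discrete 0/1 predictions: ties everywhere, so the label ranking is degenerate.
lrap_hard = label_ranking_average_precision_score(Y_test, clf.predict(X_test))

# Continuous scores: P(label = 1) from each per-label binary estimator.
Y_score = np.stack([p[:, 1] for p in clf.predict_proba(X_test)], axis=1)
lrap_soft = label_ranking_average_precision_score(Y_test, Y_score)

print(f"LRAP on hard predictions:  {lrap_hard:.3f}")
print(f"LRAP on continuous scores: {lrap_soft:.3f}")
```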

  2. Under-sampling logic for samples_per_label. Both main and mieb undersample with a per-single-label condition rather than a per-label-set condition, which I am not sure is what we want either, i.e.:

```python
if any((label_counter[label] < samples_per_label) for label in y[i]):
    sample_indices.append(i)
```

With samples_per_label=8 we can end up with counts like the following, where some labels get far more examples than others. E.g., for an example with label set (18, 14): if label 18 has fewer than 8 examples so far, the example is still sampled into the training set even though label 14 already has 40+.

```
defaultdict(int,
            {14: 49,
             13: 8,
             4: 12,
             10: 8,
             5: 9,
             2: 8,
             11: 8,
             6: 11,
             7: 9,
             18: 8,
             12: 8,
             19: 9,
             17: 9,
             8: 8,
             0: 8,
             3: 8,
             15: 8,
             9: 8,
             16: 8,
             1: 8})
```

I think this is less of a problem when using MultiOutputClassifier(), because it only means certain binary classifiers are trained on more examples while the others are unaffected. The sketch below reproduces the overshoot.
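A self-contained sketch of the effect, on hypothetical label sets (not the actual sampler; the names label_counter, sample_indices, and samples_per_label mirror the snippet above):

```python
import random
from collections import defaultdict

random.seed(0)
# Hypothetical data: label 0 co-occurs with each of the rarer labels 1..9.
y = [(0, rare) for rare in range(1, 10) for _ in range(10)]
random.shuffle(y)

samples_per_label = 8
label_counter = defaultdict(int)
sample_indices = []
for i, labels in enumerate(y):
    # Current per-single-label condition: keep the example if ANY of its
    # labels is still under the cap, even when the others are far over it.
    if any(label_counter[label] < samples_per_label for label in labels):
        sample_indices.append(i)
        for label in labels:
            label_counter[label] += 1

# Label 0 ends up with ~72 examples while each rare label stops near 8.
print(dict(sorted(label_counter.items())))
```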

  3. The number of unique predicted label sets is much smaller than the number of unique ground-truth label sets, which can push accuracy far down because it is currently assessed with a strict exact-set-match logic: e.g., predicting [0, 0, 0, 1, 0] scores 0 against a ground truth of [0, 0, 0, 1, 1], which is probably not optimal.
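For illustration, exact-set matching vs per-label alternatives on that example (naming sklearn metrics here as possible alternatives; not claiming these are what the evaluator should adopt):

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, hamming_loss

y_true = np.array([[0, 0, 0, 1, 1]])
y_pred = np.array([[0, 0, 0, 1, 0]])  # 4 of 5 labels correct

# For 2-D multilabel input, accuracy_score computes subset (exact-match) accuracy.
print(accuracy_score(y_true, y_pred))             # 0.0 -- all-or-nothing
# Per-label alternatives that credit the partial match:
print(1 - hamming_loss(y_true, y_pred))           # 0.8
print(f1_score(y_true, y_pred, average="micro"))  # ~0.667
```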

