@isaac-chung and I were debugging the weird scores for VOC2007 in `mieb` (#1792) and found the following potential issues in both `mieb` and `main`:
`lrap` computation. I think `lrap` is supposed to operate on continuous scores rather than discrete predicted labels, which I fixed for `mieb` in [mieb] fixing lrap computation for multi-label classification #1834. This change largely smooths out the large performance range across models when `samples_per_label` is small.
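As a minimal sketch of why this matters (toy values, not from the actual pipeline), sklearn's `label_ranking_average_precision_score` ranks labels per sample, so thresholding the scores into 0/1 predictions collapses the ranking into ties and deflates the metric even when the ranking itself was correct:

```python
import numpy as np
from sklearn.metrics import label_ranking_average_precision_score

y_true = np.array([[1, 0, 0],
                   [0, 0, 1]])

# Continuous scores (e.g. predict_proba output): the true label is
# ranked first in both samples, so LRAP is perfect.
y_score = np.array([[0.4, 0.3, 0.1],
                    [0.2, 0.3, 0.6]])
print(label_ranking_average_precision_score(y_true, y_score))  # 1.0

# Thresholded 0/1 predictions: the true label of the first sample now
# ties with every other zero, so its contribution drops to 1/3 even
# though the underlying ranking was right.
y_pred = (y_score > 0.5).astype(int)  # [[0, 0, 0], [0, 0, 1]]
print(label_ranking_average_precision_score(y_true, y_pred))  # ~0.667
```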
I was trying to apply the same fix to `main` but ran into a chain of failing tests. Note one difference: `main` uses `KNeighborsClassifier()` without a `MultiOutputClassifier`, which I am not sure behaves in the expected way either; `mieb` instead uses `MultiOutputClassifier(estimator=LogisticRegression())`, which treats each label as a separate binary classification problem. It would be great if someone could confirm the same for `main`! Also, why KNeighbors?
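For concreteness, here is a small sketch of the two setups on toy data (the random embeddings and labels are stand-ins, not the actual evaluation pipeline). Note that sklearn's `KNeighborsClassifier` does accept a 2-D indicator `Y` directly as a multi-label problem; whether that matches the intended per-label behaviour is the open question above:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 16))          # stand-in for embeddings
Y = rng.integers(0, 2, size=(100, 5))   # multi-label indicator matrix

# mieb setup: one binary LogisticRegression per label.
# predict_proba returns one (n_samples, 2) array per label; column 1
# is the positive-class probability, i.e. the continuous score LRAP needs.
clf = MultiOutputClassifier(estimator=LogisticRegression()).fit(X, Y)
proba = np.stack([p[:, 1] for p in clf.predict_proba(X)], axis=1)

# main setup: bare KNeighborsClassifier fit on the same indicator matrix;
# predict yields a discrete indicator matrix, not graded scores.
knn = KNeighborsClassifier().fit(X, Y)
pred = knn.predict(X)
```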
Logic for undersampling with `samples_per_label`. Both `main` and `mieb` undersample with a per-single-label logic instead of a per-label-set logic, which I am not sure is what we want either. I.e.:
With `samples_per_label=8` we can get something like the following, where some labels end up with far more examples than others: e.g., for an example with label set (18, 14), if label 18 has fewer than 8 examples so far, this example will still be sampled and added to the training set, even though label 14 is already at 40+.
I think this is less of a problem when using `MultiOutputClassifier()`, because it only means that certain binary classifiers are trained on more examples while the others are unaffected.
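A hypothetical sketch of the per-single-label logic described above (`undersample` here is illustrative, not the actual mteb code): an example is kept if *any* of its labels is still under the cap, so a frequent label keeps growing past the cap whenever it co-occurs with a rare one:

```python
from collections import Counter

def undersample(examples, samples_per_label=8):
    """Keep an example if ANY of its labels is below the cap
    (per-single-label logic, not per-label-set logic)."""
    counts = Counter()
    kept = []
    for x, labels in examples:
        if any(counts[l] < samples_per_label for l in labels):
            kept.append((x, labels))
            counts.update(labels)  # every co-occurring label is counted
    return kept, counts

# Label 14 is already saturated, but the (18, 14) example is still
# sampled because label 18 is under the cap:
data = [(f"x{i}", (14,)) for i in range(40)] + [("y", (18, 14))]
kept, counts = undersample(data)
print(counts)  # Counter({14: 9, 18: 1}) -- label 14 exceeds the cap
```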
The number of unique predicted label sets is much smaller than the number of unique ground-truth label sets, which can make accuracy much lower because it is currently assessed with a strict exact-match logic over the whole label set: e.g., `[0, 0, 0, 1, 0]` will score 0 against `[0, 0, 0, 1, 1]`, which is probably not optimal.
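A quick illustration with sklearn (the alternative metrics are just examples of softer options, not a proposal for which to adopt): on multi-label indicator input, `accuracy_score` is subset accuracy and gives no partial credit, whereas Hamming-based or micro-averaged metrics do:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, hamming_loss

y_true = np.array([[0, 0, 0, 1, 1]])
y_pred = np.array([[0, 0, 0, 1, 0]])

# Subset accuracy requires an exact label-set match, so four correct
# positions out of five still score 0:
print(accuracy_score(y_true, y_pred))             # 0.0

# Softer multi-label metrics give partial credit:
print(1 - hamming_loss(y_true, y_pred))           # 0.8
print(f1_score(y_true, y_pred, average="micro"))  # ~0.667
```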