
Questions about uncertainty based implementation #11

Open
Data-reindeer opened this issue Apr 21, 2023 · 3 comments

@Data-reindeer
Hi, Chengcheng Guo and Bo Zhao:

Thanks for your thorough research and clean code. However, I have a question about the uncertainty-based implementation.

As mentioned in the DeepCore paper, samples with lower confidence may have a greater impact on model optimization than those with higher confidence, and should therefore be included in the coreset. But the implementation here actually calculates the inverse of the uncertainty scores.

Take entropy as an example: `np.log(preds + 1e-6) * preds` is the negative of the entropy, so `np.argsort(scores)[::-1][:self.coreset_size]` selects the samples with low entropy (uncertainty). This confuses me, since the implementation appears inconsistent with the statement in the paper. Is there a bug in the implementation?
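To make the question concrete, here is a small self-contained sketch of the selection behavior being described (hypothetical stand-alone code, not taken from the repository; the per-sample sum over the class axis is my assumption):

```python
import numpy as np

# Two samples: one confident ([0.1, 0.9]) and one uncertain ([0.4, 0.6]).
preds = np.array([[0.1, 0.9],
                  [0.4, 0.6]])

# Score as quoted above: sum of p * log(p), i.e. the NEGATIVE entropy.
scores = (np.log(preds + 1e-6) * preds).sum(axis=1)
print(scores)  # approx [-0.325, -0.673]

coreset_size = 1
selected = np.argsort(scores)[::-1][:coreset_size]
print(selected)  # [0] -- the CONFIDENT sample, since -0.325 > -0.673
```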


@Chengcheng-Guo
Collaborator

The code selects samples with large entropy. The code `np.argsort(scores)[::-1][:self.coreset_size]` selects samples with smaller scores, where the scores are the negative of the entropy. A smaller score means larger entropy.

For example, consider two samples with predicted probabilities [0.1, 0.9] and [0.4, 0.6], respectively. Their entropies are $e_1 = -0.1\ln(0.1) - 0.9\ln(0.9) = 0.325$ and $e_2 = -0.4\ln(0.4) - 0.6\ln(0.6) = 0.673$. The algorithm prefers sample 2 over sample 1 (because of its larger entropy). I think this is consistent with what is stated in the paper.
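For reference, a quick NumPy check of the two entropies worked out above:

```python
import numpy as np

p1 = np.array([0.1, 0.9])
p2 = np.array([0.4, 0.6])

# Entropy: -sum(p * ln(p)) over the class probabilities.
e1 = -(p1 * np.log(p1)).sum()
e2 = -(p2 * np.log(p2)).sum()
print(round(e1, 3), round(e2, 3))  # 0.325 0.673
```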

@Data-reindeer
Author

Thanks for your reply. I think there is some misunderstanding about the function `np.argsort()`. I tried the following code:

```python
import numpy as np

scores = np.array(range(10))
result = np.argsort(scores)[::-1]
result2 = np.argsort(scores)

print(result)   # [9 8 7 6 5 4 3 2 1 0]
print(result2)  # [0 1 2 3 4 5 6 7 8 9]
```

`np.argsort` sorts the values in ascending order by default, and `[::-1]` reverses the result into descending order. So the code `np.argsort(scores)[::-1][:self.coreset_size]` actually selects the samples with the largest scores; since the scores here are negative entropies, the largest scores belong to the samples with the smallest entropy, i.e. the most confident ones.

@lizekai-richard

Hi, may I ask if this issue has been resolved? I agree with @Data-reindeer. It seems we are selecting "more confident" samples instead of the "less confident" ones described in the paper.
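For anyone landing here, a minimal sketch of one possible fix (hypothetical code, not from the repository), assuming `scores` holds the summed negative entropies as in the snippets above: either sort ascending so the most negative scores (largest entropy) come first, or negate the scores and keep the descending sort.

```python
import numpy as np

preds = np.array([[0.1, 0.9],
                  [0.4, 0.6]])
coreset_size = 1

# Negative entropies, as in the snippet quoted in the opening comment.
scores = (np.log(preds + 1e-6) * preds).sum(axis=1)

# Option 1: ascending sort -- the most negative score (largest entropy) comes first.
selected = np.argsort(scores)[:coreset_size]

# Option 2: negate to recover true entropies and keep the descending sort.
selected_alt = np.argsort(-scores)[::-1][:coreset_size]

print(selected, selected_alt)  # [1] [1] -- the uncertain sample in both cases
```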

MaxiBoether added a commit to eth-easl/modyn that referenced this issue Jun 14, 2024
…503)

We had our own version of
PatrickZH/DeepCore#11 because our copy of
their implementation was confused about where the inversion belongs. I thought
it through and believe we don't need any inversion. I added some
comments explaining the reasoning.

Note that this does not address
PatrickZH/DeepCore#13!