Hello, I think you understand diffusion models better than I do, so I would like to discuss this with you and see if you can help me solve my problem.
Here is the situation:
When I run this diffusion project for segmentation tasks, I find that the local host (2*2080 Ti, pytorch=1.8) loading a checkpoint trained on the server (2*4090, pytorch=1.9) cannot reach the same performance as the server (Dice 84 vs. 81), and this puzzles me a lot. To find the best checkpoint, I follow the approach of running DDIM-accelerated sampling for inference after every 2 training epochs and then testing to get the results.
The problem is: during training on the server, the test metric reaches 84, but loading the same checkpoint separately on the server for testing gives 82. Even more strangely, testing on the local host gives 81. The gap between these numbers is hard for me to understand.
I suspect it may be because diffusion needs to initialize random noise, but I still see the problem even when setting the same random seed. In addition, with DDIM, choosing a different batch size for the sampling iterations each time also leads to slightly different performance, which puzzles me as well.
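For reference, the kind of seeding I mean is roughly the following. This is a minimal, generic PyTorch sketch, not the project's actual code, and the helper names are only illustrative; it also shows why drawing noise batch-by-batch makes the result depend on the batch size.

```python
import random
import numpy as np
import torch

def seed_everything(seed: int = 0):
    """Seed every RNG that can affect sampling (helper name is illustrative)."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Force deterministic cuDNN kernels (may be slower but removes one source of drift).
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

# Even with identical seeds, drawing noise one batch at a time makes the RNG
# stream depend on the batch size: torch.randn(4, C, H, W) consumes the
# generator differently than two calls of torch.randn(2, C, H, W), so each
# image ends up with different starting noise. Drawing noise per sample with
# its own generator keeps the noise independent of how samples are batched.
def per_sample_noise(index: int, shape, seed: int = 0, device: str = "cuda"):
    g = torch.Generator(device=device).manual_seed(seed + index)
    return torch.randn(shape, generator=g, device=device)
```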
I look forward to your reply.
@LiuTingWed Hi, TingWed.
Would you mind sharing your testing procedure? The published code does not include a testing script.