Are differences in forward results and gradients normal when running with vs. without CUDA graph acceleration? #397
-
I notice very small differences in the forward results when running with CUDA graph acceleration versus without it. Is this normal? The gradients from the CUDA graph run actually seem better to me. Is it possible that the forward pass is more precise this way, perhaps because the graph reduces data transfers and therefore some "errors"?
Replies: 2 comments 1 reply
-
Hi @Jianghanxiao, there shouldn't be numerical differences between running with and without CUDA graphs. What might be happening is that some input array (or gradient) differs or isn't being reset between launches. It's hard to diagnose without more info. Are you able to share a small repro? Thanks!
-
Thanks for the answer! Hmm, it's a bit hard to write a small demo for this, but I suspect the small error comes from the atomic_add calls in my physics simulation. Even after 600 substeps, the largest error is just 2e-7, so it is very small and may simply be related to a different ordering of the atomic_add operations. It may also be nondeterministic even when running the same setting twice.
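To illustrate the ordering hypothesis: floating-point addition is not associative, so accumulating the same contributions in a different order can change the last bits of the result. Below is a minimal pure-Python sketch. It does not use Warp or a GPU, and the values and the helper name `accumulate` are made up for illustration.

```python
import math
import random

def accumulate(values):
    """Add values one at a time, in the order given, like a sequence of atomic adds."""
    total = 0.0
    for v in values:
        total += v
    return total

# Same three numbers, two groupings, two different results.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False

# Same contributions, two accumulation orders.
rng = random.Random(0)
contributions = [rng.uniform(-1.0, 1.0) for _ in range(10_000)]
shuffled = contributions[:]
rng.shuffle(shuffled)

diff = abs(accumulate(contributions) - accumulate(shuffled))
print(f"difference between orders: {diff:.3e}")

# math.fsum returns a correctly rounded sum, so it does not depend on order.
print(math.fsum(contributions) == math.fsum(shuffled))  # True
```

Any difference printed here comes purely from the order of the additions, which is the same effect that unordered atomic adds would have on a GPU.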
Indeed, the small numerical differences could be caused by different ordering of operations. If running with the same setting (with/without graph) multiple times also produces such differences, then the issue is probably unrelated to CUDA graphs.
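One way to run that check is to execute the same setting twice and compare the outputs, then compare across settings. This is a minimal sketch in plain Python. `run_simulation` is a hypothetical stand-in for your own simulation (here it only shuffles the accumulation order to imitate unordered atomic adds), and `max_abs_diff` is an illustrative helper, not a Warp API.

```python
import random

def max_abs_diff(a, b):
    """Largest elementwise absolute difference between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)

def run_simulation(use_graph, seed):
    """Hypothetical stand-in: replace with code that runs the real simulation
    and returns its outputs (or gradients) as a list of floats.
    `use_graph` is ignored here; the seed only changes the accumulation order."""
    contributions = [((i * 37) % 101) / 101.0 - 0.5 for i in range(5_000)]
    random.Random(seed).shuffle(contributions)
    total = 0.0
    for c in contributions:
        total += c
    return [total]

# Baseline: same setting, run twice.
same_setting = max_abs_diff(run_simulation(False, seed=1), run_simulation(False, seed=2))
# Comparison: with vs. without graph capture.
across_settings = max_abs_diff(run_simulation(True, seed=3), run_simulation(False, seed=4))

print(f"run-to-run (same setting): {same_setting:.3e}")
print(f"graph vs. no graph:        {across_settings:.3e}")
```

If the two numbers are of similar magnitude, the variation is more likely run-to-run noise than something introduced by CUDA graphs.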