[MJX] When using jax.sharding over batch axis, getting different results than on single device #1454
Replies: 3 comments
-
Hey @areiner222, can you send a minimal repro? I haven't used jax sharding, so I haven't encountered this error before.
-
Hey @btaba, I don't have a simple repro on hand right now, but I was able to fix the NaN issue. When initializing mjx.Data, I was originally calling make_data + forward with some initial qpos/qvel inside a single vmap along the sharded batch dimension of the mjx.Model. Splitting make_data and forward into two separate vmap calls seems to have fixed the NaNs. I still see small divergences in state versus single-device simulation, which start very small and accumulate over time (but remain minimal). Happy to close for now since the issue is resolved on my end, and happy to engage again if this comes up for anyone else.
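For readers landing here later, a sketch of the split described above. The MJX/JAX calls are replaced by stand-ins (`make_data`, `forward`, `batched` are placeholders, not the real `mjx.make_data`, `mjx.forward`, or `jax.vmap`) so the structure is runnable without MuJoCo installed; the shape of the fix is what matters:

```python
# Stand-in for mjx.make_data: builds fresh state from a model spec.
def make_data(model):
    return {"qpos": list(model["qpos0"]), "qvel": [0.0] * model["nv"]}

# Stand-in for mjx.forward: computes derived quantities from state.
def forward(model, data):
    data = dict(data)
    data["qacc"] = [0.0] * model["nv"]
    return data

# Stand-in for jax.vmap over the batch (axis 0) of models/datas.
def batched(fn):
    def wrapper(models, *rest):
        if rest:
            return [fn(m, d) for m, d in zip(models, *rest)]
        return [fn(m) for m in models]
    return wrapper

models = [{"qpos0": [0.0, 0.1], "nv": 2} for _ in range(4)]

# Problematic pattern: make_data + forward fused in ONE vmapped call.
def init_fused(model):
    return forward(model, make_data(model))

# Workaround from this thread: TWO separate vmapped calls instead.
datas = batched(make_data)(models)
datas = batched(forward)(models, datas)
```

In the real code, each `batched(...)` call would be a separate `jax.vmap` over the sharded batch axis of the `mjx.Model`.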
-
Thanks @areiner222, glad you found a workaround. In the past I have seen numerical precision differences depending on how ops get called/compiled. You may want to experiment with this flag.
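The precision point can be seen without JAX at all: IEEE 754 addition is not associative, so any change in how the compiler orders or fuses ops can shift a result in its last bits, and stepping a simulator compounds that shift over time. A minimal illustration:

```python
# Two mathematically identical sums evaluated in different orders.
a = (0.1 + 0.2) + 0.3   # one evaluation order
b = 0.1 + (0.2 + 0.3)   # another order a compiler might pick

print(a == b)        # False: they differ in the last bit
print(abs(a - b))    # ~1e-16 here; repeated stepping accumulates this
```

A sharded/compiled program can legitimately pick a different order per device than the single-device program does, which would match the tiny-but-growing divergence described above.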
-
Hi,
Has anyone successfully used the jax sharding API along a batch dimension for mjx? When I batch my mjx.Model, I get slightly different outputs when I run mjx.forward and subsequently step forward. With no ctrl, the qpos results differ slightly and diverge over the course of stepping. When I use ctrl (position actuators), I get NaN values in qfrc_actuator and therefore NaNs in qpos.
Happy to provide further context (I don't have a repro on hand), but was wondering if anyone has encountered this already?
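A side note on the NaN symptom (not from the thread, just a plain-Python illustration with made-up variable names): once one quantity goes NaN, ordinary arithmetic propagates it into everything downstream, so the interesting question is where the *first* NaN appears, not where it shows up:

```python
import math

qfrc_actuator = float("nan")      # stand-in for the first bad value
qacc = qfrc_actuator / 1.0        # any arithmetic preserves the NaN
qpos = 0.5 + 0.002 * qacc         # the integration step is now NaN too

print(math.isnan(qpos))           # True
```

In JAX specifically, enabling the `jax_debug_nans` config option can help pinpoint the op that first produces a NaN.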
Thanks!