use pin_memory for ModelInput and minor refactoring (#2910)
Summary:
# context
* this is BE work and minor refactoring done while working on pipeline optimization
* the major change is to add a "pin_memory" option to the test_input file for `ModelInput` generation:
```
The `pin_memory()` call for all KJT tensors is important for the training benchmark, and
is also a valid argument in the prod training scenario: TrainModelInput should be created
in pinned memory for a fast transfer to the GPU. For more on pin_memory:
https://pytorch.org/tutorials/intermediate/pinmem_nonblock.html#pin-memory
```
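The idea above can be sketched as follows. This is a minimal illustration, not the torchrec implementation: `make_batch` and `to_device` are hypothetical helper names, and a plain tensor stands in for the KJT tensors.

```python
import torch

def make_batch(batch_size: int = 8, dim: int = 4,
               pin_memory: bool = False) -> torch.Tensor:
    # Stand-in for ModelInput generation: a plain host tensor.
    t = torch.randn(batch_size, dim)
    if pin_memory and torch.cuda.is_available():
        # pin_memory() returns a copy in page-locked host memory,
        # which is what makes a later non_blocking copy truly async.
        t = t.pin_memory()
    return t

def to_device(t: torch.Tensor, device: torch.device) -> torch.Tensor:
    # non_blocking=True only overlaps with compute when the source
    # tensor is pinned; otherwise the copy is effectively synchronous.
    return t.to(device, non_blocking=True)
```

On a CUDA machine, `to_device(make_batch(pin_memory=True), torch.device("cuda"))` lets the host-to-device copy proceed asynchronously on the copy stream.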
* minor refactoring includes:
(1) default parameters for the TrainPipeline benchmark so that the embedding size, batch size, etc. are reasonable.
(2) a fix for the batch index error in the trace, which previously used (curr_index+1).
(3) splitting the `EmbeddingPipelinedForward` __call__ function into two parts.
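The two-part split in (3) can be sketched like this. This is a hedged illustration of the launch/wait pattern, not the actual `EmbeddingPipelinedForward` code; the class and method names here are made up for the example.

```python
class PipelinedForward:
    """Illustrative two-phase forward: launch now, materialize later."""

    def __init__(self, compute_fn):
        self._compute_fn = compute_fn
        self._pending = None

    def launch(self, batch):
        # Phase 1: kick off the (possibly asynchronous) work so the
        # pipeline can interleave it with other stages.
        self._pending = self._compute_fn(batch)

    def wait(self):
        # Phase 2: hand back the result only when it is actually needed.
        out, self._pending = self._pending, None
        return out

    def __call__(self, batch):
        # The original single-call behavior is preserved by composing
        # the two phases back to back.
        self.launch(batch)
        return self.wait()
```

Splitting `__call__` this way lets a pipeline call `launch` for the next batch before calling `wait` on the current one.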
* trace comparison: `pin_memory()` for the ModelInput is critical for a non-blocking CPU-to-GPU data copy
before: copy_batch_to_gpu is the same size in the trace as the GPU data transfer {F1977324224}
after: copy_batch_to_gpu is hardly visible in the trace {F1977324220}
Reviewed By: aporialiao
Differential Revision: D73514639