
Revisit and maybe optimize Collectors #1069

Open
MischaPanch opened this issue Mar 4, 2024 · 0 comments
Labels: optimization (Performance optimization: throughput, memory, processing speed), tentative (Up to discussion, may be dismissed)

@MischaPanch (Collaborator) commented:

The main assumption Tianshou holds is that batch-style data transfer can cut a lot of overhead: by sending data to the GPU in batches, we improve GPU utilization and hence overall system throughput. That's why the initial version of the collector is batch-style.

This assumption rests on several constraints:

  1. We cannot easily achieve the same throughput by sending data to the GPU sequentially as we can with batching
  2. The model is relatively small and not memory-bound
  3. The environment's step function takes little time (including reward calculation), at least less than a policy forward pass
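Under these constraints, the batch-style pattern looks roughly like the following. This is a minimal sketch with toy stand-ins (`ToyVecEnv`, `toy_policy`, `batch_collect` are illustrative names, not Tianshou's actual Collector API):

```python
import numpy as np

# Toy stand-ins for illustration only; Tianshou's real Collector,
# policy, and vectorized-env interfaces differ.
class ToyVecEnv:
    """Minimal lockstep vectorized env: all sub-envs step together."""
    def __init__(self, num_envs: int, obs_dim: int):
        self.num_envs, self.obs_dim = num_envs, obs_dim

    def reset(self):
        return np.zeros((self.num_envs, self.obs_dim)), {}

    def step(self, actions):
        obs = np.random.randn(self.num_envs, self.obs_dim)
        rew = np.ones(self.num_envs)
        done = np.zeros(self.num_envs, dtype=bool)
        return obs, rew, done, done, {}

def toy_policy(obs_batch):
    # One batched "forward" amortizes per-call overhead (and, on a GPU,
    # the host-to-device transfer) over all environments.
    return obs_batch.sum(axis=1, keepdims=True)

def batch_collect(policy, vec_env, n_steps):
    """Batch-style rollout: one policy call per step for all envs.
    Every step, the whole batch waits for the slowest env, which is
    acceptable while constraint (3) holds."""
    obs, _ = vec_env.reset()
    transitions = []
    for _ in range(n_steps):
        act = policy(obs)
        obs_next, rew, term, trunc, _ = vec_env.step(act)
        transitions.append((obs, act, rew, obs_next))
        obs = obs_next
    return transitions

traj = batch_collect(toy_policy, ToyVecEnv(num_envs=4, obs_dim=3), n_steps=5)
```

The key property is the synchronization barrier each step: every environment and the policy advance in lockstep, which is only efficient while env steps are cheap relative to the policy forward pass.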

These are very strong constraints. If any of them fails, we can switch to a fully async rollout implementation to get better throughput, i.e., a shorter wall-clock `collector.collect` time. For example, in the RLHF case:

  • An LLM's completion function can be implemented in a fully async style and achieve the same throughput as batch completion, as long as you provide enough threads/processes to handle each request. That invalidates (1) and (2);
  • The environment needs a reward model to calculate rewards. In batch style, we must finish all policy sampling first, synchronize, and only then run the reward calculation; the system may become environment-throughput-bound because not enough compute is invested in the reward side. But if policy and reward calculation run fully asynchronously, all those pipeline bubbles disappear. That invalidates (3).

Originally posted by @Trinkle23897 in #1058 (comment)
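The fully-async alternative described in the quoted comment can be sketched with `asyncio`. Here `complete` and `score` are hypothetical stand-ins for an async LLM completion call and an async reward-model call (simulated with sleeps); this is not a real RLHF API:

```python
import asyncio
import random

# Hypothetical async stand-ins: in an RLHF setting, `complete` would
# wrap an LLM completion backend and `score` a reward-model service.
async def complete(prompt: str) -> str:
    await asyncio.sleep(random.uniform(0.0, 0.01))  # simulated latency
    return prompt + " <completion>"

async def score(completion: str) -> float:
    await asyncio.sleep(random.uniform(0.0, 0.01))  # simulated latency
    return float(len(completion))

async def rollout_one(prompt: str):
    # Policy sampling and reward scoring are chained per request, with
    # no global sync barrier: each trajectory continues as soon as its
    # own completion is ready, removing the batch "bubbles".
    completion = await complete(prompt)
    reward = await score(completion)
    return prompt, completion, reward

async def collect(prompts):
    # All rollouts are in flight concurrently; throughput is limited by
    # backend capacity, not by the slowest member of a batch.
    return await asyncio.gather(*(rollout_one(p) for p in prompts))

results = asyncio.run(collect([f"prompt-{i}" for i in range(8)]))
```

Compared to the batch-style loop, no trajectory ever waits for a global synchronization point between policy sampling and reward calculation.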

@MischaPanch MischaPanch added tentative Up to discussion, may be dismissed optimization Performance optimization (throughout, memory, processing speed) labels Mar 4, 2024
@MischaPanch MischaPanch added this to the Release 2.0.0 milestone Mar 20, 2024