Data-Tiling: Migrate round_dims_to to iteration_sizes after encoding specialization is on by default #19897
Labels: codegen, enhancement ➕, good first issue 🌱
The `round_dims_to` field in the encoding was useful for the data-tiling late-materialization path because it provides a hint to both the host and the device. The host side can allocate the storage buffer based on the hint, and the device gets the limit of the padding space (otherwise, the device could access the buffer out of bounds). However, it is not the ideal solution, because the device could request larger tile sizes in some cases (e.g., matvec), which leads to an inefficient strategy. Also, the host could allocate a huge buffer that is not fully used by the device. Sometimes the device just needs a little more storage, rather than every dimension being unconditionally padded to a large size.
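To make the over-allocation concrete, here is an illustrative C++ sketch (not IREE's actual API; the helper names and the 16-element padding multiple are assumptions used only to show the arithmetic) of how a host would size a buffer from such a hint:

```cpp
// Illustrative sketch: how a `round_dims_to`-style hint inflates the
// host-side allocation. With round_dims_to = [16, 16], the 2000x1 RHS of
// a matvec is padded to 2000x16, so the host allocates 16x more storage
// than the device actually needs.
#include <cstdint>
#include <cstdio>
#include <vector>

int64_t roundUp(int64_t size, int64_t multiple) {
  return ((size + multiple - 1) / multiple) * multiple;
}

int64_t paddedNumElements(const std::vector<int64_t> &shape,
                          const std::vector<int64_t> &roundDimsTo) {
  int64_t numElements = 1;
  for (size_t i = 0; i < shape.size(); ++i)
    numElements *= roundUp(shape[i], roundDimsTo[i]);
  return numElements;
}

int main() {
  // Matvec operand of shape 2000x1, padded per the hint.
  std::printf("padded: %lld, actual: %lld\n",
              (long long)paddedNumElements({2000, 1}, {16, 16}),
              (long long)(2000 * 1));  // padded: 32000, actual: 2000
}
```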
Today, we have encoding specialization, which is not yet on by default. The encoding implements the interface methods, so the request can be propagated from the executable target to the host; i.e., the host can allocate the exact storage buffer for the encoded tensor. Once we turn the pass on by default, we no longer need the `round_dims_to` field in the encoding. The next question is what information we want to encode in the encodings. I think the answer is the iteration size of each dimension. On CPU, we can generate more efficient code if we recognize that there is a narrow matrix (e.g., matvec/vecmat/etc.). Today, we abuse the `round_dims_to` field to provide such information (which is bad). If we are going to deprecate the `round_dims_to` field, we'll need to introduce an `iteration_sizes` field to carry the information.

Note: this task depends on the encoding specialization pass. We should implement it after the encoding specialization pass is on by default.
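For illustration, a backend could consume such a field roughly as follows. This is a hypothetical C++ sketch; the names, the narrow-dimension check, and the tile sizes are assumptions for this issue, not the actual implementation:

```cpp
// Hypothetical sketch of how an `iteration_sizes` field could drive tile
// selection: shrink the tile along a narrow dimension (e.g. N == 1 for a
// matvec) instead of padding that dimension up to the full tile size.
#include <array>
#include <cstdint>
#include <cstdio>

constexpr int64_t kDynamic = -1;  // placeholder for an unknown iteration size

struct TileSizes { int64_t m, n, k; };

TileSizes chooseMatmulTiles(std::array<int64_t, 3> iterationSizes) {
  TileSizes tiles{16, 16, 16};
  auto [m, n, k] = iterationSizes;
  if (m != kDynamic && m < tiles.m) tiles.m = m;  // vecmat: M == 1
  if (n != kDynamic && n < tiles.n) tiles.n = n;  // matvec: N == 1
  (void)k;  // K is left untouched in this sketch
  return tiles;
}

int main() {
  TileSizes t = chooseMatmulTiles({2000, 1, 512});  // a matvec
  std::printf("m=%lld n=%lld k=%lld\n",
              (long long)t.m, (long long)t.n, (long long)t.k);  // m=16 n=1 k=16
}
```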