Architecture-specific configuration #112

See https://docs.nvidia.com/cuda/cutile-python/performance.html

<img width="646" height="573" alt="Image" src="https://github.com/user-attachments/assets/143672e6-52b6-4ee8-9782-96230909d0e6" />

x-ref https://github.com/JuliaGPU/cuTile.jl/pull/111#issuecomment-4040525522

IIRC, the optimization hints (#25, e.g. `latency` and `allow_tma`) should also support this, but I don't know whether per-architecture hints are actually used anywhere.
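To make the request concrete, here is a rough sketch of what "architecture-specific" values for hints could look like: a value carries a default plus per-architecture overrides, and gets resolved once the target is known. Everything here is hypothetical and written in plain Python for illustration: `PerArch`, `resolve_hints`, the `sm_*` keys, and the numbers are assumptions, not the cuTile Python or cuTile.jl API. Only the hint names `latency` and `allow_tma` come from this issue.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class PerArch:
    """Hypothetical container: a default value plus per-architecture overrides."""
    default: Any
    overrides: Dict[str, Any] = field(default_factory=dict)

    def resolve(self, arch: str) -> Any:
        # Use the override for this architecture if present, else the default.
        return self.overrides.get(arch, self.default)


def resolve_hints(hints: Dict[str, Any], arch: str) -> Dict[str, Any]:
    """Resolve every hint for one target; plain values pass through unchanged."""
    return {
        name: value.resolve(arch) if isinstance(value, PerArch) else value
        for name, value in hints.items()
    }


# Illustrative values only, not tuned for any real hardware.
hints = {
    "latency": PerArch(default=2, overrides={"sm_100": 4}),
    "allow_tma": True,
}

print(resolve_hints(hints, "sm_100"))
print(resolve_hints(hints, "sm_90"))
```

The point of the sketch is only that the same resolution step could apply to kernel-level configuration and to per-operation hints alike, so both could share one mechanism.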
