Conversation

@zhang-hui-yulo

This extends MMF_ROWS_PER_BLOCK in mmf beyond warp_size; MMF_ROWS_PER_BLOCK itself is kept at the old value, since I have not done any performance tuning.

Tested with MMF_ROWS_PER_BLOCK = 64 on my 3080; there is not enough shared memory when MMF_ROWS_PER_BLOCK = 128.
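
For context, a minimal sketch of the back-of-the-envelope check involved is below. The tile width, fp16 element size, and double buffering are illustrative assumptions, not the actual mmf kernel layout; only the device-property queries are real CUDA API.

```cpp
// Minimal sketch (not the actual mmf kernel): query the device's shared-memory
// limits and estimate how a tile's footprint scales with rows per block.
// cols_per_tile, fp16 elements, and double buffering are assumed values.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop{};
    cudaGetDeviceProperties(&prop, 0);
    printf("sharedMemPerBlock      : %zu bytes\n", prop.sharedMemPerBlock);
    printf("sharedMemPerBlockOptin : %zu bytes\n", prop.sharedMemPerBlockOptin);

    const int cols_per_tile = 256; // assumed K-tile width
    for (int rows_per_block = 32; rows_per_block <= 128; rows_per_block *= 2) {
        // rows x cols fp16 tile, double-buffered (all assumptions)
        const size_t bytes = (size_t) rows_per_block * cols_per_tile * 2 /*fp16*/ * 2 /*buffers*/;
        printf("rows_per_block = %3d -> ~%zu KiB (%s opt-in limit)\n",
               rows_per_block, bytes / 1024,
               bytes <= prop.sharedMemPerBlockOptin ? "within" : "over");
    }
    return 0;
}
```

Since the footprint grows linearly with rows per block, under these assumed numbers 64 rows fit within the ~99 KiB opt-in limit of a CC 8.6 device like the 3080, while 128 rows do not, matching the behavior observed above.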

zhang-hui-yulo changed the title from "extended MMF_ROWS_PER_BLOCK" to "cuda: extended MMF_ROWS_PER_BLOCK" on Nov 6, 2025
github-actions bot added the labels "Nvidia GPU" (Issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) on Nov 6, 2025
@am17an
Collaborator

am17an commented Nov 6, 2025

You can use this PR as a base for other PRs, but as such there is no use for it in its current form, right? I.e., this path is not exercised?

@JohannesGaessler
Collaborator

Please keep this in the PR for WMMA support; I cannot evaluate these changes in a vacuum.

@zhang-hui-yulo
Author

> You can use this PR as a base for other PRs, but as such there is no use for it in its current form, right? I.e., this path is not exercised?

Yep, the path is not exercised unless you set MMF_ROWS_PER_BLOCK to 64. Performance in test-backend-ops might increase or decrease due to the different shape, so I just kept the old value of 32.
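
To illustrate, exercising the new path is just a matter of changing the compile-time constant; the invariant it has to satisfy is sketched below. The names mirror this discussion, and the actual ggml-cuda source may differ.

```cpp
// Sketch of the relaxed invariant: MMF_ROWS_PER_BLOCK used to equal warp_size;
// it may now be any whole multiple of it. Set it to 64 to exercise the new path.
// Names mirror the discussion; the actual ggml-cuda source may differ.
constexpr int warp_size = 32;
constexpr int MMF_ROWS_PER_BLOCK = 64; // default remains 32, i.e. one warp

static_assert(MMF_ROWS_PER_BLOCK % warp_size == 0,
              "MMF_ROWS_PER_BLOCK must be a whole number of warps");
```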
