Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Reduce BM/BN/BK from 64/32/64 to 48/12/48
We initially chose 64/32/64 to make batch processing faster on an NVIDIA A100. However, when the code was run on a $300 AMD Radeon RX 6800, it destroyed performance and slowed LLaVA image processing down by 10x, possibly because this card has a small L1 cache or very few registers per thread.

This change is meant as a stopgap. It causes a modest slowdown for batched operations on more expensive graphics cards in exchange for making cheaper graphics cards usable. Until there is a better way to determine the optimal tile sizes at runtime, anyone seriously interested in performance should consider cuBLAS/hipBLAS.

This change also fixes the tinyblas header so builds work on all systems.
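BM, BN and BK are the tile (block) sizes of the matrix-multiply kernel along the M, N and K dimensions. The sketch below is a plain-Python CPU analogue meant only to show what those three parameters control; it is not the tinyBLAS GPU kernel. The helpers `tiled_matmul` and `tile_footprint` are hypothetical, and the footprint formula assumes fp16 operands with one BM x BK slice of A and one BK x BN slice of B held on-chip at the same time.

```python
def tiled_matmul(A, B, BM, BN, BK):
    """Compute C = A @ B for row-major lists of lists.

    C is produced in BM x BN output tiles; for each tile, the shared K
    dimension is walked in chunks of BK (hypothetical CPU analogue of a
    GPU tiled GEMM, not the tinyBLAS implementation).
    """
    M, K = len(A), len(A[0])
    N = len(B[0])
    assert len(B) == K, "inner dimensions must match"
    C = [[0.0] * N for _ in range(M)]
    for i0 in range(0, M, BM):          # rows of the output tile
        for j0 in range(0, N, BN):      # columns of the output tile
            for k0 in range(0, K, BK):  # step through K one BK chunk at a time
                # min() clamps edge tiles when sizes don't divide evenly
                for i in range(i0, min(i0 + BM, M)):
                    for k in range(k0, min(k0 + BK, K)):
                        a = A[i][k]
                        for j in range(j0, min(j0 + BN, N)):
                            C[i][j] += a * B[k][j]
    return C


def tile_footprint(BM, BN, BK, bytes_per_elem=2):
    """Bytes needed to stage one BM x BK slice of A plus one BK x BN slice
    of B at once (assumes 2-byte fp16 elements by default)."""
    return (BM * BK + BK * BN) * bytes_per_elem


if __name__ == "__main__":
    for cfg in [(64, 32, 64), (48, 12, 48)]:
        print(cfg, tile_footprint(*cfg), "bytes per staged tile pair")
```

Under these assumptions, the old 64/32/64 configuration stages 12,288 bytes per step versus 5,760 bytes for 48/12/48 (about 2.1x less), and each output tile holds 2,048 accumulators versus 576. That illustrates, but does not prove, why smaller tiles could help a card with less L1 cache or fewer registers per thread, at the cost of less data reuse on larger GPUs.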
Showing 3 changed files with 211 additions and 179 deletions.