R1.15 rocm61 albm add asan hipblaslt #2672

Open
wants to merge 6 commits into
base: r1.15-rocm61-albm-add-asan
from

Conversation

jayfurmanek

No description provided.

pemeliya and others added 6 commits September 16, 2024 13:28
WIP: ensure compilation of amdhipblaslt_plugin

WIP: cublaslt_matmul_thunk builds

WIP: added cublaslt thunk and gemm_rewriter support for cublaslt

fixing gemm_rewriter

added BufferComparator and in-process kernels support

improved buffer comparator: relative_tol

added autotuning + redzone and buffercomparator enabled

build fixes

autotuner fixes

last autotuner & gemm rewriter fixes

WIP: integrating grouped gemm into gpublas-lt

WIP: added grouped gemm to Stream

added setting groupgemm algorithm

CAS patch for rocm

adding workspace to gemm_rewriter and optimizing gpublaslt cache

silence annoying llvm-link warning

disabled grouped gemm again, and fixes on gemm_rewriter (no workspace for cublas yet)

added indicator matmul unit test

make sure gcc driver recognizes warning suppression commands

WIP: alternative impl of indicator mul for ROCM

indicator_matmul_op with hipblas-lt, extended respective op test

fixing/suppressing annoying warnings and simplifying build_whl

GemmAlgorithmPicker: interface changes for non-XLA autotuning

indicator matmul op: cleaned up
added indicator matmul benchmark

oncoming tuning changes + added TF32 env flag

group gemm ongoing changes

enabled autotuning and workspace allocator for blas-lt gemm runner
3 participants