
gpu_blas_lt_gemm_runner #2766

Open
wants to merge 4 commits into
base: r2.13-rocm-enhanced-upate-llvm

Conversation

ScXfjiang

No description provided.

@ScXfjiang ScXfjiang marked this pull request as ready for review November 20, 2024 23:37
@i-chaochen

i-chaochen commented Nov 22, 2024

@ScXfjiang
Author

We also have TF_USE_CUBLASLT to enable hipblaslt: https://github.com/ROCm/tensorflow-upstream/blob/r2.13-rocm-enhanced-hipblaslt/tensorflow/core/kernels/gpu_utils.cc#L98-L107

and it seems to be CUDA-only: https://github.com/ROCm/tensorflow-upstream/blob/r2.13-rocm-enhanced-hipblaslt_gpu_blas_lt_runner/tensorflow/core/kernels/matmul_op_impl.h#L411-L413

I'm thinking maybe we should unify these flags into a single one for the non-XLA path. @ScXfjiang @pemeliya

In general I prefer flags in debug_options_flags.cc to env vars; the latter are hard to manage. What do you think? @pemeliya

@i-chaochen

debug_options_flags.cc is for XLA-specific stuff.

If this is TF and non-XLA, putting these flags there would be messier.

@ScXfjiang ScXfjiang changed the base branch from r2.13-rocm-enhanced-hipblaslt to r2.13-rocm-enhanced-upate-llvm November 25, 2024 13:40

3 participants