Skip to content

[SYCL][Clang] Add support for device image compression#15124

Merged
againull merged 60 commits into
intel:syclfrom
uditagarwal97:compress_img
Oct 22, 2024
Merged

[SYCL][Clang] Add support for device image compression#15124
againull merged 60 commits into
intel:syclfrom
uditagarwal97:compress_img

Conversation

@uditagarwal97

@uditagarwal97 uditagarwal97 commented Aug 18, 2024

Copy link
Copy Markdown
Contributor

This PR adds support for device image compression for the old offloading model. I'll make another follow-up PR to extend support for the new offload model.

Design summary:

ZSTD (compression algo)   ----> LLVMSupport (Interface)  ------> clang-offload-wrapper (For compression)
 |
 ----------------------------------------------------- --------> SYCL RT (For decompression)

This PR introduces ZSTD (https://github.com/facebook/zstd) as a 3rd party dependency of DPCPP. Similar to upstream LLVM, we expect user to have zstd-dev package installed on their machine - we won't be installing zstd from sources.

How to use
To compress device images, add --offload-compress CLI option to your clang invocation. Note that we compress device images only if the size of device images exceeds a threshold, which is 512 bytes by default. Moreover, by default, we use ZSTD level 10 for compression. ZSTD compression levels provides a tradeoff between (de)compression time and compression ratio, and the compression level can be changed using --offload-compression-level=<int> CLI option.

@uditagarwal97 uditagarwal97 self-assigned this Aug 18, 2024
@uditagarwal97

uditagarwal97 commented Aug 21, 2024

Copy link
Copy Markdown
Contributor Author

Some initial performance stats:

zstd_spirv_smol-v_dataset

Dataset: https://github.com/aras-p/smol-v/tree/master/tests/spirv-dumps
Dataset size: 275 SPIR-V files

Conclusion:
Overall, for SPIR-V files < 50KB, the decompression time is below 0.1ms, compression time <0.15ms, and compression ratio is ~3 (compressed image is 1/3 the original size).
For very small images (<512 bytes), I don't see much benefit of image compression.

Note:- Most of the SPIR-V files I have in the dataset are <50KB. I'm working on extending the performance evaluation to larger workloads. Also, the (de)compression performance will vary with the format of the file being compressed, so for AOT, where device images consists of target assembly, the performance stats might differ.

@jbrodman

Copy link
Copy Markdown
Contributor

What happens with the PTX and AMDGPU targets? Are they covered by the "native" binary image format? Do we need additional formats?

@jbrodman

Copy link
Copy Markdown
Contributor

Also guessing this feature may not make sense when combined with the native cpu device, but need to think more about that.

@uditagarwal97

uditagarwal97 commented Aug 21, 2024

Copy link
Copy Markdown
Contributor Author

What happens with the PTX and AMDGPU targets? Are they covered by the "native" binary image format? Do we need additional formats?

I think they are covered by the "none" binary image format. This is because clang driver (in SYCL offload mode) never specifies the image format in call to clang-offload-wrapper. So, by default, the BinaryImageFormat is "none" and it is upto the SYCL runtime to determine the format (https://github.com/intel/llvm/blob/sycl/sycl/source/detail/device_binary_image.cpp#L170).

I tested my changes with PTX, and they seem to work fine, so, we'd likely not require additional formats.

@uditagarwal97

Copy link
Copy Markdown
Contributor Author

Ping @premanandrao @mdtoguchi @bso-intel

Comment thread sycl/source/detail/compression.hpp
Comment thread sycl/source/detail/device_binary_image.cpp Outdated
Comment thread sycl/test-e2e/Compression/compression_multiple_tu.cpp Outdated
Comment thread sycl/test-e2e/Compression/compression_seperate_compile.cpp
// REQUIRES: zstd, opencl-aot, cpu, linux

////////////////////// Compile device images
// RUN: %clangxx -fsycl -fsycl-targets=spir64_x86_64 -fsycl-host-compiler=clang++ -fsycl-host-compiler-options='-std=c++17 -Wno-attributes -Wno-deprecated-declarations -fPIC -DENABLE_KERNEL1' -DENABLE_KERNEL1 -c %s -o %t_kernel1_aot.o

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless you specifically wanted to test compilation with a 3rd-party host compiler you don't need -fsycl-host-compiler and -fsycl-host-compiler-options flags

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to have a test that mimics the compilation toolchain that PyTorch team use (as described here: https://github.com/intel/torch-xpu-ops/blob/main/cmake/BuildFlags.cmake#L63). They use gcc for final linkage. The problem with using gcc here in E2E test is that we'd have to explicitly provide path to sycl headers and library (See the older version of this test: https://github.com/intel/llvm/blob/1d8181335fb188aa4ae0ad39b3826a4162b200d2/sycl/test-e2e/Compression/compression_seperate_compile.cpp). AFAIK, we don't have LIT substitutions to get path to SYCL headers and library, and so, I ended up using clang++ as "3rd party" compiler, with which I can just use -fsycl to get the include directory and SYCL library.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to have a test that mimics the compilation toolchain that PyTorch team use

Then I think it worth noting that in a comment within the test, or otherwise it seems like an unnecessary overcomplication

Comment thread sycl/test-e2e/Compression/compression_seperate_compile.cpp Outdated
Comment thread sycl/test-e2e/Compression/no_zstd_warning.cpp Outdated
Comment thread sycl/test-e2e/lit.cfg.py
if sp[0] == 0:
config.available_features.add("preview-breaking-changes-supported")

# Check if clang is built with ZSTD and compression support.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a way simpler way. Use lit.site.cfg.py.in to propagate a value of CMake variable into this python script.

I would also explore how LLVM propagates that. There are tests in LLVM which require zstd feature, so I wonder if we can call some LIT helper to get this feature automatically propagated into LIT for us

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline, passing LLVM_ENABLE_ZSTD from CMake to LIT won't work here because E2E tests can be built standalone, like what we do in CI.
LLVM seems to pass CMake Variables to LIT: https://github.com/llvm/llvm-project/blob/main/llvm/test/lit.site.cfg.py.in#L37

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, good point. I still wonder if there is a simpler way (i.e. some existing helper for running an executable and getting its output, but if no, then we will have to leave with what we have.

BTW, I'm sure that compiler is able to read the program from stdin, so you can maybe save on file operations here

Comment thread sycl/test-e2e/Compression/Inputs/single_kernel.cpp
Comment thread clang/lib/Driver/ToolChains/Clang.cpp
Comment thread clang/test/Driver/sycl-offload-wrapper-compression.cpp Outdated

@mdtoguchi mdtoguchi left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK for driver

@uditagarwal97

Copy link
Copy Markdown
Contributor Author

@bso-intel @intel/llvm-reviewers-runtime ping!

Comment on lines +1135 to +1139
llvm::compression::zstd::compress(
ArrayRef<unsigned char>(
(const unsigned char *)(Bin->getBufferStart()),
Bin->getBufferSize()),
CompressedBuffer, OffloadCompressLevel);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the crash occurs when compression failed, the error message that the SYCL end-user will receive may not be useful.
For example, "Failed to create ZSTD_CCtx", ""Failed to set ZSTD_c_compressionLevel", etc.
It would be much better user experience if they get a message like "Device image compression failed." + e.what().

Comment on lines +1117 to +1119
"'--offload-compress' option is specified but zstd "
"is not available. The device image will not be "
"compressed.");

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind of error message is good since the SYCL end-user can understand what went wrong.

@uditagarwal97

Copy link
Copy Markdown
Contributor Author

@bso-intel

When the crash occurs when compression failed, the error message that the SYCL end-user will receive may not be useful.
For example, "Failed to create ZSTD_CCtx", ""Failed to set ZSTD_c_compressionLevel", etc.
It would be much better user experience if they get a message like "Device image compression failed." + e.what().

In 946a738, I've wrapped zstd::compress in try/catch to throw a more meaningful error message. Note that this will only work if DPC++ is built with LLVM_ENABLE_EH.

Comment thread sycl/source/detail/program_manager/program_manager.cpp Outdated
@uditagarwal97

Copy link
Copy Markdown
Contributor Author

@intel/llvm-gatekeepers The PR is ready to be merged. All the downstream infrastructure, along with intel/llvm CI machines, is ready with zstd installed.

@againull

againull commented Oct 22, 2024

Copy link
Copy Markdown
Contributor

@uditagarwal97 Could you please take a look at post-commit failures: https://github.com/intel/llvm/actions/runs/11468699849/job/31915329901


Failed Tests (2):
SYCL :: Compression/compression.cpp
SYCL :: Compression/compression_multiple_tu.cpp

On AMD/HIP

Comment thread buildbot/configure.py
"-DLLVM_ENABLE_PROJECTS={}".format(llvm_enable_projects),
"-DSYCL_BUILD_PI_HIP_PLATFORM={}".format(sycl_build_pi_hip_platform),
"-DLLVM_BUILD_TOOLS=ON",
"-DLLVM_ENABLE_ZSTD=ON",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we shouldn't turn on these by default? We should just turn on them in if args.ci_defaults:

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea behind turning them on by default is to build the compiler with device image compression support if the user has zstd-dev package installed. If zstd is not found, there shouldn't be any build error.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are configurate failures.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CMake Error at lib/Support/CMakeLists.txt:327 (get_property):

get_property could not find TARGET zstd::libzstd_static. Perhaps it has

not yet been created.

CMake Error at lib/Support/CMakeLists.txt:330 (get_property):

get_property could not find TARGET zstd::libzstd_static. Perhaps it has

not yet been created.

@sarnex sarnex Oct 23, 2024

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah local build is cooked for me too, we need to revert the on by zstd default part at least

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR to temporarily turn off zstd by default: #15833

@uditagarwal97

Copy link
Copy Markdown
Contributor Author

@uditagarwal97 Could you please take a look at post-commit failures: https://github.com/intel/llvm/actions/runs/11468699849/job/31915329901

Failed Tests (2): SYCL :: Compression/compression.cpp SYCL :: Compression/compression_multiple_tu.cpp

On AMD/HIP

PR to disable the failing tests on HIP: #15830

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.