CodeXLGpuProfiler hangs with -C options #1
Comments
Hi Li, Would it be possible for you to share your application so we can investigate why it hangs when collecting performance counters? Thanks,
Hi Chris, Here is the application I run. As the README states, only the hsaco version hangs. The README has more information about this benchmark. Thanks,
Hi Li, Do you have the most recent version of cloc? When I try to build your sample, I get warnings/errors that I need to use the cl_khr_fp64 extension. I have to execute:
However, when I do that I get the following errors:
Hi Chris, I use cloc 1.0.10. I get this warning too, but if I ignore the pragma warning the build still passes the CPU-side verification. This pragma appears in several OpenCL kernels that use the double type, and none of the benchmarks I run fail the CPU-side verification. I try the command [EDIT] After upgrading the ROCm toolchain, I tried compiling the case with cloc 1.0.11, and it cannot be compiled.
Hi Li, I will talk to the folks who work on cloc to see if they can help here. In the meantime, if you find a workaround for the cloc compiler problems, please let me know, so I can look at the profiler issue.
Hi Chris, I tried CLOC 1.0.13, but in vain. Would it be viable to use CLOC 1.0.10 to investigate this issue? I guess the latest compiler toolchain fails to compile this case because of the decision to switch to code object v2 and use ld.lld as the linker. The previous toolchain used amdphdrs (now obsolete, as mentioned in another repo, LLVM-AMDGPU-Assembler-Extra).
Thanks Li, I'll see if I can track down 1.0.10 to try out. In the meantime, would it be possible for you to share your executable (and any dependencies it might need at runtime)? I may be able to investigate the profiler issue if I have the executable.
Hi Chris, Thanks for your reply. Here is the executable I run. The README and Environment have more information. In short,
Hi Li,
There is a new version of cloc available on the CLOC GitHub repo (1.0.14). This version added a new switch to cloc.sh (-noshared) that allows the hsaco version of the kernel to be built. If I update your makefile to include the -noshared switch on the cloc.sh command line, I can build your application.
However, when I run the application I get a warning displayed at the terminal:
Warning: sizeof(KernelArg) => 48 kernarg_segment_size => 40
Also, if I manually run the application using "./MonteCarloAsian-hsaco -deseriealize MonteCarloAsianDO_Kernels_hsaco.hsaco" instead of the "make run_hsaco" command, then many times I see a SEGFAULT when "rand()" is called from line 659. I have also seen the SEGFAULT reported in a different location. I haven't looked into the source code, but it feels like there might be some uninitialized data somewhere.
So you see a SEGFAULT when running the following command: "./MonteCarloAsian-hsaco -deseriealize MonteCarloAsianDO_Kernels_hsaco.hsaco"
In cases where I don't see the segfault, I am able to collect performance counters without a problem.
Thanks,
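One way a warning such as "sizeof(KernelArg) => 48 kernarg_segment_size => 40" can arise is when the host-side argument struct and the kernel's argument layout disagree on field widths or padding. The following is only a sketch of that general mechanism using Python's ctypes; the struct names, field names, and field types are all hypothetical and are not taken from the MonteCarloAsian source, so it does not claim to explain this particular warning.

```python
import ctypes


# Hypothetical host-side argument block (names and types are assumptions).
class HostKernelArg(ctypes.Structure):
    _fields_ = [
        ("buf0", ctypes.c_uint64),    # device pointers stored as 64-bit values
        ("buf1", ctypes.c_uint64),
        ("buf2", ctypes.c_uint64),
        ("buf3", ctypes.c_uint64),
        ("count", ctypes.c_int32),    # 4 bytes, then padding up to an 8-byte boundary
        ("stride", ctypes.c_uint64),  # 8-byte field declared wider than the kernel expects
    ]


# Hypothetical layout the kernel itself expects (again an assumption).
class KernelSideArg(ctypes.Structure):
    _fields_ = [
        ("buf0", ctypes.c_uint64),
        ("buf1", ctypes.c_uint64),
        ("buf2", ctypes.c_uint64),
        ("buf3", ctypes.c_uint64),
        ("count", ctypes.c_int32),
        ("stride", ctypes.c_int32),   # 4-byte field, so no padding is needed
    ]


def size_mismatch(host_type, kernarg_segment_size):
    """Return how many bytes the host struct exceeds the kernel's segment size by."""
    return ctypes.sizeof(host_type) - kernarg_segment_size


if __name__ == "__main__":
    print("host struct size:  ", ctypes.sizeof(HostKernelArg))
    print("kernel layout size:", ctypes.sizeof(KernelSideArg))
    print("mismatch in bytes: ", size_mismatch(HostKernelArg, ctypes.sizeof(KernelSideArg)))
```

If the real host struct and kernel signature differ in a similar way, comparing the field offsets on both sides would be a reasonable first check before looking at the profiler.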
Hi Chris, I think the host side program is OK. The So, may I ask that When it comes to the new CLOC toolchain, the program receives a SEGFAULT after it does the first kernel launch, while it prepares the random data for the second kernel. That is, the program launches the kernel and then catches the SEGFAULT a short time later, no matter what it is doing. (In this case, it happens to prepare the random data by calling The only thing that changed is the way the hsaco is generated (CLOC 1.0.10, 1.0.11, 1.0.14). I am still trying to work around the issue by modifying the kernel code so that the case can be compiled by the latest CLOC 1.0.14 and executed on the machine. Thanks,
Hi Chris, I investigated the benchmark and (finally) found at least one builtin math function (there may be other issues that I haven't found) that gives me a wrong answer when I use Thanks,
Hi developer,
My environment is Ubuntu 14.04.4 with the ROCm platform on a Carrizo APU.
Recently, I used CodeXLGpuProfiler to profile a program that finishes in about 14 seconds without profiling. With the -C option, the profiler hangs for over 20 minutes. With the -A option, the profiler finishes the task in about 15 seconds. The atp file is attached.
MonteCarloAsianDP.txt
Thanks,
Li
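For anyone trying to reproduce the hang described above, it can help to run the profiled command under a timeout so that a hung run is reported instead of blocking the terminal. The helper below is a minimal sketch using only Python's standard library; the function name run_with_timeout is hypothetical, and the demo commands are stand-ins, since the full profiler command line is not shown in this thread.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (finished, returncode).

    finished is False (and returncode is None) when the process was
    still running after timeout_s seconds and had to be killed.
    """
    try:
        proc = subprocess.run(
            cmd,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return True, proc.returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception.
        return False, None


if __name__ == "__main__":
    # Stand-in commands; replace with the actual profiler invocation.
    quick = [sys.executable, "-c", "pass"]
    slow = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_timeout(quick, 30))
    print(run_with_timeout(slow, 1))
```

Since the unprofiled run takes about 14 seconds, a timeout of a few minutes would be a generous threshold for deciding that a profiled run has hung.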