* minor changes to kernel arg maps
* add more capture replay controls
* simplify capture replay controls
* move image metadata capturing
* fix capture replay scripts
* fix CL_PROGRAM_BINARIES query
* verified image capture and playback is working
* fix copyright date after rebase
* fix docs and tidy up a few more things
* remove stale comment
* disable logging in several cases when capture is skipped
These were a little too verbose in common cases.
* move buffer and image dumping for replay back into replay directory
docs/capture_single_kernels.md
Lines changed: 33 additions & 24 deletions
@@ -17,52 +17,61 @@ To replay the captured kernels, you will need the following Python packages:
 
 ## Step by Step for Automatic Capturing
 
-* Set one of the two controls:
-    * `DumpReplayKernelName`, if you want to capture a kernel by its name.
-    * `DumpReplayKernelEnqueue`, if you want to capture a kernel by its enqueue number.
-* Then, simply run the program as usual!
-    * Example on Linux: `CLI_DumpReplayKernelName=${NameOfKernel} cliloader /path/to/executable`
+1. Set the top-level control to enable kernel capturing and replay: `CaptureReplay`
+2. Set any additional controls to capture a specific range of kernels, or specific kernel names. For example:
+    * `CaptureReplayMinEnqueue` and `CaptureReplayMaxEnqueue`, to capture a specific range of kernel enqueues.
+    * `CaptureReplayKernelName`, to capture a specific kernel name.
+    * `CaptureReplayUniqueKernels`, to capture only unique kernel and dispatch parameter combinations.
+    * `CaptureReplayNumKernelEnqueuesSkip`, to skip initial captures.
+    * `CaptureReplayNumKernelEnqueuesCapture`, to capture a limited number of kernel enqueues.
+3. Then, simply run the program as usual!
+
+For more details, please see the Capture and Replay Controls section in the [controls](controls.md) documentation.
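The new steps could be scripted roughly as follows. This is only a sketch: it assumes the new controls can be set through environment variables with the same `CLI_` prefix that the removed `CLI_DumpReplayKernelName` example used, and the helper names `build_capture_env` and `run_with_capture` are hypothetical, not part of the repository.

```python
import os
import subprocess


def build_capture_env(base_env, kernel_name=None, min_enqueue=None, max_enqueue=None):
    """Return a copy of base_env with capture and replay controls added.

    Assumption: each control is read from an environment variable named
    "CLI_" + control name, as in the older CLI_DumpReplayKernelName example.
    """
    env = dict(base_env)
    env["CLI_CaptureReplay"] = "1"  # step 1: top-level control
    # Step 2: optional controls that narrow down what is captured.
    if kernel_name is not None:
        env["CLI_CaptureReplayKernelName"] = kernel_name
    if min_enqueue is not None:
        env["CLI_CaptureReplayMinEnqueue"] = str(min_enqueue)
    if max_enqueue is not None:
        env["CLI_CaptureReplayMaxEnqueue"] = str(max_enqueue)
    return env


def run_with_capture(command, **controls):
    """Step 3: run the program as usual, e.g. ["cliloader", "/path/to/executable"]."""
    env = build_capture_env(os.environ, **controls)
    return subprocess.run(command, env=env, check=False)
```

For example, `run_with_capture(["cliloader", "/path/to/executable"], kernel_name="my_kernel")` would launch the program with capturing restricted to a kernel named `my_kernel` (a placeholder name).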
 
 ## Step by Step for Automatic Capturing and Validation
 
-* Copy the [capture_and_validate.py](../scripts/capture_and_validate.py) script to the place where you run the app from.
-    * Not strictly necessary, but makes life easier.
-* Run this script with the following arguments:
-    - One of `--num EnqueueNumberToBeCaptured` or `--name NameOfKernelToBeCaptured`
-    - `-cli "/path/to/cliloader"`
-    - `--p "/path/to/program"`
-    - `--a ArgsForProgram`
+Use the [capture_and_validate.py](../scripts/capture_and_validate.py) script to capture a workload and validate that the replayed results match.
+
+Arguments for the capture and validate script are:
 
-Please make sure to follow this order of arguments!
+* `-c` or `--cliloader`: Path to `cliloader`. This can be a full path, or a relative path, or just `cliloader` if `cliloader` is already in the system path.
+* `-p` or `--program`: The command to execute the program to capture.
+* `-a` or `--args`: Any optional arguments to pass to the program to capture.
+* Either one of:
+    * `-k` or `--kernel_name`: The kernel name to capture.
+    * `-n` or `--enqueue_number`: The enqueue number that should be captured.
 
-This will then run the program using `cliloader` with the given arguments, capture the the specified kernel, and verify that the buffers calculated by the standalone replay agree with the buffers calculated by the original program.
+The capture and validate script will then run the program using `cliloader` with the given arguments to capture the specified kernel or enqueue number.
+The script will then verify that the buffers calculated by the standalone replay agree with the buffers calculated by the original program.
 If the buffers don't agree, it will show a message in the terminal.
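As an illustration of the new argument list, the command line for the script might be assembled like this. The flag names come from the documentation above; the helper name `build_validate_command`, the rule that exactly one of the kernel name or enqueue number is required, and the way program arguments are appended after `--args` are assumptions of this sketch and have not been checked against the script's argument parser.

```python
import sys


def build_validate_command(script, cliloader, program,
                           kernel_name=None, enqueue_number=None,
                           program_args=None):
    """Build an argv list for capture_and_validate.py (hypothetical helper)."""
    # "Either one of": this sketch insists on exactly one selector.
    if (kernel_name is None) == (enqueue_number is None):
        raise ValueError("specify exactly one of kernel_name or enqueue_number")

    cmd = [sys.executable, script, "--cliloader", cliloader, "--program", program]
    if kernel_name is not None:
        cmd += ["--kernel_name", kernel_name]
    else:
        cmd += ["--enqueue_number", str(enqueue_number)]
    if program_args:
        # Assumption: program arguments follow --args as separate tokens.
        cmd += ["--args", *program_args]
    return cmd
```

The resulting list can be passed to `subprocess.run`. Since the old requirement to follow a fixed argument order was removed, the order used here is just one reasonable choice.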
 
 ## Supported Features
 
 * OpenCL Buffers
     * These may be aliased, then only one buffer is used.
         * Only true if the buffers use the same memory address, so not when using sub-buffers and having offsets.
-    * `__local` kernel arguments, i.e. those set by `clSetKernelArg(kernel, arg_index, local_size, nullptr)`.
+    * `__local` kernel arguments, i.e. those set by `clSetKernelArg(kernel, arg_index, local_size, NULL)`.
     * Device only buffers, i.e. those with `CL_MEM_HOST_NO_ACCESS`. When kernel capture is enabled, any device-only access flags are removed.
 * OpenCL Images
+    * 2D, and 3D images are supported.
 * OpenCL Samplers
-* Build/replay from source
-* Build/replay from a device binary
+* OpenCL Kernels from source or IL
+* OpenCL Kernels from device binary
 
 ## Limitations (incomplete)
 
-* Does not work with OpenCL pipes
-* Untested for out-of-order queues
-* Sub-buffers are not dealt with explicitly, this may affect the results for both debugging and performance
-* The capture and validate script doesn't work with GUI apps
+* Does not work with OpenCL SVM or USM.
+* Does not work with OpenCL pipes.
+* Untested for out-of-order queues.
+* Sub-buffers are not dealt with explicitly, this may affect the results for both debugging and performance.
+* The capture and validate script may not work with some GUI apps.
 
 ## Advice
 
-* Use the following environment variables for `pyopencl`: `PYOPENCL_NO_CACHE=1` and `PYOPENCL_COMPILER_OUTPUT=1`
-* Minimize usage of other controls, to prevent unexpected behavior.
+* Use the following environment variables for `pyopencl`: `PYOPENCL_NO_CACHE=1` and `PYOPENCL_COMPILER_OUTPUT=1`.
+* Minimize usage of other controls, to prevent unexpected behavior, however:
     * Consider enabling `InitializeBuffers` for more predictable results between runs.
-    * Only set one of `DumpReplayKernelName` and `DumpReplayKernelEnqueue`.
+    * When executing the capture and validate script consider removing any other kernel captures, or verifying that the validate script is using the correct capture.
 * Always make sure to check if your results make sense.
 * For some apps using `cliloader` doesn't work properly. If this happens for your application, please try other [install](install.md) options.
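The `pyopencl` advice can also be applied from inside a Python replay script instead of the shell. A minimal sketch, assuming the variables are set before `pyopencl` builds any program (setting them before the import is the conservative choice); the helper name `configure_pyopencl_env` is hypothetical:

```python
import os


def configure_pyopencl_env(env=None):
    """Set the two pyopencl variables recommended in the Advice section."""
    env = os.environ if env is None else env
    env["PYOPENCL_NO_CACHE"] = "1"          # do not reuse cached program binaries
    env["PYOPENCL_COMPILER_OUTPUT"] = "1"   # show compiler output
    return env


configure_pyopencl_env()
# import pyopencl as cl  # import only after the variables are set
```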
docs/controls.md
Lines changed: 30 additions & 8 deletions
@@ -477,14 +477,6 @@ If set to a nonzero value, the Intercept Layer for OpenCL Applications will dump
 
 If set to a nonzero value, the Intercept Layer for OpenCL Applications will dump kernel ISA binaries for every kernel, if supported. Currently, kernel ISA binaries are only supported for Intel GPU devices. Kernel ISA binaries can be decoded into ISA text with a disassembler. The filename will have the form "CLI\_\<Program Number\>\_\<Unique Program Hash Code\>\_\<Compile Count\>\_\<Unique Build Options Hash Code\>\_\<Device Type\>\_\<Kernel Name\>.isabin".
 
-##### `DumpReplayKernelEnqueue` (int)
-
-If set to a positive value, the Intercept Layer for OpenCL Applications will dump in /Replay/Enqueue\_*/ a standalone (i.e. runs completely independent from the original program from which is was captured) playable set of files for the specified enqueue number which can be used for debugging or profiling. When a program was build from source code, it will dump that one, otherwise it will dump the device binary. It is advised to not use this setting directly, but use /scripts/capture\_and\_validate.py.
-
-##### `DumpReplayKernelName` (string)
-
-If set, the Intercept Layer for OpenCL Applications for dump the specified kernel the first time it is encountered so that it can be replayed independently. It is advised to not use this setting directly, but use /scripts/capture\_and\_validate.py
-
 ### Controls for Emulating Features
 
 ##### `Emulate_cl_khr_extended_versioning` (bool)
@@ -613,6 +605,36 @@ If set to a nonzero value, the Intercept Layer for OpenCL Applications will try
 
 If set to a nonzero value, the Intercept Layer for OpenCL Applications will try to automatically partition parent devices into sub-devices with the specified number of compute units.
 
+### Capture and Replay Controls
+
+##### `CaptureReplay` (bool)
+
+This is the top-level control for kernel capture and replay.
+
+##### `CaptureReplayMinEnqueue` (cl_uint)
+
+The Intercept Layer for OpenCL Applications will only enable kernel capture and replay when the enqueue counter is greater than or equal to this value.
+
+##### `CaptureReplayMaxEnqueue` (cl_uint)
+
+The Intercept Layer for OpenCL Applications will stop kernel capture and replay when the enqueue counter is greater than this value, meaning that only enqueues less than or equal to this value will be captured.
+
+##### `CaptureReplayKernelName` (string)
+
+If set, the Intercept Layer for OpenCL Applications will only enable kernel capture and replay when the kernel name equals this name.
+
+##### `CaptureReplayUniqueKernels` (bool)
+
+If set, the Intercept Layer for OpenCL Applications will only enable kernel capture and replay if the kernel signature (i.e. hash + kernel name) has not been seen already.
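How these controls combine can be modelled with a small predicate. This is a reading of the documentation text above, not the Intercept Layer's actual implementation: the class name `CaptureFilter`, its field names, and the default values (capture disabled, enqueue range covering all `cl_uint` values) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple

CL_UINT_MAX = 2**32 - 1  # assumed default upper bound for a cl_uint control


@dataclass
class CaptureFilter:
    """Hypothetical model of the capture and replay controls."""
    enabled: bool = False                # CaptureReplay
    min_enqueue: int = 0                 # CaptureReplayMinEnqueue (inclusive)
    max_enqueue: int = CL_UINT_MAX       # CaptureReplayMaxEnqueue (inclusive)
    kernel_name: Optional[str] = None    # CaptureReplayKernelName
    unique_kernels: bool = False         # CaptureReplayUniqueKernels
    _seen: Set[Tuple[str, str]] = field(default_factory=set, repr=False)

    def should_capture(self, enqueue_counter: int, kernel_name: str,
                       kernel_hash: str) -> bool:
        if not self.enabled:
            return False
        # Both enqueue bounds are described as inclusive.
        if not (self.min_enqueue <= enqueue_counter <= self.max_enqueue):
            return False
        if self.kernel_name is not None and kernel_name != self.kernel_name:
            return False
        if self.unique_kernels:
            signature = (kernel_hash, kernel_name)  # "hash + kernel name"
            if signature in self._seen:
                return False
            self._seen.add(signature)
        return True
```

With `CaptureFilter(enabled=True, min_enqueue=2, max_enqueue=4)`, for instance, enqueues 2, 3 and 4 pass the range check while 1 and 5 do not.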