Draft: Imex support #149

Open. Wants to merge 7 commits into main.

guptaNswati
Contributor

No description provided.

copy-pr-bot bot commented Oct 24, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@guptaNswati guptaNswati marked this pull request as draft October 24, 2024 22:42
@guptaNswati guptaNswati requested a review from klueska October 24, 2024 22:42
# https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/sbsa/nvidia-imex_560.28.03-1_arm64.deb
# Compute the major version in the same RUN step: shell variables do not persist across RUN instructions.
RUN DRIVER_MAJOR_VERSION=$(echo "$DRIVER_VERSION" | cut -d '.' -f 1) && \
if [ "$DRIVER_MAJOR_VERSION" -ge 560 ] && [ "$TARGETARCH" = "arm64" ]; then \
curl -fsSL -o /tmp/nvidia-imex_$DRIVER_VERSION-1_$TARGETARCH.deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/sbsa/nvidia-imex_560.28.03-1_arm64.deb && \
Contributor

This should not be required.

We should be installing the IMEX package this way:

apt-get install nvidia-imex-550=550.127.05-1
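
One way the pinned package name above could be derived from the driver version during the image build is sketched below. It assumes the naming pattern seen in that one example (nvidia-imex-<major>=<version>-1) also holds for other driver branches, which nothing in this thread confirms; imex_package_spec is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: build an apt package spec for IMEX from a driver version.
# Assumption: packages follow the pattern nvidia-imex-<major>=<version>-1,
# inferred only from the single example in the review comment.
imex_package_spec() {
    driver_version="$1"
    # The major version is everything before the first dot (550 for 550.127.05).
    driver_major="${driver_version%%.*}"
    printf 'nvidia-imex-%s=%s-1\n' "$driver_major" "$driver_version"
}

# Example: print the spec for the version mentioned in the comment.
imex_package_spec "550.127.05"
```

In a Dockerfile this would be called inside a single RUN step, for example apt-get install -y "$(imex_package_spec "$DRIVER_VERSION")", because shell variables and functions do not carry over between RUN instructions.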

Contributor Author

Okay, sure. But I don't yet know which versions and architectures it will be supported for.

@guptaNswati guptaNswati force-pushed the imex-support branch 3 times, most recently from 3a062c1 to 3f36077 Compare October 28, 2024 22:40
@guptaNswati guptaNswati marked this pull request as ready for review October 29, 2024 18:29
@guptaNswati
Contributor Author

Tested memcpy in non-DRA mode by setting hostNetwork: true for the driver container.

$ kubectl get pod -n gpu-operator
NAME                                                          READY   STATUS      RESTARTS   AGE
gpu-feature-discovery-krcdd                                   1/1     Running     0          110m
gpu-feature-discovery-s2bft                                   1/1     Running     0          108m
gpu-feature-discovery-w4bgn                                   1/1     Running     0          107m
gpu-feature-discovery-zcpfl                                   1/1     Running     0          114m
gpu-operator-d48c8dc97-6mf72                                  1/1     Running     0          23h
gpu-operator-node-feature-discovery-gc-7f546fd4bc-w9rs4       1/1     Running     0          23h
gpu-operator-node-feature-discovery-master-8448c8896c-gnqkm   1/1     Running     0          23h
gpu-operator-node-feature-discovery-worker-cll2k              1/1     Running     0          23h
gpu-operator-node-feature-discovery-worker-cqstj              1/1     Running     0          23h
gpu-operator-node-feature-discovery-worker-gdd8b              1/1     Running     0          23h
gpu-operator-node-feature-discovery-worker-wm9zh              1/1     Running     0          23h
nvidia-container-toolkit-daemonset-4cslh                      1/1     Running     0          107m
nvidia-container-toolkit-daemonset-cvvh5                      1/1     Running     0          114m
nvidia-container-toolkit-daemonset-pqktp                      1/1     Running     0          108m
nvidia-container-toolkit-daemonset-qbhjj                      1/1     Running     0          110m
nvidia-cuda-validator-95kbz                                   0/1     Completed   0          105m
nvidia-cuda-validator-9hxsk                                   0/1     Completed   0          107m
nvidia-cuda-validator-9rrsv                                   0/1     Completed   0          109m
nvidia-cuda-validator-t2wbr                                   0/1     Completed   0          112m
nvidia-dcgm-exporter-g55kg                                    1/1     Running     0          114m
nvidia-dcgm-exporter-jdjzr                                    1/1     Running     0          107m
nvidia-dcgm-exporter-tw5jw                                    1/1     Running     0          110m
nvidia-dcgm-exporter-xxkvn                                    1/1     Running     0          108m
nvidia-device-plugin-daemonset-gdbm9                          1/1     Running     0          108m
nvidia-device-plugin-daemonset-q5g9d                          1/1     Running     0          114m
nvidia-device-plugin-daemonset-r9dms                          1/1     Running     0          110m
nvidia-device-plugin-daemonset-sw9tk                          1/1     Running     0          107m
nvidia-driver-daemonset-2j6n9                                 1/1     Running     0          111m
nvidia-driver-daemonset-9cf22                                 1/1     Running     0          109m
nvidia-driver-daemonset-cn4c6                                 1/1     Running     0          115m
nvidia-driver-daemonset-sl6kt                                 1/1     Running     0          108m
nvidia-mig-manager-59x4g                                      1/1     Running     0          110m
nvidia-mig-manager-5gglj                                      1/1     Running     0          107m
nvidia-mig-manager-gkztg                                      1/1     Running     0          114m
nvidia-mig-manager-x528k                                      1/1     Running     0          108m
nvidia-operator-validator-4p72k                               1/1     Running     0          108m
nvidia-operator-validator-d8rmw                               1/1     Running     0          110m
nvidia-operator-validator-x9gnk                               1/1     Running     0          107m
nvidia-operator-validator-xs66r                               1/1     Running     0          114m

$ kubectl logs nvidia-driver-daemonset-9cf22 -n gpu-operator
Copying imex nodes_config.cfg to /etc/nvidia-imex
Starting NVIDIA imex daemon...

$ ls /run/nvidia/driver/etc/nvidia-imex/
config.cfg        nodes_config.cfg  

$ kubectl exec -it  nvidia-driver-daemonset-2j6n9 -c nvidia-driver-ctr -n gpu-operator -- pgrep -f /usr/bin/nvidia-imex
239395

$ kubectl exec -it  nvidia-driver-daemonset-2j6n9 -c nvidia-driver-ctr -n gpu-operator -- nvidia-imex-ctl -a -q
{
 "nodeStatus": [
  {
   "hostname": "10.x.x.x",
   "message": "READY"
  },
  {
   "hostname": "10.x.x.x",
   "nodeId": 1,
   "message": "READY"
  },
  {
   "hostname": "10.x.x.x",
   "nodeId": 2,
   "message": "READY"
  },
  {
   "hostname": "10.x.x.x",
   "nodeId": 3,
   "message": "READY"
  }
 ]
}

{
 "nodeStatus": [
  {
   "hostname": "10.x.x.x",
   "message": "READY"
  },
  {
   "hostname": "10.x.x.x",
   "nodeId": 1,
   "message": "READY"
  },
  {
   "hostname": "10.x.x.x",
   "nodeId": 2,
   "message": "READY"
  },
  {
   "hostname": "10.x.x.x",
   "nodeId": 3,
   "message": "READY"
  }
 ]
}
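
The manual check of the status output above could be scripted roughly as follows. This is a minimal sketch, not part of the PR: imex_all_ready is a hypothetical helper, and it relies on crude text matching of the pretty-printed layout shown above (one "hostname" line and one "message" line per node) rather than real JSON parsing.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if every node entry in the status JSON read
# from stdin reports READY. Counts lines by text match, so it depends on the
# pretty-printed layout shown in the output above.
imex_all_ready() {
    status_json="$(cat)"
    # grep -c exits non-zero on zero matches but still prints 0, hence "|| true".
    total="$(printf '%s\n' "$status_json" | grep -c '"hostname"' || true)"
    ready="$(printf '%s\n' "$status_json" | grep -c '"message": "READY"' || true)"
    echo "ready nodes: $ready/$total"
    [ "$total" -gt 0 ] && [ "$ready" -eq "$total" ]
}

# Example with a shortened copy of the output above.
imex_all_ready <<'EOF'
{
 "nodeStatus": [
  {
   "hostname": "10.x.x.x",
   "message": "READY"
  },
  {
   "hostname": "10.x.x.x",
   "nodeId": 1,
   "message": "READY"
  }
 ]
}
EOF
```

Against a live cluster, the output of the nvidia-imex-ctl -a -q command shown above would be piped into imex_all_ready, with kubectl exec run without -it so that no TTY is attached to the pipe.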

# single node test
$ kubectl get pod
NAME                   READY   STATUS    RESTARTS   AGE
cuda-imex-test-67qsv   1/1     Running   0          3s
cuda-imex-test-6zfzr   1/1     Running   0          3s
cuda-imex-test-bcnhs   1/1     Running   0          3s
cuda-imex-test-mhmc9   1/1     Running   0          3s

$ kubectl logs cuda-imex-test-67qsv
channel0
Fri Nov 15 22:36:55 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01              Driver Version: 565.57.01      CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GH200 96GB HBM3         On  |   00000009:01:00.0 Off |                    0 |
| N/A   37C    P0             96W /  900W |       1MiB /  97871MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GH200 96GB HBM3         On  |   00000019:01:00.0 Off |                    0 |
| N/A   37C    P0             82W /  900W |       1MiB /  97871MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
sh: 1: nvidia-modprobe: not found
Dispatcher pid: 1
Running test ipc_mempools_basic (pid: 10)
sh: 1: nvidia-modprobe: not found
sh: 1: nvidia-modprobe: not found
^^^^ PASS: ipc_mempools_basic (2356.6ms)
Total time: 2357ms
1 out of 1 ENABLED tests passed (100%)
&&&& cudaMallocAsync test PASSED

# multi-node test
$ kubectl get pod
NAME                             READY   STATUS             RESTARTS      AGE
mpi-memcpy-test-launcher-ptsqw   0/1     CrashLoopBackOff   1 (11s ago)   13s
mpi-memcpy-test-worker-0         1/1     Running            0             13s
mpi-memcpy-test-worker-1         1/1     Running            0             13s
mpi-memcpy-test-worker-2         1/1     Running            0             13s
mpi-memcpy-test-worker-3         1/1     Running            0             13s

$ kubectl logs mpi-memcpy-test-launcher-ptsqw
Warning: Permanently added '[mpi-memcpy-test-worker-0.mpi-memcpy-test.default.svc]:2222' (ECDSA) to the list of known hosts.
Warning: Permanently added '[mpi-memcpy-test-worker-3.mpi-memcpy-test.default.svc]:2222' (ECDSA) to the list of known hosts.
Warning: Permanently added '[mpi-memcpy-test-worker-1.mpi-memcpy-test.default.svc]:2222' (ECDSA) to the list of known hosts.
Warning: Permanently added '[mpi-memcpy-test-worker-2.mpi-memcpy-test.default.svc]:2222' (ECDSA) to the list of known hosts.
[mpi-memcpy-test-worker-3:00014] MCW rank 7 bound to socket 0[core 1[hwt 0]]: [./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.]
[mpi-memcpy-test-worker-3:00014] MCW rank 6 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.]
[mpi-memcpy-test-worker-2:00011] MCW rank 4 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.]
[mpi-memcpy-test-worker-2:00011] MCW rank 5 bound to socket 0[core 1[hwt 0]]: [./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.]
[mpi-memcpy-test-worker-1:00011] MCW rank 3 bound to socket 0[core 1[hwt 0]]: [./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.]
[mpi-memcpy-test-worker-1:00011] MCW rank 2 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.]
[mpi-memcpy-test-worker-0:00011] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.]
[mpi-memcpy-test-worker-0:00011] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.]
sh: 1: nvidia-modprobe: not found
sh: 1: nvidia-modprobe: not found
sh: 1: nvidia-modprobe: not found
sh: 1: nvidia-modprobe: not found
sh: 1: nvidia-modprobe: not found
sh: 1: nvidia-modprobe: not found
sh: 1: nvidia-modprobe: not found
sh: 1: nvidia-modprobe: not found
 0/ 8 mpi-memcpy-test-worker-0( 0/ 2) nodeid=0x1ddeda8c fabric=1 Device  0 102GB 132SMs Compute  9.0 PCIE-01:00.09
 4/ 8 mpi-memcpy-test-worker-2( 0/ 2) nodeid=0x1ddeda8a fabric=1 Device  0 102GB 132SMs Compute  9.0 PCIE-01:00.09
 2/ 8 mpi-memcpy-test-worker-1( 0/ 2) nodeid=0x1ddeda8b fabric=1 Device  0 102GB 132SMs Compute  9.0 PCIE-01:00.09
 6/ 8 mpi-memcpy-test-worker-3( 0/ 2) nodeid=0x1ddeda89 fabric=1 Device  0 102GB 132SMs Compute  9.0 PCIE-01:00.09
 5/ 8 mpi-memcpy-test-worker-2( 1/ 2) nodeid=0x1ddeda8a fabric=1 Device  1 102GB 132SMs Compute  9.0 PCIE-01:00.19
 3/ 8 mpi-memcpy-test-worker-1( 1/ 2) nodeid=0x1ddeda8b fabric=1 Device  1 102GB 132SMs Compute  9.0 PCIE-01:00.19
 1/ 8 mpi-memcpy-test-worker-0( 1/ 2) nodeid=0x1ddeda8c fabric=1 Device  1 102GB 132SMs Compute  9.0 PCIE-01:00.19
 7/ 8 mpi-memcpy-test-worker-3( 1/ 2) nodeid=0x1ddeda89 fabric=1 Device  1 102GB 132SMs Compute  9.0 PCIE-01:00.19
cuMemcpyAsync Write:

Measured bandwidth in GB/s of size 512 MiB:
            0      1      2      3      4      5      6      7 
--------------------------------------------------------------
     0   0.00 395.06 395.04 394.99 394.97 394.99 395.13 395.07 
     1 395.08   0.00 395.04 394.98 395.07 395.01 395.13 394.98 
     2 394.98 395.05   0.00 395.06 395.03 395.11 395.04 394.97 
     3 395.08 395.09 395.02   0.00 395.10 395.02 395.01 395.06 
     4 395.04 394.97 394.95 395.21   0.00 395.05 395.03 395.04 
     5 394.98 395.10 394.98 395.04 394.95   0.00 395.02 395.10 
     6 394.92 395.09 395.08 395.14 395.04 395.01   0.00 394.95 
     7 394.99 394.97 394.97 395.04 395.06 395.05 395.00   0.00 

cuMemcpyAsync Read:

Measured bandwidth in GB/s of size 512 MiB:
            0      1      2      3      4      5      6      7 
--------------------------------------------------------------
     0   0.00 391.75 391.77 391.75 391.82 391.82 391.80 391.72 
     1 391.72   0.00 391.87 391.69 391.80 391.88 391.77 391.90 
     2 391.70 391.84   0.00 391.77 391.72 391.87 391.75 391.80 
     3 391.75 391.81 391.82   0.00 391.68 391.77 391.67 391.68 
     4 391.76 391.83 391.79 391.77   0.00 391.76 391.90 391.74 
     5 391.81 391.66 391.71 391.80 391.83   0.00 391.83 391.73 
     6 391.74 391.88 391.81 391.73 391.97 391.88   0.00 391.82 
     7 391.74 391.89 391.82 391.72 391.85 391.87 391.76   0.00 

memcpyKernel Write 32b stride 32 threads/SM:

Measured bandwidth in GB/s of size 512 MiB:
            0      1      2      3      4      5      6      7 
--------------------------------------------------------------
     0   0.00 158.54 158.39 158.47 158.72 158.87 158.55 158.54 
     1 158.48   0.00 158.43 158.41 158.76 158.86 158.56 158.66 
     2 158.35 158.41   0.00 158.32 158.83 158.62 158.44 158.50 
     3 158.39 158.45 158.24   0.00 158.52 158.65 158.41 158.53 
     4 158.69 158.69 158.68 158.67   0.00 158.98 158.71 158.89 
     5 158.74 158.74 158.67 158.62 158.90   0.00 158.84 158.87 
     6 158.56 158.57 158.40 158.38 158.70 158.91   0.00 158.67 
     7 158.59 158.63 158.47 158.58 158.87 158.92 158.63   0.00 

memcpyKernel Write 32b stride 128 threads/SM:

Measured bandwidth in GB/s of size 512 MiB:
            0      1      2      3      4      5      6      7 
--------------------------------------------------------------
     0   0.00 372.82 372.83 372.75 372.46 372.46 372.74 372.70 
     1 373.01   0.00 372.66 372.59 372.58 372.92 372.69 372.84 
     2 372.76 372.93   0.00 372.74 372.42 372.55 372.71 372.62 
     3 372.92 372.78 372.61   0.00 372.76 372.54 372.89 372.66 
     4 372.81 373.08 373.09 372.93   0.00 373.09 373.00 372.95 
     5 373.12 372.90 373.38 372.81 372.74   0.00 373.27 373.19 
     6 372.85 373.00 373.06 372.77 372.98 372.87   0.00 373.26 
     7 372.85 372.51 372.89 372.86 373.19 373.19 373.15   0.00 

memcpyKernel Write 32b stride 512 threads/SM:

Measured bandwidth in GB/s of size 512 MiB:
            0      1      2      3      4      5      6      7 
--------------------------------------------------------------
     0   0.00 376.42 376.51 376.36 376.36 376.40 376.44 376.51 
     1 376.31   0.00 376.31 376.27 376.42 376.37 376.37 376.22 
     2 376.27 376.34   0.00 376.14 376.26 376.30 376.35 376.42 
     3 376.33 376.37 376.47   0.00 376.25 376.33 376.33 376.42 
     4 376.33 376.27 376.41 376.43   0.00 376.22 376.28 376.27 
     5 376.48 376.48 376.47 376.47 376.47   0.00 376.50 376.53 
     6 376.21 376.43 376.19 376.33 376.24 376.21   0.00 376.32 
     7 375.96 376.21 376.22 376.38 376.16 376.19 376.22   0.00 

memcpyKernel Write 64b stride 32 threads/SM:

Measured bandwidth in GB/s of size 512 MiB:
            0      1      2      3      4      5      6      7 
--------------------------------------------------------------
     0   0.00 298.05 297.86 297.62 298.72 299.06 297.82 298.39 
     1 298.10   0.00 297.82 297.89 298.80 298.99 298.34 298.57 
     2 297.89 297.91   0.00 297.29 298.68 298.54 297.94 297.98 
     3 297.93 297.89 297.50   0.00 298.34 298.36 297.94 298.15 
     4 298.91 298.92 298.68 298.71   0.00 299.64 299.03 299.23 
     5 298.76 299.03 298.72 298.50 299.61   0.00 298.87 299.20 
     6 298.35 298.09 297.69 297.96 298.92 299.08   0.00 298.49 
     7 298.49 298.41 298.36 298.04 299.22 298.95 298.38   0.00 

memcpyKernel Write 64b stride 128 threads/SM:

Measured bandwidth in GB/s of size 512 MiB:
            0      1      2      3      4      5      6      7 
--------------------------------------------------------------
     0   0.00 375.89 375.60 375.85 375.91 375.92 375.96 375.85 
     1 375.94   0.00 375.76 375.67 375.86 375.95 375.98 375.87 
     2 375.83 375.81   0.00 375.74 375.72 375.67 375.64 375.76 
     3 375.78 375.73 375.60   0.00 375.86 376.04 375.85 375.81 
     4 375.97 375.80 376.08 376.11   0.00 376.12 375.75 376.04 
     5 375.90 375.91 375.90 376.00 375.91   0.00 375.91 375.72 
     6 375.86 375.89 376.02 376.08 376.04 375.76   0.00 375.65 
     7 376.01 376.07 375.70 375.84 376.17 376.07 376.02   0.00 

memcpyKernel Write 64b stride 512 threads/SM:

Measured bandwidth in GB/s of size 512 MiB:
            0      1      2      3      4      5      6      7 
--------------------------------------------------------------
     0   0.00 376.31 376.34 376.65 376.33 376.31 376.54 376.40 
     1 376.26   0.00 376.31 376.59 376.24 376.27 376.22 376.16 
     2 376.41 376.44   0.00 376.62 376.05 376.38 376.45 376.23 
     3 376.46 376.22 376.11   0.00 376.54 376.39 376.36 376.52 
     4 376.28 376.34 376.07 376.27   0.00 376.14 376.10 376.29 
     5 376.26 376.21 376.01 376.36 376.14   0.00 376.51 376.40 
     6 376.38 376.20 376.31 376.34 376.29 376.39   0.00 376.16 
     7 376.36 376.23 376.57 376.33 376.56 376.26 376.42   0.00 

memcpyKernel Write 128b stride 32 threads/SM:

Measured bandwidth in GB/s of size 512 MiB:
            0      1      2      3      4      5      6      7 
--------------------------------------------------------------
     0   0.00 369.94 370.08 370.22 370.80 370.14 370.20 370.50 
     1 369.80   0.00 370.02 369.57 369.85 369.76 370.10 369.95 
     2 370.16 369.92   0.00 370.20 370.06 370.00 369.70 369.60 
     3 369.38 369.80 369.95   0.00 370.41 369.78 369.66 369.84 
     4 370.46 370.05 370.29 369.76   0.00 370.03 369.79 370.48 
     5 370.52 370.64 370.55 370.31 370.61   0.00 370.34 370.44 
     6 370.26 369.96 370.20 370.17 370.77 370.66   0.00 370.30 
     7 370.41 370.23 370.30 370.42 370.00 370.70 370.72   0.00 

memcpyKernel Write 128b stride 128 threads/SM:

Measured bandwidth in GB/s of size 512 MiB:
            0      1      2      3      4      5      6      7 
--------------------------------------------------------------
     0   0.00 376.62 376.57 376.42 376.65 376.55 376.62 376.52 
     1 376.55   0.00 376.53 376.50 376.56 376.40 376.46 376.40 
     2 376.38 376.59   0.00 376.60 376.51 376.50 376.51 376.54 
     3 376.52 376.54 376.39   0.00 376.58 376.39 376.56 376.57 
     4 376.49 376.32 376.47 376.51   0.00 376.45 376.42 376.41 
     5 376.61 376.55 376.52 376.55 376.63   0.00 376.50 376.59 
     6 376.29 376.26 376.45 376.49 376.45 376.58   0.00 376.53 
     7 376.41 376.41 376.35 376.41 376.66 376.51 376.52   0.00 

memcpyKernel Write 128b stride 512 threads/SM:

Measured bandwidth in GB/s of size 512 MiB:
            0      1      2      3      4      5      6      7 
--------------------------------------------------------------
     0   0.00 376.76 376.78 376.79 376.89 376.80 376.86 376.82 
     1 376.98   0.00 376.81 376.83 376.75 376.85 376.67 376.85 
     2 376.83 376.77   0.00 376.74 376.68 376.63 376.72 376.58 
     3 376.85 376.84 376.79   0.00 376.82 376.74 376.58 376.64 
     4 376.77 376.77 376.72 376.73   0.00 376.73 376.92 376.93 
     5 376.80 376.85 376.92 376.92 376.83   0.00 376.89 376.82 
     6 376.85 376.83 376.96 376.87 376.57 376.86   0.00 376.80 
     7 376.84 376.90 376.77 376.67 376.69 376.77 376.81   0.00 

memcpyKernel Read 32b stride 32 threads/SM:

Measured bandwidth in GB/s of size 512 MiB:
            0      1      2      3      4      5      6      7 
--------------------------------------------------------------
     0   0.00 363.44 364.03 363.73 364.36 364.19 364.59 363.51 
     1 363.00   0.00 364.90 364.34 363.80 364.11 364.16 364.66 
     2 363.89 364.34   0.00 364.60 364.15 364.90 364.19 364.45 
     3 364.78 363.83 363.75   0.00 364.16 363.86 364.39 364.33 
     4 363.83 364.90 363.89 364.34   0.00 364.57 363.71 363.62 
     5 365.22 364.85 364.27 364.81 364.83   0.00 364.10 365.31 
     6 364.97 365.69 364.60 364.70 364.97 365.14   0.00 364.42 
     7 364.69 365.38 364.76 364.70 364.16 365.24 364.35   0.00 

memcpyKernel Read 32b stride 128 threads/SM:

Measured bandwidth in GB/s of size 512 MiB:
            0      1      2      3      4      5      6      7 
--------------------------------------------------------------
     0   0.00 370.66 370.32 370.34 370.23 370.60 370.05 369.31 
     1 369.96   0.00 370.06 369.94 369.96 370.55 370.01 369.73 
     2 369.23 369.72   0.00 370.25 370.11 369.96 370.02 370.24 
     3 370.29 370.15 369.98   0.00 370.44 370.37 370.20 369.66 
     4 368.19 367.61 367.69 367.56   0.00 367.75 368.06 367.54 
     5 370.58 370.47 370.26 369.69 370.39   0.00 370.74 370.53 
     6 370.04 369.87 370.11 370.32 370.07 369.84   0.00 370.11 
     7 370.43 369.55 370.30 370.31 369.44 369.94 369.94   0.00 

memcpyKernel Read 32b stride 512 threads/SM:

Measured bandwidth in GB/s of size 512 MiB:
            0      1      2      3      4      5      6      7 
--------------------------------------------------------------
     0   0.00 372.01 372.00 372.02 372.23 371.83 372.03 371.81 
     1 371.72   0.00 371.31 371.24 371.34 371.41 370.91 371.20 
     2 371.26 371.49   0.00 371.43 371.19 371.71 371.38 371.36 
     3 371.55 371.52 371.84   0.00 371.62 371.20 371.36 371.32 
     4 370.32 370.62 370.56 370.57   0.00 370.25 370.40 370.10 
     5 371.54 371.40 371.55 371.34 371.00   0.00 371.54 371.39 
     6 371.73 372.32 371.56 371.76 371.81 372.00   0.00 372.16 
     7 371.86 371.87 371.37 371.78 372.00 371.75 371.47   0.00 

memcpyKernel Read 64b stride 32 threads/SM:

Measured bandwidth in GB/s of size 512 MiB:
            0      1      2      3      4      5      6      7 
--------------------------------------------------------------
     0   0.00 368.14 368.15 368.17 368.75 367.93 368.78 368.67 
     1 367.91   0.00 368.45 368.49 368.72 368.19 368.15 367.98 
     2 368.16 367.60   0.00 368.43 368.25 368.48 368.08 368.37 
     3 367.96 368.64 368.50   0.00 368.65 368.29 368.42 368.31 
     4 367.42 367.73 367.39 367.25   0.00 367.22 367.65 368.01 
     5 369.01 368.82 368.57 368.65 368.87   0.00 368.70 368.86 
     6 368.49 368.14 368.86 368.52 368.44 368.97   0.00 368.25 
     7 367.89 368.86 368.77 368.41 368.08 368.17 368.18   0.00 

memcpyKernel Read 64b stride 128 threads/SM:

Measured bandwidth in GB/s of size 512 MiB:
            0      1      2      3      4      5      6      7 
--------------------------------------------------------------
     0   0.00 368.76 368.38 368.57 368.45 368.09 368.92 368.59 
     1 366.74   0.00 366.48 366.78 366.26 366.86 366.61 366.41 
     2 366.48 366.30   0.00 366.84 366.51 366.52 366.10 366.92 
     3 366.52 366.85 366.61   0.00 366.38 366.57 366.56 366.54 
     4 365.76 366.10 365.27 365.21   0.00 365.63 365.67 365.90 
     5 366.75 366.98 366.57 366.66 366.54   0.00 366.81 366.54 
     6 366.72 366.71 367.21 366.70 366.44 367.04   0.00 366.85 
     7 366.64 366.66 366.48 366.86 367.12 366.99 366.53   0.00 

memcpyKernel Read 64b stride 512 threads/SM:

Measured bandwidth in GB/s of size 512 MiB:
            0      1      2      3      4      5      6      7 
--------------------------------------------------------------
     0   0.00 370.02 370.19 370.06 370.16 370.17 370.02 369.46 
     1 369.99   0.00 369.83 369.77 369.95 369.93 369.58 370.00 
     2 370.06 369.87   0.00 369.62 370.21 369.99 370.00 369.88 
     3 369.66 369.73 369.85   0.00 369.73 369.65 369.65 370.02 
     4 369.66 369.52 369.70 369.50   0.00 369.54 369.74 369.43 
     5 369.77 369.50 369.37 369.57 369.93   0.00 369.83 369.42 
     6 369.42 369.74 369.84 369.83 369.41 370.14   0.00 369.82 
     7 369.89 369.71 369.81 369.42 369.64 369.72 369.72   0.00 

memcpyKernel Read 128b stride 32 threads/SM:

Measured bandwidth in GB/s of size 512 MiB:
            0      1      2      3      4      5      6      7 
--------------------------------------------------------------
     0   0.00 369.17 368.86 369.24 369.06 368.73 369.00 368.51 
     1 368.87   0.00 368.31 369.04 368.46 368.76 368.40 368.34 
     2 369.13 369.10   0.00 368.36 368.72 369.09 368.73 369.01 
     3 368.60 368.69 368.40   0.00 368.73 368.90 368.79 369.15 
     4 366.70 366.67 366.97 366.42   0.00 366.81 366.40 367.07 
     5 368.87 369.34 368.79 369.53 368.83   0.00 369.08 368.82 
     6 369.23 368.95 369.02 368.56 369.11 368.90   0.00 368.96 
     7 368.45 369.40 368.40 368.72 368.70 368.77 368.77   0.00 

memcpyKernel Read 128b stride 128 threads/SM:

Measured bandwidth in GB/s of size 512 MiB:
            0      1      2      3      4      5      6      7 
--------------------------------------------------------------
     0   0.00 370.30 370.48 369.69 370.17 370.24 369.97 370.12 
     1 369.64   0.00 369.70 369.85 369.64 369.54 369.50 369.64 
     2 369.58 369.41   0.00 369.80 369.75 369.82 369.48 369.67 
     3 369.57 369.83 369.57   0.00 369.63 369.68 369.78 369.47 
     4 369.15 368.81 369.26 369.12   0.00 369.09 368.92 368.85 
     5 369.58 369.86 369.77 369.82 369.89   0.00 369.72 369.47 
     6 369.94 369.51 369.66 369.96 369.81 369.95   0.00 369.82 
     7 370.00 369.60 370.00 369.63 369.85 369.79 369.64   0.00 

memcpyKernel Read 128b stride 512 threads/SM:

Measured bandwidth in GB/s of size 512 MiB:
            0      1      2      3      4      5      6      7 
--------------------------------------------------------------
     0   0.00 370.76 370.66 370.82 370.75 370.95 370.86 370.75 
     1 370.92   0.00 370.90 370.82 370.55 370.62 370.89 370.59 
     2 370.61 370.96   0.00 370.65 370.39 370.49 370.41 370.69 
     3 370.74 370.51 370.74   0.00 370.69 370.70 370.81 370.74 
     4 370.66 370.65 370.73 370.45   0.00 370.48 370.58 370.56 
     5 370.39 370.50 370.84 370.66 370.62   0.00 371.00 370.82 
     6 370.79 370.82 370.81 370.69 370.94 370.70   0.00 370.69 
     7 370.71 370.80 370.63 370.92 370.63 370.75 370.61   0.00 

PASS
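
To compare the bandwidth matrices above without reading every cell, a single table can be reduced to its off-diagonal minimum, maximum, and mean. The sketch below is an illustration only: bw_summary is a hypothetical helper, it expects one table at a time on stdin, and it assumes the row layout printed above (an integer row index followed by decimal GB/s values).

```shell
#!/bin/sh
# Hypothetical helper: summarize one bandwidth matrix read from stdin.
# Assumes the layout above: an integer row index, then one decimal per column.
bw_summary() {
    awk '
        # Data rows only: the column header has integers but no decimals.
        $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+\.[0-9]+$/ {
            for (i = 2; i <= NF; i++) {
                if (i - 2 == $1) continue   # skip the diagonal (self-copy, 0.00)
                v = $i + 0
                if (n == 0 || v < min) min = v
                if (n == 0 || v > max) max = v
                sum += v
                n++
            }
        }
        END {
            if (n > 0) printf "min=%.2f max=%.2f mean=%.2f\n", min, max, sum / n
        }
    '
}

# Example: the first two rows of the cuMemcpyAsync Write table above.
bw_summary <<'EOF'
            0      1      2      3      4      5      6      7
--------------------------------------------------------------
     0   0.00 395.06 395.04 394.99 394.97 394.99 395.13 395.07
     1 395.08   0.00 395.04 394.98 395.07 395.01 395.13 394.98
EOF
```

A large gap between min and max in one table would point at a slow GPU pair; in the tables above, the values within each table stay within a few GB/s of each other.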

@guptaNswati
Contributor Author

Test 2: DRA mode, setting hostNetwork: true for the driver container and passing --set nvidiaCtkPath=/usr/local/nvidia/toolkit/nvidia-ctk --set nvidiaDriverRoot=/run/nvidia/driver

$ kubectl get pod -n gpu-operator
NAME                                                          READY   STATUS      RESTARTS   AGE
gpu-feature-discovery-dlvcg                                   1/1     Running     0          12m
gpu-feature-discovery-rm9xl                                   1/1     Running     0          14m
gpu-feature-discovery-wf7f7                                   1/1     Running     0          14m
gpu-feature-discovery-xz2kp                                   1/1     Running     0          13m
gpu-operator-d48c8dc97-svngp                                  1/1     Running     0          47h
gpu-operator-node-feature-discovery-gc-7f546fd4bc-66zwx       1/1     Running     0          47h
gpu-operator-node-feature-discovery-master-8448c8896c-8f5tb   1/1     Running     0          47h
gpu-operator-node-feature-discovery-worker-9q9lw              1/1     Running     0          47h
gpu-operator-node-feature-discovery-worker-sp9fb              1/1     Running     0          47h
gpu-operator-node-feature-discovery-worker-x2428              1/1     Running     0          47h
gpu-operator-node-feature-discovery-worker-z4cw5              1/1     Running     0          47h
nvidia-container-toolkit-daemonset-d758n                      1/1     Running     0          14m
nvidia-container-toolkit-daemonset-dlcdw                      1/1     Running     0          12m
nvidia-container-toolkit-daemonset-f76jc                      1/1     Running     0          13m
nvidia-container-toolkit-daemonset-hbbz7                      1/1     Running     0          14m
nvidia-cuda-validator-hjb46                                   0/1     Completed   0          13m
nvidia-cuda-validator-lp8lx                                   0/1     Completed   0          12m
nvidia-cuda-validator-phz82                                   0/1     Completed   0          11m
nvidia-cuda-validator-zhnq4                                   0/1     Completed   0          11m
nvidia-dcgm-exporter-2xfkw                                    1/1     Running     0          14m
nvidia-dcgm-exporter-9f5g5                                    1/1     Running     0          14m
nvidia-dcgm-exporter-jj245                                    1/1     Running     0          12m
nvidia-dcgm-exporter-kgmdd                                    1/1     Running     0          13m
nvidia-device-plugin-daemonset-hg699                          1/1     Running     0          13m
nvidia-device-plugin-daemonset-kpgz8                          1/1     Running     0          14m
nvidia-device-plugin-daemonset-tfljw                          1/1     Running     0          14m
nvidia-device-plugin-daemonset-zgdsm                          1/1     Running     0          12m
nvidia-driver-daemonset-6jrgh                                 1/1     Running     0          16m
nvidia-driver-daemonset-7fkdx                                 1/1     Running     0          13m
nvidia-driver-daemonset-plp7g                                 1/1     Running     0          15m
nvidia-driver-daemonset-vwwgt                                 1/1     Running     0          14m
nvidia-mig-manager-49d9d                                      1/1     Running     0          13m
nvidia-mig-manager-5v4fr                                      1/1     Running     0          14m
nvidia-mig-manager-8dv98                                      1/1     Running     0          12m
nvidia-mig-manager-bvjsr                                      1/1     Running     0          14m
nvidia-operator-validator-mldmw                               1/1     Running     0          14m
nvidia-operator-validator-nq5vr                               1/1     Running     0          12m
nvidia-operator-validator-tn4q5                               1/1     Running     0          13m
nvidia-operator-validator-wcpjv                               1/1     Running     0          14m
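
Not part of this PR: a minimal sketch for checking a saved copy of the `kubectl get pod` table above without reading every row. The helper name `unhealthy_pods` is hypothetical, and it assumes the default column order (NAME, READY, STATUS) shown above. It treats `Completed` pods as healthy and flags anything else that is not `Running` with all containers ready.

```python
def unhealthy_pods(table_text):
    """Return (name, ready, status) for pods that look unhealthy.

    Assumes the default `kubectl get pod` layout: a header row followed by
    rows whose first three columns are NAME, READY (n/m) and STATUS.
    """
    bad = []
    lines = table_text.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 3:
            continue  # ignore blank or truncated rows
        name, ready, status = fields[:3]
        if status == "Completed":
            continue  # one-shot pods such as the CUDA validator
        have, want = ready.split("/")
        if status != "Running" or have != want:
            bad.append((name, ready, status))
    return bad
```

An empty list for the table above would match what the output shows: every pod is either `Running` with 1/1 ready or `Completed`.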

$ kubectl logs nvidia-driver-daemonset-6jrgh -n gpu-operator
Copying imex nodes_config.cfg to /etc/nvidia-imex
Starting NVIDIA imex daemon...
Mounting NVIDIA driver rootfs...
Done, now waiting for signal

$ kubectl exec -it  nvidia-driver-daemonset-6jrgh  -c nvidia-driver-ctr -n gpu-operator -- cat /var/log/nvidia-imex.log
IMEX_WAIT_FOR_QUORUM != FULL, continuing initialization without waiting for connections to all nodes.
GPU event successfully subscribed

$ kubectl get pod -n nvidia-dra-driver 
NAME                                                           READY   STATUS    RESTARTS   AGE
nvidia-dra-driver-k8s-dra-driver-controller-7f78c745f6-x8lt8   1/1     Running   0          8m23s
nvidia-dra-driver-k8s-dra-driver-kubelet-plugin-6fbvf          1/1     Running   0          8m23s
nvidia-dra-driver-k8s-dra-driver-kubelet-plugin-ccvlh          1/1     Running   0          8m23s
nvidia-dra-driver-k8s-dra-driver-kubelet-plugin-jcjkh          1/1     Running   0          8m23s
nvidia-dra-driver-k8s-dra-driver-kubelet-plugin-z4xqm          1/1     Running   0          8m23s

$ cat imex-dra-test-pod.yaml
---
apiVersion: resource.k8s.io/v1alpha3
kind: ResourceClaim
metadata:
  name: imex-dra-test-pod-channel
spec:
  devices:
    requests:
    - name: channel
      deviceClassName: imex.nvidia.com
---
apiVersion: v1
kind: Pod
metadata:
  name: imex-dra-test
spec:
  containers:
  - name: ctr
    image: ubuntu:22.04
    command: ["bash", "-c"]
    args: ["ls -la /dev/nvidia-caps-imex-channels; trap 'exit 0' TERM; sleep 9999 & wait"]
    resources:
      claims:
      - name: imex-channel
  resourceClaims:
  - name: imex-channel
    resourceClaimName: imex-dra-test-pod-channel
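
Not part of this PR: a small sketch of how the names in the manifest above are wired together. The container references a pod-level claim entry by `name` (`imex-channel`), and that entry points at the ResourceClaim object by `resourceClaimName` (`imex-dra-test-pod-channel`). The dicts below mirror the manifest by hand (no YAML parsing), and `wiring_errors` is a hypothetical helper, not a Kubernetes API.

```python
# Hand-written mirrors of the two manifests above (only the fields used here).
resource_claim = {
    "kind": "ResourceClaim",
    "metadata": {"name": "imex-dra-test-pod-channel"},
}
pod = {
    "kind": "Pod",
    "metadata": {"name": "imex-dra-test"},
    "spec": {
        "containers": [
            {"name": "ctr", "resources": {"claims": [{"name": "imex-channel"}]}}
        ],
        "resourceClaims": [
            {"name": "imex-channel",
             "resourceClaimName": "imex-dra-test-pod-channel"}
        ],
    },
}


def wiring_errors(pod, claim_objects):
    """List broken links between containers, pod claims and ResourceClaims."""
    errors = []
    known = {c["metadata"]["name"] for c in claim_objects}
    declared = set()
    for entry in pod["spec"].get("resourceClaims", []):
        declared.add(entry["name"])
        target = entry.get("resourceClaimName")
        # Entries without resourceClaimName (e.g. template-based) are skipped.
        if target is not None and target not in known:
            errors.append(f"pod claim {entry['name']!r} -> missing {target!r}")
    for ctr in pod["spec"].get("containers", []):
        for ref in ctr.get("resources", {}).get("claims", []):
            if ref["name"] not in declared:
                errors.append(
                    f"container {ctr['name']!r} uses undeclared {ref['name']!r}"
                )
    return errors
```

For the manifest as posted, `wiring_errors(pod, [resource_claim])` returns an empty list, which matches the pod starting and seeing `channel0` in the log below.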
    
$ kubectl logs imex-dra-test
total 0
drwxr-xr-x 2 root root     60 Nov 20 23:55 .
drwxr-xr-x 6 root root    380 Nov 20 23:55 ..
crw-rw-rw- 1 root root 506, 0 Nov 20 23:55 channel0

# mpi_memcpy test
$ kubectl logs mpi-memcpy-dra-test-launcher-fg9lf -f
Warning: Permanently added '[mpi-memcpy-dra-test-worker-3.mpi-memcpy-dra-test.default.svc]:2222' (ECDSA) to the list of known hosts.
Warning: Permanently added '[mpi-memcpy-dra-test-worker-1.mpi-memcpy-dra-test.default.svc]:2222' (ECDSA) to the list of known hosts.
Warning: Permanently added '[mpi-memcpy-dra-test-worker-2.mpi-memcpy-dra-test.default.svc]:2222' (ECDSA) to the list of known hosts.
Warning: Permanently added '[mpi-memcpy-dra-test-worker-0.mpi-memcpy-dra-test.default.svc]:2222' (ECDSA) to the list of known hosts.
[mpi-memcpy-dra-test-worker-0:00013] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.]
[mpi-memcpy-dra-test-worker-0:00013] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.]
[mpi-memcpy-dra-test-worker-1:00011] MCW rank 2 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.]
[mpi-memcpy-dra-test-worker-1:00011] MCW rank 3 bound to socket 0[core 1[hwt 0]]: [./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.]
[mpi-memcpy-dra-test-worker-2:00017] MCW rank 5 bound to socket 0[core 1[hwt 0]]: [./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.]
[mpi-memcpy-dra-test-worker-2:00017] MCW rank 4 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.]
[mpi-memcpy-dra-test-worker-3:00017] MCW rank 6 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.]
[mpi-memcpy-dra-test-worker-3:00017] MCW rank 7 bound to socket 0[core 1[hwt 0]]: [./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.]
sh: 1: nvidia-modprobe: not found
sh: 1: nvidia-modprobe: not found
sh: 1: nvidia-modprobe: not found
sh: 1: nvidia-modprobe: not found
sh: 1: nvidia-modprobe: not found
sh: 1: nvidia-modprobe: not found
sh: 1: nvidia-modprobe: not found
sh: 1: nvidia-modprobe: not found
 0/ 8 mpi-memcpy-dra-test-worker-0( 0/ 2) nodeid=0xc4cab78 fabric=1 Device  0 102GB 132SMs Compute  9.0 PCIE-01:00.09
 2/ 8 mpi-memcpy-dra-test-worker-1( 0/ 2) nodeid=0xc4cab79 fabric=1 Device  0 102GB 132SMs Compute  9.0 PCIE-01:00.09
 6/ 8 mpi-memcpy-dra-test-worker-3( 0/ 2) nodeid=0xc4cab7b fabric=1 Device  0 102GB 132SMs Compute  9.0 PCIE-01:00.09
 4/ 8 mpi-memcpy-dra-test-worker-2( 0/ 2) nodeid=0xc4cab7a fabric=1 Device  0 102GB 132SMs Compute  9.0 PCIE-01:00.09
 5/ 8 mpi-memcpy-dra-test-worker-2( 1/ 2) nodeid=0xc4cab7a fabric=1 Device  1 102GB 132SMs Compute  9.0 PCIE-01:00.19
 1/ 8 mpi-memcpy-dra-test-worker-0( 1/ 2) nodeid=0xc4cab78 fabric=1 Device  1 102GB 132SMs Compute  9.0 PCIE-01:00.19
 3/ 8 mpi-memcpy-dra-test-worker-1( 1/ 2) nodeid=0xc4cab79 fabric=1 Device  1 102GB 132SMs Compute  9.0 PCIE-01:00.19
 7/ 8 mpi-memcpy-dra-test-worker-3( 1/ 2) nodeid=0xc4cab7b fabric=1 Device  1 102GB 132SMs Compute  9.0 PCIE-01:00.19
cuMemcpyAsync Write:

Measured bandwidth in GB/s of size 512 MiB:
            0      1      2      3      4      5      6      7 
--------------------------------------------------------------
     0   0.00 395.13 395.08 395.05 395.02 395.08 395.11 395.04 
     1 395.05   0.00 394.98 395.03 394.99 395.09 395.01 395.02 
     2 394.98 395.03   0.00 395.05 394.99 394.99 394.98 395.05 
     3 395.00 394.95 395.12   0.00 394.97 395.01 395.04 395.07 
     4 394.95 394.95 395.12 395.12   0.00 394.98 395.07 395.08 
     5 395.02 395.09 395.11 395.14 395.07   0.00 395.01 395.10 
     6 395.03 395.04 395.07 395.04 395.05 395.08   0.00 395.06 
     7 395.00 395.02 395.06 395.08 395.01 395.12 395.07   0.00 

cuMemcpyAsync Read:

Measured bandwidth in GB/s of size 512 MiB:
            0      1      2      3      4      5      6      7 
--------------------------------------------------------------
     0   0.00 391.83 391.74 391.88 391.69 391.94 391.80 391.77 
     1 391.82   0.00 391.76 391.82 391.82 391.75 391.76 391.79 
     2 391.86 391.84   0.00 391.72 391.78 391.79 391.77 391.78 
     3 391.75 391.85 391.66   0.00 391.74 391.69 391.74 391.80 
     4 391.80 391.84 391.79 391.88   0.00 391.82 391.80 391.65 
     5 391.77 391.82 391.69 391.85 391.76   0.00 391.72 391.81 
     6 391.77 391.75 391.71 391.80 391.72 391.74   0.00 391.75 
     7 391.71 391.77 391.61 391.67 391.73 391.82 391.79   0.00 

memcpyKernel Write 32b stride 32 threads/SM:

Measured bandwidth in GB/s of size 512 MiB:
            0      1      2      3      4      5      6      7 
--------------------------------------------------------------
     0   0.00 159.02 158.81 158.86 158.75 158.76 158.58 158.68 
     1 158.97   0.00 158.87 158.99 158.81 158.86 158.68 158.58 
     2 158.73 158.85   0.00 158.76 158.45 158.52 158.33 158.49 
     3 158.85 158.81 158.61   0.00 158.60 158.65 158.52 158.57 
     4 158.80 158.94 158.44 158.69   0.00 158.58 158.27 158.46 
     5 158.72 158.74 158.49 158.60 158.57   0.00 158.32 158.41 
     6 158.54 158.60 158.49 158.37 158.31 158.36   0.00 158.26 
     7 158.74 158.56 158.42 158.51 158.49 158.41 158.23   0.00 

memcpyKernel Write 32b stride 128 threads/SM:

Measured bandwidth in GB/s of size 512 MiB:
            0      1      2      3      4      5      6      7 
--------------------------------------------------------------
     0   0.00 373.05 372.56 372.67 372.97 372.26 372.80 373.14 
     1 373.28   0.00 373.43 373.32 373.24 373.39 373.51 373.05 
     2 373.13 373.29   0.00 373.02 372.47 373.04 373.16 373.28 
     3 372.86 372.74 372.69   0.00 373.18 373.12 373.03 372.66 
     4 372.89 372.74 372.85 372.37   0.00 372.88 372.52 372.62 
     5 373.17 373.17 372.88 372.82 373.17   0.00 373.13 373.03 
     6 372.90 372.86 372.80 373.03 373.12 372.78   0.00 372.67 
     7 373.03 372.80 372.61 372.97 372.68 372.70 372.74   0.00 

memcpyKernel Write 32b stride 512 threads/SM:

Measured bandwidth in GB/s of size 512 MiB:
            0      1      2      3      4      5      6      7 
--------------------------------------------------------------
     0   0.00 376.34 376.34 376.29 376.27 376.28 376.26 376.47 
     1 376.46   0.00 376.46 376.57 376.43 376.42 376.46 376.46 
     2 376.49 376.29   0.00 376.34 376.33 376.34 376.34 376.26 
     3 376.34 376.29 376.27   0.00 376.31 376.29 376.29 376.30 
     4 376.16 376.24 376.30 376.51   0.00 376.25 376.30 376.46 
     5 376.29 376.28 376.31 376.19 376.05   0.00 376.34 376.20 
     6 376.25 376.43 376.33 376.26 376.21 376.26   0.00 376.19 
     7 376.24 376.36 376.19 376.42 376.24 376.24 376.42   0.00 

memcpyKernel Write 64b stride 32 threads/SM:

Measured bandwidth in GB/s of size 512 MiB:
            0      1      2      3      4      5      6      7 
--------------------------------------------------------------
     0   0.00 299.76 299.10 299.20 299.08 299.06 298.72 298.53 
     1 299.61   0.00 299.22 299.10 299.06 298.83 298.59 298.58 
     2 298.66 299.06   0.00 298.42 298.14 298.37 297.97 297.90 
     3 299.25 299.19 298.66   0.00 298.43 298.52 298.33 298.10 
     4 298.83 298.92 297.96 298.53   0.00 298.08 297.80 297.76 
     5 298.73 298.90 298.30 298.37 298.09   0.00 297.88 297.76 
     6 298.60 298.67 297.93 298.05 297.87 297.90   0.00 297.29 
     7 298.38 298.37 297.88 297.86 297.72 297.74 297.66   0.00 

memcpyKernel Write 64b stride 128 threads/SM:

Measured bandwidth in GB/s of size 512 MiB:
            0      1      2      3      4      5      6      7 
--------------------------------------------------------------
     0   0.00 376.03 376.08 376.07 375.98 376.04 376.24 376.15 
     1 375.95   0.00 375.82 375.92 376.13 375.73 375.85 376.04 
     2 376.00 376.06   0.00 376.22 376.01 375.84 376.08 376.17 
     3 375.91 376.21 375.91   0.00 376.09 376.07 376.13 376.06 
     4 375.94 375.74 375.71 375.58   0.00 375.93 375.88 375.73 
     5 375.70 375.98 375.98 375.95 375.94   0.00 376.14 375.94 
     6 375.96 375.61 376.00 375.63 376.06 376.11   0.00 376.12 
     7 375.67 375.66 375.77 375.92 375.70 375.55 375.96   0.00 

memcpyKernel Write 64b stride 512 threads/SM:

Measured bandwidth in GB/s of size 512 MiB:
            0      1      2      3      4      5      6      7 
--------------------------------------------------------------
     0   0.00 376.32 375.96 376.18 376.07 376.05 376.05 375.78 
     1 376.29   0.00 376.33 376.29 376.15 376.32 376.37 376.43 
     2 376.48 376.31   0.00 376.33 376.40 376.26 376.24 376.29 
     3 376.34 375.93 376.32   0.00 376.29 376.14 376.30 376.31 
     4 376.39 376.16 376.30 376.41   0.00 376.33 376.56 376.52 
     5 376.22 376.54 376.09 376.37 376.32   0.00 376.40 376.19 
     6 376.16 376.27 376.26 375.96 376.29 376.37   0.00 376.30 
     7 376.50 376.38 376.30 376.36 376.25 376.62 376.52   0.00 

memcpyKernel Write 128b stride 32 threads/SM:

Measured bandwidth in GB/s of size 512 MiB:
            0      1      2      3      4      5      6      7 
--------------------------------------------------------------
     0   0.00 369.83 369.92 370.08 369.91 370.59 369.89 369.85 
     1 370.46   0.00 370.83 370.02 369.72 370.26 370.25 370.02 
     2 370.02 370.19   0.00 370.36 370.25 370.39 370.54 370.41 
     3 370.15 369.96 370.22   0.00 370.11 370.31 370.02 370.08 
     4 370.34 370.04 370.35 370.02   0.00 370.14 369.48 370.49 
     5 370.66 370.27 370.40 370.27 370.18   0.00 370.85 370.01 
     6 370.36 370.68 370.17 370.16 369.78 370.13   0.00 370.58 
     7 370.36 369.81 370.31 370.17 369.35 369.77 370.01   0.00 

memcpyKernel Write 128b stride 128 threads/SM:

Measured bandwidth in GB/s of size 512 MiB:
            0      1      2      3      4      5      6      7 
--------------------------------------------------------------
     0   0.00 376.28 376.56 376.45 376.36 376.50 376.33 376.49 
     1 376.61   0.00 376.70 376.65 376.55 376.52 376.48 376.59 
     2 376.55 376.39   0.00 376.49 376.26 376.39 376.44 376.41 
     3 376.57 376.46 376.47   0.00 376.48 376.52 376.56 376.42 
     4 376.49 376.52 376.63 376.52   0.00 376.50 376.53 376.52 
     5 376.26 376.42 376.55 376.21 376.56   0.00 376.25 376.43 
     6 376.37 376.49 376.29 376.39 376.53 376.49   0.00 376.40 
     7 376.51 376.74 376.70 376.61 376.57 376.34 376.37   0.00 

memcpyKernel Write 128b stride 512 threads/SM:

Measured bandwidth in GB/s of size 512 MiB:
            0      1      2      3      4      5      6      7 
--------------------------------------------------------------
     0   0.00 376.80 376.81 376.95 376.77 376.87 376.86 376.84 
     1 376.76   0.00 376.96 376.83 376.84 376.83 376.96 376.85 
     2 376.80 376.85   0.00 376.68 376.92 376.57 376.69 376.64 
     3 376.71 376.72 376.82   0.00 376.72 376.88 376.83 376.67 
     4 376.79 376.90 376.92 376.83   0.00 376.80 376.67 376.88 
     5 376.87 376.82 376.81 376.86 376.85   0.00 376.72 376.78 
     6 376.93 376.85 376.73 376.81 376.83 376.82   0.00 376.76 
     7 376.79 376.80 376.89 376.81 376.74 376.91 376.72   0.00 

memcpyKernel Read 32b stride 32 threads/SM:

Measured bandwidth in GB/s of size 512 MiB:
            0      1      2      3      4      5      6      7 
--------------------------------------------------------------
     0   0.00 363.50 364.51 364.33 364.57 364.48 364.07 363.77 
     1 364.42   0.00 365.13 364.78 362.33 364.25 365.44 365.96 
     2 365.06 364.62   0.00 364.30 364.51 363.79 365.03 364.67 
     3 363.79 364.19 363.50   0.00 364.57 364.49 363.78 363.80 
     4 363.08 363.88 364.13 364.96   0.00 363.65 364.01 364.31 
     5 363.33 364.87 364.40 365.38 364.90   0.00 364.64 364.49 
     6 363.89 364.69 363.77 364.28 364.52 364.59   0.00 365.11 
     7 363.76 363.87 364.46 363.67 364.38 363.01 364.87   0.00 

memcpyKernel Read 32b stride 128 threads/SM:

Measured bandwidth in GB/s of size 512 MiB:
            0      1      2      3      4      5      6      7 
--------------------------------------------------------------
     0   0.00 368.19 367.85 367.73 368.03 367.78 367.63 367.88 
     1 369.81   0.00 369.93 369.57 370.88 369.95 370.11 369.91 
     2 369.84 369.94   0.00 370.33 370.61 370.11 370.14 369.64 
     3 367.58 367.49 367.50   0.00 367.19 367.00 367.55 367.85 
     4 370.23 370.22 370.59 370.95   0.00 370.11 370.54 370.43 
     5 370.03 370.26 369.87 369.94 370.33   0.00 370.09 370.05 
     6 367.60 367.28 368.27 367.81 367.45 367.87   0.00 368.06 
     7 369.89 370.39 370.04 370.16 370.17 370.19 369.80   0.00 

memcpyKernel Read 32b stride 512 threads/SM:

Measured bandwidth in GB/s of size 512 MiB:
            0      1      2      3      4      5      6      7 
--------------------------------------------------------------
     0   0.00 370.90 370.56 370.68 370.57 370.16 370.36 370.61 
     1 371.36   0.00 371.19 370.99 371.34 371.38 371.24 371.15 
     2 371.90 371.52   0.00 371.88 371.89 371.98 372.00 371.62 
     3 370.51 371.05 370.82   0.00 370.63 370.81 370.78 370.49 
     4 372.26 372.59 372.23 372.22   0.00 371.86 371.95 372.33 
     5 371.90 372.04 371.52 372.00 371.80   0.00 371.66 372.00 
     6 370.63 370.77 370.44 370.61 370.25 370.72   0.00 370.71 
     7 371.93 372.20 372.21 372.48 372.23 371.72 372.31   0.00 

memcpyKernel Read 64b stride 32 threads/SM:

Measured bandwidth in GB/s of size 512 MiB:
            0      1      2      3      4      5      6      7 
--------------------------------------------------------------
     0   0.00 367.61 368.01 367.56 367.76 367.90 367.41 367.99 
     1 369.00   0.00 369.29 369.01 369.04 369.14 369.00 368.68 
     2 368.87 368.09   0.00 368.71 368.67 367.84 368.22 367.58 
     3 368.44 367.84 368.00   0.00 367.71 367.68 367.40 368.05 
     4 368.32 368.83 368.79 368.60   0.00 368.38 368.36 368.26 
     5 368.98 368.07 368.52 368.23 368.40   0.00 368.50 368.01 
     6 367.75 367.84 367.77 367.82 367.71 367.73   0.00 367.76 
     7 367.37 368.84 368.78 368.86 368.77 368.49 368.67   0.00 

memcpyKernel Read 64b stride 128 threads/SM:

Measured bandwidth in GB/s of size 512 MiB:
            0      1      2      3      4      5      6      7 
--------------------------------------------------------------
     0   0.00 365.96 365.88 365.86 365.67 365.89 365.62 365.52 
     1 366.76   0.00 367.07 366.51 366.96 366.87 366.81 366.41 
     2 367.05 366.71   0.00 366.71 366.52 366.85 366.93 366.84 
     3 365.98 366.07 365.86   0.00 365.97 365.74 365.64 365.45 
     4 368.31 368.29 368.61 368.66   0.00 368.64 368.28 368.82 
     5 366.52 366.64 366.78 366.56 366.65   0.00 366.57 366.61 
     6 365.49 365.17 365.60 365.39 365.97 365.92   0.00 365.50 
     7 368.70 368.31 368.32 368.34 368.56 368.72 368.71   0.00 

memcpyKernel Read 64b stride 512 threads/SM:

Measured bandwidth in GB/s of size 512 MiB:
            0      1      2      3      4      5      6      7 
--------------------------------------------------------------
     0   0.00 369.89 369.67 369.80 369.72 369.44 369.44 369.70 
     1 370.08   0.00 369.54 369.71 369.76 369.31 369.78 369.70 
     2 369.79 369.81   0.00 370.15 369.68 369.89 369.96 370.07 
     3 369.73 369.56 369.68   0.00 369.51 369.73 369.81 369.81 
     4 370.00 369.95 369.80 369.75   0.00 369.65 370.09 370.10 
     5 369.62 369.76 369.81 369.50 369.98   0.00 369.85 369.63 
     6 369.37 369.41 369.39 369.63 369.46 369.49   0.00 369.73 
     7 370.16 369.89 369.77 369.77 369.99 370.08 370.16   0.00 

memcpyKernel Read 128b stride 32 threads/SM:

Measured bandwidth in GB/s of size 512 MiB:
            0      1      2      3      4      5      6      7 
--------------------------------------------------------------
     0   0.00 367.22 366.71 367.05 366.37 367.02 366.87 367.14 
     1 368.65   0.00 368.62 369.09 368.80 368.60 369.26 369.12 
     2 368.54 368.76   0.00 369.00 368.88 368.40 369.01 368.72 
     3 366.98 366.61 367.06   0.00 366.34 366.78 367.30 367.15 
     4 368.92 369.00 369.15 369.11   0.00 369.20 368.99 368.92 
     5 369.09 368.59 368.59 368.38 368.42   0.00 368.59 368.80 
     6 366.94 367.17 366.94 366.93 366.89 366.67   0.00 366.43 
     7 369.04 369.27 369.33 368.63 369.11 369.09 368.70   0.00 

memcpyKernel Read 128b stride 128 threads/SM:

Measured bandwidth in GB/s of size 512 MiB:
            0      1      2      3      4      5      6      7 
--------------------------------------------------------------
     0   0.00 368.90 369.14 368.94 369.06 369.06 369.11 369.00 
     1 369.59   0.00 369.74 369.71 369.49 369.79 369.56 369.79 
     2 369.84 370.01   0.00 369.74 370.13 369.92 369.92 370.09 
     3 368.96 369.05 368.90   0.00 369.21 369.22 369.28 369.03 
     4 370.07 370.16 370.01 370.63   0.00 369.96 370.24 370.23 
     5 369.87 369.72 369.91 370.11 369.76   0.00 369.97 369.99 
     6 369.11 369.14 369.01 368.85 368.86 369.37   0.00 369.32 
     7 369.98 370.12 370.06 370.33 369.98 370.21 370.34   0.00 

memcpyKernel Read 128b stride 512 threads/SM:

Measured bandwidth in GB/s of size 512 MiB:
            0      1      2      3      4      5      6      7 
--------------------------------------------------------------
     0   0.00 370.90 370.39 370.69 370.73 370.63 370.70 370.71 
     1 370.83   0.00 370.83 370.67 370.78 370.78 370.77 370.90 
     2 370.94 370.71   0.00 370.79 370.75 370.81 370.68 370.69 
     3 370.66 370.73 370.91   0.00 370.49 371.00 370.48 370.66 
     4 370.69 370.79 370.61 370.83   0.00 370.64 370.95 370.66 
     5 370.78 370.76 370.60 370.71 371.00   0.00 370.77 370.86 
     6 370.50 370.91 370.84 370.38 370.56 370.67   0.00 370.66 
     7 370.78 370.78 370.53 370.59 370.75 370.72 370.76   0.00 

PASS
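
Not part of this PR: a minimal sketch for summarizing the matrices above from a saved copy of the launcher log, so runs can be compared without reading every cell. The name `summarize_bandwidth` and the parsing rules are assumptions based only on the output format shown here (a title line ending in `:`, then rows of the form `<src> <bw> <bw> ...`). Diagonal entries (the `0.00` self-copies) are excluded.

```python
import re
from statistics import mean

# A data row: an integer source rank followed by decimal bandwidth values.
# The column-header row has no decimals, so it does not match.
ROW = re.compile(r"^\s*(\d+)((?:\s+\d+\.\d+)+)\s*$")


def summarize_bandwidth(log_text):
    """Return {test title: (min, mean, max)} over off-diagonal GB/s values."""
    samples = {}
    title = None
    for line in log_text.splitlines():
        stripped = line.strip()
        # Titles such as "cuMemcpyAsync Write:" end with a colon; the
        # "Measured bandwidth ..." line also does, so it is skipped.
        if stripped.endswith(":") and not stripped.startswith("Measured"):
            title = stripped.rstrip(":")
            continue
        match = ROW.match(line)
        if match and title is not None:
            src = int(match.group(1))
            values = [float(v) for v in match.group(2).split()]
            samples.setdefault(title, []).extend(
                v for dst, v in enumerate(values) if dst != src
            )
    return {t: (min(v), mean(v), max(v)) for t, v in samples.items()}
```

For example, after saving the output with `kubectl logs mpi-memcpy-dra-test-launcher-fg9lf > launcher.log`, calling `summarize_bandwidth(open("launcher.log").read())` gives one (min, mean, max) tuple per test title.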
