@alpha-baby commented Sep 24, 2025

1. Optimization Strategies

To be updated later.

2. Test environment

#nvidia-smi topo -m
        GPU0    GPU1    GPU2    GPU3    GPU4    GPU5    GPU6    GPU7    NIC0    NIC1    NIC2    NIC3    NIC4    NIC5    NIC6    NIC7    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      NV18    NV18    NV18    NV18    NV18    NV18    NV18    PIX     NODE    NODE    NODE    SYS     SYS     SYS     SYS     0-47,96-143     0               N/A
GPU1    NV18     X      NV18    NV18    NV18    NV18    NV18    NV18    NODE    PIX     NODE    NODE    SYS     SYS     SYS     SYS     0-47,96-143     0               N/A
GPU2    NV18    NV18     X      NV18    NV18    NV18    NV18    NV18    NODE    NODE    PIX     NODE    SYS     SYS     SYS     SYS     0-47,96-143     0               N/A
GPU3    NV18    NV18    NV18     X      NV18    NV18    NV18    NV18    NODE    NODE    NODE    PIX     SYS     SYS     SYS     SYS     0-47,96-143     0               N/A
GPU4    NV18    NV18    NV18    NV18     X      NV18    NV18    NV18    SYS     SYS     SYS     SYS     PIX     NODE    NODE    NODE    48-95,144-191   1               N/A
GPU5    NV18    NV18    NV18    NV18    NV18     X      NV18    NV18    SYS     SYS     SYS     SYS     NODE    PIX     NODE    NODE    48-95,144-191   1               N/A
GPU6    NV18    NV18    NV18    NV18    NV18    NV18     X      NV18    SYS     SYS     SYS     SYS     NODE    NODE    PIX     NODE    48-95,144-191   1               N/A
GPU7    NV18    NV18    NV18    NV18    NV18    NV18    NV18     X      SYS     SYS     SYS     SYS     NODE    NODE    NODE    PIX     48-95,144-191   1               N/A
NIC0    PIX     NODE    NODE    NODE    SYS     SYS     SYS     SYS      X      NODE    NODE    NODE    SYS     SYS     SYS     SYS
NIC1    NODE    PIX     NODE    NODE    SYS     SYS     SYS     SYS     NODE     X      NODE    NODE    SYS     SYS     SYS     SYS
NIC2    NODE    NODE    PIX     NODE    SYS     SYS     SYS     SYS     NODE    NODE     X      NODE    SYS     SYS     SYS     SYS
NIC3    NODE    NODE    NODE    PIX     SYS     SYS     SYS     SYS     NODE    NODE    NODE     X      SYS     SYS     SYS     SYS
NIC4    SYS     SYS     SYS     SYS     PIX     NODE    NODE    NODE    SYS     SYS     SYS     SYS      X      NODE    NODE    NODE
NIC5    SYS     SYS     SYS     SYS     NODE    PIX     NODE    NODE    SYS     SYS     SYS     SYS     NODE     X      NODE    NODE
NIC6    SYS     SYS     SYS     SYS     NODE    NODE    PIX     NODE    SYS     SYS     SYS     SYS     NODE    NODE     X      NODE
NIC7    SYS     SYS     SYS     SYS     NODE    NODE    NODE    PIX     SYS     SYS     SYS     SYS     NODE    NODE    NODE     X 

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

NIC Legend:

  NIC0: mlx5_bond_0
  NIC1: mlx5_bond_1
  NIC2: mlx5_bond_2
  NIC3: mlx5_bond_3
  NIC4: mlx5_bond_4
  NIC5: mlx5_bond_5
  NIC6: mlx5_bond_6
  NIC7: mlx5_bond_7
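
The matrix above shows a one-to-one PIX pairing: GPU i reaches NIC i through at most a single PCIe bridge, while NICs on the other NUMA node are reached over SYS. As a minimal sketch (not part of this PR), the pairing can be extracted from `nvidia-smi topo -m` output with a few lines of Python. The function name `pix_nics` is hypothetical, and the sample is a reduced excerpt whose cells are copied from the full matrix above.

```python
import re

def pix_nics(topo_text: str) -> dict[str, list[str]]:
    """Map each GPU to the NICs it reaches over a PIX link."""
    rows = [line.split() for line in topo_text.strip().splitlines()]
    # Device columns (GPU0, NIC3, ...) come first in the header; the
    # affinity columns that follow are ignored.
    devices = [tok for tok in rows[0] if re.fullmatch(r"(GPU|NIC)\d+", tok)]
    result = {}
    for row in rows[1:]:
        name, cells = row[0], row[1:1 + len(devices)]
        if name.startswith("GPU"):
            result[name] = [dev for dev, link in zip(devices, cells)
                            if dev.startswith("NIC") and link == "PIX"]
    return result

# Reduced excerpt of the matrix above (GPU0/GPU4 against NIC0/NIC4 only).
sample = """
        GPU0    GPU4    NIC0    NIC4
GPU0     X      NV18    PIX     SYS
GPU4    NV18     X      SYS     PIX
NIC0    PIX     SYS      X      SYS
NIC4    SYS     PIX     SYS      X
"""
print(pix_nics(sample))
```

On the full matrix the same function should pair GPU0 through GPU7 with NIC0 through NIC7 respectively, which matches the PIX diagonal in the table.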

NIC:

Mellanox [ConnectX-7]
2 ports
Per-port speed: 200 Gbps
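
For reference, the theoretical line rate of one NIC follows directly from these numbers. The sketch below assumes that each `mlx5_bond_N` device aggregates both ports and ignores protocol overhead; neither assumption is stated in this comment, so treat the result as a rough ceiling for inter-node traffic only.

```python
# Rough line-rate estimate for one bonded NIC.
# Assumptions: both ports are aggregated by the bond, no protocol overhead.
PORTS_PER_NIC = 2
PORT_SPEED_GBPS = 200  # gigabits per second, per port

def nic_line_rate_gbytes(ports: int = PORTS_PER_NIC,
                         gbps: int = PORT_SPEED_GBPS) -> float:
    """Return the theoretical line rate in GB/s (1 byte = 8 bits)."""
    return ports * gbps / 8

print(nic_line_rate_gbytes())  # 50.0
```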

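3. Performance test results

| # | Ranks | Dispatch avg_t (before opt) | Dispatch avg_t (after opt) | Reduction |
|---|-------|-----------------------------|----------------------------|-----------|
| 1 | 8     | 37.09 us                    | 60.72 us                   | -63.70%   |
| 2 | 16    | 142.64 us                   | 78.89 us                   | 44.69%    |
| 3 | 24    | 150.10 us                   | 91.19 us                   | 39.24%    |
| 4 | 32    | 169.08 us                   | 112.45 us                  | 33.49%    |
| 5 | 48    | 180.25 us                   | 135.50 us                  | 24.82%    |

A negative reduction means dispatch became slower after the change; this happens only in the single-node (8-rank) case.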
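
The reduction column appears to be computed as (before - after) / before. A minimal Python sketch to reproduce it from the before/after values in the table; the function name `reduction_pct` is chosen here for illustration only.

```python
def reduction_pct(before_us: float, after_us: float) -> float:
    """Percentage reduction in latency; a negative value means a slowdown."""
    return (before_us - after_us) / before_us * 100.0

# (ranks, dispatch avg_t before, dispatch avg_t after), in microseconds
rows = [
    (8, 37.09, 60.72),
    (16, 142.64, 78.89),
    (24, 150.10, 91.19),
    (32, 169.08, 112.45),
    (48, 180.25, 135.50),
]
for ranks, before, after in rows:
    print(f"{ranks:>2} ranks: {reduction_pct(before, after):+.2f}%")
```

The computed values agree with the table to within 0.01 percentage points; the small differences are consistent with the table truncating to two decimals rather than rounding.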

4. Raw performance test output

8 ranks (1 node)

before opt:

[rank 4] Dispatch + combine bandwidth: 193.41 GB/s, avg_t=114.00 us, min_t=111.01 us, max_t=117.50 us
[rank 2] Dispatch + combine bandwidth: 193.42 GB/s, avg_t=113.99 us, min_t=111.33 us, max_t=116.96 us
[rank 0] Dispatch + combine bandwidth: 193.48 GB/s, avg_t=113.95 us, min_t=111.01 us, max_t=117.09 us
[rank 1] Dispatch + combine bandwidth: 193.45 GB/s, avg_t=113.97 us, min_t=111.17 us, max_t=117.66 us
[rank 6] Dispatch + combine bandwidth: 193.37 GB/s, avg_t=114.02 us, min_t=111.36 us, max_t=117.95 us
[rank 7] Dispatch + combine bandwidth: 193.70 GB/s, avg_t=113.83 us, min_t=110.40 us, max_t=116.64 us
[rank 3] Dispatch + combine bandwidth: 193.27 GB/s, avg_t=114.08 us, min_t=110.43 us, max_t=116.54 us
[rank 5] Dispatch + combine bandwidth: 193.47 GB/s, avg_t=113.96 us, min_t=110.53 us, max_t=117.60 us
[rank 2] Dispatch bandwidth: 203.70 GB/s, avg_t=36.88 us | Combine bandwidth: 215.93 GB/s, avg_t=67.32 us
[rank 1] Dispatch bandwidth: 202.55 GB/s, avg_t=37.09 us | Combine bandwidth: 216.80 GB/s, avg_t=67.05 us
[rank 6] Dispatch bandwidth: 203.53 GB/s, avg_t=36.91 us | Combine bandwidth: 214.65 GB/s, avg_t=67.72 us
[rank 3] Dispatch bandwidth: 204.55 GB/s, avg_t=36.72 us | Combine bandwidth: 214.41 GB/s, avg_t=67.80 us
[rank 5] Dispatch bandwidth: 208.09 GB/s, avg_t=36.10 us | Combine bandwidth: 212.45 GB/s, avg_t=68.42 us
[rank 0] Dispatch bandwidth: 207.21 GB/s, avg_t=36.25 us | Combine bandwidth: 216.76 GB/s, avg_t=67.06 us
[rank 7] Dispatch bandwidth: 207.24 GB/s, avg_t=36.25 us | Combine bandwidth: 214.30 GB/s, avg_t=67.83 us
[rank 4] Dispatch bandwidth: 203.28 GB/s, avg_t=36.95 us | Combine bandwidth: 215.13 GB/s, avg_t=67.57 us
[rank 1] Dispatch send/recv time: 30.34 + 8.34 us | Combine send/recv time: 53.37 + 10.06 us
[rank 2] Dispatch send/recv time: 30.19 + 8.40 us | Combine send/recv time: 53.76 + 10.28 us
[rank 7] Dispatch send/recv time: 29.60 + 8.34 us | Combine send/recv time: 51.73 + 10.22 us
[rank 6] Dispatch send/recv time: 29.92 + 8.45 us | Combine send/recv time: 53.37 + 9.97 us
[rank 0] Dispatch send/recv time: 29.06 + 8.65 us | Combine send/recv time: 53.17 + 10.08 us
[rank 3] Dispatch send/recv time: 29.42 + 9.52 us | Combine send/recv time: 50.76 + 10.21 us
[rank 5] Dispatch send/recv time: 29.09 + 8.31 us | Combine send/recv time: 51.54 + 10.21 us
[rank 4] Dispatch send/recv time: 29.60 + 8.36 us | Combine send/recv time: 51.17 + 10.31 us

after opt:

[rank 5] Dispatch + combine bandwidth: 150.51 GB/s, avg_t=146.49 us, min_t=141.22 us, max_t=151.04 us
[rank 0] Dispatch + combine bandwidth: 150.61 GB/s, avg_t=146.39 us, min_t=142.46 us, max_t=151.30 us
[rank 4] Dispatch + combine bandwidth: 150.60 GB/s, avg_t=146.40 us, min_t=143.17 us, max_t=149.95 us
[rank 1] Dispatch + combine bandwidth: 150.70 GB/s, avg_t=146.30 us, min_t=141.25 us, max_t=151.30 us
[rank 7] Dispatch + combine bandwidth: 150.46 GB/s, avg_t=146.54 us, min_t=142.66 us, max_t=150.75 us
[rank 2] Dispatch + combine bandwidth: 150.57 GB/s, avg_t=146.43 us, min_t=143.23 us, max_t=150.21 us
[rank 6] Dispatch + combine bandwidth: 150.54 GB/s, avg_t=146.46 us, min_t=141.98 us, max_t=150.53 us
[rank 3] Dispatch + combine bandwidth: 150.67 GB/s, avg_t=146.33 us, min_t=142.43 us, max_t=150.53 us
[rank 7] Dispatch bandwidth: 130.99 GB/s, avg_t=57.34 us | Combine bandwidth: 200.77 GB/s, avg_t=72.40 us
[rank 1] Dispatch bandwidth: 130.74 GB/s, avg_t=57.46 us | Combine bandwidth: 199.87 GB/s, avg_t=72.73 us
[rank 0] Dispatch bandwidth: 130.26 GB/s, avg_t=57.67 us | Combine bandwidth: 199.67 GB/s, avg_t=72.80 us
[rank 5] Dispatch bandwidth: 131.90 GB/s, avg_t=56.95 us | Combine bandwidth: 199.32 GB/s, avg_t=72.93 us
[rank 4] Dispatch bandwidth: 130.29 GB/s, avg_t=57.65 us | Combine bandwidth: 201.46 GB/s, avg_t=72.16 us
[rank 2] Dispatch bandwidth: 130.54 GB/s, avg_t=57.54 us | Combine bandwidth: 200.89 GB/s, avg_t=72.36 us
[rank 3] Dispatch bandwidth: 123.72 GB/s, avg_t=60.72 us | Combine bandwidth: 208.88 GB/s, avg_t=69.59 us
[rank 6] Dispatch bandwidth: 129.81 GB/s, avg_t=57.87 us | Combine bandwidth: 200.89 GB/s, avg_t=72.36 us
[rank 7] Dispatch send/recv time: 20.85 + 31.38 us | Combine send/recv time: 64.06 + 10.00 us
[rank 2] Dispatch send/recv time: 21.75 + 35.04 us | Combine send/recv time: 63.83 + 10.01 us
[rank 1] Dispatch send/recv time: 21.90 + 33.83 us | Combine send/recv time: 61.35 + 9.81 us
[rank 5] Dispatch send/recv time: 21.73 + 30.19 us | Combine send/recv time: 57.54 + 9.86 us
[rank 0] Dispatch send/recv time: 22.02 + 30.42 us | Combine send/recv time: 54.38 + 9.97 us
[rank 6] Dispatch send/recv time: 21.14 + 30.60 us | Combine send/recv time: 61.88 + 9.83 us
[rank 3] Dispatch send/recv time: 21.56 + 38.61 us | Combine send/recv time: 58.21 + 9.92 us
[rank 4] Dispatch send/recv time: 21.82 + 32.01 us | Combine send/recv time: 58.74 + 9.87 us

16 ranks (2 nodes)

before opt:

[rank 4] Dispatch bandwidth: 124.77 GB/s, avg_t=176.72 us, min_t=34.94 us, max_t=326.62 us
[rank 0] Dispatch bandwidth: 124.48 GB/s, avg_t=177.12 us, min_t=34.40 us, max_t=332.16 us
[rank 5] Dispatch bandwidth: 124.43 GB/s, avg_t=177.20 us, min_t=36.03 us, max_t=327.01 us
[rank 7] Dispatch bandwidth: 124.57 GB/s, avg_t=176.99 us, min_t=37.02 us, max_t=330.78 us
[rank 1] Dispatch bandwidth: 124.44 GB/s, avg_t=177.19 us, min_t=36.10 us, max_t=332.26 us
[rank 2] Dispatch bandwidth: 124.47 GB/s, avg_t=177.14 us, min_t=37.18 us, max_t=324.96 us
[rank 6] Dispatch bandwidth: 124.58 GB/s, avg_t=176.98 us, min_t=35.23 us, max_t=332.00 us
[rank 3] Dispatch bandwidth: 124.60 GB/s, avg_t=176.96 us, min_t=38.50 us, max_t=330.50 us
[rank 4] Dispatch bandwidth: 52.73 GB/s, avg_t=142.45 us | 
[rank 2] Dispatch bandwidth: 52.72 GB/s, avg_t=142.49 us | 
[rank 1] Dispatch bandwidth: 52.63 GB/s, avg_t=142.73 us | 
[rank 7] Dispatch bandwidth: 52.66 GB/s, avg_t=142.64 us | 
[rank 3] Dispatch bandwidth: 52.59 GB/s, avg_t=142.85 us | 
[rank 5] Dispatch bandwidth: 52.72 GB/s, avg_t=142.49 us | 
[rank 6] Dispatch bandwidth: 52.78 GB/s, avg_t=142.31 us | 
[rank 0] Dispatch bandwidth: 52.85 GB/s, avg_t=142.13 us | 
[rank 4] Dispatch send/recv time: 24.87 + 8.33 us | 
[rank 2] Dispatch send/recv time: 24.84 + 8.30 us | 
[rank 7] Dispatch send/recv time: 25.97 + 9.27 us | 
[rank 1] Dispatch send/recv time: 24.12 + 8.46 us | 
[rank 6] Dispatch send/recv time: 24.75 + 8.44 us | 
[rank 5] Dispatch send/recv time: 24.80 + 8.35 us | 
[rank 3] Dispatch send/recv time: 25.10 + 10.34 us | 
[rank 0] Dispatch send/recv time: 24.80 + 8.60 us | 

after opt:

[rank 0] Dispatch + combine bandwidth: 76.63 GB/s, avg_t=287.73 us, min_t=273.79 us, max_t=317.86 us
[rank 2] Dispatch + combine bandwidth: 76.64 GB/s, avg_t=287.69 us, min_t=276.22 us, max_t=314.69 us
[rank 5] Dispatch + combine bandwidth: 76.59 GB/s, avg_t=287.87 us, min_t=269.54 us, max_t=314.62 us
[rank 7] Dispatch + combine bandwidth: 76.58 GB/s, avg_t=287.91 us, min_t=270.02 us, max_t=321.15 us
[rank 4] Dispatch + combine bandwidth: 76.55 GB/s, avg_t=288.02 us, min_t=273.31 us, max_t=320.29 us
[rank 6] Dispatch + combine bandwidth: 76.59 GB/s, avg_t=287.86 us, min_t=273.82 us, max_t=318.94 us
[rank 3] Dispatch + combine bandwidth: 76.59 GB/s, avg_t=287.87 us, min_t=272.29 us, max_t=313.57 us
[rank 1] Dispatch + combine bandwidth: 76.58 GB/s, avg_t=287.92 us, min_t=272.93 us, max_t=315.46 us
[rank 5] Dispatch bandwidth: 95.22 GB/s, avg_t=78.89 us | Combine bandwidth: 73.07 GB/s, avg_t=198.93 us
[rank 2] Dispatch bandwidth: 97.53 GB/s, avg_t=77.02 us | Combine bandwidth: 72.38 GB/s, avg_t=200.83 us
[rank 0] Dispatch bandwidth: 102.81 GB/s, avg_t=73.06 us | Combine bandwidth: 71.10 GB/s, avg_t=204.45 us
[rank 4] Dispatch bandwidth: 102.05 GB/s, avg_t=73.61 us | Combine bandwidth: 71.15 GB/s, avg_t=204.32 us
[rank 7] Dispatch bandwidth: 101.21 GB/s, avg_t=74.22 us | Combine bandwidth: 71.52 GB/s, avg_t=203.24 us
[rank 1] Dispatch bandwidth: 100.77 GB/s, avg_t=74.55 us | Combine bandwidth: 71.63 GB/s, avg_t=202.95 us
[rank 6] Dispatch bandwidth: 103.37 GB/s, avg_t=72.67 us | Combine bandwidth: 70.81 GB/s, avg_t=205.29 us
[rank 3] Dispatch bandwidth: 99.26 GB/s, avg_t=75.68 us | Combine bandwidth: 72.15 GB/s, avg_t=201.48 us
[rank 2] Dispatch send/recv time: 35.76 + 30.44 us | Combine send/recv time: 41.99 + 10.20 us
[rank 4] Dispatch send/recv time: 37.21 + 31.54 us | Combine send/recv time: 39.77 + 10.13 us
[rank 7] Dispatch send/recv time: 35.38 + 36.78 us | Combine send/recv time: 39.66 + 10.14 us
[rank 5] Dispatch send/recv time: 36.69 + 29.78 us | Combine send/recv time: 40.65 + 10.19 us
[rank 0] Dispatch send/recv time: 37.38 + 29.93 us | Combine send/recv time: 38.31 + 10.20 us
[rank 1] Dispatch send/recv time: 36.32 + 32.89 us | Combine send/recv time: 42.16 + 10.19 us
[rank 6] Dispatch send/recv time: 36.71 + 30.98 us | Combine send/recv time: 40.65 + 10.20 us
[rank 3] Dispatch send/recv time: 35.30 + 35.37 us | Combine send/recv time: 39.19 + 10.22 us

24 ranks (3 nodes)

before opt:

[rank 7] Dispatch + combine bandwidth: 52.78 GB/s, avg_t=417.72 us, min_t=406.78 us, max_t=428.77 us
[rank 2] Dispatch + combine bandwidth: 52.80 GB/s, avg_t=417.58 us, min_t=404.42 us, max_t=433.82 us
[rank 5] Dispatch + combine bandwidth: 52.77 GB/s, avg_t=417.80 us, min_t=406.94 us, max_t=430.69 us
[rank 3] Dispatch + combine bandwidth: 52.80 GB/s, avg_t=417.55 us, min_t=404.90 us, max_t=434.14 us
[rank 1] Dispatch + combine bandwidth: 52.78 GB/s, avg_t=417.75 us, min_t=409.15 us, max_t=433.50 us
[rank 4] Dispatch + combine bandwidth: 52.80 GB/s, avg_t=417.60 us, min_t=404.80 us, max_t=436.48 us
[rank 0] Dispatch + combine bandwidth: 52.78 GB/s, avg_t=417.74 us, min_t=403.97 us, max_t=439.62 us
[rank 6] Dispatch + combine bandwidth: 52.80 GB/s, avg_t=417.56 us, min_t=403.10 us, max_t=441.09 us
[rank 2] Dispatch bandwidth: 50.05 GB/s, avg_t=150.10 us | Combine bandwidth: 56.30 GB/s, avg_t=258.22 us
[rank 1] Dispatch bandwidth: 52.12 GB/s, avg_t=144.11 us | Combine bandwidth: 54.96 GB/s, avg_t=264.50 us
[rank 3] Dispatch bandwidth: 50.96 GB/s, avg_t=147.41 us | Combine bandwidth: 55.63 GB/s, avg_t=261.33 us
[rank 5] Dispatch bandwidth: 49.03 GB/s, avg_t=153.21 us | Combine bandwidth: 56.98 GB/s, avg_t=255.13 us
[rank 6] Dispatch bandwidth: 51.94 GB/s, avg_t=144.61 us | Combine bandwidth: 55.13 GB/s, avg_t=263.69 us
[rank 0] Dispatch bandwidth: 50.73 GB/s, avg_t=148.06 us | Combine bandwidth: 55.73 GB/s, avg_t=260.85 us
[rank 4] Dispatch bandwidth: 50.70 GB/s, avg_t=148.16 us | Combine bandwidth: 55.84 GB/s, avg_t=260.35 us
[rank 7] Dispatch bandwidth: 52.29 GB/s, avg_t=143.67 us | Combine bandwidth: 54.84 GB/s, avg_t=265.09 us
[rank 2] Dispatch send/recv time: 22.09 + 8.32 us | Combine send/recv time: 24.02 + 9.97 us
[rank 7] Dispatch send/recv time: 23.92 + 8.11 us | Combine send/recv time: 24.68 + 9.96 us
[rank 1] Dispatch send/recv time: 22.74 + 8.39 us | Combine send/recv time: 24.52 + 9.98 us
[rank 6] Dispatch send/recv time: 22.50 + 8.25 us | Combine send/recv time: 23.71 + 9.77 us
[rank 3] Dispatch send/recv time: 22.58 + 9.24 us | Combine send/recv time: 24.44 + 9.97 us
[rank 4] Dispatch send/recv time: 22.85 + 9.44 us | Combine send/recv time: 24.36 + 10.02 us
[rank 5] Dispatch send/recv time: 22.66 + 10.69 us | Combine send/recv time: 26.37 + 9.95 us
[rank 0] Dispatch send/recv time: 22.78 + 8.54 us | Combine send/recv time: 25.32 + 10.01 us

after opt:

[rank 0] Dispatch + combine bandwidth: 61.06 GB/s, avg_t=361.12 us, min_t=352.35 us, max_t=372.74 us
[rank 1] Dispatch + combine bandwidth: 61.01 GB/s, avg_t=361.39 us, min_t=351.10 us, max_t=374.24 us
[rank 4] Dispatch + combine bandwidth: 61.03 GB/s, avg_t=361.27 us, min_t=353.57 us, max_t=373.47 us
[rank 6] Dispatch + combine bandwidth: 61.00 GB/s, avg_t=361.45 us, min_t=351.17 us, max_t=369.22 us
[rank 5] Dispatch + combine bandwidth: 61.01 GB/s, avg_t=361.36 us, min_t=350.78 us, max_t=382.82 us
[rank 3] Dispatch + combine bandwidth: 61.04 GB/s, avg_t=361.24 us, min_t=351.74 us, max_t=374.56 us
[rank 2] Dispatch + combine bandwidth: 61.01 GB/s, avg_t=361.40 us, min_t=351.74 us, max_t=373.12 us
[rank 7] Dispatch + combine bandwidth: 61.03 GB/s, avg_t=361.28 us, min_t=353.54 us, max_t=373.38 us
[rank 4] Dispatch bandwidth: 82.83 GB/s, avg_t=90.69 us | Combine bandwidth: 54.76 GB/s, avg_t=265.48 us
[rank 5] Dispatch bandwidth: 78.13 GB/s, avg_t=96.14 us | Combine bandwidth: 55.96 GB/s, avg_t=259.79 us
[rank 1] Dispatch bandwidth: 86.82 GB/s, avg_t=86.52 us | Combine bandwidth: 54.02 GB/s, avg_t=269.10 us
[rank 0] Dispatch bandwidth: 81.45 GB/s, avg_t=92.23 us | Combine bandwidth: 54.60 GB/s, avg_t=266.25 us
[rank 6] Dispatch bandwidth: 84.37 GB/s, avg_t=89.04 us | Combine bandwidth: 54.43 GB/s, avg_t=267.06 us
[rank 2] Dispatch bandwidth: 81.92 GB/s, avg_t=91.70 us | Combine bandwidth: 55.26 GB/s, avg_t=263.05 us
[rank 3] Dispatch bandwidth: 82.37 GB/s, avg_t=91.19 us | Combine bandwidth: 55.23 GB/s, avg_t=263.22 us
[rank 7] Dispatch bandwidth: 84.25 GB/s, avg_t=89.16 us | Combine bandwidth: 54.47 GB/s, avg_t=266.87 us
[rank 6] Dispatch send/recv time: 38.72 + 30.30 us | Combine send/recv time: 34.87 + 10.26 us
[rank 1] Dispatch send/recv time: 40.55 + 31.63 us | Combine send/recv time: 35.84 + 10.32 us
[rank 4] Dispatch send/recv time: 39.12 + 36.25 us | Combine send/recv time: 33.65 + 10.10 us
[rank 5] Dispatch send/recv time: 37.35 + 35.85 us | Combine send/recv time: 33.42 + 10.13 us
[rank 0] Dispatch send/recv time: 39.23 + 30.46 us | Combine send/recv time: 32.34 + 10.21 us
[rank 3] Dispatch send/recv time: 38.59 + 36.25 us | Combine send/recv time: 34.79 + 10.17 us
[rank 2] Dispatch send/recv time: 37.19 + 31.72 us | Combine send/recv time: 34.42 + 10.15 us
[rank 7] Dispatch send/recv time: 37.80 + 30.44 us | Combine send/recv time: 34.40 + 10.11 us

32 ranks (4 nodes)

before opt:

[rank 0] Dispatch + combine bandwidth: 47.20 GB/s, avg_t=467.13 us, min_t=456.80 us, max_t=481.44 us
[rank 7] Dispatch + combine bandwidth: 47.18 GB/s, avg_t=467.37 us, min_t=460.48 us, max_t=481.63 us
[rank 6] Dispatch + combine bandwidth: 47.18 GB/s, avg_t=467.33 us, min_t=459.87 us, max_t=479.71 us
[rank 1] Dispatch + combine bandwidth: 47.17 GB/s, avg_t=467.43 us, min_t=455.14 us, max_t=483.87 us
[rank 3] Dispatch + combine bandwidth: 47.18 GB/s, avg_t=467.36 us, min_t=458.69 us, max_t=480.58 us
[rank 2] Dispatch + combine bandwidth: 47.17 GB/s, avg_t=467.40 us, min_t=457.70 us, max_t=478.40 us
[rank 4] Dispatch + combine bandwidth: 47.18 GB/s, avg_t=467.37 us, min_t=454.30 us, max_t=481.31 us
[rank 5] Dispatch + combine bandwidth: 47.17 GB/s, avg_t=467.42 us, min_t=457.76 us, max_t=481.92 us
[rank 7] Dispatch bandwidth: 47.98 GB/s, avg_t=156.56 us | Combine bandwidth: 48.54 GB/s, avg_t=299.46 us
[rank 0] Dispatch bandwidth: 47.39 GB/s, avg_t=158.50 us | Combine bandwidth: 48.93 GB/s, avg_t=297.11 us
[rank 4] Dispatch bandwidth: 45.54 GB/s, avg_t=164.94 us | Combine bandwidth: 49.98 GB/s, avg_t=290.82 us
[rank 3] Dispatch bandwidth: 44.43 GB/s, avg_t=169.08 us | Combine bandwidth: 50.71 GB/s, avg_t=286.66 us
[rank 6] Dispatch bandwidth: 46.97 GB/s, avg_t=159.91 us | Combine bandwidth: 49.10 GB/s, avg_t=296.04 us
[rank 2] Dispatch bandwidth: 46.91 GB/s, avg_t=160.13 us | Combine bandwidth: 49.20 GB/s, avg_t=295.49 us
[rank 1] Dispatch bandwidth: 46.35 GB/s, avg_t=162.06 us | Combine bandwidth: 49.48 GB/s, avg_t=293.81 us
[rank 5] Dispatch bandwidth: 46.52 GB/s, avg_t=161.46 us | Combine bandwidth: 49.37 GB/s, avg_t=294.46 us
[rank 7] Dispatch send/recv time: 22.78 + 9.58 us | Combine send/recv time: 23.44 + 10.01 us
[rank 6] Dispatch send/recv time: 21.31 + 10.28 us | Combine send/recv time: 26.27 + 9.97 us
[rank 1] Dispatch send/recv time: 21.44 + 8.13 us | Combine send/recv time: 22.45 + 9.90 us
[rank 2] Dispatch send/recv time: 21.87 + 8.18 us | Combine send/recv time: 21.80 + 10.04 us
[rank 0] Dispatch send/recv time: 21.69 + 8.47 us | Combine send/recv time: 23.05 + 9.84 us
[rank 3] Dispatch send/recv time: 21.49 + 8.38 us | Combine send/recv time: 22.37 + 10.01 us
[rank 5] Dispatch send/recv time: 21.89 + 8.29 us | Combine send/recv time: 22.24 + 9.94 us
[rank 4] Dispatch send/recv time: 21.71 + 9.44 us | Combine send/recv time: 23.14 + 10.03 us

after opt:

[rank 0] Dispatch + combine bandwidth: 53.28 GB/s, avg_t=413.80 us, min_t=405.44 us, max_t=422.08 us
[rank 4] Dispatch + combine bandwidth: 53.25 GB/s, avg_t=414.04 us, min_t=400.58 us, max_t=427.94 us
[rank 7] Dispatch + combine bandwidth: 53.28 GB/s, avg_t=413.84 us, min_t=400.03 us, max_t=428.93 us
[rank 6] Dispatch + combine bandwidth: 53.29 GB/s, avg_t=413.73 us, min_t=405.22 us, max_t=425.98 us
[rank 3] Dispatch + combine bandwidth: 53.28 GB/s, avg_t=413.81 us, min_t=404.54 us, max_t=423.52 us
[rank 5] Dispatch + combine bandwidth: 53.26 GB/s, avg_t=413.97 us, min_t=405.09 us, max_t=422.56 us
[rank 2] Dispatch + combine bandwidth: 53.25 GB/s, avg_t=414.02 us, min_t=407.26 us, max_t=421.98 us
[rank 1] Dispatch + combine bandwidth: 53.25 GB/s, avg_t=414.05 us, min_t=399.81 us, max_t=431.62 us
[rank 4] Dispatch bandwidth: 68.90 GB/s, avg_t=109.02 us | Combine bandwidth: 48.84 GB/s, avg_t=297.64 us
[rank 3] Dispatch bandwidth: 65.03 GB/s, avg_t=115.51 us | Combine bandwidth: 50.20 GB/s, avg_t=289.57 us
[rank 1] Dispatch bandwidth: 68.48 GB/s, avg_t=109.69 us | Combine bandwidth: 49.13 GB/s, avg_t=295.88 us
[rank 2] Dispatch bandwidth: 70.58 GB/s, avg_t=106.42 us | Combine bandwidth: 48.60 GB/s, avg_t=299.12 us
[rank 6] Dispatch bandwidth: 68.83 GB/s, avg_t=109.14 us | Combine bandwidth: 49.02 GB/s, avg_t=296.58 us
[rank 5] Dispatch bandwidth: 66.80 GB/s, avg_t=112.45 us | Combine bandwidth: 49.49 GB/s, avg_t=293.72 us
[rank 7] Dispatch bandwidth: 72.37 GB/s, avg_t=103.80 us | Combine bandwidth: 48.22 GB/s, avg_t=301.46 us
[rank 0] Dispatch bandwidth: 68.94 GB/s, avg_t=108.96 us | Combine bandwidth: 48.46 GB/s, avg_t=299.97 us
[rank 5] Dispatch send/recv time: 40.07 + 29.81 us | Combine send/recv time: 29.81 + 10.20 us
[rank 3] Dispatch send/recv time: 40.86 + 31.51 us | Combine send/recv time: 31.53 + 10.11 us
[rank 4] Dispatch send/recv time: 41.47 + 36.13 us | Combine send/recv time: 32.43 + 10.25 us
[rank 6] Dispatch send/recv time: 41.31 + 40.32 us | Combine send/recv time: 35.72 + 10.06 us
[rank 1] Dispatch send/recv time: 41.63 + 32.59 us | Combine send/recv time: 33.85 + 10.09 us
[rank 7] Dispatch send/recv time: 40.06 + 36.35 us | Combine send/recv time: 34.64 + 10.12 us
[rank 2] Dispatch send/recv time: 40.58 + 31.48 us | Combine send/recv time: 35.93 + 10.08 us
[rank 0] Dispatch send/recv time: 41.50 + 30.18 us | Combine send/recv time: 30.64 + 10.09 us

48 ranks (6 nodes)

before opt:

[rank 0] Dispatch + combine bandwidth: 43.31 GB/s, avg_t=509.13 us, min_t=493.89 us, max_t=527.23 us
[rank 5] Dispatch + combine bandwidth: 43.32 GB/s, avg_t=508.98 us, min_t=494.18 us, max_t=525.98 us
[rank 1] Dispatch + combine bandwidth: 43.29 GB/s, avg_t=509.31 us, min_t=493.25 us, max_t=530.11 us
[rank 7] Dispatch + combine bandwidth: 43.31 GB/s, avg_t=509.12 us, min_t=496.32 us, max_t=532.03 us
[rank 4] Dispatch + combine bandwidth: 43.32 GB/s, avg_t=508.95 us, min_t=497.41 us, max_t=527.04 us
[rank 2] Dispatch + combine bandwidth: 43.30 GB/s, avg_t=509.17 us, min_t=497.47 us, max_t=531.46 us
[rank 6] Dispatch + combine bandwidth: 43.30 GB/s, avg_t=509.25 us, min_t=493.25 us, max_t=532.42 us
[rank 3] Dispatch + combine bandwidth: 43.30 GB/s, avg_t=509.21 us, min_t=495.78 us, max_t=533.66 us
[rank 4] Dispatch bandwidth: 43.16 GB/s, avg_t=174.05 us | Combine bandwidth: 44.37 GB/s, avg_t=327.61 us
[rank 3] Dispatch bandwidth: 42.39 GB/s, avg_t=177.19 us | Combine bandwidth: 45.09 GB/s, avg_t=322.36 us
[rank 0] Dispatch bandwidth: 42.64 GB/s, avg_t=176.16 us | Combine bandwidth: 44.47 GB/s, avg_t=326.86 us
[rank 2] Dispatch bandwidth: 42.44 GB/s, avg_t=176.98 us | Combine bandwidth: 45.00 GB/s, avg_t=323.02 us
[rank 5] Dispatch bandwidth: 41.15 GB/s, avg_t=182.56 us | Combine bandwidth: 45.67 GB/s, avg_t=318.33 us
[rank 6] Dispatch bandwidth: 41.67 GB/s, avg_t=180.25 us | Combine bandwidth: 45.31 GB/s, avg_t=320.83 us
[rank 7] Dispatch bandwidth: 43.40 GB/s, avg_t=173.06 us | Combine bandwidth: 44.58 GB/s, avg_t=326.06 us
[rank 1] Dispatch bandwidth: 42.50 GB/s, avg_t=176.75 us | Combine bandwidth: 44.90 GB/s, avg_t=323.79 us
[rank 5] Dispatch send/recv time: 20.77 + 8.57 us | Combine send/recv time: 21.94 + 10.22 us
[rank 7] Dispatch send/recv time: 21.43 + 8.36 us | Combine send/recv time: 21.87 + 10.09 us
[rank 4] Dispatch send/recv time: 20.83 + 8.24 us | Combine send/recv time: 22.26 + 10.10 us
[rank 2] Dispatch send/recv time: 20.31 + 9.28 us | Combine send/recv time: 24.11 + 9.90 us
[rank 3] Dispatch send/recv time: 20.77 + 8.51 us | Combine send/recv time: 22.74 + 10.13 us
[rank 6] Dispatch send/recv time: 20.36 + 9.27 us | Combine send/recv time: 24.15 + 10.04 us
[rank 0] Dispatch send/recv time: 20.71 + 8.62 us | Combine send/recv time: 23.05 + 10.17 us
[rank 1] Dispatch send/recv time: 20.61 + 8.45 us | Combine send/recv time: 21.61 + 10.11 us

after opt:

[rank 0] Dispatch + combine bandwidth: 47.09 GB/s, avg_t=468.18 us, min_t=453.57 us, max_t=482.69 us
[rank 2] Dispatch + combine bandwidth: 47.08 GB/s, avg_t=468.34 us, min_t=456.19 us, max_t=482.88 us
[rank 6] Dispatch + combine bandwidth: 47.06 GB/s, avg_t=468.48 us, min_t=449.06 us, max_t=492.03 us
[rank 3] Dispatch + combine bandwidth: 47.09 GB/s, avg_t=468.18 us, min_t=448.45 us, max_t=495.81 us
[rank 1] Dispatch + combine bandwidth: 47.09 GB/s, avg_t=468.23 us, min_t=453.15 us, max_t=483.42 us
[rank 5] Dispatch + combine bandwidth: 47.10 GB/s, avg_t=468.15 us, min_t=456.00 us, max_t=484.96 us
[rank 7] Dispatch + combine bandwidth: 47.07 GB/s, avg_t=468.39 us, min_t=455.07 us, max_t=482.78 us
[rank 4] Dispatch + combine bandwidth: 47.08 GB/s, avg_t=468.36 us, min_t=457.44 us, max_t=480.29 us
[rank 3] Dispatch bandwidth: 54.78 GB/s, avg_t=137.14 us | Combine bandwidth: 45.38 GB/s, avg_t=320.36 us
[rank 6] Dispatch bandwidth: 55.32 GB/s, avg_t=135.78 us | Combine bandwidth: 45.19 GB/s, avg_t=321.69 us
[rank 2] Dispatch bandwidth: 57.95 GB/s, avg_t=129.63 us | Combine bandwidth: 44.34 GB/s, avg_t=327.86 us
[rank 4] Dispatch bandwidth: 59.01 GB/s, avg_t=127.30 us | Combine bandwidth: 44.03 GB/s, avg_t=330.19 us
[rank 0] Dispatch bandwidth: 57.99 GB/s, avg_t=129.54 us | Combine bandwidth: 44.27 GB/s, avg_t=328.40 us
[rank 7] Dispatch bandwidth: 56.65 GB/s, avg_t=132.59 us | Combine bandwidth: 44.70 GB/s, avg_t=325.24 us
[rank 5] Dispatch bandwidth: 55.44 GB/s, avg_t=135.50 us | Combine bandwidth: 45.19 GB/s, avg_t=321.68 us
[rank 1] Dispatch bandwidth: 57.05 GB/s, avg_t=131.66 us | Combine bandwidth: 44.56 GB/s, avg_t=326.24 us
[rank 3] Dispatch send/recv time: 41.92 + 31.73 us | Combine send/recv time: 32.13 + 10.40 us
[rank 6] Dispatch send/recv time: 42.99 + 38.89 us | Combine send/recv time: 31.89 + 45.06 us
[rank 2] Dispatch send/recv time: 44.40 + 38.51 us | Combine send/recv time: 33.25 + 21.76 us
[rank 4] Dispatch send/recv time: 42.82 + 33.35 us | Combine send/recv time: 31.29 + 26.30 us
[rank 5] Dispatch send/recv time: 41.30 + 30.58 us | Combine send/recv time: 29.54 + 18.60 us
[rank 0] Dispatch send/recv time: 46.85 + 29.57 us | Combine send/recv time: 28.81 + 10.28 us
[rank 1] Dispatch send/recv time: 44.60 + 31.38 us | Combine send/recv time: 31.33 + 10.24 us
[rank 7] Dispatch send/recv time: 42.05 + 34.21 us | Combine send/recv time: 32.31 + 37.02 us
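
The raw logs above can be summarized per configuration rather than by picking a single rank. The following is a minimal sketch, not part of the benchmark itself: the regular expression is written against the "Dispatch bandwidth" line format shown in these logs, and the function name `dispatch_avg_times` is hypothetical. The sample uses three lines from the 8-rank "before opt" run.

```python
import re
from statistics import mean

# Matches the per-rank "Dispatch bandwidth" lines; the
# "Dispatch + combine bandwidth" lines do not match this pattern.
DISPATCH_RE = re.compile(
    r"\[rank (\d+)\] Dispatch bandwidth: ([\d.]+) GB/s, avg_t=([\d.]+) us"
)

def dispatch_avg_times(log_text: str) -> dict[int, float]:
    """Map rank -> dispatch avg_t in us (last matching line wins per rank)."""
    return {int(m.group(1)): float(m.group(3))
            for m in DISPATCH_RE.finditer(log_text)}

sample = """\
[rank 2] Dispatch bandwidth: 203.70 GB/s, avg_t=36.88 us | Combine bandwidth: 215.93 GB/s, avg_t=67.32 us
[rank 1] Dispatch bandwidth: 202.55 GB/s, avg_t=37.09 us | Combine bandwidth: 216.80 GB/s, avg_t=67.05 us
[rank 6] Dispatch bandwidth: 203.53 GB/s, avg_t=36.91 us | Combine bandwidth: 214.65 GB/s, avg_t=67.72 us
"""
times = dispatch_avg_times(sample)
print(f"mean={mean(times.values()):.2f} us, max={max(times.values()):.2f} us")
```

Note that in the 16-rank "before opt" log each rank has two lines starting with "Dispatch bandwidth", so this sketch would keep only the second one for each rank there.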
