Fix unit test #488

Merged
merged 1 commit into from
Oct 8, 2024

@Issacwww Issacwww commented Oct 8, 2024

Issue #, if available:

Description of changes:
Encountered several issues when executing the unit tests on Bottlerocket:

1. `nvidia-smi` was not found:

# Running tests in gpu_unit_tests/tests/test_basic.sh
common.sh: line 14: nvidia-smi: command not found

Fixed by adding a resource limit.
2. `dcgmi` was not found. Fixed by installing datacenter-gpu-manager.
3. Expected data files for `test_sysinfo.sh` were missing. Followed the README to add them.
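
Issues 1 and 2 are both missing-binary failures, so a pre-flight check could surface them before any test runs. This is a minimal sketch, assuming the tests need `nvidia-smi` and `dcgmi` on `PATH`; the `missing_tools` helper is hypothetical and not part of this change:

```shell
#!/bin/sh
# Hypothetical helper: print each named command that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
    fi
  done
}

# Assumed tool list, taken from the two failures described above.
missing=$(missing_tools nvidia-smi dcgmi)
if [ -n "$missing" ]; then
  # Unquoted on purpose so the names collapse onto one line.
  echo "missing required tools:" $missing >&2
fi
```

Reporting every missing tool at once, rather than stopping at the first, avoids fixing one failure only to hit the next on the following run.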

Testing:

k logs unit-test-job-nb6gn
# Running tests in gpu_unit_tests/tests/test_basic.sh
ok - test_01_device_query
ok - test_02_vector_add
ok - test_03_bandwidth
ok - test_04_bus_grind
ok - test_05_dcgm_diagnostics
# Running tests in gpu_unit_tests/tests/test_sysinfo.sh
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:02:03 --:--:--     0
curl: (56) Recv failure: Connection reset by peer
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    10  100    10    0     0  12787      0 --:--:-- --:--:-- --:--:-- 10000
ok - test_numa_topo_topo
ok - test_nvidia_gpu_count
ok - test_nvidia_gpu_throttled
ok - test_nvidia_gpu_unused
not ok - test_nvidia_persistence_status
# Unexpected perfistance status, likely system configuration issue
#  test data value diff:
# --- test_sysinfo.sh.data/g5.8xlarge/nvidia_persistence_status.txt	2024-10-07 22:15:11.000000000 +0000
# +++ /tmp/test_sysinfo.sh.actual-data.tgP/nvidia_persistence_status.txt	2024-10-08 07:37:30.716879902 +0000
# @@ -1,2 +1,2 @@
#  name, pci.bus_id, persistence_mode
# -NVIDIA A10G, 00000000:00:1E.0, Enabled
# +NVIDIA A10G, 00000000:00:1E.0, Disabled
# common.sh:32:_assert_data()
# common.sh:37:assert_data()
# test_sysinfo.sh:52:test_nvidia_persistence_status()
ok - test_nvidia_smi_topo
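
The one remaining failure is a data mismatch: the expected file for g5.8xlarge records persistence mode as Enabled, while the node reports Disabled. The header in the diff matches the CSV that `nvidia-smi --query-gpu=name,pci.bus_id,persistence_mode --format=csv` produces, so a quick check for affected GPUs could be sketched as follows. The `gpus_without_persistence` helper is hypothetical, and it assumes that three-column CSV layout on stdin:

```shell
#!/bin/sh
# Hypothetical helper: read "name, pci.bus_id, persistence_mode" CSV on stdin
# and print the bus ID of every GPU whose persistence mode is not Enabled.
gpus_without_persistence() {
  # Skip the header row (NR > 1); fields are separated by a comma and spaces.
  awk -F', *' 'NR > 1 && $3 != "Enabled" { print $2 }'
}

# Example using the row from the failing run above. On a GPU node the input
# would instead come from:
#   nvidia-smi --query-gpu=name,pci.bus_id,persistence_mode --format=csv
printf '%s\n' \
  'name, pci.bus_id, persistence_mode' \
  'NVIDIA A10G, 00000000:00:1E.0, Disabled' | gpus_without_persistence
```

Whether the fix belongs in the node configuration or in the expected data file is not settled by this change.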

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@cartermckinnon cartermckinnon merged commit df5df8e into main Oct 8, 2024
5 checks passed
@cartermckinnon cartermckinnon deleted the fixUnitTest branch October 8, 2024 17:12