[Bug] RuntimeError: Child process exited unsuccessfully with error code -6 #17495

Open
MehdiTantaoui-99 opened this issue Oct 29, 2024 · 0 comments
Labels: needs-triage, type: bug

I ran tuning on an ONNX model using the Python tvmc API. After roughly half of the tasks have been tuned, an error is thrown that stops the tuning and forces a restart from the beginning. This has happened multiple times.

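from tvm.driver import tvmc

# Perform actual tuning with selected tasks.
# model, target, tuning_records, and args are defined earlier in the script (not shown).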
tvmc.tune(
    model,
    target=target,
    tuning_records=tuning_records,
    enable_autoscheduler=args.enable_autoscheduler,
    trials=args.tuning_trials,
    early_stopping=args.early_stopping,
    timeout=20,
)
print("Tuning completed.")
----------------------------------------------------------------------
|  ID  |                       Task Description                        | Latency (ms) | Speed (GFLOPS) | Trials |
-----------------------------------------------------------------------------------------------------------------
|    0 |                                    vm_mod_fused_nn_conv2d_add |        0.012 |         652.45 |     18 |
|    1 |                          vm_mod_fused_nn_conv2d_add_nn_relu_5 |        0.084 |        3351.34 |     18 |
|    2 |                              vm_mod_fused_nn_conv2d_add_add_3 |        0.028 |        4974.26 |     18 |
|    3 |                          vm_mod_fused_nn_conv2d_add_nn_relu_1 |        0.169 |        4028.20 |     18 |
|    4 |                        vm_mod_fused_nn_conv2d_add_add_nn_relu |        0.304 |        5958.98 |     18 |
|    5 |                      vm_mod_fused_nn_conv2d_add_add_nn_relu_5 |        0.129 |        3509.89 |     18 |
|    6 |                          vm_mod_fused_nn_conv2d_add_nn_relu_8 |        0.124 |        1992.50 |     18 |
|    7 |                              vm_mod_fused_nn_conv2d_add_add_1 |        0.087 |        3123.05 |     18 |
|    8 |                      vm_mod_fused_nn_conv2d_add_add_nn_relu_3 |        0.255 |        4438.36 |     18 |
|    9 |                          vm_mod_fused_nn_conv2d_add_nn_relu_4 |        0.267 |        5502.94 |     18 |
|   10 |                      vm_mod_fused_nn_conv2d_add_add_nn_relu_7 |        0.082 |        3001.29 |     18 |
|   11 |                      vm_mod_fused_nn_conv2d_add_add_nn_relu_1 |        0.426 |        5669.90 |     18 |
|   12 |                              vm_mod_fused_nn_conv2d_add_add_6 |        0.023 |        2781.69 |     18 |
|   13 |                            vm_mod_fused_nn_conv2d_add_nn_relu |        0.170 |        5459.73 |     18 |
|   14 |                          vm_mod_fused_nn_conv2d_add_nn_relu_7 |        0.165 |        3657.21 |     18 |
|   15 |                                vm_mod_fused_nn_conv2d_add_add |            - |              - |      0 |
|   16 |                              vm_mod_fused_nn_conv2d_add_add_4 |            - |              - |      0 |
|   17 |                          vm_mod_fused_nn_conv2d_add_nn_relu_3 |            - |              - |      0 |
|   18 |                      vm_mod_fused_nn_conv2d_add_add_nn_relu_6 |            - |              - |      0 |
|   19 |                              vm_mod_fused_nn_conv2d_add_add_2 |            - |              - |      0 |
|   20 |                      vm_mod_fused_nn_conv2d_add_add_nn_relu_4 |            - |              - |      0 |
|   21 |                          vm_mod_fused_nn_conv2d_add_nn_relu_6 |            - |              - |      0 |
|   22 |                            vm_mod_fused_nn_conv2d_add_sigmoid |            - |              - |      0 |
|   23 |                          vm_mod_fused_nn_conv2d_add_nn_relu_2 |            - |              - |      0 |
|   24 |                      vm_mod_fused_nn_conv2d_add_add_nn_relu_2 |            - |              - |      0 |
|   25 |                              vm_mod_fused_nn_conv2d_add_add_7 |            - |              - |      0 |
|   26 |                              vm_mod_fused_nn_conv2d_add_add_5 |            - |              - |      0 |
-----------------------------------------------------------------------------------------------------------------
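
In the table above, tasks 0 through 14 show 18 trials each and tasks 15 through 26 show 0, so 15 of 27 tasks had been measured when the run stopped. A minimal sketch for counting this from a saved log is shown here; `summarize_task_table` is a hypothetical helper (not part of TVM), and it assumes the five-column layout printed above.

```python
from typing import Tuple


def summarize_task_table(table_text: str) -> Tuple[int, int]:
    """Return (tuned, pending) task counts from a progress table.

    Hypothetical helper, not part of TVM. Assumes five pipe-separated
    columns with the trial count in the last one, as in the table above.
    """
    tuned = pending = 0
    for line in table_text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Data rows have five cells and a numeric task ID in the first one;
        # header and separator lines are skipped.
        if len(cells) != 5 or not cells[0].isdigit():
            continue
        if int(cells[4]) > 0:
            tuned += 1
        else:
            pending += 1
    return tuned, pending


# Two rows copied from the table above, plus the header and a separator.
sample = """\
|  ID  | Task Description | Latency (ms) | Speed (GFLOPS) | Trials |
--------------------------------------------------------------------
|   14 | vm_mod_fused_nn_conv2d_add_nn_relu_7 | 0.165 | 3657.21 | 18 |
|   15 | vm_mod_fused_nn_conv2d_add_add | - | - | 0 |
"""
print(summarize_task_table(sample))  # (1, 1)
```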

Expected behavior

Tuning should complete for all tasks.

Actual behavior

The run stops with the following error:

terminate called after throwing an instance of 'tvm::runtime::InternalError'
  what():  [13:54:11] /home/ubuntu/tvm/src/runtime/cuda/cuda_device_api.cc:312: InternalError: Check failed: (e == cudaSuccess || e == cudaErrorCudartUnloading) is false: CUDA: misaligned address
Stack trace:
  0: tvm::runtime::CUDATimerNode::~CUDATimerNode()
        at /home/ubuntu/tvm/src/runtime/cuda/cuda_device_api.cc:312
  1: tvm::runtime::SimpleObjAllocator::Handler<tvm::runtime::CUDATimerNode>::Deleter_(tvm::runtime::Object*)
        at /home/ubuntu/tvm/include/tvm/runtime/memory.h:138
  2: tvm::runtime::ObjectPtr<tvm::runtime::Object>::reset()
        at /home/ubuntu/tvm/include/tvm/runtime/object.h:455
  3: tvm::runtime::ObjectPtr<tvm::runtime::Object>::~ObjectPtr()
        at /home/ubuntu/tvm/include/tvm/runtime/object.h:404
  4: tvm::runtime::ObjectRef::~ObjectRef()
        at /home/ubuntu/tvm/include/tvm/runtime/object.h:519
  5: tvm::runtime::Timer::~Timer()
        at /home/ubuntu/tvm/include/tvm/runtime/profiling.h:86
  6: operator()
        at /home/ubuntu/tvm/src/runtime/profiling.cc:915
  7: tvm::runtime::LocalSession::CallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::TVMArgs)> const&)
        at /home/ubuntu/tvm/src/runtime/rpc/rpc_local_session.cc:107
  8: tvm::runtime::RPCSession::AsyncCallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::RPCCode, tvm::runtime::TVMArgs)>)
        at /home/ubuntu/tvm/src/runtime/rpc/rpc_session.cc:47
  9: tvm::runtime::RPCEndpoint::EventHandler::HandleNormalCallFunc()
        at /home/ubuntu/tvm/src/runtime/rpc/rpc_endpoint.cc:542
  10: tvm::runtime::RPCEndpoint::EventHandler::HandleProcessPacket(std::function<void (tvm::runtime::TVMArgs)>)
        at /home/ubuntu/tvm/src/runtime/rpc/rpc_endpoint.cc:362
  11: tvm::runtime::RPCEndpoint::EventHandler::HandleNextEvent(bool, bool, std::function<void (tvm::runtime::TVMArgs)>)
        at /home/ubuntu/tvm/src/runtime/rpc/rpc_endpoint.cc:136
  12: tvm::runtime::RPCEndpoint::HandleUntilReturnEvent(bool, std::function<void (tvm::runtime::TVMArgs)>)
        at /home/ubuntu/tvm/src/runtime/rpc/rpc_endpoint.cc:714
  13: tvm::runtime::RPCEndpoint::ServerLoop()
        at /home/ubuntu/tvm/src/runtime/rpc/rpc_endpoint.cc:805
  14: tvm::runtime::RPCServerLoop(int)
        at /home/ubuntu/tvm/src/runtime/rpc/rpc_socket_impl.cc:119
  15: operator()
        at /home/ubuntu/tvm/src/runtime/rpc/rpc_socket_impl.cc:138

Exception in thread Thread-1 (_listen_loop):
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/tvm-build-venv/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "/home/ubuntu/miniconda3/envs/tvm-build-venv/lib/python3.11/threading.py", line 982, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/tvm/python/tvm/rpc/server.py", line 279, in _listen_loop
    _serving(conn, addr, opts, load_library)
  File "/home/ubuntu/tvm/python/tvm/rpc/server.py", line 168, in _serving
    raise RuntimeError(
RuntimeError: Child process 49293 exited unsuccessfully with error code -6
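
A note on the exit code: Python's `subprocess` and `multiprocessing` modules report a child killed by signal N as exit code -N. Assuming the RPC server follows that convention here, -6 corresponds to SIGABRT on Linux. That would be consistent with the `terminate called after throwing an instance of ...` message above, since an uncaught C++ exception ends in `std::terminate`, which calls `abort()` by default. The sketch below decodes such a code; `describe_exit_code` is a hypothetical helper, not part of TVM.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Describe a child exit code using the negative-means-signal convention."""
    if code < 0:
        try:
            name = signal.Signals(-code).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {-code}"
        return f"terminated by {name}"
    return f"exited with status {code}"


print(describe_exit_code(-6))  # on Linux: terminated by SIGABRT
```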

Environment

PRETTY_NAME="Ubuntu 22.04.5 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.5 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
TVM version: 0.19.dev0
MehdiTantaoui-99 added the needs-triage and type: bug labels on Oct 29, 2024