
[Bug] Segfault when running a model on NVIDIA Thor while following the cross_compilation_and_rpc tutorial #18923

@thishome

Description


Hi, I'm trying to run the example from the cross_compilation_and_rpc tutorial on an NVIDIA Thor system. Inference completes, but the errors below occur at the end, while resources are being deallocated.
Can anyone give me a hint? Thank you.

Expected behavior

The program exits normally.

Actual behavior

Output of the C++ RPC server on Thor:

[06:06:35] /data/opensource/tvm/tvm/apps/cpp_rpc/rpc_env.cc:149: Load module from /jeremy/rpc/rpc/model_deployed.so ...
!!!!!!! Segfault encountered !!!!!!!
  File "/data/opensource/tvm/tvm/3rdparty/tvm-ffi/src/ffi/backtrace.cc", line 154, in TVMFFISegFaultHandler(int)
  File "<unknown>", line 0, in __aarch64_ldadd8_rel
  File "/data/opensource/tvm/tvm/3rdparty/tvm-ffi/include/tvm/ffi/object.h", line 409, in tvm::ffi::Object::DecRef()
  File "/data/opensource/tvm/tvm/src/runtime/rpc/rpc_local_session.cc", line 145, in tvm::runtime::LocalSession::FreeHandle(void*)
  File "/data/opensource/tvm/tvm/src/runtime/rpc/rpc_endpoint.cc", line 674, in void tvm::runtime::RPCEndpoint::EventHandler::SysCallHandler<void (*)(tvm::runtime::RPCSession*, tvm::ffi::PackedArgs, tvm::ffi::Any*)>(void (*)(tvm::runtime::RPCSession*, tvm::ffi::PackedArgs, tvm::ffi::Any*))
  File "/data/opensource/tvm/tvm/src/runtime/rpc/rpc_endpoint.cc", line 1052, in tvm::runtime::RPCEndpoint::EventHandler::HandleSyscall(tvm::runtime::RPCCode)
  File "/data/opensource/tvm/tvm/src/runtime/rpc/rpc_endpoint.cc", line 134, in tvm::runtime::RPCEndpoint::EventHandler::HandleNextEvent(bool, bool, std::function<void (tvm::ffi::PackedArgs)>)
  File "/data/opensource/tvm/tvm/src/runtime/rpc/rpc_endpoint.cc", line 746, in tvm::runtime::RPCEndpoint::HandleUntilReturnEvent(bool, std::function<void (tvm::ffi::PackedArgs)>)
  File "/data/opensource/tvm/tvm/src/runtime/rpc/rpc_endpoint.cc", line 838, in tvm::runtime::RPCEndpoint::ServerLoop()
  File "/data/opensource/tvm/tvm/src/runtime/rpc/rpc_socket_impl.cc", line 117, in tvm::runtime::RPCServerLoop(int)
  File "/data/opensource/tvm/tvm/apps/cpp_rpc/rpc_server.cc", line 331, in tvm::runtime::RPCServer::ServerLoopProc(tvm::support::TCPSocket, tvm::support::SockAddr, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
  File "/data/opensource/tvm/tvm/apps/cpp_rpc/rpc_server.cc", line 216, in tvm::runtime::RPCServer::ListenLoopProc()
  File "<unknown>", line 0, in 0xffffffffffffffff

[06:06:37] /data/opensource/tvm/tvm/apps/cpp_rpc/rpc_server.cc:222: Child pid=3708708 exited, Process status =139
[06:06:37] /data/opensource/tvm/tvm/apps/cpp_rpc/rpc_server.cc:239: Socket Connection Closed
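The server reports `Process status =139` for the worker child. Assuming that value is the raw status returned by `waitpid()` (an assumption; I have not checked how rpc_server.cc formats it), it can be decoded with Python's standard `os` wait-status helpers. The shell-style reading (128 + signal number) gives the same signal. A minimal sketch; the helper name `decode_wait_status` is hypothetical:

```python
import os
import signal


def decode_wait_status(status: int) -> str:
    """Describe a raw POSIX wait status, as returned by os.waitpid() (hypothetical helper)."""
    if os.WIFSIGNALED(status):
        # Low 7 bits hold the terminating signal; bit 0x80 is the core-dump flag.
        name = signal.Signals(os.WTERMSIG(status)).name
        core = " (core dumped)" if os.WCOREDUMP(status) else ""
        return f"killed by {name}{core}"
    if os.WIFEXITED(status):
        return f"exited with code {os.WEXITSTATUS(status)}"
    return f"other status {status}"


# Value printed by the RPC server for the worker child.
print(decode_wait_status(139))
# Shell convention: exit status 128 + N means "terminated by signal N".
print(139 - 128 == signal.SIGSEGV)
```

Under either reading the worker was terminated by SIGSEGV (signal 11), which matches the "Segfault encountered" backtrace. The client-side `Check failed ... code=1` is most likely a consequence of the server end of the connection going away mid-session rather than a separate bug.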

Output of the Python client on the x86 host:

Exported library to /tmp/tmp7kpedlni/model_deployed.so
Saved parameters to /tmp/tmp7kpedlni/model_params.npz
Connected to remote device at 127.0.0.1:9090
Uploaded library and parameters to remote device.
Average inference time: 0.19 ms
terminate called after throwing an instance of 'tvm::runtime::InternalError'
  what():  Check failed: (code == RPCCode::kReturn) is false: code=1

Environment

OS: Ubuntu 24.04
TVM: 0.23.0
LLVM (llvm-config): 21.1.8

Steps to reproduce

Follow the tutorial, but set the target as below:

    import tvm

    target = tvm.target.Target("cuda -arch=sm_90",
                               host="llvm -mtriple=aarch64-linux-gnu")

Then cross-compile the .so for Thor and run it through the RPC server.

Triage


  • needs-triage

Metadata

    Assignees: No one assigned
    Labels: needs-triage, type: bug
    Type: No type
    Projects: No projects
    Milestone: No milestone
    Relationships: None yet
    Development: No branches or pull requests