
ONNXRuntime backend (WIP) #341

Open. Wants to merge 2 commits into master.
Conversation

isty2e (Contributor) commented Nov 8, 2020

Introduction

This backend enables KataGo to run network files in the .onnx format, an open standard for exchanging neural networks, via ONNXRuntime. Currently, version 8 networks can be converted to .onnx by https://github.com/isty2e/KataGoONNX, and converted network files are available here.

The main motivation for running .onnx models is that new architectures or network details can be tested easily by exporting any trained model to .onnx, without implementing every detail in the CUDA/OpenCL backends. For the time being, there is no advantage to using this backend for normal users: it will be 10-20% slower than the CUDA/OpenCL versions for 19x19 boards, though it can be faster for smaller boards.

Execution providers

Currently, four execution providers are available for this backend:

  • CUDA
  • TensorRT - It is dreadfully slow for unknown reasons
  • DirectML
  • MIGraphX - I couldn't really test it, and there may be problems building it

For Windows systems, DirectML is generally considered the best choice. AMD cards can make use of MIGraphX, and the ROCm execution provider can be supported once it is fully integrated into ONNXRuntime.
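
To illustrate how provider selection works (this is just a sketch, not the actual backend code; header and function names can differ between ONNXRuntime versions), appending an execution provider with the ONNXRuntime C++ API looks roughly like this:

#include <onnxruntime_cxx_api.h>
#include <cuda_provider_factory.h>  // OrtSessionOptionsAppendExecutionProvider_CUDA

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "katago-onnx");
  Ort::SessionOptions opts;
  opts.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_ALL);
  // Optionally cache the graph-optimized model to a file (wide-character path on Windows).
  opts.SetOptimizedModelFilePath("optimized.onnx");
  // Append the desired execution provider; with none appended, the CPU provider is used.
  OrtSessionOptionsAppendExecutionProvider_CUDA(opts, /*device_id=*/0);
  Ort::Session session(env, "model.onnx", opts);  // placeholder model path
  return 0;
}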

Building

GCC-9 or above is recommended for Linux systems.

  • First of all, you need to download an ONNXRuntime binary, or build ONNXRuntime yourself. Considering that there is no merit in using TensorRT at this point, you can just download the prebuilt binary.
  • Then build KataGo as usual, but with additional CMake flags:
    • ORT_LIB_DIR: ONNXRuntime library location
    • ORT_INCLUDE_DIR: ONNXRuntime header file location
    • ORT_CUDA, ORT_TENSORRT, ORT_DIRECTML, ORT_MIGRAPHX: Whether to support specific execution providers
    • TENSORRT_LIB_DIR, TENSORRT_INCLUDE_DIR: Library and header file locations for TensorRT
      For example, if you want to build KataGo with CUDA and DirectML support, your CMake configuration will look like this:
cmake -DUSE_BACKEND=ONNXRUNTIME -DORT_CUDA=1 -DORT_DIRECTML=1 -DORT_LIB_DIR=/foo/bar/lib -DORT_INCLUDE_DIR=/foo/bar/include
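
Similarly (the paths below are placeholders), a build that also enables the TensorRT execution provider would pass the TensorRT locations as well:

cmake -DUSE_BACKEND=ONNXRUNTIME -DORT_CUDA=1 -DORT_TENSORRT=1 -DORT_LIB_DIR=/foo/bar/lib -DORT_INCLUDE_DIR=/foo/bar/include -DTENSORRT_LIB_DIR=/path/to/TensorRT/lib -DTENSORRT_INCLUDE_DIR=/path/to/TensorRT/include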

Configuration

There are two options in the .cfg file for the ONNXRuntime backend: onnxOptModelFile and onnxRuntimeExecutionProvider. onnxOptModelFile is the path to a cached graph-optimized .onnx file, and onnxRuntimeExecutionProvider is one of the execution providers - CUDA, TensorRT, DirectML, or MIGraphX. These options can be set properly by running genconfig.
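
For illustration only (the paths here are placeholders, not defaults), the corresponding lines in a .cfg file would look something like this:

onnxOptModelFile = /path/to/optimized_model.onnx
onnxRuntimeExecutionProvider = CUDA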

For FP16 inference, you will need to use FP16 models instead of the normal FP32 models. There is no advantage to FP16 inference on non-RTX cards, but on RTX cards you will want to use the FP16 models.

TODO

  • Somehow verify that MIGraphX version compiles and runs
  • Maybe a cleaner CMakeLists.txt
  • Figure out why TensorRT execution provider is so slow
  • Improve code quality in general
  • In some cases, visits/s may not be measured properly: NPS was constantly increasing, so perhaps initialization time should be compensated for
  • There may be some duplicated or unnecessary operations, so reduce overhead where possible

isty2e mentioned this pull request on Nov 9, 2020
lp200 commented Dec 25, 2020

The TensorRT provider seems to reach normal speed after a few minutes of running (for optimization). I used a TensorRT-enabled onnxruntime build and deleted the SetOptimizedModelFilePath call.

I used the following libraries and options:

CUDA 11.0
cuDNN 8.0.5
TensorRT-7.2.1.6

ORT_TENSORRT_ENGINE_CACHE_ENABLE=1
ORT_TENSORRT_FP16_ENABLE=1
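
For reference, a rough sketch of how this might be invoked (file names are placeholders; the two variables are read by ONNXRuntime's TensorRT execution provider):

export ORT_TENSORRT_ENGINE_CACHE_ENABLE=1
export ORT_TENSORRT_FP16_ENABLE=1
./katago benchmark -model model_fp16.onnx -config default_gtp.cfg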

ailuoku6 commented (replying to isty2e)

Hi, the onnx network file has been deleted; could you upload it again? I'm looking forward to running KataGo in a web page using webonnxruntime. Thank you very much!
