MIGraphX as backend for Triton Inference Server #178
Steps to start the minimal Triton example:

**Building the minimal backend**

```
git clone https://github.com/triton-inference-server/backend
cd backend
git checkout r23.04
cd examples/backends/minimal/
mkdir build
cd build
# The rapidjson-dev dependency was missing for me during install, but it was not listed as a requirement
sudo apt-get install rapidjson-dev
cmake -DCMAKE_INSTALL_PREFIX:PATH=`pwd`/install ..
make install
```
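If the build succeeds, the backend shared library should land in the install tree that the `cp` step below copies from. A quick sanity check (the `libtriton_minimal.so` filename follows the usual Triton convention of `backends/<name>/libtriton_<name>.so`):

```
# Verify the backend library was installed under the CMAKE_INSTALL_PREFIX set above
ls install/backends/minimal/
# Expected to contain libtriton_minimal.so
```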
**Setting up the tritonserver docker image with a custom backend**

```
git clone https://github.com/triton-inference-server/server
cd server
# This will generate the Dockerfile.compose file
# I had to use an existing backend because compose.py only works with one
python3 ./compose.py --backend onnxruntime --repoagent checksum
# I had to copy the build output from the previous step next to the compose.py script for the next step to work
cp -r /home/htec/gyulaz/triton/backend/examples/backends/minimal/build/install/backends/minimal ./minimal
# Finally build the docker image
docker build -t tritonserver_custom -f Dockerfile.compose .
```

**Starting the Triton server inside docker**

```
docker run --rm -it --net=host -v /home/htec/gyulaz/triton/backend/:/backend tritonserver_custom tritonserver --model-repository=/backend/examples/model_repos/minimal_models/
```

**Testing the minimal backend from another docker**

I used the same docker image, but I don't think that is necessary. I still had to install some missing dependencies inside the container to start the test script.

```
# Start docker from another terminal
docker run --rm -it --net=host -v /home/htec/gyulaz/triton/backend/:/backend tritonserver_custom
# Install some missing dependencies
apt-get install python3-pip
pip3 install numpy tritonclient gevent geventhttpclient
# The client was just named `minimal_client` in the repo; I added the .py file extension so I could start it
python3 /backend/examples/clients/minimal_client.py
```

You should see:

```
=========
Sending request to nonbatching model: IN0 = [1 2 3 4]
Response: {'model_name': 'nonbatching', 'model_version': '1', 'outputs': [{'name': 'OUT0', 'datatype': 'INT32', 'shape': [4], 'parameters': {'binary_data_size': 16}}]}
OUT0 = [1 2 3 4]
=========
Sending request to batching model: IN0 = [[10 11 12 13]]
Sending request to batching model: IN0 = [[20 21 22 23]]
Response: {'model_name': 'batching', 'model_version': '1', 'outputs': [{'name': 'OUT0', 'datatype': 'INT32', 'shape': [1, 4], 'parameters': {'binary_data_size': 16}}]}
OUT0 = [[10 11 12 13]]
Response: {'model_name': 'batching', 'model_version': '1', 'outputs': [{'name': 'OUT0', 'datatype': 'INT32', 'shape': [1, 4], 'parameters': {'binary_data_size': 16}}]}
OUT0 = [[20 21 22 23]]
```
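If the client can't connect, it is worth checking first that the server actually came up. Triton exposes standard HTTP health endpoints on port 8000 by default (the repository index call is a Triton extension to the KServe v2 protocol):

```
# Check server liveness and readiness
curl -v localhost:8000/v2/health/live
curl -v localhost:8000/v2/health/ready
# List the models the server has loaded
curl -X POST localhost:8000/v2/repository/index
```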
As agreed with the AMD team, we should first update their WIP solution that provides MIGraphX as an execution provider for the onnxruntime Triton inference backend.
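For reference, here is a hypothetical sketch of what selecting MIGraphX in a model's config could look like, modeled on how the ORT backend already selects the TensorRT accelerator via `gpu_execution_accelerator`. The `migraphx` accelerator name and the model path are assumptions on my part; wiring that name up is exactly the WIP work mentioned above.

```
# Hypothetical: "migraphx" is not (yet) a supported accelerator name in the ORT backend;
# the config shape mirrors the existing TensorRT EP selection.
cat > model_repository/my_onnx_model/config.pbtxt <<'EOF'
name: "my_onnx_model"
backend: "onnxruntime"
optimization {
  execution_accelerators {
    gpu_execution_accelerator: [ { name: "migraphx" } ]
  }
}
EOF
```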
**Remaining tasks**

- Server
- Core
- Backend
- ORT Backend
- TBD
The idea here is to use the Triton Inference Server to perform inference via MIGraphX.
The first issue to tackle is to enable it without the official docker image and use a ROCm-based one instead.
The next would be to add MIGraphX (MGX) as a backend.
There are multiple repos to check for how to do it.
This can be worked on in parallel, without a working docker image; a rough starting point for the ROCm-based container is sketched below.
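A sketch of that starting point, assuming the public `rocm/dev-ubuntu-22.04` image (the device flags are the standard way to expose an AMD GPU to a container):

```
# Sketch: run a ROCm development container as a base for a non-official Triton build
docker run --rm -it \
  --device=/dev/kfd --device=/dev/dri \
  --group-add video \
  rocm/dev-ubuntu-22.04
```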