MIGraphX as backend for Triton Inference Server #178

Open
attila-dusnoki-htec opened this issue Mar 6, 2024 · 4 comments
@attila-dusnoki-htec

The idea here is to use the Triton Inference Server to perform inference via MIGraphX.

The first issue to tackle is getting the server running without the official Docker image, using a ROCm-based one instead.

The next step would be to add MIGraphX (MGX) as a backend. There are multiple repos to check for how to do it.

This can be worked on in parallel, without a working Docker image.
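For context, the work such a backend would wrap is roughly what the MIGraphX Python API already does. A minimal sketch of that flow (not the backend itself; assumes the migraphx Python module from a ROCm install and some local model.onnx):

# Sketch of direct MIGraphX inference: parse the ONNX model, compile for the GPU target, run.
import migraphx

prog = migraphx.parse_onnx("model.onnx")
prog.compile(migraphx.get_target("gpu"))

# Fill every graph parameter with generated data just to exercise the pipeline.
params = {}
for name, shape in prog.get_parameter_shapes().items():
    params[name] = migraphx.generate_argument(shape)

results = prog.run(params)
print(results[0])

A Triton backend would roughly do the parse/compile part at model load time and the run part per inference request.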

@gyulaz-htec

gyulaz-htec commented Apr 23, 2024

Steps to start the minimal Triton backend example:

Building the minimal backend

git clone https://github.com/triton-inference-server/backend
cd backend
git checkout r23.04 
cd examples/backends/minimal/
mkdir build
cd build
# The rapidjson-dev dependency was not listed as a requirement, but the build failed for me without it
sudo apt-get install rapidjson-dev
cmake -DCMAKE_INSTALL_PREFIX:PATH=`pwd`/install .. 
make install 

libtriton_minimal.so will be generated in: backend/examples/backends/minimal/build/install/backends/minimal

Setting up the tritonserver Docker image with a custom backend

git clone https://github.com/triton-inference-server/server
cd server 
# This will generate the Dockerfile.compose file
# compose.py only works with an existing backend, so onnxruntime is used here
python3 ./compose.py --backend onnxruntime --repoagent checksum
# Copy the minimal backend built in the previous step next to compose.py so the Docker build can pick it up
cp -r /home/htec/gyulaz/triton/backend/examples/backends/minimal/build/install/backends/minimal ./minimal

Modify Dockerfile.compose:

Replace
ENV LD_LIBRARY_PATH /opt/tritonserver/backends/onnxruntime:${LD_LIBRARY_PATH}
line with
ENV LD_LIBRARY_PATH /opt/tritonserver/backends/minimal:${LD_LIBRARY_PATH}

Replace
COPY --chown=1000:1000 --from=full /opt/tritonserver/backends/onnxruntime /opt/tritonserver/backends/onnxruntime
line with:
COPY ./minimal /opt/tritonserver/backends/minimal

# Finally build the docker image
docker build -t tritonserver_custom -f Dockerfile.compose . 

Start the Triton server inside Docker

docker run --rm -it --net=host -v /home/htec/gyulaz/triton/backend/:/backend tritonserver_custom tritonserver --model-repository=/backend/examples/model_repos/minimal_models/
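Once the server is up, a quick way to confirm it is live and that the two example models loaded is to hit Triton's standard HTTP health endpoints on port 8000. The sketch below uses only the Python standard library; run it from the host or any container that can reach the server:

# Probe Triton's HTTP liveness/readiness endpoints (server assumed on localhost:8000).
import urllib.request

def probe(path):
    try:
        with urllib.request.urlopen("http://localhost:8000" + path) as resp:
            return resp.status  # 200 means OK
    except Exception as exc:
        return exc

for path in ("/v2/health/live", "/v2/health/ready",
             "/v2/models/nonbatching/ready", "/v2/models/batching/ready"):
    print(path, "->", probe(path))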

Testing the minimal backend from another container

I've used the same Docker image, but I don't think that's necessary. I still had to install some missing dependencies inside the container to start the test script.

# Start docker from another terminal
docker run --rm -it --net=host -v /home/htec/gyulaz/triton/backend/:/backend tritonserver_custom
# Install some missing dependencies
apt-get install python3-pip
pip3 install numpy tritonclient gevent geventhttpclient
# The script is named `minimal_client` in the repo; the .py extension was added so it can be run directly
python3 /backend/examples/clients/minimal_client.py

You should see:

=========
Sending request to nonbatching model: IN0 = [1 2 3 4]
Response: {'model_name': 'nonbatching', 'model_version': '1', 'outputs': [{'name': 'OUT0', 'datatype': 'INT32', 'shape': [4], 'parameters': {'binary_data_size': 16}}]}
OUT0 = [1 2 3 4]

=========
Sending request to batching model: IN0 = [[10 11 12 13]]
Sending request to batching model: IN0 = [[20 21 22 23]]
Response: {'model_name': 'batching', 'model_version': '1', 'outputs': [{'name': 'OUT0', 'datatype': 'INT32', 'shape': [1, 4], 'parameters': {'binary_data_size': 16}}]}
OUT0 = [[10 11 12 13]]
Response: {'model_name': 'batching', 'model_version': '1', 'outputs': [{'name': 'OUT0', 'datatype': 'INT32', 'shape': [1, 4], 'parameters': {'binary_data_size': 16}}]}
OUT0 = [[20 21 22 23]]
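For reference, the nonbatching request above boils down to a handful of tritonclient calls. A rough standalone equivalent (the actual minimal_client.py in the repo may differ in details):

# Send IN0 = [1 2 3 4] to the nonbatching model over HTTP and read back OUT0.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

in0 = httpclient.InferInput("IN0", [4], "INT32")
in0.set_data_from_numpy(np.array([1, 2, 3, 4], dtype=np.int32))

result = client.infer(
    model_name="nonbatching",
    inputs=[in0],
    outputs=[httpclient.InferRequestedOutput("OUT0")],
)
print("OUT0 =", result.as_numpy("OUT0"))  # expected: [1 2 3 4]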

@gyulaz-htec

gyulaz-htec commented May 7, 2024

Planned steps for POC:

  • Hipify server API
  • Hipify backend API
  • Reimplement the minimal backend with MIGraphX
    • TRITONBACKEND_Backend
    • TRITONBACKEND_Model
    • TRITONBACKEND_ModelInstance
  • Extend server Docker image generation with MIGraphX dependencies
@attila-dusnoki-htec attila-dusnoki-htec moved this from 🔖 Ready to 🏗 In progress in MIGraphX ONNX support May 14, 2024
@gyulaz-htec

As agreed with the AMD team, the first step is to update their WIP solution that provides MIGraphX as an execution provider for the onnxruntime Triton inference backend.
CPU inference already works on their side; we have to figure out why the GPU is not chosen for inference.
The AMD issue describing their progress: ROCm#2411
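One way to narrow this down outside of Triton is to check which execution providers the onnxruntime build actually accepts for a session. A sketch (assumes a ROCm-enabled onnxruntime build with the MIGraphX EP and some local model.onnx):

# List the available EPs and see which ones the session actually ends up with.
import onnxruntime as ort

print("available providers:", ort.get_available_providers())

sess = ort.InferenceSession(
    "model.onnx",
    providers=["MIGraphXExecutionProvider", "ROCMExecutionProvider",
               "CPUExecutionProvider"],
)
# If only CPUExecutionProvider remains here, the GPU EP was rejected at session
# creation, which would mirror what the Triton ORT backend is doing.
print("session providers:", sess.get_providers())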

@gyulaz-htec

gyulaz-htec commented May 30, 2024

Remaining tasks

Server

  • Update the hipify part of the build.py script
  • Clean-up

Core

  • Add back the skipped check for supported GPUs
  • Fix skipped int64 field checks from config
  • Fix skipped cnmem code paths
  • Hipify? - WIP

Backend

  • Make hipify part of the cmake build process

ORT Backend

  • Remove the hardcoded GPU kind set as default
  • Don't force the auto-config step

TBD

  • Try to use device memory instead of pinned memory? - try to enable it from the config
