[question] triton inference server Dockerfile #4047

Open
geraldstanje opened this issue Jul 10, 2024 · 5 comments

@geraldstanje

geraldstanje commented Jul 10, 2024

Hi,

Where can I find documentation on how to build the Triton Inference Server TRT-LLM 24.06 image for SageMaker myself, so that I can run it on SageMaker?

NVIDIA image I want to use: nvcr.io/nvidia/tritonserver:24.06-trtllm-python-py3

Can you please post the Dockerfile you use to modify the NVIDIA container? I need the Dockerfile you use to create the images listed at https://github.com/aws/deep-learning-containers/blob/master/available_images.md#nvidia-triton-inference-containers-sm-support-only. I will then modify it to use the NVIDIA image above.

Info: I have built a TRT-LLM model myself and will copy it into the container.

cc @ohadkatz @nskool @sirutBuasai

@nikhil-sk
Contributor

Hi @geraldstanje, we don't support the TRT-LLM container for Triton on SageMaker (SM) yet. Most of the changes needed to support SageMaker are already upstreamed, so the above container should work with SageMaker directly. Functionally, you can just run the NVIDIA image on SageMaker and shouldn't need a Dockerfile to modify it.
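
To make "just run the NVIDIA image" concrete, here is a minimal sketch of the request you might pass to boto3's SageMaker `create_model` call. It assumes the image is referenced through an ECR URI in your own account (that is, the NGC image has been mirrored there); the helper name `build_create_model_request` and every value in the usage note are hypothetical. `SAGEMAKER_TRITON_DEFAULT_MODEL_NAME` is the environment variable read by Triton's SageMaker `serve` script to pick the default model.

```python
def build_create_model_request(model_name, image_uri, role_arn,
                               model_data_url, default_model):
    """Build keyword arguments for boto3's SageMaker `create_model`.

    Hypothetical helper: it only assembles the dictionary and makes
    no AWS calls, so it can be inspected before anything is created.
    """
    return {
        "ModelName": model_name,
        "ExecutionRoleArn": role_arn,
        "PrimaryContainer": {
            # Assumption: URI of the unmodified NVIDIA image in your ECR repo.
            "Image": image_uri,
            # S3 location of the packaged Triton model repository.
            "ModelDataUrl": model_data_url,
            "Environment": {
                # Tells Triton's SageMaker entrypoint which model to serve.
                "SAGEMAKER_TRITON_DEFAULT_MODEL_NAME": default_model,
            },
        },
    }
```

With boto3 installed and credentials configured, the result would be used roughly as `boto3.client("sagemaker").create_model(**request)`, followed by the usual endpoint config and endpoint creation calls.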

@geraldstanje
Author

geraldstanje commented Jul 10, 2024

@nskool can you please paste the Dockerfile of the latest TRT container for Triton that you released (24.3) here? I will modify the Dockerfile and add LLM support myself.

@nikhil-sk
Contributor

@geraldstanje Based on your initial comment, you want to run TRT-LLM on SageMaker, is that correct? What I'm trying to say is that the NVIDIA TRT-LLM image will work just fine on SageMaker. Is there a specific reason you are looking for a SageMaker-provided Triton image (which doesn't include TRT-LLM) and building TRT-LLM yourself on top of it?

@geraldstanje
Author

geraldstanje commented Jul 23, 2024

@nskool it works. What I was looking for was the entrypoint: https://github.com/triton-inference-server/server/blob/main/docker/sagemaker/serve
I just used the image nvcr.io/nvidia/tritonserver:24.06-trtllm-python-py3; I didn't build it myself.

Also, it seems the metrics are not forwarded to SageMaker (the entrypoint only uses port 8080). Is there a solution for that?

Does that mean you can run any Docker container on SageMaker?
Can I also run the vllm/vllm-openai:latest Docker container on SageMaker? https://docs.vllm.ai/en/latest/serving/deploying_with_docker.html
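
For context on the "any Docker container" question: SageMaker real-time hosting starts the container with the `serve` argument and expects it to answer `GET /ping` and `POST /invocations` on port 8080. Below is a minimal sketch of that contract using only the Python standard library. It is not Triton's or vLLM's implementation; `SageMakerContractHandler` and `make_server` are hypothetical names, and the echo response is a stand-in for real inference.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class SageMakerContractHandler(BaseHTTPRequestHandler):
    """Answers the two routes SageMaker hosting calls on a container."""

    def do_GET(self):
        # Health check: any 200 response marks the container as healthy.
        if self.path == "/ping":
            self._send(200, b"")
        else:
            self._send(404, b"")

    def do_POST(self):
        # Inference route: a real server would run the model here.
        if self.path == "/invocations":
            length = int(self.headers.get("Content-Length", 0))
            payload = self.rfile.read(length)
            body = json.dumps({"echo_bytes": len(payload)}).encode()
            self._send(200, body, "application/json")
        else:
            self._send(404, b"")

    def _send(self, status, body, content_type="text/plain"):
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        # Silence per-request logging for this sketch.
        pass


def make_server(port=8080):
    """Create (but do not start) a server bound to the given port."""
    return HTTPServer(("0.0.0.0", port), SageMakerContractHandler)
```

Calling `make_server().serve_forever()` from a script named `serve` on the container's `PATH` would satisfy the contract in this toy form. An image that listens on a different port or exposes different routes would presumably need a small wrapper or proxy like this in front of it.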

@geraldstanje
Author

@nikhil-sk did you see ^^ by chance?
