Add first-pass at stability tritonserver-based imagegen comp #227
base: main
Conversation
Hey maintainers - this is clearly not in a high-quality state yet, but I want to get feedback on the architecture, code location, and overall approach before I spend the time to polish it. Any notes would be greatly appreciated! This Stable Diffusion model powers a card in the Intel AI Explorer, so upstreaming it to OPEA seemed like a reasonable next step.
#!/bin/bash -ex

# Copyright (C) 2024 Intel Corporation
The license header is replicated twice. Can you please clean it up?
Please create distinct microservices for the pipeline components; for instance, the Triton server should be its own microservice. Data cleaning, embedding, model server, model, etc. GenAIExamples contains pipelines composed of microservices from GenAIComps, and we need e2e tests for everything.
Some documentation would be a good idea.
Hi @acwrenn,
Thanks for this PR. If you have any questions about any of my comments, please let me know.
comps/imagegen/imagegen.py
Outdated
def generate_image(*, text, triton_endpoint):
    start = time.time()

    with httpclient.InferenceServerClient(triton_endpoint, network_timeout=1000 * 300) as client:
What's the reasoning for the 300s timeout? Isn't that a bit too long?
The initial ImageGen request can take a while. Having the ImageGen non-model comp "warm up" the model service would let us reduce this. Is that a pattern seen in any other comp I can lift?
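For what it's worth, a minimal sketch of one possible warm-up, using the Triton HTTP client's readiness probe; the model name "stability" is an assumption for illustration, not confirmed by this PR:

import time

import tritonclient.http as httpclient


def wait_for_model(triton_endpoint, model_name="stability", retries=60, interval=5):
    # Poll Triton until the model reports ready, so the first real inference
    # request no longer needs an oversized network timeout.
    with httpclient.InferenceServerClient(triton_endpoint) as client:
        for _ in range(retries):
            if client.is_model_ready(model_name):
                return
            time.sleep(interval)
    raise TimeoutError(f"model {model_name!r} not ready after {retries} checks")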
comps/imagegen/imagegen.py
Outdated
    inputs,
    request_id=str(1),
    outputs=outputs,
    timeout=1000 * 300,
Same here. Would it make more sense to set this to timeout=network_timeout?
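To illustrate the suggestion, a sketch of threading a single network_timeout through both the client construction and the infer() call; the model and tensor names ("stability", "PROMPT", "IMAGE") are assumptions for the example rather than values taken from this PR:

import numpy as np
import tritonclient.http as httpclient


def generate_image(*, text, triton_endpoint, network_timeout=300):
    with httpclient.InferenceServerClient(triton_endpoint, network_timeout=network_timeout) as client:
        # Pack the prompt as a single BYTES tensor.
        prompt = np.array([text.encode("utf-8")], dtype=np.object_)
        inputs = [httpclient.InferInput("PROMPT", [1], "BYTES")]
        inputs[0].set_data_from_numpy(prompt)
        outputs = [httpclient.InferRequestedOutput("IMAGE")]
        response = client.infer(
            "stability",
            inputs,
            request_id=str(1),
            outputs=outputs,
            timeout=network_timeout,  # one knob instead of the repeated 1000 * 300 literal
        )
        return response.as_numpy("IMAGE")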
name="opea_service@imagegen", | ||
service_type=ServiceType.IMAGEGEN, | ||
endpoint="/v1/images/generate", | ||
host="0.0.0.0", |
Usually, I'd set this to localhost or 127.0.0.1, and then provide a mechanism for the user to decide if they want to run locally or listen to the world.
I lifted this from the TTS comp - I guess it's the common question of "do it the same way as the codebase, or do what makes sense locally".
Should I stick to "do what is locally correct" in this case?
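For example, a tiny sketch of one such mechanism; the IMAGEGEN_HOST variable name is made up for illustration:

import os

# Default to loopback; an operator opts in to external exposure explicitly,
# e.g. by setting IMAGEGEN_HOST=0.0.0.0 when running in a container.
host = os.environ.get("IMAGEGEN_HOST", "127.0.0.1")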
ARG TRITON_VERSION=24.04

FROM nvcr.io/nvidia/tritonserver:${TRITON_VERSION}-py3 AS triton
I need to check on this container.
ARG MODEL_NAME=stability

RUN wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb && \
I need to check this one too.
I can take a stab at building the Triton containers from source, but I am not sure where that code would live - probably not in this repo. And who would host the images?
The NVIDIA-distributed Triton server container DOES contain a bunch of extra stuff we don't need.
DEVICES?=3

run:
	docker run \
Same here for proxies.
	-ti test

test:
	docker run --runtime runc -ti --net host nvcr.io/nvidia/tritonserver:24.04-py3-sdk \
and here.
comps/imagegen/triton/client.py
Outdated
    inputs,
    request_id=str(1),
    outputs=outputs,
    timeout=1000 * 300,
Same as above: timeout=network_timeout.
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
Please remove these lines.
comps/imagegen/triton/install_efa.sh
Outdated
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
Please remove these lines.
This is an interesting question, because the tritonserver portion acts as a replacement for the TGI services in the LLM examples, which are not distinct comps contained in this repo. So there seems to be a clear distinction between a service that performs inference and a service that glues other parts together. I don't think that requiring the business-logic service to run on an accelerator-equipped host is a good idea. So I guess my question is one looking for clarity: should there be a comp that is JUST the model server, with the ImageGen API container kept as a different comp?
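To make the proposed split concrete, a sketch of the glue comp locating a separately deployed model-server comp; IMAGEGEN_TRITON_ENDPOINT is a made-up variable name for illustration:

import os

import tritonclient.http as httpclient

# The glue comp holds no model weights and can run on any host; only the
# Triton model-server comp needs to live on accelerator hardware.
TRITON_ENDPOINT = os.environ.get("IMAGEGEN_TRITON_ENDPOINT", "localhost:8000")


def model_server_live():
    with httpclient.InferenceServerClient(TRITON_ENDPOINT) as client:
        return client.is_server_live()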
@acwrenn are you still actively working on this PR, or should it be closed?
Description
This PR adds a new component for image generation using Stable Diffusion. The API is currently a WIP - lots of feedback appreciated!
Issues
N/A
Type of change
Dependencies
Tritonserver
Stable Diffusion = "Habana/stable-diffusion-2"
Tests
WIP