
Add first-pass at stability tritonserver-based imagegen comp #227

Draft · wants to merge 21 commits into main
Conversation

@acwrenn commented Jun 20, 2024

Description

This PR adds a new component ("stability") for image generation with Stable Diffusion served via Triton Inference Server. The API is currently a WIP - lots of feedback appreciated!

Issues

N/A

Type of change

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds new functionality)
  • [ ] Breaking change (fix or feature that would break existing design and interface)

Dependencies

Tritonserver
Stable Diffusion = "Habana/stable-diffusion-2"

Tests

WIP

@acwrenn (Author) commented Jun 20, 2024

Hey maintainers - this is clearly not in a high-quality state yet, but I want to get feedback on the architecture, code location, and general approach before I spend the time to polish it. Any notes would be greatly appreciated!

This Stable Diffusion model powers a card in the Intel AI Explorer - so upstreaming it to OPEA seemed like a reasonable next step.

https://or-dev.dcs-tools-experiments.infra-host.com/explore


#!/bin/bash -ex

# Copyright (C) 2024 Intel Corporation
Collaborator: The license header is duplicated. Can you please clean it up?

@mkbhanda (Collaborator) commented:

Please create distinct microservices for the pipeline components; for instance, the Triton server should be in its own microservice (data cleaning, embedding, model server, model, etc.). The genAIexample contains a pipeline composed of microservices from the GenAImicrocomps, and we need e2e tests for everything.

@dbkinder (Contributor) left a comment:

Some documentation would be a good idea.

@ashahba (Collaborator) left a comment:

Hi @acwrenn,
Thanks for this PR. If you have any questions about any of my comments, please let me know.

def generate_image(*, text, triton_endpoint):
    start = time.time()

    with httpclient.InferenceServerClient(triton_endpoint, network_timeout=1000 * 300) as client:
Collaborator: What's the reasoning for the 300s timeout? Isn't that a bit too long?

@acwrenn (Author): The initial ImageGen request can take a while. Having the non-model ImageGen comp "warm up" the model service would let us reduce this - is that a pattern used in any other comp that I can lift?
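To make the warm-up idea concrete, here is a rough sketch of what a startup warm-up request could look like, using the tritonclient HTTP API already used elsewhere in this PR; the warm_up_model helper, the "INPUT_TEXT" tensor name, and the "stability" model name are placeholders for illustration, not definitions from this PR:

import numpy as np
import tritonclient.http as httpclient

def warm_up_model(triton_endpoint, model_name="stability"):
    # Issue one throwaway inference at startup so the first real request does not
    # pay the model load/compile cost; later requests could then use a shorter timeout.
    with httpclient.InferenceServerClient(triton_endpoint, network_timeout=1000 * 300) as client:
        prompt = np.array(["warm-up"], dtype=object)
        text_input = httpclient.InferInput("INPUT_TEXT", [1], "BYTES")  # placeholder tensor name
        text_input.set_data_from_numpy(prompt)
        client.infer(model_name, inputs=[text_input], request_id="warmup", timeout=1000 * 300)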

inputs,
request_id=str(1),
outputs=outputs,
timeout=1000 * 300,
@ashahba (Collaborator) commented Jun 27, 2024:

Same here. Would it make more sense to set this to timeout=network_timeout?
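As a point of reference, a minimal sketch of threading one configurable timeout through both the client constructor and the infer call, along the lines of the suggestion above; the IMAGEGEN_TRITON_TIMEOUT_MS environment variable and the build_request helper are assumptions for illustration, not code from this PR:

import os
import tritonclient.http as httpclient

# Hypothetical env var; falls back to the 300 s value currently hard-coded in the PR.
NETWORK_TIMEOUT = int(os.getenv("IMAGEGEN_TRITON_TIMEOUT_MS", str(1000 * 300)))

def generate_image(*, text, triton_endpoint):
    with httpclient.InferenceServerClient(triton_endpoint, network_timeout=NETWORK_TIMEOUT) as client:
        inputs, outputs = build_request(text)  # hypothetical helper building InferInput/InferRequestedOutput lists
        return client.infer(
            "stability",  # model name assumed from the Dockerfile's MODEL_NAME arg; adjust as needed
            inputs,
            request_id=str(1),
            outputs=outputs,
            timeout=NETWORK_TIMEOUT,  # reuse the same value instead of a second literal
        )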

name="opea_service@imagegen",
service_type=ServiceType.IMAGEGEN,
endpoint="/v1/images/generate",
host="0.0.0.0",
@ashahba (Collaborator) commented Jun 27, 2024:

Usually I'd set this to localhost or 127.0.0.1 and then provide a mechanism for the user to decide whether they want to run locally or listen to the world.

@acwrenn (Author) commented Jun 27, 2024:

I lifted this from the TTS comp - I guess it's the common question of "do it the same way as the rest of the codebase, or do what makes sense locally."

Should I stick with "do what's locally correct" in this case?
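For the record, a minimal sketch of the kind of mechanism described above, assuming an environment variable with a loopback default; the IMAGEGEN_HOST name is made up for illustration:

import os

# Hypothetical env var: default to loopback; deployments can opt in to listening on all interfaces.
IMAGEGEN_HOST = os.getenv("IMAGEGEN_HOST", "127.0.0.1")

# ...and in the registration shown in the excerpt above:
#     host=IMAGEGEN_HOST,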


ARG TRITON_VERSION=24.04

FROM nvcr.io/nvidia/tritonserver:${TRITON_VERSION}-py3 AS triton
Collaborator: I need to check on this container.


ARG MODEL_NAME=stability

RUN wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb && \
Collaborator: I need to check this one too.

@acwrenn (Author): I can take a stab at building the Triton containers from source, but I am not sure where that code would live - probably not in this repo. And who would host the images?

The NVIDIA-distributed Triton server container does contain a bunch of extra stuff we don't need.


DEVICES?=3
run:
	docker run \
Collaborator: Same here for proxies.

	-ti test

test:
	docker run --runtime runc -ti --net host nvcr.io/nvidia/tritonserver:24.04-py3-sdk \
Collaborator: And here.

inputs,
request_id=str(1),
outputs=outputs,
timeout=1000 * 300,
Collaborator (suggested change): timeout=network_timeout

Comment on lines +1 to +3:

# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

Collaborator: Please remove these lines.

Comment on lines 1 to 3:

# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

Collaborator: Please remove these lines.

@acwrenn (Author) commented Jun 27, 2024:

> Please create distinct microservices for the pipeline components; for instance, the Triton server should be in its own microservice (data cleaning, embedding, model server, model, etc.). The genAIexample contains a pipeline composed of microservices from the GenAImicrocomps, and we need e2e tests for everything.

This is an interesting question, because the tritonserver portion acts as a replacement for the TGI services in the LLM examples, which are not distinct comps contained in this repo.

So there seems to be a clear distinction between a service that performs inference and a service that glues other parts together. I don't think that forcing the business-logic service onto an accelerator-equipped host is a good idea.

So I guess my question is one looking for clarity: should there be a comp that is JUST the model server, with the ImageGen API container kept as a different comp?

lkk12014402 pushed a commit that referenced this pull request Aug 8, 2024
* refine chatqna test script

Signed-off-by: letonghan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* delete comments

Signed-off-by: letonghan <[email protected]>

* modify expected result of embedding

Signed-off-by: letonghan <[email protected]>

* update rerank expected result

Signed-off-by: letonghan <[email protected]>

* update llm expected result

Signed-off-by: letonghan <[email protected]>

* update docker compose yaml

Signed-off-by: letonghan <[email protected]>

* fix conda issue

Signed-off-by: letonghan <[email protected]>

* add log_path for log collection

Signed-off-by: letonghan <[email protected]>

---------

Signed-off-by: letonghan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
@dcmiddle (Contributor) commented Oct 4, 2024

@acwrenn are you still actively working on this PR, or should it be closed?
