Support multiple Lora adapter replicas #129

Open
Jeffwan opened this issue Sep 5, 2024 · 5 comments · May be fixed by #205
Labels: area/lora, kind/enhancement (New feature or request), priority/important-soon (Must be staffed and worked on either currently, or very soon, ideally in time for the next release.)

Comments

@Jeffwan
Collaborator

Jeffwan commented Sep 5, 2024

🚀 Feature Description and Motivation

In the initial version, to simplify model adapter autoscaling, we decided to support only 1 replica in the CRD. Technically, we should support multiple replicas to allow higher throughput.

Use Case

In my production deployment, I need higher throughput, and I want multiple LoRA replicas to be deployed in the environment.

Proposed Solution

  1. Enable replicas in the LoRA CRD
  2. Make sure the scheduling algorithm can correctly schedule the LoRA. We need to handle some special cases, such as the number of LoRA replicas <= the number of pods. It's meaningless to support > 1 loras on single pod.
  3. (Optional) support lora autoscaling
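
The scheduling constraint in step 2 can be sketched as follows. This is a minimal illustration in Python, not the project's implementation: `place_replicas`, the pod names, the `adapters_per_pod` bookkeeping, and the least-loaded ranking rule are all assumptions made for the example. It places each replica of one adapter on a distinct pod and rejects a request for more replicas than there are candidate pods.

```python
from typing import Dict, List


def place_replicas(replicas: int, adapters_per_pod: Dict[str, int]) -> List[str]:
    """Pick `replicas` distinct pods for one LoRA adapter.

    `adapters_per_pod` maps a candidate pod name to the number of adapters
    it already hosts (hypothetical bookkeeping, assumed for this sketch).
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    if replicas > len(adapters_per_pod):
        # More replicas than pods would force two replicas of the same
        # adapter onto one pod, which adds no throughput.
        raise ValueError("replicas must not exceed the number of candidate pods")
    # Assumed policy: prefer pods hosting the fewest adapters; break ties by name.
    ranked = sorted(adapters_per_pod, key=lambda pod: (adapters_per_pod[pod], pod))
    return ranked[:replicas]
```

Because the result is a slice of a list of unique pod names, no pod can receive more than one replica of the same adapter.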
@Jeffwan added this to the v0.1.0 milestone Sep 5, 2024
@Jeffwan added the kind/enhancement, area/lora, and priority/important-soon labels Sep 5, 2024
@Jeffwan self-assigned this Sep 5, 2024
@Jeffwan modified the milestones: v0.1.0, v0.1.0-rc.2 Sep 11, 2024
@xieus
Collaborator

xieus commented Sep 19, 2024

It's meaningless to support > 1 loras on single pod.

Quick q: did you mean support "< 1 loras on single pod"?

@Jeffwan
Collaborator Author

Jeffwan commented Sep 19, 2024

@xieus this is a constraint on the scheduling. A single LoRA model adapter can have no more than 1 replica scheduled to a given pod. 2 replicas on a single pod won't help from a throughput perspective.
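
A minimal sketch of that constraint as a check, assuming a placement is represented as a list of `(adapter, pod)` pairs. The function name and data shape are hypothetical and not taken from the code base.

```python
from collections import Counter
from typing import List, Tuple


def duplicate_placements(placements: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return the (adapter, pod) pairs that appear more than once.

    An empty result means no pod hosts two replicas of the same adapter.
    """
    counts = Counter(placements)
    return sorted(pair for pair, n in counts.items() if n > 1)
```

Different adapters may still share a pod; only repeated replicas of the same adapter on the same pod are reported.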

@Jeffwan
Collaborator Author

Jeffwan commented Sep 24, 2024

#205 has become a large change, and I noticed there are some edge cases that still need to be covered. I will postpone this feature to rc3.

@Jeffwan modified the milestones: v0.1.0-rc.2, v0.1.0-rc.3 Sep 24, 2024
@Jeffwan modified the milestones: v0.1.0-rc.3, v0.2.0 Oct 2, 2024
@Jeffwan
Collaborator Author

Jeffwan commented Oct 2, 2024

It takes some time to refactor the current code base to improve extensibility for such changes. I have already moved some of the refactoring changes from #205 to #260. This will be moved to v0.2.0.

@Jeffwan
Collaborator Author

Jeffwan commented Jan 15, 2025

Moving this to a later release due to limited time.

2 participants