Releases · vllm-project/aibrix

v0.2.0 (Latest)

Released 19 Feb 18:31 by github-actions (commit 0a21d77)

Automatically generated release for tag v0.2.0.

🚀 New Features Highlights

Distributed KV Cache: Implemented support for managing KV cache across multiple nodes, enhancing performance.
Cost-Driven Heterogeneous Serving: Improved scheduling and inference strategies for mixed GPU environments, optimizing cost and resource utilization. (#371, #430, #509, #554, #598)
Optimizer-Based Autoscaling: Leverages offline profiles of the inference server to calculate the number of replicas. (#430, #500, #692, #508)
Prefix Cache Aware Routing: Added support for routing decisions based on prefix cache hits, improving inference efficiency. (#641, #657)
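Prefix cache aware routing can be illustrated with a minimal sketch. It assumes prompts are already tokenized, that the router tracks each pod's cache as a set of hashes over fixed-size token blocks (each hash chained to the previous block's hash, so a hit means the whole prefix up to that block matches), and that ties, including the case where no pod has a hit, go to the pod with the fewest running requests. `BLOCK_SIZE`, `PodState`, and the tie-breaking rule are hypothetical; this is not AIBrix's implementation.

```python
import hashlib
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # hypothetical number of tokens per cached block


def block_hashes(tokens: list[int], block_size: int = BLOCK_SIZE) -> list[str]:
    """Hash each full block of tokens, chaining in the previous block's hash
    so that two equal hashes imply two equal prefixes, not just equal blocks."""
    hashes: list[str] = []
    prev = ""
    usable = len(tokens) - len(tokens) % block_size  # ignore a trailing partial block
    for start in range(0, usable, block_size):
        block = ",".join(str(t) for t in tokens[start:start + block_size])
        prev = hashlib.sha256(f"{prev}|{block}".encode()).hexdigest()
        hashes.append(prev)
    return hashes


@dataclass
class PodState:
    """Hypothetical per-pod view kept by the router."""
    name: str
    running_requests: int = 0
    cached_blocks: set[str] = field(default_factory=set)


def matched_prefix_blocks(pod: PodState, hashes: list[str]) -> int:
    """Number of leading prompt blocks this pod already holds in its cache."""
    matched = 0
    for h in hashes:
        if h not in pod.cached_blocks:
            break
        matched += 1
    return matched


def route(tokens: list[int], pods: list[PodState]) -> PodState:
    """Pick the pod with the longest cached prefix; on a tie (including no
    hits at all), pick the pod with the fewest running requests."""
    hashes = block_hashes(tokens)
    best = max(pods, key=lambda p: (matched_prefix_blocks(p, hashes), -p.running_requests))
    best.cached_blocks.update(hashes)  # the chosen pod will now cache these blocks
    return best
```

Once a prompt has been routed to a pod, a later prompt that shares its first two blocks goes to the same pod even if that pod is now busier, because this sketch ranks cache hits strictly ahead of load. A production router would likely bound how much load imbalance it accepts in exchange for a cache hit.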

📊 Feature Enhancements

LoRA Scheduling Enhancements: Introduced multiple scheduling strategies, including bin packing, least latency, least throughput, and random. (#544)
Prefix Cache Aware Routing: Added support for routing decisions based on prefix cache hits, improving inference efficiency. (#641)
Gateway Enhancements: Improved request handling efficiency by enabling streaming in the Envoy gateway. (#377) Enhanced the handling of model registration and invalid cache scenarios. (#542) Introduced fallback strategies to ensure robust request allocation. (#445) Optimized cache store retrieval, reducing unnecessary overhead. (#639) Fixed a missing Prometheus config that prevented gateway startup. (#441)
PodAutoscaler Scaling Improvements: Improved scaling logic to handle edge cases, such as zero pods, more robustly. (#508, #515)
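The four LoRA scheduling strategies listed above (#544) can be sketched as interchangeable selection functions over per-pod statistics. The `Pod` fields, the strategy keys, and the per-pod adapter cap are hypothetical, and reading "least throughput" as "the pod currently serving the fewest tokens per second" is an assumption; this illustrates the intent of each strategy, not AIBrix's code.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Pod:
    """Hypothetical per-pod stats used to place a new LoRA adapter."""
    name: str
    loaded_adapters: int
    max_adapters: int
    avg_latency_ms: float
    throughput_tps: float  # tokens/s the pod is currently serving


def bin_pack(pods: list[Pod]) -> Pod:
    # Fill the pod that already hosts the most adapters, keeping others free.
    return max(pods, key=lambda p: p.loaded_adapters)


def least_latency(pods: list[Pod]) -> Pod:
    return min(pods, key=lambda p: p.avg_latency_ms)


def least_throughput(pods: list[Pod]) -> Pod:
    # The least busy pod, measured by current token throughput.
    return min(pods, key=lambda p: p.throughput_tps)


def pick_random(pods: list[Pod]) -> Pod:
    return random.choice(pods)


STRATEGIES: dict[str, Callable[[list[Pod]], Pod]] = {
    "bin_pack": bin_pack,
    "least_latency": least_latency,
    "least_throughput": least_throughput,
    "random": pick_random,
}


def schedule_adapter(pods: list[Pod], strategy: str) -> Optional[Pod]:
    """Return the pod that should load the adapter, or None if every pod is full."""
    candidates = [p for p in pods if p.loaded_adapters < p.max_adapters]
    if not candidates:
        return None
    return STRATEGIES[strategy](candidates)
```

Filtering out full pods before applying any strategy keeps the strategies themselves simple: each one only decides among pods that can actually accept another adapter.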

🛠 Infrastructure & CI/CD Upgrades

Parallelized Build Tasks: Improved CI efficiency by running builds in parallel. (#398)
CrashLoopBackOff Detection in CI: Added monitoring for pod failures in testing workflows. (#444)
Improved GitHub Actions Cost Efficiency: Optimized triggers and removed unnecessary nightly builds. (#411, #422)
Integration Tests for Core Components: Added integration tests for autoscalers, routing policies, and deployment configurations. (#616, #620)

What's Changed

Add envoy gateway streaming support by @varungup90 in #377
Add client traffic policy to increase per connection buffer size from 32kb to 256kb by @varungup90 in #395
Misc: add support to metricsSources property of podautoscaler by @zhangjyr in #371
[Misc] Update runtime server startup command in v0.1.0 by @brosoul in #396
[CI] improve the ci efficiency by parallelizing the build tasks by @nwangfw in #398
Fix the ticker interval by removing unnecessary ms by @Jeffwan in #415
[Misc] Disable specific endpoints logs by @Jeffwan in #418
[CI] Github Action trigger condition optimized for cost saving by @nwangfw in #411
[Misc] Fix the mocked app role permission issue by @Jeffwan in #416
[CI] Nightly tag removed for release branch by @nwangfw in #422
Enable setting PodAutoscaler configuration via YAML labels by @kr11 in #409
Update manifest to adopt v0.1.1 images by @Jeffwan in #429
[Bug]: duplicated http in rest metrics fetcher (#408) by @zhangjyr in #421
[MISC]: Improve Request Trace Granularity with Version Control by @zhangjyr in #431
Support histogram metrics from engine in cache by @Jeffwan in #424
Support fetching metrics from remote Prometheus server by @Jeffwan in #433
[CI] Add python wheel to release artifact by @Jeffwan in #434
Fix update cache pod issue and refactor updatePod handler by @Jeffwan in #439
Extract common metrics structure to types and utils by @Jeffwan in #438
Fix gateway startup issue due to missing prometheus config by @Jeffwan in #441
[feat]: GPU Optimizer and Simulator development app by @zhangjyr in #430
Add selectrandom fallback in routing and only scraping healthy pods by @Jeffwan in #445
AIBrix Workload Generator / Scenario Simulator by @happyandslow in #428
CrashLoopBackOff status detection in CI by @nwangfw in #444
Support installing individual controllers from giant controller-manager by @nwangfw in #442
Refactor Scaler: Resolve Issues with Metric Parameter Updates in Multiple KPAs by @kr11 in #437
Support metrics multi labels for different models by @brosoul in #450
Add health check api interface for runtime by @Jeffwan in #451
Fix the service name override issue in rolebindings by @Jeffwan in #453
Reorganize docs/development and docs/tutorial structure by @Jeffwan in #455
Move tools to separate folders and update mocked app README.md by @Jeffwan in #457
Fix multi models metric result in PromQL by @brosoul in #458
Support Azure LLM trace in workload generator by @happyandslow in #462
Fix autoscaler scalingstrategy switching logic by @nwangfw in #475
Fix missing handle of PromQL scope is PodMetricScope by @brosoul in #479
[Misc] Consolidate app and simulator by @zhangjyr in #477
[Bug] Avoid including sensitive info in Dockerfile ENV by @zhangjyr in #487
Refactor generator to generate time-based traces by @happyandslow in #478
[CI] Update deploy workload script in installation test by @nwangfw in #499
[Bug] handle metricKey creation with MetricsSources by @nwangfw in #498
Adding Client for Workload Generator Workload File by @happyandslow in #501
[Feat] Integrate deployment configurations and fix autoscaler/gpu optimizer connectivity by @zhangjyr in #500
Fix some simulator format issue and add some TODOs by @Jeffwan in #505
[Bug] Fix the way how podautoscaler handle 0 pods. by @zhangjyr in #508
[Misc] Improve gpu optimizer debugging on podautoscaler. by @zhangjyr in #509
Optimize kustomize overlay for volcano engine deployment by @Jeffwan in #512
[perf] Refact tos downloader in Runtime by @brosoul in #510
Refactor metric source for customized protocol, port and path by @kr11 in #511
[Bug] Fixed the yaml of deployments in heterogenous GPU settings to make KPA scaling work as expected. by @zhangjyr in #513
[Misc] Heterogeneous GPU Optimizer Logging Clean Up by @nwangfw in #514
Fix KPA bug, and an elaborate KPA test case by @kr11 in #515
Cut v0.2.0-rc.1 release by @Jeffwan in #516
[Bug] Accumulated bug fix on controller manager, mock app configuration, and gpu optimizer. by @zhangjyr in #522
[Misc] Reduced runtime's container image size by @nwangfw in #518
clean memory scaler object when pa crd is deleted by @kr11 in #520
Configure autoscaler http client to skip certificate check by @Jeffwan in #530
[Doc] Update aibrix documentation by @Jeffwan in #533
Refactor the gateway-plugin and metadata service manifests by @Jeffwan in #531
Fix the GITHUB_WORKSPACE artifact sharing issue in release workflow by @Jeffwan in #532
[Misc] Polish the benchmark scripts by @Jeffwan in #525
Fix APA bugs in creation, add test and demo yaml by @kr11 in #536
Add VKE IPv4 Testing Cluster Config by @nwangfw in #537
Support for request length internal trace by @happyandslow in #538
[Feat] Add download status into runtime downloader by @brosoul in #539
[Feat] Add runtime model management api by @brosoul in #540
[gateway] handle the wrong model name and cache inconsistency case by @Jeffwan in #542
[Docs] fix: update the parameters instruction in readme by @scarlet25151 in #548
add lora schedulers - bin pack, least latency, least throughput, random by @Aspirin96 in #544
add request routers - least kv cache, least expected latency by @Aspirin96 in #543
[Docs] heterogenous gpu docs added by ...

Contributors

zhangjyr, Jeffwan, and 10 other contributors

Assets 5

v0.2.0-rc.2 (Pre-release)

Released 23 Jan 22:23 by github-actions (commit 6ee2f11)

Automatically generated release for tag v0.2.0-rc.2.

What's Changed

[Bug] Accumulated bug fix on controller manager, mock app configuration, and gpu optimizer. by @zhangjyr in #522
[Misc] Reduced runtime's container image size by @nwangfw in #518
clean memory scaler object when pa crd is deleted by @kr11 in #520
Configure autoscaler http client to skip certificate check by @Jeffwan in #530
[Doc] Update aibrix documentation by @Jeffwan in #533
Refactor the gateway-plugin and metadata service manifests by @Jeffwan in #531
Fix the GITHUB_WORKSPACE artifact sharing issue in release workflow by @Jeffwan in #532
[Misc] Polish the benchmark scripts by @Jeffwan in #525
Fix APA bugs in creation, add test and demo yaml by @kr11 in #536
Add VKE IPv4 Testing Cluster Config by @nwangfw in #537
Support for request length internal trace by @happyandslow in #538
[Feat] Add download status into runtime downloader by @brosoul in #539
[Feat] Add runtime model management api by @brosoul in #540
[gateway] handle the wrong model name and cache inconsistency case by @Jeffwan in #542
[Docs] fix: update the parameters instruction in readme by @scarlet25151 in #548
add lora schedulers - bin pack, least latency, least throughput, random by @Aspirin96 in #544
add request routers - least kv cache, least expected latency by @Aspirin96 in #543
[Docs] heterogenous gpu docs added by @nwangfw in #545
Fix race condition in cache by @varungup90 in #550
Fix pod internal cache delete handling by @varungup90 in #552
Handle terminating pod for request routing by @varungup90 in #549
Support absolute path as lora adapter artifact path by @Jeffwan in #556
Deadlock fix for cache by @varungup90 in #557
Mock app log fix for missing metrics warning by @varungup90 in #564
Add vllm graceful termination configuration by @nwangfw in #568
Enhance dynamic lora adapter support for auth enabled scenario by @Jeffwan in #571
Update pyproject.toml to support python 3.12 by @Jeffwan in #579
[Docs ]Update ai runtime management api and downloader docs by @Jeffwan in #577
Check the HPA ownerReference in request enqueue by @Jeffwan in #582
Add request length for traces by @happyandslow in #569
Support model registration flow using aibrix runtime api by @Jeffwan in #580
Gateway plugin report total incoming requests and pending requests by @zhangjyr in #554
Support distributed kv cache orchestration by @Jeffwan in #583
Grant workflow action permission to write packages by @Jeffwan in #586
Update routers to use GetPodModelMetric api and misc cleanup in metri… by @varungup90 in #590
Update upload/download artifact github actions version to v4 by @varungup90 in #591
Update version in aibrix/python to 0.2.0-rc.2 by @varungup90 in #594

New Contributors

@scarlet25151 made their first contribution in #548
@Aspirin96 made their first contribution in #544

Full Changelog: v0.2.0-rc.1...v0.2.0-rc.2

Contributors

zhangjyr, Jeffwan, and 7 other contributors

Assets 5

v0.1.2

Released 09 Jan 06:44 by Jeffwan (commit b0766a9)

What's Changed

Support absolute path as lora adapter artifact path (#556) by @Jeffwan in #558
Cherry pick streaming and client traffic policy by @varungup90 in #560
Cut v0.1.2 release by @Jeffwan in #561

Full Changelog: v0.1.1...v0.1.2

Contributors

Jeffwan and varungup90

Assets 4

v0.2.0-rc.1 (Pre-release)

Released 10 Dec 20:16 by Jeffwan (commit 0d40fbd)

What's Changed

Add envoy gateway streaming support by @varungup90 in #377
Add client traffic policy to increase per connection buffer size from 32kb to 256kb by @varungup90 in #395
Misc: add support to metricsSources property of podautoscaler by @zhangjyr in #371
[Misc] Update runtime server startup command in v0.1.0 by @brosoul in #396
[CI] improve the ci efficiency by parallelizing the build tasks by @nwangfw in #398
Fix the ticker interval by removing unnecessary ms by @Jeffwan in #415
[Misc] Disable specific endpoints logs by @Jeffwan in #418
[CI] Github Action trigger condition optimized for cost saving by @nwangfw in #411
[Misc] Fix the mocked app role permission issue by @Jeffwan in #416
[CI] Nightly tag removed for release branch by @nwangfw in #422
Enable setting PodAutoscaler configuration via YAML labels by @kr11 in #409
Update manifest to adopt v0.1.1 images by @Jeffwan in #429
[Bug]: duplicated http in rest metrics fetcher (#408) by @zhangjyr in #421
[MISC]: Improve Request Trace Granularity with Version Control by @zhangjyr in #431
Support histogram metrics from engine in cache by @Jeffwan in #424
Support fetching metrics from remote Prometheus server by @Jeffwan in #433
[CI] Add python wheel to release artifact by @Jeffwan in #434
Fix update cache pod issue and refactor updatePod handler by @Jeffwan in #439
Extract common metrics structure to types and utils by @Jeffwan in #438
Fix gateway startup issue due to missing prometheus config by @Jeffwan in #441
[feat]: GPU Optimizer and Simulator development app by @zhangjyr in #430
Add selectrandom fallback in routing and only scraping healthy pods by @Jeffwan in #445
AIBrix Workload Generator / Scenario Simulator by @happyandslow in #428
CrashLoopBackOff status detection in CI by @nwangfw in #444
Support installing individual controllers from giant controller-manager by @nwangfw in #442
Refactor Scaler: Resolve Issues with Metric Parameter Updates in Multiple KPAs by @kr11 in #437
Support metrics multi labels for different models by @brosoul in #450
Add health check api interface for runtime by @Jeffwan in #451
Fix the service name override issue in rolebindings by @Jeffwan in #453
Reorganize docs/development and docs/tutorial structure by @Jeffwan in #455
Move tools to separate folders and update mocked app README.md by @Jeffwan in #457
Fix multi models metric result in PromQL by @brosoul in #458
Support Azure LLM trace in workload generator by @happyandslow in #462
Fix autoscaler scalingstrategy switching logic by @nwangfw in #475
Fix missing handle of PromQL scope is PodMetricScope by @brosoul in #479
[Misc] Consolidate app and simulator by @zhangjyr in #477
[Bug] Avoid including sensitive info in Dockerfile ENV by @zhangjyr in #487
Refactor generator to generate time-based traces by @happyandslow in #478
[CI] Update deploy workload script in installation test by @nwangfw in #499
[Bug] handle metricKey creation with MetricsSources by @nwangfw in #498
Adding Client for Workload Generator Workload File by @happyandslow in #501
[Feat] Integrate deployment configurations and fix autoscaler/gpu optimizer connectivity by @zhangjyr in #500
Fix some simulator format issue and add some TODOs by @Jeffwan in #505
[Bug] Fix the way how podautoscaler handle 0 pods. by @zhangjyr in #508
[Misc] Improve gpu optimizer debugging on podautoscaler. by @zhangjyr in #509
Optimize kustomize overlay for volcano engine deployment by @Jeffwan in #512
[perf] Refact tos downloader in Runtime by @brosoul in #510
Refactor metric source for customized protocol, port and path by @kr11 in #511
[Bug] Fixed the yaml of deployments in heterogenous GPU settings to make KPA scaling work as expected. by @zhangjyr in #513
[Misc] Heterogeneous GPU Optimizer Logging Clean Up by @nwangfw in #514
Fix KPA bug, and an elaborate KPA test case by @kr11 in #515
Cut v0.2.0-rc.1 release by @Jeffwan in #516

Full Changelog: v0.1.1...v0.2.0-rc.1

Contributors

zhangjyr, Jeffwan, and 5 other contributors

Assets 6

v0.1.1

Released 21 Nov 23:02 by github-actions (commit 1e7b918)

Automatically generated release for tag v0.1.1.

What's Changed

Cherry-pick - Fix the ticker interval by removing unnecessary ms by @Jeffwan in #425
Cut v0.1.1 release by @Jeffwan in #427

Full Changelog: v0.1.0...v0.1.1

Contributors

Jeffwan

Assets 4

v0.1.0

Released 12 Nov 22:33 by github-actions (commit d885131)

Feature Highlights

1. Dynamic LoRA Adapter

The Dynamic LoRA Adapter introduces a flexible approach to model adaptation, allowing dynamic management of LoRA models within Kubernetes. It handles model registration, unloading, and routing efficiently, giving operators more control and better scalability in production environments.

2. Gateway Extension Server with Multi-Algorithm Routing Support

We extend Envoy Gateway through an extension server whose external processing service can inspect and mutate requests and responses. This lets us add features that a Kubernetes Service does not support directly, such as multiple routing algorithms (least request, least throughput, and random) and rate limiting. This flexibility allows users to fine-tune routing strategies to their application's needs, improving traffic distribution and system performance.
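The rate limiting mentioned above can be illustrated with a per-user requests-per-minute (RPM) and tokens-per-minute (TPM) limiter, loosely modeled on the per-user RPM/TPM limits referenced in this changelog (#79, #241). This is a minimal in-memory sketch over fixed one-minute windows; it assumes the token count of a request is known before it is admitted (for example, its prompt tokens), and the class and field names are hypothetical. A gateway running several replicas would need a shared store rather than process-local counters.

```python
import time
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass
class UserLimits:
    rpm: int  # max requests per minute
    tpm: int  # max tokens per minute


@dataclass
class _Window:
    window_id: int = -1
    requests: int = 0
    tokens: int = 0


class FixedWindowRateLimiter:
    """In-memory, per-user RPM/TPM limiter over fixed 60-second windows."""

    def __init__(self, limits: dict[str, UserLimits],
                 clock: Callable[[], float] = time.monotonic):
        self.limits = limits
        self.clock = clock  # injectable so the window logic can be tested
        self.usage: dict[str, _Window] = defaultdict(_Window)

    def allow(self, user: str, tokens: int) -> bool:
        limit = self.limits.get(user)
        if limit is None:
            return False  # unknown users are rejected in this sketch
        window_id = int(self.clock() // 60)
        usage = self.usage[user]
        if usage.window_id != window_id:
            # A new minute has started: reset this user's counters.
            usage.window_id, usage.requests, usage.tokens = window_id, 0, 0
        if usage.requests + 1 > limit.rpm or usage.tokens + tokens > limit.tpm:
            return False
        usage.requests += 1
        usage.tokens += tokens
        return True
```

Fixed windows are the simplest option but allow a burst of up to twice the limit across a window boundary; a sliding window or token bucket smooths that out at the cost of more state.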

3. LLM-specific Autoscaler

This release integrates multiple autoscaling algorithms: HPA (Horizontal Pod Autoscaler), KPA (Knative Pod Autoscaler), and APA (AIBrix Pod Autoscaler). The autoscaling framework fetches metrics directly from pods, enabling real-time adjustments based on load and better resource utilization.
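To illustrate how a KPA-style autoscaler turns pod metrics into a replica count, here is a simplified sketch of Knative's stable/panic window approach: a long window drives normal decisions, and a short window detects bursts. The window lengths and panic threshold are illustrative values in the spirit of Knative's defaults, and the class name and per-pod target metric are hypothetical. This is not AIBrix's KPA or APA code, and it omits details such as remaining in panic mode for a full stable window after a burst.

```python
import math
from collections import deque


class KpaStyleScaler:
    """Simplified Knative-style scaler: a long 'stable' window for normal
    decisions and a short 'panic' window to react to bursts."""

    def __init__(self, target_per_pod: float, stable_window_s: float = 60.0,
                 panic_window_s: float = 6.0, panic_threshold: float = 2.0):
        self.target = target_per_pod
        self.stable_window_s = stable_window_s
        self.panic_window_s = panic_window_s
        self.panic_threshold = panic_threshold
        self.samples: deque[tuple[float, float]] = deque()  # (timestamp, metric)

    def record(self, ts: float, value: float) -> None:
        self.samples.append((ts, value))
        # Drop samples that have fallen out of the stable window.
        while self.samples and ts - self.samples[0][0] > self.stable_window_s:
            self.samples.popleft()

    def _average(self, now: float, window_s: float) -> float:
        values = [v for t, v in self.samples if now - t <= window_s]
        return sum(values) / len(values) if values else 0.0

    def desired_replicas(self, now: float, current_replicas: int) -> int:
        stable = math.ceil(self._average(now, self.stable_window_s) / self.target)
        panic = math.ceil(self._average(now, self.panic_window_s) / self.target)
        if current_replicas > 0 and panic / current_replicas >= self.panic_threshold:
            # Panic mode: follow the short-window burst and never scale down.
            return max(panic, current_replicas)
        return stable
```

Under a steady load of 10 per sample with a target of 10 per pod, the sketch settles on one replica; when the last few samples jump to 50, the short panic window pushes the answer to five replicas well before the 60-second average would.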

4. Unified AI Runtime

The AI runtime supports faster model downloading via GPU streaming, streamlined metrics aggregation, and efficient LoRA request delegation, abstracting away the complexities of the underlying engines. It provides an optimized environment for deploying and managing machine learning models, making it easier to handle high-volume requests.

Additional Enhancements:

Doc website: Updated documentation, including quick-start guides, installation instructions, and autoscaling tutorials, makes setup and onboarding smoother.
Benchmarking and Performance Analysis Tools: Integrated tools for benchmarking autoscalers, gateways, and LoRA to monitor and improve system efficiency and performance.
CI/CD Workflow: The new CI/CD pipeline includes automated image builds, GitHub Actions for testing and linting, and release pipelines for simplified deployment.

What's Changed

Add common project documents and skeleton folders by @Jeffwan in #4
Scaffolding aibrix project using kubebuilder by @Jeffwan in #17
Optimize project layouts by moving controllers to pkg folder by @Jeffwan in #21
Create Lora api and controller by @Jeffwan in #23
Rename LoraAdapter to ModelAdapter by @Jeffwan in #25
Add ModelAdapter API by @Jeffwan in #26
Use better way to set up controller with Manager by @Jeffwan in #27
Initial model adapter controller implementation by @Jeffwan in #32
Add mocked model container for lora adapter fast prototyping by @Jeffwan in #33
[Misc] Add the PR and issues template by @jsw-zorro in #38
[Docs] Add example to run vLLM distributed inference using Ray by @Jeffwan in #39
[Doc] Improve the model adapter mock service by @Jeffwan in #45
[Misc] Simplify the feature/bug/enhancement template. by @jsw-zorro in #48
[Misc] Make model adapter controller e2e work by @Jeffwan in #50
[Docs] A draft version of the contributing guideline document by @kr11 in #47
[Core] Improve model adapter controller by handling existing resources by @Jeffwan in #54
[Feat] Initial Implementation of PodAutoscaler Reconciler by @kr11 in #55
[Docs] Move the sample mocked application to common folder by @Jeffwan in #64
[Misc] Minor refactor the PodAutoscaler codes by @Jeffwan in #68
[Core] Add model router controller by @varungup90 in #57
Add rbac rules in model router by @varungup90 in #71
[bugs] Add autoscaler RBAC to successfully list horizontalpodautoscalers by @kr11 in #72
[Misc] Update license info; Add license check by @happyandslow in #73
add github workflow to lint & test code by @M00nF1sh in #74
[CI] Fix the golang lint issues by @Jeffwan in #77
[CI] fix the failures from make test by @Jeffwan in #80
[Misc] Add code-generator and openapi-gen as dependencies by @Jeffwan in #59
[Misc] Reconcile hpa, kpa and apa separately by @Jeffwan in #83
[feat] Add rpm/tpm extension proc plugin by @varungup90 in #79
Add kpa scale algorithm implementation by @kr11 in #87
Add host override to query specific pod by @varungup90 in #86
[Core] init aibrix runtime framework by @brosoul in #88
Support kpa/apa autoscaling workflow part I by @Jeffwan in #85
Fix Dockerfile Packaging Issues Related to Go Version and Missing Utils by @kr11 in #92
Autoscaling Workflow Enhancement - Part 2 by @kr11 in #94
Add custom CRD clientset by @varungup90 in #97
Autoscaling Workflow Enhancement - Part 3 by @kr11 in #101
[Core] Add Downloader implementation for runtime by @brosoul in #96
Add RayClusterReplicaSet and RayClusterFleet apis by @Jeffwan in #103
Apply crd:maxDescLen=0 in manifest generation by @Jeffwan in #108
Apply filter to objects owned by model adapters by @varungup90 in #111
Add custom cache and interface for model adapter scheduling by @varungup90 in #100
Refactor gateway package by @varungup90 in #112
BatchAPI storage component together with test by @xinchen384 in #104
Update the installation guidance and README.md by @Jeffwan in #115
[CI] Package AI Runtime by @brosoul in #118
Add gateway installation by @varungup90 in #122
[CI] Support container image build and push in CI by @Jeffwan in #120
[CI] Fix nightly image push error by @Jeffwan in #127
[Bug] Fix download bugs during download benchmark by @brosoul in #134
Autoscaling Workflow Enhancement - Part 4: Integrating MetricClient into Autoscaling Workflow by @kr11 in #116
Update make generate by @varungup90 in #132
Model adapter controller improvement and refactor by @Jeffwan in #135
Improve the aibrix installation scripts by @Jeffwan in #141
[CI] Support python package publish by @brosoul in #138
Fix some typo and naming issues by @Jeffwan in #150
Fix gateway bootstrap issues by @varungup90 in #154
Add kubeconfig flag for cache initialization by @varungup90 in #155
Using sphinx to generate html pages for our project static site by @xinchen384 in #153
Add finalizer and handle the model unload requests by @Jeffwan in #152
Fix kubeConfig redefined issue and update imagePullPolicy by @Jeffwan in #158
Add expectation lib to allows us to set and wait on expectations by @Jeffwan in #164
Add routing algorithms by @varungup90 in #143
Add readthedocs configuration for CI builds and update theme by @Jeffwan in #169
Add RayClusterReplicaSet initial implementation by @Jeffwan in #165
Add template page for the docs by @Jeffwan in #170
Remove myst_parser from sphinx extensions by @Jeffwan in #172
Update quickstart in the doc by @Jeffwan in #174
Metric standardizing in ai runtime by @brosoul in #163
[Misc] Rename env in runtime by @brosoul in #176
Add readiness check for redis in gateway plugin by @varungup90 in #173
[batch] job manager handles job state transition by @xinchen384 in #180
Add users CRUD API by @varungup90 in #181
Add routing for model adapter by @varungup90 in https:/...

Contributors

Jeffwan, kr11, and 9 other contributors

Assets 4

v0.1.0-rc.5 (Pre-release)

Released 12 Nov 02:20 by github-actions (commit ac33d8a)

Automatically generated release for tag v0.1.0-rc.5.

What's Changed

[doc] update runtime readme by @brosoul in #318
Add env for routing strategy override by @varungup90 in #323
Fix pod autoscaler enqueue issues by @Jeffwan in #329
Autoscaling benchmark by @kr11 in #337
Initial lora benchmark result by @Jeffwan in #321
Adding plotting script by @happyandslow in #338
Update the downloader performance plot by @Jeffwan in #341
Reduce pod metrics refresh interval by @varungup90 in #343
Enable ipv6 for envoy proxy by @varungup90 in #342
Add benchmark scrips for gateway client side changes by @Jeffwan in #340
Update the plots based on feedback by @Jeffwan in #346
[batch] use volcano TOS as batch storage by @xinchen384 in #344
Add check if no pods are present by @varungup90 in #345
Add model exists check by @varungup90 in #353
[Misc] Disable fastapi docs in runtime default action by @brosoul in #350
Add check for acceptable routing strategies by @varungup90 in #352
optimize PA messages: const 'HPA' -> actual pa type by @kr11 in #354
[Misc] Runtime server startup with args by @brosoul in #355
[Misc] Add python format script by @brosoul in #357
optimize benchmark scripts for autoscaler, add more logs by @kr11 in #356
Update the mocked app to cleaner state by @Jeffwan in #361
Update manifests & docs about service httproute naming trick by @Jeffwan in #362
Add reference grant to support httprouting for different namespace by @varungup90 in #347
Validate routing strategy bug fix by @varungup90 in #364
Bug fix for setting routing strategy via env var by @varungup90 in #369
Improve the routing env value & flag retrieval by @Jeffwan in #373
Sync main branch changes to release-0.1 branch by @Jeffwan in #375
Cut v0.1.0-rc.5 release by @Jeffwan in #376

Full Changelog: v0.1.0-rc.4...v0.1.0-rc.5

Contributors

Jeffwan, kr11, and 4 other contributors

Assets 4

v0.1.0-rc.4 (Pre-release)

Released 22 Oct 20:56 by Jeffwan (commit a875f40)

What's Changed

[Misc] Add sync images step and scripts in release process by @Jeffwan in #283
[batch] E2E works with driver and request proxy by @xinchen384 in #272
Fix address already in use when AIRuntime start in pod by @brosoul in #289
Read model name from request body by @varungup90 in #290
Fix redis bootstrap flaky connection issue by @varungup90 in #293
skip docs CI if no changes in /docs dir by @varungup90 in #294
Improve Rayclusterreplicaset Status by @Yicheng-Lu-llll in #295
Add request trace for profiling by @varungup90 in #291
Update the crd definiton due to runtime upgrade by @Jeffwan in #298
Push images to Github registry in release pipeline by @Jeffwan in #301
Build autoscaler abstractions like fetcher, client and scaler by @Jeffwan in #300
Support pod autoscaler periodically check by @Jeffwan in #306
Add timeout in nc check for redis bootstrap by @varungup90 in #309
Refactor AutoScaler: metricClient, context, reconcile by @kr11 in #308
Cut v0.1.0-rc.4 release by @Jeffwan in #314

New Contributors

@Yicheng-Lu-llll made their first contribution in #295

Full Changelog: v0.1.0-rc.3...v0.1.0-rc.4

Contributors

Jeffwan, kr11, and 4 other contributors

Assets 2

v0.1.0-rc.3 (Pre-release)

Released 09 Oct 04:58 by github-actions (commit 5165d11)

Automatically generated release for tag v0.1.0-rc.3.

What's Changed

Add model adapter and multi-node inference docs by @Jeffwan in #222
add gateway docs by @varungup90 in #232
[Misc] add Runtime dependency for hf_transfer by @brosoul in #240
Add validation for username and rpm/tpm negative value by @varungup90 in #241
[CI] Merge python wheel publish process to release build pipeline by @brosoul in #247
[CI] Push images to Github container registry by @Jeffwan in #246
[CI] Fix post-submit container push failure by @Jeffwan in #249
[Misc] Infer model name from model_uri and check AWS credential by @brosoul in #250
[Misc ]Add runtime api metrics by @brosoul in #251
[doc] Update release/contribution/quickstart docs by @Jeffwan in #242
[batch] job FIFO scheduler as baseline by @xinchen384 in #231
[Misc] Improve the installation component sequence by @Jeffwan in #252
Fix concurrency issue with gateway RPM plugin by @varungup90 in #244
Improve model adapter reliability and stability by @Jeffwan in #257
Remove underscore from dir names and remove account word in rate limiter by @varungup90 in #271
[Misc] Use klog as the logr implementation by @Jeffwan in #264
[CI] Unify Dockerfile names and simplify the build scripts by @Jeffwan in #263
Improve model adapter reconcile workflow stability by @Jeffwan in #260
Add container override for images by @varungup90 in #273
Add AIBrix Custom Autoscaling Algorithm APA by @kr11 in #223
Use vllm metrics for routing by @varungup90 in #274
Update random routing section and add support for anonymous user by @varungup90 in #276
Add image build details and examples for multi-host inference by @Jeffwan in #278
Cut v0.1.0-rc.3 release by @Jeffwan in #280

Full Changelog: v0.1.0-rc.2...v0.1.0-rc.3

Contributors

Jeffwan, kr11, and 3 other contributors

Assets 4

v0.1.0-rc.2 (Pre-release)

Released 25 Sep 18:20 by github-actions (commit bd39d38)

Automatically generated release for tag v0.1.0-rc.2.

What's Changed

Fix kubeConfig redefined issue and update imagePullPolicy by @Jeffwan in #158
Add expectation lib to allows us to set and wait on expectations by @Jeffwan in #164
Add routing algorithms by @varungup90 in #143
Add readthedocs configuration for CI builds and update theme by @Jeffwan in #169
Add RayClusterReplicaSet initial implementation by @Jeffwan in #165
Add template page for the docs by @Jeffwan in #170
Remove myst_parser from sphinx extensions by @Jeffwan in #172
Update quickstart in the doc by @Jeffwan in #174
Metric standardizing in ai runtime by @brosoul in #163
[Misc] Rename env in runtime by @brosoul in #176
Add readiness check for redis in gateway plugin by @varungup90 in #173
[batch] job manager handles job state transition by @xinchen384 in #180
Add users CRUD API by @varungup90 in #181
Add routing for model adapter by @varungup90 in #183
Add installation tests and refactor some CI jobs by @Jeffwan in #188
Add release pipeline for images and manifests by @Jeffwan in #189
[Docs] Update Readme on project intro by @xieus in #191
[CI] Add AI Runtime test case by @brosoul in #197
Add AI Runtime exist model check by @brosoul in #198
Implement rayclusterfleet controller by @Jeffwan in #194
klog Level Standardization by @kr11 in #202
Fix RayClusterReplicaSet e2e running issues by @Jeffwan in #200
Add lora adapter management API by @brosoul in #201
Add kuberay manifest as installation dependencies by @Jeffwan in #203
[doc] fix autoscaling readme by @kr11 in #215
[doc] update runtime feature doc by @brosoul in #216
Fix the annotation missing issue for ray workload by @Jeffwan in #218
[CI]: Add python test on different python version by @brosoul in #219
Add Autoscaling Tutorials in format of rst by @kr11 in #225
[Misc] Check AI Runtime download env settings by @brosoul in #221
Cut v0.1.0-rc.2 release by @Jeffwan in #226

New Contributors

@xieus made their first contribution in #191

Full Changelog: v0.1.0-rc.1...v0.1.0-rc.2

Contributors

Jeffwan, kr11, and 4 other contributors

Assets 4
