Releases: vllm-project/aibrix
v0.2.0
Automatically generated release for tag v0.2.0.
🚀 New Features Highlights
- Distributed KV Cache: Implemented support for managing KV cache across multiple nodes, enhancing performance.
- Cost-Driven Heterogeneous Serving: Improved scheduling and inference strategies for mixed GPU environments, optimizing cost and resource utilization. (#371, #430, #509, #554, #598)
- Optimizer-Based Autoscaling: Uses offline profiles of the inference server to calculate the number of replicas. (#430, #500, #692, #508)
- Prefix Cache Aware Routing: Added support for routing decisions based on prefix cache hits, improving inference efficiency. (#641, #657)
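As a rough illustration of the prefix cache aware routing idea listed above, the sketch below sends a request to the pod whose cached prompts share the longest token prefix with it, and falls back to the least loaded pod when no match is long enough. All names here (`pick_pod`, `cached`, `load`, `min_match`) are hypothetical and chosen for illustration; this is not AIBrix's actual implementation.

```python
from typing import Dict, List, Sequence


def shared_prefix_len(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the leading tokens that two token sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def pick_pod(prompt: Sequence[int],
             cached: Dict[str, List[Sequence[int]]],
             load: Dict[str, int],
             min_match: int = 4) -> str:
    """Prefer the pod with the longest cached prefix, else the least loaded pod.

    `cached` maps a pod name to token sequences assumed to be in its prefix
    cache; `load` maps a pod name to its in-flight request count. Both are
    illustrative inputs, not real AIBrix data structures.
    """
    best_pod, best_len = None, 0
    for pod in sorted(cached):  # sorted for deterministic tie-breaking
        longest = max((shared_prefix_len(prompt, p) for p in cached[pod]),
                      default=0)
        if longest > best_len:
            best_pod, best_len = pod, longest
    if best_pod is not None and best_len >= min_match:
        return best_pod
    # No useful cache hit: fall back to the least loaded pod.
    return min(sorted(load), key=lambda pod: load[pod])
```

A production router would likely match on hashed token blocks rather than raw sequences and would weigh cache hits against load; the sketch only shows the basic decision.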
📊 Feature Enhancements
- LoRA Scheduling Enhancements: Introduced multiple scheduling strategies, including bin packing, least latency, least throughput, and random. (#544)
- Gateway Enhancements: Improved request handling efficiency by enabling streaming in the Envoy gateway (#377). Enhanced the handling of model registration and invalid cache scenarios (#542). Introduced fallback strategies to ensure robust request allocation (#445). Optimized cache store retrieval, reducing unnecessary overhead (#639). Fixed a missing Prometheus config that prevented gateway startup (#441).
- PodAutoscaler Scaling Improvements: Improved scaling logic to handle edge cases more efficiently. (#508, #515)
🛠 Infrastructure & CI/CD Upgrades
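The four LoRA scheduling strategies named in the list above (bin packing, least latency, least throughput, and random) can be pictured as different ways of picking a pod for the next adapter. The sketch below is a minimal reading of those names under stated assumptions: the `PodStats` fields, the strategy strings, and the `schedule` function are hypothetical and do not reflect the project's real types or selection rules.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PodStats:
    name: str
    adapters: int       # LoRA adapters already loaded (assumed metric)
    capacity: int       # maximum adapters the pod can hold (assumed limit)
    latency_ms: float   # recent average request latency
    throughput: float   # recent tokens per second served


def schedule(pods: List[PodStats], strategy: str,
             rng: Optional[random.Random] = None) -> PodStats:
    """Pick the pod that should load the next adapter."""
    candidates = [p for p in pods if p.adapters < p.capacity]
    if not candidates:
        raise ValueError("no pod has room for another adapter")
    if strategy == "bin-pack":
        # Fill the fullest pod first so lightly used pods stay free.
        return max(candidates, key=lambda p: p.adapters)
    if strategy == "least-latency":
        return min(candidates, key=lambda p: p.latency_ms)
    if strategy == "least-throughput":
        return min(candidates, key=lambda p: p.throughput)
    if strategy == "random":
        return (rng or random).choice(candidates)
    raise ValueError(f"unknown strategy: {strategy}")
```

Bin packing concentrates adapters on few pods, while the latency and throughput variants spread work toward pods that currently look less busy.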
- Parallelized Build Tasks: CI efficiency improvements by running builds in parallel. (#398)
- CrashLoopBackOff Detection in CI: Added monitoring for pod failures in testing workflows. (#444)
- Improved GitHub Actions Cost Efficiency: Optimized triggers and removed unnecessary nightly builds. (#411, #422)
- Integration Tests for Core Components: Added integration tests for autoscalers, routing policies, and deployment configurations. (#616, #620)
What's Changed
- Add envoy gateway streaming support by @varungup90 in #377
- Add client traffic policy to increase per connection buffer size from 32kb to 256kb by @varungup90 in #395
- Misc: add support to metricsSources property of podautoscaler by @zhangjyr in #371
- [Misc] Update runtime server startup command in v0.1.0 by @brosoul in #396
- [CI] improve the ci efficiency by parallelizing the build tasks by @nwangfw in #398
- Fix the ticker interval by removing unnecessary ms by @Jeffwan in #415
- [Misc] Disable specific endpoints logs by @Jeffwan in #418
- [CI] Github Action trigger condition optimized for cost saving by @nwangfw in #411
- [Misc] Fix the mocked app role permission issue by @Jeffwan in #416
- [CI] Nightly tag removed for release branch by @nwangfw in #422
- Enable setting PodAutoscaler configuration via YAML labels by @kr11 in #409
- Update manifest to adopt v0.1.1 images by @Jeffwan in #429
- [Bug]: duplicated http in rest metrics fetcher (#408) by @zhangjyr in #421
- [MISC]: Improve Request Trace Granularity with Version Control by @zhangjyr in #431
- Support histogram metrics from engine in cache by @Jeffwan in #424
- Support fetching metrics from remote Prometheus server by @Jeffwan in #433
- [CI] Add python wheel to release artifact by @Jeffwan in #434
- Fix update cache pod issue and refactor updatePod handler by @Jeffwan in #439
- Extract common metrics structure to types and utils by @Jeffwan in #438
- Fix gateway startup issue due to missing prometheus config by @Jeffwan in #441
- [feat]: GPU Optimizer and Simulator development app by @zhangjyr in #430
- Add selectrandom fallback in routing and only scraping healthy pods by @Jeffwan in #445
- AIBrix Workload Generator / Scenario Simulator by @happyandslow in #428
- CrashLoopBackOff status detection in CI by @nwangfw in #444
- Support installing individual controllers from giant controller-manager by @nwangfw in #442
- Refactor Scaler: Resolve Issues with Metric Parameter Updates in Multiple KPAs by @kr11 in #437
- Support metrics multi labels for different models by @brosoul in #450
- Add health check api interface for runtime by @Jeffwan in #451
- Fix the service name override issue in rolebindings by @Jeffwan in #453
- Reorganize docs/development and docs/tutorial structure by @Jeffwan in #455
- Move tools to separate folders and update mocked app README.md by @Jeffwan in #457
- Fix multi models metric result in PromQL by @brosoul in #458
- Support Azure LLM trace in workload generator by @happyandslow in #462
- Fix autoscaler scalingstrategy switching logic by @nwangfw in #475
- Fix missing handle of PromQL scope is PodMetricScope by @brosoul in #479
- [Misc] Consolidate app and simulator by @zhangjyr in #477
- [Bug] Avoid including sensitive info in Dockerfile ENV by @zhangjyr in #487
- Refactor generator to generate time-based traces by @happyandslow in #478
- [CI] Update deploy workload script in installation test by @nwangfw in #499
- [Bug] handle metricKey creation with MetricsSources by @nwangfw in #498
- Adding Client for Workload Generator Workload File by @happyandslow in #501
- [Feat] Integrate deployment configurations and fix autoscaler/gpu optimizer connectivity by @zhangjyr in #500
- Fix some simulator format issue and add some TODOs by @Jeffwan in #505
- [Bug] Fix the way how podautoscaler handle 0 pods. by @zhangjyr in #508
- [Misc] Improve gpu optimizer debugging on podautoscaler. by @zhangjyr in #509
- Optimize kustomize overlay for volcano engine deployment by @Jeffwan in #512
- [perf] Refact tos downloader in Runtime by @brosoul in #510
- Refactor metric source for customized protocol, port and path by @kr11 in #511
- [Bug] Fixed the yaml of deployments in heterogenous GPU settings to make KPA scaling work as expected. by @zhangjyr in #513
- [Misc] Heterogeneous GPU Optimizer Logging Clean Up by @nwangfw in #514
- Fix KPA bug, and an elaborate KPA test case by @kr11 in #515
- Cut v0.2.0-rc.1 release by @Jeffwan in #516
- [Bug] Accumulated bug fix on controller manager, mock app configuration, and gpu optimizer. by @zhangjyr in #522
- [Misc] Reduced runtime's container image size by @nwangfw in #518
- clean memory scaler object when pa crd is deleted by @kr11 in #520
- Configure autoscaler http client to skip certificate check by @Jeffwan in #530
- [Doc] Update aibrix documentation by @Jeffwan in #533
- Refactor the gateway-plugin and metadata service manifests by @Jeffwan in #531
- Fix the GITHUB_WORKSPACE artifact sharing issue in release workflow by @Jeffwan in #532
- [Misc] Polish the benchmark scripts by @Jeffwan in #525
- Fix APA bugs in creation, add test and demo yaml by @kr11 in #536
- Add VKE IPv4 Testing Cluster Config by @nwangfw in #537
- Support for request length internal trace by @happyandslow in #538
- [Feat] Add download status into runtime downloader by @brosoul in #539
- [Feat] Add runtime model management api by @brosoul in #540
- [gateway] handle the wrong model name and cache inconsistency case by @Jeffwan in #542
- [Docs] fix: update the parameters instruction in readme by @scarlet25151 in #548
- add lora schedulers - bin pack, least latency, least throughput, random by @Aspirin96 in #544
- add request routers - least kv cache, least expected latency by @Aspirin96 in #543
- [Docs] heterogenous gpu docs added by ...
v0.2.0-rc.2
Automatically generated release for tag v0.2.0-rc.2.
What's Changed
- [Bug] Accumulated bug fix on controller manager, mock app configuration, and gpu optimizer. by @zhangjyr in #522
- [Misc] Reduced runtime's container image size by @nwangfw in #518
- clean memory scaler object when pa crd is deleted by @kr11 in #520
- Configure autoscaler http client to skip certificate check by @Jeffwan in #530
- [Doc] Update aibrix documentation by @Jeffwan in #533
- Refactor the gateway-plugin and metadata service manifests by @Jeffwan in #531
- Fix the GITHUB_WORKSPACE artifact sharing issue in release workflow by @Jeffwan in #532
- [Misc] Polish the benchmark scripts by @Jeffwan in #525
- Fix APA bugs in creation, add test and demo yaml by @kr11 in #536
- Add VKE IPv4 Testing Cluster Config by @nwangfw in #537
- Support for request length internal trace by @happyandslow in #538
- [Feat] Add download status into runtime downloader by @brosoul in #539
- [Feat] Add runtime model management api by @brosoul in #540
- [gateway] handle the wrong model name and cache inconsistency case by @Jeffwan in #542
- [Docs] fix: update the parameters instruction in readme by @scarlet25151 in #548
- add lora schedulers - bin pack, least latency, least throughput, random by @Aspirin96 in #544
- add request routers - least kv cache, least expected latency by @Aspirin96 in #543
- [Docs] heterogenous gpu docs added by @nwangfw in #545
- Fix race condition in cache by @varungup90 in #550
- Fix pod internal cache delete handling by @varungup90 in #552
- Handle terminating pod for request routing by @varungup90 in #549
- Support absolute path as lora adapter artifact path by @Jeffwan in #556
- Deadlock fix for cache by @varungup90 in #557
- Mock app log fix for missing metrics warning by @varungup90 in #564
- Add vllm graceful termination configuration by @nwangfw in #568
- Enhance dynamic lora adapter support for auth enabled scenario by @Jeffwan in #571
- Update pyproject.toml to support python 3.12 by @Jeffwan in #579
- [Docs ]Update ai runtime management api and downloader docs by @Jeffwan in #577
- Check the HPA ownerReference in request enqueue by @Jeffwan in #582
- Add request length for traces by @happyandslow in #569
- Support model registration flow using aibrix runtime api by @Jeffwan in #580
- Gateway plugin report total incoming requests and pending requests by @zhangjyr in #554
- Support distributed kv cache orchestration by @Jeffwan in #583
- Grant workflow action permission to write packages by @Jeffwan in #586
- Update routers to use GetPodModelMetric api and misc cleanup in metri… by @varungup90 in #590
- Update upload/download artifact github actions version to v4 by @varungup90 in #591
- Update version in aibrix/python to 0.2.0-rc.2 by @varungup90 in #594
New Contributors
- @scarlet25151 made their first contribution in #548
- @Aspirin96 made their first contribution in #544
Full Changelog: v0.2.0-rc.1...v0.2.0-rc.2
v0.1.2
v0.2.0-rc.1
What's Changed
- Add envoy gateway streaming support by @varungup90 in #377
- Add client traffic policy to increase per connection buffer size from 32kb to 256kb by @varungup90 in #395
- Misc: add support to metricsSources property of podautoscaler by @zhangjyr in #371
- [Misc] Update runtime server startup command in v0.1.0 by @brosoul in #396
- [CI] improve the ci efficiency by parallelizing the build tasks by @nwangfw in #398
- Fix the ticker interval by removing unnecessary ms by @Jeffwan in #415
- [Misc] Disable specific endpoints logs by @Jeffwan in #418
- [CI] Github Action trigger condition optimized for cost saving by @nwangfw in #411
- [Misc] Fix the mocked app role permission issue by @Jeffwan in #416
- [CI] Nightly tag removed for release branch by @nwangfw in #422
- Enable setting PodAutoscaler configuration via YAML labels by @kr11 in #409
- Update manifest to adopt v0.1.1 images by @Jeffwan in #429
- [Bug]: duplicated http in rest metrics fetcher (#408) by @zhangjyr in #421
- [MISC]: Improve Request Trace Granularity with Version Control by @zhangjyr in #431
- Support histogram metrics from engine in cache by @Jeffwan in #424
- Support fetching metrics from remote Prometheus server by @Jeffwan in #433
- [CI] Add python wheel to release artifact by @Jeffwan in #434
- Fix update cache pod issue and refactor updatePod handler by @Jeffwan in #439
- Extract common metrics structure to types and utils by @Jeffwan in #438
- Fix gateway startup issue due to missing prometheus config by @Jeffwan in #441
- [feat]: GPU Optimizer and Simulator development app by @zhangjyr in #430
- Add selectrandom fallback in routing and only scraping healthy pods by @Jeffwan in #445
- AIBrix Workload Generator / Scenario Simulator by @happyandslow in #428
- CrashLoopBackOff status detection in CI by @nwangfw in #444
- Support installing individual controllers from giant controller-manager by @nwangfw in #442
- Refactor Scaler: Resolve Issues with Metric Parameter Updates in Multiple KPAs by @kr11 in #437
- Support metrics multi labels for different models by @brosoul in #450
- Add health check api interface for runtime by @Jeffwan in #451
- Fix the service name override issue in rolebindings by @Jeffwan in #453
- Reorganize docs/development and docs/tutorial structure by @Jeffwan in #455
- Move tools to separate folders and update mocked app README.md by @Jeffwan in #457
- Fix multi models metric result in PromQL by @brosoul in #458
- Support Azure LLM trace in workload generator by @happyandslow in #462
- Fix autoscaler scalingstrategy switching logic by @nwangfw in #475
- Fix missing handle of PromQL scope is PodMetricScope by @brosoul in #479
- [Misc] Consolidate app and simulator by @zhangjyr in #477
- [Bug] Avoid including sensitive info in Dockerfile ENV by @zhangjyr in #487
- Refactor generator to generate time-based traces by @happyandslow in #478
- [CI] Update deploy workload script in installation test by @nwangfw in #499
- [Bug] handle metricKey creation with MetricsSources by @nwangfw in #498
- Adding Client for Workload Generator Workload File by @happyandslow in #501
- [Feat] Integrate deployment configurations and fix autoscaler/gpu optimizer connectivity by @zhangjyr in #500
- Fix some simulator format issue and add some TODOs by @Jeffwan in #505
- [Bug] Fix the way how podautoscaler handle 0 pods. by @zhangjyr in #508
- [Misc] Improve gpu optimizer debugging on podautoscaler. by @zhangjyr in #509
- Optimize kustomize overlay for volcano engine deployment by @Jeffwan in #512
- [perf] Refact tos downloader in Runtime by @brosoul in #510
- Refactor metric source for customized protocol, port and path by @kr11 in #511
- [Bug] Fixed the yaml of deployments in heterogenous GPU settings to make KPA scaling work as expected. by @zhangjyr in #513
- [Misc] Heterogeneous GPU Optimizer Logging Clean Up by @nwangfw in #514
- Fix KPA bug, and an elaborate KPA test case by @kr11 in #515
- Cut v0.2.0-rc.1 release by @Jeffwan in #516
Full Changelog: v0.1.1...v0.2.0-rc.1
v0.1.1
v0.1.0
Feature Highlights
1. Dynamic LoRA Adapter
The Dynamic LoRA Adapter introduces a flexible approach to model adaptation, allowing dynamic management of LoRA models within Kubernetes. This new functionality includes efficient handling of model registration, unloading, and routing, significantly enhancing operational control and scalability for production environments.
2. Gateway Extension Server with Multi-Algorithm Routing Support
We extend the Envoy Gateway through an extension server, and an external processing service can inspect and mutate requests and responses. We use this approach to add features that a Kubernetes Service does not support directly, such as rate limiting and routing algorithms including least request, least throughput, and random. This flexibility allows users to fine-tune routing strategies based on their specific application needs, ultimately improving traffic distribution and system performance.
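To make the rate limit feature mentioned above more concrete, here is a minimal sketch of a per-user requests-per-minute check of the kind a request-inspecting service could run before forwarding a request. It uses a simple fixed one-minute window kept in memory; the `RpmLimiter` class and its parameters are hypothetical, and the real gateway plugin may use a different algorithm and storage.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, Tuple


class RpmLimiter:
    """Fixed-window requests-per-minute limiter keyed by user name."""

    def __init__(self, limit: int, clock: Callable[[], float] = time.time):
        self.limit = limit
        self.clock = clock  # injectable so the window can be tested
        self.counts: Dict[Tuple[str, int], int] = defaultdict(int)

    def allow(self, user: str) -> bool:
        """Return True and count the request if the user is under the limit."""
        window = int(self.clock() // 60)  # index of the current minute
        key = (user, window)
        if self.counts[key] >= self.limit:
            return False
        self.counts[key] += 1
        return True
```

The sketch never expires old windows, so a long-running service would need to prune them or keep the counters in an external store with a time-to-live.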
3. LLM-specific Autoscaler
This release integrates multiple autoscaling algorithms, including HPA (Horizontal Pod Autoscaler), KPA (Knative Pod Autoscaler), and APA (AIBrix Pod Autoscaler). The autoscaling framework now features a direct connection to fetch metrics from pods, enabling real-time adjustments based on load and optimized resource utilization.
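For orientation, the replica calculation documented for the standard Kubernetes HPA is `ceil(currentReplicas * currentMetric / targetMetric)`, skipped when the ratio is within a tolerance of 1.0 (0.1 by default). The sketch below implements only that documented rule; KPA and APA apply their own logic, which is not shown, and the function name and the `min_replicas`/`max_replicas` defaults are illustrative.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Compute a replica count using the documented HPA scaling rule."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: do not scale
    else:
        desired = math.ceil(current_replicas * ratio)
    # Keep the result inside the configured bounds.
    return max(min_replicas, min(max_replicas, desired))
```

For example, four replicas each observing twice the target load would be scaled to eight, subject to the maximum.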
4. Unified AI Runtime
The AI runtime supports faster model downloading through GPU streaming, streamlined metrics aggregation, and efficient LoRA request delegation that abstracts underlying engine complexities. This runtime provides an optimized environment for deploying and managing machine learning models, making it easier to handle high-volume requests.
Additional Enhancements:
- Doc website: Updated documentation, including quick-start guides, installation instructions, and autoscaling tutorials, makes setup and onboarding smoother.
- Benchmarking and Performance Analysis Tools: Integrated tools for benchmarking autoscalers, gateways, and LoRA to monitor and improve system efficiency and performance.
- CI/CD Workflow: The new CI/CD pipeline includes automated image builds, GitHub Actions for testing and linting, and release pipelines for simplified deployment.
What's Changed
- Add common project documents and skeleton folders by @Jeffwan in #4
- Scaffolding aibrix project using kubebuilder by @Jeffwan in #17
- Optimize project layouts by moving controllers to pkg folder by @Jeffwan in #21
- Create Lora api and controller by @Jeffwan in #23
- Rename LoraAdapter to ModelAdapter by @Jeffwan in #25
- Add ModelAdapter API by @Jeffwan in #26
- Use better way to set up controller with Manager by @Jeffwan in #27
- Initial model adapter controller implementation by @Jeffwan in #32
- Add mocked model container for lora adapter fast prototyping by @Jeffwan in #33
- [Misc] Add the PR and issues template by @jsw-zorro in #38
- [Docs] Add example to run vLLM distributed inference using Ray by @Jeffwan in #39
- [Doc] Improve the model adapter mock service by @Jeffwan in #45
- [Misc] Simplify the feature/bug/enhancement template. by @jsw-zorro in #48
- [Misc] Make model adapter controller e2e work by @Jeffwan in #50
- [Docs] A draft version of the contributing guideline document by @kr11 in #47
- [Core] Improve model adapter controller by handling existing resources by @Jeffwan in #54
- [Feat] Initial Implementation of PodAutoscaler Reconciler by @kr11 in #55
- [Docs] Move the sample mocked application to common folder by @Jeffwan in #64
- [Misc] Minor refactor the PodAutoscaler codes by @Jeffwan in #68
- [Core] Add model router controller by @varungup90 in #57
- Add rbac rules in model router by @varungup90 in #71
- [bugs] Add autoscaler RBAC to successfully list horizontalpodautoscalers by @kr11 in #72
- [Misc] Update license info; Add license check by @happyandslow in #73
- add github workflow to lint & test code by @M00nF1sh in #74
- [CI] Fix the golang lint issues by @Jeffwan in #77
- [CI] fix the failures from make test by @Jeffwan in #80
- [Misc] Add code-generator and openapi-gen as dependencies by @Jeffwan in #59
- [Misc] Reconcile hpa, kpa and apa separately by @Jeffwan in #83
- [feat] Add rpm/tpm extension proc plugin by @varungup90 in #79
- Add kpa scale algorithm implementation by @kr11 in #87
- Add host override to query specific pod by @varungup90 in #86
- [Core] init aibrix runtime framework by @brosoul in #88
- Support kpa/apa autoscaling workflow part I by @Jeffwan in #85
- Fix Dockerfile Packaging Issues Related to Go Version and Missing Utils by @kr11 in #92
- Autoscaling Workflow Enhancement - Part 2 by @kr11 in #94
- Add custom CRD clientset by @varungup90 in #97
- Autoscaling Workflow Enhancement - Part 3 by @kr11 in #101
- [Core] Add Downloader implementation for runtime by @brosoul in #96
- Add RayClusterReplicaSet and RayClusterFleet apis by @Jeffwan in #103
- Apply crd:maxDescLen=0 in manifest generation by @Jeffwan in #108
- Apply filter to objects owned by model adapters by @varungup90 in #111
- Add custom cache and interface for model adapter scheduling by @varungup90 in #100
- Refactor gateway package by @varungup90 in #112
- BatchAPI storage component together with test by @xinchen384 in #104
- Update the installation guidance and README.md by @Jeffwan in #115
- [CI] Package AI Runtime by @brosoul in #118
- Add gateway installation by @varungup90 in #122
- [CI] Support container image build and push in CI by @Jeffwan in #120
- [CI] Fix nightly image push error by @Jeffwan in #127
- [Bug] Fix download bugs during download benchmark by @brosoul in #134
- Autoscaling Workflow Enhancement - Part 4: Integrating MetricClient into Autoscaling Workflow by @kr11 in #116
- Update make generate by @varungup90 in #132
- Model adapter controller improvement and refactor by @Jeffwan in #135
- Improve the aibrix installation scripts by @Jeffwan in #141
- [CI] Support python package publish by @brosoul in #138
- Fix some typo and naming issues by @Jeffwan in #150
- Fix gateway bootstrap issues by @varungup90 in #154
- Add kubeconfig flag for cache initialization by @varungup90 in #155
- Using sphinx to generate html pages for our project static site by @xinchen384 in #153
- Add finalizer and handle the model unload requests by @Jeffwan in #152
- Fix kubeConfig redefined issue and update imagePullPolicy by @Jeffwan in #158
- Add expectation lib to allows us to set and wait on expectations by @Jeffwan in #164
- Add routing algorithms by @varungup90 in #143
- Add readthedocs configuration for CI builds and update theme by @Jeffwan in #169
- Add RayClusterReplicaSet initial implementation by @Jeffwan in #165
- Add template page for the docs by @Jeffwan in #170
- Remove myst_parser from sphinx extensions by @Jeffwan in #172
- Update quickstart in the doc by @Jeffwan in #174
- Metric standardizing in ai runtime by @brosoul in #163
- [Misc] Rename env in runtime by @brosoul in #176
- Add readiness check for redis in gateway plugin by @varungup90 in #173
- [batch] job manager handles job state transition by @xinchen384 in #180
- Add users CRUD API by @varungup90 in #181
- Add routing for model adapter by @varungup90 in https:/...
v0.1.0-rc.5
Automatically generated release for tag v0.1.0-rc.5.
What's Changed
- [doc] update runtime readme by @brosoul in #318
- Add env for routing strategy override by @varungup90 in #323
- Fix pod autoscaler enqueue issues by @Jeffwan in #329
- Autoscaling benchmark by @kr11 in #337
- Initial lora benchmark result by @Jeffwan in #321
- Adding plotting script by @happyandslow in #338
- Update the downloader performance plot by @Jeffwan in #341
- Reduce pod metrics refresh interval by @varungup90 in #343
- Enable ipv6 for envoy proxy by @varungup90 in #342
- Add benchmark scrips for gateway client side changes by @Jeffwan in #340
- Update the plots based on feedback by @Jeffwan in #346
- [batch] use volcano TOS as batch storage by @xinchen384 in #344
- Add check if no pods are present by @varungup90 in #345
- Add model exists check by @varungup90 in #353
- [Misc] Disable fastapi docs in runtime default action by @brosoul in #350
- Add check for acceptable routing strategies by @varungup90 in #352
- optimize PA messages: const 'HPA' -> actual pa type by @kr11 in #354
- [Misc] Runtime server startup with args by @brosoul in #355
- [Misc] Add python format script by @brosoul in #357
- optimize benchmark scripts for autoscaler, add more logs by @kr11 in #356
- Update the mocked app to cleaner state by @Jeffwan in #361
- Update manifests & docs about service httproute naming trick by @Jeffwan in #362
- Add reference grant to support httprouting for different namespace by @varungup90 in #347
- Validate routing strategy bug fix by @varungup90 in #364
- Bug fix for setting routing strategy via env var by @varungup90 in #369
- Improve the routing env value & flag retrieval by @Jeffwan in #373
- Sync main branch changes to release-0.1 branch by @Jeffwan in #375
- Cut v0.1.0-rc.5 release by @Jeffwan in #376
Full Changelog: v0.1.0-rc.4...v0.1.0-rc.5
v0.1.0-rc.4
What's Changed
- [Misc] Add sync images step and scripts in release process by @Jeffwan in #283
- [batch] E2E works with driver and request proxy by @xinchen384 in #272
- Fix address already in use when AIRuntime start in pod by @brosoul in #289
- Read model name from request body by @varungup90 in #290
- Fix redis bootstrap flaky connection issue by @varungup90 in #293
- skip docs CI if no changes in /docs dir by @varungup90 in #294
- Improve Rayclusterreplicaset Status by @Yicheng-Lu-llll in #295
- Add request trace for profiling by @varungup90 in #291
- Update the crd definiton due to runtime upgrade by @Jeffwan in #298
- Push images to Github registry in release pipeline by @Jeffwan in #301
- Build autoscaler abstractions like fetcher, client and scaler by @Jeffwan in #300
- Support pod autoscaler periodically check by @Jeffwan in #306
- Add timeout in nc check for redis bootstrap by @varungup90 in #309
- Refactor AutoScaler: metricClient, context, reconcile by @kr11 in #308
- Cut v0.1.0-rc.4 release by @Jeffwan in #314
New Contributors
- @Yicheng-Lu-llll made their first contribution in #295
Full Changelog: v0.1.0-rc.3...v0.1.0-rc.4
v0.1.0-rc.3
Automatically generated release for tag v0.1.0-rc.3.
What's Changed
- Add model adapter and multi-node inference docs by @Jeffwan in #222
- add gateway docs by @varungup90 in #232
- [Misc] add Runtime dependency for hf_transfer by @brosoul in #240
- Add validation for username and rpm/tpm negative value by @varungup90 in #241
- [CI] Merge python wheel publish process to release build pipeline by @brosoul in #247
- [CI] Push images to Github container registry by @Jeffwan in #246
- [CI] Fix post-submit container push failure by @Jeffwan in #249
- [Misc] Infer model name from model_uri and check AWS credential by @brosoul in #250
- [Misc ]Add runtime api metrics by @brosoul in #251
- [doc] Update release/contribution/quickstart docs by @Jeffwan in #242
- [batch] job FIFO scheduler as baseline by @xinchen384 in #231
- [Misc] Improve the installation component sequence by @Jeffwan in #252
- Fix concurrency issue with gateway RPM plugin by @varungup90 in #244
- Improve model adapter reliability and stability by @Jeffwan in #257
- Remove underscore from dir names and remove account word in rate limiter by @varungup90 in #271
- [Misc] Use klog as the logr implementation by @Jeffwan in #264
- [CI] Unify Dockerfile names and simplify the build scripts by @Jeffwan in #263
- Improve model adapter reconcile workflow stability by @Jeffwan in #260
- Add container override for images by @varungup90 in #273
- Add AIBrix Custom Autoscaling Algorithm APA by @kr11 in #223
- Use vllm metrics for routing by @varungup90 in #274
- Update random routing section and add support for anonymous user by @varungup90 in #276
- Add image build details and examples for multi-host inference by @Jeffwan in #278
- Cut v0.1.0-rc.3 release by @Jeffwan in #280
Full Changelog: v0.1.0-rc.2...v0.1.0-rc.3
v0.1.0-rc.2
Automatically generated release for tag v0.1.0-rc.2.
What's Changed
- Fix kubeConfig redefined issue and update imagePullPolicy by @Jeffwan in #158
- Add expectation lib to allows us to set and wait on expectations by @Jeffwan in #164
- Add routing algorithms by @varungup90 in #143
- Add readthedocs configuration for CI builds and update theme by @Jeffwan in #169
- Add RayClusterReplicaSet initial implementation by @Jeffwan in #165
- Add template page for the docs by @Jeffwan in #170
- Remove myst_parser from sphinx extensions by @Jeffwan in #172
- Update quickstart in the doc by @Jeffwan in #174
- Metric standardizing in ai runtime by @brosoul in #163
- [Misc] Rename env in runtime by @brosoul in #176
- Add readiness check for redis in gateway plugin by @varungup90 in #173
- [batch] job manager handles job state transition by @xinchen384 in #180
- Add users CRUD API by @varungup90 in #181
- Add routing for model adapter by @varungup90 in #183
- Add installation tests and refactor some CI jobs by @Jeffwan in #188
- Add release pipeline for images and manifests by @Jeffwan in #189
- [Docs] Update Readme on project intro by @xieus in #191
- [CI] Add AI Runtime test case by @brosoul in #197
- Add AI Runtime exist model check by @brosoul in #198
- Implement rayclusterfleet controller by @Jeffwan in #194
- klog Level Standardization by @kr11 in #202
- Fix RayClusterReplicaSet e2e running issues by @Jeffwan in #200
- Add lora adapter management API by @brosoul in #201
- Add kuberay manifest as installation dependencies by @Jeffwan in #203
- [doc] fix autoscaling readme by @kr11 in #215
- [doc] update runtime feature doc by @brosoul in #216
- Fix the annotation missing issue for ray workload by @Jeffwan in #218
- [CI]: Add python test on different python version by @brosoul in #219
- Add Autoscaling Tutorials in format of rst by @kr11 in #225
- [Misc] Check AI Runtime download env settings by @brosoul in #221
- Cut v0.1.0-rc.2 release by @Jeffwan in #226
Full Changelog: v0.1.0-rc.1...v0.1.0-rc.2