added rl config (Use RL for model training #586) #650

fahuddin · 2025-11-14T06:15:13Z

[Feat] Add Reinforcement Learning Configuration Options for Classifier Training

FIX

Link to related issue if applicable

Summary

This PR adds foundational support for Reinforcement Learning (RL) options in classifier model training. The implementation introduces a configuration schema and parsing infrastructure that allows RL-based training to be toggled and configured from config/config.yaml without disrupting the existing supervised LoRA training pipeline.

Changes

1. Core Configuration Infrastructure

`candle-binding/src/core/config_loader.rs`

- Added `RLConfig` struct with fields:
  - `enabled`: Toggle RL training on/off
  - `algorithm`: Algorithm selection (e.g., "ppo", "a2c", "dqn")
  - `learning_rate`: Learning rate for RL policy updates (default: 1e-5)
  - `gamma`: Discount factor for reward accumulation (default: 0.99)
  - `batch_size`: Batch size for RL training (default: 16)
  - `update_epochs`: Number of policy update epochs per rollout (default: 4)
  - `reward_metric`: Metric used to compute reward signals (e.g., "accuracy", "f1"; default: "accuracy")
- Added `GlobalConfigLoader::load_classifier_rl_config()` method to parse RL options from `config/config.yaml` under the `classifier.rl_training` key
- Added `GlobalConfigLoader::load_classifier_rl_config_safe()`, a safe wrapper that falls back to sensible defaults
- All parsing uses the existing hierarchical YAML path extraction for consistency with other config loaders
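The strict-loader-plus-safe-wrapper pattern described above can be sketched in Python for readers who follow the training side of the repo more than the Rust binding. This is only an illustrative mirror, not the code in `config_loader.rs`: the function names are borrowed from the PR, but their bodies, the coercion rules, and the choice to fall back to a full set of defaults on any error are assumptions. The input is assumed to be an already-parsed YAML document (a nested mapping).

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping


@dataclass(frozen=True)
class RLConfig:
    """Illustrative Python mirror of the RLConfig fields and defaults listed in this PR."""
    enabled: bool = False
    algorithm: str = "ppo"
    learning_rate: float = 1e-5
    gamma: float = 0.99
    batch_size: int = 16
    update_epochs: int = 4
    reward_metric: str = "accuracy"


def load_classifier_rl_config(root: Mapping[str, Any]) -> RLConfig:
    """Strict loader (hypothetical body): raises if classifier.rl_training is missing or malformed."""
    section = root["classifier"]["rl_training"]  # KeyError/TypeError if the path is absent
    if not isinstance(section, Mapping):
        raise TypeError("classifier.rl_training must be a mapping")

    defaults = RLConfig()
    values = {}
    for f in fields(RLConfig):
        if f.name not in section:
            continue  # keep the dataclass default for omitted keys
        raw = section[f.name]
        default = getattr(defaults, f.name)
        if isinstance(default, bool):
            # bool("false") would be True, so booleans are not coerced from strings
            if not isinstance(raw, bool):
                raise TypeError(f"{f.name} must be a boolean")
            values[f.name] = raw
        else:
            # Coerce to the default's type, e.g. float("1e-05") -> 1e-05
            values[f.name] = type(default)(raw)
    return RLConfig(**values)


def load_classifier_rl_config_safe(root: Mapping[str, Any]) -> RLConfig:
    """Safe wrapper (assumed behavior): any loading error yields the defaults."""
    try:
        return load_classifier_rl_config(root)
    except (KeyError, TypeError, ValueError):
        return RLConfig()


parsed = {"classifier": {"rl_training": {"enabled": True, "batch_size": 32}}}
cfg = load_classifier_rl_config_safe(parsed)
print(cfg.enabled, cfg.batch_size, cfg.algorithm)  # True 32 ppo
```

Keeping the strict loader separate from the wrapper lets callers that need to surface misconfiguration use the former, while callers that must always start use the latter.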

2. YAML Configuration Schema

`config/config.yaml`

Added classifier.rl_training block with documented defaults:

classifier:
  rl_training:
    enabled: false            # RL training toggle
    algorithm: "ppo"        # Algorithm choice
    learning_rate: 1e-05
    gamma: 0.99
    batch_size: 16
    update_epochs: 4
    reward_metric: "accuracy"
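To make the `gamma` default concrete, here is a minimal sketch of how a discount factor is conventionally applied to a sequence of per-step rewards. How the PR's trainer actually consumes `gamma` is not shown in this description, so the function name `discounted_returns` and the reward values are hypothetical.

```python
from typing import List, Sequence


def discounted_returns(rewards: Sequence[float], gamma: float = 0.99) -> List[float]:
    """Return G_t = r_t + gamma * G_{t+1} for every step, computed back to front."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# With gamma = 0.5 the arithmetic is easy to check by hand.
print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```

A value close to 1, such as the configured 0.99, weights later rewards almost as heavily as immediate ones.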

netlify · 2025-11-14T06:15:18Z

✅ Deploy Preview for vllm-semantic-router ready!

| Name | Link |
|---|---|
| 🔨 Latest commit | `46ea0fe` |
| 🔍 Latest deploy log | https://app.netlify.com/projects/vllm-semantic-router/deploys/69180cce69b7f00008faf1f4 |
| 😎 Deploy Preview | https://deploy-preview-650--vllm-semantic-router.netlify.app |

To edit notification comments on pull requests, go to your Netlify project configuration.

github-actions · 2025-11-14T06:15:26Z

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 `Root Directory`

Owners: @rootfs, @Xunzhuo
Files changed:

docs/RL_IMPLEMENTATION_GUIDE.md
docs/RL_INTEGRATION_SUMMARY.md
docs/RL_QUICKSTART.md
docs/RL_WHAT_WAS_DELIVERED.md
tests/test_intent_rl.py

📁 `src`

Owners: @rootfs, @Xunzhuo, @wangchen615
Files changed:

src/training/training_lora/rl_ppo_trainer.py
src/training/training_lora/rl_utils.py
src/training/training_lora/train_with_rl_example.py
src/training/training_lora/README.md
src/training/training_lora/classifier_model_fine_tuning_lora/ft_linear_lora.py

📁 `candle-binding`

Owners: @rootfs
Files changed:

candle-binding/src/core/config_loader.rs

📁 `config`

Owners: @rootfs, @Xunzhuo
Files changed:

config/config.yaml

🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

Xunzhuo · 2025-11-14T14:40:03Z

@fahuddin can you explain what you are trying to do in this PR?

fahuddin · 2025-11-14T18:45:20Z

Hi. Based on issue #586 (Use RL for model training), I set up an RLConfig struct and added new fields to accommodate RL training. Please correct me if I'm wrong.

…into add-r1

added rl config

4a3f337

fahuddin requested review from Xunzhuo, rootfs and wangchen615 as code owners November 14, 2025 06:15

github-actions bot assigned rootfs, wangchen615 and Xunzhuo Nov 14, 2025

fahuddin and others added 3 commits November 15, 2025 10:03

Merge branch 'main' into add-r1

d37bd09

added docs and tests

3b5bd56

Merge branch 'add-r1' of https://github.com/fahuddin/semantic-router-1 …

46ea0fe

…into add-r1

github-actions bot deleted a comment from blaji-villeb106 Nov 15, 2025

4 participants
