@fahuddin

[Feat] Add Reinforcement Learning Configuration Options for Classifier Training

FIX

Link to related issue if applicable

Summary

This PR adds foundational support for Reinforcement Learning (RL) options in classifier model training. The implementation introduces a configuration schema and parsing infrastructure that allows RL-based training to be toggled and configured from config/config.yaml without disrupting the existing supervised LoRA training pipeline.

Changes

1. Core Configuration Infrastructure

candle-binding/src/core/config_loader.rs

  • Added RLConfig struct with fields:

    • enabled: Toggle RL training on/off
    • algorithm: Algorithm selection (e.g., "ppo", "a2c", "dqn")
    • learning_rate: Learning rate for RL policy updates (default: 1e-5)
    • gamma: Discount factor for reward accumulation (default: 0.99)
    • batch_size: Batch size for RL training (default: 16)
    • update_epochs: Number of policy update epochs per rollout (default: 4)
    • reward_metric: Metric to compute reward signals (e.g., "accuracy", "f1", default: "accuracy")
  • Added GlobalConfigLoader::load_classifier_rl_config() method to parse RL options from config/config.yaml under the classifier.rl_training key

  • Added GlobalConfigLoader::load_classifier_rl_config_safe(), a safe wrapper that falls back to sensible defaults

  • All parsing reuses the existing hierarchical YAML path extraction for consistency with the other config loaders
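To make the shape of these changes concrete, here is a minimal sketch of the struct, its defaults, and the strict/safe loader pair. It is not the code from this PR: the field types are guesses, the free functions load_rl_config and load_rl_config_safe and the helper parse_or are hypothetical stand-ins for the GlobalConfigLoader methods, and a flat key-to-string map replaces the real YAML path extraction so the example needs only the standard library.

```rust
use std::collections::HashMap;

/// RL options as listed in the PR description. Field types are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct RLConfig {
    pub enabled: bool,
    pub algorithm: String,
    pub learning_rate: f64,
    pub gamma: f64,
    pub batch_size: usize,
    pub update_epochs: usize,
    pub reward_metric: String,
}

impl Default for RLConfig {
    /// Defaults taken from the PR description.
    fn default() -> Self {
        Self {
            enabled: false,
            algorithm: "ppo".to_string(),
            learning_rate: 1e-5,
            gamma: 0.99,
            batch_size: 16,
            update_epochs: 4,
            reward_metric: "accuracy".to_string(),
        }
    }
}

/// Hypothetical helper: parse `key` if present, otherwise use `default`.
fn parse_or<T: std::str::FromStr>(
    raw: &HashMap<String, String>,
    key: &str,
    default: T,
) -> Result<T, String> {
    match raw.get(key) {
        None => Ok(default),
        Some(v) => v
            .trim()
            .parse::<T>()
            .map_err(|_| format!("invalid value for {}: {}", key, v)),
    }
}

/// Hypothetical strict loader: missing keys take defaults, malformed values are errors.
/// `raw` stands in for the values found under classifier.rl_training.
pub fn load_rl_config(raw: &HashMap<String, String>) -> Result<RLConfig, String> {
    let d = RLConfig::default();
    Ok(RLConfig {
        enabled: parse_or(raw, "enabled", d.enabled)?,
        algorithm: parse_or(raw, "algorithm", d.algorithm)?,
        learning_rate: parse_or(raw, "learning_rate", d.learning_rate)?,
        gamma: parse_or(raw, "gamma", d.gamma)?,
        batch_size: parse_or(raw, "batch_size", d.batch_size)?,
        update_epochs: parse_or(raw, "update_epochs", d.update_epochs)?,
        reward_metric: parse_or(raw, "reward_metric", d.reward_metric)?,
    })
}

/// Hypothetical safe wrapper: any parse error yields the full default config.
pub fn load_rl_config_safe(raw: &HashMap<String, String>) -> RLConfig {
    load_rl_config(raw).unwrap_or_default()
}
```

In this sketch the safe wrapper discards the whole block on a single bad value rather than keeping the fields that did parse. Whether the PR's wrapper behaves that way is not stated in the description.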

2. YAML Configuration Schema

config/config.yaml

  • Added classifier.rl_training block with documented defaults:
    classifier:
      rl_training:
        enabled: false            # RL training toggle
        algorithm: "ppo"          # Algorithm choice
        learning_rate: 1e-05
        gamma: 0.99
        batch_size: 16
        update_epochs: 4
        reward_metric: "accuracy"
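
The description does not say whether these values are range-checked after parsing. As an illustration only, a check along the following lines could reject obviously bad settings before training starts. The function name validate_rl_values is hypothetical, and the accepted algorithm and metric names are simply the examples quoted in this PR, not a confirmed list.

```rust
/// Hypothetical sanity checks for the classifier.rl_training values.
/// Returns one message per problem; an empty vector means the values look usable.
pub fn validate_rl_values(
    algorithm: &str,
    learning_rate: f64,
    gamma: f64,
    batch_size: usize,
    update_epochs: usize,
    reward_metric: &str,
) -> Vec<String> {
    let mut problems = Vec::new();

    // Assumption: only the algorithms named as examples in the PR are accepted.
    if !["ppo", "a2c", "dqn"].contains(&algorithm) {
        problems.push(format!("unknown algorithm: {}", algorithm));
    }
    // A learning rate must be a positive, finite number.
    if !(learning_rate > 0.0 && learning_rate.is_finite()) {
        problems.push(format!("learning_rate must be > 0, got {}", learning_rate));
    }
    // A discount factor conventionally lies in (0, 1].
    if !(gamma > 0.0 && gamma <= 1.0) {
        problems.push(format!("gamma must be in (0, 1], got {}", gamma));
    }
    if batch_size == 0 {
        problems.push("batch_size must be at least 1".to_string());
    }
    if update_epochs == 0 {
        problems.push("update_epochs must be at least 1".to_string());
    }
    // Assumption: only the metrics named as examples in the PR are accepted.
    if !["accuracy", "f1"].contains(&reward_metric) {
        problems.push(format!("unknown reward_metric: {}", reward_metric));
    }

    problems
}
```

Collecting every problem instead of stopping at the first lets a user fix the whole block in one pass.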

@netlify

netlify bot commented Nov 14, 2025

Deploy Preview for vllm-semantic-router ready!

Name Link
🔨 Latest commit 46ea0fe
🔍 Latest deploy log https://app.netlify.com/projects/vllm-semantic-router/deploys/69180cce69b7f00008faf1f4
😎 Deploy Preview https://deploy-preview-650--vllm-semantic-router.netlify.app

@github-actions

github-actions bot commented Nov 14, 2025

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 Root Directory

Owners: @rootfs, @Xunzhuo
Files changed:

  • docs/RL_IMPLEMENTATION_GUIDE.md
  • docs/RL_INTEGRATION_SUMMARY.md
  • docs/RL_QUICKSTART.md
  • docs/RL_WHAT_WAS_DELIVERED.md
  • tests/test_intent_rl.py

📁 src

Owners: @rootfs, @Xunzhuo, @wangchen615
Files changed:

  • src/training/training_lora/rl_ppo_trainer.py
  • src/training/training_lora/rl_utils.py
  • src/training/training_lora/train_with_rl_example.py
  • src/training/training_lora/README.md
  • src/training/training_lora/classifier_model_fine_tuning_lora/ft_linear_lora.py

📁 candle-binding

Owners: @rootfs
Files changed:

  • candle-binding/src/core/config_loader.rs

📁 config

Owners: @rootfs, @Xunzhuo
Files changed:

  • config/config.yaml

🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

@Xunzhuo
Member

Xunzhuo commented Nov 14, 2025

@fahuddin can you explain what you are trying to do in this PR?

@fahuddin
Author

Hi. Based on issue "Use RL for model training" (#586), I set up an RLConfig struct and added new fields to accommodate RL training. Please correct me if I'm wrong.

@github-actions github-actions bot deleted a comment from blaji-villeb106 Nov 15, 2025

4 participants