@rootfs rootfs commented Oct 21, 2025

What type of PR is this?

This hybrid cache combines an in-memory index with Milvus persistent storage. Similarity search runs in the router's memory, and the cached response is retrieved from Milvus only when a similar document is found, which avoids a round trip to Milvus on every chat request. This is especially helpful when few requests are semantically similar to cached content, i.e. when most lookups are misses.

graph TB
    subgraph "Client Layer"
        Client[LLM Client]
    end
    
    subgraph "Cache Layer"
        HC[Hybrid Cache]
        subgraph "In-Memory Components"
            HNSW[HNSW Graph Index]
            EMB[Embedding Cache]
            IDMAP[ID Mapping]
        end
        subgraph "External Storage"
            MILVUS[(Milvus Vector DB)]
        end
    end
    
    subgraph "LLM Layer"
        LLM[Large Language Model]
    end
    
    Client -->|1. Query| HC
    HC -->|2. Generate Embedding| EMB
    EMB -->|3. Search| HNSW
    HNSW -->|4. Top-K IDs| IDMAP
    IDMAP -->|5. Milvus IDs| MILVUS
    MILVUS -->|6. Documents| HC
    HC -->|7. Cache Hit| Client
    HC -->|8. Cache Miss| LLM
    LLM -->|9. Response| HC
    HC -->|10. Store| MILVUS
    HC -->|11. Update Index| HNSW
    
    style HNSW fill:#90EE90
    style EMB fill:#90EE90
    style IDMAP fill:#90EE90
    style MILVUS fill:#87CEEB
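
The lookup and store paths in the diagram can be sketched in Go roughly as follows. This is a minimal illustration under stated assumptions, not the PR's implementation: `DocStore`, `mapStore`, `HybridCache`, `Lookup`, and `Add` are hypothetical names, a brute-force scan stands in for the HNSW search, and embeddings are assumed to be unit-normalized and of equal length so that the dot product equals cosine similarity.

```go
package main

import "fmt"

// DocStore is a hypothetical stand-in for the Milvus-backed document store.
type DocStore interface {
	GetByID(id string) (string, bool)
	Put(id, response string)
}

// mapStore is an in-process fake used only to keep the sketch runnable.
type mapStore map[string]string

func (m mapStore) GetByID(id string) (string, bool) { r, ok := m[id]; return r, ok }
func (m mapStore) Put(id, response string)          { m[id] = response }

// HybridCache keeps embeddings and an index-to-ID mapping in memory and
// leaves the cached responses in the external store.
type HybridCache struct {
	embeddings [][]float32 // stand-in for the HNSW index
	ids        []string    // position in embeddings -> store ID
	store      DocStore
	threshold  float32
}

// dot assumes both vectors have the same length.
func dot(a, b []float32) float32 {
	var s float32
	for i := range a {
		s += a[i] * b[i]
	}
	return s
}

// Lookup searches the in-memory vectors first and contacts the store only
// when the best match reaches the similarity threshold.
func (h *HybridCache) Lookup(query []float32) (string, bool) {
	best, bestSim := -1, float32(-1)
	for i, e := range h.embeddings {
		if sim := dot(query, e); sim > bestSim {
			best, bestSim = i, sim
		}
	}
	if best < 0 || bestSim < h.threshold {
		return "", false // miss: no round trip to the store
	}
	return h.store.GetByID(h.ids[best])
}

// Add stores the response externally and indexes its embedding in memory.
func (h *HybridCache) Add(id string, emb []float32, response string) {
	h.store.Put(id, response)
	h.embeddings = append(h.embeddings, emb)
	h.ids = append(h.ids, id)
}

func main() {
	hc := &HybridCache{store: mapStore{}, threshold: 0.9}
	hc.Add("doc-1", []float32{1, 0}, "cached answer")
	fmt.Println(hc.Lookup([]float32{1, 0})) // similar query: store is consulted
	fmt.Println(hc.Lookup([]float32{0, 1})) // dissimilar query: store is skipped
}
```

The point of the design is visible in `Lookup`: a miss is decided entirely in memory, so only hits pay for a call to the external store.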

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #

Release Notes: Yes/No

netlify bot commented Oct 21, 2025

Deploy Preview for vllm-semantic-router ready!

Name Link
🔨 Latest commit b929ec5
🔍 Latest deploy log https://app.netlify.com/projects/vllm-semantic-router/deploys/68f93219fdb8bc00097c8e09
😎 Deploy Preview https://deploy-preview-504--vllm-semantic-router.netlify.app

github-actions bot commented Oct 21, 2025

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 config

Owners: @rootfs
Files changed:

  • config/config.hybrid.yaml
  • config/config.development.yaml
  • config/config.yaml

📁 src

Owners: @rootfs, @Xunzhuo, @wangchen615
Files changed:

  • src/semantic-router/pkg/cache/comprehensive_benchmark_test.go
  • src/semantic-router/pkg/cache/hybrid_cache.go
  • src/semantic-router/pkg/cache/hybrid_cache_test.go
  • src/semantic-router/pkg/cache/hybrid_vs_milvus_benchmark_test.go
  • src/semantic-router/pkg/cache/large_scale_benchmark_test.go
  • src/semantic-router/pkg/cache/simd_benchmark_test.go
  • src/semantic-router/pkg/cache/simd_distance_amd64.go
  • src/semantic-router/pkg/cache/simd_distance_amd64.s
  • src/semantic-router/pkg/cache/simd_distance_generic.go
  • src/semantic-router/go.mod
  • src/semantic-router/go.sum
  • src/semantic-router/pkg/cache/cache_factory.go
  • src/semantic-router/pkg/cache/cache_interface.go
  • src/semantic-router/pkg/cache/inmemory_cache.go
  • src/semantic-router/pkg/cache/inmemory_cache_integration_test.go
  • src/semantic-router/pkg/cache/milvus_cache.go

📁 website

Owners: @Xunzhuo, @rootfs, @yuluo-yx
Files changed:

  • website/docs/tutorials/semantic-cache/hybrid-cache.md
  • website/docs/tutorials/semantic-cache/in-memory-cache.md

📁 Root Directory

Owners: @rootfs, @Xunzhuo
Files changed:

  • .github/workflows/pre-commit.yml
  • .github/workflows/publish-crate.yml
  • .github/workflows/test-and-build.yml
  • .pre-commit-config.yaml
  • Dockerfile.extproc
  • Dockerfile.extproc.cross

📁 candle-binding

Owners: @rootfs
Files changed:

  • candle-binding/Cargo.lock
  • candle-binding/Cargo.toml

📁 tools

Owners: @yuluo-yx, @rootfs, @Xunzhuo
Files changed:

  • tools/make/build-run-test.mk
  • tools/make/milvus.mk
  • tools/make/rust.mk

🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

@rootfs rootfs marked this pull request as draft October 21, 2025 20:36
@rootfs rootfs force-pushed the hybrid-cache branch 2 times, most recently from 7ba854b to 60dac1f Compare October 21, 2025 22:46
@rootfs rootfs requested a review from Copilot October 22, 2025 16:21
Copilot AI left a comment

Pull Request Overview

This PR implements a hybrid cache architecture that combines an in-memory HNSW (Hierarchical Navigable Small World) index for fast approximate similarity search with a Milvus vector database for scalable, persistent storage. The hybrid approach provides roughly O(log n) search time while supporting millions of entries without storing full documents in memory.

Key changes:

  • Introduces hybrid cache backend combining HNSW index with Milvus storage
  • Adds SIMD-optimized (AVX2/AVX-512) dot product operations for vector similarity calculations
  • Implements comprehensive benchmarking suite comparing hybrid vs pure Milvus performance
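
As a point of reference for the SIMD work, a portable scalar dot product might look like the sketch below. It is an assumption-laden illustration, not the code in simd_distance_generic.go: the name `dotProductScalar` and the four-way unrolling are hypothetical choices made for the example.

```go
package main

import "fmt"

// dotProductScalar is a hypothetical portable fallback. It processes four
// elements per iteration with independent accumulators, then handles the tail.
func dotProductScalar(a, b []float32) float32 {
	n := len(a)
	if len(b) < n {
		n = len(b) // only the common prefix is multiplied
	}
	var s0, s1, s2, s3 float32
	i := 0
	for ; i+4 <= n; i += 4 {
		s0 += a[i] * b[i]
		s1 += a[i+1] * b[i+1]
		s2 += a[i+2] * b[i+2]
		s3 += a[i+3] * b[i+3]
	}
	sum := s0 + s1 + s2 + s3
	for ; i < n; i++ {
		sum += a[i] * b[i]
	}
	return sum
}

func main() {
	a := []float32{1, 2, 3, 4, 5}
	b := []float32{5, 4, 3, 2, 1}
	fmt.Println(dotProductScalar(a, b)) // 5+8+9+8+5 = 35
}
```

For unit-normalized embeddings the dot product equals cosine similarity, which is why a fast dot product speeds up the similarity search.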

Reviewed Changes

Copilot reviewed 30 out of 32 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| src/semantic-router/pkg/cache/hybrid_cache.go | Core hybrid cache implementation with HNSW index and Milvus integration |
| src/semantic-router/pkg/cache/inmemory_cache.go | Extended in-memory cache with HNSW index support |
| src/semantic-router/pkg/cache/simd_distance_amd64.go | SIMD-optimized vector operations for x86-64 |
| src/semantic-router/pkg/cache/milvus_cache.go | Added batch operations and GetByID for hybrid cache |
| website/docs/tutorials/semantic-cache/hybrid-cache.md | Documentation for hybrid cache architecture |
| tools/make/rust.mk | Added rust-ci target for CI/CD without CUDA |
| config/config.yaml | Updated configuration with HNSW parameters |

}

// Generate unique ID
id := fmt.Sprintf("%x", md5.Sum(fmt.Appendf(nil, "%s_%s_%d", entry.Model, entry.Query, time.Now().UnixNano())))

Copilot AI Oct 22, 2025

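fmt.Appendf was added in Go 1.19, so this line fails to compile on older toolchains. If the module must build with an earlier Go version, format the string with fmt.Sprintf and convert it to []byte before passing it to md5.Sum, or write into a bytes.Buffer with fmt.Fprintf.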

Suggested change
- id := fmt.Sprintf("%x", md5.Sum(fmt.Appendf(nil, "%s_%s_%d", entry.Model, entry.Query, time.Now().UnixNano())))
+ id := fmt.Sprintf("%x", md5.Sum([]byte(fmt.Sprintf("%s_%s_%d", entry.Model, entry.Query, time.Now().UnixNano()))))
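
For reference, the two forms of this line compute the same ID. fmt.Appendf has been part of the standard library since Go 1.19, so the choice mainly depends on the Go version the module targets. The sketch below assumes hypothetical helper names (`entryID`, `entryIDCompat`) and passes the timestamp as a parameter so that the result is deterministic.

```go
package main

import (
	"crypto/md5"
	"fmt"
)

// entryID is a hypothetical helper; nanos stands in for time.Now().UnixNano().
func entryID(model, query string, nanos int64) string {
	// fmt.Appendf (Go 1.19+) formats directly into a byte slice.
	sum := md5.Sum(fmt.Appendf(nil, "%s_%s_%d", model, query, nanos))
	return fmt.Sprintf("%x", sum)
}

// entryIDCompat does the same with fmt.Sprintf for older toolchains.
func entryIDCompat(model, query string, nanos int64) string {
	sum := md5.Sum([]byte(fmt.Sprintf("%s_%s_%d", model, query, nanos)))
	return fmt.Sprintf("%x", sum)
}

func main() {
	a := entryID("model-a", "hello", 42)
	b := entryIDCompat("model-a", "hello", 42)
	fmt.Println(a == b, len(a)) // true 32
}
```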

Copilot uses AI. Check for mistakes.

Comment on lines 575 to 585
// Suppress context error to avoid noise
_ = ctx


Copilot AI Oct 22, 2025

This comment and the blank-identifier assignment have no run-time effect: the assignment _ = ctx at most silences an unused-variable compile error, and it does not suppress any context error. Remove these lines.

Suggested change
- // Suppress context error to avoid noise
- _ = ctx

func generateQuery(length ContentLength, index int) string {
// Hash the index to get pseudo-random values (deterministic but well-distributed)
hash := uint64(index) // #nosec G115 -- index is always positive and bounded
hash = hash*2654435761 + 1013904223 // Knuth's multiplicative hash

Copilot AI Oct 22, 2025

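The comment references 'Knuth's multiplicative hash', but the code also adds 1013904223, which is not part of that scheme. Knuth's multiplicative hash multiplies by 2654435761 and reduces the result modulo 2^32, with no additive term; a multiply-and-add step like this one is the form of a linear congruential generator. Clarify the algorithm in the comment or correct the implementation.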

Suggested change
- hash = hash*2654435761 + 1013904223 // Knuth's multiplicative hash
+ hash = hash * 2654435761 // Knuth's multiplicative hash (multiply only, no addition)
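
To make the modulo 2^32 part concrete, here is a minimal sketch of the multiplicative hash on a 32-bit value. The function name `knuthHash32` is hypothetical, and the sketch uses uint32 rather than the uint64 in the benchmark code so that the wraparound is implicit.

```go
package main

import "fmt"

// knuthHash32 multiplies by 2654435761 and lets uint32 arithmetic wrap,
// which performs the reduction modulo 2^32.
func knuthHash32(index uint32) uint32 {
	return index * 2654435761
}

func main() {
	for i := uint32(0); i < 4; i++ {
		fmt.Println(i, knuthHash32(i))
	}
}
```

With a uint64 operand, as in the benchmark code, the product wraps modulo 2^64 instead, so an explicit mask or conversion to uint32 would be needed to get the 32-bit behavior.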

Comment on lines 130 to 132
projectRoot := "/home/ubuntu/rootfs/back/semantic-router.bak"
if envRoot := os.Getenv("PROJECT_ROOT"); envRoot != "" {
projectRoot = envRoot

Copilot AI Oct 22, 2025

Hard-coded absolute path is environment-specific and will fail in other environments. Use relative path resolution or require PROJECT_ROOT environment variable to be set.

Suggested change
- projectRoot := "/home/ubuntu/rootfs/back/semantic-router.bak"
- if envRoot := os.Getenv("PROJECT_ROOT"); envRoot != "" {
-     projectRoot = envRoot
+ projectRoot := os.Getenv("PROJECT_ROOT")
+ if projectRoot == "" {
+     b.Fatalf("PROJECT_ROOT environment variable must be set for benchmark output directory")

Comment on lines 95 to 102
// Try absolute path first (for direct test execution)
configPath := "/home/ubuntu/rootfs/back/semantic-router.bak/config/cache/milvus.yaml"
if _, err := os.Stat(configPath); err == nil {
return configPath
}

// Try relative from project root (when run via make)
configPath = "config/cache/milvus.yaml"

Copilot AI Oct 22, 2025

Hard-coded absolute path is environment-specific and will fail in other environments. Use relative path resolution or require an environment variable.

Suggested change
- // Try absolute path first (for direct test execution)
- configPath := "/home/ubuntu/rootfs/back/semantic-router.bak/config/cache/milvus.yaml"
- if _, err := os.Stat(configPath); err == nil {
-     return configPath
- }
- // Try relative from project root (when run via make)
- configPath = "config/cache/milvus.yaml"
+ // Check for environment variable first
+ if envPath := os.Getenv("MILVUS_CONFIG_PATH"); envPath != "" {
+     if _, err := os.Stat(envPath); err == nil {
+         return envPath
+     }
+ }
+ // Try relative from project root (when run via make)
+ configPath := "config/cache/milvus.yaml"
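
One way to generalize this suggestion is a small helper that checks an environment override first and then a list of relative candidates. The sketch below is illustrative only: `resolveConfigPath` and the second candidate path are assumptions, while `MILVUS_CONFIG_PATH` and `config/cache/milvus.yaml` come from the suggestion above.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolveConfigPath returns the first existing path among the environment
// override and the given candidates. It is a hypothetical helper.
func resolveConfigPath(envVar string, candidates ...string) (string, error) {
	if p := os.Getenv(envVar); p != "" {
		if _, err := os.Stat(p); err == nil {
			return p, nil
		}
	}
	for _, c := range candidates {
		if _, err := os.Stat(c); err == nil {
			return filepath.Clean(c), nil
		}
	}
	return "", fmt.Errorf("no config file found (checked $%s and %d candidates)", envVar, len(candidates))
}

func main() {
	// The second candidate is an assumed location relative to a test package.
	path, err := resolveConfigPath("MILVUS_CONFIG_PATH",
		"config/cache/milvus.yaml", "../../../../config/cache/milvus.yaml")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("using", path)
}
```

Returning an error instead of silently falling back lets the caller decide whether a missing config should fail or skip the benchmark.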

Comment on lines 51 to 53
if len(buf.visited) > 1000 || cap(buf.candidates.data) > 200 || cap(buf.results.data) > 200 {
    return
}

Copilot AI Oct 22, 2025

Magic numbers (1000, 200, 200) should be defined as named constants to clarify their purpose and make them easier to tune.
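
A sketch of what named limits could look like for a pooled search buffer follows. Everything here is an assumption made for illustration: the constant names, the `searchBuffers` layout (a map for the visited set and plain slices for the heaps), and the use of sync.Pool are not taken from the PR. Only the values 1000, 200, and 200 come from the snippet.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical names for the limits; the values match the reviewed snippet.
const (
	maxPooledVisited    = 1000 // largest visited set worth keeping
	maxPooledCandidates = 200  // largest candidate capacity worth keeping
	maxPooledResults    = 200  // largest result capacity worth keeping
)

// searchBuffers is an assumed layout for per-search scratch space.
type searchBuffers struct {
	visited    map[uint32]bool
	candidates []uint32
	results    []uint32
}

var bufPool = sync.Pool{New: func() interface{} {
	return &searchBuffers{visited: make(map[uint32]bool)}
}}

// shouldPool reports whether a buffer is small enough to be reused.
func shouldPool(b *searchBuffers) bool {
	return len(b.visited) <= maxPooledVisited &&
		cap(b.candidates) <= maxPooledCandidates &&
		cap(b.results) <= maxPooledResults
}

// putBuffers resets a buffer and returns it to the pool unless it grew too large.
func putBuffers(b *searchBuffers) {
	if !shouldPool(b) {
		return // oversized buffers are left to the garbage collector
	}
	for k := range b.visited {
		delete(b.visited, k)
	}
	b.candidates = b.candidates[:0]
	b.results = b.results[:0]
	bufPool.Put(b)
}

func main() {
	small := bufPool.Get().(*searchBuffers)
	big := &searchBuffers{candidates: make([]uint32, 0, 500)}
	fmt.Println(shouldPool(small), shouldPool(big)) // true false
	putBuffers(small)
	putBuffers(big)
}
```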

Comment on lines 784 to 798
level := 0
for level < 16 { // Max 16 layers
    if randFloat() > h.hnswIndex.ml {
        break
    }
    level++
}

Copilot AI Oct 22, 2025

Magic number 16 (max layers) should be defined as a constant to make it configurable and document the architectural limitation.
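
The loop above with the limit pulled into a constant might look like the following sketch. The names `maxLayers` and `selectLevel` are hypothetical, `ml` is assumed to be a float64 probability, and the random source is passed in so the behavior can be checked deterministically.

```go
package main

import (
	"fmt"
	"math/rand"
)

// maxLayers is a hypothetical name for the cap of 16 used in the snippet.
const maxLayers = 16

// selectLevel mirrors the reviewed loop: keep promoting the node while a
// uniform draw in [0, 1) does not exceed ml, up to maxLayers.
func selectLevel(ml float64, randFloat func() float64) int {
	level := 0
	for level < maxLayers {
		if randFloat() > ml {
			break
		}
		level++
	}
	return level
}

func main() {
	rng := rand.New(rand.NewSource(1))
	counts := make([]int, maxLayers+1)
	for i := 0; i < 10000; i++ {
		counts[selectLevel(0.5, rng.Float64)]++
	}
	// With ml = 0.5 each level should hold roughly half as many nodes as the one below.
	fmt.Println(counts[:5])
}
```

Under this loop the chance of reaching at least level k is about ml^k, so the cap only matters for very large indexes or values of ml close to 1.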

@rootfs rootfs requested a review from Copilot October 22, 2025 19:15
…cache that use in-memory index and milvus based doc store

Signed-off-by: Huamin Chen <[email protected]>
Copilot AI left a comment

Pull Request Overview

Copilot reviewed 30 out of 32 changed files in this pull request and generated 4 comments.


vec_b[i] = rand.Float32()
}

b.Run("SIMD/"+string(rune(size)), func(b *testing.B) {

Copilot AI Oct 22, 2025

Converting an integer to a rune and then to a string produces incorrect benchmark names: string(rune(size)) yields the Unicode character with that code point, not the decimal digits. Use fmt.Sprintf("SIMD/%d", size) instead.

_ = sum
})

b.Run("Scalar/"+string(rune(size)), func(b *testing.B) {

Copilot AI Oct 22, 2025

Same issue as the SIMD benchmark: converting an integer to a rune produces incorrect benchmark names. Use fmt.Sprintf("Scalar/%d", size) instead.
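
The difference between the two conversions is easy to see in isolation. In this small sketch the helper names `runeName` and `decimalName` are made up for the example.

```go
package main

import "fmt"

// runeName reproduces the reviewed pattern: the integer is treated as a
// Unicode code point, so 65 becomes "A".
func runeName(prefix string, size int) string {
	return prefix + string(rune(size))
}

// decimalName formats the integer as decimal digits.
func decimalName(prefix string, size int) string {
	return fmt.Sprintf("%s%d", prefix, size)
}

func main() {
	fmt.Println(runeName("SIMD/", 65))    // SIMD/A
	fmt.Println(decimalName("SIMD/", 65)) // SIMD/65
}
```

strconv.Itoa(size) is an equivalent alternative to fmt.Sprintf when only the number needs converting.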

Comment on lines 210 to 211
if !config.Development.AutoCreateCollection && !config.Development.DropCollectionOnStartup {
fmt.Printf("[WARN] Development settings parsed as false, forcing to true for benchmarks\n")

Copilot AI Oct 22, 2025

This workaround silently overrides configuration settings, which could lead to unexpected behavior in production. The YAML parsing issue should be fixed at its root cause rather than working around it. If a workaround is necessary, add environment variable guards to ensure it only applies in test/benchmark scenarios.

Suggested change
- if !config.Development.AutoCreateCollection && !config.Development.DropCollectionOnStartup {
-     fmt.Printf("[WARN] Development settings parsed as false, forcing to true for benchmarks\n")
+ // Only apply this workaround if SR_BENCHMARK_MODE or SR_TEST_MODE is set
+ benchmarkMode := os.Getenv("SR_BENCHMARK_MODE")
+ testMode := os.Getenv("SR_TEST_MODE")
+ if (benchmarkMode == "1" || benchmarkMode == "true" || testMode == "1" || testMode == "true") &&
+     !config.Development.AutoCreateCollection && !config.Development.DropCollectionOnStartup {
+     fmt.Printf("[WARN] Development settings parsed as false, forcing to true for benchmarks/tests\n")
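
The environment check in this suggestion could be factored into a small helper, sketched below. The names `isTruthy` and `overridesAllowed` are hypothetical. Note one deliberate difference from the suggestion: strconv.ParseBool accepts a slightly wider set of true values (1, t, T, TRUE, true, True) than the literal comparison against "1" and "true".

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// isTruthy reports whether s parses as a true boolean value.
func isTruthy(s string) bool {
	v, err := strconv.ParseBool(s)
	return err == nil && v
}

// overridesAllowed is a hypothetical guard: development overrides apply
// only when one of the two mode variables from the suggestion is enabled.
func overridesAllowed() bool {
	return isTruthy(os.Getenv("SR_BENCHMARK_MODE")) || isTruthy(os.Getenv("SR_TEST_MODE"))
}

func main() {
	fmt.Println(isTruthy("1"), isTruthy("true"), isTruthy(""), isTruthy("yes")) // true true false false
	fmt.Println("overrides allowed:", overridesAllowed())
}
```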

Comment on lines 583 to 584
// Suppress context error to avoid noise
_ = ctx

Copilot AI Oct 22, 2025

The context is intentionally ignored with a blank assignment. This appears to be dead code to suppress unused variable warnings. Consider removing the context variable or using it properly for timeout/cancellation.

Suggested change
- // Suppress context error to avoid noise
- _ = ctx
+ // No match found above threshold, return miss

Signed-off-by: Huamin Chen <[email protected]>
Signed-off-by: Huamin Chen <[email protected]>
Signed-off-by: Huamin Chen <[email protected]>
Signed-off-by: Huamin Chen <[email protected]>
Signed-off-by: Huamin Chen <[email protected]>
Signed-off-by: Huamin Chen <[email protected]>
Signed-off-by: Huamin Chen <[email protected]>
Signed-off-by: Huamin Chen <[email protected]>
@rootfs rootfs marked this pull request as ready for review October 22, 2025 19:21
Signed-off-by: Huamin Chen <[email protected]>
Signed-off-by: Huamin Chen <[email protected]>
@rootfs rootfs merged commit 7abe96a into vllm-project:main Oct 23, 2025
22 checks passed
rootfs added a commit to rootfs/semantic-router.bak that referenced this pull request Oct 23, 2025
…d doc store (vllm-project#504)

* feat: add HNSW index to inmemory semantic cache and implement hybrid cache that use in-memory index and milvus based doc store

Signed-off-by: Huamin Chen <[email protected]>

* chore: run go mod tidy to clean up module dependencies

Signed-off-by: Huamin Chen <[email protected]>

* conditionally build candle cuda support

Signed-off-by: Huamin Chen <[email protected]>

* rebuild index upon restart

Signed-off-by: Huamin Chen <[email protected]>

* precommit fix

Signed-off-by: Huamin Chen <[email protected]>

* fix precommit

Signed-off-by: Huamin Chen <[email protected]>

* fix precommit

Signed-off-by: Huamin Chen <[email protected]>

* fix precommit

Signed-off-by: Huamin Chen <[email protected]>

* disable cuda build on ci

Signed-off-by: Huamin Chen <[email protected]>

* review feedback

Signed-off-by: Huamin Chen <[email protected]>

* review feedback

Signed-off-by: Huamin Chen <[email protected]>

* review feedback

Signed-off-by: Huamin Chen <[email protected]>

* review feedback

Signed-off-by: Huamin Chen <[email protected]>

---------

Signed-off-by: Huamin Chen <[email protected]>
rootfs added a commit that referenced this pull request Oct 23, 2025
* Update test description from Math to General (#483)

Signed-off-by: carlory <[email protected]>

* feat: add HuggingChat support (#477)

* add chat ui to dashboard and docker compose & refactor dashboard/backend/

Signed-off-by: JaredforReal <[email protected]>

* try fix network error

Signed-off-by: JaredforReal <[email protected]>

* more

---------

Signed-off-by: JaredforReal <[email protected]>
Co-authored-by: bitliu <[email protected]>

* project: 2025 Q4 roadmap (#487)

* project: q4 roadmap

* project: q4 roadmap

* project: q4 roadmap

* more

* more

* more

* more

* feat: add shelleck precommit hook (#488)

* feat: add shelleck precommit hook

Signed-off-by: yuluo-yx <[email protected]>

* feat: add shelleck precommit hook

Signed-off-by: yuluo-yx <[email protected]>

* feat: add shelleck precommit hook

Signed-off-by: yuluo-yx <[email protected]>

---------

Signed-off-by: yuluo-yx <[email protected]>

* project: add q4 roadmap news (#495)

* fix missing shellcheck in pre-commit image (#497)

Signed-off-by: carlory <[email protected]>

* infra: update tools (#501)

Signed-off-by: yuluo-yx <[email protected]>

* feat(demo): enhance OpenShift demo scripts with improved UX (#478)

- Reduce model selection test to 4 categories (2×Model-A, 2×Model-B)
- Add new "Classification Examples" option calling curl-examples.sh
- Update reasoning examples to avoid cache hits from previous tests
- Remove benign examples from PII and Jailbreak tests (show only attacks)
- Enhance live-semantic-router-logs.sh with better color visibility:
  - Fix duplicate "WITH SCORE" text in classification output
  - Fix CACHE HIT background color extending over timestamp
  - Distinguish reasoning enabled vs disabled messages
  - Remove redundant "(standard routing)" text
  - Add background colors for Model-A/Model-B routing display

These improvements make the live demo clearer and more impactful for
presentations and demonstrations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: Yossi Ovadia <[email protected]>
Co-authored-by: Claude <[email protected]>

* fix: fix precommit Argument list too long error (#502)

Signed-off-by: yuluo-yx <[email protected]>

* feat: enforce milvus dial timeout if set (#503)

Signed-off-by: cryo <[email protected]>

* Add IETF draft publication: Multi-Provider Extensions for Agentic AI Inference APIs (#506)

* Initial plan

* Add new IETF draft publication for Multi-Provider Extensions for Agentic AI Inference APIs

Co-authored-by: rootfs <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: rootfs <[email protected]>

* Allow semantic cache similarity threshold to be set at the category level (#493)

* Initial plan

* Add category-level cache settings: enabled and similarity_threshold

Co-authored-by: rootfs <[email protected]>

* Add comprehensive tests for category-level cache settings

Co-authored-by: rootfs <[email protected]>

* Update config files and documentation for category-level cache settings

- Updated 7 config YAML files (development, production, testing, e2e, and 3 recipes) with commented examples of category-level cache settings
- Added comprehensive documentation section explaining category-level cache configuration
- Updated semantic cache overview and in-memory cache docs with category-level examples
- Added best practices for threshold selection and privacy considerations

Co-authored-by: rootfs <[email protected]>

* Remove duplicate code in FindSimilar functions

Refactored FindSimilar() to delegate to FindSimilarWithThreshold() with default threshold instead of duplicating the entire implementation. This eliminates 226 lines of duplicate code across inmemory_cache.go and milvus_cache.go.

Co-authored-by: rootfs <[email protected]>

* Update src/semantic-router/pkg/extproc/request_handler.go

Co-authored-by: Copilot <[email protected]>

* Revert changes from unsigned commit ae39fe2

Restored the classificationText empty check that was removed in the previous commit.

Co-authored-by: rootfs <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: rootfs <[email protected]>
Co-authored-by: Huamin Chen <[email protected]>
Co-authored-by: Copilot <[email protected]>

* Allow jailbreak detection and threshold to be configured at the category level (#508)

* Initial plan

* Add category-level jailbreak detection configuration

Co-authored-by: Xunzhuo <[email protected]>

* Add documentation for category-level jailbreak settings

Co-authored-by: Xunzhuo <[email protected]>

* Update documentation for category-level jailbreak detection

- Add category-level jailbreak configuration to jailbreak-protection.md
- Update category configuration docs with jailbreak_enabled parameter
- Add security-focused configuration example
- Update global configuration docs with category override notes
- Update README to mention fine-grained security control

Co-authored-by: Xunzhuo <[email protected]>

* Add category-level jailbreak threshold configuration

- Add JailbreakThreshold field to Category struct
- Add GetJailbreakThresholdForCategory helper method
- Create CheckForJailbreakWithThreshold and AnalyzeContentForJailbreakWithThreshold methods
- Update performSecurityChecks to use category-specific threshold
- Add 5 comprehensive tests for threshold configuration
- Update example configs with threshold tuning examples
- Update documentation with threshold configuration and tuning guidelines
- Add threshold tuning guide with recommendations for different category types

Co-authored-by: Xunzhuo <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: Xunzhuo <[email protected]>

* Allow PII detection threshold to be set at the category level (#510)

* Initial plan

* Add category-level PII threshold support

Co-authored-by: Xunzhuo <[email protected]>

* Update documentation with API integration notes

Co-authored-by: Xunzhuo <[email protected]>

* Fix markdown linting issues

Co-authored-by: Xunzhuo <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: Xunzhuo <[email protected]>

* Fix: The caller information points to the wrapper function instead of the actual call location (#518)

Signed-off-by: carlory <[email protected]>

* feat: Implement hybrid cache that use in-memory index and milvus based doc store (#504)

* feat: add HNSW index to inmemory semantic cache and implement hybrid cache that use in-memory index and milvus based doc store

Signed-off-by: Huamin Chen <[email protected]>

* chore: run go mod tidy to clean up module dependencies

Signed-off-by: Huamin Chen <[email protected]>

* conditionally build candle cuda support

Signed-off-by: Huamin Chen <[email protected]>

* rebuild index upon restart

Signed-off-by: Huamin Chen <[email protected]>

* precommit fix

Signed-off-by: Huamin Chen <[email protected]>

* fix precommit

Signed-off-by: Huamin Chen <[email protected]>

* fix precommit

Signed-off-by: Huamin Chen <[email protected]>

* fix precommit

Signed-off-by: Huamin Chen <[email protected]>

* disable cuda build on ci

Signed-off-by: Huamin Chen <[email protected]>

* review feedback

Signed-off-by: Huamin Chen <[email protected]>

* review feedback

Signed-off-by: Huamin Chen <[email protected]>

* review feedback

Signed-off-by: Huamin Chen <[email protected]>

* review feedback

Signed-off-by: Huamin Chen <[email protected]>

---------

Signed-off-by: Huamin Chen <[email protected]>

---------

Signed-off-by: carlory <[email protected]>
Signed-off-by: JaredforReal <[email protected]>
Signed-off-by: yuluo-yx <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>
Signed-off-by: cryo <[email protected]>
Signed-off-by: Huamin Chen <[email protected]>
Co-authored-by: 杨朱 · Kiki <[email protected]>
Co-authored-by: Jared <[email protected]>
Co-authored-by: bitliu <[email protected]>
Co-authored-by: shown <[email protected]>
Co-authored-by: Yossi Ovadia <[email protected]>
Co-authored-by: Claude <[email protected]>
Co-authored-by: cryo <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: rootfs <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Xunzhuo <[email protected]>
rootfs added a commit that referenced this pull request Oct 23, 2025
* Update test description from Math to General (#483)

Signed-off-by: carlory <[email protected]>

* feat: add HuggingChat support (#477)

* add chat ui to dashboard and docker compose & refactor dashboard/backend/

Signed-off-by: JaredforReal <[email protected]>

* try fix network error

Signed-off-by: JaredforReal <[email protected]>

* more

---------

Signed-off-by: JaredforReal <[email protected]>
Co-authored-by: bitliu <[email protected]>

* project: 2025 Q4 roadmap (#487)

* project: q4 roadmap

* project: q4 roadmap

* project: q4 roadmap

* more

* more

* more

* more

* feat: add shellcheck precommit hook (#488)

* feat: add shellcheck precommit hook

Signed-off-by: yuluo-yx <[email protected]>

* feat: add shellcheck precommit hook

Signed-off-by: yuluo-yx <[email protected]>

* feat: add shellcheck precommit hook

Signed-off-by: yuluo-yx <[email protected]>

---------

Signed-off-by: yuluo-yx <[email protected]>

* project: add q4 roadmap news (#495)

* fix missing shellcheck in pre-commit image (#497)

Signed-off-by: carlory <[email protected]>

* infra: update tools (#501)

Signed-off-by: yuluo-yx <[email protected]>

* feat(demo): enhance OpenShift demo scripts with improved UX (#478)

- Reduce model selection test to 4 categories (2×Model-A, 2×Model-B)
- Add new "Classification Examples" option calling curl-examples.sh
- Update reasoning examples to avoid cache hits from previous tests
- Remove benign examples from PII and Jailbreak tests (show only attacks)
- Enhance live-semantic-router-logs.sh with better color visibility:
  - Fix duplicate "WITH SCORE" text in classification output
  - Fix CACHE HIT background color extending over timestamp
  - Distinguish reasoning enabled vs disabled messages
  - Remove redundant "(standard routing)" text
  - Add background colors for Model-A/Model-B routing display

These improvements make the live demo clearer and more impactful for
presentations and demonstrations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: Yossi Ovadia <[email protected]>
Co-authored-by: Claude <[email protected]>

* fix: fix precommit Argument list too long error (#502)

Signed-off-by: yuluo-yx <[email protected]>

* feat: enforce milvus dial timeout if set (#503)

Signed-off-by: cryo <[email protected]>
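
A minimal sketch of what "enforce the dial timeout if set" could look like, using only the Go standard library. The helper name `newDialContext` and the seconds-based setting are assumptions made for illustration; the actual Milvus client call and config field used in #503 are not shown in this log.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// newDialContext (hypothetical helper) returns the context to use for a
// connection attempt. A positive timeoutSeconds bounds the attempt; zero or
// a negative value means "not set", so no deadline is applied.
func newDialContext(timeoutSeconds int) (context.Context, context.CancelFunc) {
	if timeoutSeconds > 0 {
		d := time.Duration(timeoutSeconds) * time.Second
		return context.WithTimeout(context.Background(), d)
	}
	return context.WithCancel(context.Background())
}

func main() {
	ctx, cancel := newDialContext(2)
	defer cancel()

	// The returned context would be passed to whatever call opens the connection.
	_, bounded := ctx.Deadline()
	fmt.Println("deadline set:", bounded)
}
```

Returning a cancel function in both branches lets the caller always `defer cancel()` without checking whether a timeout was configured.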

* Add IETF draft publication: Multi-Provider Extensions for Agentic AI Inference APIs (#506)

* Initial plan

* Add new IETF draft publication for Multi-Provider Extensions for Agentic AI Inference APIs

Co-authored-by: rootfs <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: rootfs <[email protected]>

* Allow semantic cache similarity threshold to be set at the category level (#493)

* Initial plan

* Add category-level cache settings: enabled and similarity_threshold

Co-authored-by: rootfs <[email protected]>

* Add comprehensive tests for category-level cache settings

Co-authored-by: rootfs <[email protected]>

* Update config files and documentation for category-level cache settings

- Updated 7 config YAML files (development, production, testing, e2e, and 3 recipes) with commented examples of category-level cache settings
- Added comprehensive documentation section explaining category-level cache configuration
- Updated semantic cache overview and in-memory cache docs with category-level examples
- Added best practices for threshold selection and privacy considerations

Co-authored-by: rootfs <[email protected]>

* Remove duplicate code in FindSimilar functions

Refactored FindSimilar() to delegate to FindSimilarWithThreshold() with default threshold instead of duplicating the entire implementation. This eliminates 226 lines of duplicate code across inmemory_cache.go and milvus_cache.go.

Co-authored-by: rootfs <[email protected]>
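
The delegation described above can be sketched roughly as follows. Only the names `FindSimilar` and `FindSimilarWithThreshold` come from the commit message; the `Cache` and `entry` types, the signatures, and the precomputed similarity are simplified assumptions, not the code in inmemory_cache.go or milvus_cache.go.

```go
package main

import "fmt"

// entry is a hypothetical cached item with a precomputed similarity score,
// used here only so the example runs without an embedding model.
type entry struct {
	similarity float32
	response   string
}

// Cache is a simplified stand-in for a semantic cache backend.
type Cache struct {
	defaultThreshold float32
	entries          map[string]entry
}

// FindSimilar keeps the old entry point but holds no lookup logic of its
// own: it forwards to FindSimilarWithThreshold with the default threshold.
func (c *Cache) FindSimilar(query string) (string, bool) {
	return c.FindSimilarWithThreshold(query, c.defaultThreshold)
}

// FindSimilarWithThreshold is the single implementation of the lookup.
func (c *Cache) FindSimilarWithThreshold(query string, threshold float32) (string, bool) {
	e, ok := c.entries[query]
	if !ok || e.similarity < threshold {
		return "", false
	}
	return e.response, true
}

func main() {
	c := &Cache{
		defaultThreshold: 0.8,
		entries:          map[string]entry{"what is go?": {similarity: 0.85, response: "cached"}},
	}
	_, hitDefault := c.FindSimilar("what is go?")
	_, hitStrict := c.FindSimilarWithThreshold("what is go?", 0.9)
	fmt.Println(hitDefault, hitStrict)
}
```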

* Update src/semantic-router/pkg/extproc/request_handler.go

Co-authored-by: Copilot <[email protected]>

* Revert changes from unsigned commit ae39fe2

Restored the classificationText empty check that was removed in the previous commit.

Co-authored-by: rootfs <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: rootfs <[email protected]>
Co-authored-by: Huamin Chen <[email protected]>
Co-authored-by: Copilot <[email protected]>

* Allow jailbreak detection and threshold to be configured at the category level (#508)

* Initial plan

* Add category-level jailbreak detection configuration

Co-authored-by: Xunzhuo <[email protected]>

* Add documentation for category-level jailbreak settings

Co-authored-by: Xunzhuo <[email protected]>

* Update documentation for category-level jailbreak detection

- Add category-level jailbreak configuration to jailbreak-protection.md
- Update category configuration docs with jailbreak_enabled parameter
- Add security-focused configuration example
- Update global configuration docs with category override notes
- Update README to mention fine-grained security control

Co-authored-by: Xunzhuo <[email protected]>

* Add category-level jailbreak threshold configuration

- Add JailbreakThreshold field to Category struct
- Add GetJailbreakThresholdForCategory helper method
- Create CheckForJailbreakWithThreshold and AnalyzeContentForJailbreakWithThreshold methods
- Update performSecurityChecks to use category-specific threshold
- Add 5 comprehensive tests for threshold configuration
- Update example configs with threshold tuning examples
- Update documentation with threshold configuration and tuning guidelines
- Add threshold tuning guide with recommendations for different category types

Co-authored-by: Xunzhuo <[email protected]>
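
A rough sketch of the category-level threshold lookup listed above. The names `JailbreakThreshold`, `Category` and `GetJailbreakThresholdForCategory` come from the commit message; the pointer field type, the `Config` struct, and the fall-back-to-global behaviour are assumptions for illustration.

```go
package main

import "fmt"

// Category carries an optional per-category override. A nil pointer is
// assumed here to mean "not configured".
type Category struct {
	Name               string
	JailbreakThreshold *float32
}

// Config is a hypothetical container for the global default and categories.
type Config struct {
	DefaultJailbreakThreshold float32
	Categories                []Category
}

// GetJailbreakThresholdForCategory returns the category's own threshold when
// one is set, and the global default otherwise (including unknown names).
func (c *Config) GetJailbreakThresholdForCategory(name string) float32 {
	for _, cat := range c.Categories {
		if cat.Name == name && cat.JailbreakThreshold != nil {
			return *cat.JailbreakThreshold
		}
	}
	return c.DefaultJailbreakThreshold
}

func main() {
	strict := float32(0.9)
	cfg := &Config{
		DefaultJailbreakThreshold: 0.7,
		Categories: []Category{
			{Name: "customer_support", JailbreakThreshold: &strict},
			{Name: "general"},
		},
	}
	fmt.Println(cfg.GetJailbreakThresholdForCategory("customer_support"))
	fmt.Println(cfg.GetJailbreakThresholdForCategory("general"))
}
```

Using a pointer rather than a plain float lets an explicit `0` be distinguished from "unset", which is one plausible reason for such a design.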

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: Xunzhuo <[email protected]>

* Allow PII detection threshold to be set at the category level (#510)

* Initial plan

* Add category-level PII threshold support

Co-authored-by: Xunzhuo <[email protected]>

* Update documentation with API integration notes

Co-authored-by: Xunzhuo <[email protected]>

* Fix markdown linting issues

Co-authored-by: Xunzhuo <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: Xunzhuo <[email protected]>

* Fix: The caller information points to the wrapper function instead of the actual call location (#518)

Signed-off-by: carlory <[email protected]>
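
The class of bug named in #518 can be illustrated with the standard library's `runtime.Caller`. This is a generic sketch and not necessarily how the fix was implemented; the helper names `location`, `wrapperFrame` and `callerFrame` are hypothetical.

```go
package main

import (
	"fmt"
	"path/filepath"
	"runtime"
)

// location returns "file:line" for the stack frame `skip` levels above
// location itself (skip=0 is location, skip=1 is its caller, and so on).
func location(skip int) string {
	_, file, line, ok := runtime.Caller(skip)
	if !ok {
		return "unknown"
	}
	return fmt.Sprintf("%s:%d", filepath.Base(file), line)
}

// wrapperFrame reproduces the problem: skip=1 resolves to this wrapper,
// so every log line would appear to come from the same place.
func wrapperFrame() string { return location(1) }

// callerFrame skips one more frame, so it resolves to whoever called it.
func callerFrame() string { return location(2) }

func main() {
	fmt.Println("wrapper:", wrapperFrame())
	fmt.Println("caller: ", callerFrame())
}
```

Logging libraries usually expose the same idea as a "caller skip" setting, which has to grow by one for each wrapper layer added around the logger.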

* feat: Implement hybrid cache that uses in-memory index and Milvus-based doc store (#504)

* feat: add HNSW index to in-memory semantic cache and implement hybrid cache that uses in-memory index and Milvus-based doc store

Signed-off-by: Huamin Chen <[email protected]>

* chore: run go mod tidy to clean up module dependencies

Signed-off-by: Huamin Chen <[email protected]>

* conditionally build candle cuda support

Signed-off-by: Huamin Chen <[email protected]>

* rebuild index upon restart

Signed-off-by: Huamin Chen <[email protected]>

* precommit fix

Signed-off-by: Huamin Chen <[email protected]>

* fix precommit

Signed-off-by: Huamin Chen <[email protected]>

* fix precommit

Signed-off-by: Huamin Chen <[email protected]>

* fix precommit

Signed-off-by: Huamin Chen <[email protected]>

* disable cuda build on ci

Signed-off-by: Huamin Chen <[email protected]>

* review feedback

Signed-off-by: Huamin Chen <[email protected]>

* review feedback

Signed-off-by: Huamin Chen <[email protected]>

* review feedback

Signed-off-by: Huamin Chen <[email protected]>

* review feedback

Signed-off-by: Huamin Chen <[email protected]>

---------

Signed-off-by: Huamin Chen <[email protected]>
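
The idea behind the hybrid cache in #504 can be sketched as below: similarity search runs against an in-memory index, and the document store is read only when a match is found. This is a simplified illustration, not the PR's code. A brute-force cosine scan stands in for the HNSW graph, a map stands in for Milvus, and all type and method names (`DocStore`, `mapStore`, `HybridCache`, `Lookup`, `Add`) are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// DocStore is a hypothetical stand-in for the persistent store.
type DocStore interface {
	Get(id string) (string, bool)
	Put(id, response string)
}

// mapStore counts reads so the saved round-trips are visible.
type mapStore struct {
	docs  map[string]string
	reads int
}

func (m *mapStore) Get(id string) (string, bool) {
	m.reads++
	r, ok := m.docs[id]
	return r, ok
}

func (m *mapStore) Put(id, response string) { m.docs[id] = response }

// HybridCache keeps embeddings and IDs in memory and responses in the store.
type HybridCache struct {
	threshold float64
	ids       []string
	vectors   [][]float64
	store     DocStore
}

// cosine returns the cosine similarity, or 0 for mismatched or zero vectors.
func cosine(a, b []float64) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// Lookup searches the in-memory index first; the store is touched only on a hit.
func (h *HybridCache) Lookup(query []float64) (string, bool) {
	best, bestSim := -1, h.threshold
	for i, v := range h.vectors {
		if s := cosine(query, v); s >= bestSim {
			best, bestSim = i, s
		}
	}
	if best < 0 {
		return "", false // miss: no store round-trip
	}
	return h.store.Get(h.ids[best])
}

// Add writes the response to the store and indexes its embedding in memory.
func (h *HybridCache) Add(id string, vec []float64, response string) {
	h.store.Put(id, response)
	h.ids = append(h.ids, id)
	h.vectors = append(h.vectors, vec)
}

func main() {
	store := &mapStore{docs: map[string]string{}}
	h := &HybridCache{threshold: 0.9, store: store}
	h.Add("doc-1", []float64{1, 0}, "cached answer")

	_, hit := h.Lookup([]float64{0, 1})
	fmt.Println("hit:", hit, "store reads:", store.reads)
	resp, hit := h.Lookup([]float64{1, 0.01})
	fmt.Println("hit:", hit, "response:", resp, "store reads:", store.reads)
}
```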

* merge main to feat branch

Signed-off-by: Huamin Chen <[email protected]>

---------

Signed-off-by: carlory <[email protected]>
Signed-off-by: JaredforReal <[email protected]>
Signed-off-by: yuluo-yx <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>
Signed-off-by: cryo <[email protected]>
Signed-off-by: Huamin Chen <[email protected]>
Co-authored-by: 杨朱 · Kiki <[email protected]>
Co-authored-by: Jared <[email protected]>
Co-authored-by: bitliu <[email protected]>
Co-authored-by: shown <[email protected]>
Co-authored-by: Yossi Ovadia <[email protected]>
Co-authored-by: Claude <[email protected]>
Co-authored-by: cryo <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: rootfs <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Xunzhuo <[email protected]>
rootfs added a commit that referenced this pull request Oct 23, 2025
* Update test description from Math to General (#483)

Signed-off-by: carlory <[email protected]>

* feat: add HuggingChat support (#477)

* add chat ui to dashboard and docker compose & refactor dashboard/backend/

Signed-off-by: JaredforReal <[email protected]>

* try fix network error

Signed-off-by: JaredforReal <[email protected]>

* more

---------

Signed-off-by: JaredforReal <[email protected]>
Co-authored-by: bitliu <[email protected]>

* project: 2025 Q4 roadmap (#487)

* project: q4 roadmap

* project: q4 roadmap

* project: q4 roadmap

* more

* more

* more

* more

* feat: add shellcheck precommit hook (#488)

* feat: add shellcheck precommit hook

Signed-off-by: yuluo-yx <[email protected]>

* feat: add shellcheck precommit hook

Signed-off-by: yuluo-yx <[email protected]>

* feat: add shellcheck precommit hook

Signed-off-by: yuluo-yx <[email protected]>

---------

Signed-off-by: yuluo-yx <[email protected]>

* project: add q4 roadmap news (#495)

* fix missing shellcheck in pre-commit image (#497)

Signed-off-by: carlory <[email protected]>

* infra: update tools (#501)

Signed-off-by: yuluo-yx <[email protected]>

* feat(demo): enhance OpenShift demo scripts with improved UX (#478)

- Reduce model selection test to 4 categories (2×Model-A, 2×Model-B)
- Add new "Classification Examples" option calling curl-examples.sh
- Update reasoning examples to avoid cache hits from previous tests
- Remove benign examples from PII and Jailbreak tests (show only attacks)
- Enhance live-semantic-router-logs.sh with better color visibility:
  - Fix duplicate "WITH SCORE" text in classification output
  - Fix CACHE HIT background color extending over timestamp
  - Distinguish reasoning enabled vs disabled messages
  - Remove redundant "(standard routing)" text
  - Add background colors for Model-A/Model-B routing display

These improvements make the live demo clearer and more impactful for
presentations and demonstrations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: Yossi Ovadia <[email protected]>
Co-authored-by: Claude <[email protected]>

* fix: fix precommit Argument list too long error (#502)

Signed-off-by: yuluo-yx <[email protected]>

* feat: enforce milvus dial timeout if set (#503)

Signed-off-by: cryo <[email protected]>

* Add IETF draft publication: Multi-Provider Extensions for Agentic AI Inference APIs (#506)

* Initial plan

* Add new IETF draft publication for Multi-Provider Extensions for Agentic AI Inference APIs

Co-authored-by: rootfs <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: rootfs <[email protected]>

* Allow semantic cache similarity threshold to be set at the category level (#493)

* Initial plan

* Add category-level cache settings: enabled and similarity_threshold

Co-authored-by: rootfs <[email protected]>

* Add comprehensive tests for category-level cache settings

Co-authored-by: rootfs <[email protected]>

* Update config files and documentation for category-level cache settings

- Updated 7 config YAML files (development, production, testing, e2e, and 3 recipes) with commented examples of category-level cache settings
- Added comprehensive documentation section explaining category-level cache configuration
- Updated semantic cache overview and in-memory cache docs with category-level examples
- Added best practices for threshold selection and privacy considerations

Co-authored-by: rootfs <[email protected]>

* Remove duplicate code in FindSimilar functions

Refactored FindSimilar() to delegate to FindSimilarWithThreshold() with default threshold instead of duplicating the entire implementation. This eliminates 226 lines of duplicate code across inmemory_cache.go and milvus_cache.go.

Co-authored-by: rootfs <[email protected]>

* Update src/semantic-router/pkg/extproc/request_handler.go

Co-authored-by: Copilot <[email protected]>

* Revert changes from unsigned commit ae39fe2

Restored the classificationText empty check that was removed in the previous commit.

Co-authored-by: rootfs <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: rootfs <[email protected]>
Co-authored-by: Huamin Chen <[email protected]>
Co-authored-by: Copilot <[email protected]>

* Allow jailbreak detection and threshold to be configured at the category level (#508)

* Initial plan

* Add category-level jailbreak detection configuration

Co-authored-by: Xunzhuo <[email protected]>

* Add documentation for category-level jailbreak settings

Co-authored-by: Xunzhuo <[email protected]>

* Update documentation for category-level jailbreak detection

- Add category-level jailbreak configuration to jailbreak-protection.md
- Update category configuration docs with jailbreak_enabled parameter
- Add security-focused configuration example
- Update global configuration docs with category override notes
- Update README to mention fine-grained security control

Co-authored-by: Xunzhuo <[email protected]>

* Add category-level jailbreak threshold configuration

- Add JailbreakThreshold field to Category struct
- Add GetJailbreakThresholdForCategory helper method
- Create CheckForJailbreakWithThreshold and AnalyzeContentForJailbreakWithThreshold methods
- Update performSecurityChecks to use category-specific threshold
- Add 5 comprehensive tests for threshold configuration
- Update example configs with threshold tuning examples
- Update documentation with threshold configuration and tuning guidelines
- Add threshold tuning guide with recommendations for different category types

Co-authored-by: Xunzhuo <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: Xunzhuo <[email protected]>

* Allow PII detection threshold to be set at the category level (#510)

* Initial plan

* Add category-level PII threshold support

Co-authored-by: Xunzhuo <[email protected]>

* Update documentation with API integration notes

Co-authored-by: Xunzhuo <[email protected]>

* Fix markdown linting issues

Co-authored-by: Xunzhuo <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: Xunzhuo <[email protected]>

* Fix: The caller information points to the wrapper function instead of the actual call location (#518)

Signed-off-by: carlory <[email protected]>

* feat: Implement hybrid cache that use in-memory index and milvus based doc store (#504)

* feat: add HNSW index to inmemory semantic cache and implement hybrid cache that use in-memory index and milvus based doc store

Signed-off-by: Huamin Chen <[email protected]>

* chore: run go mod tidy to clean up module dependencies

Signed-off-by: Huamin Chen <[email protected]>

* conditionally build candle cuda support

Signed-off-by: Huamin Chen <[email protected]>

* rebuild index upon restart

Signed-off-by: Huamin Chen <[email protected]>

* precommit fix

Signed-off-by: Huamin Chen <[email protected]>

* fix precommit

Signed-off-by: Huamin Chen <[email protected]>

* fix precommit

Signed-off-by: Huamin Chen <[email protected]>

* fix precommit

Signed-off-by: Huamin Chen <[email protected]>

* disable cuda build on ci

Signed-off-by: Huamin Chen <[email protected]>

* review feedback

Signed-off-by: Huamin Chen <[email protected]>

* review feedback

Signed-off-by: Huamin Chen <[email protected]>

* review feedback

Signed-off-by: Huamin Chen <[email protected]>

* review feedback

Signed-off-by: Huamin Chen <[email protected]>

---------

Signed-off-by: Huamin Chen <[email protected]>

* merge main to feat branch

Signed-off-by: Huamin Chen <[email protected]>

---------

Signed-off-by: carlory <[email protected]>
Signed-off-by: JaredforReal <[email protected]>
Signed-off-by: yuluo-yx <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>
Signed-off-by: cryo <[email protected]>
Signed-off-by: Huamin Chen <[email protected]>
Co-authored-by: 杨朱 · Kiki <[email protected]>
Co-authored-by: Jared <[email protected]>
Co-authored-by: bitliu <[email protected]>
Co-authored-by: shown <[email protected]>
Co-authored-by: Yossi Ovadia <[email protected]>
Co-authored-by: Claude <[email protected]>
Co-authored-by: cryo <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: rootfs <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Xunzhuo <[email protected]>
rootfs added a commit that referenced this pull request Oct 24, 2025
* refactor: Implement modular candle-binding architecture (#254)


- Restructure codebase into modular layers (core/, ffi/, model_architectures/, classifiers/)
- Add unified error handling and configuration loading systems
- Implement dual-path architecture for traditional and LoRA models
- Add comprehensive FFI layer with memory safety

Maintains backward compatibility while enabling future model integrations.

Signed-off-by: OneZero-Y <[email protected]>

* feat:unit tests for candle refactoring (#296)

Signed-off-by: OneZero-Y <[email protected]>
Signed-off-by: Huamin Chen <[email protected]>

* feat:support for two long-context embedding models (Qwen3-Embedding-0.6B and EmbeddingGemma-300M) (#453)

Signed-off-by: OneZero-Y <[email protected]>
Signed-off-by: Huamin Chen <[email protected]>

* fix:Implement Comprehensive Rayon Parallelization for LoRA Classifiers (#464)

Signed-off-by: OneZero-Y <[email protected]>
Signed-off-by: Huamin Chen <[email protected]>

* fix:Improve rust unit test and optimize concurrent tests with rayon (#471)

- Add 6 new unit test files
- Replace std::thread::spawn with rayon::par_iter

Signed-off-by: OneZero-Y <[email protected]>
Signed-off-by: Huamin Chen <[email protected]>

* fix: resolve syntax errors after rebase

Signed-off-by: Huamin Chen <[email protected]>

* add additional update

Signed-off-by: Huamin Chen <[email protected]>

* Change label count params to c_int (#494)

Signed-off-by: carlory <[email protected]>

* update embedding setting in config (#489)

Signed-off-by: Huamin Chen <[email protected]>

* make CUDA and Flash Attention 2 optional features (#511)

Signed-off-by: OneZero-Y <[email protected]>

* fix: Fix duplicate UNIFIED_CLASSIFIER definition and optimize lock contention (#516)

- Remove duplicate UNIFIED_CLASSIFIER global state
- Optimize PARALLEL_LORA_ENGINE lock contention by using Arc clone

Signed-off-by: OneZero-Y <[email protected]>

* Merge main to candle refactoring (#523)

* Update test description from Math to General (#483)

Signed-off-by: carlory <[email protected]>

* feat: add HuggingChat support (#477)

* add chat ui to dashboard and docker compose & refactor dashboard/backend/

Signed-off-by: JaredforReal <[email protected]>

* try fix network error

Signed-off-by: JaredforReal <[email protected]>

* more

---------

Signed-off-by: JaredforReal <[email protected]>
Co-authored-by: bitliu <[email protected]>

* project: 2025 Q4 roadmap (#487)

* project: q4 roadmap

* project: q4 roadmap

* project: q4 roadmap

* more

* more

* more

* more

* feat: add shelleck precommit hook (#488)

* feat: add shelleck precommit hook

Signed-off-by: yuluo-yx <[email protected]>

* feat: add shelleck precommit hook

Signed-off-by: yuluo-yx <[email protected]>

* feat: add shelleck precommit hook

Signed-off-by: yuluo-yx <[email protected]>

---------

Signed-off-by: yuluo-yx <[email protected]>

* project: add q4 roadmap news (#495)

* fix missing shellcheck in pre-commit image (#497)

Signed-off-by: carlory <[email protected]>

* infra: update tools (#501)

Signed-off-by: yuluo-yx <[email protected]>

* feat(demo): enhance OpenShift demo scripts with improved UX (#478)

- Reduce model selection test to 4 categories (2×Model-A, 2×Model-B)
- Add new "Classification Examples" option calling curl-examples.sh
- Update reasoning examples to avoid cache hits from previous tests
- Remove benign examples from PII and Jailbreak tests (show only attacks)
- Enhance live-semantic-router-logs.sh with better color visibility:
  - Fix duplicate "WITH SCORE" text in classification output
  - Fix CACHE HIT background color extending over timestamp
  - Distinguish reasoning enabled vs disabled messages
  - Remove redundant "(standard routing)" text
  - Add background colors for Model-A/Model-B routing display

These improvements make the live demo clearer and more impactful for
presentations and demonstrations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: Yossi Ovadia <[email protected]>
Co-authored-by: Claude <[email protected]>

* fix: fix precommit Argument list too long error (#502)

Signed-off-by: yuluo-yx <[email protected]>

* feat: enforce milvus dial timeout if set (#503)

Signed-off-by: cryo <[email protected]>

* Add IETF draft publication: Multi-Provider Extensions for Agentic AI Inference APIs (#506)

* Initial plan

* Add new IETF draft publication for Multi-Provider Extensions for Agentic AI Inference APIs

Co-authored-by: rootfs <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: rootfs <[email protected]>

* Allow semantic cache similarity threshold to be set at the category level (#493)

* Initial plan

* Add category-level cache settings: enabled and similarity_threshold

Co-authored-by: rootfs <[email protected]>

* Add comprehensive tests for category-level cache settings

Co-authored-by: rootfs <[email protected]>

* Update config files and documentation for category-level cache settings

- Updated 7 config YAML files (development, production, testing, e2e, and 3 recipes) with commented examples of category-level cache settings
- Added comprehensive documentation section explaining category-level cache configuration
- Updated semantic cache overview and in-memory cache docs with category-level examples
- Added best practices for threshold selection and privacy considerations

Co-authored-by: rootfs <[email protected]>

* Remove duplicate code in FindSimilar functions

Refactored FindSimilar() to delegate to FindSimilarWithThreshold() with default threshold instead of duplicating the entire implementation. This eliminates 226 lines of duplicate code across inmemory_cache.go and milvus_cache.go.

Co-authored-by: rootfs <[email protected]>

* Update src/semantic-router/pkg/extproc/request_handler.go

Co-authored-by: Copilot <[email protected]>

* Revert changes from unsigned commit ae39fe2

Restored the classificationText empty check that was removed in the previous commit.

Co-authored-by: rootfs <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: rootfs <[email protected]>
Co-authored-by: Huamin Chen <[email protected]>
Co-authored-by: Copilot <[email protected]>

* Allow jailbreak detection and threshold to be configured at the category level (#508)

* Initial plan

* Add category-level jailbreak detection configuration

Co-authored-by: Xunzhuo <[email protected]>

* Add documentation for category-level jailbreak settings

Co-authored-by: Xunzhuo <[email protected]>

* Update documentation for category-level jailbreak detection

- Add category-level jailbreak configuration to jailbreak-protection.md
- Update category configuration docs with jailbreak_enabled parameter
- Add security-focused configuration example
- Update global configuration docs with category override notes
- Update README to mention fine-grained security control

Co-authored-by: Xunzhuo <[email protected]>

* Add category-level jailbreak threshold configuration

- Add JailbreakThreshold field to Category struct
- Add GetJailbreakThresholdForCategory helper method
- Create CheckForJailbreakWithThreshold and AnalyzeContentForJailbreakWithThreshold methods
- Update performSecurityChecks to use category-specific threshold
- Add 5 comprehensive tests for threshold configuration
- Update example configs with threshold tuning examples
- Update documentation with threshold configuration and tuning guidelines
- Add threshold tuning guide with recommendations for different category types

Co-authored-by: Xunzhuo <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: Xunzhuo <[email protected]>

* Allow PII detection threshold to be set at the category level (#510)

* Initial plan

* Add category-level PII threshold support

Co-authored-by: Xunzhuo <[email protected]>

* Update documentation with API integration notes

Co-authored-by: Xunzhuo <[email protected]>

* Fix markdown linting issues

Co-authored-by: Xunzhuo <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: Xunzhuo <[email protected]>

* Fix: The caller information points to the wrapper function instead of the actual call location (#518)

Signed-off-by: carlory <[email protected]>

* feat: Implement hybrid cache that use in-memory index and milvus based doc store (#504)

* feat: add HNSW index to inmemory semantic cache and implement hybrid cache that use in-memory index and milvus based doc store

Signed-off-by: Huamin Chen <[email protected]>

* chore: run go mod tidy to clean up module dependencies

Signed-off-by: Huamin Chen <[email protected]>

* conditionally build candle cuda support

Signed-off-by: Huamin Chen <[email protected]>

* rebuild index upon restart

Signed-off-by: Huamin Chen <[email protected]>

* precommit fix

Signed-off-by: Huamin Chen <[email protected]>

* fix precommit

Signed-off-by: Huamin Chen <[email protected]>

* fix precommit

Signed-off-by: Huamin Chen <[email protected]>

* fix precommit

Signed-off-by: Huamin Chen <[email protected]>

* disable cuda build on ci

Signed-off-by: Huamin Chen <[email protected]>

* review feedback

Signed-off-by: Huamin Chen <[email protected]>

* review feedback

Signed-off-by: Huamin Chen <[email protected]>

* review feedback

Signed-off-by: Huamin Chen <[email protected]>

* review feedback

Signed-off-by: Huamin Chen <[email protected]>

---------

Signed-off-by: Huamin Chen <[email protected]>

---------

Signed-off-by: carlory <[email protected]>
Signed-off-by: JaredforReal <[email protected]>
Signed-off-by: yuluo-yx <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>
Signed-off-by: cryo <[email protected]>
Signed-off-by: Huamin Chen <[email protected]>
Co-authored-by: 杨朱 · Kiki <[email protected]>
Co-authored-by: Jared <[email protected]>
Co-authored-by: bitliu <[email protected]>
Co-authored-by: shown <[email protected]>
Co-authored-by: Yossi Ovadia <[email protected]>
Co-authored-by: Claude <[email protected]>
Co-authored-by: cryo <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: rootfs <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Xunzhuo <[email protected]>

* Candle refactoring to main (#524)

* Update test description from Math to General (#483)

Signed-off-by: carlory <[email protected]>

* feat: add HuggingChat support (#477)

* add chat ui to dashboard and docker compose & refactor dashboard/backend/

Signed-off-by: JaredforReal <[email protected]>

* try fix network error

Signed-off-by: JaredforReal <[email protected]>

* more

---------

Signed-off-by: JaredforReal <[email protected]>
Co-authored-by: bitliu <[email protected]>

* project: 2025 Q4 roadmap (#487)

* project: q4 roadmap

* project: q4 roadmap

* project: q4 roadmap

* more

* more

* more

* more

* feat: add shelleck precommit hook (#488)

* feat: add shelleck precommit hook

Signed-off-by: yuluo-yx <[email protected]>

* feat: add shelleck precommit hook

Signed-off-by: yuluo-yx <[email protected]>

* feat: add shelleck precommit hook

Signed-off-by: yuluo-yx <[email protected]>

---------

Signed-off-by: yuluo-yx <[email protected]>

* project: add q4 roadmap news (#495)

* fix missing shellcheck in pre-commit image (#497)

Signed-off-by: carlory <[email protected]>

* infra: update tools (#501)

Signed-off-by: yuluo-yx <[email protected]>

* feat(demo): enhance OpenShift demo scripts with improved UX (#478)

- Reduce model selection test to 4 categories (2×Model-A, 2×Model-B)
- Add new "Classification Examples" option calling curl-examples.sh
- Update reasoning examples to avoid cache hits from previous tests
- Remove benign examples from PII and Jailbreak tests (show only attacks)
- Enhance live-semantic-router-logs.sh with better color visibility:
  - Fix duplicate "WITH SCORE" text in classification output
  - Fix CACHE HIT background color extending over timestamp
  - Distinguish reasoning enabled vs disabled messages
  - Remove redundant "(standard routing)" text
  - Add background colors for Model-A/Model-B routing display

These improvements make the live demo clearer and more impactful for
presentations and demonstrations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: Yossi Ovadia <[email protected]>
Co-authored-by: Claude <[email protected]>

* fix: fix precommit Argument list too long error (#502)

Signed-off-by: yuluo-yx <[email protected]>

* feat: enforce milvus dial timeout if set (#503)

Signed-off-by: cryo <[email protected]>

* Add IETF draft publication: Multi-Provider Extensions for Agentic AI Inference APIs (#506)

* Initial plan

* Add new IETF draft publication for Multi-Provider Extensions for Agentic AI Inference APIs

Co-authored-by: rootfs <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: rootfs <[email protected]>

* Allow semantic cache similarity threshold to be set at the category level (#493)

* Initial plan

* Add category-level cache settings: enabled and similarity_threshold

Co-authored-by: rootfs <[email protected]>

* Add comprehensive tests for category-level cache settings

Co-authored-by: rootfs <[email protected]>

* Update config files and documentation for category-level cache settings

- Updated 7 config YAML files (development, production, testing, e2e, and 3 recipes) with commented examples of category-level cache settings
- Added comprehensive documentation section explaining category-level cache configuration
- Updated semantic cache overview and in-memory cache docs with category-level examples
- Added best practices for threshold selection and privacy considerations

Co-authored-by: rootfs <[email protected]>

* Remove duplicate code in FindSimilar functions

Refactored FindSimilar() to delegate to FindSimilarWithThreshold() with default threshold instead of duplicating the entire implementation. This eliminates 226 lines of duplicate code across inmemory_cache.go and milvus_cache.go.

Co-authored-by: rootfs <[email protected]>

* Update src/semantic-router/pkg/extproc/request_handler.go

Co-authored-by: Copilot <[email protected]>

* Revert changes from unsigned commit ae39fe2

Restored the classificationText empty check that was removed in the previous commit.

Co-authored-by: rootfs <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: rootfs <[email protected]>
Co-authored-by: Huamin Chen <[email protected]>
Co-authored-by: Copilot <[email protected]>

* Allow jailbreak detection and threshold to be configured at the category level (#508)

* Initial plan

* Add category-level jailbreak detection configuration

Co-authored-by: Xunzhuo <[email protected]>

* Add documentation for category-level jailbreak settings

Co-authored-by: Xunzhuo <[email protected]>

* Update documentation for category-level jailbreak detection

- Add category-level jailbreak configuration to jailbreak-protection.md
- Update category configuration docs with jailbreak_enabled parameter
- Add security-focused configuration example
- Update global configuration docs with category override notes
- Update README to mention fine-grained security control

Co-authored-by: Xunzhuo <[email protected]>

* Add category-level jailbreak threshold configuration

- Add JailbreakThreshold field to Category struct
- Add GetJailbreakThresholdForCategory helper method
- Create CheckForJailbreakWithThreshold and AnalyzeContentForJailbreakWithThreshold methods
- Update performSecurityChecks to use category-specific threshold
- Add 5 comprehensive tests for threshold configuration
- Update example configs with threshold tuning examples
- Update documentation with threshold configuration and tuning guidelines
- Add threshold tuning guide with recommendations for different category types

Co-authored-by: Xunzhuo <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: Xunzhuo <[email protected]>

* Allow PII detection threshold to be set at the category level (#510)

* Initial plan

* Add category-level PII threshold support

Co-authored-by: Xunzhuo <[email protected]>

* Update documentation with API integration notes

Co-authored-by: Xunzhuo <[email protected]>

* Fix markdown linting issues

Co-authored-by: Xunzhuo <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: Xunzhuo <[email protected]>

* Fix: The caller information points to the wrapper function instead of the actual call location (#518)

Signed-off-by: carlory <[email protected]>

* feat: Implement hybrid cache that use in-memory index and milvus based doc store (#504)

* feat: add HNSW index to inmemory semantic cache and implement hybrid cache that use in-memory index and milvus based doc store

Signed-off-by: Huamin Chen <[email protected]>

* chore: run go mod tidy to clean up module dependencies

Signed-off-by: Huamin Chen <[email protected]>

* conditionally build candle cuda support

Signed-off-by: Huamin Chen <[email protected]>

* rebuild index upon restart

Signed-off-by: Huamin Chen <[email protected]>

* precommit fix

Signed-off-by: Huamin Chen <[email protected]>

* fix precommit

Signed-off-by: Huamin Chen <[email protected]>

* fix precommit

Signed-off-by: Huamin Chen <[email protected]>

* fix precommit

Signed-off-by: Huamin Chen <[email protected]>

* disable cuda build on ci

Signed-off-by: Huamin Chen <[email protected]>

* review feedback

Signed-off-by: Huamin Chen <[email protected]>

* review feedback

Signed-off-by: Huamin Chen <[email protected]>

* review feedback

Signed-off-by: Huamin Chen <[email protected]>

* review feedback

Signed-off-by: Huamin Chen <[email protected]>

---------

Signed-off-by: Huamin Chen <[email protected]>

---------

Signed-off-by: carlory <[email protected]>
Signed-off-by: JaredforReal <[email protected]>
Signed-off-by: yuluo-yx <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>
Signed-off-by: cryo <[email protected]>
Signed-off-by: Huamin Chen <[email protected]>
Co-authored-by: 杨朱 · Kiki <[email protected]>
Co-authored-by: Jared <[email protected]>
Co-authored-by: bitliu <[email protected]>
Co-authored-by: shown <[email protected]>
Co-authored-by: Yossi Ovadia <[email protected]>
Co-authored-by: Claude <[email protected]>
Co-authored-by: cryo <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: rootfs <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Xunzhuo <[email protected]>

* Merge candle refactoring 3 (#525)

* Update test description from Math to General (#483)

Signed-off-by: carlory <[email protected]>

* feat: add HuggingChat support (#477)

* add chat ui to dashboard and docker compose & refactor dashboard/backend/

Signed-off-by: JaredforReal <[email protected]>

* try fix network error

Signed-off-by: JaredforReal <[email protected]>

* more

---------

Signed-off-by: JaredforReal <[email protected]>
Co-authored-by: bitliu <[email protected]>

* project: 2025 Q4 roadmap (#487)

* project: q4 roadmap

* project: q4 roadmap

* project: q4 roadmap

* more

* more

* more

* more

* feat: add shelleck precommit hook (#488)

* feat: add shelleck precommit hook

Signed-off-by: yuluo-yx <[email protected]>

* feat: add shelleck precommit hook

Signed-off-by: yuluo-yx <[email protected]>

* feat: add shelleck precommit hook

Signed-off-by: yuluo-yx <[email protected]>

---------

Signed-off-by: yuluo-yx <[email protected]>

* project: add q4 roadmap news (#495)

* fix missing shellcheck in pre-commit image (#497)

Signed-off-by: carlory <[email protected]>

* infra: update tools (#501)

Signed-off-by: yuluo-yx <[email protected]>

* feat(demo): enhance OpenShift demo scripts with improved UX (#478)

- Reduce model selection test to 4 categories (2×Model-A, 2×Model-B)
- Add new "Classification Examples" option calling curl-examples.sh
- Update reasoning examples to avoid cache hits from previous tests
- Remove benign examples from PII and Jailbreak tests (show only attacks)
- Enhance live-semantic-router-logs.sh with better color visibility:
  - Fix duplicate "WITH SCORE" text in classification output
  - Fix CACHE HIT background color extending over timestamp
  - Distinguish reasoning enabled vs disabled messages
  - Remove redundant "(standard routing)" text
  - Add background colors for Model-A/Model-B routing display

These improvements make the live demo clearer and more impactful for
presentations and demonstrations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: Yossi Ovadia <[email protected]>
Co-authored-by: Claude <[email protected]>

* fix: fix precommit Argument list too long error (#502)

Signed-off-by: yuluo-yx <[email protected]>

* feat: enforce milvus dial timeout if set (#503)

Signed-off-by: cryo <[email protected]>

* Add IETF draft publication: Multi-Provider Extensions for Agentic AI Inference APIs (#506)

* Initial plan

* Add new IETF draft publication for Multi-Provider Extensions for Agentic AI Inference APIs

Co-authored-by: rootfs <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: rootfs <[email protected]>

* Allow semantic cache similarity threshold to be set at the category level (#493)

* Initial plan

* Add category-level cache settings: enabled and similarity_threshold

Co-authored-by: rootfs <[email protected]>

* Add comprehensive tests for category-level cache settings

Co-authored-by: rootfs <[email protected]>

* Update config files and documentation for category-level cache settings

- Updated 7 config YAML files (development, production, testing, e2e, and 3 recipes) with commented examples of category-level cache settings
- Added comprehensive documentation section explaining category-level cache configuration
- Updated semantic cache overview and in-memory cache docs with category-level examples
- Added best practices for threshold selection and privacy considerations

Co-authored-by: rootfs <[email protected]>

* Remove duplicate code in FindSimilar functions

Refactored FindSimilar() to delegate to FindSimilarWithThreshold() with default threshold instead of duplicating the entire implementation. This eliminates 226 lines of duplicate code across inmemory_cache.go and milvus_cache.go.

Co-authored-by: rootfs <[email protected]>

* Update src/semantic-router/pkg/extproc/request_handler.go

Co-authored-by: Copilot <[email protected]>

* Revert changes from unsigned commit ae39fe2

Restored the classificationText empty check that was removed in the previous commit.

Co-authored-by: rootfs <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: rootfs <[email protected]>
Co-authored-by: Huamin Chen <[email protected]>
Co-authored-by: Copilot <[email protected]>

* Allow jailbreak detection and threshold to be configured at the category level (#508)

* Initial plan

* Add category-level jailbreak detection configuration

Co-authored-by: Xunzhuo <[email protected]>

* Add documentation for category-level jailbreak settings

Co-authored-by: Xunzhuo <[email protected]>

* Update documentation for category-level jailbreak detection

- Add category-level jailbreak configuration to jailbreak-protection.md
- Update category configuration docs with jailbreak_enabled parameter
- Add security-focused configuration example
- Update global configuration docs with category override notes
- Update README to mention fine-grained security control

Co-authored-by: Xunzhuo <[email protected]>

* Add category-level jailbreak threshold configuration

- Add JailbreakThreshold field to Category struct
- Add GetJailbreakThresholdForCategory helper method
- Create CheckForJailbreakWithThreshold and AnalyzeContentForJailbreakWithThreshold methods
- Update performSecurityChecks to use category-specific threshold
- Add 5 comprehensive tests for threshold configuration
- Update example configs with threshold tuning examples
- Update documentation with threshold configuration and tuning guidelines
- Add threshold tuning guide with recommendations for different category types

Co-authored-by: Xunzhuo <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: Xunzhuo <[email protected]>

* Allow PII detection threshold to be set at the category level (#510)

* Initial plan

* Add category-level PII threshold support

Co-authored-by: Xunzhuo <[email protected]>

* Update documentation with API integration notes

Co-authored-by: Xunzhuo <[email protected]>

* Fix markdown linting issues

Co-authored-by: Xunzhuo <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: Xunzhuo <[email protected]>

* Fix: The caller information points to the wrapper function instead of the actual call location (#518)

Signed-off-by: carlory <[email protected]>

* feat: Implement hybrid cache that use in-memory index and milvus based doc store (#504)

* feat: add HNSW index to inmemory semantic cache and implement hybrid cache that use in-memory index and milvus based doc store

Signed-off-by: Huamin Chen <[email protected]>

* chore: run go mod tidy to clean up module dependencies

Signed-off-by: Huamin Chen <[email protected]>

* conditionally build candle cuda support

Signed-off-by: Huamin Chen <[email protected]>

* rebuild index upon restart

Signed-off-by: Huamin Chen <[email protected]>

* precommit fix

Signed-off-by: Huamin Chen <[email protected]>

* fix precommit

Signed-off-by: Huamin Chen <[email protected]>

* fix precommit

Signed-off-by: Huamin Chen <[email protected]>

* fix precommit

Signed-off-by: Huamin Chen <[email protected]>

* disable cuda build on ci

Signed-off-by: Huamin Chen <[email protected]>

* review feedback

Signed-off-by: Huamin Chen <[email protected]>

* review feedback

Signed-off-by: Huamin Chen <[email protected]>

* review feedback

Signed-off-by: Huamin Chen <[email protected]>

* review feedback

Signed-off-by: Huamin Chen <[email protected]>

---------

Signed-off-by: Huamin Chen <[email protected]>

* merge main to feat branch

Signed-off-by: Huamin Chen <[email protected]>

---------

Signed-off-by: carlory <[email protected]>
Signed-off-by: JaredforReal <[email protected]>
Signed-off-by: yuluo-yx <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>
Signed-off-by: cryo <[email protected]>
Signed-off-by: Huamin Chen <[email protected]>
Co-authored-by: 杨朱 · Kiki <[email protected]>
Co-authored-by: Jared <[email protected]>
Co-authored-by: bitliu <[email protected]>
Co-authored-by: shown <[email protected]>
Co-authored-by: Yossi Ovadia <[email protected]>
Co-authored-by: Claude <[email protected]>
Co-authored-by: cryo <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: rootfs <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Xunzhuo <[email protected]>

* chore: fix unit test (#527)

* chore: fix unit test

Signed-off-by: Huamin Chen <[email protected]>

* fix go vet

Signed-off-by: Huamin Chen <[email protected]>

* fix ci

Signed-off-by: Huamin Chen <[email protected]>

* fix ci

Signed-off-by: Huamin Chen <[email protected]>

* split test-binding to two stages on ci

Signed-off-by: Huamin Chen <[email protected]>

* ignore test failure due to embeddinggemma restriction

Signed-off-by: Huamin Chen <[email protected]>

* reorder ci test sequences to avoid missing models

Signed-off-by: Huamin Chen <[email protected]>

---------

Signed-off-by: Huamin Chen <[email protected]>

* refactor: Replace lazy_static with OnceLock for zero-cost concurrent reads based on review (#528)

* refactor: Replace lazy_static with OnceLock for zero-cost concurrent reads based on review #266 (comment)

Signed-off-by: Huamin Chen <[email protected]>

* update tests

Signed-off-by: Huamin Chen <[email protected]>

---------

Signed-off-by: Huamin Chen <[email protected]>

* chore: fix lint error (#530)

Signed-off-by: Huamin Chen <[email protected]>

* Fix lint error2 (#531)

* chore: fix lint error

Signed-off-by: Huamin Chen <[email protected]>

* chore: fix lint error

Signed-off-by: Huamin Chen <[email protected]>

---------

Signed-off-by: Huamin Chen <[email protected]>

---------

Signed-off-by: OneZero-Y <[email protected]>
Signed-off-by: Huamin Chen <[email protected]>
Signed-off-by: carlory <[email protected]>
Signed-off-by: JaredforReal <[email protected]>
Signed-off-by: yuluo-yx <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>
Signed-off-by: cryo <[email protected]>
Co-authored-by: OneZero-Y <[email protected]>
Co-authored-by: 杨朱 · Kiki <[email protected]>
Co-authored-by: Jared <[email protected]>
Co-authored-by: bitliu <[email protected]>
Co-authored-by: shown <[email protected]>
Co-authored-by: Yossi Ovadia <[email protected]>
Co-authored-by: Claude <[email protected]>
Co-authored-by: cryo <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: rootfs <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Xunzhuo <[email protected]>

3 participants