@XuanYang-cn XuanYang-cn commented Oct 17, 2025

This commit adds a comprehensive benchmark suite to identify the remaining client-side bottlenecks.

**New Benchmarks**
Created a benchmark suite under `tests/benchmark/`:
- **Access patterns**: 23 tests measuring real-world usage patterns
  - First-access (UI display), iterate-all (export), random-access (pagination)
- **Search benchmarks**: various vector types, dimensions, and output fields
- **Query benchmarks**: scalars, JSON, all output types
- **Hybrid search**: multiple requests, varying top-k
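
The three access patterns can be sketched with a minimal stand-in (names and the mock rows are hypothetical, not the actual suite):

```python
import random
import time

# Mock result set standing in for a search result (hypothetical, not the
# real pymilvus object): 10K rows of scalar fields.
rows = [{"id": i, "score": random.random()} for i in range(10_000)]

def first_access(rows):
    """UI display: touch only the first hit."""
    return rows[0]

def iterate_all(rows):
    """Export: walk every hit exactly once."""
    return sum(1 for _ in rows)

def random_access(rows, k=100):
    """Pagination: jump to k scattered offsets."""
    idx = random.sample(range(len(rows)), k)
    return [rows[i] for i in idx]

def timed(fn, *args):
    """Crude timing wrapper; the real suite uses pytest-benchmark."""
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start
```

A fast client should pay roughly O(1) for `first_access` and O(n) for `iterate_all`; the benchmarks check whether those costs actually hold.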

**Profiling Infrastructure**
- Mock framework for client-only testing (no server required)
- Integrated `pytest-memray` for memory profiling
- Added helper scripts:
  - `profile_cpu.sh`: CPU profiling with py-spy
  - `profile_memory.sh`: memory profiling with pytest-memray
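
A sketch of what such helpers typically wrap (the `py-spy record -o` and pytest `--memray` flags are real; the benchmark file path is an assumption, not the actual script contents):

```shell
#!/usr/bin/env sh
# Hypothetical sketch of the helper scripts' core commands.
BENCH=tests/benchmark/test_access_patterns.py

# profile_cpu.sh: record a flamegraph of one benchmark run with py-spy
CPU_CMD="py-spy record -o flame.svg -- python -m pytest $BENCH"

# profile_memory.sh: track allocations via the pytest-memray plugin
MEM_CMD="python -m pytest --memray $BENCH"

echo "$CPU_CMD"
echo "$MEM_CMD"
```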

**Profiling Tools**
- `pytest-benchmark`: timing measurements
- `py-spy`: CPU profiling and flamegraphs
- `memray`: memory allocation tracking
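
memray itself runs via its CLI or the pytest plugin; a rough stdlib analogue with `tracemalloc` (a deliberate stand-in, not memray) shows the kind of per-entity allocation numbers these runs produce:

```python
import tracemalloc

# Measure Python-level allocations for 10K row dicts of 4 scalar fields
# (synthetic data, standing in for query results).
tracemalloc.start()
rows = [{"id": i, "name": f"n{i}", "age": i % 90, "score": i * 0.5}
        for i in range(10_000)]
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

per_row = current / len(rows)
print(f"~{per_row:.0f} bytes per 4-field row")
```

memray additionally tracks native (C-extension) allocations, which `tracemalloc` cannot see; that is why it was chosen for the suite.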

**Key Discoveries**

1. **Lazy loading inefficiency** (CRITICAL)
   - Accessing first result materializes ALL results (+77% overhead)
   - Example: `result[0][0]` loads all 10,000 results
   - Impact: 423ms → 749ms for 10K results

2. **Vector materialization dominates** (HIGH PRIORITY)
   - 76% of memory usage (326 MiB of 431 MiB for 65K results)
   - 8x slower than scalars (337ms vs 42ms for 10K results)
   - Scales linearly with dimensions (128d: 8 MiB, 1536d: 68 MiB)

3. **Struct fields are slow** (MEDIUM PRIORITY)
   - 10x slower than scalars (435ms vs 42ms for 10K results)
   - Column-to-row conversion overhead
   - Linear O(n) scaling with high constant factor

4. **Scalars are efficient** (NO OPTIMIZATION NEEDED)
   - 64.6 MiB for 65K rows × 4 fields
   - ~1 KB per entity (acceptable dict overhead)
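
Discovery 1 can be illustrated with a minimal sketch (simplified stand-ins, not the actual pymilvus classes): a result object that converts ALL column-major data to row dicts on first access, versus one that converts only the row requested.

```python
class EagerOnFirstAccess:
    """Lazy facade that pays the full cost on first touch."""
    def __init__(self, columns):
        self._columns = columns      # column-major raw data
        self._rows = None            # row cache, filled lazily

    def __getitem__(self, i):
        if self._rows is None:       # first access materializes EVERYTHING
            n = len(next(iter(self._columns.values())))
            self._rows = [
                {name: col[j] for name, col in self._columns.items()}
                for j in range(n)
            ]
        return self._rows[i]

class OnDemand:
    """Converts only the requested row; first access stays O(1)."""
    def __init__(self, columns):
        self._columns = columns

    def __getitem__(self, i):
        return {name: col[i] for name, col in self._columns.items()}

columns = {"id": list(range(10_000)), "score": [0.5] * 10_000}
eager = EagerOnFirstAccess(columns)
lazy = OnDemand(columns)
assert eager[0] == lazy[0] == {"id": 0, "score": 0.5}
```

The on-demand variant keeps `result[0]` cheap but re-converts rows that are touched repeatedly; a per-row cache would give both properties at a small bookkeeping cost.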

Signed-off-by: yangxuan [email protected]

@sre-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: XuanYang-cn

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@codecov

codecov bot commented Oct 21, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (master@920df7b). Learn more about missing BASE report.

Additional details and impacted files
@@            Coverage Diff            @@
##             master    #3029   +/-   ##
=========================================
  Coverage          ?   46.41%           
=========================================
  Files             ?       65           
  Lines             ?    13614           
  Branches          ?        0           
=========================================
  Hits              ?     6319           
  Misses            ?     7295           
  Partials          ?        0           


@mergify mergify bot added the ci-passed label Oct 21, 2025
@XuanYang-cn XuanYang-cn changed the title Optimize pymilvus performance with Cython and CI/CD improvements feat(perf): optimize client-side performance with Cython and comprehensive benchmarking Oct 21, 2025
@XuanYang-cn XuanYang-cn changed the title feat(perf): optimize client-side performance with Cython and comprehensive benchmarking feat(perf): Add comprehensive benchmarking framework Oct 21, 2025
@XuanYang-cn XuanYang-cn added the PR | need to cherry-pick to 2.x This PR need to be cherry-picked to 2.x branch label Oct 22, 2025
