accelerate vector ops with vek #23245
base: main
fengttt
left a comment
It seems vek only has AVX and does not have NEON?
This is a tough decision.
fengttt
left a comment
If the op we benchmarked is at 400+ ns, the cgo call overhead is probably acceptable. What about implementing the L2 distance function in C and calling it through cgo? That should give us the best performance on both AMD64 and ARM.
Besides, I trust GCC/Clang more than hand-written assembly to be bug-free.
User description
What type of PR is this?
Which issue(s) this PR fixes:
issue #22508
What this PR does / why we need it:
The L2 distance benchmark improves by more than 15 times (6490 ns/op down to 405.2 ns/op).
Vanilla Go:
Benchmark_L2Distance/L2_Distance-24 181933 6490 ns/op 4096 B/op 1 allocs/op
Benchmark_L2Distance/Normalize_L2-24 717655 1670 ns/op 0 B/op 0 allocs/op
Benchmark_L2Distance/L2_Distance(v1,_NormalizeL2)-24 277056 4254 ns/op 4096 B/op 1 allocs/op
SIMD version using vek:
Benchmark_L2Distance/L2_Distance-24 2925302 405.2 ns/op 0 B/op 0 allocs/op
Benchmark_L2Distance/Normalize_L2-24 3214724 374.1 ns/op 0 B/op 0 allocs/op
Benchmark_L2Distance/L2_Distance(v1,_NormalizeL2)-24 1973931 607.6 ns/op 0 B/op 0 allocs/op
PR Type
Enhancement
Description
Replace BLAS operations with vek SIMD library for vector distance calculations
Achieve 15x performance improvement in L2 distance benchmarks
Simplify code by removing manual vector allocations and loops
Update dependencies to include vek library and update third-party versions
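To illustrate the "removing manual vector allocations" point, here is a scalar sketch in plain Go. It is not the PR's code and does not call vek or BLAS; the function names are hypothetical. It contrasts a distance routine that materializes a temporary difference vector with a fused loop that allocates nothing, plus an in-place L2 normalization. A 1024-element float32 temporary would occupy 4096 bytes, which would be consistent with the 4096 B/op in the vanilla numbers, although the benchmark's vector dimension is not stated here.

```go
package main

import (
	"fmt"
	"math"
)

// l2DistanceAlloc (hypothetical) builds a temporary diff vector first,
// costing one heap allocation of 4*len(a) bytes per call.
func l2DistanceAlloc(a, b []float32) float32 {
	if len(a) != len(b) {
		panic("vector length mismatch")
	}
	diff := make([]float32, len(a))
	for i := range a {
		diff[i] = a[i] - b[i]
	}
	var sum float32
	for _, d := range diff {
		sum += d * d
	}
	return float32(math.Sqrt(float64(sum)))
}

// l2DistanceFused (hypothetical) computes the same value in a single pass
// without any temporary storage.
func l2DistanceFused(a, b []float32) float32 {
	if len(a) != len(b) {
		panic("vector length mismatch")
	}
	var sum float32
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return float32(math.Sqrt(float64(sum)))
}

// normalizeL2InPlace (hypothetical) scales v to unit L2 norm, overwriting v.
// A zero vector is left unchanged.
func normalizeL2InPlace(v []float32) {
	var sum float32
	for _, x := range v {
		sum += x * x
	}
	if sum == 0 {
		return
	}
	inv := 1 / float32(math.Sqrt(float64(sum)))
	for i := range v {
		v[i] *= inv
	}
}

func main() {
	a := []float32{0, 0}
	b := []float32{3, 4}
	fmt.Println(l2DistanceAlloc(a, b), l2DistanceFused(a, b))

	normalizeL2InPlace(b)
	fmt.Println(b)
}
```

A SIMD library replaces the inner loops with vectorized kernels, but the allocation-free, in-place shape of the API is the same idea.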
File Walkthrough
distance_func.go (pkg/vectorindex/metric/distance_func.go): Replace BLAS with vek SIMD operations
L2Distance, L2DistanceSq, L1Distance, InnerProduct, and CosineDistance
in-place normalization
value
Metric_L2Distance
distance_func_bench_test.go (pkg/vectorindex/metric/distance_func_bench_test.go): Update benchmarks for vek float32 operations
testing
go.mod: Add vek SIMD library dependency
operations
v0.0.0-20251130095425-a2f175991017
go.sum: Update dependency checksums
dependency)
Makefile (thirdparties/Makefile): Update third-party library versions