ISCP integration with fully async ivfflat #22598
User description
What type of PR is this?
Which issue(s) this PR fixes:
issue #21835
What this PR does / why we need it:
Load the models once, process all batches from DataRetriever, and finally save the model files into the database.
This change saves a lot of time that was previously spent uploading/downloading the model every time there was an 8192-vector block.
PR Type
Enhancement, Tests
Description
• ISCP Integration: Added comprehensive Index Sync Change Processing (ISCP) support for async vector index operations with CDC task management, job lifecycle handling, and background processing capabilities
• Fully Async IVF Flat: Implemented fully asynchronous IVF flat index updates with K-means clustering running in separate threads, eliminating blocking operations during index updates
• HNSW Performance Improvements: Refactored HNSW updates to use direct APIs instead of SQL execution, enabling batch processing with model persistence and significantly reducing upload/download overhead
• SqlProcess Abstraction: Created unified `SqlProcess` wrapper to support both frontend (`process.Process`) and background execution modes, enabling vector index operations in headless environments
• Transaction Event System: Enhanced transaction event handling with context-aware callbacks and structured `TxnEventCallback` interface for better error handling and lifecycle management
• DDL Integration: Added ISCP job management to DDL operations including table/index creation, drops, and alter operations with proper async index handling
• Comprehensive Testing: Added extensive test coverage for async vector operations, ISCP utilities, and new interfaces across HNSW, IVF, and fulltext indexes
File Walkthrough
27 files
sync.go
Refactor HNSW sync to use SqlProcess wrapper and separate lifecycle methods
pkg/vectorindex/hnsw/sync.go
• Refactored `CdcSync` function to use new `HnswSync` struct with separate `NewHnswSync`, `RunOnce`, `Update`, and `Save` methods
• Replaced `process.Process` parameter with `SqlProcess` wrapper to support both frontend and background execution
• Added `DownloadAll` method to pre-download all models and improved model lifecycle management
• Updated all function signatures to use `SqlProcess` instead of `process.Process`
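The split into construct, update, and save steps is what allows the model to be loaded once and written back once, regardless of how many batches arrive in between. The sketch below illustrates that lifecycle under stated assumptions: `modelStore`, `indexSync`, and `change` are hypothetical in-memory stand-ins, not the PR's `HnswSync` implementation, and the "model" is just a map of vectors.

```go
package main

import "fmt"

// modelStore is a hypothetical stand-in for the database holding model files.
// It counts transfers so the saving is visible.
type modelStore struct {
	downloads int
	uploads   int
	vectors   map[int64][]float32
}

// change is one hypothetical CDC record: an upsert or a delete.
type change struct {
	ID     int64
	Vec    []float32
	Delete bool
}

// indexSync keeps the model in memory between batches.
type indexSync struct {
	store *modelStore
	model map[int64][]float32
}

// newIndexSync downloads the model exactly once.
func newIndexSync(store *modelStore) *indexSync {
	store.downloads++
	model := make(map[int64][]float32, len(store.vectors))
	for id, v := range store.vectors {
		model[id] = v
	}
	return &indexSync{store: store, model: model}
}

// Update applies one batch to the in-memory model without touching the store.
func (s *indexSync) Update(batch []change) {
	for _, c := range batch {
		if c.Delete {
			delete(s.model, c.ID)
			continue
		}
		s.model[c.ID] = c.Vec
	}
}

// Save uploads the model exactly once, after all batches are applied.
func (s *indexSync) Save() {
	s.store.uploads++
	s.store.vectors = s.model
}

func main() {
	store := &modelStore{vectors: map[int64][]float32{}}
	s := newIndexSync(store)
	s.Update([]change{{ID: 1, Vec: []float32{0, 1}}, {ID: 2, Vec: []float32{1, 0}}})
	s.Update([]change{{ID: 1, Delete: true}, {ID: 3, Vec: []float32{1, 1}}})
	s.Save()
	fmt.Printf("downloads=%d uploads=%d vectors=%d\n", store.downloads, store.uploads, len(store.vectors))
}
```

Two batches are applied here, yet the store sees a single download and a single upload, which is the saving the PR description refers to.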
alter.go
Integrate ISCP job management into alter table operations
pkg/sql/compile/alter.go
• Added ISCP job management during alter table operations - dropping jobs for temp tables and registering jobs for unaffected indexes
• Enhanced index cloning logic to handle async indexes and skip certain index types during clone operations
• Added validation for index CDC tasks and improved error handling for IVF index table operations
• Restructured the alter table flow to better handle ISCP integration
sqlexec.go
Create SqlProcess wrapper for unified frontend and background SQL execution
pkg/vectorindex/sqlexec/sqlexec.go
• Created new `SqlProcess` wrapper to support both frontend (`process.Process`) and background (`SqlContext`) execution modes
• Added `SqlContext` struct for background SQL execution with required context, UUID, and transaction operator
• Implemented `RunTxnWithSqlContext` function for running transactions in background mode
• Updated all SQL execution functions to work with `SqlProcess` wrapper
operator.go
Add context-aware transaction event handling and callback system
pkg/txn/client/operator.go
• Updated transaction event handling to pass `context.Context` to all `triggerEvent` calls
• Modified callback signatures to include context parameter for better error handling
• Enhanced transaction lifecycle methods (`Commit`, `Rollback`, `closeLocked`) with context-aware event triggering
• Updated error handling in transaction operations to properly propagate context
ddl.go
Integrate ISCP job lifecycle management into DDL operations
pkg/sql/compile/ddl.go
• Added ISCP job management for database drops, table drops, and index operations
• Integrated async index handling to skip sync operations for async IVF and fulltext indexes
• Added ISCP job creation for new tables with indexes and proper cleanup during drops
• Enhanced alter table rename operations to update ISCP job references
mock_consumer.go
Update mock consumer to use direct transaction operators instead of executors
pkg/iscp/mock_consumer.go
• Updated `NewInteralSqlConsumer` to accept `cnEngine` and `cnTxnClient` parameters
• Replaced executor-based transaction handling with direct transaction operator usage
• Modified `consumeData` and related methods to use `client.TxnOperator` instead of `executor.TxnExecutor`
• Added `getTxn` helper function for transaction creation
build_dml_util.go
Add async index support to DML operations and plan building
pkg/sql/plan/build_dml_util.go
• Added async index checks to skip sync operations for async IVF and fulltext indexes in DML operations
• Updated `buildPreInsertMultiTableIndexes`, `buildDeleteMultiTableIndexes`, and fulltext index functions
• Enhanced index handling logic to differentiate between sync and async index operations
• Improved DML plan building to respect async index configuration
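As a rough illustration of the async check described in these bullets, the sketch below decides whether synchronous index maintenance should be skipped. The parameter format is an assumption: it treats index parameters as a flat JSON object of string values with an `async` key, and `isAsyncIndex` is a hypothetical helper, not a function from the PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// isAsyncIndex reports whether JSON-encoded index parameters request async mode.
// Assumption: parameters are a flat JSON object of string values with an "async" key.
func isAsyncIndex(params string) (bool, error) {
	if params == "" {
		// No parameters means the index is maintained synchronously.
		return false, nil
	}
	var m map[string]string
	if err := json.Unmarshal([]byte(params), &m); err != nil {
		return false, fmt.Errorf("invalid index params: %w", err)
	}
	return m["async"] == "true", nil
}

func main() {
	for _, p := range []string{`{"async":"true"}`, `{"lists":"16"}`, `{bad json`} {
		async, err := isAsyncIndex(p)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		if async {
			fmt.Println("skip sync index maintenance; the background job will update the index")
		} else {
			fmt.Println("build sync index maintenance into the DML plan")
		}
	}
}
```

Returning an error for malformed parameters, rather than silently treating the index as synchronous, matches the walkthrough's mention of tests for invalid JSON parameter handling.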
fulltext.go
Update fulltext functions to use SqlProcess wrapper
pkg/sql/colexec/table_function/fulltext.go
• Updated fulltext SQL execution calls to use `SqlProcess` wrapper
• Modified `runWordStats` and `runCountStar` functions to use `sqlexec.NewSqlProcess(proc)`
• Aligned with the new SQL execution pattern using `SqlProcess` abstraction
storage_test.go
Update transaction event callbacks to new interface
pkg/partitionservice/storage_test.go
• Updated transaction event callback to use new `TxnEventCallback` structure with context and error handling
• Modified callback functions to return error instead of void
build_test.go
Integrate SqlProcess wrapper for HNSW build operations
pkg/vectorindex/hnsw/build_test.go
• Added import for `sqlexec` package
• Updated `NewHnswBuild` calls to use `sqlexec.NewSqlProcess(proc)` instead of direct `proc`
hnsw_search.go
Integrate SqlProcess wrapper for HNSW search operations
pkg/sql/colexec/table_function/hnsw_search.go
• Added import for `sqlexec` package
• Updated vector cache search call to use `sqlexec.NewSqlProcess(proc)`
data_retriever.go
Refactor watermark update to use transaction operator
pkg/iscp/data_retriever.go
• Updated `UpdateWatermark` method signature to use context and transaction operator instead of executor
• Added system account context setup with timeout
• Replaced SQL execution with `ExecWithResult` function
service.go
Update increment service transaction callbacks
pkg/incrservice/service.go
• Updated transaction event callback registration to use new `TxnEventCallback` structure
• Modified `txnClosed` method to match new callback signature with error return
storage_test.go
Update shard service transaction callbacks
pkg/shardservice/storage_test.go
• Updated transaction event callbacks to use new `TxnEventCallback` structure
• Modified callback functions to include context parameter and return error
hnsw_create.go
Integrate SqlProcess wrapper for HNSW creation
pkg/sql/colexec/table_function/hnsw_create.go
• Updated SQL execution calls to use `sqlexec.NewSqlProcess(proc)`
• Modified HNSW build creation to use `SqlProcess` wrapper
consumer.go
Add engine and transaction client to consumer creation
pkg/iscp/consumer.go
• Added engine and transaction client parameters to consumer constructors
• Updated `NewConsumer` function signature to include additional dependencies
types.go
Introduce structured transaction event callbacks
pkg/txn/client/types.go
• Added new `TxnEventCallback` struct with function and value fields
• Updated `AppendEventCallback` method signature to use callback struct
• Added constructor functions for creating callback instances
types.go
Update DataRetriever interface for transaction integration
pkg/iscp/types.go
• Updated `DataRetriever` interface to use context and transaction operator for watermark updates
• Removed dependency on executor package
metadata_scan.go
Integrate SqlProcess wrapper for metadata scanning
pkg/sql/colexec/table_function/metadata_scan.go
• Updated SQL execution to use `sqlexec.NewSqlProcess(proc)`
• Added import for `sqlexec` package and reorganized imports
store_mem.go
Update memory store transaction callbacks
pkg/incrservice/store_mem.go
• Updated transaction event callback to use new `TxnEventCallback` structure
• Modified callback function to include context and return error
sql.go
Integrate SqlProcess wrapper for IVF flat operations
pkg/vectorindex/ivfflat/sql.go
• Updated `GetVersion` function to use `SqlProcess` instead of `process.Process`
• Modified SQL execution and error context to use `SqlProcess`
types.go
Enhance vector index CDC with capacity and sonic JSON
pkg/vectorindex/types.go
• Updated `NewVectorIndexCdc` to accept capacity parameter
• Replaced `json.Marshal` with `sonic.Marshal` for better performance
ivf_create.go
Integrate SqlProcess wrapper for IVF creation
pkg/sql/colexec/table_function/ivf_create.go
• Updated version retrieval and SQL execution to use `sqlexec.NewSqlProcess(proc)`
service.go
Update partition service transaction callbacks
pkg/partitionservice/service.go
• Updated transaction event callback to use new `TxnEventCallback` structure
• Modified callback function to include context and return error
iteration.go
Update consumer creation with additional dependencies
pkg/iscp/iteration.go
• Updated `NewConsumer` call to include engine and transaction client parameters
storage_txn_client.go
Update memory storage transaction callback interface
pkg/txn/storage/memorystorage/storage_txn_client.go
• Updated `AppendEventCallback` method signature to use `TxnEventCallback` parameter
watermark_updater.go
Make ExecWithResult function configurable
pkg/iscp/watermark_updater.go
• Made `ExecWithResult` function a variable for potential mocking or replacement
24 files
index_consumer_test.go
Update ISCP consumer tests for new engine integration and IVF support
pkg/iscp/index_consumer_test.go
• Updated test functions to use new `NewConsumer` signature with `cnEngine` and `cnClient` parameters
• Replaced mock SQL executor with `ExecWithResult` stub for cleaner testing
• Added new test functions for IVF index testing (`newTestIvfTableDef`, `newTestIvfConsumerInfo`)
• Updated HNSW test to verify JSON CDC format instead of SQL function calls
sync_test.go
Update HNSW sync tests to use SqlProcess wrapper and new API
pkg/vectorindex/hnsw/sync_test.go
• Updated all test functions to use `SqlProcess` wrapper instead of direct `process.Process`
• Modified test calls from `CdcSync` to `NewHnswSync` followed by `RunOnce` pattern
• Added new continuous update test `TestSyncContinuousUpdateInsertShuffle2FilesF32WithSmallCap`
• Updated mock functions to accept `SqlProcess` parameter
iscp_util_test.go
Add comprehensive tests for ISCP utility functions
pkg/sql/compile/iscp_util_test.go
• New test file for ISCP utility functions
• Added comprehensive tests for CDC task validation and creation
• Included mock functions for testing error scenarios
• Added tests for index algorithm validation and sinker type determination
• Implemented panic testing for invalid algorithm types
model_test.go
Update HNSW model tests for SqlProcess interface
pkg/vectorindex/hnsw/model_test.go
• Updated test functions to use `*sqlexec.SqlProcess` instead of `*process.Process`
• Modified mock streaming function signatures to match new interface
• Added `sqlproc` initialization in test setup
• Updated all model method calls to use `sqlproc` parameter
engine_mock.go
Update engine mock to match interface changes
pkg/frontend/test/engine_mock.go
• Updated mock method signatures to match interface changes
• Added `skipDeletes` parameter to `CollectChanges` method
• Added `partitionIndex` parameter to `PrimaryKeysMayBeModified` method
• Added new `HasTempEngine` mock method
• Updated method call recordings to include new parameters
search_test.go
Update HNSW search tests for SqlProcess interface
pkg/vectorindex/hnsw/search_test.go
• Updated mock function signatures to use `*sqlexec.SqlProcess` instead of `*process.Process`
• Added `sqlproc` initialization in test functions
• Modified streaming mock functions to handle new interface
• Updated cache search calls to use `sqlproc` parameter
search_test.go
Update IVF search tests for SqlProcess interface
pkg/vectorindex/ivfflat/search_test.go
• Updated mock function signatures to use `*sqlexec.SqlProcess` instead of `*process.Process`
• Added `sqlproc` initialization in test functions
• Modified streaming mock functions to handle new interface
• Updated search method calls to use `sqlproc` parameter
ivf_search_test.go
Update IVF search table function tests for SqlProcess interface
pkg/sql/colexec/table_function/ivf_search_test.go
• Updated mock function signatures to use `*sqlexec.SqlProcess` instead of `*process.Process`
• Modified mock search interface methods to accept `SqlProcess` parameter
• Added import for `sqlexec` package
build_dml_util_test.go
Add tests for async FullText index DML operations
pkg/sql/plan/build_dml_util_test.go
• Added new test functions for async FullText index processing
• Included tests for invalid JSON parameter handling
• Added validation tests for async parameter parsing
• Added import for `plan` package
fulltext_test.go
Update fulltext tests for SqlProcess integration
pkg/sql/colexec/table_function/fulltext_test.go
• Updated test functions to use `SqlProcess` parameter instead of direct `process.Process`
• Modified fake SQL execution functions to accept `SqlProcess` wrapper
hnsw_search_test.go
Update HNSW search tests for SqlProcess integration
pkg/sql/colexec/table_function/hnsw_search_test.go
• Updated mock search interface to use `SqlProcess` instead of `process.Process`
• Modified `Search` and `Load` methods to accept `SqlProcess` parameter
hnsw_create_test.go
Update HNSW create tests for SqlProcess integration
pkg/sql/colexec/table_function/hnsw_create_test.go
• Updated mock SQL execution function to use `SqlProcess` parameter
• Modified test helper to accept `SqlProcess` wrapper
sqlexec_test.go
Update SQL execution tests for SqlProcess integration
pkg/vectorindex/sqlexec/sqlexec_test.go
• Updated test functions to use `SqlProcess` wrapper instead of direct `process`
• Modified `RunTxn` calls to accept `SqlProcess` parameter
operator_events_test.go
Update transaction operator event tests
pkg/txn/client/operator_events_test.go
• Updated transaction event callback test to use new `TxnEventCallback` structure
• Modified callback function to match new signature with context and error return
txn_mock.go
Update transaction operator mock for new callback interface
pkg/frontend/test/txn_mock.go
• Updated mock `AppendEventCallback` method signature to use `TxnEventCallback` parameter
store_sql_test.go
Update increment service SQL store test mocks
pkg/incrservice/store_sql_test.go
• Updated test transaction operator mock to use new `TxnEventCallback` parameter
service_test.go
Update bootstrap service test mocks
pkg/bootstrap/service_test.go
• Updated test transaction operator mock to use new `TxnEventCallback` parameter
txn_test.go
Update frontend transaction test mocks
pkg/frontend/txn_test.go
• Updated test transaction operator mock to use new `TxnEventCallback` parameter
entire_engine_test.go
Update engine test operator mocks
pkg/vm/engine/entire_engine_test.go
• Updated test operator mock to use new `TxnEventCallback` parameter
types_test.go
Update vector index CDC test for capacity parameter
pkg/vectorindex/types_test.go
• Updated `NewVectorIndexCdc` call to include capacity parameter (8192)
vector_ivf_async.result
Add async IVF flat vector index test results
test/distributed/cases/vector/vector_ivf_async.result
• Added comprehensive test results for async IVF flat vector index operations
• Includes create, insert, search, and load operations with expected outputs
vector_hnsw.result
Update HNSW vector index test results for async behavior
test/distributed/cases/vector/vector_hnsw.result
• Added sleep operation and updated expected search results
• Modified test results to show proper async index behavior
vector_hnsw_f64_async.sql
Add async HNSW F64 vector index test suite
test/distributed/cases/vector/vector_hnsw_f64_async.sql
• Added comprehensive test suite for async HNSW F64 vector operations
• Includes CDC operations, index creation, and search functionality tests
vector_hnsw.sql
Update HNSW vector test for async index operations
test/distributed/cases/vector/vector_hnsw.sql
• Added sleep operation to wait for async index updates
• Updated comments to reflect async index behavior
13 files
cache_test.go
Update cache tests to use SqlProcess interface
pkg/vectorindex/cache/cache_test.go
• Updated import from `process` to `sqlexec` package
• Modified mock search interfaces to use `*sqlexec.SqlProcess` instead of `*process.Process`
• Added `sqlproc` variable initialization in test functions
• Updated all cache search calls to use `sqlproc` parameter
model.go
Refactor HNSW model to use SqlProcess interface
pkg/vectorindex/hnsw/model.go
• Updated function signatures to use `*sqlexec.SqlProcess` instead of `*process.Process`
• Added `NThread` field to `HnswModel` struct and initialization logic
• Modified index loading methods to handle empty checksums and file sizes
• Updated context handling to use `sqlproc.GetContext()` instead of `proc.Ctx`
• Added `initIndex` method for index initialization
search.go
Refactor IVF search to use SqlProcess interface
pkg/vectorindex/ivfflat/search.go
• Updated function signatures to use `*sqlexec.SqlProcess` instead of `*process.Process`
• Modified context handling to use `sqlproc.GetContext()` and `sqlproc.GetTopContext()`
• Updated error handling to use new context references
• Changed all search and load methods to accept `SqlProcess` parameter
service_txn_event.go
Update transaction event handling to new callback interface
pkg/txn/trace/service_txn_event.go
• Updated transaction event callback registrations to use `NewTxnEventCallback` wrapper
• Modified event handler function signatures to include context and additional parameters
• Added return statements to event handler functions
• Updated all event callback functions to match new interface requirements
search.go
Refactor HNSW search to use SqlProcess interface
pkg/vectorindex/hnsw/search.go
• Updated function signatures to use `*sqlexec.SqlProcess` instead of `*process.Process`
• Modified context handling and error reporting to use `sqlproc.GetContext()`
• Updated metadata loading and index loading methods
• Changed search method to accept `SqlProcess` parameter
cache.go
Update vector index cache to use SqlProcess interface
pkg/vectorindex/cache/cache.go
• Updated `VectorIndexSearchIf` interface to use `*sqlexec.SqlProcess` instead of `*process.Process`
• Modified cache search and load methods to accept `SqlProcess` parameter
• Updated all internal method calls to use new interface
• Changed import from `process` to `sqlexec` package
client.go
Update transaction client for new event callback interface
pkg/txn/client/client.go
• Updated transaction event callback registration to use `TxnEventCallback` struct
• Modified callback function signatures to include context and additional parameters
• Added return statements to callback functions
• Updated `SyncLatestCommitTS` call to match new interface
index_sqlwriter.go
Update HNSW SQL writer for CDC integration
pkg/iscp/index_sqlwriter.go
• Modified HNSW SQL writer to return JSON data instead of SQL statements
• Added `NewSync` method to create `HnswSync` instances
• Updated writer capacity initialization for CDC processing
• Added imports for `hnsw` and `sqlexec` packages
ivf_search.go
Update IVF search table function for SqlProcess interface
pkg/sql/colexec/table_function/ivf_search.go
• Updated function calls to use `sqlexec.NewSqlProcess(proc)` wrapper
• Modified `getVersion` and cache search calls to use `SqlProcess` interface
• Added import for `sqlexec` package
service.go
Update shard service for new transaction event interface
pkg/shardservice/service.go
• Updated transaction event callback registration to use `NewTxnEventCallback` wrapper
• Modified callback function signatures to include context and additional parameters
• Added return statements to callback functions
• Updated both create and delete callback registrations
operator_events.go
Refactor transaction event callbacks with structured interface
pkg/txn/client/operator_events.go
• Added `TxnEventCallback` struct to replace function callbacks
• Updated `AppendEventCallback` to accept `TxnEventCallback` instead of functions
• Modified event triggering to handle new callback interface with context
• Added error handling to callback execution
build.go
Refactor HNSW build to use SqlProcess interface
pkg/vectorindex/hnsw/build.go
• Updated function signatures to use `*sqlexec.SqlProcess` instead of `*process.Process`
• Modified context handling to use `sqlproc.GetContext()`
• Updated `NewHnswBuild` and `addFromChannel` methods
• Changed import from `process` to `sqlexec` package
function_id.go
Remove HNSW CDC update function constant
pkg/sql/plan/function/function_id.go
• Removed `HNSW_CDC_UPDATE` function constant
• Decreased `FUNCTION_END_NUMBER` from 351 to 350
3 files
index_consumer.go
Integrate HNSW and async processing with ISCP consumer
pkg/iscp/index_consumer.go
• Added new imports for `sonic`, `types`, `hnsw`, `sqlexec`, and `engine`
• Modified `IndexConsumer` struct to include `cnEngine`, `cnTxnClient`, and `algo` fields
• Replaced SQL executor with transaction-based SQL execution using `sqlexec.RunTxnWithSqlContext`
• Added specialized HNSW processing with `runHnsw` function for different numeric types
• Implemented CDC data processing with JSON unmarshaling for HNSW updates
iscp_util.go
Add ISCP utility functions for CDC task management
pkg/sql/compile/iscp_util.go
• New utility file for ISCP (Index Sync Change Processing) integration
• Added functions for CDC task management: create, delete, and validation
• Implemented transaction event callbacks for async index operations
• Added support for different index algorithms (HNSW, IVF, FullText)
• Included helper functions for sinker type determination and job ID generation
ddl_index_algo.go
Add async index processing support to DDL operations
pkg/sql/compile/ddl_index_algo.go
• Added async processing support for FullText and IVF index creation
• Modified index creation to check for async parameter and create ISCP jobs accordingly
• Updated HNSW index handling to support both sync and async modes
• Added transaction event callbacks for long-running index operations
• Integrated CDC task creation for async index updates
1 files
util.go
Add error handling to fulltext index SQL generation
pkg/sql/compile/util.go
• Updated `genInsertIndexTableSqlForFullTextIndex` function to return error alongside SQL strings
1 files
function_id_test.go
Remove HNSW CDC update function ID
pkg/sql/plan/function/function_id_test.go
• Removed `HNSW_CDC_UPDATE` function ID entry
• Updated `FUNCTION_END_NUMBER` from 351 to 350
9 files