perf: skip task queue metadata blob write and serialization in CreateTasks #9622
mykaul wants to merge 2 commits into temporalio:main
Conversation
The Cassandra CreateTasks CAS batch previously wrote the full TaskQueueInfo proto blob (~200-300 bytes) on every call, but only the range_id fencing token is needed for the conditional check. This matches the SQL backend, which already returns UpdatedMetadata: false and relies on SyncState to periodically flush metadata changes.

Introduces templateCheckRangeIDQuery, which only sets range_id in the CAS UPDATE, removing the task_queue and task_queue_encoding columns from the hot path. This reduces Paxos/Raft proposal payload size and eliminates unnecessary TaskQueueInfo serialization writes.

Benchmark results (throughput_stress mc150, 5 min, host networking, GOMAXPROCS=4 GOGC=200, cores 0-3 server / 4-7 DB / 8-11 ES / 12-15 omes):

- Cassandra: before 192 / after 202 iterations (+5.2%)
- ScyllaDB: before 171 / after 245 iterations (+43.3%)

Note: Temporal server CPUs (0-3) were >80% utilized in both runs, while DB CPUs (4-7) were <65%, indicating the server is the throughput bottleneck.
I've tested this extensively with ScyllaDB 2026.1, Cassandra 5.0, and Cassandra 3.11, running multiple Omes workloads for short periods.
FYI, we're discussing this internally. There's no correctness concern, but there is some concern about metadata updates happening less frequently, mostly regarding backlog size. We might want to compromise and skip the write sometimes, but not always. Do you have numbers to quantify the improvements you saw? How did you notice this issue in the first place?
AI noticed the potential improvement as I was working to improve the performance of ScyllaDB with Temporal (see #8652). Do note that I do NOT feel I'm able to truly saturate the environment: it's either my pathetic laptop, the setup, or omes, which fails to properly load the system (ref #8652 (comment)).
Summary
- Only the range_id fencing token is needed for the conditional check; the full blob (~200-300 bytes) was written unnecessarily on every CreateTasks call.

Motivation
The CreateTasks path is the hottest write path in the matching service. Every task dispatch writes the full TaskQueueInfo proto blob as part of the CAS batch, but only range_id is used for fencing. This adds unnecessary serialization cost and increases Paxos/Raft proposal payload size.

This matches the SQL backend behavior, which already returns UpdatedMetadata: false and relies on SyncState to periodically flush metadata changes.

Approach
- Introduces templateCheckRangeIDQuery, which only sets range_id in the CAS UPDATE, removing the task_queue and task_queue_encoding columns from the hot path.
- The generic layer (task_manager.go) no longer serializes the TaskQueueInfo blob, since no store consumes it.

Testing

- Adds TestCreateTask_DoesNotWriteTaskQueueMetadata, verifying that metadata is not written during CreateTasks.