Partial Chase-Lev work-stealing deque implementation for ConcurrentBag<T> #122956
Description
Attempted optimization of ConcurrentBag<T> using the Chase-Lev work-stealing deque algorithm. The implementation is incomplete due to a fundamental incompatibility between Chase-Lev's transient inconsistencies and ConcurrentBag's freeze mechanism requirements.
Status: 166/170 tests passing. 4 tests fail in Clear_ConcurrentUsage_NoExceptions due to race conditions in ToArray/Count operations.
Changes Made
Index Management
- Replaced int _headIndex/_tailIndex with long _top/_bottom to eliminate overflow handling
- Removed the separate _addTakeCount and _stealCount counters (root cause of current failures)
Simplified Operations
- Removed the _currentOp field and Operation enum
- FreezeBag: no longer waits for in-progress operations
- LocalPush: Lock-free fast path for queues with more than one item
- TryLocalPop: Lock-free except when competing for the last element
- TrySteal: Retained lock-based approach (required for linearizability with BlockingCollection)
Root Cause of Failures
Chase-Lev allows _bottom < _top transiently during pop operations. This violates assumptions in:
- DangerousCount: Calculates bottom - top, asserts >= 0
- DangerousCopyTo: Sizes array from count, then copies with a potentially different count due to the race
- ToArray: Calls DangerousCount then CopyFromEachQueueToArray, sees inconsistent state
The original implementation avoided this by using separate counters that don't exhibit transient negative values.
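One way to tolerate the transient negative difference, sketched here purely as an illustration, is to clamp the computed count at zero instead of asserting on it. The type IndexPairSketch and the method ClampedCount are hypothetical names, not part of the PR or of ConcurrentBag<T>. The sketch only addresses the assertion in a count calculation; it does not fix the separate problem of an array being sized from one snapshot and filled from another.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for a queue's index pair; not the PR's actual type.
public sealed class IndexPairSketch
{
    private long _top;     // advanced by stealers
    private long _bottom;  // advanced and temporarily retracted by the owner

    public IndexPairSketch(long top, long bottom)
    {
        _top = top;
        _bottom = bottom;
    }

    // Reads both indices and clamps the difference at zero, so a pop that has
    // moved _bottom below _top is reported as "empty" rather than as a
    // negative count. The result is still only a snapshot and may be stale.
    public int ClampedCount()
    {
        long top = Volatile.Read(ref _top);
        long bottom = Volatile.Read(ref _bottom);
        long diff = bottom - top;
        return diff <= 0 ? 0 : (int)Math.Min(diff, int.MaxValue);
    }
}
```

A caller that sizes a destination array from such a snapshot would still need to bound the subsequent copy by the array length, since the queue can change between the two steps.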
Required to Complete
One of:
- Restore _addTakeCount/_stealCount (loses direct count benefit, keeps other optimizations)
- Avoid _bottom modification outside locks (negates lock-free benefits)
Customer Impact
None. This is an incomplete optimization attempt with no shipping impact.
Regression
No. Existing functionality preserved for passing tests.
Testing
Ran the System.Collections.Concurrent.Tests test suite:
- 166 of 170 tests pass
- 4 tests fail: race conditions in ToArray/Count during operations
Risk
High - Do not merge as-is. Code changes core data structure invariants with incomplete solution to resulting issues. Requires additional work to handle transient negative counts or restore separate count tracking.
Original prompt
Summary
Optimize ConcurrentBag<T> by replacing the current WorkStealingQueue implementation with a Chase-Lev work-stealing deque algorithm. This is a well-known optimization used in many modern runtimes (Cilk, TBB, Rust's crossbeam) that provides significant performance improvements.
Current Implementation Issues
The current implementation in src/libraries/System.Collections.Concurrent/src/System/Collections/Concurrent/ConcurrentBag.cs has these performance characteristics that can be improved:
- The TrySteal method acquires a lock (lock (this)), which can cause contention when multiple threads attempt to steal
- _emptyToNonEmptyListTransitionCount is a global counter that can become a bottleneck
- _headIndex, _tailIndex, and _currentOp may cause false sharing
Proposed Changes
1. Replace WorkStealingQueue with Chase-Lev Deque
Implement the Chase-Lev algorithm, which provides:
- long indices with atomic operations
Key algorithm characteristics:
- Push (owner): writes at bottom, increments bottom (no CAS needed)
- Pop (owner): decrements bottom, checks against top, only uses CAS when contending with a stealer
- Steal: reads top, reads element, CAS top to top+1
2. Add Cache Line Padding
Add padding to prevent false sharing between the _top and _bottom indices, which are accessed by different threads.
3. Simplify Empty-to-NonEmpty Tracking
The Chase-Lev algorithm naturally handles the linearizability concerns with a simpler approach.
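To make the proposed changes concrete, here is a minimal sketch of a Chase-Lev style deque with padded indices. It is an illustration under stated assumptions, not the implementation this prompt asks for: the names ChaseLevSketch and PaddedLong are hypothetical, the buffer is fixed-size with a power-of-two capacity (a real deque would grow), T is restricted to reference types, and the 128-byte padding size is an assumed upper bound on the cache line size. It does not cover ConcurrentBag's freeze mechanism or its empty-to-non-empty tracking.

```csharp
#nullable enable
using System.Runtime.InteropServices;
using System.Threading;

// Assumption: 128 bytes is at least one cache line on the target hardware.
[StructLayout(LayoutKind.Explicit, Size = 128)]
internal struct PaddedLong
{
    [FieldOffset(0)] public long Value;
}

// Hypothetical fixed-capacity sketch; names and layout are not the PR's code.
public sealed class ChaseLevSketch<T> where T : class
{
    private readonly T?[] _items;
    private readonly long _mask;
    private PaddedLong _top;     // next index to steal from
    private PaddedLong _bottom;  // next index the owner will push to

    public ChaseLevSketch(int capacityPowerOfTwo)
    {
        _items = new T[capacityPowerOfTwo];
        _mask = capacityPowerOfTwo - 1;
    }

    // Owner thread only: write at bottom, then publish by incrementing bottom.
    public bool TryPush(T item)
    {
        long b = Volatile.Read(ref _bottom.Value);
        long t = Volatile.Read(ref _top.Value);
        if (b - t >= _items.Length) return false; // full; a real deque would grow
        Volatile.Write(ref _items[b & _mask], item);
        Volatile.Write(ref _bottom.Value, b + 1);
        return true;
    }

    // Owner thread only: decrement bottom, then check against top.
    public bool TryPop(out T? item)
    {
        long b = Volatile.Read(ref _bottom.Value) - 1;
        Interlocked.Exchange(ref _bottom.Value, b); // full fence before reading top
        long t = Volatile.Read(ref _top.Value);
        if (b < t)
        {
            // Empty. Between the exchange above and this write, other threads
            // can observe bottom < top.
            Volatile.Write(ref _bottom.Value, t);
            item = null;
            return false;
        }
        item = Volatile.Read(ref _items[b & _mask]);
        if (b > t) return true; // more than one element: no CAS needed
        // Last element: race against stealers with a CAS on top.
        bool won = Interlocked.CompareExchange(ref _top.Value, t + 1, t) == t;
        Volatile.Write(ref _bottom.Value, t + 1);
        if (!won) item = null;
        return won;
    }

    // Any thread: read top, read the element, then claim it by CAS on top.
    public bool TrySteal(out T? item)
    {
        long t = Volatile.Read(ref _top.Value);
        Interlocked.MemoryBarrier();
        long b = Volatile.Read(ref _bottom.Value);
        if (t >= b) { item = null; return false; }
        item = Volatile.Read(ref _items[t & _mask]);
        if (Interlocked.CompareExchange(ref _top.Value, t + 1, t) != t)
        {
            item = null; // lost the race to another stealer or to the owner
            return false;
        }
        return true;
    }
}
```

In this sketch a failed steal simply returns false on contention; callers that need stronger guarantees (for example the linearizability that BlockingCollection relies on) would need additional coordination that is not shown here.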
Implementation Details
The new WorkStealingQueue should look like: