Copilot AI commented Jan 7, 2026

Description

Attempted optimization of ConcurrentBag<T> using the Chase-Lev work-stealing deque algorithm. The implementation is incomplete: Chase-Lev permits transient index inconsistencies, and ConcurrentBag's freeze mechanism assumes the indices are always consistent.

Status: 166/170 tests passing. 4 tests fail in Clear_ConcurrentUsage_NoExceptions due to race conditions in ToArray/Count operations.

Changes Made

Index Management

  • Replaced int _headIndex/_tailIndex with long _top/_bottom to eliminate overflow handling
  • Removed the separate _addTakeCount and _stealCount counters (their removal is the root cause of the current failures)

Simplified Operations

  • Removed _currentOp field and Operation enum
  • Simplified FreezeBag (no longer waits for in-progress operations)
  • LocalPush: Lock-free fast path for queues with >1 items
  • TryLocalPop: Lock-free except when competing for last element
  • TrySteal: Retained lock-based approach (required for linearizability with BlockingCollection)

Root Cause of Failures

Chase-Lev allows _bottom < _top transiently during pop operations, because the owner decrements _bottom before it reads _top. This violates assumptions in:

  • DangerousCount: Calculates bottom - top, asserts >= 0
  • DangerousCopyTo: Sizes array from count, then copies with potentially different count due to race
  • ToArray: Calls DangerousCount then CopyFromEachQueueToArray, sees inconsistent state

Original implementation avoided this using separate counters that don't exhibit transient negative values.
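
The transient state can be reproduced without any threads by replaying the pop steps by hand. The sketch below is illustrative only: `TransientCountDemo` and its local variables are hypothetical stand-ins for `_top` and `_bottom`, not code from this PR. It shows the value that a concurrent `bottom - top` calculation would observe at each step of a pop on an empty deque.

```csharp
using System;

// Hypothetical stand-in for the deque indices; not the PR's WorkStealingQueue.
public static class TransientCountDemo
{
    // Replays the steps of a Chase-Lev pop on an empty deque and returns the
    // count (bottom - top) an observer would compute before, during, and after.
    public static long[] ObserveEmptyPop(long start)
    {
        long top = start;
        long bottom = start;              // empty deque: bottom == top
        long before = bottom - top;       // 0

        bottom = bottom - 1;              // pop step 1: speculative decrement
        long during = bottom - top;       // -1: what a concurrent count would see

        bottom = top;                     // pop step 2: owner finds it empty, restores bottom
        long after = bottom - top;        // 0

        return new[] { before, during, after };
    }
}
```

In the real data structure the middle state is visible only for a short window, which would explain why the failures show up only in tests that call ToArray or Count concurrently with other operations.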

Required to Complete

One of:

  1. Restore _addTakeCount/_stealCount (loses direct count benefit, keeps other optimizations)
  2. Make freeze operations handle negative counts (treat as 0, accept undercounting)
  3. Prevent _bottom modification outside locks (negates lock-free benefits)
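
As a rough illustration of option 2, a count helper could clamp a transiently negative difference to zero. This is a minimal sketch under stated assumptions: `IndexSnapshot` and `ClampedCount` are hypothetical names, and the class only models the two indices rather than the PR's actual queue.

```csharp
using System;
using System.Threading;

// Hypothetical model of the two deque indices; not the PR's WorkStealingQueue.
public sealed class IndexSnapshot
{
    private long _top;
    private long _bottom;

    public IndexSnapshot(long top, long bottom)
    {
        _top = top;
        _bottom = bottom;
    }

    // Reads both indices once and treats a negative difference as an empty queue.
    // The result can undercount if a pop is in flight, which option 2 accepts.
    public int ClampedCount()
    {
        long t = Volatile.Read(ref _top);
        long b = Volatile.Read(ref _bottom);
        long count = Math.Max(0L, b - t);
        return (int)Math.Min(int.MaxValue, count);
    }
}
```

A caller that sizes a destination array from this value would still need to bound the subsequent copy by the same snapshot, since a second read of the indices may yield a different count.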

Customer Impact

None. This is an incomplete optimization attempt with no shipping impact.

Regression

No. Existing functionality preserved for passing tests.

Testing

Ran System.Collections.Concurrent.Tests test suite:

  • 166 passing tests include all critical scenarios (add/remove, threading, semaphore synchronization)
  • 4 failing tests all related to concurrent ToArray/Count during operations

Risk

High - Do not merge as-is. The code changes core data structure invariants, and the resulting issues are only partially solved. Additional work is required to either handle transient negative counts or restore separate count tracking.

Original prompt

Summary

Optimize ConcurrentBag<T> by replacing the current WorkStealingQueue implementation with a Chase-Lev work-stealing deque algorithm. This is a well-known optimization used in many modern runtimes (Cilk, TBB, Rust's crossbeam) that provides significant performance improvements.

Current Implementation Issues

The current implementation in src/libraries/System.Collections.Concurrent/src/System/Collections/Concurrent/ConcurrentBag.cs has these performance characteristics that can be improved:

  1. Lock-based stealing: The TrySteal method acquires a lock (lock (this)), which can cause contention when multiple threads attempt to steal
  2. Complex synchronization for local operations: Local push/pop operations sometimes require locking even in the common case
  3. Global transition counter: The _emptyToNonEmptyListTransitionCount is a global counter that can become a bottleneck
  4. No cache-line padding: Fields like _headIndex, _tailIndex, and _currentOp may cause false sharing

Proposed Changes

1. Replace WorkStealingQueue with Chase-Lev Deque

Implement the Chase-Lev algorithm which provides:

  • Lock-free local operations: Owner thread never locks for push/pop in the common case
  • CAS-only stealing: Thieves use a single CAS on the head index instead of acquiring a lock
  • Simpler index management: Uses long indices with atomic operations

Key algorithm characteristics:

  • Push: Owner writes to bottom, increments bottom (no CAS needed)
  • Pop: Owner decrements bottom, checks against top, only uses CAS when contending with stealer
  • Steal: Thief reads top, reads element, CAS top to top+1

2. Add Cache Line Padding

Add padding to prevent false sharing between _top and _bottom indices which are accessed by different threads.
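
One way to express such padding in C# is an explicit-layout struct that places each index at its own offset. The following is only a sketch: `PaddedIndices`, `PaddingAssumptions`, and the 128-byte line size are assumptions made for illustration and are not taken from this PR.

```csharp
using System.Runtime.InteropServices;

// Assumption: 128 bytes is large enough to cover a cache line on common hardware.
public static class PaddingAssumptions
{
    public const int CacheLineSize = 128;
}

// Hypothetical container for the two indices. Each field starts on its own
// 128-byte boundary within the struct, with padding before, between, and after.
[StructLayout(LayoutKind.Explicit, Size = 3 * PaddingAssumptions.CacheLineSize)]
public struct PaddedIndices
{
    [FieldOffset(1 * PaddingAssumptions.CacheLineSize)]
    public long Top;      // steal end, updated by thieves via CAS

    [FieldOffset(2 * PaddingAssumptions.CacheLineSize)]
    public long Bottom;   // push/pop end, updated only by the owner
}
```

The cost is memory: under these assumptions each queue carries 384 bytes for two 8-byte indices, so the trade-off is only worthwhile when the indices are contended from different cores.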

3. Simplify Empty-to-NonEmpty Tracking

The Chase-Lev algorithm naturally handles the linearizability concerns with a simpler approach.

Implementation Details

The new WorkStealingQueue should look like:

private sealed class WorkStealingQueue
{
    private const int InitialSize = 32;
    
    // Padded to avoid false sharing
    private long _top;      // steal end (modified by stealers via CAS)
    private long _bottom;   // push/pop end (modified only by owner)
    
    private volatile T[] _array;
    private volatile int _mask;
    
    internal readonly WorkStealingQueue? _nextQueue;
    internal readonly int _ownerThreadId;
    
    internal void LocalPush(T item, ref long emptyToNonEmptyListTransitionCount)
    {
        long b = _bottom;
        long t = Volatile.Read(ref _top);
        T[] arr = _array;
        
        if (b - t > _mask)
        {
            // Need to grow
            Grow();
            arr = _array;
        }
        
        arr[b & _mask] = item;
        Interlocked.MemoryBarrier(); // Ensure item is visible before bottom is updated
        _bottom = b + 1;
        
        // Track empty-to-non-empty transition
        if (b == t)
        {
            Interlocked.Increment(ref emptyToNonEmptyListTransitionCount);
        }
    }
    
    internal bool TryLocalPop([MaybeNullWhen(false)] out T result)
    {
        long b = _bottom - 1;
        T[] arr = _array;
        _bottom = b;
        Interlocked.MemoryBarrier();
        long t = Volatile.Read(ref _top);
        
        if (t <= b)
        {
            // Non-empty
            result = arr[b & _mask];
            if (t == b)
            {
                // Last element, race with stealers
                if (Interlocked.CompareExchange(ref _top, t + 1, t) != t)
                {
                    // Lost race
                    _bottom = t + 1;
                    result = default!;
                    return false;
                }
                _bottom = t + 1;
            }
            
            if (RuntimeHelpers.IsReferenceOrContainsReferences<T>())
            {
                arr[b & _mask] = default!;
            }
            return true;
        }
        
        // Empty
        _bottom = t;
        result = default!;
        return false;
    }
    
    internal bool TrySteal([MaybeNullWhen(false)] out T result, bool take)
    {
        long t = Volatile.Read(ref _top);
        Interlocked.MemoryBarrier();
        long b = _bottom;
        
        if (t < b)
        {
            T[] arr = _array;
            result = arr[t & _mask];
            
            if (take)
            {
                if (Interlocked.CompareExchange(ref _top, t + 1, t) != t)
                {
                    // Lost race with another stealer or owner
                    result = default!;
                    return false;
                }
                
                if (RuntimeHelpers.IsReferenceOrContainsReferences<T>())
                {
                    arr[t & _mask] = default!;
                }
            }
            return true;
        }
        
        result = default!;...

</details>




*This pull request was created from Copilot chat.*

Copilot AI and others added 2 commits January 7, 2026 02:37
This commit implements a hybrid Chase-Lev approach for ConcurrentBag but has
uncovered fundamental compatibility issues between the pure Chase-Lev algorithm
and ConcurrentBag's requirements. Current status:

Working:
- Basic add/remove operations pass
- Single/multi-threaded tests pass
- Semaphore synchronization test passes
- Simplified index management with long indices
- Lock-free local push for common case
- Mostly lock-free local pop

Failing (4 tests):
- Clear_ConcurrentUsage_NoExceptions tests fail due to races with ToArray/Count
- Issue: TryLocalPop modifies _bottom outside locks, creating transient
  inconsistencies (bottom < top) that violate freeze mechanism assumptions
- The original code avoided this with separate count tracking

Key learnings:
1. Pure Chase-Lev assumes owner never blocks, but ConcurrentBag needs freeze
2. Empty-to-non-empty tracking requires specific synchronization not easily
   compatible with lock-free stealing
3. Transient index inconsistencies break DangerousCount/DangerousCopyTo

Next steps needed:
- Either: Revert to separate count tracking (loses some Chase-Lev benefits)
- Or: Redesign freeze mechanism to handle transient inconsistencies
- Or: Add more sophisticated synchronization for freeze operations

Co-authored-by: stephentoub <[email protected]>
Copilot AI changed the title from "[WIP] Optimize ConcurrentBag implementation with Chase-Lev deque" to "WIP: Partial Chase-Lev work-stealing deque implementation for ConcurrentBag<T>" Jan 7, 2026
Copilot AI requested a review from stephentoub January 7, 2026 02:55
stephentoub changed the title from "WIP: Partial Chase-Lev work-stealing deque implementation for ConcurrentBag<T>" to "Partial Chase-Lev work-stealing deque implementation for ConcurrentBag<T>" Jan 7, 2026
stephentoub closed this Jan 7, 2026
stephentoub deleted the copilot/optimize-concurrentbag-implementation branch January 7, 2026 03:05