Disconnect stale automation peers from UIA to fix memory leak in virtualized ItemsControls #11657

Open
amarinov-msft wants to merge 1 commit into dotnet:main from amarinov-msft:amarinov-fix/automation-peer-disconnect-stale

@amarinov-msft amarinov-msft commented May 21, 2026

Fixes #11337

Description

When UIA clients enumerate a virtualized ItemsControl (e.g. DataGrid), ElementProxy CCWs are created for each automation peer. Previously, when children were replaced (e.g. due to collection rebinding), the old peers were never disconnected from UIA Core. The COM references held by UIA kept the CCW ref count > 0, pinning the managed peers and their entire visual sub-trees in memory indefinitely.

This fix calls UiaDisconnectProvider on removed children's ElementProxy CCWs during UpdateChildrenInternal, causing UIA Core to release its COM references. This allows the CCW ref count to drop to zero so the managed peers can be garbage collected.

Note: UIA clients may observe ElementNotAvailableException when accessing properties of stale elements after disconnection. This is standard UIA behavior that well-behaved clients already handle.

Customer Impact

WPF applications using virtualized ItemsControl (e.g., DataGrid, ListView) that are exposed to UI Automation clients (screen readers, automated testing tools, accessibility inspectors) experience an unbounded memory leak of approximately 260 MB/min in the customer's repro scenario (200k-row DataGrid with periodic rebinding). The leak occurs because stale AutomationPeer instances are never disconnected from UIA Core - their ElementProxy CCW ref counts never reach zero, permanently pinning the managed peers and their visual sub-trees in memory. This leads to OutOfMemoryException and application crashes in long-running scenarios. Any application with accessibility enabled (which is the default when assistive technology or test automation is present) is affected.

Regression

No.

Testing

Tested using the customer's repro scenario:
Baseline memory leak rate: +260 MB/min
With fix: +15.3 MB/min (94% reduction)

Risk

Low overall.

  1. Narrow scope: The change affects only UpdateChildrenInternal in AutomationPeer.cs - a single code path that runs when automation children are replaced.
  2. Standard API usage: UiaDisconnectProvider is the documented Windows API for this exact purpose — telling UIA Core to release COM references to providers that are no longer valid. This is the same mechanism used by other frameworks (e.g., WinForms) to manage automation peer lifetimes.
  3. Non-recursive by design: The disconnect is intentionally non-recursive to avoid disconnecting shared container peers in virtualized controls that may have been recycled to serve new items.
  4. No behavioral change for apps without UIA clients: The disconnect logic only executes when UpdateChildrenInternal detects removed children, which only happens when automation is active and the tree structure changes.
  5. Expected UIA client-side behavior change: UIA clients that hold stale element references from a previous FindAll call may now receive ElementNotAvailableException when accessing properties of disconnected elements. This is standard UIA contract behavior that well-behaved clients (including Narrator, NVDA, and JAWS) already handle. Poorly written test automation tools that do not catch this exception may surface errors they previously did not encounter.

Copilot AI left a comment

Contributor

Pull request overview

This PR addresses a memory leak in WPF UI Automation (UIA) for virtualized ItemsControl scenarios by explicitly disconnecting UIA providers for removed automation peers, allowing COM references held by UIA Core to be released and enabling GC of stale peers and their visual subtrees.

Changes:

  • Added a helper to disconnect an AutomationPeer’s ElementProxy from UIA via UiaDisconnectProvider.
  • Updated UpdateChildrenInternal to disconnect removed children even when no StructureChanged listeners are registered.
  • Added the required interop import to call into UIAutomationCore.dll.

Comment on lines +1838 to +1843
if (proxyWeakRef?.Target is ElementProxy proxy)
{
    UiaDisconnectProvider(proxy);
}

peer._elementProxyWeakReference = null;

@h3xds1nz h3xds1nz left a comment

Member

@amarinov-msft I'm really worried about perf with these changes. The UIA implementation is already a really heavy mess (things like #10676 and more), and this will further increase the weight: an async DispatcherOperation for every disconnected peer, plus further enumeration over the intermediate collections.

Now, I don't have an exact solution here, since I've reworked this extensively in our WPF fork, but I think it's worth further investigation and profiling to fix this.

if (dispatcher != null && !dispatcher.HasShutdownStarted)
{
    dispatcher.BeginInvoke(DispatcherPriority.Background,
        new Action(() => UiaDisconnectProvider(proxy)));

Member

This creates a closure unnecessarily and allocates a new Action every time. Ideally use the parameter variant; it's an object you're passing, so there is no boxing here.
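For illustration, the closure-free variant suggested here might look like the sketch below. It relies on the Dispatcher.BeginInvoke(DispatcherPriority, Delegate, object) overload, which hands its third argument to the delegate, and it needs a reference to the WPF WindowsBase assembly. DeferredDisconnect, Schedule and the use of IDisposable as a stand-in for the proxy are hypothetical names chosen for this example and are not code from the PR.

```csharp
using System;
using System.Windows.Threading;

internal static class DeferredDisconnect
{
    // Allocated once for the lifetime of the type. The 'static' lambda
    // modifier (C# 9) makes the compiler reject any captured variable,
    // so no closure object is created per call.
    private static readonly DispatcherOperationCallback s_callback = static arg =>
    {
        // IDisposable is only a stand-in for the real proxy type (assumption).
        ((IDisposable)arg).Dispose();
        return null;
    };

    // The proxy travels as the 'arg' parameter instead of being captured.
    // It is a reference type, so passing it as object does not box.
    public static DispatcherOperation Schedule(Dispatcher dispatcher, IDisposable proxy)
    {
        return dispatcher.BeginInvoke(DispatcherPriority.Background, s_callback, proxy);
    }
}
```

Each call still allocates the DispatcherOperation itself; only the per-call delegate and closure allocations go away.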

// UiaDisconnectProvider must NOT be called during a UIA callback (e.g. during Navigate/FindAll handling).
// UpdateChildrenInternal is invoked from within UIA callbacks, so defer the actual COM disconnect to a separate dispatcher operation.
Dispatcher dispatcher = peer.Dispatcher;
if (dispatcher != null && !dispatcher.HasShutdownStarted)

Member

I'm not sure this is even required/true; the UIA callbacks should already run on the Dispatcher synchronously iirc, so that check seems redundant. If invoked from proxy directly, this would already throw if it was the case.

/// </remarks>
private static void DisconnectPeerFromUia(AutomationPeer peer)
{
if (peer == null)

Member

This check is redundant since you're already checking in the calling function.

}

[DllImport("UIAutomationCore.dll", EntryPoint = "UiaDisconnectProvider", CharSet = CharSet.Unicode)]
private static extern int UiaDisconnectProvider(IRawElementProviderSimple provider);

Member

This shouldn't be defined in this file.

// This must happen regardless of StructureChanged event registration.
if (removedCount > 0)
{
foreach (AutomationPeer removedChild in hs)

Member

This is just a sigh for another PR but this whole thing should be turned into a simple merge walk to differentiate old and new and avoid the HashSet allocations altogether. Sometimes this is tens of MBs for something that could have been a simple loop.

The least that can be done here is to repeat this loop over the HashSet only once; by either doing all the StructureChanged stuff or not.
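As a rough picture of what a merge walk could look like, the sketch below diffs two lists with two indices and no intermediate set. It assumes both lists are sorted ascending by a shared key without duplicates, which the comment above does not state and which may not hold for automation children. ChildMergeWalk, Diff and the use of int keys are hypothetical and chosen only for this example.

```csharp
using System.Collections.Generic;

internal static class ChildMergeWalk
{
    // Walks both sorted key lists once, in O(n + m), and reports which keys
    // exist only in the old list (removed) or only in the new list (added).
    public static void Diff(
        IReadOnlyList<int> oldKeys,
        IReadOnlyList<int> newKeys,
        List<int> removed,
        List<int> added)
    {
        int i = 0, j = 0;
        while (i < oldKeys.Count && j < newKeys.Count)
        {
            int cmp = oldKeys[i].CompareTo(newKeys[j]);
            if (cmp == 0)
            {
                // Present in both lists: the child was kept.
                i++;
                j++;
            }
            else if (cmp < 0)
            {
                // The old key is smaller, so it cannot appear later in newKeys.
                removed.Add(oldKeys[i++]);
            }
            else
            {
                // The new key is smaller, so it was not in oldKeys.
                added.Add(newKeys[j++]);
            }
        }

        // Whatever remains on either side has no counterpart.
        while (i < oldKeys.Count) removed.Add(oldKeys[i++]);
        while (j < newKeys.Count) added.Add(newKeys[j++]);
    }
}
```

For example, diffing the old keys 1, 2, 4, 7 against the new keys 2, 3, 7, 9 reports 1 and 4 as removed and 3 and 9 as added.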

if (old._elementProxyWeakReference == null)
    continue;

if (_children == null || !_children.Contains(old))

Member

This is basically O(n*m); if this is a top automation peer, it's going to be pretty heavy.
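To make the cost concrete: calling Contains on a list of m new children for each of n old children is n*m comparisons, whereas building a HashSet of the new children once costs O(m) and makes each lookup O(1) on average. The sketch below shows that general technique only. ChildDiff and FindRemoved are hypothetical names, not the PR's code, and the early return is just one way to keep the empty case allocation-light.

```csharp
using System.Collections.Generic;

internal static class ChildDiff
{
    // Returns the entries of oldChildren that no longer appear in newChildren.
    public static List<T> FindRemoved<T>(
        IReadOnlyList<T> oldChildren,
        IReadOnlyList<T> newChildren)
    {
        var removed = new List<T>();
        if (oldChildren.Count == 0)
        {
            // Nothing can have been removed, so skip building the set.
            return removed;
        }

        // Built once in O(m). For reference types that do not override
        // Equals, membership is decided by reference identity.
        var current = new HashSet<T>(newChildren);

        foreach (T old in oldChildren)
        {
            // Average O(1) per lookup, O(n + m) overall.
            if (!current.Contains(old))
            {
                removed.Add(old);
            }
        }

        return removed;
    }
}
```

The trade-off is the set allocation itself, which is the cost raised in the earlier comment about HashSet allocations.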

amarinov-msft added a commit to amarinov-msft/wpf that referenced this pull request Jun 18, 2026
- Cache static DispatcherOperationCallback; drop closure/Action allocation per call.
- Remove redundant Dispatcher null/HasShutdownStarted guard (matches InvalidatePeer pattern).
- Remove redundant peer null check inside DisconnectPeerFromUia (callers guarantee non-null).
- Move UiaDisconnectProvider P/Invoke onto ElementProxy as a single Disconnect() method; collapse ClearPeer() into it. Use DllImport.UIAutomationCore string constant per codebase convention.
- Build HashSet of new children lazily in EnsureChildren to avoid O(n*m) Contains scan; common path stays allocation-free.
- Drop redundant disconnect loop from UpdateChildrenInternal (EnsureChildren already handles it on this path); revert the merged loop back to events-only.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@amarinov-msft

Author

@h3xds1nz thank you for your review. I tried addressing your comments. While the fundamental question about this issue is resolved (performance vs behavior), please check the updates when you have the opportunity and let me know if they are in the direction that you had in mind.

@h3xds1nz h3xds1nz left a comment

Member

@amarinov-msft Just a few nits, I guess this is acceptable at the current codebase state to resolve the issue.

peer._childrenValid = false;
}

private static readonly DispatcherOperationCallback _disconnectProxyCallback = static arg =>

Member

You should be able to use SendOrPostCallback which doesn't return but is special-cased.

if (callback is SendOrPostCallback sendOrPostCallback)

_peer = null;
}

[DllImport(DllImport.UIAutomationCore, EntryPoint = "UiaDisconnectProvider", CharSet = CharSet.Unicode)]

Member

When I mentioned it before, I meant that the common practice is to put it in a specialized file; in this case that would be https://github.com/dotnet/wpf/blob/main/src/Microsoft.DotNet.Wpf/src/UIAutomation/UIAutomationProvider/MS/Internal/Automation/UiaCoreProviderApi.cs ... the provider library would have been loaded at this point anyway.

@amarinov-msft

Author

@h3xds1nz thank you for your review and effort! However, while testing this, I found a regression introduced by my changes: visible peers are being treated as stale and disconnected. The disconnect-based approach in the PR seems to be fundamentally incompatible with how UIA traverses peers. We probably need to change the approach completely.

@h3xds1nz

Member

@amarinov-msft No worries; I unfortunately cannot test this as I'm unable to replicate the memory leak with the repro whatsoever. Plus WPFAutomationClient report seems invalid, given that there's literally no GC pressure, just forcing a GC manually clears out everything (not that this fix would be an attempt to solve that).

@amarinov-msft amarinov-msft force-pushed the amarinov-fix/automation-peer-disconnect-stale branch from c0b8c49 to cff011c on June 24, 2026 08:42
…item peers

ElementProxy now holds a weak reference to data-item automation peers (those
for which AutomationPeer.IsDataItemAutomationPeer() returns true: ItemAutomationPeer
and its subclasses, DataGridCellItemAutomationPeer, and DateTimeAutomationPeer),
matching the weak-reference treatment already applied to UIElement/ContentElement/
UIElement3D peers. This lets UI Automation client references release virtualized
item peers - and the controls they transitively root - so they can be collected,
fixing unbounded provider-side growth (customer OOM on a ~200k-row DataGrid under
continuous UIA querying).

To avoid an ElementNotAvailableException if a peer is collected mid-walk or during
property readback, a short-lived strong "keep-alive" (PeerKeepAlive) roots every
peer touched at StaticWrap/Peer access for a bounded window using a two-bucket
rotation. The 9s window was sized from a measured worst-case FindAll+readback
(~4.4s under GC stress, validated 16/16 trials clean across GC modes).

A new opt-out AppContext switch,
Switch.System.Windows.Automation.Peers.UseStrongReferenceForItemAutomationPeers,
restores the legacy strong-reference behavior.

Supersedes the earlier disconnect-based approach, which had an unrecoverable
mid-walk regression.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
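The two-bucket rotation mentioned in the commit message above can be pictured with the sketch below. It is not the PR's PeerKeepAlive implementation. TwoBucketKeepAlive, Touch, Rotate and IsRooted are hypothetical names, and rotation is driven by an explicit call here rather than by a timer, which is an assumption made to keep the example deterministic.

```csharp
using System.Collections.Generic;

internal sealed class TwoBucketKeepAlive
{
    private readonly object _gate = new object();

    // Items touched since the last rotation.
    private HashSet<object> _current = new HashSet<object>();

    // Items touched during the interval before that.
    private HashSet<object> _previous = new HashSet<object>();

    // Takes a strong reference to the item in the current bucket.
    public void Touch(object item)
    {
        lock (_gate)
        {
            _current.Add(item);
        }
    }

    // Drops the oldest bucket and ages the current one. The emptied set is
    // reused as the new current bucket, so rotating allocates nothing.
    public void Rotate()
    {
        lock (_gate)
        {
            HashSet<object> recycled = _previous;
            recycled.Clear();
            _previous = _current;
            _current = recycled;
        }
    }

    // True while either bucket still holds a strong reference to the item.
    public bool IsRooted(object item)
    {
        lock (_gate)
        {
            return _current.Contains(item) || _previous.Contains(item);
        }
    }
}
```

An item touched once survives the next rotation and is released by the one after it, so with a rotation interval of T it stays rooted for between T and 2T without any per-item timestamp. How the 9s window quoted in the commit message maps onto a rotation interval is not stated there, so the sketch leaves it open.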
@amarinov-msft amarinov-msft force-pushed the amarinov-fix/automation-peer-disconnect-stale branch from cff011c to 6604bff on June 24, 2026 09:00
Labels

Included in test pass PR metadata: Label to tag PRs, to facilitate with triage Status:Completed

Development

Successfully merging this pull request may close these issues.

WPF leaks ElementProxy instances when UI Automation is used

3 participants