
Getting a Message will sometimes cause a stackoverflow. #2588

Closed
grippstick opened this issue Jul 16, 2024 · 36 comments · Fixed by microsoft/kiota-dotnet#343

@grippstick

Describe the bug

Repeated calls to Me.Messages.GetAsync will intermittently fail with a stack overflow exception. It appears to stem from a recursive call in ParseNodeFactoryRegistry and ParseNodeProxyFactory. This was determined from a dump captured with procdump.

Expected behavior

There should not be an intermittent StackOverflowException.

How to reproduce

var tasks = Enumerable.Range(0, 1000).Select(async (index) =>
{
    var graphClient = await GetGraphClient(refreshToken).ConfigureAwait(false);
    return await graphClient.Me.Messages[msgID]
        .GetAsync(config =>
        {
            config.QueryParameters.Select = new[]
            {
                "internetMessageId", "internetMessageHeaders", "createdDateTime", "subject", "bodyPreview",
                "hasAttachments", "uniqueBody", "body", "from", "toRecipients",
                "ccRecipients", "bccRecipients", "isDraft"
            };
        }).ConfigureAwait(false);
});

await System.Threading.Tasks.Task.WhenAll(tasks).ConfigureAwait(false);

SDK Version

5.56.0

Latest version known to work for scenario above?

No response

Known Workarounds

No response

Debug output



Configuration

Windows Server 2022 64 // EC2

Other information

I have a memory dump that can be provided on request. It contains an access token and some PII in the symbols, so I am not sharing it publicly.
![image](https://github.com/user-attachments/assets/5741b3db-9536-4f6f-aad6-f1334aeabf38)
@grippstick added the status:waiting-for-triage and type:bug labels on Jul 16, 2024
@andrueastman
Member

Thanks for raising this @grippstick

This will require some investigation here. Any chance you are able to share more details from the stack trace, such as the call site where the error occurs?

From a pattern perspective, is there a reason not to make the multiple requests using batching, as outlined at https://learn.microsoft.com/en-us/graph/sdks/batch-requests?tabs=csharp#simple-batching-example?
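
For reference, a rough sketch of that simple batching pattern (API shapes as I recall them from the linked docs, so verify there; graphClient and messageIds are assumptions, not code from this thread):

using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Graph;
using Microsoft.Graph.Models;

// Rough sketch only; verify the exact API shapes against the linked docs.
public static class BatchedMessageFetch
{
    public static async Task<List<Message>> GetMessagesAsync(GraphServiceClient graphClient, IReadOnlyList<string> messageIds)
    {
        // JSON batching allows up to 20 requests per batch, so chunk larger lists.
        var batch = new BatchRequestContentCollection(graphClient);

        // Add one GET /me/messages/{id} step per message and remember each step id.
        var stepIds = new List<string>();
        foreach (var messageId in messageIds)
        {
            var request = graphClient.Me.Messages[messageId].ToGetRequestInformation();
            stepIds.Add(await batch.AddBatchRequestStepAsync(request));
        }

        var response = await graphClient.Batch.PostAsync(batch);

        // Pull each message back out of the batch response by its step id.
        var messages = new List<Message>();
        foreach (var stepId in stepIds)
        {
            messages.Add(await response.GetResponseByIdAsync<Message>(stepId));
        }
        return messages;
    }
}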

@andrueastman added the status:waiting-for-author-feedback label and removed the status:waiting-for-triage label on Jul 17, 2024
@grippstick
Author

grippstick commented Jul 17, 2024

The call site is MessageItemRequestBuilder.GetAsync:
public async Task<Microsoft.Graph.Models.Message> GetAsync(Action<RequestConfiguration<Microsoft.Graph.Me.Messages.Item.MessageItemRequestBuilder.MessageItemRequestBuilderGetQueryParameters>> requestConfiguration = default, CancellationToken cancellationToken = default)
{
    var requestInfo = ToGetRequestInformation(requestConfiguration);
    var errorMapping = new Dictionary<string, ParsableFactory<IParsable>>
    {
        { "XXX", Microsoft.Graph.Models.ODataErrors.ODataError.CreateFromDiscriminatorValue },
    };
    return await RequestAdapter.SendAsync<Microsoft.Graph.Models.Message>(requestInfo, Microsoft.Graph.Models.Message.CreateFromDiscriminatorValue, errorMapping, cancellationToken).ConfigureAwait(false);
}

I cannot use the batch pattern in this scenario. We are making local copies of messages for our clients, so we listen to webhooks and then go get the message. In the original code example, I should have noted that each iteration is a different call with a different refresh token for each client.
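
In other words, the per-notification flow looks roughly like the following sketch (not the actual code from this thread; GetGraphClientAsync and the notification fields are assumptions):

using System.Threading.Tasks;
using Microsoft.Graph;
using Microsoft.Graph.Models;

// Sketch of the webhook-driven flow described above: one message fetched per
// notification, each with its own client-specific refresh token, so there is
// nothing to batch within a single request.
public class MessageCopier
{
    public async Task<Message?> HandleNotificationAsync(string refreshToken, string userId, string messageId)
    {
        // GetGraphClientAsync is an assumed helper that builds a GraphServiceClient
        // from the per-client refresh token (token exchange omitted here).
        var graphClient = await GetGraphClientAsync(refreshToken).ConfigureAwait(false);

        return await graphClient.Users[userId].Messages[messageId]
            .GetAsync(config => config.QueryParameters.Select =
                new[] { "internetMessageId", "subject", "body", "from", "toRecipients" })
            .ConfigureAwait(false);
    }

    private Task<GraphServiceClient> GetGraphClientAsync(string refreshToken) =>
        throw new System.NotImplementedException("per-client token handling omitted in this sketch");
}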

The microsoft-github-policy-service bot added the Needs: Attention 👋 label and removed the status:waiting-for-author-feedback label on Jul 17, 2024
@michael-schleger

michael-schleger commented Jul 24, 2024

I'm seeing something similar having gone from 5.44 to 5.56: GetAsync on users. I'll try to get more info in the next few days, but the crash dumps look recursive, like this:

Microsoft.Kiota.Abstractions.Serialization.ParseNodeProxyFactory+<GetRootParseNodeAsync>d__8.MoveNext()
System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[[System.__Canon, mscorlib]].Start[[Microsoft.Kiota.Abstractions.Serialization.ParseNodeProxyFactory+<GetRootParseNodeAsync>d__8, Microsoft.Kiota.Abstractions]](<GetRootParseNodeAsync>d__8 ByRef)

@grippstick
Author

I can confirm it is not just messages. It seems to happen with any large result while processing other concurrent requests. I had to start making raw HTTP requests for attachments, because it would throw stack overflows often. I also get it with large events. Since we normally see it under load as part of a queue processor, we just let it crash, reprocess what it failed on, and it seems to work. It makes me think this is a threading issue. I have raised another issue asking for a way to turn off the Kiota backing store, for performance purposes.
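
For context, the "let it crash and reprocess" approach described here has to live at the queue level rather than in a try/catch, because a StackOverflowException cannot be caught in .NET and simply terminates the process. A hypothetical sketch of that pattern (IWorkQueue, QueueItem, and CopyMessageAsync are stand-ins, not the real worker):

using System.Threading.Tasks;

// Items are only acknowledged after successful processing, so a process crash
// (e.g. a stack overflow) just means the item is redelivered on the next run.
public interface IWorkQueue
{
    Task<QueueItem?> DequeueAsync();        // lease the next item without removing it
    Task AcknowledgeAsync(QueueItem item);  // remove it once processing succeeded
}

public record QueueItem(string MessageId, string RefreshToken);

public class MessageCopyWorker
{
    private readonly IWorkQueue _queue;
    public MessageCopyWorker(IWorkQueue queue) => _queue = queue;

    public async Task RunAsync()
    {
        while (await _queue.DequeueAsync() is { } item)
        {
            // If this call takes the process down, the item was never acknowledged
            // and will be picked up again after the worker restarts.
            await CopyMessageAsync(item);
            await _queue.AcknowledgeAsync(item);
        }
    }

    // Placeholder for the Graph call that copies the message locally.
    private Task CopyMessageAsync(QueueItem item) => Task.CompletedTask;
}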

@almostjulian

+1 on this, we're also seeing microsoft/kiota-dotnet#309 since moving to 5.56. Going back to 5.44 seems to solve it. Issue seems to happen when making concurrent graph calls.

@andrueastman
Member

@grippstick @almostjulian Any chance you can confirm whether this issue still occurs after adding the latest version of Microsoft.Graph.Core? https://www.nuget.org/packages/Microsoft.Graph.Core/3.1.14

@andrueastman added the status:waiting-for-author-feedback label and removed the Needs: Attention 👋 label on Aug 5, 2024
@grippstick
Author

grippstick commented Aug 5, 2024 via email

The microsoft-github-policy-service bot added the Needs: Attention 👋 label and removed the status:waiting-for-author-feedback label on Aug 5, 2024
@jav-mut

jav-mut commented Aug 12, 2024

+1 on this, I am experiencing the same issue, and it is solved by reverting the NuGet packages Microsoft.Graph, Microsoft.Graph.Core, and Microsoft.Kiota.* to their previous versions:

| NuGet Package | Version with issue | Previous version without issue |
| --- | --- | --- |
| Microsoft.Graph | 5.56.0 | 5.44.0 |
| Microsoft.Graph.Core | 3.1.15 | 3.1.8 |
| Microsoft.Kiota.Abstractions | 1.11.0 | 1.7.12 |
| Microsoft.Kiota.Authentication.Azure | 1.11.0 | 1.1.4 |
| Microsoft.Kiota.Http.HttpClientLibrary | 1.11.0 | 1.3.7 |
| Microsoft.Kiota.Serialization.Form | 1.11.0 | 1.1.5 |
| Microsoft.Kiota.Serialization.Json | 1.11.0 | 1.1.8 |
| Microsoft.Kiota.Serialization.Multipart | 1.11.0 | 1.1.3 |
| Microsoft.Kiota.Serialization.Text | 1.11.0 | 1.1.4 |

@baywet
Member

baywet commented Aug 12, 2024

Thanks everyone for the information. This is most likely the same root cause as #2563 and it'll be interesting to test once this patch is released to see whether you're still facing the issue.
microsoft/kiota-dotnet#326

@andrueastman
Member

> Thanks everyone for the information. This is most likely the same root cause as #2563 and it'll be interesting to test once this patch is released to see whether you're still facing the issue. microsoft/kiota-dotnet#326

Please confirm if this still occurs if you add a reference to https://www.nuget.org/packages/Microsoft.Kiota.Abstractions/1.11.1

@grippstick
Author

I can confirm that it still occurs with 1.11.1.

@baywet
Member

baywet commented Aug 14, 2024

@grippstick Thank you for the additional information.
We just released 1.11.2, can you check again please?

@grippstick
Author

grippstick commented Aug 14, 2024 via email

@baywet
Member

baywet commented Aug 14, 2024

Thank you for the additional information.
Does that timing also include the network time? If not, this is still pretty slow.
Don't hesitate to provide additional hot path analyses like the ones you provided on the other thread; they were especially helpful!

@grippstick
Author

This is still an issue as of Microsoft.Kiota.Abstractions 1.12.2

I am still trying to replicate it in a consistent manner.

@baywet
Member

baywet commented Aug 15, 2024

Thank you for the additional information.
Besides a repro, are you able to at least get a stack trace so we can look into the potential root cause?

@grippstick
Author

grippstick commented Aug 15, 2024 via email

@almostjulian

Is it possible to get an interim Graph SDK release that takes in the recent Kiota changes?

@baywet
Member

baywet commented Aug 15, 2024

> Is it possible to get an interim Graph SDK release that takes in the recent Kiota changes?

Sure, we'll start by releasing core, and then consume the dependency here so we can release a patch: microsoftgraph/msgraph-sdk-dotnet-core#874

@grippstick
Author

To make sure the network was out of the way, I captured the JSON that came down for a FileAttachment and stored it to disk. The attachment itself is roughly 17 MB in the contentBytes.

I then created a test function to do some benchmarking. When I use Kiota to parse the payload in a loop of 1000, doing garbage collections with each iteration [not shown below], I get an average of ~200ms. When I just do a raw parse with System.Text.Json and access the property directly, I get around 60ms.

[TestMethod]
public void TestParseNode()
{
    var json = System.IO.File.ReadAllText(@"S:\Development\Attach.txt");

    var jDoc = System.Text.Json.JsonDocument.Parse(json);

    var pNode = new Microsoft.Kiota.Serialization.Json.JsonParseNode(
                            jDoc.RootElement,
                            Microsoft.Kiota.Serialization.Json.KiotaJsonSerializationContext.Default
                        );

    var sw = new System.Diagnostics.Stopwatch();

    sw.Restart();
    var attach = pNode.GetObjectValue(Microsoft.Graph.Models.FileAttachment.CreateFromDiscriminatorValue);
    var b1 = attach.ContentBytes;
    sw.Stop();
    var time1 = sw.ElapsedMilliseconds;
    //time1 averages 230ms

    sw.Restart();
    jDoc = System.Text.Json.JsonDocument.Parse(json);
    var b2 = jDoc.RootElement.GetProperty("contentBytes").GetBytesFromBase64();
    sw.Stop();
    var time2 = sw.ElapsedMilliseconds;
    //time2, which includes parsing the document again from scratch, is roughly 60ms.

    Assert.IsTrue(time1 < time2);
}


@baywet
Member

baywet commented Aug 16, 2024

@grippstick
Thank you for the additional information.
I don't suspect this is the stack overflow path, but it's still good to improve the performance.
I'd have suspected that STJ uses Convert.FromBase64String under the hood, but that doesn't seem to be the case. In any case, it's better to use STJ's own APIs, since we'll be closer to "manual deserialization" in terms of behaviour and performance.
I've put together a draft PR, microsoft/kiota-dotnet#337. Would you mind pulling it locally and running the same test you provided, using local project references, please?
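
For context, the difference being discussed comes down to roughly the following (an illustrative sketch, not the actual PR diff):

using System;
using System.Text.Json;

// Illustrative only: two ways to read a base64-encoded property such as
// contentBytes from a JsonElement.
internal static class Base64Decoding
{
    // Round-trips through a managed string before decoding.
    public static byte[] ViaString(JsonElement element) =>
        Convert.FromBase64String(element.GetString()!);

    // Lets System.Text.Json decode the base64 payload directly from its UTF-8 buffer.
    public static byte[] Direct(JsonElement element) =>
        element.GetBytesFromBase64();
}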

@baywet
Member

baywet commented Aug 16, 2024

Actually, I took the time to set up a benchmark.

Current implementation

| Method | Mean | Error | StdDev | Median |
| --- | --- | --- | --- | --- |
| Run | 737.7 us | 55.56 us | 163.8 us | 796.2 us |

My PR

| Method | Mean | Error | StdDev | Median |
| --- | --- | --- | --- | --- |
| Run | 86.52 us | 9.229 us | 26.03 us | 76.67 us |

I'll go ahead and get this PR merged and released.
In the meantime, could you keep tracking down the stack trace / repro for the stack overflow issue?
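
For anyone wanting to reproduce numbers like these, a BenchmarkDotNet harness along these lines should be close (a sketch assuming the same Attach.txt payload as the Stopwatch test above; not the exact benchmark used here):

using System.IO;
using System.Text.Json;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using Microsoft.Kiota.Serialization.Json;

// Sketch: Kiota deserialization of a large FileAttachment payload versus reading
// contentBytes directly with System.Text.Json.
public class AttachmentParseBenchmark
{
    private string _json = string.Empty;

    // Assumed path: the captured attachment JSON from the earlier test.
    [GlobalSetup]
    public void Setup() => _json = File.ReadAllText(@"S:\Development\Attach.txt");

    [Benchmark]
    public byte[]? KiotaParse()
    {
        var doc = JsonDocument.Parse(_json);
        var node = new JsonParseNode(doc.RootElement, KiotaJsonSerializationContext.Default);
        var attachment = node.GetObjectValue(Microsoft.Graph.Models.FileAttachment.CreateFromDiscriminatorValue);
        return attachment?.ContentBytes;
    }

    [Benchmark(Baseline = true)]
    public byte[] RawParse()
    {
        var doc = JsonDocument.Parse(_json);
        return doc.RootElement.GetProperty("contentBytes").GetBytesFromBase64();
    }
}

public static class Program
{
    public static void Main() => BenchmarkRunner.Run<AttachmentParseBenchmark>();
}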

@grippstick
Author

grippstick commented Aug 16, 2024 via email

@grippstick
Author

grippstick commented Aug 16, 2024 via email

@baywet
Member

baywet commented Aug 19, 2024

  1. Yes, it's a common issue; make sure you have the latest version of the .NET 8 SDK installed.
  2. Yes, that's painful. You effectively need to clone all the projects and replace the package references with relative project references all the way down. I have not found a way around that, like npm link would do, for example.

@grippstick
Author

grippstick commented Aug 19, 2024 via email

@grippstick
Author

I am working through a script I created, and while I am trying to analyze a stack trace, can you clarify something?
In the screenshot below, where it says "The 3 frame(s) above this were repeated 2324 times.", is it referring to the frames above it in the picture, or to the frames below it in the picture (which are above it in the stack)?

Is it referring to the ParseNodeProxyFactory or JsonDocument?

[screenshot of the stack trace showing the repeated frames]

@baywet
Member

baywet commented Aug 20, 2024

I'd say the factory; I believe it's the only one that can be recursive. But it also depends on how you are tracing things (single request or multiple, sampling or not, etc.).

@grippstick
Author

grippstick commented Aug 20, 2024 via email

@grippstick
Author

I am pretty confident that I can confirm this to be inside Microsoft.Kiota and not MS Graph.
With that said, I am still not sure it is an actual stack overflow, even though that is what is being reported. I think something might be failing somewhere, which is causing state to get corrupted, and it gets stuck in an infinite loop. I have tests that will reproduce the error within 2400 iterations. Not sure if this is the right channel to send updates, but I wanted to give you a heads up that I am homing in on the actual problem.
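
As a purely hypothetical illustration of how that symptom can arise (simplified stand-in types, not Kiota's actual implementation): if a wrapping factory ever ends up delegating back to itself, directly or through a registry, every call to GetRootParseNodeAsync produces another nested call and the stack grows until the process dies.

using System.IO;
using System.Threading.Tasks;

// Hypothetical stand-in, NOT Kiota's code: a proxy that delegates root-node
// creation to an inner factory, which is normally a concrete parser factory.
internal class ProxyFactory
{
    public ProxyFactory? Inner { get; set; }

    public Task<object> GetRootParseNodeAsync(Stream content) =>
        Inner!.GetRootParseNodeAsync(content);
}

// A mis-wired cycle turns every call into unbounded recursion:
//   var proxy = new ProxyFactory();
//   proxy.Inner = proxy;                        // proxy now delegates to itself
//   await proxy.GetRootParseNodeAsync(stream);  // proxy -> proxy -> ... -> stack overflow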

@baywet
Member

baywet commented Aug 21, 2024

Thank you for the continuous work on this issue! We're looking forward to your findings and getting a reproduction.
For a single request, we should see only a couple (less than 5) calls to GetRootParseNodeAysnc for the different layers. So you're most likely on a promising path already.

@grippstick
Author

I think this will be resolved with this pull request: microsoft/kiota-dotnet#343

If you want to talk through anything, I am more than happy to.

@grippstick
Author

I can confirm that this has stopped the StackOverflow exception. It has also improved performance and memory usage in my long-running process.

@baywet
Member

baywet commented Aug 26, 2024

Thank you for all the information you've provided and the pull requests!

@almostjulian

Any chance we can get a new Graph SDK version with this baked in?

@baywet
Member

baywet commented Aug 27, 2024

This is what #2641 is supposed to do but the CI failed. I just restarted it.
