-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
After Upgrade to 23.3.3 from 23.2.2 about every 20th - 30th request finished with status code 500 #2110
Comments
Hello Roman! It's difficult to determine what happened. Could you please upload the entire solution to GitHub for review? Alternatively, you could attach the following artifacts to this thread:
This information will likely assist in identifying the issue.
You didn't observe the same errors in the Ocelot logs for version 23.2.x, did you? Since you're utilizing Kubernetes, could you provide more details about your K8s server deployment? |
Regarding the exception details...
Are you a C# developer? Can you debug C# code? Do you use the official Ocelot NuGet package? Deploying the Release version of the Ocelot package DLLs means there is no debug information for the called lines in the Call Stack trace. However, we can analyze the code of the Ocelot/src/Ocelot/LoadBalancer/LoadBalancers/RoundRobin.cs Lines 19 to 38 in 8c0180a
It appears the issue may lie here: Ocelot/src/Ocelot/LoadBalancer/LoadBalancers/RoundRobin.cs Lines 32 to 33 in 8c0180a
There's a missing null check for line 33. Theoretically, the service provider could return a null object, which should be accounted for.
To conclude, it seems that for some reason, the |
@antikorol The subsequent step in the investigation involves searching for subsequent messages in Ocelot's log. Ocelot/src/Ocelot.Provider.Kubernetes/KubeServiceBuilder.cs Lines 22 to 33 in 8c0180a
We need to search for the following message in the log:
In {
"Logging": {
"LogLevel": {
"Default": "Debug",
"Microsoft.AspNetCore": "Warning"
}
}
} Deploy and review the log for the message:
If this message appears in the log, it indicates that the Kubernetes setup is correct, and we can proceed to investigate the issue more thoroughly. |
It seems that the requests to the Kubernetes endpoint were failing around the 20th to 30th attempts, as indicated by the code in these lines: Ocelot/src/Ocelot.Provider.Kubernetes/Kube.cs Lines 34 to 36 in 8c0180a
This suggests that the Kubernetes client connectivity might be unstable. It would be advisable to switch from the Kube to the PollKube provider and observe the behavior again. The error should no longer occur if this change is made. Additionally, the PollingInterval should be set to less than half the average duration between the occurrences of the 500 error.
Let's figure out what's happening and possibly provide a hotfix for your user scenario to eliminate the Null Reference Error logging. We need to handle this 500 status case and process it accordingly. |
Below is the log with debug information.
|
Dear Roman, |
Please be aware that any custom logic may compromise stability, and the Ocelot team cannot be held accountable for it. It appears your existing custom logic is only compatible with version 23.2. It would be helpful to examine the service definition in your Kubernetes configuration. Could you provide it, please? |
No, I removed that logic because your changes already cover the case I needed with multiple ports. |
Good! After a quick review:
I wonder that your Ocelot app started at all! Did you read Configuration docs? So, I don't see configuration part in these lines L27-L42 ❗ builder.WebHost
.ConfigureAppConfiguration((context, config) =>
{
var env = context.HostingEnvironment;
config.SetBasePath(env.ContentRootPath)
.AddJsonFile("appsettings.json", true, true)
.AddJsonFile($"appsettings.{env.EnvironmentName}.json", true, true)
.AddOcelot(env) // happy path
.AddEnvironmentVariables();
}) So, it's very important to configure via .ConfigureAppConfiguration((context, config) => config.AddOcelot()) This is minimal required configuring code! But better to use official template from docs. |
Good! You wanna use the feature: Downstream Scheme vs Port Names Also you enabled the port finder via "DownstreamScheme": "http". |
There was update from 18.0.0 to 23.2.2 a long time ago and now to 23.3.3 According to the Microsoft documentation, it is not mandatory to explicitly specify appsettings.json or its variations depending on the environment.
yes, what I should provide? |
I apologize, but I see absolutely no difference from the official code. There's no difference at all; every line is exactly the same. I don't understand what you're referring to.
I apologize, but uncommenting line L53 does not result in any replacement. You need to implement actual service injection using the official helpers: Custom Load Balancers. Have you consulted this documentation? Even if you substitute the services with custom ones, it won't aid in identifying the failing line, as this necessitates the inclusion of Debug version DLLs in your Docker file, as seen here: L21. Therefore, you must build with the debug configuration |
Please provide the Kubernetes service definition in text, JSON format, or the Kubernetes client response JSON. From the log you've shared, it appears you have defined only one service; however, the number of ports in the definition is unclear. Additionally, I'm curious as to why every twentieth Kubernetes client request fails or returns an invalid response, causing the Ocelot provider to throw Lastly, I trust that you have the expertise to debug the provider, identify the issue, and potentially write tests or even submit a pull request with a fix. |
Sorry for the confusion. The files are identical. I didn't want to change the project build type. In the release build, I can see lines in the stack trace in my code. I wanted to understand in which exact line the error occurred so that I could add logs later. Surprisingly, the error didn't reproduce when I used just a copy |
I am awaiting a PR that will resolve the issue... |
@raman-m
This class is registered as a singleton for each ServiceProviderConfiguration, and the class that resolves it, RoundRobinCreator is also registered as a singleton. At the same time, we use the Kube.GetAsync method call in async code, where we can easily encounter a "Collection was modified" exception. Maybe somewhere deep in the List.Clear or elsewhere, due to resize or clear array, we might lost references and encounter an "Object reference not set to an instance..." exception.What do you think about this part of code? |
Look at the actual registration method Ocelot/src/Ocelot.Provider.Kubernetes/OcelotBuilderExtensions.cs Lines 9 to 17 in 8c0180a
So, the Kube class is not registered at all. Only KubernetesProviderFactory.Get delegate is registered:
Kube is instantiated on every request. The logic is quite complicated. And I think Kube is scoped service.Also, IKubeApiClient is scoped registered service. The problem could be bad/empty response from IKubeApiClient
The issue seems to be with the load balancing logic, which involves a singleton What's happening in my opinion. I believe the problem is on the line 33
It must be rewritten! |
You are right that problem at this row. But the main problem in concurrent issue that I described before in the class Kube because the collection of services is declared at the class field level.
I spent a lot of time for confirming the issue, I hope it will be enough. ![]() The second element of list is null I can see it in class RoundRoubin ![]() Example of logs:
If the logs aren't entirely clear, I created a code sample that shows if the list is modified from different threads, we can get null where it shouldn't be. In the sample, I always add one element to the list, but in different threads, and I end up with the list size being more than 1 and the second element is null. And we don't get an exeption that the collection was modified. |
@antikorol Roman, If you open a fix PR, I will prepare an additional hotfix release. However, I'm concerned that we need to refactor the FYI, I've included this bug issue in the Summer'24 milestone. This represents the least favorable delivery scenario. I am optimistic that we will manage to deliver the fix within a week or two, or alternatively, we can assign the issue to the Annual'23 release which is early bird aka current release. |
TODO
|
A quick fix is to move the creation of the List from the class fields to a method. |
Would you be interested in contributing by submitting a PR? If so, please fork the repository into your account. |
Hello, Raman @raman-m |
Expected Behavior / New Feature
Resolving a service address to route an API request
Actual Behavior / Motivation for New Feature
Sometimes, about every 20th - 30th request finished with status code 500
Steps to Reproduce the Problem
Specifications
The text was updated successfully, but these errors were encountered: