Azure cloud database connection resiliency #2477

Open
trancefreak77 opened this issue Dec 6, 2024 · 1 comment

Comments

@trancefreak77

We are using Hangfire in a Linux Docker container running in Azure, where we frequently experience database connection issues. Example exception:
A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 35 - An internal exception was caught) at Microsoft.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)

We get these exceptions both when our own code accesses the database and when Hangfire accesses it.
After some research I came across the following GitHub issues:
dotnet/SqlClient#2103 (comment)
dotnet/SqlClient#1773

There seems to be a bug in the Linux version of Microsoft's SqlClient: it does not pass on the correct error number when creating the SqlException, so the error number is always 0 in these resiliency cases.
The Windows version of SqlClient does not have this bug, so EF Core can recognize these failures and reconnect and retry the database operation. On Linux it can't, because the error number is missing.

We were able to implement a custom execution strategy to handle these resiliency problems as shown here:
dotnet/SqlClient#2103 (comment)
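
For readers who don't want to follow the link, here is a minimal, dependency-free sketch of the general idea. It is not the code from the linked comment. Because the error number is 0, the sketch assumes a failure can be classified by the message fragment "transport-level error" from the exception quoted above. The names `TransientSqlErrors`, `LooksTransient`, and `ExecuteWithRetry` are hypothetical. In an EF Core setup, a check like this would typically be called from an override of `ShouldRetryOn` in a class derived from `SqlServerRetryingExecutionStrategy`.

```csharp
using System;
using System.Threading;

// Hypothetical helper, not part of SqlClient, EF Core, or Hangfire.
public static class TransientSqlErrors
{
    // Assumption: this message fragment (from the exception quoted in this issue)
    // identifies the failure, since the error number is reported as 0 on Linux.
    private const string TransportLevelMarker = "transport-level error";

    // Walks the exception chain and reports whether any message contains the marker.
    public static bool LooksTransient(Exception exception)
    {
        for (var current = exception; current != null; current = current.InnerException)
        {
            if (current.Message.Contains(TransportLevelMarker, StringComparison.OrdinalIgnoreCase))
            {
                return true;
            }
        }
        return false;
    }

    // Runs the operation, retrying up to maxRetries times when the failure looks transient.
    // Non-transient failures and the final failed attempt propagate to the caller.
    public static T ExecuteWithRetry<T>(Func<T> operation, int maxRetries, TimeSpan delay)
    {
        for (int attempt = 0; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (Exception ex) when (attempt < maxRetries && LooksTransient(ex))
            {
                Thread.Sleep(delay);
            }
        }
    }
}
```

Matching on message text is fragile (messages can change or be localized), so this is only a workaround for the missing error number, not a replacement for number-based detection.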

Have you been made aware of these resiliency issues under Linux before, or could you implement something similar in your code base?
The issue we currently face is that our own code no longer throws these exceptions, but when Hangfire accesses its database it still fails with the error message mentioned above.

So my question is if you could implement such a custom resiliency strategy in your code?

Thanks,
Christian

@odinserj
Member

odinserj commented Dec 9, 2024

Hangfire uses retries by default in background processing; for client methods, however, retries aren't enabled by default. In a modern .NET application, you can register the IBackgroundJobClient service with retries enabled in the following way:

services.AddSingleton<IBackgroundJobClient>(
    provider => new BackgroundJobClient(provider.GetService<JobStorage>())
    {
        RetryAttempts = 3
    });

It will retry on any exception that occurs, and will first check whether the particular job already exists (that's useful on timeout exceptions).
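
The "check whether the job exists first" behavior can be illustrated with a small conceptual sketch. This is not Hangfire's implementation: `IdempotentCreateSketch`, `CreateWithRetries`, and the `exists`/`insert` delegates are hypothetical stand-ins for storage calls. It only shows why the existence check matters when a timeout hides a write that actually succeeded.

```csharp
using System;

// Conceptual illustration only; not Hangfire code.
public static class IdempotentCreateSketch
{
    // Tries to insert a job, retrying on any exception up to retryAttempts times.
    // Before each retry it checks whether the job already exists, so a write that
    // succeeded but reported a failure (e.g. a timeout) is not duplicated.
    // Returns the number of retries that were needed.
    public static int CreateWithRetries(
        string jobId,
        Func<string, bool> exists,   // hypothetical "does this job exist?" storage call
        Action<string> insert,       // hypothetical "create this job" storage call
        int retryAttempts)
    {
        for (int attempt = 0; ; attempt++)
        {
            try
            {
                if (attempt > 0 && exists(jobId))
                {
                    return attempt; // an earlier attempt already created the job
                }

                insert(jobId);
                return attempt;
            }
            catch (Exception) when (attempt < retryAttempts)
            {
                // Swallow the failure and try again; the last failure propagates.
            }
        }
    }
}
```

In this sketch, if `insert` stores the job and then throws a timeout, the next attempt finds the job through `exists` and returns without inserting it a second time.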
