Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#2080 Add new V8 params to fine-tune Polly's circuit-breaker behavior #2081

Open
wants to merge 7 commits into
base: develop
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 6 additions & 1 deletion docs/features/qualityofservice.rst
Original file line number Diff line number Diff line change
Expand Up @@ -35,13 +35,18 @@ Then add the following section to a Route configuration:
"QoSOptions": {
"ExceptionsAllowedBeforeBreaking": 3,
"DurationOfBreak": 1000,
"TimeoutValue": 5000
"TimeoutValue": 5000,
"FailureRatio": .8 // .8 = 80% of failed, this is default value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No comma❗

Suggested change
"FailureRatio": .8 // .8 = 80% of failed, this is default value
"FailureRatio": .8, // .8 = 80% of failed, this is default value

"SamplingDuration": 10000 // The time period over which the failure-success ratio is calculated (in milliseconds), default is 10000 (10s)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Better to add new properties docs below, after line 45.
  2. You are wrong regarding default value! According to the docs for SamplingDuration, the default value is 30 seconds! So, we override library value by our custom one (10s). I believe we must assign lib value in QoSOptions

}

- You must set a number equal or greater than ``2`` against ``ExceptionsAllowedBeforeBreaking`` for this rule to be implemented. [#f2]_
- ``DurationOfBreak`` means the circuit breaker will stay open for 1 second after it is tripped.
- ``TimeoutValue`` means if a request takes more than 5 seconds, it will automatically be timed out.

| Please note: if you use the Circuit-Breaker, Ocelot checks that the parameters are correct during execution. If not, it throws an exception.
| For a complete explanation about Circuit-Breaker strategies and mechanisms, consult Polly documentation here <https://www.pollydocs.org/strategies/circuit-breaker>

.. _qos-circuit-breaker-strategy:

Circuit Breaker strategy
Expand Down
10 changes: 8 additions & 2 deletions src/Ocelot.Provider.Polly/PollyQoSResiliencePipelineProvider.cs
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@
using Polly.CircuitBreaker;
using Polly.Registry;
using Polly.Timeout;
using System;
using System.Net;

namespace Ocelot.Provider.Polly;
Expand Down Expand Up @@ -53,6 +54,11 @@ public ResiliencePipeline<HttpResponseMessage> GetResiliencePipeline(DownstreamR
return ResiliencePipeline<HttpResponseMessage>.Empty; // shortcut -> No QoS
}

if (!options.IsValid())
{
throw new ArgumentException("QoS options are invalid.");
}

return _registry.GetOrAddPipeline<HttpResponseMessage>(
key: new OcelotResiliencePipelineKey(GetRouteName(route)),
configure: (builder) => ConfigureStrategies(builder, route));
Expand All @@ -76,8 +82,8 @@ protected virtual ResiliencePipelineBuilder<HttpResponseMessage> ConfigureCircui
var info = $"Circuit Breaker for the route: {GetRouteName(route)}: ";
var strategyOptions = new CircuitBreakerStrategyOptions<HttpResponseMessage>
{
FailureRatio = 0.8,
SamplingDuration = TimeSpan.FromSeconds(10),
FailureRatio = options.FailureRatio,
SamplingDuration = TimeSpan.FromMilliseconds(options.SamplingDuration),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both SamplingDuration and BreakDuration are TimeSpans, so they can be initialized from milliseconds.
Not an issue! FYI

MinimumThroughput = options.ExceptionsAllowedBeforeBreaking,
BreakDuration = options.DurationOfBreak > QoSOptions.LowBreakDuration
? TimeSpan.FromMilliseconds(options.DurationOfBreak)
Expand Down
2 changes: 2 additions & 0 deletions src/Ocelot/Configuration/File/FileQoSOptions.cs
Original file line number Diff line number Diff line change
Expand Up @@ -33,5 +33,7 @@ public FileQoSOptions(QoSOptions from)
public int DurationOfBreak { get; set; }
public int ExceptionsAllowedBeforeBreaking { get; set; }
public int TimeoutValue { get; set; }
public double FailureRatio { get; set; }
public int SamplingDuration { get; set; }
}
}
66 changes: 56 additions & 10 deletions src/Ocelot/Configuration/QoSOptions.cs
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@ public QoSOptions(FileQoSOptions from)
public QoSOptions(
int exceptionsAllowedBeforeBreaking,
int durationOfBreak,
int timeoutValue,
int timeoutValue,
string key)
{
DurationOfBreak = durationOfBreak;
Expand All @@ -32,37 +32,83 @@ public QoSOptions(
TimeoutValue = timeoutValue;
}

public QoSOptions(
int exceptionsAllowedBeforeBreaking,
int durationOfBreak,
double failureRatio,
int timeoutValue,
string key)
{
DurationOfBreak = durationOfBreak;
ExceptionsAllowedBeforeBreaking = exceptionsAllowedBeforeBreaking;
Key = key;
TimeoutValue = timeoutValue;
FailureRatio = failureRatio;
}

public QoSOptions(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you add both constructors? I don't see any changes in the Creators' logic.
It appears these constructors were added solely for testing purposes. I would suggest switching to QoSOptionsBuilder and removing these two constructors.
Therefore, you will need to update QoSOptionsBuilder by adding new initialization methods.

int exceptionsAllowedBeforeBreaking,
int durationOfBreak,
double failureRatio,
int samplingDuration,
int timeoutValue,
string key)
{
DurationOfBreak = durationOfBreak;
ExceptionsAllowedBeforeBreaking = exceptionsAllowedBeforeBreaking;
Key = key;
TimeoutValue = timeoutValue;
FailureRatio = failureRatio;
SamplingDuration = samplingDuration;
}

/// <summary>How long the circuit should stay open before resetting in milliseconds.</summary>
/// <remarks>If using Polly version 8 or above, this value must be 500 (0.5 sec) or greater.</remarks>
/// <value>An <see cref="int"/> value (milliseconds).</value>
public int DurationOfBreak { get; } = DefaultBreakDuration;
public int DurationOfBreak { get; }

public const int LowBreakDuration = 500; // 0.5 seconds
public const int DefaultBreakDuration = 5_000; // 5 seconds

/// <summary>
/// How many times a circuit can fail before being set to open.
/// </summary>
/// <remarks>
/// If using Polly version 8 or above, this value must be 2 or greater.
/// if set to 0, the circuit-breaker is deactivated
/// otherwise this value must be 2 or greater.
/// </remarks>
/// <value>
/// An <see cref="int"/> value (no of exceptions).
/// </value>
public int ExceptionsAllowedBeforeBreaking { get; }

public string Key { get; }
/// <summary>
/// The failure-success ratio that will cause the circuit to break/open.
/// </summary>
/// <value>
/// An <see cref="double"/> ratio of exceptions/requests (0.8 means 80% failed of all sampled executions).
/// </value>
public double FailureRatio { get; } = .8;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually FailureRatio has the default value is 0.1
So, I don't recommend to override library settings!


/// <summary>
/// Value for TimeoutStrategy in milliseconds.
/// The time period over which the failure-success ratio is calculated (in milliseconds).
/// </summary>
/// <remarks>
/// If using Polly version 8 or above, this value must be 1000 (1 sec) or greater.
/// </remarks>
/// <value>
/// An <see cref="int"/> value (milliseconds).
/// An <see cref="int"/> Time period in milliseconds.
/// </value>
public int SamplingDuration { get; } = 10000;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree to assign custom default value which differs from the lib one!
So, according to the docs for SamplingDuration, the default value is 30 seconds! So, we override library value by our custom one (10s). I believe we must assign lib value in QoSOptions


public string Key { get; }

/// <summary>Value for TimeoutStrategy in milliseconds.</summary>
/// <value>An <see cref="int"/> value (milliseconds).</value>
public int TimeoutValue { get; }

public bool UseQos => ExceptionsAllowedBeforeBreaking > 0 || TimeoutValue > 0;
public bool UseQos => ExceptionsAllowedBeforeBreaking >= 2 || TimeoutValue > 0;

public bool IsValid() =>
ExceptionsAllowedBeforeBreaking <= 0 ||
ExceptionsAllowedBeforeBreaking >= 2 && DurationOfBreak > 0 && !(FailureRatio <= 0) &&
!(FailureRatio > 1) && SamplingDuration > 0;
Comment on lines +109 to +112
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it used for JSON config validation?

}
}
8 changes: 4 additions & 4 deletions test/Ocelot.AcceptanceTests/PollyQoSTests.cs
Original file line number Diff line number Diff line change
Expand Up @@ -38,7 +38,7 @@ public override void Dispose()
public void Should_not_timeout()
{
var port = PortFinder.GetRandomPort();
var route = GivenRoute(port, new QoSOptions(10, 500, 1000, null), HttpMethods.Post);
var route = GivenRoute(port, new QoSOptions(10, 500, .5, 5, 1000, null), HttpMethods.Post);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test readability is low❗

We have to switch to QoSOptionsBuilder in tests avoiding direct usage of constructors!
With QoSOptionsBuilder test readability will be excellent. 😉

var configuration = GivenConfiguration(route);

this.Given(x => x.GivenThereIsAServiceRunningOn(port, HttpStatusCode.OK, string.Empty, 10))
Expand Down Expand Up @@ -91,7 +91,7 @@ public void Should_open_circuit_breaker_for_DefaultBreakDuration()
{
int invalidDuration = QoSOptions.LowBreakDuration; // valid value must be >500ms, exact 500ms is invalid
var port = PortFinder.GetRandomPort();
var route = GivenRoute(port, new QoSOptions(2, invalidDuration, 100000, null));
var route = GivenRoute(port, new QoSOptions(2, invalidDuration, .005,1,100000, null));
var configuration = GivenConfiguration(route);

this.Given(x => x.GivenThereIsABrokenServiceRunningOn(port, HttpStatusCode.InternalServerError))
Expand All @@ -101,7 +101,7 @@ public void Should_open_circuit_breaker_for_DefaultBreakDuration()
.Then(x => ThenTheStatusCodeShouldBe(HttpStatusCode.InternalServerError))
.And(x => WhenIGetUrlOnTheApiGateway("/"))
.Then(x => ThenTheStatusCodeShouldBe(HttpStatusCode.InternalServerError))
.When(x => WhenIGetUrlOnTheApiGateway("/")) // opened
.When(x => WhenIGetUrlOnTheApiGateway("/")) // opened
.Then(x => ThenTheStatusCodeShouldBe(HttpStatusCode.ServiceUnavailable)) // Polly status
.Given(x => GivenIWaitMilliseconds(QoSOptions.DefaultBreakDuration - 500)) // BreakDuration is not elapsed
.When(x => WhenIGetUrlOnTheApiGateway("/")) // still opened
Expand Down Expand Up @@ -186,7 +186,7 @@ public void Should_timeout_per_default_after_90_seconds()
var port = PortFinder.GetRandomPort();
var route = GivenRoute(port, new QoSOptions(new FileQoSOptions()), HttpMethods.Get);
var configuration = GivenConfiguration(route);

this.Given(x => x.GivenThereIsAServiceRunningOn(port, HttpStatusCode.Created, string.Empty, 3500)) // 3.5s > 3s -> ServiceUnavailable
.And(x => GivenThereIsAConfiguration(configuration))
.And(x => GivenOcelotIsRunningWithPolly())
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -61,7 +61,6 @@ public void ShouldNotBuild_ReturnedEmpty()

[Theory]
[Trait("Bug", "2085")]
[InlineData(0, QoSOptions.DefaultBreakDuration)] // default
[InlineData(QoSOptions.LowBreakDuration - 1, QoSOptions.DefaultBreakDuration)] // default
[InlineData(QoSOptions.LowBreakDuration, QoSOptions.DefaultBreakDuration)] // default
[InlineData(QoSOptions.LowBreakDuration + 1, QoSOptions.LowBreakDuration + 1)] // not default, exact
Expand Down