Mission: Circuit Breaker Istio
Group | Owner |
---|---|
Circuit Breaker | Christophe Laprun |
ID | Short Name |
---|---|
 | |
This mission showcases how Istio can be used to implement the Circuit Breaker architectural pattern. Deploying the application in an Istio service mesh allows some cross-cutting, application-wide concerns to be moved outside of the application itself, for example applying the Circuit Breaker pattern to protect some of the services that make up the application. We demonstrate this by showing how the Istio circuit breaker can be activated using only configuration files, without modifying the application.
See the Circuit Breaker Mission wiki page for an in-depth description of the user problem and how it was implemented with Hystrix.
Circuit breaking is a useful way to protect sensitive services in a microservice application against cascading failure. However, implementing such support traditionally meant injecting a library (and supporting code) into your application, thus diluting the core functionality with operational concerns. Service meshes, and Istio more specifically, offer the option of moving such operational concerns outside of the application itself. It is therefore interesting to see how Istio’s approach to circuit breaking compares with the original approach based on Netflix’s Hystrix library.
- Istio
- Circuit Breaker
- Application Routing
- Runtime fallbacks
- Fault tolerance
- Disaster recovery
- High availability

- The Circuit Breaker booster application is deployed.
- Istio is configured on the target cluster.
- Initially, no Istio rule is applied to the application.
We originally intended to compare Istio’s and Hystrix’s Circuit Breakers using the same application. This turned out not to be easily feasible, because Istio’s approach to circuit breaking is targeted more towards larger deployments and works in a fuzzier way than Hystrix. Hystrix can easily protect a single instance of a service and can completely interrupt all calls to that instance. Istio’s Circuit Breaker is concerned more with managing the overall health of a pool of service instances and works closely with the load balancer. Thus, the approach taken with the original Circuit Breaker mission doesn’t work with Istio, because the mesh goes into panic mode when all instances of the target service (only one, in the original scenario) become unhealthy. See Christian Posta’s comparison between the Netflix Hystrix OSS Circuit Breaker and Istio Circuit Breaker implementations for more details.
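The panic-mode behavior described above can be illustrated with a small model. This is only a sketch of the idea, not Istio's or Envoy's actual code: the function name `routable_hosts`, the instance names, and the 0.5 threshold are assumptions chosen for illustration.

```python
def routable_hosts(hosts, ejected, panic_threshold=0.5):
    """Return the hosts a load balancer would consider for the next request.

    hosts: all instances of the target service.
    ejected: instances currently marked unhealthy by the circuit breaker.
    panic_threshold: assumed minimum healthy fraction before panic mode.
    """
    healthy = [h for h in hosts if h not in ejected]
    healthy_fraction = len(healthy) / len(hosts)
    if healthy_fraction < panic_threshold:
        # Panic mode: too few healthy hosts, so health status is ignored.
        return list(hosts)
    return healthy


# Larger pool: the unhealthy instance is simply skipped.
print(routable_hosts(["name-1", "name-2", "name-3"], {"name-2"}))
# Single instance (the original mission's scenario): it stays routable.
print(routable_hosts(["name-1"], {"name-1"}))
```

In this model, a lone unhealthy instance keeps receiving traffic, which is why purposefully failing the single `name` instance cannot demonstrate the circuit breaker.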
We therefore decided to modify the application that was used to demonstrate Hystrix’s circuit breaker so that Istio’s circuit breaker is easier to demonstrate. The application still invokes the `greeting` service, which in turn calls the `name` service (one call to `greeting` equates to one call to `name`). The user journey is, however, slightly different: triggering the circuit breaker now involves generating load on the `name` service (via the `greeting` service) as opposed to purposefully failing it. The idea is then to demonstrate how enabling or changing Istio’s Circuit Breaker configuration impacts the application without having to modify or redeploy it.
To this end, the user interface is now configured to trigger up to 20 concurrent calls to the `name` service. The user can change the number of concurrent calls, from 1 to 20, using an input field; the default is 10.
Moreover, to make observation easier, the application keeps calling `greeting` at the configured concurrency level until instructed to stop. This behavior is controlled by 2 buttons: one to start calling `greeting` and another one to stop. Additionally, the call result output now shows which "thread" performed each concurrent call, further highlighted by a distinct background color for each "thread". Simple statistics have also been added: the number of calls that went through, the number of failed calls (i.e. blocked by the circuit breaker), and their ratio. Finally, since the `name` service responds quite quickly, a checkbox has been added to let the user artificially slow it down in order to increase concurrency (if the `name` service replies too fast, contention for it is reduced, which limits the need for the circuit breaker to trip).
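The load generation and statistics described above can be sketched as follows. This is an illustrative model rather than the booster's actual UI code: `clamp_concurrency`, `CallStats`, `run_batch`, and the `call_greeting` callback are hypothetical names, and the callback stands in for the real HTTP call to `greeting`. The real UI repeats batches until the stop button is pressed; the sketch runs a single batch.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


def clamp_concurrency(requested, lowest=1, highest=20):
    """Keep the requested concurrency inside the range the UI allows."""
    return max(lowest, min(highest, requested))


@dataclass
class CallStats:
    succeeded: int = 0
    failed: int = 0

    def record(self, ok):
        if ok:
            self.succeeded += 1
        else:
            self.failed += 1

    @property
    def failure_ratio(self):
        total = self.succeeded + self.failed
        return self.failed / total if total else 0.0


def run_batch(call_greeting, concurrency=10, stats=None):
    """Fire one batch of concurrent calls and tally the outcomes."""
    stats = stats if stats is not None else CallStats()
    workers = clamp_concurrency(concurrency)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker index stands in for one UI "thread".
        for ok in pool.map(call_greeting, range(workers)):
            stats.record(ok)
    return stats


# Stand-in for the real call: pretend even-numbered workers are rejected.
demo = run_batch(lambda worker: worker % 2 == 1, concurrency=50)
print(demo.succeeded, demo.failed, demo.failure_ratio)
```

Outcomes are recorded on the calling thread while iterating over the results, so the counters need no locking.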
We removed the following bits, which are no longer useful or effective in this new scenario:
* UI and API to fail the `name` service,
* display of the circuit breaker state, since Istio currently doesn’t provide a definitive way to query it and inferring it from the returned error code proved unreliable.
We now detail the steps in the user journey below. More details on each step can be found in the booster’s README.
Clicking the `Start` button starts concurrent invocations of the `name` service. All requests should go through as expected.
The initial configuration allows 100 concurrent connections to the name service. Since the UI restricts the number of concurrent calls to 20, there should be no change in behavior.
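As a rough illustration of what such a configuration might contain, the sketch below builds a manifest shaped like an Istio `DestinationRule` and prints it as JSON. The booster's actual rule files may differ, and the Istio API in use when this mission was written may not match. The rule name `name-circuit-breaker`, the host `name`, and the pending-request limit are assumptions; only the limit of 100 connections comes from the text above.

```python
import json


def name_service_rule(max_connections):
    """Build a DestinationRule-shaped manifest limiting connections to `name`."""
    return {
        "apiVersion": "networking.istio.io/v1alpha3",
        "kind": "DestinationRule",
        "metadata": {"name": "name-circuit-breaker"},  # hypothetical rule name
        "spec": {
            "host": "name",  # assumed Kubernetes service name
            "trafficPolicy": {
                "connectionPool": {
                    "tcp": {"maxConnections": max_connections},
                    # Assumption: queue at most one request beyond the limit,
                    # so excess calls are rejected instead of waiting.
                    "http": {"http1MaxPendingRequests": 1},
                }
            },
        },
    }


# Initial, permissive configuration: 100 concurrent connections.
print(json.dumps(name_service_rule(100), indent=2))
```

With a limit of 100 and at most 20 concurrent calls from the UI, a rule of this shape would never reject a request, which matches the expected absence of any change in behavior.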
The operator changes the Istio Circuit Breaker configuration so that it now allows only 1 concurrent connection. Instead of seeing all the calls go through, we now observe fallback messages, which should be highlighted in an obvious way. Additionally, statistics about the number of successful vs. failed calls should be computed and displayed to illustrate the "statistical" nature of the Istio Circuit Breaker. Setting the concurrency level to 1 should show that all calls go through as expected, while increasing the concurrency level should show a growing number of failed requests.
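The effect of the connection limit can be modeled with a small simulation. This is a simplified sketch and not how the mesh is implemented: `simulate_burst` is a hypothetical helper that forces every call in a burst to overlap exactly, so it shows the worst case. In the real application calls overlap only partially, so the observed failure ratio varies from run to run, hence the "statistical" nature.

```python
import threading


def simulate_burst(concurrency, max_connections):
    """Return (succeeded, rejected) for one burst of fully overlapping calls."""
    slots = threading.BoundedSemaphore(max_connections)
    all_arrived = threading.Barrier(concurrency)
    all_decided = threading.Barrier(concurrency)
    outcomes = []
    lock = threading.Lock()

    def call():
        all_arrived.wait()  # every call starts at the same moment
        admitted = slots.acquire(blocking=False)  # reject instead of queueing
        all_decided.wait()  # hold the slot until every call has tried
        if admitted:
            slots.release()
        with lock:
            outcomes.append(admitted)

    threads = [threading.Thread(target=call) for _ in range(concurrency)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    succeeded = sum(outcomes)
    return succeeded, concurrency - succeeded


print(simulate_burst(1, 1))     # concurrency 1, limit 1
print(simulate_burst(10, 1))    # default concurrency, limit 1
print(simulate_burst(20, 100))  # initial limit of 100
```

In this model a single call always fits within a limit of 1, while at higher concurrency only one call per burst is admitted and the rest are rejected.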