This repository has been archived by the owner on May 29, 2024. It is now read-only.

Mission: Circuit Breaker Istio

Andrew Lee Rubinger edited this page Apr 26, 2018 · 8 revisions

Istio Circuit Breaker Mission

Group: Circuit Breaker

Owner: Christophe Laprun

ID: 110

Short Name: istio-circuit-breaker

Description

This mission showcases how Istio can be used to implement the Circuit Breaker architectural pattern. Deploying the application in an Istio service mesh allows some cross-cutting, application-wide concerns to be moved outside of the application itself, such as introducing the Circuit Breaker pattern to protect some of the services that comprise the application. We will demonstrate this by showing how the Istio circuit breaker can be activated using only configuration files, without modifying the application.

User Problem

See the Circuit Breaker Mission wiki page for an in-depth description of the user problem and how it was addressed with Hystrix.

Circuit breaking is a useful way to protect sensitive services in a microservice application against cascading failure. However, implementing such support traditionally meant injecting a library (and supporting code) into your application, thus diluting the core functionality with operational concerns. Service meshes, and Istio more specifically, offer the option of moving such operational concerns outside of the application itself. It is therefore interesting to see how Istio’s approach to circuit breaking compares with the original approach leveraging Netflix’s Hystrix library.

Concepts and Architectural Patterns

  • Istio

  • Circuit Breaker

  • Application Routing

  • Runtime fallbacks

  • Fault tolerance

  • Disaster recovery

  • High availability

Prerequisites

  • The Circuit Breaker booster application is deployed.

  • Istio is configured on the target cluster.

  • Initially, no Istio rule is applied to the application.

Use Case

While we originally intended to compare Istio’s and Hystrix’s circuit breakers using the same application, this turned out not to be easily feasible because Istio’s approach to circuit breaking targets larger deployments and works in a fuzzier way than Hystrix. Hystrix can easily protect a single instance of a service and completely interrupt all calls to it, whereas Istio’s Circuit Breaker is concerned more with managing the overall health of a pool of service instances and works closely with the load balancer. Thus, the approach taken with the original Circuit Breaker mission doesn’t work with Istio because the mesh goes into panic mode when all instances of the target service (only one, in the original scenario) become unhealthy. See Christian Posta’s comparison between the Netflix Hystrix OSS Circuit Breaker and Istio Circuit Breaker implementations for more details.

We therefore decided to modify the application that was used to demonstrate Hystrix’s circuit breaker so that Istio’s circuit breaker is easier to demonstrate. The application still invokes the greeting service, which in turn calls the name service (one call to greeting equates to one call to name). The user journey is slightly different, however, since triggering the circuit breaker now involves generating load on the name service (via the greeting service) as opposed to purposefully failing it. The idea is then to demonstrate how enabling or changing Istio’s Circuit Breaker configuration impacts the application without having to modify or redeploy it.

To this end, the user interface is now configured to trigger up to 20 concurrent calls to the name service. The user can change the number of concurrent calls through an input field that accepts values from 1 to 20, with 10 as the default.

Moreover, to make observation easier, the application keeps calling greeting at the configured concurrency level until instructed to stop. This behavior is controlled by 2 buttons: one to start calling greeting and another one to stop. Additionally, the call result output has been changed to show which "thread" performed each concurrent call, further highlighted by a colored background for each "thread". Simple statistics have also been added: the number of calls that went through, the number of failed calls (i.e. blocked by the circuit breaker), and their ratio. Finally, since the name service responds quite quickly to calls, a checkbox has been added to let the user artificially slow it down in order to increase concurrency (if the name service replies too fast, contention for it is reduced, which limits the need for the circuit breaker to trip).
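The bookkeeping described above can be sketched roughly as follows. This is a minimal illustration, not the booster's actual code: the names `clamp_concurrency` and `CallStats` are hypothetical, and the ratio is assumed to be failed calls over total calls, which the mission does not state explicitly.

```python
from dataclasses import dataclass

# Bounds and default taken from the mission text; the constant names are hypothetical.
MIN_CONCURRENCY = 1
MAX_CONCURRENCY = 20
DEFAULT_CONCURRENCY = 10


def clamp_concurrency(requested=None):
    """Return a concurrency level inside the range the UI accepts."""
    if requested is None:
        return DEFAULT_CONCURRENCY
    return max(MIN_CONCURRENCY, min(MAX_CONCURRENCY, int(requested)))


@dataclass
class CallStats:
    """Running totals of calls that went through and calls that were blocked."""
    succeeded: int = 0
    failed: int = 0

    def record(self, ok: bool) -> None:
        # A call counts as failed when a fallback was returned instead of a name.
        if ok:
            self.succeeded += 1
        else:
            self.failed += 1

    @property
    def total(self) -> int:
        return self.succeeded + self.failed

    @property
    def failure_ratio(self) -> float:
        # Assumption: ratio means failed / total; 0.0 before any call is recorded.
        return self.failed / self.total if self.total else 0.0
```

A caller would invoke `record(True)` for each greeting that came back normally and `record(False)` for each fallback, then display `succeeded`, `failed` and `failure_ratio`.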

We removed the following bits that are not useful/effective anymore with this new scenario:

  • UI and API to fail the name service

  • Display of the circuit breaker state, since Istio currently doesn’t provide a definitive way to query it and inferring it from the returned error code proved unreliable

We now detail the steps in the user journey below. More details on each step can be found in the booster’s README.

No Istio configuration: normal interaction with the booster

Clicking the Start button starts concurrent invocations of the name service. All requests should go through as expected.

Initial Istio configuration: normal interaction with the booster

The initial configuration allows 100 concurrent connections to the name service. Since the UI restricts the number of concurrent calls to 20, there should be no change in behavior.
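As an illustration of what such a configuration might contain, the sketch below builds a connection pool limit as a Python dictionary and prints it as JSON. It assumes the `networking.istio.io/v1alpha3` `DestinationRule` API; the mission does not say which Istio API version or resource the booster uses. The resource name, the host, and the choice to set `http1MaxPendingRequests` to the same value as `maxConnections` are hypothetical.

```python
import json


def name_destination_rule(max_connections: int) -> dict:
    """Build a DestinationRule that caps concurrent connections to the name service."""
    return {
        "apiVersion": "networking.istio.io/v1alpha3",
        "kind": "DestinationRule",
        "metadata": {"name": "name-circuit-breaker"},  # hypothetical resource name
        "spec": {
            "host": "name-service",  # hypothetical host of the name service
            "trafficPolicy": {
                "connectionPool": {
                    # Upper bound on connections opened to the destination.
                    "tcp": {"maxConnections": max_connections},
                    "http": {
                        # Assumption: requests queued beyond the pool are capped too.
                        "http1MaxPendingRequests": max_connections,
                        # One request per connection, so requests map to connections.
                        "maxRequestsPerConnection": 1,
                    },
                },
            },
        },
    }


# The initial, permissive configuration described above: 100 concurrent connections.
initial_rule = name_destination_rule(100)
print(json.dumps(initial_rule, indent=2))
```

Since the UI never issues more than 20 concurrent calls, a limit of 100 is never reached, which is why no change in behavior is expected at this step.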

Restrictive Istio configuration

The operator changes the Istio Circuit Breaker configuration so that it now allows only 1 concurrent connection. Instead of seeing all the calls go through, we now observe fallback messages, which should be highlighted in an obvious way. Additionally, statistics about the number of successful vs. failed calls should be computed and displayed to explain the "statistical" nature of the Istio Circuit Breaker. Setting the concurrency level to 1 should show that all calls go through as expected. Increasing the concurrency level should show an elevated number of requests failing.
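The expected effect of the limit can be illustrated with a toy model. This is a deliberately simplified sketch, not how Istio or Envoy actually decide: it assumes every call in a burst arrives before any earlier call finishes, and that a call finding no free connection fails immediately. `ConnectionLimiter` and `burst` are hypothetical names.

```python
class ConnectionLimiter:
    """Toy model of a connection pool with a fixed upper bound."""

    def __init__(self, max_connections: int) -> None:
        self.max_connections = max_connections
        self.in_flight = 0

    def try_acquire(self) -> bool:
        # Reject the call when every connection is already in use.
        if self.in_flight >= self.max_connections:
            return False
        self.in_flight += 1
        return True

    def release(self) -> None:
        self.in_flight -= 1


def burst(limiter: ConnectionLimiter, concurrency: int) -> list:
    """Simulate `concurrency` fully overlapping calls; True means the call went through."""
    # All calls attempt to get a connection before any of them completes.
    outcomes = [limiter.try_acquire() for _ in range(concurrency)]
    # Completed calls then hand their connections back.
    for ok in outcomes:
        if ok:
            limiter.release()
    return outcomes


print(burst(ConnectionLimiter(1), 1))    # [True]
print(burst(ConnectionLimiter(1), 5))    # [True, False, False, False, False]
print(all(burst(ConnectionLimiter(100), 20)))  # True
```

In the real application calls overlap only partially, so the share of failed calls varies from run to run instead of being fixed as in this worst-case model. That variation is the "statistical" behavior the displayed statistics are meant to expose.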

Acceptance Criteria

Vert.x-specific Acceptance Criteria

Swarm-specific Acceptance Criteria

Boot-specific Acceptance Criteria

Node.js Acceptance Criteria

Integration Requirements

Tags

Notes

Approval

Spring Boot

Swarm

DevExp