Releases: fluxninja/aperture

Aperture v0.10.0-rc.2

14 Nov 12:28
cbb7ecb
Pre-release

Changelog

List of aperture PRs merged since 0.10.0-rc.1 release. For the full list of changes, see list of changes

Learning period via EMA warm up window (#921)

Description of change

  • EMA emits invalid readings during warm-up by default.
  • Increase the EMA warm-up period in the latency gradient policy to 1 minute.
  • This ensures no actuation for at least the first minute of
    traffic while Aperture learns the latency profile of a service.
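
The warm-up behavior described above can be sketched roughly as follows. This is a minimal illustration, not Aperture's implementation: the class name WarmupEMA, the use of NaN to represent an invalid reading, seeding the average with the warm-up mean, and counting samples instead of measuring a one-minute time window are all assumptions made for the example.

```python
import math


class WarmupEMA:
    """Exponential moving average that reports NaN (invalid) until warmed up.

    Hypothetical sketch: the warm-up window is counted in samples here,
    whereas the changelog describes a time-based (1 minute) window.
    """

    def __init__(self, alpha: float, warmup_samples: int):
        self.alpha = alpha
        self.warmup_samples = warmup_samples
        self.count = 0
        self.total = 0.0        # running sum collected during warm-up
        self.value = math.nan   # current EMA; NaN until seeded

    def update(self, sample: float) -> float:
        self.count += 1
        if self.count <= self.warmup_samples:
            self.total += sample
            if self.count == self.warmup_samples:
                # Seed the EMA with the plain average of the warm-up samples.
                self.value = self.total / self.count
            # Still warming up: emit an invalid reading so nothing actuates.
            return math.nan
        self.value = self.alpha * sample + (1 - self.alpha) * self.value
        return self.value
```

A consumer of this signal would skip actuation whenever math.isnan() is true for the emitted value, which is what keeps the policy passive while the latency profile is being learned.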

Aperture v0.10.0-rc.1

11 Nov 12:55
8c52e6d
Pre-release

Changelog

List of aperture PRs merged since 0.9.0 release. For the full list of changes, see list of changes

Remove unused CheckResponse.Error (#906)

This field described only authz-specific errors and was filled in the
envoy.Handler.Check() response when a non-nil error was also returned, but
in that case the gRPC framework did not use the response anyway.
The field was also used for metrics, but no code path actually set
these errors, as flowcontrol never did.

Also:

  • Create errors using the grpc/status package, so that we have control
    over the gRPC status.
  • Add missing sampled logs for error conditions.

Drive-by:

  • Remove unused error from ClassifierEngine.Classify(), as it's
    infallible (all errors are reported individually per-label).
  • Remove unused code from authz.go.

Aperture SDK for Javascript (#817)

Co-authored-by: Hasit Mistry

Add authzHandler to sdk-validator's grpc server (#797)

Description of change

Add authzHandler to sdk-validator's grpc server

  • Add CommonHandler
  • Refactor FlowControlHandler with CommonHandler

Alerts pipelines (#893)

Description of change

This introduces basic pipelines for Alerts including the following.

alerts.Alerter interface

This interface is propagated as part of the platform. Any interested
party can use it by calling the AddAlert(*alerts.Alert) method. In
particular, it will be used by components like #863.
Helper functions and methods are provided on the alerts.Alert struct
for easy construction of such alerts.

Alerts receiver

This receiver calls the AlertsChan() method of alerts.Alerter, converts
the received alerts.Alert structs into the OpenTelemetry Logs format, and
pushes them to the next consumer.
Convenience functions are provided for conversion in both directions,
to be used in the Alertmanager exporter (#862).

Alerts processors

The alerts processor adds the proper labels to the alerts, i.e. agent_group,
instance, and controller_id.

Ref: GH-861

flowcontrol: restructure codebase II (#898)

Description of change

Making room for adding more APIs (adapters, previews etc) under
flowcontrol.

Document Prometheus metrics and OLAP Flow events (#878)

Description of change

Closes: #720

Speed up ser/deserialization of CheckResponse in envoy authz (#881)

CheckResponse is now binary-encoded in protobuf wire format and stored
in DynamicMetadata as a base64 string. This speeds up both serialization
and deserialization (in the metrics processor).

No changes to the envoyfilter definition were needed, as Envoy's access logger
passes a StringValue from dynamic metadata as-is (previously, it was
JSON-encoding a StructValue into a string).

Note: metrics processor still accepts JSON-encoding, so other SDKs should
continue working without changes.
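
To illustrate how a consumer could accept both encodings, here is a minimal sketch. It is not the metrics processor's actual code: the function name decode_check_response and the "starts with {" heuristic are assumptions, and the protobuf bytes are returned undecoded because parsing them would require the generated CheckResponse class.

```python
import base64
import binascii
import json
from typing import Tuple, Union


def decode_check_response(raw: str) -> Tuple[str, Union[dict, bytes]]:
    """Classify and decode a dynamic-metadata value (hypothetical helper).

    Returns ("json", dict) for the legacy JSON encoding, or
    ("protobuf", bytes) holding the base64-decoded wire-format payload.
    """
    stripped = raw.lstrip()
    # "{" is not part of the base64 alphabet, so this check is unambiguous.
    if stripped.startswith("{"):
        return "json", json.loads(stripped)
    try:
        wire = base64.b64decode(raw, validate=True)
    except binascii.Error as exc:
        raise ValueError("value is neither JSON nor base64") from exc
    return "protobuf", wire
```

In a real consumer, the returned bytes would then be passed to the protobuf parser for CheckResponse, while the JSON branch keeps older SDKs working.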

Aperture v0.9.0

07 Nov 13:46
009ea43

Changelog

List of aperture PRs merged since 0.8.0 release. For the full list of changes, see list of changes

flowcontrol: restructure codebase II (#898)

Description of change

Making room for adding more APIs (adapters, previews etc) under
flowcontrol.

Document Prometheus metrics and OLAP Flow events (#878)

Speed up ser/deserialization of CheckResponse in envoy authz (#881)

CheckResponse is now binary-encoded in protobuf wire format and stored
in DynamicMetadata as a base64 string. This speeds up both serialization
and deserialization (in the metrics processor).

No changes to the envoyfilter definition were needed, as Envoy's access logger
passes a StringValue from dynamic metadata as-is (previously, it was
JSON-encoding a StructValue into a string).

Note: metrics processor still accepts JSON-encoding, so other SDKs should
continue working without changes.

Results

(Based on looking at pprof data)

  • createExtAuthzResponse went from total 18% to total 2.6% (from about 50% of
    authz.Check to about 10%).
  • GetStruct went from total 6% to total 3% (from about 75% of
    metricsprocessor.ConsumeLogs to about 40%)
  • total ~20% improvement
  • The agent's overhead is now either comparable to or slightly higher than the
    Istio proxy's (before, it was noticeably higher). (Note: the Istio proxy
    might also have sped up as a result of this change, since it no longer
    needs to serialize protobuf.Struct in access logs, although I haven't
    measured this precisely.)

Use envoy authz in java sdk (#816)

buf dependencies were updated resulting in changes in many generated files.

Restructure flowcontrol directories (#884)

Description of change

Restructure directories

Invalid signals telemetry (#876)

Description of change

  • valid label on signal_reading metric for indicating whether the
    reading was valid.
  • Rename label attribute_found on FluxMeter metric to valid to be
    consistent with Signal metrics.
  • A new panel in Signals dashboard: "Signal Validity (Frequency)"

panichandler: process panic handlers in the same go routine (#875)

Aperture v0.9.0-rc.3

07 Nov 13:31
009ea43
Pre-release

Changelog

List of aperture PRs merged since 0.8.0 release. For the full list of changes, see list of changes

flowcontrol: restructure codebase II (#898)

Description of change

Making room for adding more APIs (adapters, previews etc) under
flowcontrol.

Document Prometheus metrics and OLAP Flow events (#878)

Speed up ser/deserialization of CheckResponse in envoy authz (#881)

CheckResponse is now binary-encoded in protobuf wire format and stored
in DynamicMetadata as a base64 string. This speeds up both serialization
and deserialization (in the metrics processor).

No changes to the envoyfilter definition were needed, as Envoy's access logger
passes a StringValue from dynamic metadata as-is (previously, it was
JSON-encoding a StructValue into a string).

Note: metrics processor still accepts JSON-encoding, so other SDKs should
continue working without changes.

Results

(Based on looking at pprof data)

  • createExtAuthzResponse went from total 18% to total 2.6% (from about 50% of
    authz.Check to about 10%).
  • GetStruct went from total 6% to total 3% (from about 75% of
    metricsprocessor.ConsumeLogs to about 40%)
  • total ~20% improvement
  • The agent's overhead is now either comparable to or slightly higher than the
    Istio proxy's (before, it was noticeably higher). (Note: the Istio proxy
    might also have sped up as a result of this change, since it no longer
    needs to serialize protobuf.Struct in access logs, although I haven't
    measured this precisely.)

Use envoy authz in java sdk (#816)

buf dependencies were updated resulting in changes in many generated files.

Restructure flowcontrol directories (#884)

Description of change

Restructure directories

Invalid signals telemetry (#876)

Description of change

  • valid label on signal_reading metric for indicating whether the
    reading was valid.
  • Rename label attribute_found on FluxMeter metric to valid to be
    consistent with Signal metrics.
  • A new panel in Signals dashboard: "Signal Validity (Frequency)"

panichandler: process panic handlers in the same go routine (#875)

Aperture v0.9.0-rc.2

07 Nov 13:11
4a29e40
Pre-release

Changelog

List of aperture PRs merged since 0.9.0-rc.1 release. For the full list of changes, see list of changes

Aperture v0.9.0-rc.1

04 Nov 13:01
ffb6aee
Pre-release

Changelog

List of aperture PRs merged since 0.8.0 release. For the full list of changes, see list of changes

Use envoy authz in java sdk (#816)

buf dependencies were updated resulting in changes in many generated files.

Restructure flowcontrol directories (#884)

Description of change

Restructure directories

Invalid signals telemetry (#876)

Description of change

  • valid label on signal_reading metric for indicating whether the
    reading was valid.
  • Rename label attribute_found on FluxMeter metric to valid to be
    consistent with Signal metrics.
  • A new panel in Signals dashboard: "Signal Validity (Frequency)"

panichandler: process panic handlers in the same go routine (#875)

remove unused panic handler

Aperture v0.8.0

31 Oct 09:59
77ac2c2

Changelog

List of aperture PRs merged since 0.7.0 release. For the full list of changes, see list of changes

Revamp workload and flux meter metrics and labels (#843)

Description of change

  • New label attribute_found in FluxMeter to denote if the attribute on
    which the flux meter is based was found in the access log/span
  • Removed label decision_type on summary workload_latency_ms since
    it is now emitted only if response was received.
  • New counter workload_requests_total to measure the workload
    decisions count since the summary does not take into account the
    scenarios where response is not received e.g. rejects or connection
    resets.
  • A new column response_received on OLAP Flow events to denote the
    case when response is not received.

Ignore negative workload latency (#839)

Issue

  • Workload latency in the case of Envoy is calculated as:
    workload_latency = response_latency - aperture_latency
  • Workload latency can become negative in case of a connection reset.
  • If the connection is aborted by the client or the server, Envoy
    immediately terminates the connection for the other endpoint.
  • In the access log, the status code is set to 0 and response_latency is
    set to zero.
  • If the Authz call to the Aperture Agent had succeeded for this request,
    then aperture_latency is greater than zero.
    • This leads to workload_latency being computed as negative.

Fix

  • Ignore negative workload latency, i.e. don't populate the workload
    latency column
  • Publish Prometheus metrics for flux-meter or workload latency only if
    the metric column is found
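
The issue and fix above can be sketched as a small function. This is an illustrative sketch only: the function name, the millisecond units, and the use of None to mean "column not populated" are assumptions for the example, not Aperture's actual code.

```python
from typing import Optional


def workload_latency_ms(
    response_latency_ms: float, aperture_latency_ms: float
) -> Optional[float]:
    """Compute workload latency, or None when the result would be negative.

    On a connection reset the access log reports response_latency as zero,
    while aperture_latency is positive, so the difference goes negative.
    """
    latency = response_latency_ms - aperture_latency_ms
    if latency < 0:
        # Ignore the sample: leave the workload latency column unpopulated.
        return None
    return latency
```

A caller would then publish the latency metric only when the returned value is not None, matching the second point of the fix.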

TickInfo in LoadDecision (#836)

Description of change

  • Put TickInfo in LoadDecision to re-trigger fill-rate evaluation at
    the Agent.

Re-structure protos (#831)

Fix telemetry labels propagation (#835)

Description of change

This fixes a regression introduced in #828.

Dynamic Telemetry Flow Labels were added before labels filtering, which
led them to be incorrectly filtered out.
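
The ordering problem can be illustrated with a short sketch. It assumes, purely for illustration, that label filtering is an allow-list over label keys; the function and parameter names are hypothetical and do not come from the Aperture codebase.

```python
from typing import Dict, Iterable


def build_telemetry_labels(
    raw_labels: Dict[str, str],
    allowed_keys: Iterable[str],
    dynamic_flow_labels: Dict[str, str],
) -> Dict[str, str]:
    """Filter static labels first, then attach dynamic flow labels."""
    allowed = set(allowed_keys)
    # 1. Filter the static labels against the allow-list.
    labels = {k: v for k, v in raw_labels.items() if k in allowed}
    # 2. Add dynamic flow labels afterwards, so the filter cannot drop them.
    labels.update(dynamic_flow_labels)
    return labels
```

Swapping steps 1 and 2 reproduces the regression: the dynamic labels are not on the allow-list, so the filter removes them.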

Bump OTEL to 0.63.0 (#834)

Description of change

Bumps OTEL and FN OTEL to 0.63.0. This removes the Istio 1.15 compat hack, as
it is included in upstream OTEL.

Response status in telemetry (#828)

Description of change

This introduces the aperture.response_status column in telemetry. It
mirrors the implementation of the response_status label for metrics.
This also extends the above logic to treat 1xx, 2xx, and 3xx codes
as OK instead of only 2xx codes.
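
The status mapping described here amounts to a range check. The sketch below is illustrative: the function name and the "Error" value for non-OK codes are assumptions (the changelog only names OK), and treating status code 0 as an error is a choice made for the example.

```python
def response_status(http_status_code: int) -> str:
    """Classify an HTTP status code: 1xx, 2xx, and 3xx count as OK."""
    if 100 <= http_status_code < 400:
        return "OK"
    # Everything else (4xx, 5xx, or 0 for a missing response) is an error here.
    return "Error"
```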

Besides this, some cleanup is done:

  1. The above logic is moved from FluxMeter to the OTEL package. This changes
    the FluxMeter interface!
  2. A lot of logic is moved from metricsprocessor to
    metricsprocessor/internal for better visibility and easier separation
    of functions that are called directly in metricsprocessor from helpers.
  3. The above made creating unit tests much easier, so this PR also
    includes some.

Ref: fluxninja/cloud#6788

Dry run mode for Load Actuator (#826)

Description of change

  • Dry run mode for Load Actuator. No traffic can get dropped due to this
    Load Actuator in this mode. Useful for observing the behavior of Load
    Actuator without any disruptions.
  • Load Actuator has a new Pass through mode
  • Default to Pass through mode when the multiplier is invalid, and also
    when no decision is available at the Agent, including during
    initialization
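
The mode selection above can be sketched as a small decision function. This is a hedged illustration, not the Load Actuator's code: the function name is hypothetical, None stands for "pass through", an invalid multiplier is assumed to mean NaN or a negative number, and dry run is assumed to be enforced by passing all traffic through.

```python
import math
from typing import Optional


def effective_load_multiplier(
    multiplier: Optional[float], dry_run: bool
) -> Optional[float]:
    """Return the multiplier to enforce, or None for pass-through."""
    if dry_run:
        # Dry run: observe only, never drop traffic.
        return None
    if multiplier is None:
        # No decision available at the Agent yet (e.g. during initialization).
        return None
    if math.isnan(multiplier) or multiplier < 0:
        # Invalid multiplier: fall back to pass-through.
        return None
    return multiplier
```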

Rollup based on metrics (#821)

Closes: GH-515

docs: playground doc updates (#819)

Description of change

  • Moved demo_app to playground
  • Added more details to playground documentation
  • Bump istio and other tools

Aperture v0.8.0-rc.4

31 Oct 09:34
77ac2c2
Pre-release

Changelog

List of aperture PRs merged since 0.8.0-rc.3 release. For the full list of changes, see list of changes

Aperture v0.8.0-rc.3

31 Oct 09:32
bd95d66
Pre-release

Changelog

List of aperture PRs merged since 0.8.0-rc.2 release. For the full list of changes, see list of changes

Revamp workload and flux meter metrics and labels (#843)

Description of change

  • New label attribute_found in FluxMeter to denote if the attribute on
    which the flux meter is based was found in the access log/span
  • Removed label decision_type on summary workload_latency_ms since
    it is now emitted only if response was received.
  • New counter workload_requests_total to measure the workload
    decisions count since the summary does not take into account the
    scenarios where response is not received e.g. rejects or connection
    resets.
  • A new column response_received on OLAP Flow events to denote the
    case when response is not received.

Skip NaN auto-tokens (#840)

Ignore negative workload latency (#839)

Issue

  • Workload latency in the case of Envoy is calculated as:
    workload_latency = response_latency - aperture_latency
  • Workload latency can become negative in case of a connection reset.
  • If the connection is aborted by the client or the server, Envoy
    immediately terminates the connection for the other endpoint.
  • In the access log, the status code is set to 0 and response_latency is
    set to zero.
  • If the Authz call to the Aperture Agent had succeeded for this request,
    then aperture_latency is greater than zero.
    • This leads to workload_latency being computed as negative.

Fix

  • Ignore negative workload latency, i.e. don't populate the workload
    latency column
  • Publish Prometheus metrics for flux-meter or workload latency only if
    the metric column is found

TickInfo in LoadDecision (#836)

Description of change

  • Put TickInfo in LoadDecision to re-trigger fill-rate evaluation at
    the Agent.

Re-structure protos (#831)

Fix telemetry labels propagation (#835)

Description of change

This fixes a regression introduced in #828.

Dynamic Telemetry Flow Labels were added before labels filtering, which
led them to be incorrectly filtered out.

Aperture v0.8.0-rc.2

28 Oct 13:13
Pre-release

Changelog

List of aperture PRs merged since 0.8.0-rc.1 release. For the full list of changes, see list of changes

Fix telemetry labels propagation (#835)

Description of change

This fixes a regression introduced in #828.

Dynamic Telemetry Flow Labels were added before labels filtering, which
led them to be incorrectly filtered out.
