Skip to content

Commit

Permalink
Prepare for migration to new runtime metrics (#5747)
Browse files Browse the repository at this point in the history
Part of #5655

This is a refactoring to prepare for the implementation of the new
runtime metrics. It:

* Adds support for `OTEL_GO_X_DEPRECATED_RUNTIME_METRICS`, which can be
set to `true` or `false` to enable/disable the existing runtime metrics.
It initially defaults to `true` while the new metrics are under
development.
* Moves the existing runtime metrics implementation to its own internal
package, deprecatedruntime, to clearly separate it from the new metrics
being added, and to make the eventual removal easier.

This does not change any of the metrics generated, or the public API
surface of the runtime metrics package.
  • Loading branch information
dashpole authored Jun 13, 2024
1 parent a794a70 commit ae2a4f0
Show file tree
Hide file tree
Showing 9 changed files with 456 additions and 268 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
- The `go.opentelemetry.io/contrib/config` add support to configure periodic reader interval and timeout. (#5661)
- Add support to configure views when creating MeterProvider using the config package. (#5654)
- Add log support for the autoexport package. (#5733)
- Add support for disabling the old runtime metrics using the `OTEL_GO_X_DEPRECATED_RUNTIME_METRICS=false` environment variable. (#5747)

### Fixed

Expand Down
4 changes: 4 additions & 0 deletions instrumentation/runtime/go.mod
Original file line number Diff line number Diff line change
Expand Up @@ -3,12 +3,16 @@ module go.opentelemetry.io/contrib/instrumentation/runtime
go 1.21

require (
github.com/stretchr/testify v1.9.0
go.opentelemetry.io/otel v1.27.0
go.opentelemetry.io/otel/metric v1.27.0
)

require (
github.com/davecgh/go-spew v1.1.1 // indirect
github.com/go-logr/logr v1.4.2 // indirect
github.com/go-logr/stdr v1.2.2 // indirect
github.com/pmezard/go-difflib v1.0.0 // indirect
go.opentelemetry.io/otel/trace v1.27.0 // indirect
gopkg.in/yaml.v3 v3.0.1 // indirect
)
2 changes: 2 additions & 0 deletions instrumentation/runtime/go.sum
Original file line number Diff line number Diff line change
Expand Up @@ -17,5 +17,7 @@ go.opentelemetry.io/otel/metric v1.27.0 h1:hvj3vdEKyeCi4YaYfNjv2NUje8FqKqUY8IlF0
go.opentelemetry.io/otel/metric v1.27.0/go.mod h1:mVFgmRlhljgBiuk/MP/oKylr4hs85GZAylncepAX/ak=
go.opentelemetry.io/otel/trace v1.27.0 h1:IqYb813p7cmbHk0a5y6pD5JPakbVfftRXABGt5/Rscw=
go.opentelemetry.io/otel/trace v1.27.0/go.mod h1:6RiD1hkAprV4/q+yd2ln1HG9GoPx39SuvvstaLBl+l4=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405 h1:yhCVgyC4o1eVCa2tZl7eS0r+SDo693bJlVdllGtEeKM=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
22 changes: 22 additions & 0 deletions instrumentation/runtime/internal/deprecatedruntime/doc.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
// Copyright The OpenTelemetry Authors
// SPDX-License-Identifier: Apache-2.0

// Package deprecatedruntime implements the deprecated runtime metrics for OpenTelemetry.
//
// The metric events produced are:
//
// runtime.go.cgo.calls - Number of cgo calls made by the current process
// runtime.go.gc.count - Number of completed garbage collection cycles
// runtime.go.gc.pause_ns (ns) Amount of nanoseconds in GC stop-the-world pauses
// runtime.go.gc.pause_total_ns (ns) Cumulative nanoseconds in GC stop-the-world pauses since the program started
// runtime.go.goroutines - Number of goroutines that currently exist
// runtime.go.lookups - Number of pointer lookups performed by the runtime
// runtime.go.mem.heap_alloc (bytes) Bytes of allocated heap objects
// runtime.go.mem.heap_idle (bytes) Bytes in idle (unused) spans
// runtime.go.mem.heap_inuse (bytes) Bytes in in-use spans
// runtime.go.mem.heap_objects - Number of allocated heap objects
// runtime.go.mem.heap_released (bytes) Bytes of idle spans whose physical memory has been returned to the OS
// runtime.go.mem.heap_sys (bytes) Bytes of heap memory obtained from the OS
// runtime.go.mem.live_objects - Number of live objects is the number of cumulative Mallocs - Frees
// runtime.uptime (ms) Milliseconds since application was initialized
package deprecatedruntime // import "go.opentelemetry.io/contrib/instrumentation/runtime/internal/deprecatedruntime"
282 changes: 282 additions & 0 deletions instrumentation/runtime/internal/deprecatedruntime/runtime.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,282 @@
// Copyright The OpenTelemetry Authors
// SPDX-License-Identifier: Apache-2.0

package deprecatedruntime // import "go.opentelemetry.io/contrib/instrumentation/runtime/internal/deprecatedruntime"

import (
"context"
goruntime "runtime"
"sync"
"time"

"go.opentelemetry.io/otel/metric"
)

// Runtime reports the work-in-progress conventional runtime metrics specified by OpenTelemetry.
type runtime struct {
minimumReadMemStatsInterval time.Duration
meter metric.Meter
}

// Start initializes reporting of runtime metrics using the supplied config.
func Start(meter metric.Meter, minimumReadMemStatsInterval time.Duration) error {
r := &runtime{
meter: meter,
minimumReadMemStatsInterval: minimumReadMemStatsInterval,
}
return r.register()
}

func (r *runtime) register() error {
startTime := time.Now()
uptime, err := r.meter.Int64ObservableCounter(
"runtime.uptime",
metric.WithUnit("ms"),
metric.WithDescription("Milliseconds since application was initialized"),
)
if err != nil {
return err
}

goroutines, err := r.meter.Int64ObservableUpDownCounter(
"process.runtime.go.goroutines",
metric.WithDescription("Number of goroutines that currently exist"),
)
if err != nil {
return err
}

cgoCalls, err := r.meter.Int64ObservableUpDownCounter(
"process.runtime.go.cgo.calls",
metric.WithDescription("Number of cgo calls made by the current process"),
)
if err != nil {
return err
}

_, err = r.meter.RegisterCallback(
func(ctx context.Context, o metric.Observer) error {
o.ObserveInt64(uptime, time.Since(startTime).Milliseconds())
o.ObserveInt64(goroutines, int64(goruntime.NumGoroutine()))
o.ObserveInt64(cgoCalls, goruntime.NumCgoCall())
return nil
},
uptime,
goroutines,
cgoCalls,
)
if err != nil {
return err
}

return r.registerMemStats()
}

func (r *runtime) registerMemStats() error {
var (
err error

heapAlloc metric.Int64ObservableUpDownCounter
heapIdle metric.Int64ObservableUpDownCounter
heapInuse metric.Int64ObservableUpDownCounter
heapObjects metric.Int64ObservableUpDownCounter
heapReleased metric.Int64ObservableUpDownCounter
heapSys metric.Int64ObservableUpDownCounter
liveObjects metric.Int64ObservableUpDownCounter

// TODO: is ptrLookups useful? I've not seen a value
// other than zero.
ptrLookups metric.Int64ObservableCounter

gcCount metric.Int64ObservableCounter
pauseTotalNs metric.Int64ObservableCounter
gcPauseNs metric.Int64Histogram

lastNumGC uint32
lastMemStats time.Time
memStats goruntime.MemStats

// lock prevents a race between batch observer and instrument registration.
lock sync.Mutex
)

lock.Lock()
defer lock.Unlock()

if heapAlloc, err = r.meter.Int64ObservableUpDownCounter(
"process.runtime.go.mem.heap_alloc",
metric.WithUnit("By"),
metric.WithDescription("Bytes of allocated heap objects"),
); err != nil {
return err
}

if heapIdle, err = r.meter.Int64ObservableUpDownCounter(
"process.runtime.go.mem.heap_idle",
metric.WithUnit("By"),
metric.WithDescription("Bytes in idle (unused) spans"),
); err != nil {
return err
}

if heapInuse, err = r.meter.Int64ObservableUpDownCounter(
"process.runtime.go.mem.heap_inuse",
metric.WithUnit("By"),
metric.WithDescription("Bytes in in-use spans"),
); err != nil {
return err
}

if heapObjects, err = r.meter.Int64ObservableUpDownCounter(
"process.runtime.go.mem.heap_objects",
metric.WithDescription("Number of allocated heap objects"),
); err != nil {
return err
}

// FYI see https://github.com/golang/go/issues/32284 to help
// understand the meaning of this value.
if heapReleased, err = r.meter.Int64ObservableUpDownCounter(
"process.runtime.go.mem.heap_released",
metric.WithUnit("By"),
metric.WithDescription("Bytes of idle spans whose physical memory has been returned to the OS"),
); err != nil {
return err
}

if heapSys, err = r.meter.Int64ObservableUpDownCounter(
"process.runtime.go.mem.heap_sys",
metric.WithUnit("By"),
metric.WithDescription("Bytes of heap memory obtained from the OS"),
); err != nil {
return err
}

if ptrLookups, err = r.meter.Int64ObservableCounter(
"process.runtime.go.mem.lookups",
metric.WithDescription("Number of pointer lookups performed by the runtime"),
); err != nil {
return err
}

if liveObjects, err = r.meter.Int64ObservableUpDownCounter(
"process.runtime.go.mem.live_objects",
metric.WithDescription("Number of live objects is the number of cumulative Mallocs - Frees"),
); err != nil {
return err
}

if gcCount, err = r.meter.Int64ObservableCounter(
"process.runtime.go.gc.count",
metric.WithDescription("Number of completed garbage collection cycles"),
); err != nil {
return err
}

// Note that the following could be derived as a sum of
// individual pauses, but we may lose individual pauses if the
// observation interval is too slow.
if pauseTotalNs, err = r.meter.Int64ObservableCounter(
"process.runtime.go.gc.pause_total_ns",
// TODO: nanoseconds units
metric.WithDescription("Cumulative nanoseconds in GC stop-the-world pauses since the program started"),
); err != nil {
return err
}

if gcPauseNs, err = r.meter.Int64Histogram(
"process.runtime.go.gc.pause_ns",
// TODO: nanoseconds units
metric.WithDescription("Amount of nanoseconds in GC stop-the-world pauses"),
); err != nil {
return err
}

_, err = r.meter.RegisterCallback(
func(ctx context.Context, o metric.Observer) error {
lock.Lock()
defer lock.Unlock()

now := time.Now()
if now.Sub(lastMemStats) >= r.minimumReadMemStatsInterval {
goruntime.ReadMemStats(&memStats)
lastMemStats = now
}

o.ObserveInt64(heapAlloc, int64(memStats.HeapAlloc))
o.ObserveInt64(heapIdle, int64(memStats.HeapIdle))
o.ObserveInt64(heapInuse, int64(memStats.HeapInuse))
o.ObserveInt64(heapObjects, int64(memStats.HeapObjects))
o.ObserveInt64(heapReleased, int64(memStats.HeapReleased))
o.ObserveInt64(heapSys, int64(memStats.HeapSys))
o.ObserveInt64(liveObjects, int64(memStats.Mallocs-memStats.Frees))
o.ObserveInt64(ptrLookups, int64(memStats.Lookups))
o.ObserveInt64(gcCount, int64(memStats.NumGC))
o.ObserveInt64(pauseTotalNs, int64(memStats.PauseTotalNs))

computeGCPauses(ctx, gcPauseNs, memStats.PauseNs[:], lastNumGC, memStats.NumGC)

lastNumGC = memStats.NumGC

return nil
},
heapAlloc,
heapIdle,
heapInuse,
heapObjects,
heapReleased,
heapSys,
liveObjects,

ptrLookups,

gcCount,
pauseTotalNs,
)
if err != nil {
return err
}
return nil
}

func computeGCPauses(
ctx context.Context,
recorder metric.Int64Histogram,
circular []uint64,
lastNumGC, currentNumGC uint32,
) {
delta := int(int64(currentNumGC) - int64(lastNumGC))

if delta == 0 {
return
}

if delta >= len(circular) {
// There were > 256 collections, some may have been lost.
recordGCPauses(ctx, recorder, circular)
return
}

length := uint32(len(circular))

i := lastNumGC % length
j := currentNumGC % length

if j < i { // wrap around the circular buffer
recordGCPauses(ctx, recorder, circular[i:])
recordGCPauses(ctx, recorder, circular[:j])
return
}

recordGCPauses(ctx, recorder, circular[i:j])
}

func recordGCPauses(
ctx context.Context,
recorder metric.Int64Histogram,
pauses []uint64,
) {
for _, pause := range pauses {
recorder.Record(ctx, int64(pause))
}
}
38 changes: 38 additions & 0 deletions instrumentation/runtime/internal/x/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
# Feature Gates

The runtime package contains a feature gate used to ease the migration
from the [previous runtime metrics conventions] to the new [OpenTelemetry Go
Runtime conventions].

Note that the new runtime metrics conventions are still experimental, and may
change in backwards incompatible ways as feedback is applied.

## Features

- [Include Deprecated Metrics](#include-deprecated-metrics)

### Include Deprecated Metrics

Once new experimental runtime metrics are added, they will be produced
**in addition to** the existing runtime metrics. Users that migrate right away
can disable the old runtime metrics:

```console
export OTEL_GO_X_DEPRECATED_RUNTIME_METRICS=false
```

In a later release, the deprecated runtime metrics will stop being produced by
default. To temporarily re-enable the deprecated metrics:

```console
export OTEL_GO_X_DEPRECATED_RUNTIME_METRICS=true
```

After two additional releases, the deprecated runtime metrics will be removed,
and setting the environment variable will no longer have any effect.

The value set must be the case-insensitive string of `"true"` to enable the
feature, and `"false"` to disable the feature. All other values are ignored.

[previous runtime metrics conventions]: go.opentelemetry.io/contrib/instrumentation/runtime/internal/deprecatedruntime
[OpenTelemetry Go Runtime conventions]: https://github.com/open-telemetry/semantic-conventions/blob/main/docs/runtime/go-metrics.md
Loading

0 comments on commit ae2a4f0

Please sign in to comment.