This tutorial section covers the manual instrumentation of a Go application with the OpenTelemetry SDK.
As a basis for the instrumentation we use backend4-uninstrumented (the instrumented solution is available as backend4-instrumented). To compile the application you need Go 1.22 or newer.
Before we start instrumenting our application, we should create a tracer provider and register it globally. This ensures that the same exporter and the same processing steps are used throughout the application.
func main() {
+	// Create an OTLP gRPC exporter; endpoint and other settings are read
+	// from the standard OTEL_EXPORTER_OTLP_* environment variables.
+	otelExporter, err := otlptracegrpc.New(context.Background())
+	if err != nil {
+		fmt.Printf("failed to create trace exporter: %s\n", err)
+		os.Exit(1)
+	}
+	// Register a global tracer provider that batches spans before export.
+	tp := sdktrace.NewTracerProvider(sdktrace.WithBatcher(otelExporter))
+	otel.SetTracerProvider(tp)
+	// Flush any buffered spans when the application shuts down.
+	defer func() { _ = tp.Shutdown(context.Background()) }()
...
Next, we create a tracer that identifies our specific service.
+var tracer = otel.GetTracerProvider().Tracer("github.com/kubecon-eu-2024/backend")
As we begin to instrument our application, we should remember that no data is free. By concentrating on the most critical endpoint and its associated functions, we ensure that our instrumentation effort captures the most valuable telemetry. Let's explore how we can instrument these key components to improve the observability and reliability of our application.
In our example backend, the entry point /rolldice
starts our critical operation.
Using the previously defined tracer, we can create new spans. The method Start
creates a span and a context.Context
containing the newly-created span. If the context.Context provided in ctx
contains a Span, then the newly-created Span will be a child of that span; otherwise it will be a root span.
Keep in mind that any Span that is created MUST
also be ended. This is the responsibility of the user. Implementations of this API may leak memory or other resources if Spans are not ended. Documentation.
When defining a span name, it's important to choose descriptive and meaningful names that accurately reflect the operation being performed.
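Putting this together, a minimal sketch of an instrumented handler could look like the following. The handler name, the rate variables (rateError, rateHighDelay), and the response body are illustrative assumptions; the actual implementation lives in the backend4 sources.
func rolldice(w http.ResponseWriter, r *http.Request) {
	// Start a span named after the operation; if r.Context() already
	// carries a span (e.g. from otelhttp), this one becomes its child.
	ctx, span := tracer.Start(r.Context(), "rolldice")
	// Every created span MUST be ended, otherwise resources may leak.
	defer span.End()

	if err := causeError(ctx, rateError); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	causeDelay(ctx, rateHighDelay)
	fmt.Fprintf(w, "%d", rand.Intn(6)+1)
}
To create server spans for incoming requests in the first place, the handlers are registered through the otelhttp middleware: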
mux := http.NewServeMux()
+	// Wrap each handler in the otelhttp middleware and tag its span with
+	// the route, i.e. the pattern without the leading HTTP method.
+	registerHandleFunc := func(pattern string, h http.HandlerFunc) {
+		route := strings.Split(pattern, " ")
+		mux.Handle(pattern, otelhttp.NewHandler(otelhttp.WithRouteTag(route[len(route)-1], h), pattern))
+	}
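With this helper in place, a route registration wraps the handler in a server span and tags it with the route. The method-prefixed pattern syntax ("GET /rolldice") requires Go 1.22's enhanced ServeMux, which is why the helper splits the method off before tagging. A hypothetical registration for our endpoint:
registerHandleFunc("GET /rolldice", rolldice)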
To simulate more complex behaviour, we find a causeError
function in the /rolldice
handler source code of the backend4 application. Since errors occur with a defined probability, it makes sense to instrument this part as well.
Therefore we have to make sure that the context, which was previously created with the root span, is passed to this function.
RecordError will record err as an exception span event for this span. An additional call to SetStatus is required if the Status of the Span should be set to Error, as this method does not change the Span status. If this span is not being recorded or err is nil then this method does nothing.
func causeError(ctx context.Context, rate int) error {
+	// Create a child span of the span carried in ctx.
+	_, span := tracer.Start(ctx, "causeError")
+	defer span.End()
	randomNumber := rand.Intn(100)
+	span.AddEvent("roll", trace.WithAttributes(attribute.Int("number", randomNumber)))
	if randomNumber < rate {
		err := fmt.Errorf("number(%d) < rate(%d)", randomNumber, rate)
+		// Record the error as a span event and set the span status to Error;
+		// RecordError alone does not change the status.
+		span.RecordError(err)
+		span.SetStatus(codes.Error, "some error occurred")
		return err
	}
	return nil
}
In the same execution path we also find a function that introduces high delays on our /rolldice
endpoint with a fixed probability.
AddEvent adds an event with the provided name and options.
func causeDelay(ctx context.Context, rate int) {
+	_, span := tracer.Start(ctx, "causeDelay")
+	defer span.End()
	randomNumber := rand.Intn(100)
+	span.AddEvent("roll", trace.WithAttributes(attribute.Int("number", randomNumber)))
	if randomNumber < rate {
		// Sleep between 2 and 4 seconds to simulate a slow operation.
		time.Sleep(time.Duration(2+rand.Intn(3)) * time.Second)
	}
}
Once the code has been instrumented, we can use go mod tidy
to update the existing go.mod
file and start testing our application.
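In case the OpenTelemetry modules are not yet listed in go.mod, something along these lines fetches them first (the module paths match the imports used in the snippets above):
go get go.opentelemetry.io/otel \
  go.opentelemetry.io/otel/sdk \
  go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc \
  go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp
go mod tidy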
Now that we have instrumented backend4
, we can use it as a drop-in replacement for backend2
.
For this we need to build and provide a new container image or use the prepared backend4:with-instr
version.
kubectl apply -f https://raw.githubusercontent.com/pavolloffay/kubecon-eu-2024-opentelemetry-kubernetes-tracing-tutorial/main/app/instrumentation-replace-backend2.yaml
With kubectl diff
we can preview the changes before applying them:
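kubectl diff -f https://raw.githubusercontent.com/pavolloffay/kubecon-eu-2024-opentelemetry-kubernetes-tracing-tutorial/main/app/instrumentation-replace-backend2.yaml
The output should look similar to this.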
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend2-deployment
  namespace: tutorial-application
  labels:
    app: backend2
spec:
  template:
    metadata:
      labels:
        app: backend2
      annotations:
        prometheus.io/scrape: "true"
+       instrumentation.opentelemetry.io/inject-sdk: "true"
    spec:
      containers:
      - name: backend2
-       image: ghcr.io/pavolloffay/kubecon-eu-2024-opentelemetry-kubernetes-tracing-tutorial-backend2:latest
+       image: ghcr.io/pavolloffay/kubecon-eu-2024-opentelemetry-kubernetes-tracing-tutorial-backend4:latest
        env:
+       - name: RATE_ERROR
+         value: "20"
+       - name: RATE_HIGH_DELAY
+         value: "20"
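After applying the manifest, we can verify that the new image has been rolled out (deployment name and namespace are taken from the manifest above):
kubectl rollout status deployment/backend2-deployment -n tutorial-application
kubectl get pods -n tutorial-application -l app=backend2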
Note
This is an optional section.
Run and test backend 4 locally (shorter development cycle)
To get quick feedback, we can run a Jaeger instance locally and point our application at it. Jaeger all-in-one will make this easy.
docker run --rm -it -p 127.0.0.1:4317:4317 -p 127.0.0.1:16686:16686 -e COLLECTOR_OTLP_ENABLED=true -e LOG_LEVEL=debug jaegertracing/all-in-one:latest
Now we can configure our application with a specific RATE_ERROR
and RATE_HIGH_DELAY
, each given as a percentage. This indicates what share of requests should be delayed and/or produce an error.
Finally, we need to configure the OpenTelemetry SDK; by default this can be done through common environment variables. Documentation
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 OTEL_SERVICE_NAME=go-backend RATE_ERROR=20 RATE_HIGH_DELAY=20 go run main.go
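After issuing a few requests against the /rolldice endpoint, the resulting traces should show up in the Jaeger UI at http://localhost:16686, the port we exposed above.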
By instrumenting our applications, whether manually or automatically, we get more telemetry data to help us understand our system. However, since a large amount of telemetry data also generates costs, in the next chapter we will discuss how we can utilise this data in a meaningful way.