OpenTelemetry Workshop

OpenTelemetry Starter Pack NDC {Oslo} 2024

Run Infrastructure - Getting started

1. Verify prerequisites

  • Docker is running? Run docker ps. You should not get any errors.
    • error during connect means Rancher Desktop/Podman Desktop/Docker Desktop has not been started
  • Docker Compose is installed? Run docker-compose -v
  • Dotnet is installed? Run dotnet --version. This should display the version.

2. Run the infrastructure

  • Start by running docker compose up. Add -d to run detached (i.e. start it without displaying all logs)
  • To clean up the docker containers, run docker-compose down from this folder.

3. Run the demo application - ExampleApi

Run it with dotnet run or from inside an IDE. The console output will tell you which URLs you can visit.

4. Run a test to verify the setup

Go to localhost:8080/test

Setup

Prerequisites

  • A tool for building and running containers, e.g. Docker Desktop, Podman or Rancher Desktop - including Compose.
  • Dotnet 8

Overall Architecture

flowchart LR
    A["Application(example)"] --> B("OpenTelemetry Collector \n :4317 (gRPC)")
    H["dependency1"] --> B
    I["dependency2"] --> B
    J["dependency3"] --> B
    B --> D("Loki \n :3100") --> G("Grafana")
    B --> E("Tempo \n :3200") --> G
    B --> F(Prometheus \n :9090)--> G
    K(k6)-->|remote write|F

    style A fill:red
    style B fill:green
    style H fill:blue
    style I fill:blue
    style J fill:blue
    style D fill:yellow
    style E fill:orange
    style F fill:cyan
    style G fill:purple
  • Your application, called example, sends data directly to the OTel Collector over gRPC
  • The "legacy" dependencies use auto-instrumentation and do the same
  • Prometheus scrapes metrics from the OTel Collector
  • The Collector writes logs and traces to Loki and Tempo
  • Grafana uses the three sources to display the data
  • k6 writes to Prometheus using remote write

Versions

Versions are defined in .env

Note 09.06.2024: Tempo is currently pinned to version 2.4.2. The latest version, 2.5.0, has a breaking change related to the ownership of /var/tempo, which was causing issues. Refer to https://github.com/grafana/tempo/releases/tag/v2.5.0 for more info.

Alternative to this setup

Grafana has released a simplified setup that runs everything in a single container. It can be found here: https://github.com/grafana/docker-otel-lgtm/.

Instrumentation - What are our options?

There are two options for setting up OpenTelemetry in .NET applications:

  • Setup in code, with the option of combining manual and automatic instrumentation
  • No-code setup of automatic instrumentation. Manual instrumentation cannot be added

Manual setup

The example app uses this setup. Refer to SetupOpentelemetry.cs to see how this may be done. For more information and examples, refer to:

The self-paced tasks focus on the manual setup.
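
For orientation, here is a minimal sketch of what a code-based setup typically looks like in an ASP.NET Core app. It is a generic illustration, not a copy of SetupOpentelemetry.cs; the service name and instrumentation choices are assumptions.

    using OpenTelemetry.Logs;
    using OpenTelemetry.Metrics;
    using OpenTelemetry.Resources;
    using OpenTelemetry.Trace;

    var builder = WebApplication.CreateBuilder(args);

    // The resource describes the service; its name shows up as service.name in Grafana.
    builder.Services.AddOpenTelemetry()
        .ConfigureResource(r => r.AddService("ExampleApi"))   // name is an assumption
        .WithTracing(t => t
            .AddAspNetCoreInstrumentation()                   // automatic instrumentation
            .AddHttpClientInstrumentation()
            .AddOtlpExporter())                               // gRPC to localhost:4317 by default
        .WithMetrics(m => m
            .AddAspNetCoreInstrumentation()
            .AddHttpClientInstrumentation()
            .AddOtlpExporter());

    // Logs are wired up through the logging builder.
    builder.Logging.AddOpenTelemetry(o =>
    {
        o.IncludeFormattedMessage = true;
        o.AddOtlpExporter();
    });

    var app = builder.Build();
    app.Run();

Manual instrumentation (custom ActivitySources, Meters and log scopes) is layered on top of a setup like this, which is what the self-paced tasks below exercise.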

Automatic instrumentation

This setup includes 3 containers (dependency1..3) that use automatic instrumentation. The two main ways of doing this are:

  • Download and run otel-dotnet-auto-install.sh or OpenTelemetry.DotNet.Auto.psm1 or
  • Include Nuget OpenTelemetry.AutoInstrumentation

We have used the latter approach. Refer to dependency1 in docker-compose.yaml for an example. This example includes ENV variables for easier debugging.

For more information refer to:

Example App

    .
    ├── exampleAPI.http        <- To run HTTP commands. An alternative to using Swagger or a browser
    ├── MapRoutesExtensions.cs <- Sets up the routes
    ├── SetupOpentelemetry.cs  <- All OpenTelemetry setup for Logging, Tracing and Metrics
    └── Program.cs...          <- All the normal stuff

Start the app and go to http://localhost:5000/

Grafana and some fundamentals for viewing data

Grafana shows the data using 3 data sources:

  • Tempo for tracing
  • Prometheus for metrics
  • Loki for Logging

PS: The message Failed to authenticate request might appear. This should not have any impact, but it is noisy. Go to Sign in in the top right corner and sign in with username admin and password admin.

Before looking at data, you need to generate some. You can do that by running some of the HTTP requests in exampleAPI.http.

Then open Grafana on localhost:3000

Tempo - Tracing - Connecting components

flowchart LR
    A["Grafana Home"] --> B("Explore") --> C("Select source 'Tempo' - Default data source is 'Loki'") --> D("Select 'Search' - Default is 'TraceQL'")

    style A fill:blue
    style B fill:Yellow
    style C fill:Green
    style D fill:Red

Read more about TraceQL here: https://grafana.com/docs/tempo/latest/traceql/

Should look something like this:

Prometheus - Metrics - Statistics

flowchart LR
    A["Grafana Home"] --> B("Explore") --> C("Select source 'Prometheus' - Default data source is 'Loki'") --> D("Metrics browser'")

    style A fill:blue
    style B fill:Yellow
    style C fill:Green
    style D fill:Red

Prometheus uses PromQL as a query language. Here are some examples: https://prometheus.io/docs/prometheus/latest/querying/examples/

You should be able to run http_client_request_duration_seconds_count{}. Send a few HTTP requests first, e.g. from exampleAPI.http, to generate data.

Should look something like this:

Loki - Logging - Telling the story

flowchart LR
    A["Grafana Home"] --> B("Explore") --> C("Default data source is 'Loki'") --> D("Add a LogQL")

    style A fill:blue
    style B fill:Yellow
    style C fill:Green
    style D fill:Red

Loki uses LogQL. Refer to https://grafana.com/docs/loki/latest/query/. You can start with the LogQL query {exporter="OTLP"}. This will show all log records exported by the OpenTelemetry Collector.

Should look something like this:

Dashboards

Self-paced Tasks

Verify Setup and getting started

Has everything started?

Is the .NET app connected correctly?

  • Go to Loki and look at the last 5 minutes of logs
  • Confirm that the Service name has not been set. It should be unknown_service:dotnet (at this stage)
  • Understand the basics of the setup. Refer to SetupOpentelemetry.cs
  • Verify that ExampleApi is also sending Metrics and Traces to the OpenTelemetry Collector by checking Prometheus and Tempo in Grafana. Refer to the section Grafana and some fundamentals for viewing data

Tracing

Task T1: Reducing noise

Task T2: Use the API and understand the delay in the execution

  • Send requests to the APIs
  • Is it easy to understand what is causing the delay? Check whether you can see it in Tempo
  • Find out why tracing is missing in ThisNeedsToBeTraced. Note that you need to start the activity, not only create it - see the sketch below.
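
For reference, a minimal sketch of the difference. The ActivitySource name and the method body are assumptions, not the repo's actual code:

    using System.Diagnostics;

    // The source name is a placeholder; it must also be registered with
    // .AddSource("ExampleApi") in the tracing setup for its spans to be recorded.
    private static readonly ActivitySource Source = new("ExampleApi");

    public void ThisNeedsToBeTraced()
    {
        // CreateActivity only builds the Activity object - nothing is recorded
        // until Start() is called. StartActivity creates and starts it in one step.
        using var activity = Source.StartActivity("ThisNeedsToBeTraced");
        // ... do the work that should show up as a span in Tempo ...
    }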

Task T3: Use the API and get exceptions

Metrics

Task M1: Understanding the data flow

Task M2: What metrics are added?

Task M3: Add a custom metric

  • Add another counter to the setup - see the sketch below for the general pattern
  • Verify that the new counter shows up in your metrics
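
The general pattern for a custom counter looks roughly like this. The meter and counter names are made up for illustration:

    using System.Diagnostics.Metrics;

    // The meter name is a placeholder; it must also be registered with
    // .AddMeter("ExampleApi.Custom") in the metrics setup to be exported.
    private static readonly Meter Meter = new("ExampleApi.Custom");
    private static readonly Counter<long> RequestCounter =
        Meter.CreateCounter<long>("example_custom_requests");

    // Somewhere in a request handler:
    // RequestCounter.Add(1);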

Logging

Task L1: Add Logging and understand the log record

  • Open /parallel
  • Open Loki with LogQL {exporter="OTLP"}
    • Get to know the log record; the sketch after this task shows where the fields come from. Some key fields are:
      • body
      • severity
      • attributes
      • resources
      • instrumentation scope name
  • Open /metric/inc/10
    • How did the instrumentation scope change?
  • Stop the application
    • Look at the log record with body "Application is shutting down...". Does it have spanid and traceid? Why?
  • PS: LogQL {exporter="OTLP"} | json | line_format "{{.body}}" gives a clean printout of the log record body. Try it and observe how the log records are flattened.
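
To see where those fields come from, consider a structured log call like the following. The message and values are hypothetical:

    // "body" holds the message (template or formatted text, depending on IncludeFormattedMessage),
    // the named placeholders become "attributes", the log level maps to "severity",
    // the logger's category (e.g. ILogger<SuperService>) becomes the instrumentation scope name,
    // and "resources" come from the resource configured in SetupOpentelemetry.cs.
    logger.LogInformation("Processed {ItemCount} items in {ElapsedMs} ms", 10, 123);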

Task L2: Find Traces from log record

  • Send a request to http://localhost:5000/parallel
  • Go to the log lines created while this request was processed, and find the trace ID
  • Go to the trace in Tempo.

Task L3: Using scopes

  • Go to SuperService.cs and add a logger.BeginScope call - see the sketch below
  • What do you see in Loki? What is added to the log record from the scope state, and where?
  • Advanced/optional:
    • Try duplicating the scope call
    • Do you see the duplicated entry?
    • What do you get if you mutate the value (and not the key) passed to logger.BeginScope?
    • Look at the log record in the custom processor in SetupOpentelemetry.cs
    • And look in the OtlpLogRecordTransformer. Can you find the duplicated attribute key?
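
A minimal example of what such a scope call can look like. The key and value are made up, and scope state only reaches the log record if IncludeScopes is enabled in the logging setup:

    using (logger.BeginScope(new Dictionary<string, object> { ["orderId"] = 42 }))
    {
        // Log records written inside the scope carry the scope state,
        // which ends up as attributes on the exported log record.
        logger.LogInformation("Doing some work inside the scope");
    }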

Task L4: Include formatted message?

  • Go to SetupOpentelemetry.cs, and set IncludeFormattedMessage = false in the configuration of the logging.
  • Open /parallel
  • Go to Loki and observe how the body changes - a small illustration follows below.
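
Roughly what to expect, reusing the hypothetical log call from Task L1:

    // IncludeFormattedMessage = true  -> body: "Processed 10 items in 123 ms"
    // IncludeFormattedMessage = false -> body: "Processed {ItemCount} items in {ElapsedMs} ms"
    //                                    (the values remain available as attributes)
    logger.LogInformation("Processed {ItemCount} items in {ElapsedMs} ms", 10, 123);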

Bonus

Task B1: Run k6 test and observe the system under load

  • Install k6, e.g. using the Windows installer: https://dl.k6.io/msi/k6-latest-amd64.msi
  • Docs: https://k6.io/docs/
  • Run the test with K6_PROMETHEUS_RW_SERVER_URL=http://localhost:9090/api/v1/write K6_PROMETHEUS_RW_TREND_AS_NATIVE_HISTOGRAM=true k6 run -o experimental-prometheus-rw test/script.js
  • View the result in the dashboard called k6 Prometheus

Task B2: Configure using signal-specific AddOtlpExporter methods - see the sketch below
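
In rough terms this means configuring the OTLP exporter per signal instead of sharing one configuration. A sketch, with the endpoint taken from the architecture diagram and everything else illustrative:

    using OpenTelemetry.Exporter;
    using OpenTelemetry.Metrics;
    using OpenTelemetry.Trace;

    builder.Services.AddOpenTelemetry()
        .WithTracing(t => t.AddOtlpExporter(o =>
        {
            o.Endpoint = new Uri("http://localhost:4317");
            o.Protocol = OtlpExportProtocol.Grpc;
        }))
        .WithMetrics(m => m.AddOtlpExporter(o =>
        {
            o.Endpoint = new Uri("http://localhost:4317");
            o.Protocol = OtlpExportProtocol.Grpc;
        }));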

Task B3: Debugging

Where can I go from here?

You can e.g.

Credit

This setup was originally based on: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/examples/demo