Copilot AI commented Nov 18, 2025

Implements standard Prometheus metrics across all services according to ADR-0014 (Observability Tools) and ADR-0018 (Service Metrics). Previously, services lacked consistent metrics for version tracking, SLA monitoring, and cache performance.

Changes

Metrics Implementation

Added standard metric sets to 4 services (link, metadata, bff, proxy):

Build Information - Exposes version, commit, build time

// Go services
buildInfo.WithLabelValues(version, commit, buildTime).Set(1)
// Resulting metric name: <service>_application_build_info
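As a rough illustration of what a scrape of this gauge might return, the sketch below renders one sample line in the Prometheus text exposition format using only the standard library. The `BuildInfo` type, the `ExpositionLine` method, and the label names `version`, `commit`, and `build_time` are assumptions made for this example; the PR's actual label names may differ.

```go
package main

import "fmt"

// BuildInfo is a hypothetical holder for the three values the PR
// passes to buildInfo.WithLabelValues.
type BuildInfo struct {
	Version   string
	Commit    string
	BuildTime string
}

// ExpositionLine renders the build-info gauge as a single sample line.
// The value is always 1; the information lives in the labels.
// Label names are assumed. %q quoting matches Prometheus label escaping
// for plain ASCII values such as version strings and commit hashes.
func (b BuildInfo) ExpositionLine(service string) string {
	return fmt.Sprintf(
		"%s_application_build_info{version=%q,commit=%q,build_time=%q} 1",
		service, b.Version, b.Commit, b.BuildTime,
	)
}
```

Setting the gauge to a constant 1 and carrying the data in labels is the usual "info metric" pattern, which lets dashboards join version labels onto other series.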

SLA/SLO/SLI Metrics

  • service_availability_ratio - Uptime tracking for SLA compliance
  • http_request_duration_seconds - Response time histogram for SLO monitoring
  • error_rate_per_minute - Error counter for reliability SLI

Cache Metrics

  • cache_hit_total / cache_miss_total - Cache effectiveness tracking
  • Hit ratio calculated as hits / (hits + misses)
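The hit-ratio formula above can be sketched with two in-process counters. `CacheStats` and its methods are hypothetical stand-ins for the `cache_hit_total` and `cache_miss_total` counters, not the PR's code; the guard for an empty cache is an added assumption to avoid dividing by zero.

```go
package main

import "sync/atomic"

// CacheStats is a hypothetical stand-in for the two cache counters.
// atomic.Uint64 keeps the counters safe under concurrent access.
type CacheStats struct {
	hits   atomic.Uint64
	misses atomic.Uint64
}

// RecordHit counts one cache hit.
func (s *CacheStats) RecordHit() { s.hits.Add(1) }

// RecordMiss counts one cache miss.
func (s *CacheStats) RecordMiss() { s.misses.Add(1) }

// HitRatio returns hits / (hits + misses).
// It returns 0 when no lookups have been recorded yet.
func (s *CacheStats) HitRatio() float64 {
	h, m := s.hits.Load(), s.misses.Load()
	if h+m == 0 {
		return 0
	}
	return float64(h) / float64(h+m)
}
```

With Prometheus counters, the same ratio would more commonly be computed at query time from the two series rather than inside the service.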

Service Metrics

  • requests_per_second - Load tracking
  • response_time_seconds - Performance histogram
  • error_rate_percentage - Error rate gauge

Service Updates

  • Go services (link, metadata, bff): Added metrics.go with Prometheus client using promauto
  • Proxy service (TypeScript): Added StandardMetrics.ts using OpenTelemetry Metrics API
  • Updated all service main files to set build info on startup

Documentation

  • docs/tutorial/observability-metrics.md - Complete metrics reference with PromQL examples
  • docs/tutorial/observability-upgrade-summary.md - Implementation guide and integration steps
  • Updated docs/tutorial/observability.md with references

Integration Requirements

Services require build-time variable injection:

go build -ldflags "-X main.version=$VERSION -X main.commit=$COMMIT -X main.buildTime=$(date -u +%Y-%m-%dT%H:%M:%SZ)"
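For the `-X` flags above to take effect, the `main` package needs matching package-level string variables. A minimal sketch follows; the default values and the `buildLabels` helper are assumptions for illustration, while the variable names come from the build command.

```go
package main

// These variables are overwritten at link time by
// -ldflags "-X main.version=... -X main.commit=... -X main.buildTime=...".
// -X only works on package-level string variables.
// The defaults below (assumed for this sketch) apply to a plain `go build`.
var (
	version   = "dev"
	commit    = "unknown"
	buildTime = "unknown"
)

// buildLabels is a hypothetical helper that returns the values in the
// order the PR passes them to buildInfo.WithLabelValues.
func buildLabels() []string {
	return []string{version, commit, buildTime}
}
```

Keeping non-empty defaults means a locally built binary still reports a recognizable build-info series instead of empty labels.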

The metrics are registered and exposed, but nothing records values into them yet. Each service still needs to call the metric functions from:

  • Cache implementations (record hits/misses)
  • HTTP middleware (record durations)
  • Error handlers (increment error counters)

Warning

Firewall rules blocked me from connecting to one or more addresses.

I tried to connect to the following addresses, but was blocked by firewall rules:

  • https://api.github.com/repos/open-telemetry/opentelemetry-go/releases/latest
    • Triggering command: curl -s REDACTED (http block)

If you need me to access, download, or install something from one of these locations, you can either:

Original prompt

Upgrade observability in the project



Copilot AI self-assigned this Nov 18, 2025
Copilot AI and others added 3 commits November 18, 2025 14:58
- Implement build_info metrics per ADR-0014
- Add SLA/SLO/SLI metrics per ADR-0018
- Add cache metrics (hit/miss tracking)
- Add standard service metrics (RPS, response time, error rate)
- Update all Go services (link, metadata, bff)
- Update TypeScript service (proxy)
- Add comprehensive documentation

Co-authored-by: batazor <[email protected]>
@sonarqubecloud

Quality Gate failed

Failed conditions
70.9% Duplication on New Code (required ≤ 3%)


Copilot AI changed the title from "[WIP] Upgrade observability tools in the project" to "Add standard observability metrics across all services per ADR-0018" Nov 18, 2025
Copilot AI requested a review from batazor November 18, 2025 15:06
Copilot finished work on behalf of batazor November 18, 2025 15:06