Skip to content

Conversation

@endigma
Copy link
Member

@endigma endigma commented Nov 14, 2025

Summary by CodeRabbit

  • New Features

    • Added FlightRecorder to capture detailed traces when request latency exceeds a configurable threshold.
    • New command-line entrypoint to run the FlightRecorder mode.
  • Configuration

    • Exposed settings: output directory, latency threshold (ms), and single vs. multiple trace captures.
  • Documentation

    • Usage guide with examples, run/build instructions, configuration details, and environment setup notes.

✏️ Tip: You can customize this high-level summary in your review settings.

Checklist

  • I have discussed my proposed changes in an issue and have received approval to proceed.
  • I have followed the coding standards of the project.
  • Tests or benchmarks have been added or updated.
  • Documentation has been updated on https://github.com/wundergraph/cosmo-docs.
  • I have read the Contributors Guide.

@coderabbitai
Copy link

coderabbitai bot commented Nov 14, 2025

Walkthrough

Adds a new FlightRecorder module (recording request traces to files when latency exceeds a threshold), a Go executable entrypoint, and README documentation; integrates with router lifecycle via Provision and Cleanup and supports single or multiple recordings.

Changes

Cohort / File(s) Summary
Documentation
router/cmd/flightrecorder/README.md
Adds README describing module usage, configuration fields (outputPath, requestLatencyRecordThreshold, recordMultiple), minimal example main, run/build commands, environment setup, and references to integration tests.
Entrypoint
router/cmd/flightrecorder/main.go
New Go executable entrypoint that empty-imports the flightrecorder module for side effects and delegates to routercmd.Main().
Module Implementation
router/cmd/flightrecorder/module/module.go
Adds FlightRecorder type and init() registration. Implements Provision, Cleanup, RouterOnRequest, and RecordTrace. Validates config, prepares OutputPath, configures and starts a trace.FlightRecorder, records traces to timestamped files when latency exceeds RequestLatencyRecordThreshold, and optionally stops after a single record when RecordMultiple is false. Adds public fields OutputPath, RecordMultiple, RequestLatencyRecordThreshold, interface guards, and module metadata.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Check conversion/validation of RequestLatencyRecordThreshold to time.Duration.
  • Verify creation, permissions, and error handling for OutputPath files/directories.
  • Inspect lifecycle and concurrency when stopping the trace.FlightRecorder (especially RecordMultiple == false) and any timing/sleep logic in RouterOnRequest.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: adding a custom module that implements a flight recorder feature.
✨ Finishing touches
  • 📝 Generate docstrings

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between de3f5eb and 4ddd0c3.

📒 Files selected for processing (1)
  • router/cmd/flightrecorder/module/module.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • router/cmd/flightrecorder/module/module.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: build-router
  • GitHub Check: build_test
  • GitHub Check: build_push_image
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: image_scan
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: Analyze (go)

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link

github-actions bot commented Nov 14, 2025

Router-nonroot image scan passed

✅ No security vulnerabilities found in image:

ghcr.io/wundergraph/cosmo/router:sha-8737374085a2ce1e5bacaea2135574db6f18f1be-nonroot

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (4)
router/cmd/flightrecorder/README.md (1)

5-17: Replace hard tabs with spaces in code example.

The code example uses hard tabs on lines 9, 10, 11, and 15, which should be replaced with spaces to align with standard Go formatting conventions.

Apply this diff:

 package main
 
 import (
-	routercmd "github.com/wundergraph/cosmo/router/cmd"
-	// Import your modules here
-	_ "github.com/wundergraph/cosmo/router/cmd/flightrecorder/module"
+    routercmd "github.com/wundergraph/cosmo/router/cmd"
+    // Import your modules here
+    _ "github.com/wundergraph/cosmo/router/cmd/flightrecorder/module"
 )
 
 func main() {
-	routercmd.Main()
+    routercmd.Main()
 }
router/cmd/flightrecorder/module/module.go (3)

29-29: Remove template comment.

This comment appears to be leftover from scaffolding and should be removed.

Apply this diff:

     requestLatencyRecordThresholdDuration time.Duration
 
-    // Add a new property here
     fl *trace.FlightRecorder

100-100: Fix error message format.

The error message uses a format string with %w but should rely on zap.Error for structured logging.

Apply this diff:

-        m.Logger.Error("failed to create trace file: %w", zap.Error(err))
+        m.Logger.Error("failed to create trace file", zap.Error(err))

92-113: Add file cleanup mechanism and sanitize operation names.

Two operational concerns:

  1. No file cleanup/rotation: Trace files accumulate indefinitely in the output directory, which could eventually exhaust disk space. Consider implementing a retention policy (e.g., delete files older than N days, or keep only the last N files).

  2. Unsanitized operation names in filenames: Operation names are used directly in filenames (line 94), but may contain characters that are invalid for filenames (/, \, :, *, ?, ", <, >, |). This could cause os.Create to fail.

Example sanitization approach:

import (
    "regexp"
    // ... other imports
)

var invalidFilenameChars = regexp.MustCompile(`[<>:"/\\|?*]`)

func sanitizeOperationName(name string) string {
    // Replace invalid characters with underscores
    sanitized := invalidFilenameChars.ReplaceAllString(name, "_")
    // Limit length to avoid filesystem limits
    if len(sanitized) > 100 {
        sanitized = sanitized[:100]
    }
    return sanitized
}

func (m *FlightRecorder) RecordTrace(operationName string) {
    sanitized := sanitizeOperationName(operationName)
    filename := fmt.Sprintf("trace-%s-%s.out", sanitized, time.Now().Format("2006-01-02-15-04-05"))
    // ... rest of the method
}

For file cleanup, consider adding a cleanup routine in the Provision method or as a background goroutine.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f94a099 and 12c2f7d.

📒 Files selected for processing (3)
  • router/cmd/flightrecorder/README.md (1 hunks)
  • router/cmd/flightrecorder/main.go (1 hunks)
  • router/cmd/flightrecorder/module/module.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
router/cmd/flightrecorder/main.go (1)
router/cmd/main.go (1)
  • Main (42-277)
router/cmd/flightrecorder/module/module.go (2)
router/core/modules.go (7)
  • RegisterModule (56-73)
  • ModuleContext (160-164)
  • Module (52-54)
  • ModuleInfo (44-50)
  • RouterOnRequestHandler (113-115)
  • Provisioner (149-152)
  • Cleaner (154-157)
router/core/context.go (1)
  • RequestContext (61-142)
🪛 markdownlint-cli2 (0.18.1)
router/cmd/flightrecorder/README.md

9-9: Hard tabs
Column: 1

(MD010, no-hard-tabs)


10-10: Hard tabs
Column: 1

(MD010, no-hard-tabs)


11-11: Hard tabs
Column: 1

(MD010, no-hard-tabs)


15-15: Hard tabs
Column: 1

(MD010, no-hard-tabs)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: build-router
  • GitHub Check: build_push_image
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: image_scan
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: build_test
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: integration_test (./events)
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
🔇 Additional comments (1)
router/cmd/flightrecorder/main.go (1)

1-10: LGTM!

The entrypoint correctly delegates to the router's main function and imports the module for side-effect registration. This is the standard pattern for module initialization.

@endigma endigma force-pushed the jesse/eng-8466-implement-flight-recorder-telemetry branch from 67237dd to e0770d1 Compare November 20, 2025 14:04
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
router/cmd/flightrecorder/README.md (2)

21-21: Clarify which configuration file contains the modules section.

Line 21 references "the main router configuration file," but doesn't specify the filename (e.g., config.yaml, router.yaml). Provide the explicit filename to reduce ambiguity for users.


45-45: Provide more context for .env.example setup.

The note mentions copying .env.example but doesn't explain what environment variables are needed or what the router should connect to. Either expand this section with required env vars or reference the main router documentation where this is covered.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 12c2f7d and e0770d1.

📒 Files selected for processing (3)
  • router/cmd/flightrecorder/README.md (1 hunks)
  • router/cmd/flightrecorder/main.go (1 hunks)
  • router/cmd/flightrecorder/module/module.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • router/cmd/flightrecorder/main.go
  • router/cmd/flightrecorder/module/module.go
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
router/cmd/flightrecorder/README.md

9-9: Hard tabs
Column: 1

(MD010, no-hard-tabs)


10-10: Hard tabs
Column: 1

(MD010, no-hard-tabs)


11-11: Hard tabs
Column: 1

(MD010, no-hard-tabs)


15-15: Hard tabs
Column: 1

(MD010, no-hard-tabs)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: build-router
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: integration_test (./events)
  • GitHub Check: build_test
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: build_push_image
  • GitHub Check: Analyze (go)

@endigma endigma force-pushed the jesse/eng-8466-implement-flight-recorder-telemetry branch from e0770d1 to de3f5eb Compare November 20, 2025 15:24
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
router/cmd/flightrecorder/README.md (2)

9-16: Replace hard tabs with spaces in the Go code block.

The fenced Go example still uses hard tabs on the import and routercmd.Main() lines, which violates the repo’s markdownlint rule (MD010) and can render inconsistently. Please replace those tabs with the project’s standard indentation (e.g., 4 spaces).


3-3: Clarify wording: reference the runtime/trace FlightRecorder API, not a standalone module.

The text calls this the "new flightrecorder module from Go 1.25". To match Go’s terminology, this should point to the FlightRecorder API in the runtime/trace package (the blog link is fine), e.g., “…uses the new FlightRecorder API in the runtime/trace package (Go 1.25)…”.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e0770d1 and de3f5eb.

📒 Files selected for processing (3)
  • router/cmd/flightrecorder/README.md (1 hunks)
  • router/cmd/flightrecorder/main.go (1 hunks)
  • router/cmd/flightrecorder/module/module.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • router/cmd/flightrecorder/main.go
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.

Applied to files:

  • router/cmd/flightrecorder/module/module.go
  • router/cmd/flightrecorder/README.md
📚 Learning: 2025-08-20T22:13:25.222Z
Learnt from: StarpTech
Repo: wundergraph/cosmo PR: 2157
File: router-tests/go.mod:16-16
Timestamp: 2025-08-20T22:13:25.222Z
Learning: github.com/mark3labs/mcp-go v0.38.0 has regressions and should not be used in the wundergraph/cosmo project. v0.36.0 is the stable version that should be used across router-tests and other modules.

Applied to files:

  • router/cmd/flightrecorder/README.md
🧬 Code graph analysis (1)
router/cmd/flightrecorder/module/module.go (2)
router/core/modules.go (7)
  • RegisterModule (56-73)
  • ModuleContext (160-164)
  • Module (52-54)
  • ModuleInfo (44-50)
  • RouterOnRequestHandler (113-115)
  • Provisioner (149-152)
  • Cleaner (154-157)
router/core/context.go (1)
  • RequestContext (61-142)
🪛 markdownlint-cli2 (0.18.1)
router/cmd/flightrecorder/README.md

9-9: Hard tabs
Column: 1

(MD010, no-hard-tabs)


10-10: Hard tabs
Column: 1

(MD010, no-hard-tabs)


11-11: Hard tabs
Column: 1

(MD010, no-hard-tabs)


15-15: Hard tabs
Column: 1

(MD010, no-hard-tabs)

🔇 Additional comments (1)
router/cmd/flightrecorder/module/module.go (1)

83-97: Router hook logic looks sound.

The request hook wraps next.ServeHTTP, measures duration, and only records when Enabled() is true and the latency threshold is exceeded. This is a clean, low-overhead integration point and aligns with the module’s purpose.

Copy link
Contributor

@StarpTech StarpTech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

}

type FlightRecorder struct {
OutputPath string `mapstructure:"outputPath"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why mapstructure is used here and other custom modules?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just how it works, the actual config struct has just map[any]any so we use map structure to get it out of that

}

m.fl = trace.NewFlightRecorder(trace.FlightRecorderConfig{
MinAge: m.requestLatencyRecordThresholdDuration,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a pro at flightrecorder, but their blog recommends to set MinAge twice the duration of even we want to capture:

MinAge configures the duration for which trace data is reliably retained, and we suggest setting it to around 2x the time window of the event. For example, if you are debugging a 5-second timeout, set it to 10 seconds.

https://go.dev/blog/flight-recorder

Copy link
Member Author

@endigma endigma Nov 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I double it when I parse it?


m.Logger.Warn("Request took longer than threshold", zap.Duration("duration", requestDuration), zap.String("operation", operation.Name()))

m.RecordTrace(operation.Name())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We capture events synchronously. I don't know how much time is takes to capture it. But if it is a long time we could do it in the goroutine, thus reducing the latency for the client (but creating a little bit of additional load on router).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do it synchronously so we get the correct period of time, if we goroutine it it could happen later which would defeat the point

modules:
flightRecorder:
outputPath: './flight_recorder_data'
requestLatencyRecordThreshold: 100
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding units would simplify editing configs:

Suggested change
requestLatencyRecordThreshold: 100
requestLatencyRecordThresholdMillis: 100


### `recordMultiple`

The `recordMultiple` is a boolean that indicates whether the flight recorder should record multiple traces or not. Defaults to `false`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit. Looks like "or not" is redundant here

Suggested change
The `recordMultiple` is a boolean that indicates whether the flight recorder should record multiple traces or not. Defaults to `false`.
The `recordMultiple` is a boolean that indicates whether the flight recorder should record multiple traces. Defaults to `false`.


## Run tests

Tests for this module can be found within the [integration tests](../router-tests/module).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Tests for this module can be found within the [integration tests](../router-tests/module).
Tests for this module can be found within the [integration tests](../router-tests/modules).

But I don't see any such tests there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, it's copypasted README

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants