-
Notifications
You must be signed in to change notification settings - Fork 192
feat: add custom module implementing flight recorder #2334
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
WalkthroughAdds a new FlightRecorder module (recording request traces to files when latency exceeds a threshold), a Go executable entrypoint, and README documentation; integrates with router lifecycle via Provision and Cleanup and supports single or multiple recordings. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
Comment |
Router-nonroot image scan passed✅ No security vulnerabilities found in image: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Nitpick comments (4)
router/cmd/flightrecorder/README.md (1)
5-17: Replace hard tabs with spaces in code example.The code example uses hard tabs on lines 9, 10, 11, and 15, which should be replaced with spaces to align with standard Go formatting conventions.
Apply this diff:
package main import ( - routercmd "github.com/wundergraph/cosmo/router/cmd" - // Import your modules here - _ "github.com/wundergraph/cosmo/router/cmd/flightrecorder/module" + routercmd "github.com/wundergraph/cosmo/router/cmd" + // Import your modules here + _ "github.com/wundergraph/cosmo/router/cmd/flightrecorder/module" ) func main() { - routercmd.Main() + routercmd.Main() }router/cmd/flightrecorder/module/module.go (3)
29-29: Remove template comment.This comment appears to be leftover from scaffolding and should be removed.
Apply this diff:
requestLatencyRecordThresholdDuration time.Duration - // Add a new property here fl *trace.FlightRecorder
100-100: Fix error message format.The error message uses a format string with
%wbut should rely onzap.Errorfor structured logging.Apply this diff:
- m.Logger.Error("failed to create trace file: %w", zap.Error(err)) + m.Logger.Error("failed to create trace file", zap.Error(err))
92-113: Add file cleanup mechanism and sanitize operation names.Two operational concerns:
No file cleanup/rotation: Trace files accumulate indefinitely in the output directory, which could eventually exhaust disk space. Consider implementing a retention policy (e.g., delete files older than N days, or keep only the last N files).
Unsanitized operation names in filenames: Operation names are used directly in filenames (line 94), but may contain characters that are invalid for filenames (
/,\,:,*,?,",<,>,|). This could causeos.Createto fail.Example sanitization approach:
import ( "regexp" // ... other imports ) var invalidFilenameChars = regexp.MustCompile(`[<>:"/\\|?*]`) func sanitizeOperationName(name string) string { // Replace invalid characters with underscores sanitized := invalidFilenameChars.ReplaceAllString(name, "_") // Limit length to avoid filesystem limits if len(sanitized) > 100 { sanitized = sanitized[:100] } return sanitized } func (m *FlightRecorder) RecordTrace(operationName string) { sanitized := sanitizeOperationName(operationName) filename := fmt.Sprintf("trace-%s-%s.out", sanitized, time.Now().Format("2006-01-02-15-04-05")) // ... rest of the method }For file cleanup, consider adding a cleanup routine in the Provision method or as a background goroutine.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
router/cmd/flightrecorder/README.md(1 hunks)router/cmd/flightrecorder/main.go(1 hunks)router/cmd/flightrecorder/module/module.go(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
router/cmd/flightrecorder/main.go (1)
router/cmd/main.go (1)
Main(42-277)
router/cmd/flightrecorder/module/module.go (2)
router/core/modules.go (7)
RegisterModule(56-73)ModuleContext(160-164)Module(52-54)ModuleInfo(44-50)RouterOnRequestHandler(113-115)Provisioner(149-152)Cleaner(154-157)router/core/context.go (1)
RequestContext(61-142)
🪛 markdownlint-cli2 (0.18.1)
router/cmd/flightrecorder/README.md
9-9: Hard tabs
Column: 1
(MD010, no-hard-tabs)
10-10: Hard tabs
Column: 1
(MD010, no-hard-tabs)
11-11: Hard tabs
Column: 1
(MD010, no-hard-tabs)
15-15: Hard tabs
Column: 1
(MD010, no-hard-tabs)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: build-router
- GitHub Check: build_push_image
- GitHub Check: build_push_image (nonroot)
- GitHub Check: image_scan
- GitHub Check: image_scan (nonroot)
- GitHub Check: build_test
- GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
- GitHub Check: integration_test (./events)
- GitHub Check: integration_test (./telemetry)
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Analyze (go)
🔇 Additional comments (1)
router/cmd/flightrecorder/main.go (1)
1-10: LGTM!The entrypoint correctly delegates to the router's main function and imports the module for side-effect registration. This is the standard pattern for module initialization.
67237dd to
e0770d1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (2)
router/cmd/flightrecorder/README.md (2)
21-21: Clarify which configuration file contains the modules section.Line 21 references "the main router configuration file," but doesn't specify the filename (e.g.,
config.yaml,router.yaml). Provide the explicit filename to reduce ambiguity for users.
45-45: Provide more context for.env.examplesetup.The note mentions copying
.env.examplebut doesn't explain what environment variables are needed or what the router should connect to. Either expand this section with required env vars or reference the main router documentation where this is covered.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
router/cmd/flightrecorder/README.md(1 hunks)router/cmd/flightrecorder/main.go(1 hunks)router/cmd/flightrecorder/module/module.go(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- router/cmd/flightrecorder/main.go
- router/cmd/flightrecorder/module/module.go
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
router/cmd/flightrecorder/README.md
9-9: Hard tabs
Column: 1
(MD010, no-hard-tabs)
10-10: Hard tabs
Column: 1
(MD010, no-hard-tabs)
11-11: Hard tabs
Column: 1
(MD010, no-hard-tabs)
15-15: Hard tabs
Column: 1
(MD010, no-hard-tabs)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: build-router
- GitHub Check: integration_test (./telemetry)
- GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
- GitHub Check: integration_test (./events)
- GitHub Check: build_test
- GitHub Check: build_push_image (nonroot)
- GitHub Check: build_push_image
- GitHub Check: Analyze (go)
e0770d1 to
de3f5eb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (2)
router/cmd/flightrecorder/README.md (2)
9-16: Replace hard tabs with spaces in the Go code block.The fenced Go example still uses hard tabs on the import and
routercmd.Main()lines, which violates the repo’s markdownlint rule (MD010) and can render inconsistently. Please replace those tabs with the project’s standard indentation (e.g., 4 spaces).
3-3: Clarify wording: reference theruntime/traceFlightRecorder API, not a standalone module.The text calls this the "new
flightrecordermodule from Go 1.25". To match Go’s terminology, this should point to the FlightRecorder API in theruntime/tracepackage (the blog link is fine), e.g., “…uses the newFlightRecorderAPI in theruntime/tracepackage (Go 1.25)…”.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
router/cmd/flightrecorder/README.md(1 hunks)router/cmd/flightrecorder/main.go(1 hunks)router/cmd/flightrecorder/module/module.go(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- router/cmd/flightrecorder/main.go
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.
Applied to files:
router/cmd/flightrecorder/module/module.gorouter/cmd/flightrecorder/README.md
📚 Learning: 2025-08-20T22:13:25.222Z
Learnt from: StarpTech
Repo: wundergraph/cosmo PR: 2157
File: router-tests/go.mod:16-16
Timestamp: 2025-08-20T22:13:25.222Z
Learning: github.com/mark3labs/mcp-go v0.38.0 has regressions and should not be used in the wundergraph/cosmo project. v0.36.0 is the stable version that should be used across router-tests and other modules.
Applied to files:
router/cmd/flightrecorder/README.md
🧬 Code graph analysis (1)
router/cmd/flightrecorder/module/module.go (2)
router/core/modules.go (7)
RegisterModule(56-73)ModuleContext(160-164)Module(52-54)ModuleInfo(44-50)RouterOnRequestHandler(113-115)Provisioner(149-152)Cleaner(154-157)router/core/context.go (1)
RequestContext(61-142)
🪛 markdownlint-cli2 (0.18.1)
router/cmd/flightrecorder/README.md
9-9: Hard tabs
Column: 1
(MD010, no-hard-tabs)
10-10: Hard tabs
Column: 1
(MD010, no-hard-tabs)
11-11: Hard tabs
Column: 1
(MD010, no-hard-tabs)
15-15: Hard tabs
Column: 1
(MD010, no-hard-tabs)
🔇 Additional comments (1)
router/cmd/flightrecorder/module/module.go (1)
83-97: Router hook logic looks sound.The request hook wraps
next.ServeHTTP, measures duration, and only records whenEnabled()is true and the latency threshold is exceeded. This is a clean, low-overhead integration point and aligns with the module’s purpose.
StarpTech
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| } | ||
|
|
||
| type FlightRecorder struct { | ||
| OutputPath string `mapstructure:"outputPath"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why mapstructure is used here and other custom modules?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just how it works, the actual config struct has just map[any]any so we use map structure to get it out of that
| } | ||
|
|
||
| m.fl = trace.NewFlightRecorder(trace.FlightRecorderConfig{ | ||
| MinAge: m.requestLatencyRecordThresholdDuration, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a pro at flightrecorder, but their blog recommends to set MinAge twice the duration of even we want to capture:
MinAge configures the duration for which trace data is reliably retained, and we suggest setting it to around 2x the time window of the event. For example, if you are debugging a 5-second timeout, set it to 10 seconds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I double it when I parse it?
|
|
||
| m.Logger.Warn("Request took longer than threshold", zap.Duration("duration", requestDuration), zap.String("operation", operation.Name())) | ||
|
|
||
| m.RecordTrace(operation.Name()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We capture events synchronously. I don't know how much time is takes to capture it. But if it is a long time we could do it in the goroutine, thus reducing the latency for the client (but creating a little bit of additional load on router).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do it synchronously so we get the correct period of time, if we goroutine it it could happen later which would defeat the point
| modules: | ||
| flightRecorder: | ||
| outputPath: './flight_recorder_data' | ||
| requestLatencyRecordThreshold: 100 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding units would simplify editing configs:
| requestLatencyRecordThreshold: 100 | |
| requestLatencyRecordThresholdMillis: 100 |
|
|
||
| ### `recordMultiple` | ||
|
|
||
| The `recordMultiple` is a boolean that indicates whether the flight recorder should record multiple traces or not. Defaults to `false`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit. Looks like "or not" is redundant here
| The `recordMultiple` is a boolean that indicates whether the flight recorder should record multiple traces or not. Defaults to `false`. | |
| The `recordMultiple` is a boolean that indicates whether the flight recorder should record multiple traces. Defaults to `false`. |
|
|
||
| ## Run tests | ||
|
|
||
| Tests for this module can be found within the [integration tests](../router-tests/module). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Tests for this module can be found within the [integration tests](../router-tests/module). | |
| Tests for this module can be found within the [integration tests](../router-tests/modules). |
But I don't see any such tests there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, it's copypasted README
Summary by CodeRabbit
New Features
Configuration
Documentation
✏️ Tip: You can customize this high-level summary in your review settings.
Checklist