Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
42 changes: 38 additions & 4 deletions cmd/epp/main.go
Original file line number Diff line number Diff line change
Expand Up @@ -17,18 +17,52 @@ limitations under the License.
package main

import (
"fmt"
"os"
"runtime/debug"

ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/log/zap"

"sigs.k8s.io/gateway-api-inference-extension/cmd/epp/runner"
)

func main() {
// For adding out-of-tree plugins to the plugins registry, use the following:
// plugins.Register(my-out-of-tree-plugin-name, my-out-of-tree-plugin-factory-function)

if err := runner.NewRunner().Run(ctrl.SetupSignalHandler()); err != nil {
// Delegate to a run function.
// This ensures that if run() returns an error, we print it and exit.
// Crucially, it ensures that any defers inside run() execute BEFORE os.Exit is called.
if err := run(); err != nil {
// We use the global logger here assuming it was set during bootstrap or runner init.
// If setup failed completely, this writes to stderr.
ctrl.Log.Error(err, "Application exited with error")
os.Exit(1)
}
}

func run() error {
// Setup bootstrap logger.
// This logger is used for initialization errors before the Runner configures the user-specified logging format
// (JSON/Console, Verbosity).
bootstrapLog := zap.New(zap.UseDevMode(true))
ctrl.SetLogger(bootstrapLog)

// Panic Recovery: This catches panics on the main goroutine during initialization.
// Note: It will NOT catch panics in child goroutines spawned by the Manager.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Catching only some of the Panics is not very good. I think it should be all or nothing.

GIE code should not call Panic(). Errors should be returned as errors and not via calls to Panic().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Catching only some of the Panics is not very good. I think it should be all or nothing.

Capturing all runtime panics (from child goroutines) is the ideal end state. I was trying to keep this PR small in scope and focused solely on the boostrap failure modes.

GIE code should not call Panic(). Errors should be returned as errors and not via calls to Panic().

I agree that we should never use panic() for control flow or error handling. However, this recover block isn't intended to handle intentional panics. It is a safety net for unintentional bugs (e.g., nil pointer dereferences) during the bootstrap/initialization phase (and runtime if we address the point above).

Currently, if a panic occurs, the pod exits silently, making CrashLoopBackOff very difficult to debug. This change ensures we catch those failures and print a structured stack trace.

I personally have found this useful for debugging custom local builds when I am testing new features before sending out PRs. Ideally we never hit this case in production.

defer func() {
if r := recover(); r != nil {
err, ok := r.(error)
if !ok {
err = fmt.Errorf("%v", r)
}
bootstrapLog.Error(err, "CRITICAL: Process panic recovered", "stack", string(debug.Stack()))
os.Exit(1)
}
}()

ctx := ctrl.SetupSignalHandler()

// Execute Runner.
// For adding out-of-tree plugins to the plugins registry, use the following:
// plugins.Register(name, factory)
return runner.NewRunner().Run(ctx)
}