Skip to content

Conversation

@sgress454
Copy link
Contributor

@sgress454 sgress454 commented Oct 28, 2025

Related issue: Resolves #34528

Details

This PR implements the agent changes for allowing Fleet admins to require that users authenticate with an IdP prior to having their devices set up. I'll comment on changes inline but the high-level is:

  1. Orbit calls the enroll endpoint as usual. This is triggered lazily by any one of a number of subsystems like device token rotation or requesting Fleet config
  2. If the enroll endpoint returns the new ErrEndUserAuthRequired response, then it opens a window to the /mdm/sso Fleet page and retries the enroll endpoint every 30 seconds indefinitely.
  3. Any other non-200 response to the enroll request is treated as before (limited # of retries, with backoff)

Checklist for submitter

If some of the following don't apply, delete the relevant line.

Testing

  • Added/updated automated tests
    Added test for new retry logic

  • QA'd all new/changed functionality manually

This is kinda hard to test without the associated backend PR: #34835

fleetd/orbit/Fleet Desktop

  • Verified compatibility with the latest released version of Fleet (see Must rule)
    This is compatible with all Fleet versions, since older ones won't send the new error.
  • If the change applies to only one platform, confirmed that runtime.GOOS is used as needed to isolate changes
    This is compatible with all platforms, although it currently should only ever run on Windows and Linux since macOS devices will have end-user auth taken care of before they even download Orbit.
  • Verified that fleetd runs on macOS, Linux and Windows
    Testing this now.
  • Verified auto-update works from the released version of component to the new version (see tools/tuf/test)

Summary by CodeRabbit

  • New Features

    • Added SSO (Single Sign-On) enrollment support for end-user authentication
    • Enhanced error messaging for authentication-required scenarios
  • Bug Fixes

    • Improved error handling and retry logic for enrollment failures

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented a new WithErrorFilter option for the retry package which allows you to change the behavior of the retry loop based on what kind of error is returned from the function being retried. This allows us to continuously retry the enroll endpoint every 30 seconds indefinitely when an "end user auth required" error is returned, regardless of the max number of attempts or backoff multiplier configured.

Comment on lines 545 to +555
func() error {
var err error
orbitNodeKey_, err = oc.enrollAndWriteNodeKeyFile()
switch {
case err == nil:
return nil
case errors.Is(err, notFoundErr{}):
// Do not retry if the endpoint does not exist.
endpointDoesNotExist = true
return nil
default:
logging.LogErrIfEnvNotSet(constant.SilenceEnrollLogErrorEnvVar, err, "enroll failed, retrying")
return err
}
return err
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of this logic is now moved into the WithErrorFilter() function, which was implemented so that we could retry indefinitely when waiting for the user to log in to the IdP.

Comment on lines 570 to 587
// If we get an ErrEndUserAuthRequired error, then the user
// needs to authenticate with the identity provider.
//
// Open a browser window to the sign-on page and
// then keep retrying until they authenticate.
log.Debug().Msg("enroll unauthenticated, waiting for end-user to authenticate via SSO")
if !oc.initiatedIdpAuth {
oc.initiatedIdpAuth = true
log.Debug().Msg("opening SSO window")
openWindowErr := oc.openSSOWindow()
if openWindowErr != nil {
log.Error().Err(openWindowErr).Msg("opening SSO window")
return retry.ErrorOutcomeNormalRetry
}
}
// Sleep for 20 seconds, making the total retry interval 30 seconds
time.Sleep(20 * time.Second)
return retry.ErrorOutcomeResetAttempts
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we can't reliably close a browser window we've opened, for this first iteration we're just gonna open the window once per Orbit start. If the user closes it, they'll need to restart their computer to get it back. We may later implement a feature in Fleet Desktop to allow re-opening the SSO login window.

@sgress454
Copy link
Contributor Author

@coderabbitai review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 28, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 28, 2025

Walkthrough

This PR implements SSO authentication during Orbit enrollment by adding error handling for end-user authentication requirements, introducing flexible error filtering for retries, registering an SSO window callback, and modifying the enrollment flow to trigger SSO authentication when needed with stateful retry management.

Changes

Cohort / File(s) Summary
Retry error filtering enhancement
pkg/retry/retry.go, pkg/retry/retry_test.go
Introduces ErrorOutcome enum (NormalRetry, ResetAttempts, Ignore, DoNotRetry) and WithErrorFilter option to customize retry behavior based on error type. New tests verify ignore and donotretry paths.
End-user auth error handling
server/service/base_client.go, server/service/base_client_errors.go
Adds ErrEndUserAuthRequired error variable and parseResponse logic to return this error when HTTP 401 contains "END_USER_AUTH_REQUIRED" text.
Orbit enrollment SSO integration
server/service/orbit_client.go
Adds initiatedIdpAuth state tracking, openSSOWindow hook, and SetOpenSSOWindowFunc setter. Refactors getNodeKeyOrEnroll to use error filter for retry logic: NotFound halts retry, ErrEndUserAuthRequired triggers SSO window and resets attempts, others retry normally.
Orbit command SSO callback
orbit/cmd/orbit/orbit.go
Registers SetOpenSSOWindowFunc callback on orbit client to open browser to /mdm/sso with initiator and host_uuid parameters.

Sequence Diagram

sequenceDiagram
    participant OrbitCmd as Orbit Command
    participant OrbitClient as Orbit Client
    participant Enroll as Enroll Endpoint
    participant SSO as SSO Window
    participant IdP as IdP

    OrbitCmd->>OrbitClient: SetOpenSSOWindowFunc(callback)
    OrbitCmd->>OrbitClient: Enrollment flow starts
    OrbitClient->>Enroll: Attempt enrollment
    Enroll-->>OrbitClient: 401 END_USER_AUTH_REQUIRED
    OrbitClient->>OrbitClient: Detect error via filter
    rect rgb(200, 220, 255)
        Note over OrbitClient,SSO: First SSO attempt only
        OrbitClient->>SSO: openSSOWindow() → browser callback
        SSO->>IdP: Browser opens IdP auth
        IdP-->>SSO: User completes auth
    end
    OrbitClient->>OrbitClient: Mark initiatedIdpAuth, wait 20s
    OrbitClient->>OrbitClient: Reset retry attempts
    OrbitClient->>Enroll: Retry enrollment (with auth context)
    Enroll-->>OrbitClient: 200 OK (enrollment succeeds)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • server/service/orbit_client.go: Verify error filter logic correctly handles all ErrorOutcome cases; check state management of initiatedIdpAuth flag and 20-second wait timing; validate that NotFound specifically identifies missing enroll endpoint.
  • pkg/retry/retry.go: Confirm errorFilter precedence over normal retry path; verify all four ErrorOutcome branches behave as intended.
  • Callback registration flow: Ensure SSO window callback correctly constructs mdm/sso URL with initiator and host_uuid parameters and properly propagates errors.

Suggested reviewers

  • sharon-fdm
  • lukeheath
  • mostlikelee
  • georgekarrv

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
Check name Status Explanation
Title Check ✅ Passed The title "End-user authentication for Window/Linux setup experience: agent" directly and accurately describes the main change in this pull request. The modifications across six files all work together to implement exactly this feature: detecting when end-user authentication is required during enrollment, opening a browser window to the SSO page, and retrying enrollment indefinitely until authentication is completed. The title is specific and clear enough that a reviewer scanning git history would immediately understand the primary change without being misleading or overly vague.
Linked Issues Check ✅ Passed The PR implements the core coding requirements from issue #34528: detecting ErrEndUserAuthRequired responses from the enroll endpoint, opening a browser window to the SSO page via the SetOpenSSOWindowFunc callback, and implementing indefinite retry logic for authentication attempts (aligning with the chosen design to block enrollment until IdP authentication is complete). The retry infrastructure additions enable the new error filtering needed for this flow. Automated tests validate the new retry logic with error filtering. All changes directly address the stated objective to implement agent-side blocking of enrollment until IdP authentication is completed.
Out of Scope Changes Check ✅ Passed All six modified files contain changes directly related to the end-user authentication implementation objective. The retry package additions (ErrorOutcome enum, errorFilter field, WithErrorFilter option) are generic utilities but are necessary to support the new conditional retry behavior required by this feature. The orbit client changes directly implement the SSO flow and indefinite retries. The error detection in base_client.go and the new ErrEndUserAuthRequired variable support the authentication requirement signaling. No changes appear to introduce unrelated functionality or modifications outside the scope of implementing end-user authentication for Windows/Linux setup.
Description Check ✅ Passed The pull request description addresses the essential template requirements: it includes the related issue (#34528), provides clear details of the implementation with the three-step flow, indicates that automated tests were added and manual QA was performed, and notes compatibility considerations across platforms. While some checklist items remain unchecked (changes file, fleetd platform verification, auto-update testing), the author has acknowledged these as pending or deferred, which is acceptable for an active draft PR. The description provides sufficient context for understanding the change without being overly vague or incomplete.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch sgress454/34259-agent

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (7)
server/service/base_client_errors.go (1)

19-19: Add a doc comment for the new exported error.

Exported identifiers should have GoDoc. Add a short comment to satisfy linters and improve discoverability.

-	ErrEndUserAuthRequired   = errors.New("end user authentication required")
+	// ErrEndUserAuthRequired indicates the server requires end‑user SSO authentication before proceeding.
+	ErrEndUserAuthRequired   = errors.New("end user authentication required")
server/service/base_client.go (1)

52-60: Make 401 handling robust: parse error name and reason once.

Relying on a substring in the reason is brittle. Parse the JSON and check the error name; keep a case‑insensitive check for password reset.

-		errText := extractServerErrorText(response.Body)
-		if strings.Contains(errText, "password reset required") {
+		errName, errText := extractServerErrorNameReason(response.Body)
+		if strings.Contains(strings.ToLower(errText), "password reset required") {
 			return ErrPasswordResetRequired
 		}
-		if strings.Contains(errText, "END_USER_AUTH_REQUIRED") {
+		if errName == "END_USER_AUTH_REQUIRED" || strings.Contains(errText, "END_USER_AUTH_REQUIRED") {
 			return ErrEndUserAuthRequired
 		}
pkg/retry/retry.go (1)

8-20: Document new exported retry extension points.

Add brief GoDoc for the exported type, constants, and option to aid usage and linting. Logic looks good.

-type ErrorOutcome int
+// ErrorOutcome tells retry.Do how to react to a returned error.
+// Use with WithErrorFilter to control retry behavior.
+type ErrorOutcome int
@@
-const (
-	ErrorOutcomeNormalRetry ErrorOutcome = iota
-	// Reset attempts counter and backoff.
-	// Useful for hijacking the retry cycle to retry
-	// indefinitely until a certain condition is met.
-	ErrorOutcomeResetAttempts
-	// Ignore the error and return as if successful.
-	ErrorOutcomeIgnore
-	// Bail out of the retry loop immediately.
-	ErrorOutcomeDoNotRetry
-)
+const (
+	// ErrorOutcomeNormalRetry continues with the configured retry policy.
+	ErrorOutcomeNormalRetry ErrorOutcome = iota
+	// ErrorOutcomeResetAttempts resets attempts/backoff to initial values.
+	ErrorOutcomeResetAttempts
+	// ErrorOutcomeIgnore treats the error as success (no more retries).
+	ErrorOutcomeIgnore
+	// ErrorOutcomeDoNotRetry stops immediately and returns the error.
+	ErrorOutcomeDoNotRetry
+)
@@
-func WithErrorFilter(f func(error) ErrorOutcome) Option {
+// WithErrorFilter sets a function that maps errors to retry outcomes.
+// The filter is evaluated before max‑attempts/backoff handling.
+func WithErrorFilter(f func(error) ErrorOutcome) Option {
 	return func(c *config) {
 		c.errorFilter = f
 	}
 }

Also applies to: 26-26, 55-59

pkg/retry/retry_test.go (2)

72-108: Prefer sentinel errors and errors.Is over string comparisons.

Eliminates brittle string matching and makes intent clearer.

 	t.Run("with error filter (test ignore)", func(t *testing.T) {
 		count := 0
+		errNormal := errors.New("normal")
+		errReset := errors.New("reset")
+		errIgnore := errors.New("ignore")
 		err := Do(func() error {
 			count++
 			if count == 1 {
-				return errors.New("normal")
+				return errNormal
 			}
 			if count == 2 {
-				return errors.New("reset")
+				return errReset
 			}
 			if count == 3 {
-				return errors.New("ignore")
+				return errIgnore
 			}
 			return nil
 		},
@@
-			WithErrorFilter(func(err error) ErrorOutcome {
-				if err.Error() == "normal" {
+			WithErrorFilter(func(err error) ErrorOutcome {
+				if errors.Is(err, errNormal) {
 					return ErrorOutcomeNormalRetry
 				}
-				if err.Error() == "reset" {
+				if errors.Is(err, errReset) {
 					return ErrorOutcomeResetAttempts
 				}
-				if err.Error() == "ignore" {
+				if errors.Is(err, errIgnore) {
 					return ErrorOutcomeIgnore
 				}
 				return ErrorOutcomeDoNotRetry
 			}),

110-145: Apply the same sentinel pattern to the no‑retry test.

Reduces flakiness and mirrors production usage with errors.Is.

 	t.Run("with error filter (test noretry)", func(t *testing.T) {
 		count := 0
+		errNormal := errors.New("normal")
+		errReset := errors.New("reset")
+		errStop := errors.New("stop")
 		err := Do(func() error {
 			count++
 			if count == 1 {
-				return errors.New("normal")
+				return errNormal
 			}
 			if count == 2 {
-				return errors.New("reset")
+				return errReset
 			}
 			if count == 3 {
-				return errors.New("stop")
+				return errStop
 			}
 			return nil
 		},
@@
-			WithErrorFilter(func(err error) ErrorOutcome {
-				if err.Error() == "normal" {
+			WithErrorFilter(func(err error) ErrorOutcome {
+				if errors.Is(err, errNormal) {
 					return ErrorOutcomeNormalRetry
 				}
-				if err.Error() == "reset" {
+				if errors.Is(err, errReset) {
 					return ErrorOutcomeResetAttempts
 				}
-				if err.Error() == "stop" {
+				if errors.Is(err, errStop) {
 					return ErrorOutcomeDoNotRetry
 				}
 				return ErrorOutcomeNormalRetry
 			}),
server/service/orbit_client.go (2)

89-92: Add doc comment to exported setter; allow nil to clear.

Small polish for API clarity.

-func (oc *OrbitClient) SetOpenSSOWindowFunc(f func() error) {
+// SetOpenSSOWindowFunc sets the callback used to open the SSO login window.
+// Pass nil to clear the callback.
+func (oc *OrbitClient) SetOpenSSOWindowFunc(f func() error) {
 	oc.openSSOWindow = f
 }

594-596: Prefer errors.As for not-found sentinel.

errors.As is more reliable than comparing against a zero value of a struct error type.

-		if errors.Is(err, notFoundErr{}) {
+		if errors.As(err, new(notFoundErr)) {
 			return "", errors.New("enroll endpoint does not exist")
 		}
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 00004f2 and 2999ca3.

📒 Files selected for processing (6)
  • orbit/cmd/orbit/orbit.go (2 hunks)
  • pkg/retry/retry.go (3 hunks)
  • pkg/retry/retry_test.go (1 hunks)
  • server/service/base_client.go (1 hunks)
  • server/service/base_client_errors.go (1 hunks)
  • server/service/orbit_client.go (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.go

⚙️ CodeRabbit configuration file

When reviewing SQL queries that are added or modified, ensure that appropriate filtering criteria are applied—especially when a query is intended to return data for a specific entity (e.g., a single host). Check for missing WHERE clauses or incorrect filtering that could lead to incorrect or non-deterministic results (e.g., returning the first row instead of the correct one). Flag any queries that may return unintended results due to lack of precise scoping.

Files:

  • server/service/base_client.go
  • server/service/base_client_errors.go
  • pkg/retry/retry.go
  • pkg/retry/retry_test.go
  • orbit/cmd/orbit/orbit.go
  • server/service/orbit_client.go
🔇 Additional comments (2)
pkg/retry/retry.go (1)

86-99: LGTM on integration into Do().

Ordering (filter before max‑attempts/backoff) and state resets look correct.

orbit/cmd/orbit/orbit.go (1)

58-58: LGTM!

The import is necessary for opening the SSO browser window in the callback function below.

@codecov
Copy link

codecov bot commented Oct 28, 2025

Codecov Report

❌ Patch coverage is 28.57143% with 35 lines in your changes missing coverage. Please review.
✅ Project coverage is 66.23%. Comparing base (437cb9c) to head (ab5f3f9).
⚠️ Report is 5 commits behind head on main.

Files with missing lines Patch % Lines
server/service/orbit_client.go 0.00% 28 Missing ⚠️
orbit/cmd/orbit/orbit.go 0.00% 5 Missing ⚠️
server/service/base_client.go 0.00% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #34847      +/-   ##
==========================================
+ Coverage   66.04%   66.23%   +0.18%     
==========================================
  Files        2066     2071       +5     
  Lines      172701   174614    +1913     
  Branches     6952     6952              
==========================================
+ Hits       114061   115649    +1588     
- Misses      48118    48335     +217     
- Partials    10522    10630     +108     
Flag Coverage Δ
backend 67.86% <28.57%> (+0.19%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@sgress454 sgress454 marked this pull request as ready for review October 28, 2025 04:02
@sgress454 sgress454 requested a review from a team as a code owner October 28, 2025 04:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Agent: initiate and monitor IdP authentication for Windows/Linux setup experience

2 participants