Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Major refactoring & implement standalone mode #12

Merged
merged 19 commits into from
Mar 25, 2024
Merged

Major refactoring & implement standalone mode #12

merged 19 commits into from
Mar 25, 2024

Conversation

radito3
Copy link
Member

@radito3 radito3 commented Jan 31, 2024

Overhaul the code from a PoC to a semi-productive state.

Major changes done to the Processors, mainly the UpdateWatchConfig handler, which provisions k8s managers.

Simplify large parts of the codebase.

Handle context cancellation propagation properly.

Ensure resource cleanup on graceful shutdown - do not leave hanging goroutines.


Refactoring main points:

  • general refactoring and simplification
  • improve OOP
  • prefer composition over embedding
  • improve usage of context.Context
  • remove usage of most log.Fatal* calls
  • improve error handling
  • remove usage of channels in Processors as they aren't used in asynchronous code
  • reduce usage of goroutines when they aren't necessary
  • remove usage of global variables
  • remove unused structs, functions, variables
  • add graceful shutdown and resource cleanup
  • (WIP) add standalone mode so that the RWP can work outside of a k8s cluster

Details about context usage

Context flow between objects:

flowchart LR
root(Root Context) --> grpc(gRPC Client)
grpc --> hb(Heartbeat)
root --> proc(Processor)
proc --> mng(K8s Manager)
Loading

The root context traps the SIGINT and SIGTERM signals, so it can be used for graceful shutdown.
The gRPC client then wraps it with its own cancel() function, so it can stop on Send() or Recv() errors.
Additionally, it adds the Session ID and Binary Version header values to the context being sent to the AutoPi.
The client has a heartbeat in which it select's on the context Done() channel and a 30-second timer.

The processors accept the root context as a parameter to their Process method.
Currently, the context is only used in the UpdateWatchConfig handler, as it's needed for proper flow control of the K8s Manager(s) it spawns.
The processor builds the watched resources controllers, wraps the root context with its own cancel() function and starts the Manager with the wrapped context.
The internal cancellation will happen when the manager is stopped - in the case where a newer watch configuration is received and the manager needs to be recreated.
The manager will then block until either the context is cancelled or an error occurs within its controllers.

---
title: Flowchart for UpdateWatchConfig
---
stateDiagram-v2
[*] --> rootCtx
rootCtx : Create Root Context
rootCtx --> createDrainChan
createDrainChan : Create Drain Channel
createDrainChan --> recv
recv : Receive gRPC message
recv --> createProc
createProc : Create UpdateWatchConfig Processor
createProc --> createMng
createMng : Create K8s Manager
createMng --> startMng : Spawn new goroutine
startMng : Start K8s Manager
createMng --> sig : main goroutine
sig : Wait for termination via SIGTERM
sig --> CancelRootCtx
CancelRootCtx : Cancel Root Context
CancelRootCtx --> waitDrainCh
waitDrainCh : Wait on Drain Channel
waitDrainCh --> [*]
startMng --> drainCh : Root context cancellation propagated
drainCh : Send signal to Drain Channel
drainCh --> endG
endG : Goroutine returns
Loading
---
title: Flowchart for when the K8s manager fails to start
---
stateDiagram-v2
[*] --> begin
begin : Same as previous first steps
begin --> createMng
createMng : Create Manager
createMng --> startMng : Spawn first goroutine
startMng : Start Manager
startMng --> fail
fail : Error occurs
fail --> errToCh
errToCh : Send signal to Drain Channel
errToCh --> endG1
endG1 : Goroutine returns
createMng --> createMng2 : main goroutine
createMng2 : Receive new request and create Manager
createMng2 --> drainErr : Spawn second goroutine
drainErr : Drain channel for previous error
drainErr --> startMng2
startMng2 : Start Manager
startMng2 --> drainCh : Root context cancellation propagated
drainCh : Send signal to Drain Channel
drainCh --> endG2
endG2 : Goroutine returns
createMng2 --> sig
sig : Wait for termination via SIGTERM
sig --> CancelRootCtx
CancelRootCtx : Cancel Root Context
CancelRootCtx --> waitDrainCh
waitDrainCh : Wait on Drain Channel
waitDrainCh --> [*]
Loading

Overhaul the code from a PoC to a semi-productive state.

Major changes done to the Processors, mainly the UpdateWatchConfig handler,
which provisions k8s managers.

Simplify large parts of the codebase.

Handle context cancellation propagation properly.

Ensure resource cleanup on graceful shutdown - do not leave hanging goroutines.
cmd/remote-work-processor/main.go Outdated Show resolved Hide resolved
cmd/remote-work-processor/main.go Outdated Show resolved Hide resolved
internal/executors/errors.go Show resolved Hide resolved
internal/executors/factory/executor_factory.go Outdated Show resolved Hide resolved
internal/grpc/client_metadata.go Outdated Show resolved Hide resolved
internal/grpc/processors/disable_processor.go Outdated Show resolved Hide resolved
internal/kubernetes/controller/manager-builder.go Outdated Show resolved Hide resolved
internal/kubernetes/controller/controller.go Outdated Show resolved Hide resolved
for connAttempts < maxRetries {
select {
case <-rootCtx.Done():
break
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure that this break is not applicable only for select statement?

if res.Err != nil {
log.Fatalf("Error occurred while processing operation: %v\n", err)
}
isInStandaloneMode, _ := strconv.ParseBool(flag.Lookup(standaloneModeOpt).Value.String())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will happen in case standalone-mode flag is not passed as boolean, e.g unintentionally? Shouldn't we need to handle the possible error?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The flag package handles parsing errors on the flags. If something that is not a boolean is provided for --standalone-mode it will fail with a parse error.

cmd/remote-work-processor/main.go Outdated Show resolved Hide resolved
Fix comments

Exit Pod with status 1 in case the Manager can't be started (in K8s mode)

Remove cahing of auth header in HTTP executor

Improvement of error messages
VaskoBozhurski
VaskoBozhurski previously approved these changes Feb 22, 2024
Nizamski
Nizamski previously approved these changes Feb 23, 2024
@radito3 radito3 dismissed stale reviews from Nizamski and VaskoBozhurski via ef30518 February 26, 2024 12:59
@radito3 radito3 changed the title Major refactoring & start work on standalone mode (part 1) Major refactoring & implement standalone mode Mar 11, 2024
internal/executors/http/basic_authorization_header.go Outdated Show resolved Hide resolved
internal/executors/http/basic_authorization_header.go Outdated Show resolved Hide resolved
codes, err2 := parseSuccessResponseCodes(successResponseCodes...)
if err2 != nil {
return false, err2
func isSuccessfulResponseCode(statusCode int, successResponseCodes ...string) (bool, error) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's our reason for having the successResponseCodes as string? From what I can see, we always try to convert them to integers. Any reason why we can't just do the conversion at the top-most location where they are first inputed and then only pass them as integer slices?

Looking at HttpRequestParameters we already have non-string fields there, so why can't we do the whole string to integer there? Would also mean we get an error when we are building the request parameters and not when we are all the way at checking the response.

Maybe I'm missing something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From AutoPi, the success codes are interpreted as strings because they can be "2xx", not just "200", "404", etc.

Comment on lines +37 to +43
h.fetcher = NewOAuthTokenFetcher(
withExecutor(executor),
withTokenUrl(tokenUrl),
withRequestBody(requestBody),
withCertificateAuthentication(h.certAuthentication),
withAuthHeader(h.authHeader),
)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these with* arguments all optional? Can we make an OAuth TokenFetcher without an URL for instance? Feels weird to me to have a builder pattern where non-optional fields are in fact optional (compilation wise) as any errors would be discovered on runtime rather than on the much more preferred compile time.

Can we perhaps consider making the always required parameters as concrete parameters of the function and the optional ones (I think f.certAuthentication might be?) to be left as the optionals at the end of the function?

As an example the NewOAuthorizationHeaderGenerator seems to be written in such a way.

Comment on lines +85 to +86
c.selector.LabelSelector.Matches(labels.Set(o.GetLabels())) &&
c.selector.FieldSelector.Matches(o)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential nil pointer access if withSelector is never called and c.selector is left as nil. This currently isn't as issue as the only place it is used will always call it, but it still leaves a potential landmine in any future code refactoring/modification.

Comment on lines +30 to +43
func (c *ControllerBuilder) WithReconcilicationPeriodInMinutes(period int32) *ControllerBuilder {
c.reconciliationPeriodInMinutes = period
return c
}

func (c *ControllerBuilder) WithSelector(selector *selector.Selector) *ControllerBuilder {
c.selector = selector
return c
}

func (c *ControllerBuilder) ManagedBy(manager *Manager) *ControllerBuilder {
c.manager = manager
return c
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, are these fields really optional or are they always required in order for the code to work?

If they are always required, I dislike the idea of having them as optional parameters just so we save some horizontal space (in return for using more vertical one) and introducing potential future runtime errors.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would you say to adding a validation in the build() method of all builders that checks for the required fields, and if any aren't set to return an error? This would be semantically similar to the Immutables library for Java.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be an improvement, but it would still be a runtime check rather than a compile time error. My main gripe with required "optionals" is that you have no compile time information that you forgot to set a field or not. If you're worried about the "constructor" function having too much horizontal space taken, you can always format it as:

func example(
	firstRequired string,
	secondRequired int,
	thirdRequired byte,
	optionals ...Option
) {
<code>
}

Comment on lines +25 to +60
func (b *ManagerBuilder) SetGrpcClient(client *grpc.RemoteWorkProcessorGrpcClient) *ManagerBuilder {
b.grpcClient = client
return b
}

func (b *ManagerBuilder) BuildDynamicClient(config *rest.Config) *ManagerBuilder {
dc, err := dynamic.NewDynamicClient(config)
if err != nil {
log.Panicln("Failed to create dynamic client:", err)
}
b.dynamicClient = dc
return b
}

func (b *ManagerBuilder) BuildInternalManager(config *rest.Config, scheme *runtime.Scheme) *ManagerBuilder {
t := time.Duration(0)
options := manager.Options{
Scheme: scheme,
GracefulShutdownTimeout: &t,
WebhookServer: nil,
LeaderElection: false,
HealthProbeBindAddress: "localhost:8811",
MetricsBindAddress: "0",
}

mgr, err := ctrl.NewManager(config, options)
if err != nil {
log.Panicln("Failed to create manager:", err)
}

// these can fail only if the manager has been started prior to calling them
mgr.AddHealthzCheck("healthz", healthz.Ping)
mgr.AddReadyzCheck("readyz", healthz.Ping)
b.delegate = mgr
return b
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at these, the only reason they are a struct method instead of a separate function is so they can assign the result in the end. Any reason why they are optionals again and are they really optional in our case? Any reason they can't be separate functions that have their results passed as arguments to the actual build function so that we can avoid having optional parameters that are actually always needed?

@@ -60,6 +60,9 @@ func (b *ManagerBuilder) BuildInternalManager(config *rest.Config, scheme *runti
}

func (b *ManagerBuilder) Build() Manager {
if b.delegate == nil || b.dynamicClient == nil || b.grpcClient == nil {
panic("Manager is missing required parameters")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can have the build return a (*Manager, error) so that error is handled by the upper function WatchResources which already has a flow handling exiting the app in case of error?

@radito3 radito3 merged commit 433c35f into main Mar 25, 2024
5 checks passed
@radito3 radito3 deleted the refactor branch March 25, 2024 11:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants