Squashed commit of adding support for cli plugin validation reporting. #5665

edwbuck · 2024-11-26T19:17:52Z

Signed-off-by: Edwin Buck <[email protected]>

This unit test crashes the server with an intentinally bad config. The problem was, it crashed in the wrong location. Signed-off-by: Edwin Buck <[email protected]>

Fix import order on fakedatastore.go Comparison on boolean only, not boolean == true on server.go remove whitespace in "// nolint" for listener.go, ca.go, which was just added due to baseline merge conflict. Signed-off-by: Edwin Buck <[email protected]>

Signed-off-by: Edwin Buck <[email protected]>

edwbuck · 2024-11-27T15:30:13Z

@amartinezfayo All checks are passing. If you get the review in, I'll make an effort to address it prior to the holidays.

Signed-off-by: Edwin Buck <[email protected]>

amartinezfayo

Thank you @edwbuck for this contribution and all the effort done to have validation of plugin configurations implemented in SPIRE.
My main concern is around the execution of the Server.run function as part of the validate command. I think that this is problematic. I've added a comment about that.

I noticed that the validate command is not implemented in the agent. Are you planning to add that separately?

I also noticed that validation of plugin configurations is not exercised in unit tests. We should add tests for that.

pkg/common/catalog/catalog.go

pkg/common/catalog/plugin.go

pkg/common/catalog/catalog.go

amartinezfayo · 2024-12-03T13:31:28Z

cmd/spire-server/cli/validate/validate.go

+		ctx = context.Background()
+	}
+
+	err = s.Run(ctx)


This runs a new server instance. I think that this approach is problematic in multiple ways. It requires to maintain the ValidateOnly setting in the config, and have different flows of the run function depending of its value. Since we don't want to run a server instance, executing this function doesn't seem to be appropriate when running the validate command.
This requires that when the run function is updated, we need to remember that it's also called as part of the validate command, and consider if the new code can be executed as part of the validation or not.

I believe that as is in this PR, running the validate command will connect to the backend database, run migrations, create the data directory if needed and start the runtime collector for metrics. It also has the side effect of output the logs as if the server is running, which is a surprising behavior as part of a validate command, and makes the parsing of the output more difficult.
Although it could be changed to not do those things, I think that is an example of how problematic this can be if we miss to update the run function with the appropriate ValidateOnly condition. Running a non intended database migration can be particularly problematic.

I think that a better approach would be to not involve the run function as part of the execution of a validate command.
The validate command would just 1) Load the server/agent config as is it currently does (which validates the server/agent config) and 2) Go through each configured plugin and call the plugin's Validate function.

Since we don't want to run a server instance

I think there is a disconnect here. The entire plan since the beginning was to run a limited server instance that could load the plugins, and then run the plugin Validate() funcs, as detailed over the last six months of development. This was mentioned in multiple contributor's meetings, and never met a hint of "we don't want to do this". The exact words were "we will launch a new server instance in such a way it doesn't interfere with the running server instance, it will load the plugins, validate the configuration, and then shut down."

The approach you suggested was the initial approach, and it didn't work. In a call with @azdagron on October 29th, the reason it didn't work was explained to me:

The server config cannot launch the plugins without exposing various server services that the plugins will attempt to connect to prior to the ability to call Validate(). These included a datastore connection, a metrics connection, and a logging connection.

The Plugin launching isn't partitioned in such a way that common/catalog can be used without the same data in server/catalog.

The server/catalog depends on the various plugin setups that are specific to the server.

Those setups include the server having initialized the support services the plugins may require. Passing nil into these support service setups doesn't work.

Andrew directed the solution to include a "ValidateOnly" flag that previously didn't exist, which is why the submission took a couple of extra weeks to refactor and debug. While the solution seems less elegant than other approaches, it actually resulted in less code (meaning less maintenance footprint).

The ValidateOnly returns from Run() to shutdown the server as soon as the plugins validate, before any of the processing of data is performed.

Thanks for the detailed response. I've analyzed this and I still think that we can and should avoid executing Server.run as part of the validate command.
I believe that we can achieve this by having a separate function for validation (one for the server, one for the agent) that is similar to the catalog.Load function but it calls the Validate function instead of Configure. We should also be able to use non-functional loggers and metrics, avoiding to output logs. Validation of the datastore plugin configuration should also be possible, I think.

I've done a quick POC of how this would work, it can be found here: main...amartinezfayo:validate-server-cli
It can validate all configured regular plugins and the sql datastore plugin, without the need of opening database connections, creating the data directory or going through migrations. It should be updated to leverage all the notes from the Validate response and not only the error.

amartinezfayo · 2024-12-03T13:58:24Z

pkg/server/datastore/sqlstore/sqlstore.go

@@ -854,6 +858,10 @@ func (ds *Plugin) Configure(_ context.Context, hclConfiguration string) error {
 	return ds.openConnections(config)
 }

+func (ds *Plugin) Validate(_ context.Context, _ catalog.CoreConfig, hclConfiguration string) (*catalog.ValidateResult, error) {
+	return nil, nil


Could we decode the configuration and perform the validation based on the result of configuration.Validate?

We can, but it should go into a future Issue. The entire SQLStore plugins aren't 100% conformant with the "real" plugins, and I did some work to make them more conformant. However, going through each one and actually implementing a proper handling of "configure / validate" is a outside the scope of this effort.

cmd/spire-server/cli/validate/validate.go

pkg/common/catalog/configure.go

Signed-off-by: Edwin Buck <[email protected]>

edwbuck · 2024-12-03T18:11:04Z

Thank you @edwbuck for this contribution and all the effort done to have validation of plugin configurations implemented in SPIRE.
My main concern is around the execution of the Server.run function as part of the validate command. I think that this is problematic. I've added a comment about that.

That's because the prior attempts to add the functionality as Server.validate function(s) led to not enough of the server coming up to properly load the plugins. The intent was always to "launch a server instance with just enough to validate the plugins and shutdown". The solution here uses the ValidateOnly flag (suggested by maintainer @azdagron ) to shut the server down prior to it doing any processing. Alas, plugins can't be loaded into the server correctly unless the server has a datastore, a logging proxy, and a metrics server proxy.

Prior attempts along the path of creating new Server.validate function(s) also led to a large amount of code duplication, and while I don't enjoy the "ValidateOnly" flag, the code is less likely to require duplicate changes, increasing maintenance.

If we can identify other areas where we want to avoid staring items, we can always check it against ValidateOnly. My analysis of the system is not 100% complete, I didn't see any unnecessary systems launch (although I would happily accept suggestions for squelching the logging).

I noticed that the validate command is not implemented in the agent. Are you planning to add that separately?

The work on the server took a lot of time and effort. I would like to implement this for the Agent, but agent plugin verification is not as critical as server plugin verification. I've filed #5677 and will work on it as time permits.

I also noticed that validation of plugin configurations is not exercised in unit tests. We should add tests for that.

The entire validation commands in the cli are unlike any other cli command. They borrow heavily from the run command, and much of the logic is already covered in unit test under the run commands' unit tests. That's why we cannot use our typical unit testing of cli commands for the validate commands (and probably why nobody valued fixing it enough to make it work).

If we want to cut this sharing and refactor it into making a common, shared cli config / config file loading, then it will be significantly easier to structure the testing appropriately. Again, my goal was not to redo the cli commands as they exist, I'm trying to keep the commit minimal. I'm open to opening a new Issue / PR to fix this, but I would like to keep it outside of the scope of this effort (getting the plugins to report their validation notes).

Signed-off-by: Edwin Buck <[email protected]>

Tests include: 1. One minimal config file that should succeed. 2. One minimal config file that has an error in the disk keymanager. Signed-off-by: Edwin Buck <[email protected]>

The logging output is redirected to /dev/null, so the actual validation output can more easily be read. Signed-off-by: Edwin Buck <[email protected]>

edwbuck · 2024-12-04T15:46:59Z

@amartinezfayo Thank you for you attentiveness and direction on this PR. I'm attempting to get the requested changes in as early as possible, to give you additional time to review them.

I'm tracking the remaining items here:

Remove logging output as it adds noise to the validation output.
Make all Validation plugin funcs return without error, using only the Valid flag to show if the plugin configuration is valid / invalid.
Ensure that the sql datastore is launched without the ability to perform database migrations
Restructure the call patterns such that the ValidateOnly flag doesn't exist.

The suggestion @amartinezfayo made to redirect logs to /dev/null has been followed and pushed; however, /dev/null is a distinctly POSIX construct, and I need the equivalent for a Windows system, so we don't release the validate command to potentially attempt to write to /dev/null on a Windows platform.

The last one "Restructure call patterns" will remove the ValidateOnly flag, but I additionally added fields to the same configure struct to capture the output of validation. With these additional fields, are they also to be removed?

This permits getting the entire output of the notes, and aligns with the desired output in the code review. As we don't know which plugin note was the failure, an additional note is added when the plugin's configuration is invalid. Signed-off-by: Edwin Buck <[email protected]>

amartinezfayo · 2024-12-04T22:47:26Z

however, /dev/null is a distinctly POSIX construct, and I need the equivalent for a Windows system, so we don't release the validate command to potentially attempt to write to /dev/null on a Windows platform.

We should use the os.DevNull constant, that will be replaced by "/dev/null" on Unix-like systems and "NUL" in Windows, which is the way to refer a null device.

The last one "Restructure call patterns" will remove the ValidateOnly flag, but I additionally added fields to the same configure struct to capture the output of validation. With these additional fields, are they also to be removed?

Yes, I don't think that we need those additional fields and I think that they should also be removed. The output of the validation should be returned by the function called by the command (using the configv1.ValidateResponse type), without having to add additional fields in the server configuration. The main...amartinezfayo:validate-server-cli branch is updated to reflect that.

Signed-off-by: Edwin Buck <[email protected]>

Squahed commit of adding support for cli plugin validation reporting.

6c8a88c

Signed-off-by: Edwin Buck <[email protected]>

edwbuck requested review from evan2645, amartinezfayo, azdagron, MarcosDY and rturner3 as code owners November 26, 2024 19:17

edwbuck mentioned this pull request Nov 26, 2024

cli command to validate configuration using plugin validation routines. #5626

Closed

Fix a few unit tests.

7c42e8f

Signed-off-by: Edwin Buck <[email protected]>

amartinezfayo self-assigned this Nov 26, 2024

Fix datastore unit tests not that sql datastores require trust domains.

4fccfd2

Signed-off-by: Edwin Buck <[email protected]>

amartinezfayo added this to the 1.11.1 milestone Nov 26, 2024

edwbuck added 4 commits November 27, 2024 04:08

Fix the last unit test.

40c57bf

This unit test crashes the server with an intentinally bad config. The problem was, it crashed in the wrong location. Signed-off-by: Edwin Buck <[email protected]>

Fix linting issues.

800f890

Fix import order on fakedatastore.go Comparison on boolean only, not boolean == true on server.go remove whitespace in "// nolint" for listener.go, ca.go, which was just added due to baseline merge conflict. Signed-off-by: Edwin Buck <[email protected]>

Fix windows unit test.

a005a10

Signed-off-by: Edwin Buck <[email protected]>

Merge branch 'main' into squash-validation-cli

25ebf31

Signed-off-by: Edwin Buck <[email protected]>

Merge branch 'main' into squash-validation-cli

5ddf246

Signed-off-by: Edwin Buck <[email protected]>

amartinezfayo reviewed Dec 3, 2024

View reviewed changes

Remove plugin load / binding logging, per review comments.

afe6263

Signed-off-by: Edwin Buck <[email protected]>

edwbuck mentioned this pull request Dec 3, 2024

Improve the Agent Validation command to include plugin validation. #5677

Open

Remove unused logger, now that plugin loading logging has been removed.

7064969

Signed-off-by: Edwin Buck <[email protected]>

edwbuck force-pushed the squash-validation-cli branch from 152d99b to 7064969 Compare December 3, 2024 18:20

edwbuck added 3 commits December 3, 2024 10:26

Merge branch 'main' into squash-validation-cli

8a6aa40

Signed-off-by: Edwin Buck <[email protected]>

Minimal unit testing for spire-server validate.

4995563

Tests include: 1. One minimal config file that should succeed. 2. One minimal config file that has an error in the disk keymanager. Signed-off-by: Edwin Buck <[email protected]>

Remove logger output for validat launch.

c0615b9

The logging output is redirected to /dev/null, so the actual validation output can more easily be read. Signed-off-by: Edwin Buck <[email protected]>

amartinezfayo removed this from the 1.11.1 milestone Dec 5, 2024

amartinezfayo added this to the 1.11.2 milestone Dec 5, 2024

Work to align to Agustin's design goals.

4f2cafa

Signed-off-by: Edwin Buck <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Squashed commit of adding support for cli plugin validation reporting. #5665

Squashed commit of adding support for cli plugin validation reporting. #5665

edwbuck commented Nov 26, 2024

edwbuck commented Nov 27, 2024

amartinezfayo left a comment

amartinezfayo Dec 3, 2024

edwbuck Dec 3, 2024 •

edited

Loading

amartinezfayo Dec 4, 2024

amartinezfayo Dec 3, 2024

edwbuck Dec 3, 2024

edwbuck commented Dec 3, 2024

edwbuck commented Dec 4, 2024 •

edited

Loading

amartinezfayo commented Dec 4, 2024

Squashed commit of adding support for cli plugin validation reporting. #5665

Are you sure you want to change the base?

Squashed commit of adding support for cli plugin validation reporting. #5665

Conversation

edwbuck commented Nov 26, 2024

edwbuck commented Nov 27, 2024

amartinezfayo left a comment

Choose a reason for hiding this comment

amartinezfayo Dec 3, 2024

Choose a reason for hiding this comment

edwbuck Dec 3, 2024 • edited Loading

Choose a reason for hiding this comment

amartinezfayo Dec 4, 2024

Choose a reason for hiding this comment

amartinezfayo Dec 3, 2024

Choose a reason for hiding this comment

edwbuck Dec 3, 2024

Choose a reason for hiding this comment

edwbuck commented Dec 3, 2024

edwbuck commented Dec 4, 2024 • edited Loading

amartinezfayo commented Dec 4, 2024

edwbuck Dec 3, 2024 •

edited

Loading

edwbuck commented Dec 4, 2024 •

edited

Loading