Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: reader failover wait for complete batch #390

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

crystall-bitquill
Copy link
Contributor

@crystall-bitquill crystall-bitquill commented Jan 30, 2025

Summary

Fix connections being unclosed in reader failover handler

Description

Currently connection attempts are made in batches of two, and the first settled promise is returned. However, if the first connection task fails, the second is also abandoned. This PR changes the reader failover implementation to wait for the entire batch to complete before moving on.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@crystall-bitquill crystall-bitquill requested a review from a team as a code owner January 30, 2025 18:21
@crystall-bitquill crystall-bitquill force-pushed the fix/reader-failover-clear-failed-batch branch from 21f2ead to ef40f06 Compare January 31, 2025 00:12
@crystall-bitquill crystall-bitquill changed the title fix: reader failover connection attempt batch not cleaned up fix: reader failover wait for complete batch Jan 31, 2025
@crystall-bitquill crystall-bitquill force-pushed the fix/reader-failover-clear-failed-batch branch from ef40f06 to 1d4de58 Compare January 31, 2025 01:18
let errors: string = "";
for (const e of error.errors) {
// Propagate errors that are not caused by network errors.
if (!this.pluginService.isNetworkError(error)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the goal of this check here? Are we just checking to see if the aggregate error contains the AwsWrapperError we throw to indicate task aborted?

throw new AwsWrapperError("Selected task has been chosen. Abort client for host: " + this.newHost.host);

In this case would it be more clear if we just check for error instanceof AwsWrapperError?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also be more specific and create a new ReaderTaskAbortedError

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this check we want to ignore any network errors so we can continue trying connections to other instances (for example, connect ECONNREFUSED). If there are non-network errors, they will be propagated back to the user through the ReaderFailoverResult. This catch block should only be hit when both connection attempts in the batch failed, but the selected task error is only thrown when there has been a successful connection.

Comment on lines +220 to +219
Messages.get(
"ClusterAwareReaderFailoverHandler.batchFailed",
`[${hosts[i].hostId}${numTasks === 2 ? `, ${hosts[i + 1].hostId}` : ``}]`,
`[\n${errors}\n]`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious if we need to build this custom message or if we could just use the aggregate error's message

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we return a ReaderFailoverResult with an error here it will be sent back to the user, so I'd prefer for them to see an AwsWrapperError with some additional context rather than an AggregateError

common/lib/plugins/failover/reader_failover_handler.ts Outdated Show resolved Hide resolved
common/lib/plugins/failover/reader_failover_handler.ts Outdated Show resolved Hide resolved
@crystall-bitquill crystall-bitquill force-pushed the fix/reader-failover-clear-failed-batch branch 2 times, most recently from 04725a9 to 4688bc6 Compare February 1, 2025 03:57
@crystall-bitquill crystall-bitquill force-pushed the fix/reader-failover-clear-failed-batch branch from 4688bc6 to 665a9ea Compare February 4, 2025 02:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants