fix: reader failover wait for complete batch #390

crystall-bitquill · 2025-01-30T18:21:13Z

Summary

Fix connections being left unclosed in the reader failover handler.

Description

Currently, connection attempts are made in batches of two, and the first settled promise is returned. However, if the first connection task fails, the second is also abandoned. This PR changes the reader failover implementation to wait for the entire batch to complete before moving on.
This PR also changes the failover plugin to initialize the failover settings before creating the writer and reader failover handlers.
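
To illustrate the batching change described above, here is a minimal sketch of waiting for every attempt in a batch to settle before deciding what to do next. It is not the code from this PR: `ConnectTask`, `Attempt`, `BatchResult`, `settle`, and `connectBatch` are hypothetical names, and the real handler may well return as soon as one attempt succeeds rather than collecting every outcome.

```typescript
// Hypothetical shapes for illustration; these are not the wrapper's real types.
type ConnectTask<T> = () => Promise<T>;
type Attempt<T> = { ok: true; value: T } | { ok: false; error: unknown };

interface BatchResult<T> {
  connection: T | null;
  errors: unknown[];
}

// Convert a promise into one that always fulfills with its outcome.
function settle<T>(pending: Promise<T>): Promise<Attempt<T>> {
  return pending.then(
    (value): Attempt<T> => ({ ok: true, value }),
    (error: unknown): Attempt<T> => ({ ok: false, error })
  );
}

// Start every task in the batch, then wait until all of them have settled.
async function connectBatch<T>(tasks: ConnectTask<T>[]): Promise<BatchResult<T>> {
  const attempts = await Promise.all(tasks.map((task) => settle(task())));
  const result: BatchResult<T> = { connection: null, errors: [] };
  for (const attempt of attempts) {
    if (!attempt.ok) {
      result.errors.push(attempt.error);
    } else if (result.connection === null) {
      // First success in task order wins; a real handler would also close any extra connection.
      result.connection = attempt.value;
    }
  }
  return result;
}
```

With this shape, a failure in the first task no longer hides a success in the second: the caller sees both the connection and the collected errors.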

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

karenc-bq · 2025-01-31T10:03:25Z

common/lib/plugins/failover/reader_failover_handler.ts

+      let errors: string = "";
+      for (const e of error.errors) {
+        // Propagate errors that are not caused by network errors.
+        if (!this.pluginService.isNetworkError(error)) {


What is the goal of this check here? Are we just checking to see if the aggregate error contains the AwsWrapperError we throw to indicate task aborted?

throw new AwsWrapperError("Selected task has been chosen. Abort client for host: " + this.newHost.host);

In this case, would it be clearer to just check for error instanceof AwsWrapperError?

We could also be more specific and create a new ReaderTaskAbortedError.

crystall-bitquill · 2025-02-01

In this check we want to ignore any network errors (for example, connect ECONNREFUSED) so that we can continue trying to connect to other instances. Any non-network errors are propagated back to the user through the ReaderFailoverResult. This catch block should only be hit when both connection attempts in the batch have failed, whereas the selected-task error is only thrown when there has been a successful connection.
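
As a rough sketch of the filtering described in this reply, the snippet below separates network errors (which let failover continue to the next batch) from everything else (which would be reported to the user). `looksLikeNetworkError` and `nonNetworkErrors` are hypothetical helpers, and the list of error codes is only an assumption; the actual check goes through `pluginService.isNetworkError`, whose logic is not shown in this thread.

```typescript
// Stand-in for the wrapper's network-error check; the codes listed are illustrative only.
function looksLikeNetworkError(e: unknown): boolean {
  return e instanceof Error && /ECONNREFUSED|ECONNRESET|ETIMEDOUT/.test(e.message);
}

// Returns the errors from a failed batch that should be propagated to the caller.
// An empty result means every failure was network-related, so the next batch can be tried.
function nonNetworkErrors(
  errors: unknown[],
  isNetworkError: (e: unknown) => boolean = looksLikeNetworkError
): unknown[] {
  // Test each individual error, not the aggregate that carries them.
  return errors.filter((e) => !isNetworkError(e));
}
```

Note that the predicate is applied to each element of the list; applying it to the enclosing aggregate error instead would give the same answer for every iteration of the loop.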

karenc-bq · 2025-01-31T10:04:16Z

common/lib/plugins/failover/reader_failover_handler.ts

+          Messages.get(
+            "ClusterAwareReaderFailoverHandler.batchFailed",
+            `[${hosts[i].hostId}${numTasks === 2 ? `, ${hosts[i + 1].hostId}` : ``}]`,
+            `[\n${errors}\n]`


Curious whether we need to build this custom message, or whether we could just use the aggregate error's message.

crystall-bitquill · 2025-02-01

If we return a ReaderFailoverResult with an error here, it will be sent back to the user, so I'd prefer that they see an AwsWrapperError with some additional context rather than an AggregateError.
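
One way to picture the trade-off discussed here: the sketch below folds the individual failures of a batch into a single error whose message names the hosts involved. `describeBatchFailure` is a hypothetical helper, the message wording is invented for illustration (the PR resolves its text through `Messages.get`), and a plain `Error` stands in for `AwsWrapperError`.

```typescript
// Build one error that lists the hosts in the failed batch and each underlying message.
// A plain Error is used here as a stand-in for the wrapper's own error type.
function describeBatchFailure(hostIds: string[], errors: unknown[]): Error {
  const details = errors
    .map((e) => (e instanceof Error ? e.message : String(e)))
    .join("\n");
  return new Error(
    `Connection attempts failed for hosts [${hostIds.join(", ")}]: [\n${details}\n]`
  );
}
```

Compared with surfacing the aggregate error directly, this keeps the host identifiers and every underlying message in one place for the user.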

crystall-bitquill requested a review from a team as a code owner January 30, 2025 18:21

crystall-bitquill force-pushed the fix/reader-failover-clear-failed-batch branch from 21f2ead to ef40f06 Compare January 31, 2025 00:12

crystall-bitquill changed the title ~~fix: reader failover connection attempt batch not cleaned up~~ fix: reader failover wait for complete batch Jan 31, 2025

crystall-bitquill force-pushed the fix/reader-failover-clear-failed-batch branch from ef40f06 to 1d4de58 Compare January 31, 2025 01:18

karenc-bq reviewed Jan 31, 2025

karenc-bq reviewed Jan 31, 2025

crystall-bitquill force-pushed the fix/reader-failover-clear-failed-batch branch 5 times, most recently from 2bcc699 to 23e553a Compare February 6, 2025 00:28

karenc-bq approved these changes Feb 6, 2025

crystall-bitquill added 2 commits February 5, 2025 18:39

fix: reader failover wait for completed batch

b0419c2

fix: init failover settings before creating reader and writer failover handlers

d640937

crystall-bitquill force-pushed the fix/reader-failover-clear-failed-batch branch from 23e553a to d640937 Compare February 6, 2025 02:40

crystall-bitquill merged commit c6478cb into main Feb 6, 2025
2 checks passed

crystall-bitquill deleted the fix/reader-failover-clear-failed-batch branch February 6, 2025 02:42
