Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transaction manager refactoring #19889

Merged
merged 15 commits into from
Jul 12, 2024

Conversation

mmaslankaprv
Copy link
Member

@mmaslankaprv mmaslankaprv commented Jun 19, 2024

Major refactor of Redpanda transaction coordinator. The refactoring involves replication of every transaction state update into coordinator partition. The changes involve fixing the transaction state machine and providing a strict and centralised definition of possible transaction state transitions. Replicating each transaction state update allowed us to drop the tx_stm_cache and tx_stm_cache_manager completely.

Fixes: #18184
Fixes: #18251

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.1.x
  • v23.3.x
  • v23.2.x

Release Notes

  • none

@mmaslankaprv mmaslankaprv marked this pull request as draft June 19, 2024 09:09
@mmaslankaprv mmaslankaprv force-pushed the tx-tm-stm-refector branch 3 times, most recently from 1a1c84c to 169342f Compare June 25, 2024 12:30
@mmaslankaprv
Copy link
Member Author

/dt

@mmaslankaprv mmaslankaprv force-pushed the tx-tm-stm-refector branch 2 times, most recently from a99252a to cb3023c Compare June 25, 2024 15:06
@mmaslankaprv
Copy link
Member Author

/dt

@mmaslankaprv
Copy link
Member Author

/dt

@mmaslankaprv mmaslankaprv force-pushed the tx-tm-stm-refector branch 2 times, most recently from a7ebc00 to 70d07d1 Compare June 28, 2024 13:43
@mmaslankaprv
Copy link
Member Author

/dt

@mmaslankaprv
Copy link
Member Author

/dt

src/v/kafka/server/errors.h Outdated Show resolved Hide resolved
src/v/kafka/server/errors.h Outdated Show resolved Hide resolved
@@ -113,4 +114,48 @@ constexpr error_code map_topic_error_code(cluster::errc code) {
return error_code::unknown_server_error;
}

constexpr error_code map_tx_errc(cluster::tx::errc ec) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wonder if this belongs in tx_err.h? kafka/server/errors.h depending on (internal) tx_err seems a bit odd.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think the other way around would be incorrect as it would make cluster submodule depend on kafka submodule (it already kind of does), this is the kafka protocol implementation detail on how to map a more internal error code to user facing protocol defined errors.

@@ -29,7 +29,7 @@ class CompactionE2EIdempotencyTest(RedpandaTest):
def __init__(self, test_context):
extra_rp_conf = {}

self.segment_size = 5 * 1024 * 1024
self.segment_size = 1 * 1024 * 1024
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious why?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test was failing to me even before the refactoring.

Copy link
Contributor

@bharathv bharathv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still looking at the last two (main) commits, fleshed out the remaining comments.

src/v/cluster/tm_stm_types.cc Outdated Show resolved Hide resolved
src/v/cluster/tm_stm_types.h Outdated Show resolved Hide resolved
src/v/cluster/tm_stm_types.h Show resolved Hide resolved
src/v/cluster/tm_stm_types.h Outdated Show resolved Hide resolved
src/v/cluster/tm_stm_types.cc Outdated Show resolved Hide resolved
src/v/cluster/tm_stm_types.h Outdated Show resolved Hide resolved
case empty:
// ready is an initial state a transaction can never go back to that
// state
return is_one_of(current.status, tx_status::empty);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think one confusing bit here is the state transition from to itself (current == target) is a valid transition. Is that intentional? That seems to be the case for all the states.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, this is intentional for the transitions to be idempotent, i will think more about this

src/v/cluster/tm_stm_types.cc Outdated Show resolved Hide resolved
src/v/cluster/tm_stm_types.cc Outdated Show resolved Hide resolved
@mmaslankaprv mmaslankaprv force-pushed the tx-tm-stm-refector branch 2 times, most recently from 50585f4 to 20d8a4e Compare July 1, 2024 14:57
@mmaslankaprv
Copy link
Member Author

/dt

@bharathv
Copy link
Contributor

bharathv commented Jul 9, 2024

Full chaos run using the latest build is at https://buildkite.com/redpanda/vtools/builds/15441#_

Previously the test was not testing what it expected to test as the
error was validated against was incorrect. The test passed as the
transaction was already timeouted when group was finally rebalanced.

Signed-off-by: Michał Maślanka <[email protected]>
When producers are evicted from the `rm_stm` the timer is fired based on
the requested eviction timeout. Added an upper bound on the frequency of
timer fires to prevent busy loops.

Signed-off-by: Michał Maślanka <[email protected]>
The epoch is exhausted if it is equal to the max possible epoch - 1.

Signed-off-by: Michał Maślanka <[email protected]>
Introduced an error code indicating existence of transaction concurrent
to the one client is interested in. Concurrent transactions error
indicates that client should retry without requesting a metadata
update.

Signed-off-by: Michał Maślanka <[email protected]>
Unified mapping of transactional error codes to kafka protocol error
codes.

Signed-off-by: Michał Maślanka <[email protected]>
Replaced absl flat hash maps with chunked version to prevent memory
fragmentation.

Signed-off-by: Michał Maślanka <[email protected]>
When transaction is expired on one of the partitions the coordinator is
requested to send abort requests to other transactions participants.
Changed the test to account for that logic instead of requesting each
data partition separately to abort the transaction.

Signed-off-by: Michał Maślanka <[email protected]>
Added a function validating state transition of transaction FSM.

Signed-off-by: Michał Maślanka <[email protected]>
Previously add partition to transaction and add group to transaction
data batches were not persisted in transaction manager log. This made
transactions implementation vulnerable to coordinator leadership
changes. Changed the code to base on replicating each transaction state
update.

This commit includes a major refactoring of transactions subsystem
including:
- `tx_status` values rename
- dropping the tm_stm_cache and moving the state to tm_stm
- refining the logic of transaction coordinator and gateway

Signed-off-by: Michał Maślanka <[email protected]>
The dependency on `cluster::controller` is not required as it is enough
for the gateway to know current node id.

Signed-off-by: Michał Maślanka <[email protected]>
Now the transaction state validation is more strict so the coordinator
started to send more appropriate error codes.

Signed-off-by: Michał Maślanka <[email protected]>
The transactional coordinator state machine is waiting for application
of batches to the state when handling client requests it is critical for
the application to happen timely. Previously the coordinator topic
wasn't using `storage::batch_cache` which made the appended batches not
immediately available to read by the state machine manager. In this case
the manager waits for some time to prevent busy looping.

Enabling the cache for the transaction manager topic makes the
operations much faster and decreases the client latency.

Signed-off-by: Michał Maślanka <[email protected]>
Copy link
Contributor

@bharathv bharathv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@mmaslankaprv
Copy link
Member Author

@mmaslankaprv mmaslankaprv merged commit 45434d1 into redpanda-data:dev Jul 12, 2024
16 of 19 checks passed
@mmaslankaprv mmaslankaprv deleted the tx-tm-stm-refector branch July 12, 2024 15:40
benesch added a commit to benesch/materialize that referenced this pull request Aug 14, 2024
Redpanda v24.2 includes a fix that allows our sinks to restart after a
Redpanda broker failure without waiting out the full transaction timeout
(often 10m+).

For details about the original issue and the fix in Redpanda, see:

  * redpanda-data/redpanda#19889
  * MaterializeInc/incidents-and-escalations#33

Fix MaterializeInc/incidents-and-escalations#33.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants