Merged
15 changes: 8 additions & 7 deletions cloud/src/resource-manager/resource_manager.cpp
@@ -1414,18 +1414,19 @@ std::pair<MetaServiceCode, std::string> ResourceManager::refresh_instance(
 void ResourceManager::refresh_instance(const std::string& instance_id,
                                        const InstanceInfoPB& instance) {
     bool is_successor_instance = instance.has_original_instance_id();
-    std::string source_instance_id = is_successor_instance ? instance.source_instance_id() : "";
+    std::string predecessor_instance_id =
+            is_successor_instance ? instance.predecessor_instance_id() : "";

     std::lock_guard l(mtx_);
     for (auto i = node_info_.begin(); i != node_info_.end();) {
-        // erase all nodes not belong to this instance_id
-        if (i->second.instance_id != instance_id &&
-            // ... or, if is_successor_instance, erase nodes belong to source_instance_id
-            (!is_successor_instance || i->second.instance_id != source_instance_id)) {
+        // erase all nodes belong to this instance_id
+        if (i->second.instance_id == instance_id ||
Contributor

This switches cache eviction from source_instance_id to predecessor_instance_id, but I can't find any writer for the new field in the rollback flow. The current chain/rollback paths and the existing RollbackInstance test still populate only source_instance_id and original_instance_id. In that path, refreshing the rollback target stops removing the predecessor's cached nodes, so get_node() can return both old and new entries for the same cloud_unique_id until the predecessor is refreshed separately. get_instance_id() then logs a one-to-many warning and picks the last entry. Please either populate predecessor_instance_id everywhere a rollback successor is created, or keep the old fallback here until all producers are updated and tested.
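
One possible reading of the "keep the old fallback" option is to prefer predecessor_instance_id when it is set and fall back to source_instance_id otherwise. The sketch below is not code from this PR: `InstanceInfo` is a hypothetical stand-in for InstanceInfoPB that models field presence with `std::optional`, and `eviction_target_id` is an invented helper name.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in for InstanceInfoPB; only the fields discussed in the
// comment above are modelled, and presence is tracked with std::optional.
struct InstanceInfo {
    std::optional<std::string> original_instance_id;
    std::optional<std::string> source_instance_id;
    std::optional<std::string> predecessor_instance_id;
};

// Returns the id whose cached nodes should be evicted together with the
// refreshed instance, or an empty string if the instance is not a successor.
std::string eviction_target_id(const InstanceInfo& instance) {
    // Same successor test as the diff: original_instance_id must be present.
    if (!instance.original_instance_id.has_value()) {
        return "";
    }
    // Prefer the new field when a producer has populated it.
    if (instance.predecessor_instance_id.has_value()) {
        return *instance.predecessor_instance_id;
    }
    // Fallback (assumption): behave as before for producers that only
    // write source_instance_id.
    return instance.source_instance_id.value_or("");
}
```

With this shape, an instance that carries only original_instance_id and source_instance_id still yields the source id, so the rollback path described above would keep evicting the old nodes until every producer writes the new field.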

+            // ... or, if is_successor_instance, erase nodes belong to predecessor_instance_id
+            (is_successor_instance && i->second.instance_id == predecessor_instance_id)) {
+            i = node_info_.erase(i);
+        } else {
             ++i;
-            continue;
         }
-        i = node_info_.erase(i);
     }

     // If successor_instance_id is set, it means this instance has a successor instance,
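
The rewritten loop in this hunk can be illustrated in isolation. The following is a minimal sketch under stated assumptions, not the actual ResourceManager code: `Node`, `NodeCache`, and `evict_nodes` are hypothetical names, the cache is assumed to be a `std::multimap` keyed by cloud_unique_id, and an empty predecessor id stands in for `is_successor_instance == false`.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical cached node; only the field the loop inspects.
struct Node {
    std::string instance_id;
};

// Assumed cache shape: cloud_unique_id -> node, with duplicates allowed.
using NodeCache = std::multimap<std::string, Node>;

// Erases every node that belongs to instance_id or, when a predecessor id is
// given, to that predecessor. Returns the number of erased entries.
std::size_t evict_nodes(NodeCache& cache, const std::string& instance_id,
                        const std::string& predecessor_instance_id) {
    std::size_t erased = 0;
    for (auto it = cache.begin(); it != cache.end();) {
        const bool belongs_to_instance = it->second.instance_id == instance_id;
        const bool belongs_to_predecessor =
                !predecessor_instance_id.empty() &&
                it->second.instance_id == predecessor_instance_id;
        if (belongs_to_instance || belongs_to_predecessor) {
            // erase() returns the iterator following the removed element.
            it = cache.erase(it);
            ++erased;
        } else {
            ++it;
        }
    }
    return erased;
}
```

If the predecessor id is empty because no producer populated it, only the refreshed instance's own nodes are removed, and a node of the rolled-back instance that shares the same cloud_unique_id stays in the cache. That is the one-to-many situation the review comment describes.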
7 changes: 6 additions & 1 deletion gensrc/proto/cloud.proto
@@ -129,7 +129,7 @@ message InstanceInfoPB {
     // For snapshot
     optional MultiVersionStatus multi_version_status = 110;
     optional SnapshotSwitchStatus snapshot_switch_status = 111;
-    optional string source_instance_id = 112; // The instance cloned from.
+    optional string source_instance_id = 112; // The instance cloned from (the snapshot instance id).
     optional string source_snapshot_id = 113; // The snapshot cloned from.

     // Inherited from which instance (only used during rollback, the earliest instance id).
@@ -147,6 +147,11 @@ message InstanceInfoPB {
     optional int64 snapshot_retained_data_size = 121;
     optional int64 snapshot_billable_data_size = 122;
     optional SnapshotCompactStatus snapshot_compact_status = 123;
+
+    // The instance that is being rolled back, only used during rollback.
+    // It is not always same as source_instance_id, because the source instance may inherit from another
+    // instance during rollback, and the predecessor instance is the real instance which to execute the rollback.
+    optional string predecessor_instance_id = 124;
 }

 message StagePB {