Add New VRCs & VSCs, Fix Duplicate SC-ID Issue #1823

Draft · rakeshgm wants to merge 1 commit into main from drenv-multisc-cont
Conversation

rakeshgm (Member) commented on Feb 11, 2025

This PR continues the improvements introduced in [1] and [2] by adding new VRCs and VSCs based on the newly created SCs. Additionally, it includes fixes to prevent duplicate SC-ID generation.

References:
[1] #1756
[2] #1756

Here is the tree of all the resources. I have not made any changes to the existing names; I will try to fix them in follow-up updates. However, I believe this is a necessary change when there are multiple SCs in the cluster.

clusters:
  - dr1:
      RBD-SC:
        - sc-name: rook-ceph-block
          sc-id: rook-ceph-block-dr1-1
        - sc-name: rook-ceph-block-2
          sc-id: rook-ceph-block-2-dr1-1
      Ceph-FS-SC:
        - sc-name: rook-cephfs-test-fs1
          sc-id: rook-cephfs-test-fs1-dr1-1
        - sc-name: rook-cephfs-test-fs2
          sc-id: rook-cephfs-test-fs2-dr1-1
      VolumeReplicationClass:
        - name: vrc-sample
          sc-id: rook-ceph-block-dr1-1
          replication-id: rook-ceph-block-replication-1
        - name: vrc-sample-2
          sc-id: rook-ceph-block-2-dr1-1
          replication-id: rook-ceph-block-2-replication-1
      VolumeGroupReplicationClass:
        - name: vgrc-sample
          sc-id: rook-ceph-block-dr1-1
          replication-id: rook-ceph-block-replication-1
        - name: vgrc-sample-2
          sc-id: rook-ceph-block-2-dr1-1
          replication-id: rook-ceph-block-2-replication-1
      SnapshotClass-RBD:
        - name: csi-rbdplugin-snapclass
          sc-id: rook-ceph-block-dr1-1
        - name: csi-rbdplugin-snapclass-2
          sc-id: rook-ceph-block-2-dr1-1
      SnapshotClass-CephFS:
        - name: csi-cephfsplugin-snapclass
          sc-id: rook-cephfs-test-fs1-dr1-1
        - name: csi-cephfsplugin-snapclass-2
          sc-id: rook-cephfs-test-fs2-dr1-1

  - dr2:
      RBD-SC:
        - sc-name: rook-ceph-block
          sc-id: rook-ceph-block-dr2-1
        - sc-name: rook-ceph-block-2
          sc-id: rook-ceph-block-2-dr2-1
      Ceph-FS-SC:
        - sc-name: rook-cephfs-test-fs1
          sc-id: rook-cephfs-test-fs1-dr2-1
        - sc-name: rook-cephfs-test-fs2
          sc-id: rook-cephfs-test-fs2-dr2-1
      VolumeReplicationClass:
        - name: vrc-sample
          sc-id: rook-ceph-block-dr2-1
          replication-id: rook-ceph-block-replication-1
        - name: vrc-sample-2
          sc-id: rook-ceph-block-2-dr2-1
          replication-id: rook-ceph-block-2-replication-1
      VolumeGroupReplicationClass:
        - name: vgrc-sample
          sc-id: rook-ceph-block-dr2-1
          replication-id: rook-ceph-block-replication-1
        - name: vgrc-sample-2
          sc-id: rook-ceph-block-2-dr2-1
          replication-id: rook-ceph-block-2-replication-1
      SnapshotClass-RBD:
        - name: csi-rbdplugin-snapclass
          sc-id: rook-ceph-block-dr2-1
        - name: csi-rbdplugin-snapclass-2
          sc-id: rook-ceph-block-2-dr2-1
      SnapshotClass-CephFS:
        - name: csi-cephfsplugin-snapclass
          sc-id: rook-cephfs-test-fs1-dr2-1
        - name: csi-cephfsplugin-snapclass-2
          sc-id: rook-cephfs-test-fs2-dr2-1
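
For illustration, the SC-IDs above follow a per-storage-class, per-cluster pattern, which is what avoids duplicate IDs when a cluster has more than one SC. A minimal sketch of the rule (the helper below is illustrative, not code from this PR):

def storage_id(sc_name, cluster, index=1):
    # Embedding the storage class name keeps the ID unique when a cluster
    # has more than one SC, e.g. "rook-ceph-block-dr1-1" vs
    # "rook-ceph-block-2-dr1-1".
    return f"{sc_name}-{cluster}-{index}"

assert storage_id("rook-ceph-block", "dr1") == "rook-ceph-block-dr1-1"
assert storage_id("rook-cephfs-test-fs2", "dr2") == "rook-cephfs-test-fs2-dr2-1"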

This is the status of the PeerClasses from the DRPolicy:

status:
  async:
    peerClasses:
      - clusterIDs:
          - cluster-id-1
          - cluster-id-2
        replicationID: rook-ceph-block-replication-1
        storageClassName: rook-ceph-block
        storageID:
          - rook-ceph-block-dr1-1
          - rook-ceph-block-dr2-1
      - clusterIDs:
          - cluster-id-1
          - cluster-id-2
        replicationID: rook-ceph-block-2-replication-1
        storageClassName: rook-ceph-block-2
        storageID:
          - rook-ceph-block-2-dr1-1
          - rook-ceph-block-2-dr2-1
      - clusterIDs:
          - cluster-id-1
          - cluster-id-2
        storageClassName: rook-cephfs-test-fs1
        storageID:
          - rook-cephfs-test-fs1-dr1-1
          - rook-cephfs-test-fs1-dr2-1
      - clusterIDs:
          - cluster-id-1
          - cluster-id-2
        storageClassName: rook-cephfs-test-fs2
        storageID:
          - rook-cephfs-test-fs2-dr1-1
          - rook-cephfs-test-fs2-dr2-1

nirs (Member) commented on Feb 12, 2025

> Additionally, it includes minor fixes to prevent duplicate SC-ID generation.

Can you separate the fix so we can merge it quickly? If the previous changes introduced an issue, we should fix it first before adding more changes.

nirs (Member) commented on Feb 12, 2025

Also, it is not clear to me what the purpose of this change is. What are the changes, and why are they needed?

The PR message shows a list of resources, but it is not clear what we had before and what is new.

rakeshgm (Author) commented:

Results of basic-test:

./basic-test/run $env
2025-02-12 16:00:16,820 INFO    [deploy] Deploying application
2025-02-12 16:00:16,820 INFO    [deploy] Deploying application 'deployment-rbd'
2025-02-12 16:00:20,310 INFO    [deploy] Waiting for 'placement.cluster.open-cluster-management.io/placement' decisions
2025-02-12 16:00:20,738 INFO    [deploy] Application running on cluster 'dr1'
2025-02-12 16:00:20,971 INFO    [enable-dr] Enable DR
2025-02-12 16:00:21,114 INFO    [enable-dr] Disabling OCM scheduling for 'placement.cluster.open-cluster-management.io/placement'
2025-02-12 16:00:21,520 INFO    [enable-dr] Waiting for 'placement.cluster.open-cluster-management.io/placement' decisions
2025-02-12 16:00:22,463 INFO    [enable-dr] waiting for namespace deployment-rbd
2025-02-12 16:00:22,869 INFO    [enable-dr] Waiting until 'drplacementcontrol.ramendr.openshift.io/deployment-rbd-drpc' reports status
2025-02-12 16:00:30,586 INFO    [enable-dr] Waiting for 'drplacementcontrol.ramendr.openshift.io/deployment-rbd-drpc' Available condition
2025-02-12 16:00:30,927 INFO    [enable-dr] Waiting for 'drplacementcontrol.ramendr.openshift.io/deployment-rbd-drpc' PeerReady condition
2025-02-12 16:00:31,215 INFO    [enable-dr] Waiting for 'drplacementcontrol.ramendr.openshift.io/deployment-rbd-drpc' first replication
2025-02-12 16:01:56,212 INFO    [enable-dr] DR enabled
2025-02-12 16:01:56,516 INFO    [failover] Fail over application
2025-02-12 16:01:56,661 INFO    [failover] Waiting for 'drplacementcontrol.ramendr.openshift.io/deployment-rbd-drpc' Available condition
2025-02-12 16:01:56,932 INFO    [failover] Waiting for 'drplacementcontrol.ramendr.openshift.io/deployment-rbd-drpc' PeerReady condition
2025-02-12 16:01:57,202 INFO    [failover] Waiting for 'drplacementcontrol.ramendr.openshift.io/deployment-rbd-drpc' first replication
2025-02-12 16:01:57,457 INFO    [failover] Waiting for 'placement.cluster.open-cluster-management.io/placement' decisions
2025-02-12 16:01:58,041 INFO    [failover] Starting failover for 'drplacementcontrol.ramendr.openshift.io/deployment-rbd-drpc' to cluster 'dr2'
2025-02-12 16:01:58,662 INFO    [failover] Waiting for 'drplacementcontrol.ramendr.openshift.io/deployment-rbd-drpc' Available condition
2025-02-12 16:02:28,768 INFO    [failover] Waiting for 'drplacementcontrol.ramendr.openshift.io/deployment-rbd-drpc' PeerReady condition
2025-02-12 16:05:27,139 INFO    [failover] Waiting for 'drplacementcontrol.ramendr.openshift.io/deployment-rbd-drpc' first replication
2025-02-12 16:05:28,699 INFO    [failover] Application was failed over
2025-02-12 16:05:28,954 INFO    [relocate] Relocate application
2025-02-12 16:05:29,079 INFO    [relocate] Waiting for 'drplacementcontrol.ramendr.openshift.io/deployment-rbd-drpc' Available condition
2025-02-12 16:05:29,347 INFO    [relocate] Waiting for 'drplacementcontrol.ramendr.openshift.io/deployment-rbd-drpc' PeerReady condition
2025-02-12 16:05:29,600 INFO    [relocate] Waiting for 'drplacementcontrol.ramendr.openshift.io/deployment-rbd-drpc' first replication
2025-02-12 16:05:29,918 INFO    [relocate] Waiting for 'placement.cluster.open-cluster-management.io/placement' decisions
2025-02-12 16:05:30,481 INFO    [relocate] Starting relocate for 'drplacementcontrol.ramendr.openshift.io/deployment-rbd-drpc' to cluster 'dr1'
2025-02-12 16:05:30,825 INFO    [relocate] Waiting for 'drplacementcontrol.ramendr.openshift.io/deployment-rbd-drpc' phase 'Relocated'
2025-02-12 16:07:30,617 INFO    [relocate] Waiting for 'drplacementcontrol.ramendr.openshift.io/deployment-rbd-drpc' Available condition
2025-02-12 16:07:30,901 INFO    [relocate] Waiting for 'drplacementcontrol.ramendr.openshift.io/deployment-rbd-drpc' PeerReady condition
2025-02-12 16:07:31,228 INFO    [relocate] Waiting for 'drplacementcontrol.ramendr.openshift.io/deployment-rbd-drpc' first replication
2025-02-12 16:07:31,466 INFO    [relocate] Application was relocated
2025-02-12 16:07:31,735 INFO    [disable-dr] Disable DR
2025-02-12 16:07:31,955 INFO    [disable-dr] Deleting 'drplacementcontrol.ramendr.openshift.io/deployment-rbd-drpc'
2025-02-12 16:08:34,409 INFO    [disable-dr] DR was disabled
2025-02-12 16:08:34,674 INFO    [undeploy] Deleting application
2025-02-12 16:08:34,674 INFO    [undeploy] Undeploying application 'deployment-rbd'
2025-02-12 16:08:43,920 INFO    [undeploy] Application was deleted

nirs (Member) left a comment:

Reviewed only the fix for now.

@@ -108,11 +108,11 @@ def configure_rbd_mirroring(cluster, peer_info):

     print("Creating VolumeReplicationClass")
     template = drenv.template("start-data/vrc-sample.yaml")
-    yaml = template.substitute(cluster=cluster)
+    yaml = template.substitute(cluster=cluster, scname="rook-ceph-block")

nirs (Member):

scname is cryptic. How about storage_class="rook-ceph-block"?
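
For illustration, the suggested rename would only change the keyword passed to the template (a sketch, assuming the template variable is renamed to $storage_class as well):

import drenv

# Example only; "dr1" is one of the cluster names used in this environment.
cluster = "dr1"

template = drenv.template("start-data/vrc-sample.yaml")
yaml = template.substitute(cluster=cluster, storage_class="rook-ceph-block")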

rakeshgm (Author):

I agree. I would like to take this enhancement in the next PR; the important parts are completed now.

nirs (Member):

I think it is better to fix this now, since it is a simple rename. But I'm also OK with fixing this later if you don't have time now.

@@ -13,7 +13,7 @@ kind: VolumeSnapshotClass
 metadata:
   name: csi-cephfsplugin-snapclass
   labels:
-    ramendr.openshift.io/storageid: rook-cephfs-$cluster-1
+    ramendr.openshift.io/storageid: rook-cephfs-$fsname-$cluster-1

nirs (Member):

Why don't we use the storage class name instead of hardcoding "rook-cephfs"?

rakeshgm (Author):

Updated.

@@ -7,7 +7,7 @@ kind: StorageClass
 metadata:
   name: $name
   labels:
-    ramendr.openshift.io/storageid: rook-ceph-$cluster-1
+    ramendr.openshift.io/storageid: $name-$cluster-1

nirs (Member):

Nice

@@ -11,7 +11,7 @@ kind: StorageClass
 metadata:
   name: rook-cephfs-$fsname
   labels:
-    ramendr.openshift.io/storageid: rook-cephfs-$cluster-1
+    ramendr.openshift.io/storageid: rook-cephfs-$fsname-$cluster-1

nirs (Member):

I wonder why we don't use the same method here: the storage class name. Customizing it the same way will make the code simpler and more uniform.

rakeshgm (Author):

Updated.

rakeshgm force-pushed the drenv-multisc-cont branch 3 times, most recently from a6cd233 to 9a817f1, on February 18, 2025 at 17:13
@@ -108,11 +108,11 @@ def configure_rbd_mirroring(cluster, peer_info):

     print("Creating VolumeReplicationClass")
     template = drenv.template("start-data/vrc-sample.yaml")
-    yaml = template.substitute(cluster=cluster)
+    yaml = template.substitute(cluster=cluster, scname="rook-ceph-block")

nirs (Member):

I think it is better to fix this now, since it is a simple rename. But I'm also OK with fixing this later if you don't have time now.

@@ -9,9 +9,9 @@
 apiVersion: storage.k8s.io/v1
 kind: StorageClass
 metadata:
-  name: rook-cephfs-$fsname
+  name: $scname

nirs (Member):

This is called "name" in the RBD storage class; let's be consistent.

        VOLUME_REPLICATION_CLASSES,
        VOLUME_GROUP_REPLICATION_CLASSES,
    )
]

nirs (Member):

STORAGE_CLASSES_MAP is very confusing since this is not a map - it is a list.

This is also very confusing code; I find it very hard to understand.

We can simplify this by creating static list of dicts like this:

CONFIG = [
    {
        "pool": "replicapool",
        "sc": "rook-ceph-block",
        "vrc": "vrc-sample",
        "vgrc": "vgrc-sample",
    },
    {
        "pool": "replicapool-2",
        "sc": "rook-ceph-block-2",
        "vrc": "vrc-sample-2",
        "vgrc": "vgrc-sample-2",
    },
]

The code can accept a config dict and use the values from the dict.

This structure can also be extracted to external yaml files later, to separate the code from the configuration. The same script will then be able to create different configurations.
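
A rough sketch of how the setup code could consume one such entry (the helper name, the vrcname template variable, and the import paths are assumptions, not code from this PR; drenv.template and kubectl.apply are the calls already used in the script):

import drenv
from drenv import kubectl  # import path assumed from the existing drenv scripts

CONFIG = [
    {
        "pool": "replicapool",
        "sc": "rook-ceph-block",
        "vrc": "vrc-sample",
        "vgrc": "vgrc-sample",
    },
    {
        "pool": "replicapool-2",
        "sc": "rook-ceph-block-2",
        "vrc": "vrc-sample-2",
        "vgrc": "vgrc-sample-2",
    },
]


def create_volume_replication_class(cluster, config):
    # Hypothetical helper: render the VRC template with values from one
    # config entry and apply it on the given cluster.
    template = drenv.template("start-data/vrc-sample.yaml")
    yaml = template.substitute(
        cluster=cluster,
        scname=config["sc"],
        vrcname=config["vrc"],  # assumes the template also takes a $vrcname
    )
    kubectl.apply("--filename=-", input=yaml, context=cluster)


def configure_all(cluster):
    for config in CONFIG:
        create_volume_replication_class(cluster, config)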

@@ -33,7 +52,7 @@ def fetch_secret_info(cluster):

     print(f"Getting mirroring info site name for cluster '{cluster}'")
     info["name"] = drenv.wait_for(
-        f"cephblockpools.ceph.rook.io/{POOL_NAME}",
+        f"cephblockpools.ceph.rook.io/{POOL_NAMES[0]}",

nirs (Member):

With the simplified configuration I suggested, this will become:

CONFIG[0]["pool"]

vgrcname=storage_class["vgrc"],
scname=storage_class["name"],
)
kubectl.apply("--filename=-", input=yaml, context=cluster)

nirs (Member):

I don't want to complicate the code like this. Let's keep the code simple, creating and setting up mirroring for a single pool.

To create multiple vrc/vgrc/sc and set up mirroring on all pools, we can run the entire code in parallel with 2 configurations. Each thread will create resources and set up mirroring for a single pool.
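
A minimal sketch of that idea, assuming a setup_mirroring(cluster, config) helper that performs the complete single-pool setup (the helper name and the CONFIG entries come from the suggestion above; none of this is code from the PR):

import concurrent.futures


def setup_mirroring(cluster, config):
    # Hypothetical single-pool setup: create the pool, storage class,
    # VRC/VGRC and configure mirroring using values from one config entry.
    ...


def setup_all(cluster, configs):
    # Run the single-pool setup once per configuration, in parallel,
    # so each worker handles exactly one pool.
    with concurrent.futures.ThreadPoolExecutor() as executor:
        futures = [
            executor.submit(setup_mirroring, cluster, config)
            for config in configs
        ]
        for future in concurrent.futures.as_completed(futures):
            future.result()  # re-raise any error from a worker thread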

STORAGE_CLASSES_MAP = [
    {"fs": fs, "name": f"{STORAGE_CLASS_NAME_PREFIX}{fs}", "vsc": vsc}
    for fs, vsc in zip(FS_NAMES, CEPHFS_VOLUME_SNAPSHOT_CLASSES)
]

nirs (Member):

This is not a map but a list. It should be the same static list with all the info, as in rbd-mirror/start.
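
For illustration, the cephfs analog of that static list could look like this (values taken from the resource tree above; only a sketch of the suggested convention):

CONFIG = [
    {
        "fs": "test-fs1",
        "sc": "rook-cephfs-test-fs1",
        "vsc": "csi-cephfsplugin-snapclass",
    },
    {
        "fs": "test-fs2",
        "sc": "rook-cephfs-test-fs2",
        "vsc": "csi-cephfsplugin-snapclass-2",
    },
]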

rakeshgm (Author) commented:

I will make this PR a draft now and open a separate PR for the duplicate SID fix, since those changes are mostly reviewed and we want them soon.

rakeshgm marked this pull request as draft on February 19, 2025 at 11:22
adding new VRCs and VSCs based on the newly created SCs

Signed-off-by: rakeshgm <[email protected]>