[GSS] Test object expiration with millions of objects #10153

Open
wants to merge 7 commits into
base: master
from

Conversation

mashetty330
Contributor

@mashetty330 mashetty330 commented Jul 22, 2024

@mashetty330 mashetty330 self-assigned this Jul 22, 2024
@pull-request-size pull-request-size bot added the size/M PR that changes 30-99 lines label Jul 22, 2024

openshift-ci bot commented Jul 22, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mashetty330

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@pull-request-size pull-request-size bot added size/L PR that changes 100-499 lines and removed size/M PR that changes 30-99 lines labels Jul 24, 2024
@mashetty330 mashetty330 marked this pull request as ready for review July 24, 2024 13:47
@mashetty330 mashetty330 requested review from a team as code owners July 24, 2024 13:47
@mashetty330 mashetty330 added team/e2e E2E team related issues/PRs Customer defects Defects automated aspart of GSS closed loop MCG Multi Cloud Gateway / NooBaa related issues Squad/Red labels Jul 31, 2024

@ocs-ci ocs-ci left a comment

PR validation on existing cluster

Cluster Name: mashetty-vma213
Cluster Configuration: conf/deployment/vsphere/ipi_1az_rhcos_vsan_3m_3w.yaml
PR Test Suite:
PR Test Path: tests/cross_functional/scale/noobaa/test_object_expiration_scale.py
Additional Test Params:
OCP VERSION: 4.15
OCS VERSION: 4.15
tested against branch: master

Job UNSTABLE (some or all tests failed).

@ocs-ci ocs-ci left a comment

PR validation on existing cluster

Cluster Name: mashetty-vma213
Cluster Configuration: conf/deployment/vsphere/ipi_1az_rhcos_vsan_3m_3w.yaml
PR Test Suite:
PR Test Path: tests/cross_functional/scale/noobaa/test_object_expiration_scale.py
Additional Test Params:
OCP VERSION: 4.15
OCS VERSION: 4.15
tested against branch: master

Job UNSTABLE (some or all tests failed).

@ocs-ci ocs-ci left a comment

PR validation on existing cluster

Cluster Name: mashetty-vma213
Cluster Configuration: conf/deployment/vsphere/ipi_1az_rhcos_vsan_3m_3w.yaml
PR Test Suite:
PR Test Path: tests/cross_functional/scale/noobaa/test_object_expiration_scale.py
Additional Test Params:
OCP VERSION: 4.15
OCS VERSION: 4.15
tested against branch: master

Job UNSTABLE (some or all tests failed).

@ocs-ci ocs-ci left a comment

PR validation on existing cluster

Cluster Name: mashetty-vma213
Cluster Configuration: conf/deployment/vsphere/ipi_1az_rhcos_vsan_3m_3w.yaml
PR Test Suite:
PR Test Path: tests/cross_functional/scale/noobaa/test_object_expiration_scale.py
Additional Test Params:
OCP VERSION: 4.15
OCS VERSION: 4.15
tested against branch: master

Job UNSTABLE (some or all tests failed).

@ocs-ci ocs-ci left a comment

PR validation on existing cluster

Cluster Name: mashetty-vm21
Cluster Configuration: conf/deployment/vsphere/ipi_1az_rhcos_vsan_3m_3w.yaml
PR Test Suite:
PR Test Path: tests/cross_functional/scale/noobaa/test_object_expiration_scale.py
Additional Test Params:
OCP VERSION: 4.17
OCS VERSION: 4.17
tested against branch: master

Job UNSTABLE (some or all tests failed).

@ocs-ci ocs-ci left a comment

PR validation on existing cluster

Cluster Name: mashetty-vmf03
Cluster Configuration: conf/deployment/vsphere/ipi_1az_rhcos_vsan_3m_3w.yaml conf/ocsci/enable_huge_pages.yaml conf/ocsci/encryption_in_transit.yaml
PR Test Suite:
PR Test Path: tests/cross_functional/scale/noobaa/test_delete_objects.py::TestDeleteObjects:: test_delete_objects_with_expiration
Additional Test Params:
OCP VERSION: 4.18
OCS VERSION: 4.18
tested against branch: master

Job FAILED (installation failed, tests not executed).

@ocs-ci ocs-ci left a comment

PR validation on existing cluster

Cluster Name: mashetty-vmf03
Cluster Configuration: conf/deployment/vsphere/ipi_1az_rhcos_vsan_3m_3w.yaml conf/ocsci/enable_huge_pages.yaml conf/ocsci/encryption_in_transit.yaml
PR Test Suite:
PR Test Path: tests/cross_functional/scale/noobaa/test_delete_objects.py::TestDeleteObjects::test_delete_objects_with_expiration
Additional Test Params:
OCP VERSION: 4.18
OCS VERSION: 4.18
tested against branch: master

Job UNSTABLE (some or all tests failed).

@ocs-ci ocs-ci left a comment

PR validation on existing cluster

Cluster Name: mashetty-vmf03
Cluster Configuration: conf/deployment/vsphere/ipi_1az_rhcos_vsan_3m_3w.yaml conf/ocsci/enable_huge_pages.yaml conf/ocsci/encryption_in_transit.yaml
PR Test Suite:
PR Test Path: tests/cross_functional/scale/noobaa/test_delete_objects.py::TestDeleteObjects::test_delete_objects_with_expiration
Additional Test Params:
OCP VERSION: 4.18
OCS VERSION: 4.18
tested against branch: master

Job UNSTABLE (some or all tests failed).

@ocs-ci ocs-ci left a comment

PR validation on existing cluster

Cluster Name: mashetty-vmf03
Cluster Configuration: conf/deployment/vsphere/ipi_1az_rhcos_vsan_3m_3w.yaml conf/ocsci/enable_huge_pages.yaml conf/ocsci/encryption_in_transit.yaml
PR Test Suite:
PR Test Path: tests/cross_functional/scale/noobaa/test_delete_objects.py::TestDeleteObjects::test_delete_objects_with_expiration
Additional Test Params:
OCP VERSION: 4.18
OCS VERSION: 4.18
tested against branch: master

Job UNSTABLE (some or all tests failed).

@ocs-ci ocs-ci left a comment

PR validation on existing cluster

Cluster Name: mashetty-vmf03
Cluster Configuration: conf/deployment/vsphere/ipi_1az_rhcos_vsan_3m_3w.yaml conf/ocsci/enable_huge_pages.yaml conf/ocsci/encryption_in_transit.yaml
PR Test Suite:
PR Test Path: tests/cross_functional/scale/noobaa/test_delete_objects.py::TestDeleteObjects::test_delete_objects_with_expiration
Additional Test Params:
OCP VERSION: 4.18
OCS VERSION: 4.18
tested against branch: master

Job UNSTABLE (some or all tests failed).

@ocs-ci ocs-ci left a comment

PR validation on existing cluster

Cluster Name: mashetty-vmf03
Cluster Configuration: conf/deployment/vsphere/ipi_1az_rhcos_vsan_3m_3w.yaml conf/ocsci/enable_huge_pages.yaml conf/ocsci/encryption_in_transit.yaml
PR Test Suite:
PR Test Path: tests/cross_functional/scale/noobaa/test_delete_objects.py::TestDeleteObjects::test_delete_objects_with_expiration
Additional Test Params:
OCP VERSION: 4.18
OCS VERSION: 4.18
tested against branch: master

Job PASSED.

Comment on lines +207 to +208
@polarion_id("OCS-6097")
@polarion_id("OCS-6096")
Contributor

I don't think adding multiple Polarion IDs to a test function is supported by the Polarion importer plugin. Can we combine these two tests?

Contributor Author

I have done this in the past. There are already many tests with multiple Polarion IDs.

Contributor

@PrasadDesala PrasadDesala Feb 10, 2025

> I have done this in the past. There are already many tests with multiple Polarion IDs.

There is no restriction on adding multiple Polarion IDs to a test function in ocs-ci. However, I doubt that the test execution status gets updated in Polarion for all of those IDs. At least, the Polarion importer plugin did not support it the last time I checked, although that was a long time ago. Did you check whether the test results were being updated in Polarion for every ID on those multi-decorated tests?

log.info("Deleted objects from the bucket recursively")

# Verify that all the objects are marked as deleted
verify_objs_deleted_from_objmds(bucket.name, timeout=72000, sleep=90)
Contributor

A 20-hour timeout is too much. Why do we need a 20-hour timeout?
Is there anything we can do to get a reasonable timeout?

Contributor Author

As discussed in the weekly sync-up, the only way forward is with this long timeout. About 1000 objects expire every 1 to 2 minutes. For 1 million objects, that is 1,000,000 / 1,000 = 1,000 batches; at about 1.5 minutes per batch, 1,000 × 1.5 = 1,500 minutes, and 1,500 / 60 = 25 hours. In my last two runs, we needed at least 20 hours for all the objects to expire. Currently we don't have support for increasing the object expiration batch size above 1000.
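
The estimate above can be reproduced with a small sketch. The function name and parameters here are hypothetical, not part of ocs-ci; the batch size of 1000 and the 1 to 2 minute interval are the figures quoted in this comment.

```python
import math


def estimate_expiration_hours(num_objects, batch_size=1000, minutes_per_batch=1.5):
    """Rough wall-clock estimate for expiring num_objects.

    Assumes (per the comment above) that about batch_size objects expire
    every minutes_per_batch minutes. Hypothetical helper for illustration.
    """
    batches = math.ceil(num_objects / batch_size)
    return batches * minutes_per_batch / 60


if __name__ == "__main__":
    # Midpoint of the 1-2 minute range: 1000 batches * 1.5 min / 60 = 25 hours
    print(estimate_expiration_hours(1_000_000))
    # The timeout passed to the test, 72000 seconds, is 20 hours
    print(72000 / 3600)
```

With a 1-minute interval the same formula gives roughly 16.7 hours, and with a 2-minute interval roughly 33.3 hours, so the 20-hour timeout sits inside the quoted range.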

Contributor

OK, this test is going into the scale suite. Let's get feedback from the scale team on this.

Contributor

25 hours for a single test seems too much. We should find an alternative way to reduce it. CC: @ramkiperiy

Contributor

@sagihirshfeld sagihirshfeld Feb 11, 2025

@mashetty330 can't we increase the object expiration batch size via a noobaa-core env-variable? https://github.com/noobaa/noobaa-core/blob/23aec8dcd195c95b121611f45ee210d280baabb8/config.js#L467

Does it result in errors/broken functionality? Or is it a matter of testing integrity compared to the default settings? If it's the latter, then I would say that sacrificing the integrity a bit to allow acceptable runtimes is an acceptable tradeoff in this case.

Contributor Author

@sagihirshfeld I was under the same impression, that we could use that env variable to modify the batch size. But per my discussion with Ben, we can't set it to a value greater than 1000; it can only be changed within the limit of 1000 objects. It works if you set the value to 900 but won't work for 1500.
Bug: https://issues.redhat.com/browse/DFBUGS-472

Contributor

Oh that's unfortunate - we're truly limited by MCG in this case. Then the only way to reduce the time we spend waiting for expirations would be to reduce the number of objects to the minimum acceptable amount.
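
As a rough way to pick that object count, the sketch below inverts the expiration-rate figures quoted in this thread (at most 1000 objects per batch, one batch roughly every 1.5 minutes). The function name and the example budgets are hypothetical and only illustrate the trade-off.

```python
def max_objects_for_budget(budget_hours, batch_size=1000, minutes_per_batch=1.5):
    """Largest object count expected to expire within budget_hours.

    Assumes the rate quoted in this thread; hypothetical helper,
    not an ocs-ci API.
    """
    full_batches = int(budget_hours * 60 // minutes_per_batch)
    return full_batches * batch_size


if __name__ == "__main__":
    # Compare a few candidate time budgets for the scale test
    for hours in (2, 4, 8):
        print(hours, max_objects_for_budget(hours))
```

Under these assumptions a 4-hour budget allows about 160,000 objects; whether that count is still an acceptable scale for this test is a question for the scale team.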

Comment on lines +248 to +253
generate_empty_files(
awscli_pod_session,
dir=test_directory_setup.origin_dir,
amount=1000000,
timeout=3600,
)
Contributor

If we use multithreading here, it should reduce the creation time drastically.
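
A minimal sketch of the idea, assuming the files are created on a local filesystem. In the PR, `generate_empty_files` runs against `awscli_pod_session`, so the helpers below (`create_empty_files`, `create_empty_files_threaded`) are hypothetical stand-ins rather than ocs-ci APIs, and the actual speed-up would depend on how the pod handles concurrent file creation.

```python
import os
import tempfile
import uuid
from concurrent.futures import ThreadPoolExecutor


def create_empty_files(directory, count):
    """Create count empty files with unique names in directory."""
    for _ in range(count):
        path = os.path.join(directory, f"obj-{uuid.uuid4().hex}")
        open(path, "w").close()
    return count


def create_empty_files_threaded(directory, amount, workers=8):
    """Split amount into near-equal chunks and create them in parallel threads."""
    base, extra = divmod(amount, workers)
    chunks = [base + (1 if i < extra else 0) for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        created = pool.map(lambda n: create_empty_files(directory, n), chunks)
        return sum(created)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        total = create_empty_files_threaded(tmp, amount=10_000)
        print(total, len(os.listdir(tmp)))
```

Threads suit this job because file creation is I/O-bound; the chunking keeps the total at exactly `amount` even when it does not divide evenly across workers.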

Comment on lines +182 to +185
# generate 1 million empty files with unique identifiers
generate_empty_files(
awscli_pod_session, dir=test_directory_setup.origin_dir, amount=1000000
)
Contributor

I would suggest multithreading here as well.

Labels
Customer defects Defects automated aspart of GSS closed loop MCG Multi Cloud Gateway / NooBaa related issues size/L PR that changes 100-499 lines Squad/Red team/e2e E2E team related issues/PRs
6 participants