[GSS] Test object expiration with millions of objects #10153
base: master
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: mashetty330 The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
PR validation on existing cluster
Cluster Name: mashetty-vma213
Cluster Configuration: conf/deployment/vsphere/ipi_1az_rhcos_vsan_3m_3w.yaml
PR Test Suite:
PR Test Path: tests/cross_functional/scale/noobaa/test_object_expiration_scale.py
Additional Test Params:
OCP VERSION: 4.15
OCS VERSION: 4.15
tested against branch: master
Job UNSTABLE (some or all tests failed).
PR validation on existing cluster
Cluster Name: mashetty-vma213
Cluster Configuration: conf/deployment/vsphere/ipi_1az_rhcos_vsan_3m_3w.yaml
PR Test Suite:
PR Test Path: tests/cross_functional/scale/noobaa/test_object_expiration_scale.py
Additional Test Params:
OCP VERSION: 4.15
OCS VERSION: 4.15
tested against branch: master
Job UNSTABLE (some or all tests failed).
PR validation on existing cluster
Cluster Name: mashetty-vma213
Cluster Configuration: conf/deployment/vsphere/ipi_1az_rhcos_vsan_3m_3w.yaml
PR Test Suite:
PR Test Path: tests/cross_functional/scale/noobaa/test_object_expiration_scale.py
Additional Test Params:
OCP VERSION: 4.15
OCS VERSION: 4.15
tested against branch: master
Job UNSTABLE (some or all tests failed).
PR validation on existing cluster
Cluster Name: mashetty-vma213
Cluster Configuration: conf/deployment/vsphere/ipi_1az_rhcos_vsan_3m_3w.yaml
PR Test Suite:
PR Test Path: tests/cross_functional/scale/noobaa/test_object_expiration_scale.py
Additional Test Params:
OCP VERSION: 4.15
OCS VERSION: 4.15
tested against branch: master
Job UNSTABLE (some or all tests failed).
PR validation on existing cluster
Cluster Name: mashetty-vm21
Cluster Configuration: conf/deployment/vsphere/ipi_1az_rhcos_vsan_3m_3w.yaml
PR Test Suite:
PR Test Path: tests/cross_functional/scale/noobaa/test_object_expiration_scale.py
Additional Test Params:
OCP VERSION: 4.17
OCS VERSION: 4.17
tested against branch: master
Job UNSTABLE (some or all tests failed).
PR validation on existing cluster
Cluster Name: mashetty-vmf03
Cluster Configuration: conf/deployment/vsphere/ipi_1az_rhcos_vsan_3m_3w.yaml conf/ocsci/enable_huge_pages.yaml conf/ocsci/encryption_in_transit.yaml
PR Test Suite:
PR Test Path: tests/cross_functional/scale/noobaa/test_delete_objects.py::TestDeleteObjects::test_delete_objects_with_expiration
Additional Test Params:
OCP VERSION: 4.18
OCS VERSION: 4.18
tested against branch: master
Job FAILED (installation failed, tests not executed).
PR validation on existing cluster
Cluster Name: mashetty-vmf03
Cluster Configuration: conf/deployment/vsphere/ipi_1az_rhcos_vsan_3m_3w.yaml conf/ocsci/enable_huge_pages.yaml conf/ocsci/encryption_in_transit.yaml
PR Test Suite:
PR Test Path: tests/cross_functional/scale/noobaa/test_delete_objects.py::TestDeleteObjects::test_delete_objects_with_expiration
Additional Test Params:
OCP VERSION: 4.18
OCS VERSION: 4.18
tested against branch: master
Job UNSTABLE (some or all tests failed).
PR validation on existing cluster
Cluster Name: mashetty-vmf03
Cluster Configuration: conf/deployment/vsphere/ipi_1az_rhcos_vsan_3m_3w.yaml conf/ocsci/enable_huge_pages.yaml conf/ocsci/encryption_in_transit.yaml
PR Test Suite:
PR Test Path: tests/cross_functional/scale/noobaa/test_delete_objects.py::TestDeleteObjects::test_delete_objects_with_expiration
Additional Test Params:
OCP VERSION: 4.18
OCS VERSION: 4.18
tested against branch: master
Job UNSTABLE (some or all tests failed).
PR validation on existing cluster
Cluster Name: mashetty-vmf03
Cluster Configuration: conf/deployment/vsphere/ipi_1az_rhcos_vsan_3m_3w.yaml conf/ocsci/enable_huge_pages.yaml conf/ocsci/encryption_in_transit.yaml
PR Test Suite:
PR Test Path: tests/cross_functional/scale/noobaa/test_delete_objects.py::TestDeleteObjects::test_delete_objects_with_expiration
Additional Test Params:
OCP VERSION: 4.18
OCS VERSION: 4.18
tested against branch: master
Job UNSTABLE (some or all tests failed).
Signed-off-by: Mahesh Shetty <[email protected]>
Signed-off-by: Mahesh Shetty <[email protected]>
Signed-off-by: Mahesh Shetty <[email protected]>
Signed-off-by: Mahesh Shetty <[email protected]>
Signed-off-by: Mahesh Shetty <[email protected]>
Signed-off-by: Mahesh Shetty <[email protected]>
Signed-off-by: Mahesh Shetty <[email protected]>
PR validation on existing cluster
Cluster Name: mashetty-vmf03
Cluster Configuration: conf/deployment/vsphere/ipi_1az_rhcos_vsan_3m_3w.yaml conf/ocsci/enable_huge_pages.yaml conf/ocsci/encryption_in_transit.yaml
PR Test Suite:
PR Test Path: tests/cross_functional/scale/noobaa/test_delete_objects.py::TestDeleteObjects::test_delete_objects_with_expiration
Additional Test Params:
OCP VERSION: 4.18
OCS VERSION: 4.18
tested against branch: master
Job UNSTABLE (some or all tests failed).
PR validation on existing cluster
Cluster Name: mashetty-vmf03
Cluster Configuration: conf/deployment/vsphere/ipi_1az_rhcos_vsan_3m_3w.yaml conf/ocsci/enable_huge_pages.yaml conf/ocsci/encryption_in_transit.yaml
PR Test Suite:
PR Test Path: tests/cross_functional/scale/noobaa/test_delete_objects.py::TestDeleteObjects::test_delete_objects_with_expiration
Additional Test Params:
OCP VERSION: 4.18
OCS VERSION: 4.18
tested against branch: master
@polarion_id("OCS-6097")
@polarion_id("OCS-6096")
I don't think adding multiple Polarion IDs to a test function is supported by the Polarion importer plugin. Can we combine these two tests?
I have done these in the past. There are already many tests with multiple polarion ids
> I have done these in the past. There are already many tests with multiple polarion ids
There is no restriction on adding multiple Polarion IDs to a test function in ocs-ci. However, I doubt that the test execution status gets updated in Polarion for all of those IDs. At least, the Polarion importer plugin did not support it the last time I checked, although that was a long time ago. Did you check whether the test results were being updated in Polarion for all the IDs on those decorated tests?
log.info("Deleted objects from the bucket recursively")

# Verify that all the objects are marked as deleted
verify_objs_deleted_from_objmds(bucket.name, timeout=72000, sleep=90)
A 20-hour timeout is too much. Why do we need a 20-hour timeout?
Is there anything that we can do to have a reasonable timeout?
As discussed in the weekly sync-up, the only way forward is with this long timeout. About 1000 objects get expired every 1 to 2 minutes. Given 1 million objects, that is 1,000,000 / 1000 = 1000 batches; at roughly 1.5 minutes per batch, 1000 * 1.5 = 1500 minutes, and 1500 / 60 = 25 hours. In my last two runs we needed at least 20 hours for all the objects to expire. Currently there is no support for increasing the object expiration batch size above 1000.
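The estimate above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic in the comment: the function name and its parameters are illustrative (they are not part of ocs-ci or NooBaa), and the 1.5 minutes per batch is simply the midpoint of the observed "1 to 2 minutes", not a measured constant.

```python
import math


def estimate_expiration_hours(num_objects, batch_size=1000, minutes_per_batch=1.5):
    """Rough wall-clock estimate (in hours) for expiring num_objects in batches.

    Assumes one batch of at most `batch_size` objects is expired every
    `minutes_per_batch` minutes, as described in the comment above.
    """
    batches = math.ceil(num_objects / batch_size)
    return batches * minutes_per_batch / 60


if __name__ == "__main__":
    # 1,000,000 objects at the assumed 1.5 minutes per batch
    print(estimate_expiration_hours(1_000_000))  # 25.0
    # Same object count at the optimistic and pessimistic ends of the range
    print(estimate_expiration_hours(1_000_000, minutes_per_batch=1.0))
    print(estimate_expiration_hours(1_000_000, minutes_per_batch=2.0))
```

Under these assumptions, the roughly 20 hours seen in the two runs falls between the optimistic and pessimistic ends of the range, and the same function shows how the wait would scale if the object count were reduced.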
OK, this test is going into the scale suite. Let's get feedback from the scale team on this.
25 hours for a single test seems too much. We should find an alternative way to reduce it. CC: @ramkiperiy
@mashetty330 can't we increase the object expiration batch size via a noobaa-core env-variable? https://github.com/noobaa/noobaa-core/blob/23aec8dcd195c95b121611f45ee210d280baabb8/config.js#L467
Does it result in errors/broken functionality? Or is it a matter of testing integrity compared to the default settings? If it's the latter, then I would say that sacrificing the integrity a bit to allow acceptable runtimes is an acceptable tradeoff in this case.
@sagihirshfeld I was under the same impression, that we could use that env variable to modify the batch size. But as per my discussion with Ben, we can't set it to a value greater than 1000; it can only be changed within the limit of 1000 objects. It works if you set the value to 900 but won't work for 1500.
Bug: https://issues.redhat.com/browse/DFBUGS-472
Oh, that's unfortunate. We're truly limited by MCG in this case. Then the only way to reduce the time we spend waiting for expirations would be to reduce the number of objects to the minimum acceptable amount.
generate_empty_files(
    awscli_pod_session,
    dir=test_directory_setup.origin_dir,
    amount=1000000,
    timeout=3600,
)
If we use multithreading here, it will reduce the creation time drastically.
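As a rough illustration of the multithreading suggestion, the sketch below splits the creation of empty files across a thread pool on a local filesystem. It is an assumption-laden example, not how `generate_empty_files` in ocs-ci is implemented: `create_empty_files_threaded`, `_create_chunk`, the `obj-NNNNNNN` naming and the `workers` parameter are all hypothetical, and the real helper operates on a pod rather than a local directory. Whether threads actually help depends on the filesystem and on where the files are created, so any speedup should be measured rather than assumed.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def _create_chunk(directory, start, stop):
    """Create empty files numbered start..stop-1 and return how many were made."""
    for i in range(start, stop):
        # Mode "x" creates a new empty file and raises if it already exists
        with open(os.path.join(directory, f"obj-{i:07d}"), "x"):
            pass
    return stop - start


def create_empty_files_threaded(directory, amount, workers=8):
    """Create `amount` uniquely named empty files using `workers` threads."""
    if amount <= 0:
        return 0
    chunk = -(-amount // workers)  # ceiling division
    ranges = [(s, min(s + chunk, amount)) for s in range(0, amount, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = pool.map(lambda r: _create_chunk(directory, *r), ranges)
        return sum(counts)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        created = create_empty_files_threaded(tmp, amount=10_000)
        print(created, len(os.listdir(tmp)))
```

Each worker gets a disjoint range of indices, so file names stay unique without any locking between threads.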
log.info("Deleted objects from the bucket recursively")

# Verify that all the objects are marked as deleted
verify_objs_deleted_from_objmds(bucket.name, timeout=72000, sleep=90)
25 hours for a single test seems too much. We should find an alternative way to reduce it. CC: @ramkiperiy
# generate 1 million empty files with unique identifiers
generate_empty_files(
    awscli_pod_session, dir=test_directory_setup.origin_dir, amount=1000000
)
I would suggest multithreading here as well.
This PR Addresses: