[rptest] Azure CDT bringup #20749

Merged: clee merged 10 commits into dev from clee/PESDLC-1432 on Aug 2, 2024

Conversation

clee
Contributor

@clee clee commented Jun 28, 2024

Bug fixes to tests and test infrastructure for running CDT on Azure. (Note: this currently includes a commit from #18827, which this PR continues and builds upon.)

Backports Required

  • none - not a (product) bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.1.x
  • v23.3.x
  • v23.2.x

Release Notes

  • none

Improvements

  • replaces several uses of the deprecated Cloud API with the new Public API (the old endpoints don't return Azure resources)
  • successfully installs the BYOC Azure agent
  • allows overriding headers on POST requests in rpcloud_client
  • works around /snap/bin missing from $PATH (which left the az CLI unavailable as a command)
  • adds Azure support to the config_profile_verify test
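
The /snap/bin item can be illustrated with a minimal sketch. The PR's actual workaround is not shown in this conversation, so the helper name `ensure_on_path` and the approach (appending the directory to PATH before looking up `az`) are assumptions made only for illustration.

```python
import os
import shutil


def ensure_on_path(directory, env=None):
    """Append `directory` to PATH in `env` if it is not already listed.

    Hypothetical helper, not code from this PR.
    """
    env = os.environ if env is None else env
    parts = [p for p in env.get("PATH", "").split(os.pathsep) if p]
    if directory not in parts:
        parts.append(directory)
    env["PATH"] = os.pathsep.join(parts)
    return env["PATH"]


def find_az(env):
    """Look up the az CLI using the PATH stored in `env` (None if absent)."""
    return shutil.which("az", path=env["PATH"])
```

Calling `ensure_on_path("/snap/bin")` before `find_az(os.environ)` would let a snap-installed `az` be found even when the login shell did not add /snap/bin to PATH.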

@clee clee requested review from savex, simonlord and rpdevmp June 28, 2024 11:49
@CLAassistant

CLAassistant commented Jun 28, 2024

CLA assistant check
All committers have signed the CLA.

@ivotron
Member

ivotron commented Jun 28, 2024

What I was trying to mention during our sync today: since HTT tests on other cloud providers are not exercised at PR time, we can manually trigger some jobs pointing at this branch to make sure these changes don't break them.

@clee clee force-pushed the clee/PESDLC-1432 branch 3 times, most recently from 6abff40 to c4fadbe Compare July 2, 2024 10:47
@clee
Contributor Author

clee commented Jul 2, 2024

/cdt
provider=aws

@clee
Contributor Author

clee commented Jul 2, 2024

/cdt
provider=aws
rp_version=build

@clee
Contributor Author

clee commented Jul 3, 2024

/cdt
provider=aws
rp_version=build

rpdevmp
rpdevmp previously approved these changes Jul 8, 2024
Contributor

@rpdevmp rpdevmp left a comment

LGTM

@clee
Contributor Author

clee commented Jul 8, 2024

/cdt
provider=gcp
rp_version=build

@vbotbuildovich
Collaborator

'cdt_instance_type' and 'region' is required if 'provider' is not 'aws'

Workflow run logs.
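
The rule in the bot's reply can be sketched as a small validator. The bot's real implementation is not part of this conversation; the function names, the parsing rules, and the assumption that `provider` defaults to `aws` are all hypothetical.

```python
def parse_cdt_comment(body):
    """Split a '/cdt' comment into key=value params and positional args."""
    lines = [ln.strip() for ln in body.strip().splitlines()]
    if not lines or lines[0] != "/cdt":
        raise ValueError("not a /cdt command")
    params, args = {}, []
    for ln in lines[1:]:
        if not ln:
            continue
        if "=" in ln:
            key, value = ln.split("=", 1)
            params[key] = value
        else:
            args.append(ln)  # e.g. a test suite path
    return params, args


def validate_cdt_params(params):
    """Return an error message shaped like the bot's reply, or None if OK."""
    provider = params.get("provider", "aws")  # assumption: aws is the default
    if provider != "aws":
        missing = [k for k in ("cdt_instance_type", "region") if k not in params]
        if missing:
            return ("'cdt_instance_type' and 'region' is required "
                    "if 'provider' is not 'aws'")
    return None
```

Under this sketch, a comment with only `provider=gcp` and `rp_version=build` is rejected, while one that also supplies `region` and `cdt_instance_type` passes.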

@clee clee marked this pull request as ready for review July 9, 2024 17:44
@clee
Contributor Author

clee commented Jul 9, 2024

/cdt
provider=gcp
region=us-west-1
rp_version=build
cdt_instance_type=n2-highmem-4

@vbotbuildovich
Collaborator

vbotbuildovich commented Jul 9, 2024

new failures in https://buildkite.com/redpanda/redpanda/builds/51279#0190990e-2aff-46ac-ad35-6ae316956a3a:

"rptest.tests.services_self_test.KubectlLocalOnlyTest.test_is_redpanda_pod"

new failures in https://buildkite.com/redpanda/redpanda/builds/51287#0190999a-7a83-4c90-af52-5748d8545dd1:

"rptest.tests.services_self_test.KubectlLocalOnlyTest.test_is_redpanda_pod"

new failures in https://buildkite.com/redpanda/redpanda/builds/51287#019099a1-063a-4ac5-9ffa-fdcfea89efbe:

"rptest.tests.services_self_test.KubectlLocalOnlyTest.test_is_redpanda_pod"

new failures in https://buildkite.com/redpanda/redpanda/builds/51330#01909de2-a123-4152-a8e4-074c2331b304:

"rptest.tests.e2e_iam_role_test.AWSRoleFetchTests.test_write"

new failures in https://buildkite.com/redpanda/redpanda/builds/51338#01909f2a-f14c-4d26-a834-f865e353b0a5:

"rptest.tests.e2e_iam_role_test.ShortLivedCredentialsTests.test_short_lived_credentials"

new failures in https://buildkite.com/redpanda/redpanda/builds/51557#0190b970-9cb6-4a72-b795-0e40d6b949b7:

"rptest.tests.services_self_test.KubectlLocalOnlyTest.test_is_redpanda_pod"

new failures in https://buildkite.com/redpanda/redpanda/builds/51557#0190b971-ecbc-41f8-a486-57d9306b3d22:

"rptest.tests.services_self_test.KubectlLocalOnlyTest.test_is_redpanda_pod"

new failures in https://buildkite.com/redpanda/redpanda/builds/51557#0190bcdf-f17d-4ba1-91ad-4671061be6ec:

"rptest.tests.services_self_test.KubectlLocalOnlyTest.test_is_redpanda_pod"

new failures in https://buildkite.com/redpanda/redpanda/builds/51557#0190bcdf-ebfa-469d-965a-791b1607b6ce:

"rptest.tests.services_self_test.KubectlLocalOnlyTest.test_is_redpanda_pod"

new failures in https://buildkite.com/redpanda/redpanda/builds/51683#0190c24f-3171-4b90-bd13-b157d71c7bbc:

"rptest.tests.services_self_test.KubectlLocalOnlyTest.test_is_redpanda_pod"

new failures in https://buildkite.com/redpanda/redpanda/builds/51683#0190c267-ecc5-468a-8cba-9b845265128f:

"rptest.tests.services_self_test.KubectlLocalOnlyTest.test_is_redpanda_pod"

new failures in https://buildkite.com/redpanda/redpanda/builds/52018#0190e92b-7891-47d9-a4e0-fb75f653f972:

"rptest.tests.availability_test.AvailabilityTests.test_recovery_after_catastrophic_failure"

new failures in https://buildkite.com/redpanda/redpanda/builds/52018#0190ea59-4c39-4aa7-a84f-feaa9a9dae2e:

"rptest.tests.availability_test.AvailabilityTests.test_recovery_after_catastrophic_failure"

@clee
Contributor Author

clee commented Jul 10, 2024

/cdt
provider=gcp
region=us-west1
rp_version=build
cdt_instance_type=n2-highmem-4

class KubectlLocalOnlyTest(Test):

    # @cluster(num_nodes=0)
    @dt_cluster(num_nodes=0)
Contributor

Note that without an explicit @cluster(num_nodes=...), each test case by default asks for ALL the available nodes in the session. Two additional conditions, in combination, cause the test to fail with "Test requested X nodes, but used 0":

  • This test should not, and does not actually, use any nodes.
  • ducktape is sometimes invoked with --fail-bad-cluster-utilization, depending on the context. cdt_cloud.sh does NOT use --fail-bad-cluster-utilization, but other entrypoints into our DT tests often DO.

We cannot use the custom @cluster defined in our rptest framework because it requires the test to have a redpanda attribute, which we do not want (we don't want to spin up any services or use any nodes).

So we fall back to the regular @cluster decorator (used here as @dt_cluster) to specify that this test needs 0 nodes.
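
The interaction described in this comment can be modelled with a toy sketch. This is not ducktape's implementation: `ALL_NODES`, the stand-in `cluster` decorator, and `utilization_error` are hypothetical names that only mimic the behaviour described above (no explicit request means "all nodes", and a strict run fails when fewer nodes are used than requested).

```python
ALL_NODES = 12  # hypothetical session size, for illustration only


def cluster(num_nodes=None):
    """Stand-in decorator: record how many nodes a test asks for."""
    def decorate(fn):
        fn.requested_nodes = num_nodes
        return fn
    return decorate


def requested(fn):
    """A test with no explicit request is treated as asking for every node."""
    n = getattr(fn, "requested_nodes", None)
    return ALL_NODES if n is None else n


def utilization_error(fn, used, strict):
    """Return the failure message a strict run would report, else None."""
    wanted = requested(fn)
    if strict and used < wanted:
        return f"Test requested {wanted} nodes, but used {used}"
    return None


def implicit_case():
    """No decorator: implicitly requests ALL_NODES."""


@cluster(num_nodes=0)
def explicit_case():
    """Explicitly requests zero nodes."""
```

In this model `implicit_case` fails only when `strict` is true, which matches the observation that the failure depends on which entrypoint runs the tests, while `explicit_case` passes either way.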

@clee clee requested a review from travisdowns July 11, 2024 17:21
rpdevmp and others added 9 commits July 17, 2024 13:58
- replaces several instances of deprecated Cloud API usage with new
  Public API (because the old endpoints don't return Azure
  resources)
- successfully installs BYOC Azure agent
- allow override headers on POST requests for rpcloud_client
- workaround for /snap/bin not showing up in $PATH (making `az` CLI
  not show up as an available command)
- Azure support for config_profile_verify test
- Provider-specific workarounds for cases where the Azure operator has
  chosen to name things differently than the AWS and GCP operators
savex
savex previously approved these changes Jul 18, 2024
Contributor

@savex savex left a comment

lgtm

nit: some minor pod name detection code flow is questionable, but it is the same on the cloud API anyway.


    def _make_client(self):
        # TODO: Work on Azure client
        return None
Contributor

Yes, None here is part of the flow. Raising NotImplementedError would be better, though. We can improve that later.
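
A minimal sketch of that follow-up, assuming callers still need the existing "None means no client" flow. `AzureProviderSketch` and `client_or_none` are hypothetical names, not code from this PR.

```python
class AzureProviderSketch:
    """Hypothetical stand-in for the provider class under review."""

    def _make_client(self):
        # Fail loudly instead of silently returning None while the
        # Azure client is still unimplemented.
        raise NotImplementedError("Azure client is not implemented yet")


def client_or_none(provider):
    """Caller-side shim that keeps the 'None means no client' flow."""
    try:
        return provider._make_client()
    except NotImplementedError:
        return None
```

The shim keeps existing call sites working, while any new caller that forgets to handle the missing client gets an explicit exception rather than a later AttributeError on None.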

        return "hardcode"  # HACK

    def find_vpc_peering_connection(self, state, params):
        return None
Contributor

Let's add this to the refactoring list once Azure support is working.

        resp = requests.post(f'{base_url}{endpoint}',
                             headers=headers,
                             **kwargs)
        } | override_headers
Contributor

This is good.
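
The `} | override_headers` fragment uses Python's dict union operator (Python 3.9+), where keys from the right-hand operand win. A minimal sketch of the pattern follows; the function name `build_headers` and the default header names are assumptions, not taken from rpcloud_client.

```python
def build_headers(token, override_headers=None):
    """Merge default request headers with caller-supplied overrides."""
    defaults = {
        "Authorization": f"Bearer {token}",  # assumed default header
        "Accept": "application/json",        # assumed default header
    }
    # dict union (Python 3.9+): on key collisions the right operand wins
    return defaults | (override_headers or {})
```

For example, `build_headers("t", {"Accept": "text/plain"})` keeps the Authorization header and replaces only Accept.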

@clee
Contributor Author

clee commented Jul 23, 2024

/cdt
provider=gcp
region=us-west1
rp_version=build
cdt_instance_type=n2-highmem-4
tests/rptest/test_suite_cloud.yml

@clee
Contributor Author

clee commented Jul 24, 2024

/cdt
provider=gcp
region=us-west1
rp_repo=nightly
rp_version=latest
cdt_instance_type=n2-highmem-4
tests/rptest/test_suite_quick.yml

@clee
Contributor Author

clee commented Jul 24, 2024

/cdt
provider=aws
rp_repo=nightly
rp_version=latest
tests/rptest/test_suite_quick.yml

@clee
Contributor Author

clee commented Jul 25, 2024

AWS CDT results:

================================================================================
| SESSION REPORT (ALL TESTS)
| ducktape version: 0.8.18
| session_id:       2024-07-24--001
| run time:         679 minutes 41.394 seconds
| tests run:        1590
| passed:           1464
| flaky:            0
| failed:           17
| ignored:          2
| opassed:          98
| ofailed:          9
| opassedfips:      0
| ofailedfips:      0
| ================================================================================

@clee clee dismissed stale reviews from savex and jackietung-redpanda via 46266eb July 25, 2024 07:31
@clee
Contributor Author

clee commented Jul 25, 2024

/cdt
provider=gcp
region=us-west1
rp_repo=nightly
rp_version=latest
cdt_instance_type=n2-highmem-4
tests/rptest/test_suite_quick.yml

@clee
Contributor Author

clee commented Jul 29, 2024

GCP CDT results:

================================================================================
| SESSION REPORT (ALL TESTS)
| ducktape version: 0.8.18
| session_id:       2024-07-25--001
| run time:         707 minutes 4.610 seconds
| tests run:        1590
| passed:           1462
| flaky:            0
| failed:           19
| ignored:          2
| opassed:          98
| ofailed:          9
| opassedfips:      0
| ofailedfips:      0
| ================================================================================

@clee
Contributor Author

clee commented Aug 2, 2024

Pulling the trigger on this merge now. If any new CDT nightly failures show up, we'll deal with them as they come.

@clee clee merged commit f2cdd7d into dev Aug 2, 2024
16 of 17 checks passed
@clee clee deleted the clee/PESDLC-1432 branch August 2, 2024 17:08
8 participants