net: Introduce EVPN STP #35

Open
servolkov wants to merge 1 commit into RedHatQE:main from servolkov:EVPN

Conversation

@servolkov servolkov commented Feb 17, 2026

STP Metadata

VEP issue: https://github.com/openshift/enhancements/blob/master/enhancements/network/ovn-kubernetes-evpn.md

What this PR does

This PR introduces the EVPN STP.

Summary by CodeRabbit

  • Documentation
    • Added a comprehensive Software Test Plan for VM BGP/EVPN integration in OpenShift/Kubernetes. Covers scope and P0 goals (stretched L2, VM live migration, source-provider migration emulation), detailed test strategy (functional, automation, performance, security, compatibility, upgrade), test environments and tooling, entry/acceptance criteria, risks and known limitations (emulation, networking constraints, IPv6/local gateway), cross-team responsibilities, traceability, QE ownership, governance, and sign-off procedures.

coderabbitai bot commented Feb 17, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Introduces a new Software Test Plan (STP) for VM integration testing of BGP and EVPN with UDN in OpenShift/Kubernetes OVN-K, covering metadata, requirements, design review, test strategy, environments, scenarios, acceptance criteria, risks, limitations, and governance.

Changes

| Cohort / File(s) | Summary |
|---|---|
| EVPN Software Test Plan (stps/sig-network/EVPN.md) | Added a 179-line STP defining metadata, owners, feature overview (stretched L2 EVPN for VMs with UDN), motivation, requirements, technology/design review, P0 scope and out-of-scope, detailed test strategy (functional, automation, perf, security, compatibility, regression, upgrade), test environment and tooling (bare-metal OCP, OVN-K Local Gateway Mode, source-provider emulation), entry/exit criteria, risks/limitations (emulation-only source provider, LG mode, IPv6 constraints, bare-metal focus), mapped test scenarios/traceability, and sign-off/approvals. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'net: Introduce EVPN STP' directly and clearly summarizes the main change: introducing a Software Test Plan (STP) for EVPN (Ethernet Virtual Private Network) in the network domain. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

@openshift-virtualization-qe-bot-3

Report bugs in Issues

Welcome! 🎉

This pull request will be automatically processed with the following features:

🔄 Automatic Actions

  • Reviewer Assignment: Reviewers are automatically assigned based on the OWNERS file in the repository root
  • Size Labeling: PR size labels (XS, S, M, L, XL, XXL) are automatically applied based on changes
  • Issue Creation: A tracking issue is created for this PR and will be closed when the PR is merged or closed
  • Branch Labeling: Branch-specific labels are applied to track the target branch
  • Auto-verification: Auto-verified users have their PRs automatically marked as verified
  • Labels: Enabled categories: branch, can-be-merged, cherry-pick, has-conflicts, hold, needs-rebase, size, verified, wip

📋 Available Commands

PR Status Management

  • /wip - Mark PR as work in progress (adds WIP: prefix to title)
  • /wip cancel - Remove work in progress status
  • /hold - Block PR merging (approvers only)
  • /hold cancel - Unblock PR merging
  • /verified - Mark PR as verified
  • /verified cancel - Remove verification status
  • /reprocess - Trigger complete PR workflow reprocessing (useful if webhook failed or configuration changed)
  • /regenerate-welcome - Regenerate this welcome message

Review & Approval

  • /lgtm - Approve changes (looks good to me)
  • /approve - Approve PR (approvers only)
  • /assign-reviewers - Assign reviewers based on OWNERS file
  • /assign-reviewer @username - Assign specific reviewer
  • /check-can-merge - Check if PR meets merge requirements

Testing & Validation

  • /retest tox - Run Python test suite with tox
  • /retest all - Run all available tests

Cherry-pick Operations

  • /cherry-pick <branch> - Schedule cherry-pick to target branch when PR is merged
    • Multiple branches: /cherry-pick branch1 branch2 branch3

Label Management

  • /<label-name> - Add a label to the PR
  • /<label-name> cancel - Remove a label from the PR

✅ Merge Requirements

This PR will be automatically approved when the following conditions are met:

  1. Approval: /approve from at least one approver
  2. LGTM Count: Minimum 2 /lgtm from reviewers
  3. Status Checks: All required status checks must pass
  4. No Blockers: No WIP, hold, conflict labels
  5. Verified: PR must be marked as verified (if verification is enabled)

📊 Review Process

Approvers and Reviewers

Approvers:

  • EdDev

Reviewers:

  • Anatw
  • EdDev
  • azhivovk
  • servolkov
  • yossisegev

Available Labels

  • hold
  • verified
  • wip
  • lgtm
  • approve

💡 Tips

  • WIP Status: Use /wip when your PR is not ready for review
  • Verification: The verified label is automatically removed on each new commit
  • Cherry-picking: Cherry-pick labels are processed when the PR is merged
  • Permission Levels: Some commands require approver permissions
  • Auto-verified Users: Certain users have automatic verification and merge privileges

For more information, please refer to the project documentation or contact the maintainers.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (1)
stps/sig-network/EVPN.md (1)

17-33: Consider hyphenating compound adjectives for consistency.

Multiple instances of compound adjectives could be hyphenated for better readability:

  • Line 18: "User Defined Network" → "User-Defined Network"
  • Line 18: "OVN-K based network type" → "OVN-K-based network type"
  • Line 30: "User Defined Networks" → "User-Defined Networks"

While these are stylistic improvements, they enhance readability when the compound adjective modifies a noun.

📝 Proposed hyphenation improvements
-**Document Conventions:**
-- UDN: User Defined Network (OVN-K based network type).
+**Document Conventions:**
+- UDN: User-Defined Network (OVN-K-based network type).
-From the OCP-V perspective this feature enables OpenShift Virtualization VMs connected to primary OVN-Kubernetes 
-User Defined Networks (UDNs) to participate in an EVPN fabric.
+From the OCP-V perspective this feature enables OpenShift Virtualization VMs connected to primary OVN-Kubernetes 
+User-Defined Networks (UDNs) to participate in an EVPN fabric.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stps/sig-network/EVPN.md` around lines 17 - 33, Hyphenate the compound
adjectives in EVPN.md for consistency: change "User Defined Network" to
"User-Defined Network" (both the singular instance and the plural "User Defined
Networks"), and change "OVN-K based network type" to "OVN-K-based network type";
update the occurrences in the "Document Conventions" section and the "Feature
Overview" paragraph where these phrases appear (search for the exact strings
"User Defined Network", "User Defined Networks", and "OVN-K based network type"
to locate and replace).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@stps/sig-network/EVPN.md`:
- Around line 141-142: Replace the typos in the Risks table entries: change
"collobaration" to "collaboration" in the Timeline/Schedule row and change
"single-stuck" to "single-stack" in the Test Coverage row so terminology matches
the rest of the document (see the "Timeline/Schedule" and "Test Coverage" table
cells in the shown diff).
- Line 97: Fix the typo in the table cell for the "Cloud Testing" row: change
"Bare Metal environment is requried." to "Bare Metal environment is required." —
update the EVPN.md markdown content where the Cloud Testing table row text
contains "requried".
- Line 114: Update the typo in EVPN.md by replacing "infrustructure" with the
correct spelling "infrastructure" in the table row containing "**Special
Configurations** | Source Provider Virtualization Platform | Should be
configured next to CNV QE bare metal infrustructure"; ensure the table cell text
is updated to "Should be configured next to CNV QE bare metal infrastructure".
- Line 74: The table row under the "IPv6 support" entry contains a typo
"single-stuck" — update that text to "single-stack" so it matches the usage
elsewhere (e.g., the other "IPv6 single-stack" occurrence). Edit the Markdown
line that currently reads "| IPv6 support                     | IPv6
single-stuck support is not ready due to
https://github.com/FRRouting/frr/issues/5885.                                   
| [x] phoracek@redhat.com (02/2026) |" and replace "single-stuck" with
"single-stack".

@coderabbitai coderabbitai bot left a comment

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@stps/sig-network/EVPN.md`:
- Line 114: Fix the misspelling in the markdown table cell: update the string
"Should be configured next to CNV QE bare metal infrustructure" to "Should be
configured next to CNV QE bare metal infrastructure" (look for the table row
containing the "Special Configurations" column or that exact sentence in
EVPN.md).
- Line 74: Fix the typo in the EVPN.md table row where "IPv6 support" currently
reads "IPv6 single-stuck support"; change "single-stuck" to "single-stack" so
the entry matches the correct terminology used elsewhere (see the "IPv6 support"
table row and the phrase "single-stuck" to locate and update the text).
- Line 97: The markdown table row for "Cloud Testing" contains a typo: change
the word "requried" to "required" in the cell that currently reads "Bare Metal
environment is requried." Locate the "Cloud Testing" row in EVPN.md and update
that cell text to "Bare Metal environment is required." Ensure only the spelling
is corrected and formatting of the table remains unchanged.
- Around line 141-142: In the EVPN.md table rows under the "Timeline/Schedule"
and "Test Coverage" entries, fix the typos: change "collobaration" to
"collaboration" and change "single-stuck" to "single-stack" (to match the usage
elsewhere, e.g., line referencing "single-stack"); update those exact strings in
the text so the table reads correctly.

@servolkov
Author

Change: fix typos

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
stps/sig-network/EVPN.md (2)

116-122: Verify if Testing Tools section should remain empty.

All tool categories are currently empty. If specific tools, frameworks, or CI/CD pipelines will be used for testing, consider documenting them here for completeness.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stps/sig-network/EVPN.md` around lines 116 - 122, The "3.1. Testing Tools &
Frameworks" table is empty; either populate it with the actual tools or mark it
explicitly as TBD/None. Update the "3.1. Testing Tools & Frameworks" section to
list the chosen Test Framework(s) (e.g., pytest, Robot Framework), CI/CD
pipeline(s) (e.g., GitHub Actions, Jenkins), and any Other Tools (e.g., tox,
Molecule, linter, traffic generators), or replace the "-" entries with "TBD" or
"None" and a short note explaining that tooling will be decided later so readers
aren’t left with blank fields.

164-164: Consider using consistent terminology.

Line 44 uses "network configuration" while this line uses "network data". Consider using "network configuration" consistently for clarity.

♻️ Proposed refinement
-|                  | As an admin, I want to migrate a VM from the Source Provider to OCP keeping its network data                   | Verify migration from Source Provider to OCP over EVPN tunnel with preserving connectivity and VM network data             | Tier 2 | P0       |
+|                  | As an admin, I want to migrate a VM from the Source Provider to OCP keeping its network configuration          | Verify migration from Source Provider to OCP over EVPN tunnel with preserving connectivity and VM network configuration     | Tier 2 | P0       |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stps/sig-network/EVPN.md` at line 164, The term "network data" in the EVPN.md
table row (the cell reading "Verify migration from Source Provider to OCP over
EVPN tunnel with preserving connectivity and VM network data") should be changed
to "network configuration" to match the terminology used earlier ("network
configuration"); update that table cell text so it reads "preserving
connectivity and VM network configuration" to maintain consistent wording across
the document.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@stps/sig-network/EVPN.md`:
- Line 56: The notation "CUDN\UDN" in the EVPN.md API Extensions table uses a
backslash; update it to use a forward slash ("CUDN/UDN") to match the Document
Conventions and the earlier "CUDN/UDN resource" wording. Locate the table entry
containing "CUDN\UDN" (the API Extensions row) and replace the backslash with a
forward slash so the term is consistently "CUDN/UDN" throughout the document.
- Line 44: Update the grammar in the EVPN.md table row that currently reads
"User migrate a VM from Source Provider to OpenShift over the stretched L2 EVPN
network preserving source VM network configuration." — change it to "A user
migrates a VM from the Source Provider to OpenShift over the stretched L2 EVPN
network, preserving the source VM's network configuration." Ensure you update
the phrase in the table cell text (the line containing "User migrate a VM...")
and adjust articles/possessive ("the Source Provider", "source VM's") and add
the comma before "preserving".

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
stps/sig-network/EVPN.md (2)

18-19: Standardize compound-adjective hyphenation for readability.

Several compound adjectives appear without hyphens (e.g., “OVN-K based,” “pod level,” “Bare Metal”). Consider normalizing to “OVN-K-based,” “pod-level,” “bare-metal,” etc., for consistency in technical documentation.

Also applies to: 30-32, 55-56, 76-77, 97-97, 105-106, 111-111, 114-114

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stps/sig-network/EVPN.md` around lines 18 - 19, The document uses
unhyphenated compound adjectives; update instances such as "OVN-K based" to
"OVN-K-based" and normalize other compound adjectives (e.g., "pod level" →
"pod-level", "Bare Metal" → "bare-metal", "cluster level" → "cluster-level",
"project level" → "project-level") throughout the EVPN.md content (notably the
lines around the UDN and CUDN/UDN resource descriptions and the other indicated
ranges) so all compound modifiers are consistently hyphenated.

46-47: Align acceptance criteria with the stated P0 goals.

Testing goals include “Source Provider Migration,” but acceptance criteria only list two items. Either add a third criterion that explicitly marks migration as deferred until the Source Provider environment exists, or note the deferral in the acceptance criteria list for clarity.

✏️ Possible adjustment
- | **Acceptance Criteria**                | [x]  | 1. Connectivity: UDN VM can ping Source Provider endpoint on same subnet.<br/>2. Mobility: Connection survives UDN VM Live Migration.                                                                                                                    |          |
+ | **Acceptance Criteria**                | [x]  | 1. Connectivity: UDN VM can ping Source Provider endpoint on same subnet.<br/>2. Mobility: Connection survives UDN VM Live Migration.<br/>3. Migration: Source Provider → OCP migration validated once Source Provider environment is available (deferred until GA). |          |

Also applies to: 65-68

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stps/sig-network/EVPN.md` around lines 46 - 47, Update the "Acceptance
Criteria" table in EVPN.md to align with the P0 goals by either adding a third,
explicit migration criterion (e.g., "3. Source Provider Migration: Connection
survives Source Provider Live Migration" but conditioned on the Source Provider
environment being available) or by adding a clear note alongside the existing
two criteria that migration testing is deferred until the Source Provider
environment exists; make the same change for the second Acceptance Criteria
block referenced (the similar block at lines ~65-68) so both instances
consistently state the migration requirement or its deferral.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@stps/sig-network/EVPN.md`:
- Line 152: Change the grammar of the known-limitation sentence in EVPN.md from
"Single-Stack IPv6 are not supported" to the singular form "Single-Stack IPv6 is
not supported," leaving the rest of the sentence intact ("due to upstream Linux
Kernel limitations regarding IPv6-based VXLAN tunnels, the EVPN VTEP (Underlay)
requires IPv4") so the statement reads correctly.

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
stps/sig-network/EVPN.md (1)

46-46: Consider aligning Acceptance Criteria with Testing Goals.

The Acceptance Criteria lists only 2 items (Connectivity and Mobility), while Testing Goals (lines 65-67) include a third P0 goal: "Source Provider Migration". Although line 67 notes this scenario is not currently testable, consider either adding it to the Acceptance Criteria with a note about testability constraints, or explicitly stating in the Acceptance Criteria that Source Provider Migration is excluded due to environment limitations.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stps/sig-network/EVPN.md` at line 46, The Acceptance Criteria table currently
lists only "Connectivity" and "Mobility" but Testing Goals includes a separate
P0 "Source Provider Migration" scenario; update the Acceptance Criteria (the
table under "Acceptance Criteria") to either add "Source Provider Migration" as
an item with a note about current testability constraints, or add a clarifying
sentence to the Acceptance Criteria stating that "Source Provider Migration" is
excluded due to environment limitations referenced in the "Testing Goals"
section (see "Testing Goals" and the "Source Provider Migration" mention) so
both sections remain aligned.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@stps/sig-network/EVPN.md`:
- Line 44: Replace the grammatically incorrect table cell text ("A user migrates
a VM from Source Provider to OpenShift over the stretched L2 EVPN network
preserving source VM network configuration.") so it uses correct subject-verb
agreement and includes the article as suggested; update the row under the
"**Customer Use Cases**" column to read exactly "A user migrates a VM from
Source Provider to OpenShift over the stretched L2 EVPN network, preserving the
source VM's network configuration." to fix agreement and clarity.

@servolkov
Author

/assign-reviewer @maiqueb @anuragthehatter

@openshift-virtualization-qe-bot-5

Not adding reviewer maiqueb @anuragthehatter by user comment; maiqueb @anuragthehatter is not part of contributors.

@servolkov
Author

@maiqueb, @anuragthehatter I can't add you as reviewers; please review.

Reviewer

We should also test connectivity between workloads on those OCP nodes (east/west).

Are we also going to check that connections are not broken?

Author

We should also test connectivity between workloads on those OCP nodes (east/west)

ack

Are we also going to check that connections are not broken?

Could you elaborate on what you mean?
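
Editorial sketch (not part of the STP): one hedged way to make "connections are not broken" concrete is to bound packet loss for a ping run that spans the live migration, in line with the ping-based connectivity criterion quoted earlier in the review. The helper names and the 5% threshold below are assumptions chosen for illustration; the parser expects the summary line printed by common `ping` implementations.

```python
import re

# Matches the ping summary line, for example:
#   "100 packets transmitted, 98 received, 2% packet loss, time 99123ms"
# The optional "packets " also covers the BSD-style wording.
_SUMMARY = re.compile(
    r"(?P<sent>\d+) packets transmitted, (?P<received>\d+) (?:packets )?received"
)


def packet_loss_ratio(ping_output: str) -> float:
    """Return the fraction of lost packets computed from a ping summary."""
    match = _SUMMARY.search(ping_output)
    if match is None:
        raise ValueError("no ping summary line found")
    sent = int(match["sent"])
    received = int(match["received"])
    if sent == 0:
        raise ValueError("ping sent no packets")
    return (sent - received) / sent


def connectivity_survived(ping_output: str, max_loss_ratio: float = 0.05) -> bool:
    """Hypothetical pass/fail rule: loss must stay at or below the threshold."""
    return packet_loss_ratio(ping_output) <= max_loss_ratio
```

A test would start the ping before triggering the migration, stop it afterwards, and pass the captured output to `connectivity_survived`. A check on a long-lived TCP session would be a stricter reading of the same question.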


- Lack of Physical Source Provider (Emulation Only): we do not have a physical Source Provider, for connectivity checks the emulation is only possible, the migration from Source Provider to OCP cannot be tested until the Source Provider environment is ready.
- Local Gateway Mode Only: EVPN is not supported on clusters using Shared Gateway Mode.
- Single-Stack IPv6 is not supported: due to upstream Linux Kernel limitations regarding IPv6-based VXLAN tunnels, the EVPN VTEP (Underlay) requires IPv4.

Reviewer

Single-stack IPv6 is already listed under non-goals; we do not need to add it here.

Author

As are some other items in this paragraph :) "Lack of Physical Source Provider" and "Local Gateway Mode Only" have also been mentioned above (directly or indirectly). I am just following the template; these are in fact limitations, aren't they?

Reviewer

FWIW, I don't see a non-goals section. Where's that, @qinqon?

And TBH, I don't see how it can be: the OpenShift enhancement clearly lists IPv6 underlays in the test plan:

An scenario where IPv4 overlay is advertised on an IPv6 underlay, and vice versa.

Reviewer

In the "out of scope" section we have "IPv6 support".

Author

And TBH, I don't see how it can be: the OpenShift enhancement clearly lists IPv6 underlays in the test plan

But based on what I found, it does not work (the link is in the STP as well): FRRouting/frr#5885
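
Editorial sketch (not part of the STP): the underlay constraint discussed in this thread (the EVPN VTEP requires IPv4) could be checked in test setup before any EVPN scenario runs. The function names are hypothetical, the addresses are documentation examples, and only Python's standard `ipaddress` module is used.

```python
import ipaddress
from typing import Iterable, List


def underlay_is_ipv4(vtep_address: str) -> bool:
    """Return True when the VTEP (underlay) address is an IPv4 address."""
    return ipaddress.ip_address(vtep_address).version == 4


def unsupported_vteps(vtep_addresses: Iterable[str]) -> List[str]:
    """List the VTEP addresses that are not IPv4, preserving input order."""
    return [addr for addr in vtep_addresses if not underlay_is_ipv4(addr)]
```

A setup step could skip or fail early when `unsupported_vteps` returns a non-empty list, which keeps the limitation visible in test results instead of surfacing later as a tunnel failure.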

Reviewer

This goal is marked P0 (blocking GA) but explicitly states it is "not testable at this stage". This is contradictory — if it blocks GA but cannot be tested, what is the plan?

Should this be:

  1. Downgraded to P1/P2 until the Source Provider environment is available?
  2. Kept as P0 with a clear timeline/milestone for when it becomes testable?
  3. Moved entirely to a follow-up STP once the environment is ready?

The Entry Criteria (section 4) also notes this environment won't be available until GA, which reinforces the conflict.

Author

This issue followed me throughout composing the STP. Migration needs a special environment, and we can't test it now since we don't have one. The other tests also need some emulation, apart from the special Source Provider environment, and I need to fit all of it into one STP. Why one STP? Because, as I understand it, we create a general STP for the feature.

Why P0 for the Source Provider Migration? It is a crucial part that we must test in OCP-V, and that coexists with the fact that it is not achievable now. There are no estimates when we will have such an environment.

@rnetser I need your help here. I can't say my rationale is solid, and I totally understand @qinqon's confusion.

Contributor

I understand the confusion.

There are no estimates when we will have such an environment.

If this is the case, this should be listed under limitations and signed off by PM.

If you think the feature cannot be GA-ed without these tests, then this must be raised ASAP and a mitigation plan needs to be defined (if there is a feasible one). If there is no mitigation plan, you need to raise that under risks and get it approved by PM.

Author

If this is the case, this should be listed under limitations and signed off by PM

It is there; the limitation is known.

Reviewer

Does the N/A mean that corenet is going to test it? It may be better to mention that instead of N/A.

Author

I have not read about any security testing plans for this feature. @anuragthehatter can probably correct me.

Reviewer

Security can be tested with pods. I am not sure whether corenet is testing it, but clearly we don't need VMs for that.

Author

I have added a bit more clarification to the commentary.

Reviewer

Don't we need to add the external EVPN emulation pieces here?

Author

Do you mean just stating in general terms that we will use some emulation? The problem is that I have only a blurry understanding of what will be used; this is the next step and an implementation detail. WDYT?

Reviewer

You'll at least need an EVPN-aware router, right?

Let's "choose" one. For instance, FRR.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FRR is an implementation detail, but let's say generally "EVPN-aware external router". Updated.

@servolkov
Author

Change: addressed comments

@maiqueb maiqueb left a comment

Some thoughts.

I think we need to align on the feature's scope: from my perspective, both MAC & MAC+IP VRFs are relevant, to allow a migrated VM to still access services on other networks - not only its L2 networks (which is now stretched across clusters).

Comment on lines 31 to 32

we also want to have routed ingress over VPN, plus reach external networks over VPN (IP+MAC VRFs).

Author

initially, IP-VRF was not considered as a testing goal. Since we decided to have it covered as well, I have fixed STP to reflect this. Take a look.

the way I see it, customers need access to the same networks they had in the src cluster pre migration.

Meaning, they need to be able to access the stretched L2 (as you describe), but also to have a way to access these same "old" networks (i.e. different subnets) over VPN. Both ways (the rest of the VMs running in the src cluster also need to access the newly migrated into OpenShift VMs).

I don't know how to phrase this in a single sentence to be honest ...

Author

updated

there are more:

  • routed ingress over VPN
  • access to external networks (different subnet) over VPN

Author

added

Add: VMs can ping each other over the L3 VPN (different networks), which, in turn, adds a new mobility requirement: a connection established between these different networks must survive a UDN VM intra-cluster Live Migration.
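As an illustration of the mobility requirement above, a minimal sketch of a "connection survives" check could look like the following. It assumes a plain TCP echo service is running on the peer VM; `connection_survives` and the `disruptive_action` hook (which would trigger and wait for the live migration) are hypothetical names, not part of the STP or any existing test library.

```python
import socket


def echo_roundtrip(sock, payload=b"ping", timeout_s=5.0):
    """Send payload on an already-open socket and check the peer echoes it back."""
    sock.settimeout(timeout_s)
    sock.sendall(payload)
    received = b""
    while len(received) < len(payload):
        chunk = sock.recv(len(payload) - len(received))
        if not chunk:  # peer closed the connection
            return False
        received += chunk
    return received == payload


def connection_survives(sock, disruptive_action):
    """Return True if the same TCP connection echoes before and after the action."""
    if not echo_roundtrip(sock):
        return False
    # Hypothetical hook: e.g. trigger a VM live migration and wait for it to finish.
    disruptive_action()
    try:
        return echo_roundtrip(sock)
    except OSError:
        # A reset or timeout means the established connection did not survive.
        return False
```

The point of reusing the same socket is that a fresh connection after migration would only prove reachability, not that the established flow survived.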

Author

added


- Lack of Physical Source Provider (Emulation Only): we do not have a physical Source Provider, for connectivity checks the emulation is only possible, the migration from Source Provider to OCP cannot be tested until the Source Provider environment is ready.
- Local Gateway Mode Only: EVPN is not supported on clusters using Shared Gateway Mode.
- Single-Stack IPv6 is not supported: due to upstream Linux Kernel limitations regarding IPv6-based VXLAN tunnels, the EVPN VTEP (Underlay) requires IPv4.

FWIW, I don't see a non-goals section. Where's that @qinqon ?

And TBH, I don't see how it can be: the OpenShift enhancement clearly lists IPv6 underlays in the test plan:

An scenario where IPv4 overlay is advertised on an IPv6 underlay, and vice versa.

- Lack of Physical Source Provider (Emulation Only): we do not have a physical Source Provider, for connectivity checks the emulation is only possible, the migration from Source Provider to OCP cannot be tested until the Source Provider environment is ready.
- Local Gateway Mode Only: EVPN is not supported on clusters using Shared Gateway Mode.
- Single-Stack IPv6 is not supported: due to upstream Linux Kernel limitations regarding IPv6-based VXLAN tunnels, the EVPN VTEP (Underlay) requires IPv4.
- MAC-VRF (Layer 2) Testing Only: this test plan strictly covers MAC-VRF (Stretched Layer 2) scenarios. IP-VRF (Layer 3) routing is not tested because the migration use case relies on extending the Layer 2 network.

I disagree; migration is not the only user story here.

Let's groom the epic further.

Author

we discussed it offline, the STP has been updated to cover L3 as well, considering it P1 priority.

|:-----------------|:---------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------|:-------|:---------|
| CNV-63123 (epic) | As a user, I want my VMs on different nodes to communicate with each other over the EVPN fabric (East-West). | Verify that two UDN VMs connected to the same EVPN network can communicate with each other across different OCP nodes. | Tier 2 | P0 |
| | As a user, I want my OCP VM to communicate with the Source Provider network on the same subnet (Stretched L2). | Verify a VM can communicate with the Source Provider on the same subnet via the EVPN fabric. | Tier 2 | P0 |
| | As an admin, I want to live-migrate an EVPN-connected VM between OCP nodes without losing connectivity. | Verify that a VM connected via EVPN can live-migrate between OCP nodes without losing connectivity to the Source Provider. | Tier 2 | P0 |

remind me: a user can't migrate vms, right ? that's why this user story is from the admin's perspective.

Author

yes

Suggested change
| | As an admin, I want to migrate a VM from the Source Provider to OCP keeping its network data. | Verify migration from Source Provider to OCP over EVPN tunnel with preserving connectivity and VM network data. | Tier 2 | P0 |
| | As an admin, I want to migrate a VM from the Source Provider to OCP keeping its network data. | Verify migration from Source Provider to OCP over VXLAN tunnel with preserving connectivity and VM network data. | Tier 2 | P0 |

Alternatively, you can omit the "tunnel" part, and just say you want to verify migration from Source Provider to OCP over the VPN ...

I think that's more accurate.

Author

fixed

hm ... but we also want existing connections to survive the multiple migrations right ?

Author

Not sure I follow. Could you elaborate? (Reminder: we are in the upgrade scenario context.)

@Anatw Anatw left a comment

Thank you.
I pasted suggestions for additional test cases to consider.


- **[P0]** Basic Connectivity (East-West): Verify two UDN VMs connected to the same EVPN network can communicate with each other.
- **[P0]** Stretched L2 Connectivity (East-West): Verify a UDN VM can communicate with the Source Provider on the same subnet via the EVPN fabric.
- **[P0]** UDN VM Live Migration (Internal): Verify that a UDN VM connected via EVPN can live-migrate between OCP nodes without losing connectivity to the Source Provider.

I think we should also include a fast-recovery (cold reboot) scenario. While the migration scenario tests VM movement in a 'hot' manner, we should also validate that connectivity is restored immediately once a VM is restarted from scratch (after the VMI and virt-launcher pod are deleted and recreated). How about an additional test to verify that a client can establish a new TCP connection to the started VM as soon as the VM is 'Ready'?
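To make the proposed check concrete, here is a minimal sketch of the client side: it polls until a new TCP connection to the VM succeeds and reports how long that took. The helper name `wait_for_tcp_connect`, the default timeout, and the polling interval are illustrative assumptions; how the test detects that the VM is 'Ready' is out of scope here.

```python
import socket
import time


def wait_for_tcp_connect(host, port, timeout_s=30.0, interval_s=0.5):
    """Return the seconds elapsed until a new TCP connection succeeds.

    Raises TimeoutError if no connection can be established within timeout_s.
    """
    start = time.monotonic()
    deadline = start + timeout_s
    last_error = None
    while time.monotonic() < deadline:
        try:
            # Open and immediately close a fresh connection to the target.
            with socket.create_connection((host, port), timeout=interval_s):
                return time.monotonic() - start
        except OSError as exc:
            # Refused, unreachable, or timed out: remember why and retry.
            last_error = exc
            time.sleep(interval_s)
    raise TimeoutError(
        f"no TCP connection to {host}:{port} within {timeout_s}s: {last_error}"
    )
```

A test could call this right after the VM reports 'Ready' and assert that the returned delay stays under whatever threshold the team agrees on.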

- **[P0]** Basic Connectivity (East-West): Verify two UDN VMs connected to the same EVPN network can communicate with each other.
- **[P0]** Stretched L2 Connectivity (East-West): Verify a UDN VM can communicate with the Source Provider on the same subnet via the EVPN fabric.
- **[P0]** UDN VM Live Migration (Internal): Verify that a UDN VM connected via EVPN can live-migrate between OCP nodes without losing connectivity to the Source Provider.
- **[P0]** Source Provider Migration: Verify migration from Source Provider to OCP UDN namespace over EVPN tunnel. Note: Since we lack a physical Source Provider environment, the scenario is not testable at this stage.

If I understand the feature correctly, I think that we should also include a specific ingress scenario for EVPN. While we currently have an ingress test for plain BGP, EVPN introduces a different data plane with the VXLAN encapsulation. We need to verify that the external fabric can resolve the VM's MAC address (via BGP Type-2 routes) and that the node correctly decapsulates incoming VXLAN traffic to reach the VM.

- **[P0]** Stretched L2 Connectivity (East-West): Verify a UDN VM can communicate with the Source Provider on the same subnet via the EVPN fabric.
- **[P0]** UDN VM Live Migration (Internal): Verify that a UDN VM connected via EVPN can live-migrate between OCP nodes without losing connectivity to the Source Provider.
- **[P0]** Source Provider Migration: Verify migration from Source Provider to OCP UDN namespace over EVPN tunnel. Note: Since we lack a physical Source Provider environment, the scenario is not testable at this stage.

There are some additional scenarios under 'Requirement Summary' that should also appear here.

Signed-off-by: Sergei Volkov <sevolkov@redhat.com>
@servolkov
Author

Change:

  1. added L3 coverage
  2. addressed comments
