Port Address Translation (stateless) #1137

qmonnet · 2025-12-13T02:03:31Z

Stateless Port and Address Translation (PAT) for the dataplane.

Best reviewed per-commit, but hold on tight, this PR is dense.

Follow-ups include:

Obviously, passing the port ranges from the user config.
Adding port ranges support in stateful NAT (we already support NAT-ing ports in stateful NAT, the update should include allocating only from allowed target ranges, dropping packets for packets not coming from allowed ranges).
ICMP. How do we do ICMP with PAT?? Still to be determined.
Probably some more optimisations or code clarifications here and there.

qmonnet · 2025-12-17T00:23:18Z

I think it's ready for a first pass

Copilot

Pull request overview

This PR implements stateless Port Address Translation (PAT) for the dataplane, adding support for port ranges in NAT operations. The implementation extends the existing prefix-based NAT system to handle combinations of IP prefixes and port ranges, enabling fine-grained control over which ports are translated.

Key Changes:

Introduced PrefixWithOptionalPorts and PortRange types to represent IP prefixes with associated port ranges
Extended NAT tables to support both address-only translation (NAT) and port+address translation (PAT)
Updated lookup and translation logic throughout the dataplane to consider port ranges

Reviewed changes

Copilot reviewed 27 out of 28 changed files in this pull request and generated no comments.

Show a summary per file

File	Description
`lpm/src/prefix/with_ports.rs`	New module defining port range types and IP prefix/port combinations
`lpm/src/prefix/range_map.rs`	New `DisjointRangesBTreeMap` for efficient range-based lookups
`nat/src/stateless/setup/tables.rs`	Extended NAT tables with PAT support, including `PortAddrTranslationValue`
`nat/src/stateless/setup/range_builder.rs`	Updated range builder to handle port ranges in offset calculations
`nat/src/stateless/mod.rs`	Added port translation methods and integrated port lookup
`pkt-meta/src/dst_vpcd_lookup/mod.rs`	Updated destination VPC lookup to consider port ranges
`config/src/utils/collapse.rs`	Simplified prefix collapsing to use new port range subtraction logic
`config/src/external/overlay/vpcpeering.rs`	Updated VpcExpose validation for port-aware prefix overlapping
`nat/src/stateless/test.rs`	Added comprehensive tests for PAT functionality
`lpm/src/trie/*.rs`	Added `matching_entries` method for finding all matching prefixes

lpm/src/prefix/with_ports.rs

mvachhar · 2025-12-17T15:40:42Z

lpm/src/prefix/with_ports.rs

+    /// If the port range is empty, this method should return the number of addresses in the address
+    /// range: so this will produce a different value than an object with the same IP range, but
+    /// with a port range covering all existing ports.
+    fn size(&self) -> PrefixWithPortsSize {


Why make this semantic distinction? is it just to allow compatibility between an ip range and an ip range with ports?

In other words, if this trait is called IpRangeWithPorts, then should it always count (Ip, port) pairs available, so no port ranges means the size is the same as all port ranges?

If we what this to mean something like IpRangeOrIpRangeWithPorts (terrible name, we should choose something shorter), then the better way to say this is that it returns the available size of items. For an IpRange that would be the number of IPs for an IPPortRange, that would be a different thing entirely.

Perhaps a discussion on the correct design here is in order.

The comment is in fact outdated. I had a previous version of the code that made a distinction, but it's been changed a while ago. I'll remove the comment.

Comment removed

mvachhar · 2025-12-17T15:41:52Z

lpm/src/prefix/with_ports.rs

@@ -0,0 +1,299 @@
+// SPDX-License-Identifier: Apache-2.0


Where are the tests for this new functionality? I prefer fuzz tests if they are easy, but there should at least be some tests right?

I'm not sure I see what functionality you refer to, in particular. There are some unit tests in follow-up commits after the introduction of these structs and methods.

Fuzzing is tricky, most of the interesting methods to fuzz require re-implementing the logic in the fuzzer to validate correctness. I don't have any fuzzing tests in this PR, we could look into it but it would probably be for a follow-up.

mvachhar · 2025-12-17T15:52:26Z

pkt-meta/src/dst_vpcd_lookup/mod.rs

+#[derive(Debug, Clone)]
+enum VpcDiscriminantTableValue {
+    PortRangeTable(DisjointRangesBTreeMap<PortRange, Option<VpcDiscriminant>>),
+    NoPorts(Option<VpcDiscriminant>),


I think I like this enumeration approach better for the IpRangeWithPorts trait as well instead of using an empty port range to mean IPs only. It might clean up my earlier comment about the size method as well.

Also, does the order here matter? Do we need a BTreeMap or would a HashMap suffice? For small overlaps though BTreeMap might be better. Just thought I'd point out the tradeoff. What do we expect for size here if people are mapping port ranges from one ISP provided IP to many internal hosts?

I think I like this enumeration approach better for the IpRangeWithPorts trait as well instead of using an empty port range to mean IPs only. It might clean up my earlier comment about the size method as well.

I'll see if I can improve it.

Also, does the order here matter? Do we need a BTreeMap or would a HashMap suffice? For small overlaps though BTreeMap might be better. Just thought I'd point out the tradeoff.

I don't think a HashMap would do. For a given IP address, I don't know how to find the matching prefix used as a key in the HashMap without doing a linear search on the entries; for an ordered BTreeMap, I can do that.

What do we expect for size here if people are mapping port ranges from one ISP provided IP to many internal hosts?

One BTreeMap per port range, with the associated IPs. You need to store these mappings one way or another anyway, unless you find a way to “compress” it the difference is the overhead related to the data structure in use, I think

I added a commit at the end to use an enum for PrefixWithOptionalPorts, I think this is what you meant (but I'm not 100% sure). Please take a look at it.

refactor(lpm): Use enum for PrefixWithOptionalPorts

config/src/utils/collapse.rs

mvachhar · 2025-12-17T15:57:36Z

lpm/src/prefix/with_ports.rs

+
+#[cfg(test)]
+mod tests {
+    use super::*;


Ok, so now we have some tests, but still missing the basic tests for overlap.

What tests are you referring to, exactly?

As discussed during our call, I'll add some tests to see how excluding a range of ports for the full range of IPs behaves. Does this address the comment?

I added commit test(config): Add tests for VpcExpose validation with port ranges, I think it should address your concern (except for fuzzing).

nat/src/stateless/setup/range_builder.rs

lpm/src/trie/prefix_map_impl.rs

mvachhar · 2025-12-17T16:05:49Z

Great initial cut, but I think it still needs some work.

In preparation for Port Address Translation in stateless NAT, add new structs alongside the Prefix defined in crate lpm. Namely, we add: - PortRange: a port range - PrefixWithPorts: a Prefix, with an associated PortRange - PrefixWithOptionalPorts: a Prefix, with an optional PortRange - trait IpRangeWithPorts: a trait for common methods for structs implementing ranges of IPs (such as a Prefix) and ports. It makes the interface for such structs more consistent, and avoids duplicating the size() method. These structs will be used in a follow-up commit. Signed-off-by: Quentin Monnet <[email protected]>

Prepare for Port Address Translation in NAT. This commit is based on a simple change. The following struct: pub struct VpcExpose { pub ips: BTreeSet<Prefix>, pub nots: BTreeSet<Prefix>, pub nat: Option<VpcExposeNat>, } becomes: pub struct VpcExpose { pub ips: BTreeSet<PrefixWithOptionalPorts>, pub nots: BTreeSet<PrefixWithOptionalPorts>, pub nat: Option<VpcExposeNat>, } In other words, we attach an optional port range to each prefix in the various prefix lists of a VpcExpose object. Such objects are used in several locations in the code, and as a consequence, we need to update all of these. This includes: - The gRPC/K8s conversion methods in crate config - The validation steps for VpcExpose objects in crate config - The "prefix collapse" utilities in crate config - Building route advertisement in crate mgmt - The stateful NAT allocator in crate nat - The stateless NAT context build in crate nat - The destination VPC lookup stage in crate pkt-meta WARNING: These locations get trivial updates to make the code compile. The surrounding logic is not updated to account for port ranges in this commit. It will be updated, where relevant, in follow-up commits. Signed-off-by: Quentin Monnet <[email protected]>

As part of this, we refactor validation to some extent, in particular for prefixes overlapping detection, making it easier to follow at the expense of more lines of code. Signed-off-by: Quentin Monnet <[email protected]>

Now that VpcExpose accepts optional port ranges associated with IP prefixes, and that we've updated the VpcExpose validation steps to account for these port ranges, let's add some unit tests to make sure validation behaves as we expect. Signed-off-by: Quentin Monnet <[email protected]>

Introduce a new struct DisjointRangesBTreeMap, containing a generic BTreeMap. This struct is meant to be used with map keys representing non-overlapping ranges, for example: disjoint port ranges or disjoint IP prefixes. The idea is the following: we can store IP ranges, for example, in a DisjointRangesBTreeMap object, and look up for a single IP address, to find the corresponding map key (the IP range) and the associated value, whatever it is. The struct implements this lookup, so that we don't have to re-implement it each time we want to use a similar BTreeMap with different element types. The way we perform this lookup is by retrieving the closest lower map key (IP range) matching the IP address, using methods provided by the BTreeMap. We already use similar structs in several locations in the code, and we used to have it for stateless NAT in the past (before we switched to a LPM). We also plan on adding a new one for the destination VPC lookup discriminant stage, in a future commit. Introducing a DisjointRangesBTreeMap should help harmonise these, although we don't convert the existing similar structs at the moment. Notes: - As for the location of this struct: should it be in crate lpm? It's not directly related to the LPM implementation we have, but I didn't find a location that would really be better suited, and I don't think it deserves a dedicated crate. The crate also hosts the implementation for the Prefix structs, which are not completely tied to the LPM, either. - Maybe re-implementing many methods (iter(), range(), ...) for the DisjointRangesBTreeMap, just piggybacking on the methods for the inner BTreeMap, might be overly verbose. If it's too much we may change in the future to expose the inner BTreeMap, and use the methods from there, although I'd rather keep it encapsulated if possible. Signed-off-by: Quentin Monnet <[email protected]>

In preparation for supporting port ranges in "expose" objects, for Port Address Translation, adjust the destination VPC discriminant lookup stage. So far, for a given packet, the source VPC and the destination VPC would usually provide the destination VPC. When we add port ranges to VpcExpose objects, we may have two packets from the same VPC, with the same destination IP, but with L4 destination port from different ranges, maybe associated to distinct peerings corresponding with distinct VPCs. In this case, the source VPC and destination address are no longer enough to determine the destination VPC: we must take the destination port into account, too. This commit adjust the VpcDiscriminantTable and related lookup accordingly. As part of this, we re-implement the lookup() and insert() method for the VpcDiscriminantTable itself, in order to mask the additional complexity when working with the inner IpPrefixTrie. Signed-off-by: Quentin Monnet <[email protected]>

"Collapsing" prefixes, in the context of this commit, comes down to "applying" exclusion prefixes to a set of prefixes, by splitting the latter set into subsets containing all the initial addresses, minus those from the exclusion prefixes. This is helpful for NAT, because after we've collapsed prefixes, we don't need to work with exclusion prefixes anymore. In preparation for Port Address Translation, we've updated VpcExpose to associate optional port ranges to IP prefixes. As a result, we now need to account for port ranges when "applying" exclusion prefixes-with-port-ranges to prefixes-with-port-ranges. This is not trivial to do, but with a bit of refactoring, it is not as scary as it sounds. Signed-off-by: Quentin Monnet <[email protected]>

Add unit tests for methods subtract() and intersection() for structs PrefixWithPorts and PrefixWithOptionalPorts, to validate the logic. Tests drafted with Claude code, then validated and refined by hand. Signed-off-by: Quentin Monnet <[email protected]>

To test the updated prefix collapsing implementation, which now accounts for port ranges alongside IP prefixes, add a bunch of unit tests. Tests drafted by Claude Code - on the model of the existing tests that don't take port ranges into account - then adjusted and validated by yours truly. Signed-off-by: Quentin Monnet <[email protected]>

This is some simple renaming, in preparation for more changes in the code for the stateless NAT tables. Struct TrieRange will get an alternative struct with port ranges, soon. Make sure we can distinguish the two by renaming the existing one as what it is: an IP address range. Once we've renamed it, it no longer makes sense to call it's length method ip_len(). Instead, just make it len(). Signed-off-by: Quentin Monnet <[email protected]>

Working towards support for Port Address Translation in stateless NAT, update the structure of the stateless NAT tables to account for port ranges: we turn NatTableValue into an enum with variants AddrTranslationValue (IP range only) and PortAddrTranslationValue (IP and port ranges). We introduce the IpPortRange type, similar to IpRange but with an associated port range. It differs from PrefixWithPorts from lpm crate in that the IP range may not be a CIDR prefix. As for the map lookup, we used to do a simple LPM lookup to find the IP range (LPM key) corresponding to a given IP address, we'd retrieve associated target prefixes (LPM value) and do the mapping. IP ranges in the LPM never overlapped. With ports, it's a bit different: we can have overlapping IP ranges (as long as their respective associated port ranges make them disjoint). As a consequence, the longest matching IP prefix may no longer be the one matching the IP _and the port_ from the packet. Instead, we need to iterate over _all matching IP prefixes_ to find the one with the correct associated port. See method matching_entries() added to the LPM. Adjustments for building the table will come in a follow-up commit. Signed-off-by: Quentin Monnet <[email protected]>

Start Bolero tests update for testing the build of the context for stateless NAT, although we don't go as far as to add actual port ranges as part of the validation at this time. Signed-off-by: Quentin Monnet <[email protected]>

Now that stateless NAT tables support IP prefixes with associated port ranges, update the RangeBuilder struct and its implementation to build the stateless NAT tables accordingly. It's a bit messy, but in the end we get there. Note: we convert back the generated PortAddrTranslationValue to a simpler AddrTranslationValue when possible. This allows simpler (and more efficient) processing when port ranges are not required. It may be more correct to convert back _only_ when port ranges are not specified at all in the config for the relevant VpcExpose. But we'd need a pass before building the tables, to determine whether any of the prefixes in the lists of the VpcExpose uses ports (at the moment, we only have this information for the current prefix being processed, not for the other original prefixes). We may also want to optimise the RangeBuilder if we know there is no port range for a given Expose; although it's not critical, as this code only runs at config build time. Signed-off-by: Quentin Monnet <[email protected]>

Add unit tests to validate some of the functions used when building the context for stateless NAT. There is no test for port range support in RangeBuilder at this point, only for the functions it uses to build the ranges. Signed-off-by: Quentin Monnet <[email protected]>

After the various complex commits that ensued from the addition of port ranges to VpcExpose objects, this commit looks pretty straightforward. If we get a port mapping from the stateless NAT tables lookup, update the source and/or destination L4 port accordingly for the packet. Port update is a bit messy because we need to process each protocol separately; we might want to refactor this in the future. Signed-off-by: Quentin Monnet <[email protected]>

The idea behind the stateless NAT lookup is that, for a given IP and port, we find the matching covering prefix (and associated port range) in a LPM trie, and get the list of corresponding target ranges from the LPM value. Then we retrieve the offset of the IP and port within the prefix (LPM key), and look for the corresponding IP and port within the target prefixes (LPM value). To find the IP and port at the right offset, we need to iterate over all target prefixes in the list and to compare the accumulated size with the offset. This is not efficient; instead, we can use another data structure to perform a faster lookup. To that end, replace the list of prefixes with a DisjointRangesBTreeMap that associates a target prefix to each portion of the original prefix. Suggested-by: Manish Vachharajani <[email protected]> Signed-off-by: Quentin Monnet <[email protected]>

Make sure translation for stateless PAT works as "expected": we rely on the implementation using ordered and sorted prefixes to compute the expected matching IP/port for some IP/port combinations. Signed-off-by: Quentin Monnet <[email protected]>

Using an Option<PortRange> to make the port range optional may lead to confusion, in particular when we attempt to get the .size() of the object. It's cleaner to use an enum instead, which can take either a Prefix or a PrefixWithPorts (reusing the latter's methods to implement equivalent methods for PrefixWithOptionalPorts) - even though it's a bit more verbose overall. Suggested-by: Manish Vachharajani <[email protected]> Signed-off-by: Quentin Monnet <[email protected]>

qmonnet · 2025-12-18T00:34:28Z

@mvachhar Your feedback should be mostly addressed, please take another look when you can.

qmonnet added this to the GW R2 milestone Dec 13, 2025

qmonnet self-assigned this Dec 13, 2025

qmonnet added area/nat Related to Network Address Translation (NAT) ci:-upgrade Disable VLAB upgrade tests labels Dec 13, 2025

qmonnet force-pushed the pr/qmonnet/pat branch from a91378a to 7ddd0b8 Compare December 13, 2025 02:04

This was linked to issues Dec 13, 2025

Stateless NAT, Port Address Translation: implement translation #1089

Open

Stateless NAT, Port Address Translation: build context from API #1088

Open

qmonnet force-pushed the pr/qmonnet/pat branch 7 times, most recently from 6b5ba67 to a3c771f Compare December 17, 2025 00:18

qmonnet requested a review from mvachhar December 17, 2025 00:22

qmonnet marked this pull request as ready for review December 17, 2025 00:23

qmonnet requested a review from a team as a code owner December 17, 2025 00:23

qmonnet requested a review from Copilot December 17, 2025 00:23

Copilot started reviewing on behalf of qmonnet December 17, 2025 00:24 View session

Copilot AI reviewed Dec 17, 2025

View reviewed changes

mvachhar requested changes Dec 17, 2025

View reviewed changes

qmonnet force-pushed the pr/qmonnet/pat branch from a3c771f to 05d9959 Compare December 17, 2025 17:31

qmonnet added 6 commits December 18, 2025 00:25

qmonnet added 12 commits December 18, 2025 00:25

qmonnet force-pushed the pr/qmonnet/pat branch from 05d9959 to c0f0db9 Compare December 18, 2025 00:31

qmonnet requested a review from mvachhar December 18, 2025 00:33

Port Address Translation (stateless) #1137

Are you sure you want to change the base?

Port Address Translation (stateless) #1137

Uh oh!

Conversation

qmonnet commented Dec 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

qmonnet commented Dec 17, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

qmonnet Dec 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

mvachhar commented Dec 17, 2025

Uh oh!

qmonnet commented Dec 18, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

qmonnet commented Dec 13, 2025 •

edited

Loading

qmonnet Dec 18, 2025 •

edited

Loading