Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

V2 certificate format #1216

Open
wants to merge 10 commits into
base: master
Choose a base branch
from
Open

V2 certificate format #1216

wants to merge 10 commits into from

Conversation

nbrownus
Copy link
Collaborator

@nbrownus nbrownus commented Sep 13, 2024

This is a near complete implementation of an ipv6 enabled overlay network. There are a handful of pull requests targeting this branch in addition to a number of incomplete issues within this PR.

I have tried to highlight the main trouble spots with comments on the PR and in addition here are my internal notes:

  • nebula-cert is defaulting to mint both v1 and v2 certificates and nebula config is defaulting to transmit v1 certificates if both are present. This will lead to a situation where net new adopters will be stuck using v1 certificates. Should this be the default behavior?

  • Currently nebula wont allow you to remove a v1 certificate on reload but this means a restart is required to complete a migration to v2. A comment on this review highlights this, we should allow it.

  • Currently there are a few spots (control, ssh) where we return the certificate in use but they only return 1 cert, should we leave it returning the default or force folks to consume both? How do we indicate which one is default?

  • Every time we encounter a certificate we need to sort the networks as well as only record the union of applicable addresses. See [cert-v2] punchy-respond on an address in common with the querying host #1261

  • We need to take extra care to ensure we never allow a ipv4 mapped ipv6 address. This is not currently addressed in this PR.

  • There may be an opportunity to improve the lighthouse protocol by directly using the MarshalBinary and UnmarshalBinary functions for netip objects. This is more of a nit than anything but we should agree on this before merging.

  • nebula-cert sign help text needs to talk about primary vpn address (or the union of effective addresses) being the first one after being sorted and what that means for tunnels.

  • The firewall config still uses naming like ca-sha should we alias this to ca-fingerprint to normalize the verbiage?

  • The firewall config still uses naming like ip should we alias this to network to normalize the verbiage

  • Normalize all names for Marshal* and Unmarshal*, we have some From*/To* and others without From/To.

  • Should we use this moment to swap p256 keys to use their compressed form? Saves ~32 bytes.

Once this is merged the upgrade path should likely be:

  1. Give every host dual certs and leave the default pki.default_version to 1. Everything continues to use v1 protocols.
  2. Switch all am_relay: true hosts pki.default_version to 2. Relays can now initiate v2 tunnels, but this is an uncommon flow.
  3. Switch all non lighthouse hosts pki.default_version to 2. Lighthouses will reply to v2 hosts using v2 protocols.
  4. Switch all lighthouses pki.default_version to 2.
  5. Remove v1 certs from all hosts

cert/cert_v2.asn1 Outdated Show resolved Hide resolved
lighthouse.go Outdated Show resolved Hide resolved
@nbrownus nbrownus mentioned this pull request Oct 4, 2024
Base automatically changed from cert-interface to master October 10, 2024 23:00
q,
cache.Get(u.l),
)
r(netip.AddrPortFrom(netip.AddrFrom16(rua.Addr).Unmap(), (rua.Port>>8)|((rua.Port&0xff)<<8)), buffer[:n])
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use binary to avoid endianness issues

}
if *sf.networks != "" {
for _, rs := range strings.Split(*sf.networks, ",") {
//TODO: error on duplicates? Mainly only addr matters, having two of the same addr in the same or different prefix space is strange
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also no 4in6

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in #1266

if err != nil {
return newHelpErrorf("invalid subnet definition: %s", rs)
return newHelpErrorf("invalid -unsafe-networks definition: %s", rs)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And no 4in6

return fmt.Errorf("error while signing with PKCS#11: %w", err)

if version == cert.Version1 {
// If we are asked to mint a v1 certificate only then we cant just ignore any v6 addresses
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We arent ignoring them

@nbrownus nbrownus force-pushed the cert-v2 branch 3 times, most recently from 9f75b80 to a09f39f Compare October 24, 2024 03:17
cert/cert_v1.go Outdated Show resolved Hide resolved
cert/cert_v2.go Outdated Show resolved Hide resolved
cert/cert_v2.go Outdated Show resolved Hide resolved
cert/cert_v2.go Outdated Show resolved Hide resolved
cert/cert_v2.go Outdated Show resolved Hide resolved
cert/cert_v2.go Outdated Show resolved Hide resolved
cert/cert_v2.go Outdated Show resolved Hide resolved
cert/cert_v2.go Outdated Show resolved Hide resolved
Vltime: 0xffffffff,
Pltime: 0xffffffff,
},
//TODO: should we disable DAD (duplicate address detection) and mark this as a secured address?
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolve TODO

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at least on Linux, my unsubstantiated opinion is that nebula should not do this, but our docs should suggest appropriate sysctls and systemd-networkd settings.


//add all new addresses
for i := range newAddrs {
//todo do we want to stack errors and try as many ops as possible?
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolve TODO

overlay/tun_linux.go Outdated Show resolved Hide resolved
@@ -213,7 +223,8 @@ func (t *tun) removeRoutes(routes []Route) error {
continue
}

cmd := exec.Command("/sbin/route", "-n", "delete", "-net", r.Cidr.String(), t.cidr.Addr().String())
//todo is this right?
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolve TODO

@@ -181,8 +191,8 @@ func (t *tun) removeRoutes(routes []Route) error {
if !r.Install {
continue
}

cmd := exec.Command("/sbin/route", "-n", "delete", "-inet", r.Cidr.String(), t.cidr.Addr().String())
//todo is this right?
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolve TODO

ssh.go Outdated
@@ -785,7 +785,8 @@ func sshPrintCert(ifce *Interface, fs interface{}, a []string, w sshd.StringWrit
return nil
}

cert := ifce.pki.GetCertState().Certificate
//TODO: This should return both certs
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolve TODO

@@ -91,40 +92,60 @@ func AddRelay(l *logrus.Logger, relayHostInfo *HostInfo, hm *HostMap, vpnIp neti
func (rm *relayManager) EstablishRelay(relayHostInfo *HostInfo, m *NebulaControl) (*Relay, error) {
relay, ok := relayHostInfo.relayState.CompleteRelayByIdx(m.InitiatorRelayIndex, m.ResponderRelayIndex)
if !ok {
rm.l.WithFields(logrus.Fields{"relay": relayHostInfo.vpnIp,
//TODO: we need to handle possibly logging deprecated fields as well
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolve TODO

if msg.OldRelayFromAddr > 0 || msg.OldRelayToAddr > 0 {
v = cert.Version1

//TODO: yeah this is junk but maybe its less junky than the other options
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolve TODO

if v == cert.Version1 {
peer := peerHostInfo.vpnAddrs[0]
if !peer.Is4() {
//TODO: log cant do it
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolve TODO

relay_manager.go Outdated
@@ -310,11 +357,11 @@ func (rm *relayManager) handleCreateRelayRequest(h *HostInfo, f *Interface, m *N
f.SendMessageToHostInfo(header.Control, 0, peer, msg, make([]byte, 12), make([]byte, mtu))
rm.l.WithFields(logrus.Fields{
//TODO: IPV6-WORK another lazy used to use the req object
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolve TODO

relay_manager.go Outdated

if v == cert.Version1 {
if !h.vpnAddrs[0].Is4() {
//TODO: log it
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolve TODO

relay_manager.go Outdated
@@ -360,11 +419,11 @@ func (rm *relayManager) handleCreateRelayRequest(h *HostInfo, f *Interface, m *N
f.SendMessageToHostInfo(header.Control, 0, h, msg, make([]byte, 12), make([]byte, mtu))
rm.l.WithFields(logrus.Fields{
//TODO: IPV6-WORK more lazy, used to use resp object
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolve TODO

pki.go Outdated
return p.cs.Load()
}

func (p *PKI) GetCAPool() *cert.CAPool {
return p.caPool.Load()
// TODO: We should remove this
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolve TODO

pki.go Outdated
return p.cs.Load().GetDefaultCertificate()
}

// TODO: We should remove this
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolve TODO

}

} else if currentState.v1Cert != nil {
//TODO: we should be able to tear this down
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolve TODO

p.cs.Store(cs)
p.cs.Store(newState)

//TODO: newState needs a stringer that does json
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolve TODO

pki.go Outdated Show resolved Hide resolved
return nil, util.NewContextualError("v1 and v2 curve are not the same, ignoring", nil, nil)
}

//TODO: make sure v2 has v1s address
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolve TODO

overlay/tun_windows.go Outdated Show resolved Hide resolved
@nbrownus nbrownus marked this pull request as ready for review October 29, 2024 03:48
if ok {
whereToPunch = newDest
} else {
//TODO this means the destination will have no addresses in common with the punch-ee
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolve

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants