
SOA Serial does not reflect the version of the data being served #690

Open
james-stevens opened this issue Feb 23, 2022 · 20 comments · May be fixed by #766

Comments

@james-stevens
Contributor

This is effectively a duplicate of #559, with a broader description.

In DNS, the purpose of the SOA Serial is to tell the clients the version of the data currently being served.

This is NOT being fulfilled by simply serving out the current date & time, as that fails to take into account that a server may be out-of-date & still catching up, e.g. due to maintenance downtime or connectivity issues.

This causes a problem when running two instances of hsd (for failover) in conjunction with Buffrr's AXFR plug-in to feed the merged ROOT zone to one or more slaves. If one hsd server is taken down for a day or two, then brought back up - it will immediately lie that it has the latest data, when in fact it is still catching up.

This can cause the data on a downstream slave to be rolled back to an earlier version & the slave will then not be updated until the clock marches forward.

I've pointed this out to various devs at various times, but so far it's not fixed (v3.0.0)

The timestamp on the last block that was included in the most recent urkel tree update seems a reasonable choice to me, or this timestamp could be converted into YYYYMMDDXX format, should you prefer, but many TLDs use unixtime as the SOA Serial these days.

Using any always-increasing value from the last block that was included in the most recent urkel tree update will ensure that two servers return the same SOA Serial only when they are serving the same version of the information.
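
A minimal sketch of the idea (the `chain` object and `getLastTreeUpdateEntry()` are assumed names, not real hsd API):

// Hypothetical: derive the SOA serial from the timestamp of the block
// that triggered the most recent urkel tree update.
function soaSerialFromTree(chain) {
  const entry = chain.getLastTreeUpdateEntry(); // assumed helper
  // unixtime (seconds) fits the unsigned 32-bit serial field until 2106
  return entry.time >>> 0;
}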

@pinheadmz
Member

@buffrr and I have discussed this a lot as well, but the issue is how to handle re-orgs (more on that in a sec). Your point about getting SOA from full nodes that have not yet finished syncing, however, seems far more important than what we were thinking about.

The reorg issue is: imagine block 36 is found, and the Urkel tree and the HNS root zone are officially updated. We can use the height 36 as the serial, or the timestamp of that block in YYYYMMDDHH. A few minutes later, a chain reorganization occurs, changing the content of the Urkel tree and HNS root zone and creating an alternate block at the same height, maybe with the same timestamp, maybe even with an EARLIER timestamp (this is valid blockchain shit). Downstream DNS clients that already have the first copy of the root zone now have invalid data, inconsistent with other HNS resolvers, and will not be fixed until the next Urkel tree update in about 6 hours when, hopefully, a reorg does not occur and everyone is back on the same page.

One thing we could do is just always use the current chain tip height as the serial, not just the height of the last tree update.

  • pro: reorgs handled automatically
  • pro: downstream clients can tell an hsd node is not synced yet
  • pro: all synced nodes are always in sync with the same SOA serial
  • con: serial will change every ten minutes even though root zone data probably hasn't changed

Another thing we can use instead of chain height is the Median Time Past which is the median time of the last 11 block timestamps. It is guaranteed to always increase unlike the individual block timestamps themselves. Again, we would have to update this every block to ensure that reorgs are properly handled.
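
A minimal sketch of that calculation (assuming `times` holds the last 11 block timestamps):

// Median Time Past: the median of the last 11 block timestamps.
// Unlike individual timestamps, this value never decreases.
function medianTimePast(times) {
  const sorted = times.slice().sort((a, b) => a - b);
  return sorted[Math.floor(sorted.length / 2)]; // index 5 of 11
}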

So,

What's the best tradeoff? Does the axfr bridge repeatedly poll the SOA serial and only transfer when it's been updated? That would require more clever logic, or else you're going to be downloading the same root zone every ten minutes, probably.

Or,

is it "okay" to have invalid data for 6 hours and then just hope the next tree update goes smoothly?


I've pointed this out to various devs at various times, but so far it's not fixed (v3.0.0)

Sorry about this, we are under-staffed and the developers that are working on HNS core software have higher priorities that they feel affect more users. This is why writing a PR is often more effective than pointing things out or opening issues.

@buffrr
Contributor

buffrr commented Feb 23, 2022

This causes a problem when running two instances of hsd (for failover) in conjunction with Buffrr's AXFR plug-in to feed the merged ROOT zone to one or more slaves. If one hsd server is taken down for a day or two, then brought back up - it will immediately lie that it has the latest data, when in fact it is still catching up.

I have an open issue about this here, trying to solve it in the plugin. The idea is to wait 12 blocks before serving the zone, so that we can have a semi-globally consistent SOA serial across multiple hsd instances serving the exact same zone (assuming no reorgs larger than 12 blocks, I think).

is it "okay" to have invalid data for 6 hours and then just hope the next tree update goes smoothly?

Ideally, hsd shouldn't really serve any zone data before it's fully synced. Simply restarting hsd will cause all kinds of unexpected issues and break lots of sites because it serves stale data that recursive resolvers will cache for a couple of hours. In the worst case, it will serve compromised DS keys that site owners have updated, but users will still be vulnerable because they will get the old key.

@pinheadmz
Member

Yes, ok, I like this a lot, and we were already discussing the disparity between hsd and hnsd -- a good solution is for hsd to ALSO use a "safe height for resolution" (12 blocks) and then use the timestamp from the tree interval block as the SOA serial. There will be some confused users as we deploy this, but it does seem like that covers everything.

@pinheadmz
Member

we can also use chain.maybeSync() to determine if it is safe to resolve names. The worst case scenario is that this returns true when the node is still 12 hours behind:

hsd/lib/blockchain/chain.js

Lines 2877 to 2894 in df997a4

maybeSync() {
  if (this.synced)
    return;

  if (this.options.checkpoints) {
    if (this.height < this.network.lastCheckpoint)
      return;
  }

  if (!this.hasChainwork())
    return;

  if (this.tip.time < this.network.now() - this.network.block.maxTipAge)
    return;

  this.synced = true;
  this.emit('full');
}

/**
 * Age used for the time delta to
 * determine whether the chain is synced.
 */

maxTipAge: 12 * 60 * 60,

@james-stevens
Contributor Author

In the worst case, it will serve compromised DS keys that site owners have updated, but users will still be vulnerable because they will get the old key.

I would hope, in that situation, the stale DS would fail to validate & all the data should be discarded

con: serial will change every ten minutes even though root zone data probably hasn't changed

TBH, for me that would rule that out - IMHO that's terrible

Ideally, hsd shouldn't really serve any zone data before it's fully synced

Yeah - that's fair - if you can do that, it would be the best

On a (non-blockchain) high-volume DNS server I wrote in the past, which used a push journal stream update, you could keep journal blocks coming in while the DNS server was down & it would rush-process them (at start-up) to catch up before serving any data. It would also ask the upstream for the latest journal serial number & wait until it had at least reached that serial before serving data - it meant there could be a small number of journal blocks that hadn't been processed before it started serving, but it would only be a few & they'd get processed very quickly after serving started.

Have to say, it did slightly disturb me that a new install of hsd would take about 6 to 8 hrs to sync up, but was more than happy to start serving data right away.

If hsd never served any data until it was fully sync'd, then the SOA Serial becomes less relevant - cos it would only ever serve the latest data - and no response would be fine by me. But even then, only changing the SOA Serial when the underlying data has changed would be nice - right now it changes every hour, whether the data has changed or not. That's not a big deal at all though, cos the first thing I do is an AXFR->IXFR conversion, so it will detect when the SOA Serial is the only thing that's changed.

This is why writing a PR is often more effective

oh, sure, but as you can see from the detail of the discussion, there's no way I'd ever come up with anything suitable, just not enough background knowledge!

the median time of the last 11 block timestamps

so long as there's near 100% chance of it always increasing - seems fine to me

Does the axfr bridge repeatedly poll the SOA serial and only transfer when it's been updated?

Not sure about the bridge, but this is exactly what the slave will be doing - SOA polling over UDP looking for an increased Serial

There will be some confused users as we deploy this

Any DNS s/w should be able to automatically cope with the serial number dropping from YYYYMMDDXX to unixtime - there's a bunch of rules about it. Serial numbers are allowed to roll round.
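
For reference, a minimal sketch of the roll-round rule (RFC 1982 serial number arithmetic, with SERIAL_BITS = 32), which is how a slave decides whether a serial has "increased":

// s1 "greater than" s2 in 32-bit serial arithmetic (RFC 1982).
function serialGt(s1, s2) {
  s1 >>>= 0; // force unsigned 32-bit
  s2 >>>= 0;
  return (s1 < s2 && s2 - s1 > 0x80000000) ||
         (s1 > s2 && s1 - s2 < 0x80000000);
}

serialGt(1, 0xFFFFFFFF); // true: 1 is "newer" once the counter wraps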

@befranz

befranz commented Feb 23, 2022

Have to say, it did slightly disturb me that a new install of hsd would take about 6 to 8 hrs to sync up, but was more than happy to start serving data right away.

I assume this was installed on HDD, syncing up on an SSD shouldn't take longer than 3 hours.

@buffrr
Contributor

buffrr commented Feb 23, 2022

I would hope, in that situation the stale DS would fail to validate & all the data should be discarded

If an attacker was in the middle, it should have no problem giving DNS answers signed by the old key (acting as the TLD's authoritative server). Of course, I'm talking about the worst case here, and this only works on not-yet-synced hsd nodes.

we can also use chain.maybeSync() to determine if it is safe to resolve names. The worst case scenario is that this returns true when the node is still 12 hours behind:

Any chance of having a similar function to maybeSync(), but with stricter criteria for resolving, that checks for something smaller than maxTipAge, like 6 hours or even 2 hours? Something like:

isReady() {
  if (!this.synced)
    return false;

  return this.tip.time >= this.network.now() - 2 * 60 * 60;
}

Hmm is there a case where this could return false forever or take way too long?

@pinheadmz
Member

Block timestamps are only required to be greater than MTP (Median Time Past), which is the median of the last 11 blocks' timestamps and usually ends up being about an hour behind actual UTC time. They are also required to be no greater than two hours ahead of actual UTC time.
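
A minimal sketch of those two bounds (assuming `mtp` is the median of the last 11 block timestamps and `now` is actual UTC time, both in seconds):

// A block timestamp must be above MTP and at most 2 hours in the future.
function validBlockTime(ts, mtp, now) {
  return ts > mtp && ts <= now + 2 * 60 * 60;
}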

Sometimes (not often, but it happens) blocks can take over an hour for miners to find. So there is a case where an in-fact-fully-synced node would stop resolving because the chain tip timestamp is more than 2 hours old.

I think 12 hours is probably ok for this, but we can still compromise on 6 hours, which makes sense anyway since that's the tree interval on mainnet.

Even if we started resolving 24-hour-old data the worst case is that a key is trusted that was revoked less than one day ago. Question for you DNS experts: how long do you normally expect a DNSSEC update to propagate through DNS anyway?

@james-stevens
Contributor Author

I assume this was installed on HDD, syncing up on an SSD shouldn't take longer than 3 hours.

yeah, HDD, but RAID - so not the worst case scenario - also 3 hrs of giving out incorrect NXDOMAIN answers isn't too great

And as time goes on, this will only get longer & longer - currently participation in this project is relatively low - for example, ICANN ROOT servers get terabytes of queries each, every day.

It's a real shame the DNS data couldn't be separated from all the auction & $HNS transaction data - but I can see that splitting off where proof-of-ownership actually occurs is tricky without all the supporting evidence.

it should have no problem giving DNS answers signed by the old key

if the scenario is you changed the DS cos the private keys were compromised, yeah I guess - lag is always a bitch

Even if we started resolving 24-hour-old data the worst case is that a key is trusted that was revoked less than one day ago

Or you give out NXDOMAIN for a TLD that really does exist

But if you change a DS or NS (assuming its not an emergency), it would be reasonable to expect the old values to continue to work for at least 24 hrs, but probably more like at least a week.

how long do you normally expect a DNSSEC update to propagate through DNS anyway?

If the zone changes their DNSKEY & DS (and all their RRSIGs) right away, any new RRSIGs will fail to validate (against the old key data) & ALL the associated data should be discarded from cache - so it should be pretty quick. But exactly what is discarded is very implementation-dependent, so it can take longer; DNSSEC data & successful validations are often kept, cos they can take a while to complete.

CloudFlare do it correctly & will flip almost immediately - most others wait for the TTL on validated keys to expire before dropping them. Cos 8.8.8.8 is a cluster of many servers, this means the old values are dropped gradually over a few hrs.

Obviously it also depends on DS change propagation time in the parent zone - in most ICANN TLDs customers now expect this to be live - i.e. within a few seconds (def <1min) of posting the change with the registrar.

@buffrr
Contributor

buffrr commented Feb 24, 2022

Even if we started resolving 24-hour-old data the worst case is that a key is trusted that was revoked less than one day ago.

Yeah, even 24-hour is an improvement. If we can make it 6 hours, that's even better (assuming no issues)

how long do you normally expect a DNSSEC update to propagate through DNS anyway?

If the zone changes their DNSKEY & DS (and all their RRSIGs) right away, any new RRSIGs will fail to validate (against the old key data)

Yeah, a proper key rollover should be performed. If proofofconcept is a popular site with lots of users, it would be a bad idea to just replace the DS in the parent - this will cause an outage.

For example, you have this DS record in the root zone:

proofofconcept.		21600      IN	DS	5362 8 2 <some digest>

To "roll" the DS, you should first add a new one (while still keeping the old DS).

proofofconcept.		21600      IN	DS	5362  8 2 <some digest>
proofofconcept.		21600      IN	DS	55367 8 2 <some digest>

Resolvers may still have the old DS RRSet cached for proofofconcept; they don't yet know about the key with tag 55367. So you should keep your zone signed with the old key.

For a safe DS rollover:

  1. Add the new DS to the root zone (while still keeping the old one).
  2. Wait for the Handshake root zone to serve the new DS RRSet.
  3. Then, wait for the old DS RRSet to expire from resolvers' caches (depends on TTL; for hsd it's 6 hours iirc).
  4. Also, change your DNSKEY RRSet. Add the new DNSKEY(s) to your authoritative server (still signed by the old key). Keep the old DNSKEYs too!
  5. Wait for the TTL of your old DNSKEY RRSet to expire. Resolvers should see your updated DNSKEY RRSet.
  6. Now you can start signing with the new key.
  7. Remove the old DS with tag 5362 from the root zone (it's no longer used) and remove the old DNSKEYs.

Rolling a DS safely requires two updates to the root zone. Alternatively, you can always have an emergency standby DS added that you keep secure somewhere. If the active DS/DNSKEYs are compromised, you can just remove them and start using the new ones immediately. This requires one update to remove the old DS.

Of course, this area is still improving so some better techniques may come up.

@james-stevens
Contributor Author

For a safe DS rollover:

I think adding the new keys before adding the new DS is more reliable.

Adding the new DS first doesn't universally work (in ICANN zones). I've tried it before. It works with bind, CloudFlare & Google, but not with some proprietary implementations, like Zscaler (there may be others) - TBH it's a fecking nightmare and the RFCs do contradict each other at times.

I was trying to move from one DNSSEC signing provider to another (for a large client). In the end the conclusion was that the only way to do it was to go unsigned for a while!! ... although I think I've got a plan that would work now

IMHO the best thing to do is add the new keys, then add the new DS, wait, then remove the old DS, then remove the old keys.

One RFC says so long as there is any path to validate any RRSIG then you should accept the data, another says there MUST be a validation path for EVERY DS present.

The powers that be™ are aware of this contradiction and plan to issue a clarification.

If you read the official methodology for changing external signing provider, you'll discover that there isn't a single piece of DNS s/w that supports it!

Changing KSK algorithm is also a nightmare.

I can totally see why a lot of well known sites just don't use DNSSEC - there's little tangible advantage (advantages an MBA could measure), but there are all sorts of nasty corner cases that can bring your site down. PowerDNS does a good job of making it a lot easier to implement.

@buffrr
Contributor

buffrr commented Feb 25, 2022

I think adding the new keys before adding the new DS is more reliable.

Having an additional DS without a corresponding DNSKEY is okay - this was mentioned in the very early DNSSEC RFCs - but it may get tricky when changing algorithms. Actually, I like DNSKEY-first more, because it allows your new DNSKEY(s) to propagate in resolvers' caches while you're still waiting for the Handshake root zone to update. So you can actually do both at the same time, especially if your DNSKEYs will propagate faster (depends on TTL). RFC 7583 is dedicated to this and explains the drawbacks of different techniques, but it doesn't cover algorithm changes.

One RFC says so long as there is any path to validate any RRSIG then you should accept the data

Yup that should be the case.

another says there MUST be a validation path for EVERY DS present

This may be tricky when considering message digest algorithm and DNSKEY algorithm downgrades. I can see why this is useful: if two trust anchors are present, one with a stronger algorithm and one weaker, a validating resolver may want to favor the stronger. Mainstream resolvers don't do this though, because they have to accept any valid path.

I can totally see why a lot of well known sites just don't use DNSSEC - there's little tangible advantage

There's a small advantage to securing A/AAAA records, but the WebPKI threat model doesn't rely on DNSSEC. DANE is the killer app and what makes it worth it.

TBH it's a fecking nightmare and the RFCs do contradict each other at times

Yeah, there's confusion there, and some resolvers try to interpret the RFCs more strictly. I think what makes DNSSEC hard is having to think about those TTLs and the global cache. Validating resolvers should perhaps try to be more lenient and request new keys when validation fails, although this could increase load or introduce new forms of denial of service, since any bogus answer would trigger multiple queries.

@james-stevens
Contributor Author

I like the DNSKEY first more actually because it allows your new DNSKEY(s) to propagate in resolvers cache

Right - with PowerDNS you can also propagate keys "inactive" first, which is nice - it's their recommended method for ZSK rollover.

@buffrr
Contributor

buffrr commented Feb 25, 2022

Someone should write a plugin for hsd to automate rollovers perhaps by querying CDS/CDNSKEY records ;) Since we can easily update the root zone, parent/child communication should really be automated.

@pinheadmz
Member

Ok so if I can boil this discussion down to a set of code changes we all agree on, I'll open a PR:

  1. Like the SPV node, the Full Node should wait 12 confirmations after each Urkel tree update before resolving records from the updated root zone (see getSafeRoot() in chain.js).

  2. The SOA serial should be the timestamp in the first block header after each tree update (i.e. the first block header to commit to the updated tree root hash, which according to (1) was at least 11 blocks ago).

  3. The hsd (and hnsd) root server should send REFUSED (or is there something better?) until the chain is "synced", which means the timestamp in the chain tip (most recent block) is within the last 6 hours. This is a different definition of "synced" than is used elsewhere in the code; that's OK. (A rough sketch of all three changes follows the list below.)

This will:

  • prevent old records from being served while a node is still syncing
  • guarantee that SOA serial is always increasing, and only changes when the root zone actually changes
    • even if there is a chain reorg, but only if that reorg is < 12 blocks deep (and if it's deeper than that, we've got bigger problems)
  • synchronize the responses from full and light nodes
    • HNS users will have to get used to an extra 2-hour wait when updating records on chain 😬
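
A rough sketch of (1)-(3) together - getSafeRoot() is mentioned above, but getBlockByTreeRoot(), resolveFromTree() and refused() are hypothetical names, not real hsd internals:

const SIX_HOURS = 6 * 60 * 60;
const now = () => Math.floor(Date.now() / 1000);

function answerRootQuery(chain, query) {
  // (3) refuse to answer while the chain tip is more than 6 hours old
  if (chain.tip.time < now() - SIX_HOURS)
    return refused(query); // hypothetical REFUSED response helper

  // (1) resolve only from the tree root that is >= 12 confirmations deep
  const root = chain.getSafeRoot();

  // (2) SOA serial = timestamp of the first block committing to that root
  const serial = chain.getBlockByTreeRoot(root).time >>> 0; // hypothetical

  return resolveFromTree(query, root, serial); // hypothetical
}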

@james-stevens
Contributor Author

james-stevens commented Mar 8, 2022

The hsd (and hnsd) root server should send REFUSED (or is there something better?)

SERVFAIL (rcode=2) would be what I'd expect to see

                2               Server failure - The name server was
                                unable to process this query due to a
                                problem with the name server.

sounds like a typical techie comment 😆

@buffrr
Contributor

buffrr commented Mar 8, 2022

Sounds good! We can use the Extended DNS Errors (RFC 8914) EDNS option to indicate that the chain is syncing, just to make it easier to differentiate from other types of SERVFAIL.
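
A minimal sketch of the option's wire format per RFC 8914 (OPTION-CODE 15, a 2-byte INFO-CODE, then optional UTF-8 EXTRA-TEXT), built by hand here rather than through any particular DNS library:

// Build the raw EDNS option bytes for an Extended DNS Error.
function edeOption(infoCode, extraText = '') {
  const text = Buffer.from(extraText, 'utf8');
  const opt = Buffer.alloc(6 + text.length);
  opt.writeUInt16BE(15, 0);              // OPTION-CODE: EDE (15)
  opt.writeUInt16BE(2 + text.length, 2); // OPTION-LENGTH
  opt.writeUInt16BE(infoCode, 4);        // INFO-CODE
  text.copy(opt, 6);                     // EXTRA-TEXT
  return opt;
}

edeOption(14, 'chain is still syncing'); // 14 = "Not Ready"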

@james-stevens
Contributor Author

We can use the Extended DNS Errors (RFC 8914) EDNS option

Sounds like a fine plan - you would need to check that the client asked using EDNS, of course.

SERVFAIL is actually remarkably rare for authoritative servers - generally I've only seen it when bind is still loading the zone data - so pretty much the exact same scenario. Also, the standard behavior is to just try the next NS. This comment is very typical, IMHO:

Fortunately, most domains use multiple authoritative DNS servers, so if there is a short-lived ServFail issue on one name server which doesn’t impact the others, DNS lookups should still work. That said, if a name server has chronic ServFail issues, we recommend investigating why. ServFail errors happen, but should be rare.

@buffrr
Contributor

buffrr commented Mar 10, 2022

SERVFAIL is actually remarkably rare for authoritative servers

Yup, REFUSED is usually used if they don't want to answer or aren't ready to (it really just depends on preference). For example, Knot DNS (an authoritative server) will give a REFUSED answer if it's not ready to respond to an AXFR request (if transfers are temporarily paused). The REFUSED answer uses Extended DNS Errors with EDE error code 14 (Not Ready):

4.15. Extended DNS Error Code 14 - Not Ready

The server is unable to answer the query, as it was not fully functional when the query was received.

@james-stevens
Contributor Author

yup, REFUSED is usually used

Yeah, sounds better - it's a fine line. RFC 1035 says REFUSED is to be used for policy reasons - so it's mostly seen for permissions issues (auth servers that refuse to answer RD=1, resolvers with ACLs for who can use them, etc) - but you could easily make the case that not serving out-of-date data is a policy decision: "name server may not wish to perform a particular operation" does fit.

                5               Refused - The name server refuses to
                                perform the specified operation for
                                policy reasons.  For example, a name
                                server may not wish to provide the
                                information to the particular requester,
                                or a name server may not wish to perform
                                a particular operation (e.g., zone
                                transfer) for particular data.
