API: Query for DNS record propagation #577
Replication is done on a per-domain (per-zone) level, not per record set. The proper object to add this to would therefore be the domain object, and it could indicate whether all secondaries have a current copy. (Or perhaps indicate the last time a current copy was seen everywhere; if that timestamp is older than the last time the domain was changed, replication is still pending. Have to think about it more.) In any case, putting this on the domain object will also cover the "RRset deleted" case.
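A minimal sketch of what such a domain-level freshness check could look like, assuming two hypothetical timestamp fields on the domain object (neither is part of the current API):

```python
from datetime import datetime

def is_fully_replicated(domain: dict) -> bool:
    # Hypothetical fields, invented for illustration:
    # "last_modified" -- when the zone was last changed
    # "last_seen_synchronized" -- when all secondaries last held a current copy
    last_modified = datetime.fromisoformat(domain["last_modified"])
    last_seen = datetime.fromisoformat(domain["last_seen_synchronized"])
    return last_seen >= last_modified
```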
Depending on the replication mechanism, this information is very hard to obtain, and it conflicts with the replication plans we recently introduced (#571). I therefore suggest we close this issue as "won't fix". That being said, desec.io (not desec-stack) could publish information on how long DNS updates typically take. I believe currently some 99% of updates complete in <1 min.
#571 doesn't contradict this. But it would indeed be problematic if the nodes we replicate to continue with additional replication to second-layer nodes whose IP addresses we don't have. To me, it's not yet clear that this will be the case in our situation, so I'd like to keep this open for now. For context: we are planning a cooperation with pch.net, who will run some nodes for us and will likely do some internal replication. The question is how we can determine when replication has finished on their side. I'd hope there is some way to do that, not only for the purposes of this issue, but generally -- we should have insight into that.
For me, as a reader of this bug, it is unclear whether the requirement is:
In my experience, most unforeseen disruptions happen because planning should have considered at least case 3, but effectively "just" modeled case 1. In my opinion, cases 1 and 2 are of purely academic interest and are just misleading in practice, so they shouldn't be offered at all. On the other hand, values for case 3+ are often underestimated by a large margin (e.g., many records have TTLs in the range of days) and are only discovered "when it's too late". Thus, I think the calculation should actually support predicting the effect of changes before they're committed, and it should receive a prominent place in the UX.
This. It's important, e.g., for ACME clients to know when it's safe to tell the server that the ACME challenge can be found in the DNS. There might be other reasons why a user would want to know which version of their zone is served in which region.
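As an illustration of the ACME use case: a client could at least poll each authoritative nameserver directly before triggering validation. A rough sketch using the third-party dnspython package (note that under anycast this only sees the replica closest to the client, which is exactly the limitation this issue is about):

```python
import dns.resolver  # third-party package: dnspython

def challenge_visible_everywhere(zone: str, token: str) -> bool:
    """Return True once every reachable authoritative NS serves the
    ACME challenge TXT record. Under anycast, each NS name may hide
    many replicas; we can only query the one closest to us."""
    ns_names = [r.target for r in dns.resolver.resolve(zone, "NS")]
    for ns in ns_names:
        ip = dns.resolver.resolve(str(ns), "A")[0].address
        res = dns.resolver.Resolver(configure=False)
        res.nameservers = [ip]  # query this nameserver directly
        try:
            answers = res.resolve(f"_acme-challenge.{zone}", "TXT")
        except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
            return False
        if not any(token in rr.to_text() for rr in answers):
            return False
    return True
```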
This has nothing to do with caching resolvers; authoritative DNS ends with putting the zones in place.
That may be a good idea, but it's a different issue.
In fact, calculating 3+ is only possible based on information about the propagation status on the authoritative servers. So, implementing the feature discussed here is a prerequisite for your feature.
The propagation status is publicly available in DNS. The status required to calculate the delays for a planned change is in DNS before and up until the change is pushed and propagated. The SOA record declares the timings until all (potentially anycasted) servers return coherent answers again. In the case of true multi-master hosting, the SOAs will have different serial numbers, and each zone must then be considered in parallel (this is perfectly legal; Microsoft Active Directory-integrated DNS is a prominent case). ACME is indeed a very practical use case, and actually the most concrete one I see causing confusion very often. Let's assume for a moment that the day we're changing this record is also the day we became unhappy with our previous hoster and decided to migrate our zone to a cool new one. The aforementioned resolver (e.g. a Let's Encrypt verifier) has cached our old NS records and our A record just moments before. Now we're pushing new (delegation and) NS records and a changed A record on our new authoritative servers. The aforementioned resolver is then asked again about our A record, which it realizes has expired. It still holds valid cached NS records, though, because our previous DNS hoster and the parent zone had chosen to serve them with a high TTL. Thus, the resolver will start recursive resolution, but using the cached records it will shortcut to querying our previous provider's nameservers, possibly obtaining a technically valid response, but not the one we expected.

Example:
So I get that www.desec.io resolves to 88.99.64.5, valid for 15 minutes. Let's say I want to migrate desec.io to another DNS hoster and update the A record. Let's see how quickly the zone can be migrated, i.e. how long the delegation records persist in caches:
That's one hour. Now let's consider how long the desec.io nameservers themselves say these records should be cached:
The informational NS record indicates 5 minutes (while the parent zone requires one hour, authoritatively). So, just from the zone data, we would expect the update to propagate in less than 15 minutes. But in fact, as this example shows, a proper answer can't be less than 75 minutes (900 seconds for the A record plus 3600 seconds for the NS delegation).
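To make this calculation reproducible, here is a rough sketch with the third-party dnspython package. It derives the same worst-case figure by adding the delegation TTL as served by the parent to the record's own TTL; it naively assumes the zone is a direct child of its parent and that the parent's servers answer over UDP:

```python
import dns.message
import dns.query
import dns.rdatatype
import dns.resolver  # third-party package: dnspython

def worst_case_cache_seconds(zone: str, name: str, rdtype: str = "A") -> int:
    """Worst case until well-behaved caches have dropped both the old
    delegation and the old record: parent-side NS TTL + record TTL."""
    parent = zone.split(".", 1)[1]  # naive: assumes zone is a direct child
    parent_ns = str(dns.resolver.resolve(parent, "NS")[0].target)
    parent_ip = dns.resolver.resolve(parent_ns, "A")[0].address
    # Ask the parent directly; a referral carries the NS set in AUTHORITY.
    reply = dns.query.udp(dns.message.make_query(zone, "NS"), parent_ip, timeout=5)
    delegation_ttl = max(rs.ttl for rs in reply.authority + reply.answer
                         if rs.rdtype == dns.rdatatype.NS)
    record_ttl = dns.resolver.resolve(name, rdtype).rrset.ttl
    return delegation_ttl + record_ttl

# For the example above, worst_case_cache_seconds("desec.io", "www.desec.io")
# would come out to 3600 + 900 = 4500 seconds, i.e. 75 minutes.
```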
No.
How do you know that? (I would be surprised if Let's Encrypt's challenge fetching follows standard TTL rules.)
The parent is not authoritative for the NS records, as indicated by the absence of the AA bit in the response:
Only the answer from the child has the AA bit:
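Checking the AA bit programmatically is straightforward; a small sketch with the third-party dnspython package:

```python
import dns.flags
import dns.message
import dns.query

def answers_authoritatively(qname: str, server_ip: str) -> bool:
    """Send a non-recursive NS query and report whether the server set
    the AA (Authoritative Answer) bit in its reply."""
    query = dns.message.make_query(qname, "NS")
    query.flags &= ~dns.flags.RD  # we want a plain authoritative lookup
    reply = dns.query.udp(query, server_ip, timeout=5)
    return bool(reply.flags & dns.flags.AA)

# Expected: False when asking a parent (.io) server about desec.io,
# True when asking one of desec.io's own nameservers.
```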
For some applications (ACME challenges, TLSA records, ...), it is interesting to know if a record that was just added/updated/removed has fully propagated to the authoritative nameservers. Just querying the authoritative nameservers is not sufficient, due to the anycast network: some frontend servers may already have the data while others may not, but I can only query those that are closest to me.
Particularly when the network is slow to update, it would be useful to have some way to find out if all servers have/publish the same information. Since I do not believe this can be done in DNS, I suggest adding it to the API.
I don't have a clear idea of what this should look like.
It could be an additional JSON field that is returned for all RRSets. I'm not sure how to indicate the propagation status of record removal.
Another option might be to provide a list of pending changes.
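To make the second option more tangible, here is a purely hypothetical polling sketch; the /pending_changes/ endpoint and its response shape are invented for illustration and are not part of the deSEC API:

```python
import time

import requests  # third-party package

API = "https://desec.io/api/v1"

def wait_until_propagated(domain: str, token: str, interval: int = 5) -> None:
    """Poll a *hypothetical* pending-changes endpoint until the list is
    empty, i.e. every change has reached all authoritative servers."""
    headers = {"Authorization": f"Token {token}"}
    while True:
        resp = requests.get(f"{API}/domains/{domain}/pending_changes/",
                            headers=headers)
        resp.raise_for_status()
        if not resp.json():  # hypothetical: empty list == fully propagated
            return
        time.sleep(interval)
```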