Inconsistent DNS query results #869

Closed
Atemu opened this issue Jan 11, 2024 · 5 comments

Comments

@Atemu

Atemu commented Jan 11, 2024

Today, I just cannot complete DNS-01 ACME challenges reliably. The challenge always fails, reporting NXDOMAIN on the challenge domain, even though lego waited until the challenge record showed up in its own DNS queries. I suspect this might be related to replication?

I let the following loop run on the side: while true; do dig @1.1.1.1 _acme-challenge.... TXT ; sleep 1 ; done. I noticed something really odd:

  1. When the challenge record appears, it is usually gone again by the next second's query
  2. These short appearances can happen even while lego is still polling (that is, lego has not yet received a response containing the record)
  3. The challenge record can reappear even minutes after lego has stopped the challenge (again just for one query, gone the next second). While typing out this report after the last challenge ran, I'm still sometimes getting ACME challenge records back every couple dozen seconds

Something's not right here..
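
A minimal sketch of the polling loop above with a timestamp per query, which makes the one-second appearances easier to spot in the output. The function names `classify_answer` and `watch_txt` are made up for illustration, and `_acme-challenge.example.com` in the usage note is a placeholder because the real name is elided in the report. Note that `dig +short` prints nothing for both NXDOMAIN and an empty NOERROR answer, so this only distinguishes "record returned" from "no record returned".

```shell
# Hypothetical helper: decide whether one `dig +short` answer contained a record.
# $1 is the raw output; lines starting with ';' are dig diagnostics, not data.
classify_answer() {
  if printf '%s\n' "$1" | grep -v '^;' | grep -q .; then
    echo present
  else
    echo absent
  fi
}

# Hypothetical helper: query one resolver once per second.
# $1 = resolver IP, $2 = full challenge name, $3 = number of queries (default 60)
watch_txt() {
  resolver="$1"; name="$2"; count="${3:-60}"; i=0
  while [ "$i" -lt "$count" ]; do
    answer="$(dig +short "@$resolver" "$name" TXT 2>/dev/null || true)"
    # One line per query: UTC time, then present/absent.
    printf '%s %s\n' "$(date -u +%H:%M:%S)" "$(classify_answer "$answer")"
    i=$((i + 1))
    sleep 1
  done
}
```

Usage, assuming the placeholder name: `watch_txt 1.1.1.1 _acme-challenge.example.com 120`.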

@peterthomassen
Member

Indeed, we have been experiencing replication issues related to instabilities of the nameserver software we're using on our secondary servers (context: https://talk.desec.io/t/ns1-desec-io-replication-issues/804/6).

We have identified a solution, which is running in test mode on ams-1.a.desec.io (IPv4 only). Feel free to do tests with this nameserver and report back here.

We are expecting to deploy this into production incrementally, starting tomorrow.

We're very sorry for this!

@Atemu
Author

Atemu commented Jan 11, 2024

Thanks for the quick reply! Good to know I'm not going insane and there's an actual issue.

I'm not sure I can really test this, given that the issue is on Let's Encrypt's side and they query the SOA, I assume.

(I'm somehow starting to question whether replication is solving more issues than it's causing...)

@Atemu
Author

Atemu commented Jan 13, 2024

Is there a nameserver I could poll that is always the last one the record propagates to, so that I can ensure ACME gets NOERROR on the challenge record when it queries the SOA?
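
Absent a known "last" server, one rough approximation is to ask every nameserver listed for the zone directly and proceed only when all of them return the record. The sketch below assumes a hypothetical function name `all_ns_have_txt` and a hypothetical `DIG` override variable (used only so the lookup command can be swapped out). It sees only the server instances reachable from where it runs; if the nameservers are anycast, which is an assumption here, other sites may still lag behind.

```shell
# DIG is a hypothetical override hook; by default the real dig is used.
DIG="${DIG:-dig}"

# all_ns_have_txt ZONE NAME
# Returns 0 only if every nameserver listed for ZONE answers with a TXT
# record for NAME, 1 if one is missing it, 2 if no nameservers were found.
all_ns_have_txt() {
  zone="$1"; name="$2"
  servers="$($DIG +short NS "$zone" | grep -v '^;' || true)"
  [ -n "$servers" ] || return 2
  for ns in $servers; do
    # +norecurse asks the server for its own data instead of a recursive lookup.
    answer="$($DIG +short +norecurse "@$ns" "$name" TXT | grep -v '^;' || true)"
    if [ -z "$answer" ]; then
      echo "missing on $ns" >&2
      return 1
    fi
  done
  return 0
}
```

Usage, with placeholder names: `until all_ns_have_txt example.com _acme-challenge.example.com; do sleep 5; done`. Given the flapping described earlier in this thread, a single successful pass is weak evidence; requiring several consecutive passes would be a stricter check.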

@peterthomassen
Member

Nopes, there is no such logic, unfortunately. The order in which secondary servers pull updates is not deterministic (or rather, the factors are not fully known, including network routing etc., so it's hard to say).

We're planning to implement an API for replication observability though. We might add a compact way for figuring out what the oldest deployed serial is, which I believe would give you what you need. You can track progress of this at #852. However, we're currently working on more important replication improvements, so that work on this PR is delayed by a bit -- and once the replication work has finished, you might no longer need it ;-)
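
Until such an API exists, the "oldest deployed serial" idea can be approximated from the client side by comparing the SOA serial each listed nameserver serves. This is only a sketch and not the planned API: `ns_serials`, `oldest_serial`, and the `DIG` override variable are hypothetical names, the comparison is plain numeric (no serial wraparound handling), and the result covers only the server instances reachable from the querying host.

```shell
# DIG is a hypothetical override hook; by default the real dig is used.
DIG="${DIG:-dig}"

# ns_serials ZONE
# Prints "nameserver serial" for each nameserver listed for ZONE.
ns_serials() {
  zone="$1"
  for ns in $($DIG +short NS "$zone" | grep -v '^;' || true); do
    # `dig +short SOA` prints: mname rname serial refresh retry expire minimum
    serial="$($DIG +short +norecurse "@$ns" "$zone" SOA |
      awk '!/^;/ && NF >= 3 { print $3; exit }')"
    printf '%s %s\n' "$ns" "${serial:-unknown}"
  done
}

# oldest_serial
# Reads "nameserver serial" lines on stdin and prints the lowest numeric serial.
oldest_serial() {
  awk '$2 ~ /^[0-9]+$/ {
         if (!seen || $2 + 0 < min + 0) { min = $2; seen = 1 }
       }
       END { if (seen) print min }'
}
```

Usage, with a placeholder zone: `ns_serials example.com | oldest_serial`. If the printed serial is at least the serial of the update that added the challenge record, every server that answered has that update.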

I'm realizing that this is an issue for https://github.com/desec-io/desec-ns, so I'm closing it here.

@peterthomassen closed this as not planned on Jan 15, 2024

@Atemu
Author

Atemu commented Jan 15, 2024

Thanks.

For reference, in case you changed anything: I was able to complete challenges some of the time earlier today, though it was still inconsistent.
