Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Priming the root key fails when RPZ is enabled in Unbound 1.20.0 or later #1152

Open
yukjn4 opened this issue Oct 10, 2024 · 9 comments
Open

Comments

@yukjn4
Copy link

yukjn4 commented Oct 10, 2024

Describe the bug
After updating Unbound from 1.19.1 to 1.20.0, trust anchor auto-update fails.
The first time Unbound starts, it loads the trust anchor and resolves the domain name.
However, when the trust anchor expires after one day, it is not automatically renewed and domain resolution fails.

To reproduce
Steps to reproduce the behavior:

  1. Start Unbound.
    →Trust anchor is loaded.
  2. The trust anchor expires after one day.
    →It is not automatically renewed and domain resolution fails.
    At this time, the contents of the root.key file were also not updated.

Expected behavior
Trust anchor is automatically updated.

System:

  • Unbound version: 1.20.0 , 1.21.0
  • OS: RHEL8

Additional information
Execute dig(failure case)

[root@xxx ~]# dig @(unbound server) www.google.com

; <<>> DiG 9.11.36-RedHat-9.11.36-11.el8_9.1 <<>> @(unbound server) www.google.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 34544
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;www.google.com.                        IN      A

;; Query time: 286 msec
;; SERVER: xxx.xx.xx.xx#53(xxx.xx.xx.xx)
;; WHEN: Thu Sep 12 16:03:38 JST 2024
;; MSG SIZE  rcvd: 43

Execute dig(dnnsec)

[root@xxx ~]# dig +dnssec @(unbound server) jprs.jp

; <<>> DiG 9.11.36-RedHat-9.11.36-11.el8_9.1 <<>> +dnssec @(unbound server) jprs.jp
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 14244
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1232
;; QUESTION SECTION:
;jprs.jp.                       IN      A

;; ANSWER SECTION:
jprs.jp.                300     IN      A       117.104.133.164
jprs.jp.                300     IN      RRSIG   A 8 2 300 20241005023003 20240905023003 61491 jprs.jp. Bmdphm9M3RVFGlrmO2Kt4bSKTvyMzr2/S/rbgjCAd4ImgTo7eAlZAEWd fZ8rkhv70sdCfWYNKPLewUqV8ofUTsssP9NxXy+dh8SJaA6Ur8ADxVre EhITCTrcg1Gv36yrw1Z6T2wHaiu3kjXjpB38d6W4dpcEVDnr6iAo8AZ+ OJI=

;; Query time: 17 msec
;; SERVER: xxx.xx.xx.xx#53(xxx.xx.xx.xx)
;; WHEN: Thu Sep 12 16:05:05 JST 2024
;; MSG SIZE  rcvd: 219

Error Log

unbound[1386]: [1386:1] info: failed to prime trust anchor -- could not fetch DNSKEY rrset . DNSKEY IN 

If Unbound is restarted before the trust anchor expires, it will not be able to resolve domain.
The condition for this event to occur was when RPZ was enabled. The update was possible when RPZ was disabled.
In addition to RPZ, the following settings are in place where DNSSEC is concerned.

	module-config: "respip validator iterator"
	auto-trust-anchor-file: "/var/unbound/root.key" 

Also, I have verified that the root.key permissions are set appropriately.

@wcawijngaards
Copy link
Member

It seems this is not really about trust anchor update, but about priming the root key? Because that is what the error says. But the issue talks about trust anchor update failure, where is that update happening, or was that a terminology mistake?

Of course unbound needs a working trust anchor, and for the root that means for all domain names, that are not covered by other trust anchors or domain insecure points. The error says validation of the trust anchor fails, saying it could not fetch the root DNSKEY RRset. And the issue is that this fails when RPZ is enabled. Is RPZ blocking the lookup of the root DNSKEY? That could depend on the contents of the RPZ file. The contents of it with respect to the root DNSKEY lookup, what is that contents?

It may be useful to enable verbosity at a level of 4 or 5, and then log to file, and then have unbound start with RPZ enabled, so that it can catch the error with debug output enabled. That could show in the debug logs, where the lookup of the root DNSKEY RRset is failing. Set rpz-log: yes and perhaps also rpz-log-name: "myrpz" so that unbound logs rpz applications in detail.

@wcawijngaards
Copy link
Member

There have been fixes in the unbound 1.20.0 release, and recent releases, for RPZ processing, in particular in the iterator. So it seems likely that the update causes the RPZ zone to be applied at a different point in the code.

@yukjn4
Copy link
Author

yukjn4 commented Nov 22, 2024

That was a terminological mistake.
Set verbosity to 5 and run dig on www.google.com.
Excerpts of large differences from logs with successful name resolution.

unbound[3543179]: [3543179:0] info: prime trust anchor
unbound[3543179]: [3543179:0] info: generate request . DNSKEY IN
unbound[3543179]: [3543179:0] info: generate keytag query _ta-4f66. NULL IN
unbound[3543179]: [3543179:0] info: generate request _ta-4f66. NULL IN
unbound[3543179]: [3543179:0] info: respip operate: query . DNSKEY IN
unbound[3543179]: [3543179:0] info: validator operate: query . DNSKEY IN
unbound[3543179]: [3543179:0] info: resolving . DNSKEY IN
unbound[3543179]: [3543179:0] info: rpz: applied [XXX.XXX] . rpz-local-data  . DNSKEY IN
unbound[3543179]: [3543179:0] info: validator operate: query . DNSKEY IN
unbound[3543179]: [3543179:0] info: respip operate: query . DNSKEY IN
unbound[3543179]: [3543179:0] info: validator: inform_super, sub is . DNSKEY IN
unbound[3543179]: [3543179:0] info: super is www.google.com. A IN
unbound[3543179]: [3543179:0] info: autotrust process for . DNSKEY IN
unbound[3543179]: [3543179:0] info: failed to prime trust anchor -- could not fetch DNSKEY rrset . DNSKEY IN
unbound[3543179]: [3543179:0] info: validator operate: query www.google.com. A IN
unbound[3543179]: [3543179:0] info: no signer, using www.google.com. TYPE0 CLASS0
unbound[3543179]: [3543179:0] info: prime trust anchor
unbound[3543179]: [3543179:0] info: generate request . DNSKEY IN
unbound[3543179]: [3543179:0] info: generate keytag query _ta-4f66. NULL IN
unbound[3543179]: [3543179:0] info: generate request _ta-4f66. NULL IN
unbound[3543179]: [3543179:0] info: respip operate: query . DNSKEY IN
unbound[3543179]: [3543179:0] info: validator operate: query . DNSKEY IN
unbound[3543179]: [3543179:0] info: resolving . DNSKEY IN
unbound[3543179]: [3543179:0] info: rpz: applied [XXX.XXX] . rpz-local-data  . DNSKEY IN
unbound[3543179]: [3543179:0] info: validator operate: query . DNSKEY IN
unbound[3543179]: [3543179:0] info: respip operate: query . DNSKEY IN
unbound[3543179]: [3543179:0] info: validator: inform_super, sub is . DNSKEY IN
unbound[3543179]: [3543179:0] info: super is www.google.com. A IN
unbound[3543179]: [3543179:0] info: autotrust process for . DNSKEY IN
unbound[3543179]: [3543179:0] info: failed to prime trust anchor -- could not fetch DNSKEY rrset . DNSKEY IN
unbound[3543179]: [3543179:0] info: validator operate: query www.google.com. A IN
unbound[3543179]: [3543179:0] info: no signer, using www.google.com. TYPE0 CLASS0

After the above is repeated several times, the following log is recorded.

unbound[3543179]: [3543179:0] info: Could not establish a chain of trust to keys for . DNSKEY IN

Also, when the unbound service was restarted and the root.key update failed, the following log was recorded.

unbound[28383]: [28383:0] info: autotrust probe . DNSKEY IN
unbound[28383]: [28383:0] info: respip operate: query . DNSKEY IN
unbound[28383]: [28383:0] info: validator operate: query . DNSKEY IN
unbound[28383]: [28383:0] info: resolving . DNSKEY IN
unbound[28383]: [28383:0] info: rpz: applied [XXX.XXX] . rpz-local-data  . DNSKEY IN
unbound[28383]: [28383:0] info: validator operate: query . DNSKEY IN
unbound[28383]: [28383:0] info: respip operate: query . DNSKEY IN
unbound[28383]: [28383:0] info: mesh_run: end 0 recursion states (0 with reply, 0 detached), 0 waiting replies, 1 recursion replies sent, 0 replies dropped, 0 states jostled out
unbound[28383]: [28383:0] info: average recursion processing time 0.328093 sec
unbound[28383]: [28383:0] info: histogram of recursion processing times
unbound[28383]: [28383:0] info: [25%]=0 median[50%]=0 [75%]=0
unbound[28383]: [28383:0] info: lower(secs) upper(secs) recursions
unbound[28383]: [28383:0] info:    0.262144    0.524288 1

What settings in the contents of the rpz file would cause this behavior?

@gthess gthess changed the title Auto trust anchor update fails when RPZ is enabled in Unbound 1.20.0 or later Priming the root key fails when RPZ is enabled in Unbound 1.20.0 or later Nov 22, 2024
@gthess
Copy link
Member

gthess commented Nov 22, 2024

unbound[3543179]: [3543179:0] info: rpz: applied [XXX.XXX] . rpz-local-data  . DNSKEY IN
unbound[3543179]: [3543179:0] info: failed to prime trust anchor -- could not fetch DNSKEY rrset . DNSKEY IN

From the first line we see that Unbound is not allowed to resolve and a local-data reply is coming from the RPZ.
In the second line we see a complain that the validator does not have a DNSKEY record to operate on.

I believe an entry in the RPZ zone matches and returns a local record to the . DNSKEY query.

Do you get a different behavior with the same RPZ file (with the exact same contents) and Unbound 1.19.1?

@yukjn4
Copy link
Author

yukjn4 commented Nov 26, 2024

Using the same RPZ file (with exactly the same contents), the root key priming was successful up to Unbound 1.19.3.
On the other hand, Unbound 1.20.0 or later failed, and I have not been able to figure out the cause.

@gthess
Copy link
Member

gthess commented Nov 26, 2024

Is it possible to share the RPZ file? You can find my email at https://nlnetlabs.nl/people/.

@yukjn4
Copy link
Author

yukjn4 commented Dec 3, 2024

@gthess
Thank you for your support.
I shared the RPZ file via email.

@gthess
Copy link
Member

gthess commented Dec 3, 2024

I believe the problem lies with the TXT record you have at the apex. This means that there are local data for . and then RPZ blocks all queries for . except for the TXT record. Can you remove that record and try again?

@gthess
Copy link
Member

gthess commented Dec 3, 2024

You can easily see if you block . by a quick dig @(unbound server) . NS. That should return the root delegation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants