cache responding server in case of primary one is unavailable #6

Open
wants to merge 1 commit into
base: master

vchlum
Contributor

@vchlum vchlum commented Aug 7, 2024

Problem Description:

If the primary server is unavailable, the client still tries to connect to it first on every run. We could save time by temporarily treating the server that did respond (e.g. the secondary one) as the primary server.

Change Description:

In this change, a cache file stores the server that responded. It is written only if the responding server is not the primary one. The server list is then sorted so that the cached server is in the first position. The original order of servers is restored after a configurable cache timeout, which defaults to 30 seconds. The timeout can be changed in /etc/krb5.conf in the [appdefaults] section via the krb525_cache_timeout variable. The value is in seconds.
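A minimal sketch of the reordering step described above, for illustration only and not the code in this commit. The function name prefer_cached_server, its parameters, and the CACHE_TIMEOUT_DEFAULT macro are hypothetical, and the sketch assumes an entry counts as expired once its age reaches the timeout:

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical name; 30 s is the default timeout stated in the description. */
#define CACHE_TIMEOUT_DEFAULT 30

/*
 * Move `cached` to the front of servers[0..n-1] if the cache entry is
 * still fresh. The relative order of the other servers is preserved.
 * Returns 1 if the list was reordered, 0 if it was left untouched.
 */
int prefer_cached_server(const char **servers, size_t n,
                         const char *cached, time_t cached_at,
                         time_t now, long timeout)
{
    size_t i;
    double age;

    if (cached == NULL)
        return 0;                      /* nothing cached */

    age = difftime(now, cached_at);
    if (age < 0 || age >= (double)timeout)
        return 0;                      /* clock went backwards or entry expired */

    for (i = 0; i < n; i++) {
        if (strcmp(servers[i], cached) == 0) {
            const char *hit = servers[i];
            /* Shift entries 0..i-1 one slot to the right. */
            memmove(&servers[1], &servers[0], i * sizeof(servers[0]));
            servers[0] = hit;
            return i != 0;
        }
    }
    return 0;                          /* cached server is not configured */
}
```

Once the entry has expired the function leaves the array alone, so the configured order (primary first) is what the client sees again.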

Testing:

  • Testing that the cache works:
(BOOKWORM)root@torque1:~# cat /etc/krb5.conf | grep krb525_server
        krb525_server = kdc1.anonym
        krb525_server = kdc2.anonym
(BOOKWORM)root@torque1:~# iptables -A INPUT --src kdc1.anonym -p tcp --sport 6565 -j DROP
(BOOKWORM)root@torque1:~# time /usr/bin/krb525_renew vchlum@META && time /usr/bin/krb525_renew vchlum@META
Type: Kerberos
Valid until: 1723100309
doICPDC<anonymized>qAjAA

real	0m5.532s
user	0m0.000s
sys	0m0.010s
Type: Kerberos
Valid until: 1723100314
doICPDC<anonymized>qAjAA

real	0m0.295s
user	0m0.007s
sys	0m0.000s
  • Testing that the cache is deleted and the original order of servers is restored after the cache timeout:
(BOOKWORM)root@torque1:~# cat /etc/krb5.conf | grep krb525
        krb525_server = kdc1.anonym
        krb525_server = kdc2.anonym
    krb525_cache_timeout = 10
(BOOKWORM)root@torque1:~# time /usr/bin/krb525_renew vchlum@META && sleep 5 && time /usr/bin/krb525_renew vchlum@META && sleep 5  && time /usr/bin/krb525_renew vchlum@META
Type: Kerberos
Valid until: 1723100766
doICPDC<anonymized>qAjAA

real	0m5.500s
user	0m0.005s
sys	0m0.005s
Type: Kerberos
Valid until: 1723100776
doICPDC<anonymized>qAjAA

real	0m0.293s
user	0m0.004s
sys	0m0.004s
Type: Kerberos
Valid until: 1723100782
doICPDC<anonymized>qAjAA

real	0m5.576s
user	0m0.005s
sys	0m0.005s
(BOOKWORM)root@torque1:~# 

@kouril
Member

kouril commented Nov 19, 2024

Thanks for the contribution. A couple of observations are below; I'm not sure to what extent they pose a problem in our deployment. Anyway, here we go:

  • a static cache filename is used (/tmp/krb525_endpoint.cache). It would break the caching for processes running under different uids.
  • the access() and remove()/fopen() pairs introduce a race condition, where the process may delete or open a different file than the one it checked (it should be harmless but could have some operational implications, esp. the fopen() part and the related check).
  • access to the file isn't synchronized/locked. I'm not sure how much we can rely on the data written to the cache being small (i.e. whether we can assume the write will be almost atomic).
  • the cache file content can be crafted by another (potentially malicious) process/user and then loaded and used by the library. I don't see an immediate bad effect/exploit, but it's potentially dangerous.

Did you consider keeping the cache separately per uid (and storing it in something like /var/run/user/UID)?
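To make the per-uid idea concrete, here is a rough sketch rather than a proposed patch. The helper names cache_path, cache_store and cache_load and the file name pattern are hypothetical, and the directory is left to the caller. It assumes the cache holds just the server name, replaces the file via a temporary file plus rename() so readers never see a partial write, and checks the opened descriptor with fstat() instead of calling access() first:

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Build "<dir>/krb525_endpoint.<uid>.cache" (hypothetical naming). */
int cache_path(char *buf, size_t len, const char *dir, uid_t uid)
{
    int n = snprintf(buf, len, "%s/krb525_endpoint.%lu.cache",
                     dir, (unsigned long)uid);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write to a temp file in the same directory, then rename() over the target. */
int cache_store(const char *path, const char *server)
{
    char tmp[4096];
    size_t len = strlen(server);
    int fd, n;

    n = snprintf(tmp, sizeof(tmp), "%s.XXXXXX", path);
    if (n < 0 || (size_t)n >= sizeof(tmp))
        return -1;
    fd = mkstemp(tmp);                 /* created exclusively, mode 0600 */
    if (fd < 0)
        return -1;
    if (write(fd, server, len) != (ssize_t)len) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) != 0 || rename(tmp, path) != 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}

/* Open first, then validate the descriptor, so check and use hit the same file. */
int cache_load(const char *path, char *server, size_t len, time_t *mtime)
{
    struct stat st;
    ssize_t n;
    int fd;

    if (len == 0)
        return -1;
    fd = open(path, O_RDONLY | O_NOFOLLOW);   /* refuse symlinks */
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0 || !S_ISREG(st.st_mode) ||
        st.st_uid != geteuid() ||             /* must be our own file */
        (st.st_mode & (S_IWGRP | S_IWOTH))) { /* not writable by others */
        close(fd);
        return -1;
    }
    n = read(fd, server, len - 1);
    close(fd);
    if (n <= 0)
        return -1;
    server[n] = '\0';
    *mtime = st.st_mtime;                     /* caller compares with the timeout */
    return 0;
}
```

A caller would build the path once with cache_path(), call cache_store() only when a non-primary server answered, and treat any cache_load() failure as "no cache". The file's mtime stands in for the cache timestamp here, which is also an assumption.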
