fix: retry failed ssl renewal with jittered backoff 🐛 #349

Closed

@meysam81 commented Sep 23, 2023
Contributor

fixes #338

For your reference, you can see the templated file below:

#!/bin/bash
#
# This file is managed by Ansible. Only change it through the rust-lang/simpleinfra repository.
#
set -euo pipefail
IFS=$'\n\t'

# During the initial renew nginx will fail to run as the certificate is
# missing, so lego will have to serve port 80.
if [[ $# -eq 1 ]] && [[ $1 = "initial-renew" ]]; then
    renew_kind="--http.port=:80"
    action="run"
else
    renew_kind="--http.webroot=/var/run/acme-challenges"
    action="renew"
fi

# Use the local Pebble instance to get a dummy certificate
server="https://localhost:14000/dir"
# Use Pebble's CA to validate connections to it
export LEGO_CA_CERTIFICATES=/etc/pebble/certs/pebble.minica.pem
# Remove the account as Pebble doesn't persist it
rm -rf "/etc/ssl/letsencrypt/accounts/localhost_14000/[email protected]"

retries=10
wait_time=5

# Build the command as an array so every argument stays a separate word.
# A plain string would not be split on spaces, because IFS above excludes them.
lego_cmd=(
    lego --email '[email protected]'
    --server "${server}"
    --accept-tos
    --path /etc/ssl/letsencrypt
    --http
    "${renew_kind}"
    -d 'dummy'
    "${action}"
)

# Run the given command, retrying up to ${retries} times with a jittered delay.
function run_with_retries {
    local i=0 exit_code jitter wait_time_with_jitter
    while true; do
        # Running the command as an `if` condition keeps `set -e` from aborting on failure.
        if "$@"; then
            break
        else
            exit_code=$?
        fi

        if [[ ${i} -ge ${retries} ]]; then
            exit "${exit_code}"
        fi

        jitter=$((RANDOM % 10))
        wait_time_with_jitter=$((wait_time + jitter))

        echo "Command failed with exit code ${exit_code}. Retrying in ${wait_time_with_jitter} seconds..."
        sleep "${wait_time_with_jitter}"

        i=$((i + 1))
    done
}

run_with_retries "${lego_cmd[@]}"

sudo /etc/ssl/letsencrypt/after-renew

Running the script on the machine produced the output shown in the screenshot below.

[Screenshot: failing-but-retrying-with-jitter-image]
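To try the retry-with-jitter pattern in isolation, without lego or Pebble, a minimal sketch along these lines may help. The `flaky` function, the `RETRIES`, `WAIT_TIME`, and `MAX_JITTER` variables, and their small defaults are assumptions made for this demo and are not part of the templated script.

```shell
#!/bin/bash
set -uo pipefail

# Hypothetical demo defaults; the script above uses retries=10 and wait_time=5.
retries="${RETRIES:-3}"
wait_time="${WAIT_TIME:-0}"
max_jitter="${MAX_JITTER:-2}"   # must be at least 1

# Retry "$@" until it succeeds or ${retries} retries have been used.
run_with_retries() {
    local attempt=0 exit_code delay
    while true; do
        if "$@"; then
            return 0
        else
            exit_code=$?
        fi
        if [[ ${attempt} -ge ${retries} ]]; then
            return "${exit_code}"
        fi
        # Fixed base delay plus a random 0..max_jitter-1 seconds.
        delay=$((wait_time + RANDOM % max_jitter))
        echo "Attempt $((attempt + 1)) failed with exit code ${exit_code}; retrying in ${delay}s" >&2
        sleep "${delay}"
        attempt=$((attempt + 1))
    done
}

# Hypothetical flaky command: fails on the first two calls, succeeds on the third.
counter_file="$(mktemp)"
echo 0 > "${counter_file}"
flaky() {
    local n
    n=$(( $(cat "${counter_file}") + 1 ))
    echo "${n}" > "${counter_file}"
    [[ ${n} -ge 3 ]]
}

run_with_retries flaky
echo "succeeded after $(cat "${counter_file}") attempts"
```

Passing the command as separate arguments (`"$@"`) rather than as one string avoids any dependence on `IFS` for word splitting, and returning instead of exiting leaves the decision about what to do on final failure to the caller.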

@meysam81
Contributor Author

@jdno Please have a look and let me know if there's anything missing 🙏

@Mark-Simulacrum
Member

Can we structure this implementation to use retries at the systemd level (https://github.com/rust-lang/simpleinfra/blob/master/ansible/roles/letsencrypt/templates/renew-ssl-certs.service)? I feel like that might be less ad-hoc in terms of implementation and a little cleaner than the current solution.
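As a rough illustration of what retries at the systemd level could look like, the sketch below writes a drop-in that asks systemd to restart the unit on failure. The drop-in file name, the temporary directory, and the chosen values are assumptions, and the contents of the linked service file are not reproduced here. If the unit is `Type=oneshot`, `Restart=on-failure` requires systemd 244 or newer, and `RestartSec=` on its own does not add jitter.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical location: a real drop-in would live under
# /etc/systemd/system/renew-ssl-certs.service.d/; a temp dir keeps the demo harmless.
dropin_dir="$(mktemp -d)/renew-ssl-certs.service.d"
mkdir -p "${dropin_dir}"

# Assumed values, roughly mirroring the script's retries=10 and wait_time=5.
cat > "${dropin_dir}/retry.conf" <<'EOF'
[Unit]
# Give up once 10 starts have happened within 10 minutes.
StartLimitIntervalSec=10min
StartLimitBurst=10

[Service]
# Restart only when the renewal exits with a failure.
Restart=on-failure
# Fixed delay between attempts; this setting adds no jitter.
RestartSec=5s
EOF

cat "${dropin_dir}/retry.conf"
```

With this approach the renewal script itself could stay free of retry logic, at the cost of losing the randomized delay unless it is added some other way.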

@shepmaster
Member

Superseded by #393

@shepmaster closed this Apr 2, 2024
Linked issue: Retry renewal of SSL certificates