Let's Encrypt may be down for maintenance or directoryUrl may be wrong #1

Open
alex996 opened this issue Apr 28, 2021 · 6 comments

@alex996

alex996 commented Apr 28, 2021

Earlier this week (Monday, Apr 26 around 12:30 ET) Let's Encrypt was undergoing maintenance and its ACME v2 URL https://acme-v02.api.letsencrypt.org/directory was returning an error. I have greenlock-express set up with a valid cert (issued in March, expiring in June). I needed to restart Node but I got the following error:

Listening on 0.0.0.0:80 for ACME challenges, and redirecting to HTTPS
Listening on 0.0.0.0:443 for secure traffic
Ready to Serve:
	 demo.example.com
ACME Directory URL: https://acme-v02.api.letsencrypt.org/directory
[debug] Let's Encrypt may be down for maintenance or `directoryUrl` may be wrong
set greenlockOptions.notify to override the default logger
Error cert_order:
Cannot read property 'termsOfService' of undefined
TypeError: Cannot read property 'termsOfService' of undefined
    at fin (/path/node_modules/@root/acme/acme.js:74:23)
    at /path/node_modules/@root/acme/acme.js:95:12
    at processTicksAndRejections (internal/process/task_queues.js:93:5)
    at Object.greenlock._acme (/path/node_modules/@root/greenlock/greenlock.js:393:9)
    at Object.greenlock._order (/path/node_modules/@root/greenlock/greenlock.js:421:20)
    at Object.greenlock._renew (/path/node_modules/@root/greenlock/greenlock.js:335:9)
    at Object.greenlock.get (/path/node_modules/@root/greenlock/greenlock.js:212:23)

It seems that greenlock pings the ACME endpoint every 1 hour, is that correct? From @root/greenlock/greenlock.js:387:

var dir = caches[dirUrl];
// don't cache more than an hour
if (dir && Date.now() - dir.ts < 1 * 60 * 60 * 1000) {
    return dir.promise;
}

await acme.init(dirUrl).catch(function(err) {
    // TODO this is a special kind of failure mode. What should we do?
    console.error(
        "[debug] Let's Encrypt may be down for maintenance or `directoryUrl` may be wrong"
    );
    throw err;
});

I don't fully understand the intent here but my question is - if the cert is still valid (in my case, it's expiring in June), a. why is it necessary to ping the ACME endpoint, and b. why does this ping prevent the Node server from starting (again, despite a valid cert)?

Expected: given a valid cert, greenlock should start the Node server.
Actual: given a valid cert, greenlock fails to start because the ACME v2 endpoint is unavailable.

Packages:

  • @root/greenlock v4.0.5
  • @root/acme v3.1.0
  • @root/greenlock-express v4.0.4

Thank you.

@coolaj86
Contributor

> Why is it necessary to ping the ACME endpoint?

Fail early. If someone is starting the server with incorrect settings, we want them to know right away.

> It seems that greenlock pings the ACME endpoint every 1 hour, is that correct?

No. It caches the directory URL so that it doesn't fetch it again for at least an hour (as opposed to every time it's needed).
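
The caching described here can be sketched as a small time-bounded promise cache. This is an illustration only, not Greenlock's actual code: `createDirectoryCache`, `fetchDirectory`, `ttlMs`, and `now` are hypothetical names, and evicting failed lookups is an added assumption that the quoted snippet does not show.

```javascript
// Hypothetical helper: cache one promise per directory URL and reuse it
// until it is `ttlMs` old. `fetchDirectory` must return a promise.
function createDirectoryCache(fetchDirectory, ttlMs, now = Date.now) {
  const caches = {};

  return function getDirectory(dirUrl) {
    const hit = caches[dirUrl];
    if (hit && now() - hit.ts < ttlMs) {
      // Still fresh: reuse the earlier promise, no new request.
      return hit.promise;
    }

    const entry = { ts: now(), promise: Promise.resolve(fetchDirectory(dirUrl)) };
    caches[dirUrl] = entry;

    // Assumption (not in the quoted source): forget failed lookups so the
    // next caller retries instead of reusing a rejection for an hour.
    entry.promise.catch(function () {
      if (caches[dirUrl] === entry) {
        delete caches[dirUrl];
      }
    });

    return entry.promise;
  };
}
```

With `ttlMs` set to `1 * 60 * 60 * 1000`, a fetch only happens when something asks for the directory and the cached entry is at least an hour old. Nothing runs on a timer.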

> Why does this ping prevent the Node server from starting (again, despite a valid cert)?

> // TODO this is a special kind of failure mode. What should we do?

"In the face of ambiguity, refuse the temptation to guess."

I think that it would be reasonable to make the default behavior to log the error and to continue rather than throw, now that the use case is better understood.
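
A minimal sketch of that "log and continue" default, assuming the directory lookup is wrapped rather than changed in place. `initOrWarn` and `initAcme` are hypothetical names. `initAcme` stands in for a call such as `acme.init(dirUrl)`, and the result shape is invented for illustration.

```javascript
// Hypothetical wrapper: report a failed directory lookup instead of throwing,
// so a server holding a still-valid certificate can keep starting up.
async function initOrWarn(initAcme, dirUrl, log = console.error) {
  try {
    const dir = await initAcme(dirUrl);
    return { ok: true, dir: dir };
  } catch (err) {
    log("[warn] ACME directory unavailable at " + dirUrl + ": " + err.message);
    // The caller decides what to do; ordering a new certificate is not
    // possible until a later lookup succeeds.
    return { ok: false, error: err };
  }
}
```

Returning a status instead of swallowing the error silently keeps the fail-early signal visible in the logs while leaving the decision to abort with the caller.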

@alex996
Author

alex996 commented Apr 28, 2021

Thanks. IIUC, if we remove this throw statement:

// @root/greenlock/greenlock.js:393
await acme.init(dirUrl).catch(function(err) {
    // TODO this is a special kind of failure mode. What should we do?
    console.error(
        "[debug] Let's Encrypt may be down for maintenance or `directoryUrl` may be wrong"
    );
    // throw err; // <--- this
});

and the call to ACME v2 does fail, then the metadata won't be initialized:

// @root/acme.js:69
me.init = function (opts) {
// ...
    function fin(dir) {
      me._directoryUrls = dir; // <--- this won't run
      me._tos = dir.meta.termsOfService; // <--- and this
      return dir;
    }

Which means acme._orderCert will need to call init again:

// @root/acme.js:1145
ACME._orderCert = function (me, options, kid) {
// ...
    return U._jwsRequest(me, {
        url: me._directoryUrls.newOrder, // <--- this will be missing

Alternatively, we can ping ACME v2 periodically (every 1 hour?) until it is back up. That said, I'm not sure if me._directoryUrls and me._tos are used elsewhere as well.

I think I get the general idea, so I can write up a PR if this makes sense.
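
One way the "call init again" idea could look, as a rough sketch rather than a patch against `@root/acme`: a guard that runs before any order and re-attempts the directory lookup if the metadata is still missing. `createEnsureDirectory`, `state`, and `init` are hypothetical stand-ins. `state` plays the role of `me` in acme.js, and here the guard (not `init`) stores the result.

```javascript
// Hypothetical guard: make sure directory metadata is loaded before ordering.
function createEnsureDirectory(state, init, dirUrl) {
  let pending = null; // shared by concurrent callers while a lookup is in flight

  return function ensureDirectory() {
    if (state._directoryUrls) {
      return Promise.resolve(state._directoryUrls);
    }
    if (!pending) {
      pending = new Promise(function (resolve) {
        resolve(init(dirUrl));
      }).then(
        function (dir) {
          state._directoryUrls = dir;
          // Guard against a missing `meta`, the property access that threw
          // in the stack trace above.
          state._tos = dir.meta && dir.meta.termsOfService;
          pending = null;
          return dir;
        },
        function (err) {
          pending = null; // allow the next call to retry
          throw err;
        }
      );
    }
    return pending;
  };
}
```

A caller such as the order path would `await ensureDirectory()` before reading `newOrder`, so an outage at startup only delays renewal until the endpoint is reachable again. Whether other code reads `me._directoryUrls` or `me._tos` directly is not covered by this sketch.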

@mikealeonetti

Is there a setting that would allow the server to start even though the Let's Encrypt API is down for maintenance? I also had a valid cert, had to restart Node, and now the server is just down. Would love to prevent this in the future.

@eloquence

This appears to be biting me today during an LE outage - had to restart Node for unrelated reasons and now the site is just down. Definitely would be nice for this module to handle such situations more gracefully.

@eloquence

While the main API endpoint is down I was able to bring my server back up by temporarily switching to the staging API directory endpoint (since the cert is still valid this did not appear to have any unintended side effects, for now).
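
For anyone wanting to make that switch quickly, a small sketch of choosing the directory URL at startup. The two URLs are Let's Encrypt's published ACME v2 production and staging directories. `pickDirectoryUrl` and the `ACME_USE_STAGING` variable are hypothetical, and how the chosen value reaches Greenlock (the `directoryUrl` option named in the log output) depends on your setup and version, so treat that part as an assumption.

```javascript
// Let's Encrypt ACME v2 directory endpoints.
const LE_PRODUCTION = "https://acme-v02.api.letsencrypt.org/directory";
const LE_STAGING = "https://acme-staging-v02.api.letsencrypt.org/directory";

// Hypothetical helper: ACME_USE_STAGING is an invented variable name,
// not a Greenlock setting.
function pickDirectoryUrl(env) {
  return env.ACME_USE_STAGING === "1" ? LE_STAGING : LE_PRODUCTION;
}

const directoryUrl = pickDirectoryUrl(process.env);
console.log("ACME Directory URL: " + directoryUrl);
```

Staging certificates are not trusted by browsers, so this is only a stopgap while the existing production cert is still valid. Switch back before a renewal is due.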

@coolaj86
Contributor

I'm convinced that this is a problem that needs to be solved. Would someone like to make a PR, test it, and ping me?


4 participants