Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: delete intermediate cert if blank cert was created due to storage space #6667

Closed

Conversation

Botelho31
Copy link
Contributor

@Botelho31 Botelho31 commented Oct 29, 2024

For #4897

@francislavoie
Copy link
Member

Thanks!

It doesn't fully close #4897 actually, because this only fixes the acme_server PKI usecase, it doesn't resolve the case where Caddy failed to store data from Let's Encrypt.

@Botelho31
Copy link
Contributor Author

Ok, i will try to add the Let`s encrypt use case

@Botelho31
Copy link
Contributor Author

After looking more closely ( apologizing in advance if im wrong ), the error in the issue is pertaining to the intermediate certificate on PKI only, the actual leaf certificate creation and storage is all in certmagic and couldn`t be actually fixed in this codebase. The second error is happening on purpose as the user actually ran out of storage before the storage check for the leaf certificate.

@mholt
Copy link
Member

mholt commented Oct 30, 2024

I think a more general solution would be preferred.

I'm surprised an error is not returned when storage is full? Are there nothing in the logs when that happens? If not, that's concerning indeed, since it means we can't easily detect when storage is full. I would expect a failed write to give an error.

If an error is returned, we can probably delete the file that is empty/corrupt.

@Botelho31
Copy link
Contributor Author

The error returned in the certmagic.storage.Store function if the disk is full isn't very informative. It returns an os.PathError i believe with only 'write [CERT PATH]' in the log. I could give the general solution a try if you have a suggestion.

@mholt
Copy link
Member

mholt commented Oct 30, 2024

PathError? That's weird. Do you have any actual logs from this happening?

But in any case, I wonder if any error on Write() (in Store()) should necessitate a deletion of the file... 🤔

@Botelho31
Copy link
Contributor Author

Botelho31 commented Oct 30, 2024

The problem in Store i believe is that if there isnt enought space it cancels the writing of the file, but as the file was created beforehand it leaves an empty file ( in this case a cert ). In any case all of these problems occur as we don`t have knowledge of storage space in PKI module during creation.

In any case should we really leave a blank intermediate certificate? I have to admit i don`t know the full implications ( if there is any handling for corrupt/blank intermediate certificates ).

@mholt
Copy link
Member

mholt commented Oct 30, 2024

Oh, and after looking closer, PathError is just a container for a path + error. So a "path error" isn't the error itself, but it should contain an actual error with more information.

I'm not suggesting we leave a blank or corrupted file. When Write has an error, maybe we should delete the file so as not to give Caddy the wrong idea (that the file is valid).

What was the full error / log?

@Botelho31
Copy link
Contributor Author

Botelho31 commented Oct 31, 2024

I have created a minimal setup that errors at the initial setup of the PKI, here is the error that is actually thrown then:

Error: loading initial config: loading new config: loading http app module: provision http: getting tls app: loading tls app module: provision tls: provisioning automation policy 0: loading TLS automation management module: position 0: loading module 'internal': provision tls.issuance.internal: loading pki app module: provision pki: provisioning CA 'local': generating new intermediate cert: saving intermediate key: write /data/caddy/pki/authorities/local/intermediate.key: no space left on device

I could try to make it throw this error as it is going to renew the certificate if it`s necessary

Edit:

Forgot to add the actual link to the setup here: PoC

@mholt
Copy link
Member

mholt commented Oct 31, 2024

Ah, perfect: no space left on device - thanks for the effort!

That log gives us a good trace of the error to be able to implement a failover where we delete the file when a write fails, if it did not exist before.

Although... @elee1766 question for you: does the new way of writing to the file storage make this a non-issue? i.e. writing the new payload would fail, the temporary file would get cleaned up, and the existing file would continue to exist without corruption, right?

@elee1766
Copy link
Contributor

elee1766 commented Oct 31, 2024

Ah, perfect: no space left on device - thanks for the effort!

That log gives us a good trace of the error to be able to implement a failover where we delete the file when a write fails, if it did not exist before.

Although... @elee1766 question for you: does the new way of writing to the file storage make this a non-issue? i.e. writing the new payload would fail, the temporary file would get cleaned up, and the existing file would continue to exist without corruption, right?

yeah. it should fail on syncing the temp file, and then never run the rename, therefore leaving the existing cert as is

or maybe even fail on creating the temp file if you have no space for another inode

@francislavoie
Copy link
Member

francislavoie commented Oct 31, 2024

Thanks @elee1766

@Botelho31 What version were you trying that test with intermediate with? Was it with v2.8.4 or with the latest from master? Could you try master then to see if you can actually replicate it?

@Botelho31
Copy link
Contributor Author

I was using 2.8.4 as it was latest in docker ( i believe ). I will try to check latest master version today, but if the exchange of intermediate certificates doesn't leave a blank file anymore ( as it does the whole renaming process ) then i guess the actual problem is fixed.

@mholt
Copy link
Member

mholt commented Oct 31, 2024

Yep, I'm guessing that @elee1766's upstream enhancement for how files are written (which master is now using) is probably the fix for this.

@Botelho31
Copy link
Contributor Author

I have just tested the master changes, this doesn`t appear to be happening anymore! Should the original issue be closed too?

@francislavoie
Copy link
Member

Guess so! Thanks for confirming!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants