Exit proc_lib gracefully on proxy header mismatch #1510

Conversation

NelsonVides (Contributor)

If the proxy header does not match (for example, because the header is not sent at all), the Cowboy libraries expect a strict match and crash the proc_lib process with a badmatch, which generates a hard error report. I believe this error should be expected and termination should be graceful.
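
For illustration, a minimal sketch of what the change might look like (an assumption about its shape, not the exact patch; ranch:recv_proxy_header/2 is the Ranch call that reads the header, and the timeout value is arbitrary):

```erlang
%% Sketch only: read the PROXY protocol header and terminate gracefully
%% when it cannot be read, instead of crashing with a badmatch.
read_proxy_header(Ref) ->
    case ranch:recv_proxy_header(Ref, 1000) of
        {ok, ProxyInfo} ->
            ProxyInfo;
        {error, Reason} ->
            %% Exiting with a {shutdown, _} reason avoids the proc_lib
            %% crash report while still recording why we stopped.
            exit({shutdown, {socket_error, Reason, 'No PROXY protocol header was received.'}})
    end.
```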

@essen (Member) commented Apr 20, 2021

Why wouldn't you want this to be logged?

@NelsonVides (Contributor Author)

> Why wouldn't you want this to be logged?

To put it another way: why would you log it just here? I think the cause of death should be handled elsewhere, where the caller gets a chance to decide what to do with it. For example, I believe Ranch already logs this.

@essen (Member) commented Apr 20, 2021

You've made it into a shutdown, and shutdowns aren't logged: https://github.com/ninenines/ranch/blob/master/src/ranch_conns_sup.erl#L388

A missing proxy header is most likely a configuration error (on the proxy side), so Cowboy should provide as much information as possible about the problem.
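
For context, that supervisor only reports exit reasons it considers abnormal. A rough illustration of the classification (an assumption about its shape, not the actual ranch_conns_sup code):

```erlang
%% Illustration: normal and shutdown-style exit reasons are treated as
%% graceful and produce no report; anything else is logged as an error.
report_error(_Ref, _Pid, normal) -> ok;
report_error(_Ref, _Pid, shutdown) -> ok;
report_error(_Ref, _Pid, {shutdown, _}) -> ok;
report_error(Ref, Pid, Reason) ->
    logger:error("Listener ~p: connection process ~p exited with reason ~p",
        [Ref, Pid, Reason]).
```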

@NelsonVides (Contributor Author)

> You've made it into a shutdown, and shutdowns aren't logged: https://github.com/ninenines/ranch/blob/master/src/ranch_conns_sup.erl#L388
>
> A missing proxy header is most likely a configuration error (on the proxy side), so Cowboy should provide as much information as possible about the problem.

Ah, that's the Ranch line I was looking for. But then the issue is that the same error is logged twice, once as a proc_lib crash report and once from Ranch's supervisor, isn't it?

@NelsonVides (Contributor Author)

I have no idea what broke so much in the CI, but everything is very red. Nevertheless, I'm stuck wondering how to deal with the crash being reported twice (by proc_lib and Ranch respectively).

@essen (Member) commented Apr 21, 2021

That's just the way it's done: you get a crash report and a supervisor report. Nothing unusual; you get the same for gen_* + supervisor as well.

@NelsonVides (Contributor Author)

> You've made it into a shutdown, and shutdowns aren't logged: https://github.com/ninenines/ranch/blob/master/src/ranch_conns_sup.erl#L388
>
> A missing proxy header is most likely a configuration error (on the proxy side), so Cowboy should provide as much information as possible about the problem.

What about the case where the port is only being probed, that is, a load balancer only does a TCP handshake and closes the connection without sending any traffic whatsoever?

@essen (Member) commented May 6, 2021

Is there a deployment that has the combination of PROXY and such a load balancer?

@NelsonVides (Contributor Author)

> Is there a deployment that has the combination of PROXY and such a load balancer?

Obscure Swiss providers, and it seems to me even some older AWS offerings, but I can't bet on it; I'm not in charge of the DevOps for the project I'm working on 😕

@essen (Member) commented May 7, 2021

I'm happy to make it handle, say, {error, closed} if that's what happens. But other errors should probably keep crashing, as they're likely a misconfiguration.
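
In code, that distinction might look roughly like this (a sketch assuming a probe shows up as {error, closed}; any other error keeps crashing so a misconfiguration stays visible):

```erlang
%% Sketch: only a closed socket terminates the connection process gracefully.
read_proxy_header(Ref) ->
    case ranch:recv_proxy_header(Ref, 1000) of
        {ok, ProxyInfo} ->
            ProxyInfo;
        {error, closed} ->
            %% The peer connected and went away without sending anything,
            %% e.g. a load balancer probing the port.
            exit({shutdown, closed});
        {error, Reason} ->
            %% Most likely a proxy misconfiguration: crash loudly.
            error({invalid_proxy_header, Reason})
    end.
```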

@NelsonVides (Contributor Author)

> I'm happy to make it handle, say, {error, closed} if that's what happens. But other errors should probably keep crashing, as they're likely a misconfiguration.

That sounds fair; I agree with that one. I've pushed a new version that tries to do just that.

@essen (Member) commented May 7, 2021

Looks fine. Could you please add a test though? test/proxy_header_SUITE.erl should be the place.

Also, can you please confirm that in your environments you do get {error,closed} before I merge this? It should be in the verbose logs you mentioned before.
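
For reference, such a test could look roughly like the sketch below (the case name is hypothetical; config/2 and ct_helper:get_remote_pid_tcp/1 are assumed to be the helpers used by the existing suites, and the exact {shutdown, _} exit reason is an assumption as well):

```erlang
%% Hypothetical test sketch: connect, send nothing, close the socket, and
%% expect the connection process to go down with a shutdown reason.
no_proxy_header_closes_gracefully(Config) ->
    {ok, Socket} = gen_tcp:connect("localhost", config(port, Config),
        [binary, {active, false}]),
    Pid = ct_helper:get_remote_pid_tcp(Socket),
    MRef = monitor(process, Pid),
    ok = gen_tcp:close(Socket),
    receive
        {'DOWN', MRef, process, Pid, {shutdown, _}} ->
            ok
    after 1000 ->
        error(timeout)
    end.
```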

NelsonVides force-pushed the proxy_protocol_graceful_failure branch from 85b05de to e380d05 on May 8, 2021
@NelsonVides (Contributor Author)

I've just tried to write a test for it. I will deploy this in my environment in the next few days and let you know; in the meantime, please let me know whether the tests more or less follow Cowboy's conventions.

@NelsonVides (Contributor Author)

@essen I've tested this in my environments: I do get {error, closed} and the logs aren't polluted with crashes.

@essen (Member) commented May 12, 2021

OK, thanks. It looks fine; I will just need to find time to look into it properly. It's either going to be tomorrow or weeks from now. Hopefully tomorrow!

@essen (Member) commented May 13, 2021

Please don't add a new test group; instead, run the test against all of http/https/h2/h2c so we know it works for both TCP and TLS and for all protocols.
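
For reference, the usual pattern in Cowboy's suites for this (assuming the common test helpers; the merged suite may differ) is to register every case under the shared groups instead of defining a new one:

```erlang
%% Sketch: run every test case of the module under the common groups
%% (http, https, h2, h2c, ...) rather than a dedicated new group.
all() ->
    cowboy_test:common_all().

groups() ->
    cowboy_test:common_groups(ct_helper:all(?MODULE)).
```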

@NelsonVides (Contributor Author)

> Please don't add a new test group; instead, run the test against all of http/https/h2/h2c so we know it works for both TCP and TLS and for all protocols.

Easy to change, and I've just done it locally, but it seems that some tests fail randomly, I suppose because of a race condition.

@essen (Member) commented May 13, 2021

Make sure it connects to the right host/port and, if you can't figure it out, please post the errors you see.

NelsonVides force-pushed the proxy_protocol_graceful_failure branch from e380d05 to 55efbb1 on May 13, 2021
@NelsonVides (Contributor Author)

I've just pushed the test; it generally fails in the https group with a timeout while waiting for the 'DOWN' message.

@NelsonVides (Contributor Author)

Right, the Pid that ct_helper:get_remote_pid_tcp/1 returns is not the one actually handling the socket, so the test doesn't receive the 'DOWN' message. This happens only for the https and h2 groups, and it now seems deterministic.

@essen (Member) commented May 13, 2021

There's a separate function for TLS.

@NelsonVides (Contributor Author)

> There's a separate function for TLS.

I just saw that, but it won't work, because it calls ssl:sockname, and this socket isn't an SSL socket yet; it's still a bare gen_tcp:connect socket.

@NelsonVides (Contributor Author)

@essen I've now pushed a new version of the test that uses a bit of a hack, hopefully an elegant one, to correctly find the process whose death we want to track. I hope you like the code; I await your comments :)

@NelsonVides (Contributor Author)

@essen did you have a chance to review the latest changes? :)

@essen (Member) commented May 20, 2021

No, but the tab is still open for whenever this can happen.

@NelsonVides (Contributor Author)

> No, but the tab is still open for whenever this can happen.

Hi there, it's been a while; how's progress on this one? :)

@NelsonVides (Contributor Author)

@essen just a heads-up on this: any updates or plans?

@NelsonVides (Contributor Author)

@essen it has been a while with no activity here; any updates on when, or if, this could be merged?

@essen (Member) commented Nov 19, 2021

When I have some free time. I think the PR is fine as-is, by the way, but I need to test it.

essen added this to the 2.11 milestone on Nov 23, 2023
@essen (Member) commented Dec 5, 2023

Please rebase onto current master so that CI runs properly. I am planning to do a release with this and other things soon.

NelsonVides force-pushed the proxy_protocol_graceful_failure branch from 3ea2ba9 to 150cde9 on December 5, 2023
@NelsonVides (Contributor Author)

> Please rebase onto current master so that CI runs properly. I am planning to do a release with this and other things soon.

Done.

@NelsonVides (Contributor Author)

It's been quite a while since I worked on this, and now the test seems to be failing in CI; however, I can't even get ct running locally. Do you have a guide somewhere on how to run the tests? I'd like to fix this issue but I can't find how.

@essen (Member) commented Dec 6, 2023

It should be just make ct, but otherwise check https://erlang.mk; it has plenty of documentation.

It's likely a race condition in the test itself.

I am going through a number of tickets/PRs, so if you can work on this, great! If not, also great; I'll just pick it up when I get to it. Cheers.

@essen (Member) commented Dec 21, 2023

I've merged with minimal changes to the tests. Closing, thanks!

essen closed this on Dec 21, 2023
NelsonVides deleted the proxy_protocol_graceful_failure branch on December 21, 2023