SCTP won't work on older kernel #9477

Open
RoadRunnr opened this issue Feb 24, 2025 · 5 comments
Labels: not a bug (Issue is determined as not a bug by OTP), team:PS (Assigned to OTP team PS)

Comments

@RoadRunnr
Contributor

Describe the bug

Erlang SCTP support built against Linux 5.5+ kernel headers will not work on Linux kernels older than 5.5.

Strictly speaking, this is not an Erlang bug but a side effect of how the Linux kernel's ABI works. One might argue that Linux is breaking its ABI compatibility promise, but the situation is complex.

The SCTP user API is defined by a set of C structs. Kernel commit torvalds/linux@b6e6b5f increases the size of the sctp_event_subscribe structure. The kernel API checks that the size of the struct passed into setsockopt is smaller than or equal to the struct size the kernel was built with.
The struct size is compiled into the user space binary, so applications built against later kernel headers will have a larger struct size for sctp_event_subscribe, but the kernel API does not allow sizes that exceed its expectations.

As a result, beam VMs built with SCTP user space API headers from a Linux 5.5+ kernel will use a sctp_event_subscribe structure size that is rejected by pre-5.5 kernels, breaking SCTP for those builds when they run on pre-5.5 kernels.
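To make the failure mode concrete, here is a minimal model of the length check described above. It is a sketch, not kernel code: `model_setsockopt_events` is a hypothetical name, the two sizes are illustrative assumptions (one byte per event flag), and the `-EINVAL` return is how I assume the rejection surfaces.

```c
#include <errno.h>
#include <stddef.h>

/* Illustrative sizes only (assumption): the struct as an older kernel knows
 * it, and the struct after one more one-byte event flag has been added. */
enum { OLD_EVENT_STRUCT_SIZE = 13, NEW_EVENT_STRUCT_SIZE = 14 };

/* Hypothetical model of the check an older kernel applies to the length
 * passed to setsockopt(): anything larger than the struct it was built
 * with is rejected, anything smaller or equal is accepted. */
int model_setsockopt_events(size_t kernel_struct_size, size_t optlen)
{
    if (optlen > kernel_struct_size)
        return -EINVAL; /* binary built against newer headers */
    return 0;           /* same or older headers: accepted */
}
```

With these assumed sizes, `model_setsockopt_events(OLD_EVENT_STRUCT_SIZE, NEW_EVENT_STRUCT_SIZE)` fails, while the reverse combination (old binary, new kernel) succeeds, which is the asymmetry this issue is about.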

The problem is most noticeable when attempting to use container images on cloud instances that run pre-5.5 kernels.

#5442 ran into this exact problem, but failed to diagnose the root cause.

To Reproduce
Use a beam VM built with recent kernel headers on a Linux 5.4 kernel.

Expected behavior
Should work.

Affected versions
Any build with Linux 5.5+ headers

@RoadRunnr RoadRunnr added the bug Issue is reported as a bug label Feb 24, 2025
@RoadRunnr
Contributor Author

One idea to handle this: any place that does a setsockopt on sctp_event_subscribe could do a getsockopt first to determine the struct size that the kernel can handle. And maybe do it only once and cache the result.

@jhogberg
Contributor

Thanks for your report!

> One might argue that Linux is breaking its ABI compatibility promise, but the situation is complex.

Their only promise is that things built for old versions will continue to work in newer versions, not that things built for newer versions will work in older ones.

Trying to resolve this at runtime is bound to be hacky. If you want your binary to work on 5.4 or below, you ought to compile with the headers of the earliest version that you wish to support. It shouldn't be too difficult as most distributions let you install these headers as a separate package.

@jhogberg jhogberg added team:PS Assigned to OTP team PS not a bug Issue is determined as not a bug by OTP and removed bug Issue is reported as a bug labels Feb 24, 2025
@RoadRunnr
Contributor Author

It is noteworthy that a very similar issue has been discussed in #4176 in 2020.

@jhogberg thanks for the official statement and I do completely agree.

However, I would guess that most people are bitten by this problem when they try to use the official Erlang and Elixir container images on cloud instances that have older kernels (Linux 5.4 still seems to be the default on some clouds). So maybe adding a hint or warning in the documentation of gen_sctp and socket might be warranted?

@RoadRunnr
Contributor Author

Also, it is noteworthy that at least one other project chose to support a workaround similar to what was proposed in #4176.

@RoadRunnr
Contributor Author

RoadRunnr commented Feb 24, 2025

ok, I promise that is my last comment (for the moment)...

@jhogberg RFC 6458 does note that this event API has a binary compatibility problem (https://www.rfc-editor.org/rfc/rfc6458.html#section-6.2.2) and adds a new API for event subscription. That API was added to Linux 5.0 (apparently).

It would be great if that API were supported by Erlang, and I guess that converts this issue into a wishlist item for that.
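For reference, a minimal sketch of what subscribing through the newer per-event option looks like in C. It assumes the installed Linux UAPI header `<linux/sctp.h>` exposes `struct sctp_event` and `SCTP_EVENT` as described in RFC 6458 section 6.2.2; `sctp_event_set` is a hypothetical wrapper name, not an existing function.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>   /* IPPROTO_SCTP */
#include <linux/sctp.h>   /* struct sctp_event, SCTP_EVENT (assumed present) */

/* Hypothetical wrapper: enable or disable one notification type.
 * The struct has a fixed layout, so adding new event types later does not
 * change the size passed to setsockopt(). */
int sctp_event_set(int fd, sctp_assoc_t assoc_id, uint16_t type, int on)
{
    struct sctp_event ev;

    memset(&ev, 0, sizeof(ev));
    ev.se_assoc_id = assoc_id;    /* association selector */
    ev.se_type     = type;        /* e.g. SCTP_ASSOC_CHANGE */
    ev.se_on       = on ? 1 : 0;

    return setsockopt(fd, IPPROTO_SCTP, SCTP_EVENT, &ev, sizeof(ev));
}
```

Each event type needs its own call, for example `sctp_event_set(fd, 0, SCTP_ASSOC_CHANGE, 1)`. A kernel without this option will still fail the call, so a fallback to the older sctp_event_subscribe path would be needed there.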

@RaimoNiskanen RaimoNiskanen self-assigned this Feb 25, 2025
3 participants