Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tidy up experimental RDMA feature #131

Merged
merged 5 commits into from
Jan 19, 2025
Merged

Conversation

garlick
Copy link
Member

@garlick garlick commented Jan 19, 2025

Problem: RDMA doesn't compile or even begin to run.

Fix some obvious problems with RDMA, short of actually testing it end to end.

The feature is opt-in in configure and is not completely implemented (no config file support for example) so tag it in the configure help as "experimental".

At least the next person that wants to work on it won't have a completely broken starting point.

Problem: diod_rdma.c does not compile

When internal include paths changed recently, diod_rdma.c was not updated.

Update include paths.
Problem: when --enable-rdmatrans is explicitly requested, failure to
configure it is non-fatal.

Make it fatal if the feature cannot be enabled.

Also rename "rdmatrans" to "rdma" and label it experimental in the
configure help output since it is not production ready.  For example,
if configured, it unconditionally tries to listen on a hard wired
address.
Problem: some packages that are no longer required are installed in CI.

Drop libpopt and tcp wrappers.
Problem: the RDMA code is not built in CI so it is subject to rot.

Add a build-only check.
Problem: when built with --enable-rdma, diod fails to initialize
with the following error:

  # diod -E
  diod: waiting on rdma connection
  diod: rdma_get_cm_event: Unknown error -1

and dmesg contains

  [3890201.976178] ucma_write: process 2106336 (diod) changed security contexts
    after opening file descriptor, this is not allowed

Initializng rdma later, after user transitions, seems to avoid this
particular problem at initialization.

Fixes chaos#107
Copy link
Member

@grondo grondo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@garlick
Copy link
Member Author

garlick commented Jan 19, 2025

Thanks!

@mergify mergify bot merged commit dca6145 into chaos:master Jan 19, 2025
6 checks passed
@garlick garlick deleted the rdma_buildfix branch January 19, 2025 23:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants