-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix a bug that caused the controller to hang indefinitely on exit. #113
base: main
Are you sure you want to change the base?
Conversation
During the teardown process (i.e when issuing quit() command), the controller tries to close all resources including the StreamChannel (StreamChannel is started on controller initaliazation). StreamChannel events are handled by separated thread, which listens for gRPC messages/replies from the P4Agent, when there is no requests, the StreamChannel blocks, which in turn blocks the thread. As part of the teardown, the controller join() the StreamChannel thread, which does not exit as there are no requests from the P4Agent and thus the controller hangs forever. The fix uses TimeoutIterator to query StreamChannel events with given timeout, if no events are received the loops continues and thus can check for program termination This commit adds dependecy on 'iterators' module that implments the TimeoutIterator Signed-off-by: Emil Khshiboun <[email protected]>
I am not able to reproduce this bug, so could you share some details? When running These are the steps I used:
I tried the steps above a couple of times. antonin@ubuntu-22:~$ simple_switch_grpc --no-p4 --log-console
Calling target program-options parser
Server listening on 0.0.0.0:9559
[14:03:55.511] [bmv2] [I] [thread 7705] Starting Thrift server on port 9090
[14:03:55.512] [bmv2] [I] [thread 7705] Thrift server was started
[14:03:57.481] [bmv2] [D] [thread 7723] Set default default entry for table 'ipv4_lpm': NoAction -
[14:03:57.481] [bmv2] [D] [thread 7723] Set default default entry for table 'forward': NoAction -
[14:03:57.481] [bmv2] [D] [thread 7723] Set default default entry for table 'send_frame': NoAction -
[14:03:57.482] [bmv2] [D] [thread 7723] simple_switch target has been notified of a config swap
[14:04:00.392] [bmv2] [D] [thread 7724] Set default default entry for table 'ipv4_lpm': NoAction -
[14:04:00.392] [bmv2] [D] [thread 7724] Set default default entry for table 'forward': NoAction -
[14:04:00.392] [bmv2] [D] [thread 7724] Set default default entry for table 'send_frame': NoAction -
[14:04:00.393] [bmv2] [D] [thread 7724] simple_switch target has been notified of a config swap (venv) antonin@ubuntu-22:~$ python3 -m p4runtime_sh --device-id 0 --election-id 0,1 --config simple_router.proto.txt,simple_router.json
*** Welcome to the IPython shell for P4Runtime ***
P4Runtime sh >>> quit
(venv) antonin@ubuntu-22:~$ python3 -m p4runtime_sh --device-id 0 --election-id 0,1 --config simple_router.proto.txt,simple_router.json
*** Welcome to the IPython shell for P4Runtime ***
P4Runtime sh >>> quit
(venv) antonin@ubuntu-22:~$ |
Hi Anton, I might be missing something here, but I could not find any information on the p4 specification regarding client disconnecting, what do you mean the p4server implemented incorrectly ? From the Controller code in tear_down(), no message/signal is sent to the p4 server, so the server can not know when the client disconnected (correct me if i am wrong) Here is my output: (running with -v)
The above stuck forever |
That could explain it. Probably a bug in the implementation of the server.
I assume you mean P4Runtime specification here, and not P4 specification. The reference P4Runtime server (C++) can be found here: https://github.com/p4lang/PI/blob/main/proto/server/pi_server.cpp In my experience, it is easier to use synchronous gRPC, while asynchronous APIs require more care when dealing with streams. I don't know which one you are using in your custom P4 server. |
During the teardown process (i.e when issuing quit() command), the controller tries to close all resources including the StreamChannel (StreamChannel is started on controller initaliazation).
StreamChannel events are handled by separated thread, which listens for gRPC messages/replies from the P4Agent, when there is no requests, the StreamChannel blocks, which in turn blocks the thread. As part of the teardown, the controller join() the StreamChannel thread, which does not exit as there are no requests from the P4Agent and thus the controller hangs forever.
The fix uses TimeoutIterator to query StreamChannel events with given timeout, if no events are received the loops continues and thus can check for program termination
This commit adds dependecy on 'iterators' module that implments the TimeoutIterator