Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Helper server error return code does not fail the test for core = app #440

Open
nshy opened this issue Jul 1, 2024 · 1 comment
Open

Comments

@nshy
Copy link
Contributor

nshy commented Jul 1, 2024

It hides memory leak/unexpected panic/crashes/failed assertions.

Repro. Let's make next changes:

diff --git a/src/main.cc b/src/main.cc
index 7dd4f537d..c9de9163f 100644
--- a/src/main.cc
+++ b/src/main.cc
@@ -729,6 +730,8 @@ print_help(FILE *stream)
        fprintf(stream, help_msg, tarantool_version());
 }

+int tarantool_do_panic = 0;
+
 int
 main(int argc, char **argv)
 {
@@ -1101,5 +1104,8 @@ main(int argc, char **argv)
        free((void *)instance.name);
        free((void *)instance.config);
        tarantool_free();
+       if (tarantool_do_panic)
+               panic("AAAAAAA!!!!");
+
        return exit_code;
 }
diff --git a/extra/exports b/extra/exports
index 1dbfedc5c..78db2513b 100644
--- a/extra/exports
+++ b/extra/exportss
@@ -679,3 +679,5 @@ luaM_sysprof_stop
 luaopen_misc

 # }}} LuaJIT API
+
+tarantool_do_panic
diff --git a/test/box/tiny.lua b/test/box/tiny.lua
index 608d48366..2b30f6dfd 100644
--- a/test/box/tiny.lua
+++ b/test/box/tiny.lua
@@ -13,3 +13,7 @@ require('console').listen(os.getenv('ADMIN'))
 box.once('init', function()
     box.schema.user.grant('guest', 'read,write,execute', 'universe')
 end)
+
+local ffi = require('ffi')
+ffi.cdef('int tarantool_do_panic;')
+ffi.C.tarantool_do_panic = 1

The test passes:

test/test-run.py --force --builddir ../build-dev box-tap/session.test
======================================================================================
WORKR TEST                                            RARAMS              RESULT
--------------------------------------------------------------------------------------
[001] box-tap/session.test.lua                                            [ pass ]
--------------------------------------------------------------------------------------

Yet the helper server panics as we planned:

$ grep AAAA /tmp/t/001_box-tap/tiny.log
2024-07-01 11:04:13.672 [50651] main F> AAAAAAA!!!!
2024-07-01 11:04:13.672 [50651] main F> AAAAAAA!!!!

In particular such behaviour may hide memory leak issues under ASAN CI.

I tried to prepare a patch for the issue and found the cause. The thing is gevent does not work properly together with subprocess and multiprocess packages. In particular in test-run on helper server error exit code we actually get 0 error code from subprocess.Popen.returncode in

while self.process.returncode is None:
self.process.poll()
if self.process.returncode is None:
gevent.sleep(0.1)

In more details we run main server:

def run_server(execs, cwd, server, logfile, retval, test_id):
os.putenv("LISTEN", server.listen_uri)
with open(logfile, 'ab') as f:
server.process = Popen(execs, stdout=sys.stdout, stderr=f, cwd=cwd)
sampler.register_process(server.process.pid, test_id, server.name)
test_timeout = Options().args.test_timeout
timer = Timer(test_timeout, timeout_handler, (server.process, test_timeout))
timer.start()
retval['returncode'] = server.process.wait()
timer.cancel()
server.process = None

with gevent.Popen. Unlike to subprocess.popen it tracks SIGCHILD signal and reaps ALL processes when receive it. In particular it reaps helper server process. So when we poll for it exit code we cannot find the process and subprocess decides to just report success.

See also #416

@Totktonada
Copy link
Member

There are some findings around this problem in #252.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants