Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test test_task_arena times out on Apple Silicon #952

Open
peteschroed opened this issue Nov 3, 2022 · 6 comments
Open

Test test_task_arena times out on Apple Silicon #952

peteschroed opened this issue Nov 3, 2022 · 6 comments

Comments

@peteschroed
Copy link

Encountered this issue while running tests on an M1 Macbook Pro against oneTBB 2021.7.

Build arm64:

cmake .. -DCMAKE_OSX_ARCHITECTURES=arm64
cmake –build .

Run test 100x:
ctest --repeat-until-fail 100 . -R test_task_arena --output-on-failure --timeout 60

Result:
Test times out after ~20 iterations.

ctest --repeat-until-fail 100 . -R test_task_arena --output-on-failure --timeout 60
Test project /Users/pete.schroeder/p4/oneTBB_2021_7/build
    Start 25: test_task_arena
    Test #25: test_task_arena ..................   Passed    2.26 sec
    Start 25: test_task_arena
    Test #25: test_task_arena ..................   Passed    1.80 sec
    Start 25: test_task_arena
    Test #25: test_task_arena ..................   Passed    1.89 sec
    Start 25: test_task_arena
    Test #25: test_task_arena ..................   Passed    1.77 sec
    Start 25: test_task_arena
    Test #25: test_task_arena ..................   Passed    1.80 sec
    Start 25: test_task_arena
    Test #25: test_task_arena ..................   Passed    1.78 sec
    Start 25: test_task_arena
    Test #25: test_task_arena ..................   Passed    1.75 sec
    Start 25: test_task_arena
    Test #25: test_task_arena ..................   Passed    1.78 sec
    Start 25: test_task_arena
    Test #25: test_task_arena ..................   Passed    1.77 sec
    Start 25: test_task_arena
    Test #25: test_task_arena ..................   Passed    1.79 sec
    Start 25: test_task_arena
    Test #25: test_task_arena ..................   Passed    1.78 sec
    Start 25: test_task_arena
    Test #25: test_task_arena ..................   Passed    1.83 sec
    Start 25: test_task_arena
    Test #25: test_task_arena ..................   Passed    1.78 sec
    Start 25: test_task_arena
    Test #25: test_task_arena ..................   Passed    1.79 sec
    Start 25: test_task_arena
    Test #25: test_task_arena ..................   Passed    1.76 sec
    Start 25: test_task_arena
    Test #25: test_task_arena ..................   Passed    1.78 sec
    Start 25: test_task_arena
    Test #25: test_task_arena ..................***Timeout  60.06 sec
TBB Warning: The number of workers is currently limited to 9. The request for 19 workers is ignored. Further requests for more workers will be silently ignored until the limit changes.



0% tests passed, 1 tests failed out of 1

Total Test time (real) =  89.20 sec

The following tests FAILED:
	 25 - test_task_arena (Timeout)
Errors while running CTest
@pavelkumbrasev
Copy link
Contributor

Could you please provide threads call stacks?

@phprus
Copy link
Contributor

phprus commented Nov 4, 2022

@pavelkumbrasev
Maybe this is another issue with utils::SpinBarrier?
(#756 (comment))

@pavelkumbrasev
Copy link
Contributor

Yes, I think the same.

@peteschroed
Copy link
Author

Here's the debug/thread info for test_task_arena hang:

pete.schroeder@petesch-ltmtwqf near1 % ps ax | grep test_task_arena
61715 s000  S+     0:00.06 ctest --repeat-until-fail 100 . -R test_task_arena --output-on-failure
61782 s000  S+     0:41.68 /Users/pete.schroeder/p4/oneTBB_2021_7_v2/build/appleclang_14.0_cxx11_64_relwithdebinfo/test_task_arena --force-colors=1
62412 s001  R+     0:00.00 grep test_task_arena
pete.schroeder@petesch-ltmtwqf near1 % lldb
(lldb) process attach -p 61782
Process 61782 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
    frame #0: 0x00000001923cc834 libsystem_kernel.dylib`__ulock_wait + 8
libsystem_kernel.dylib`:
->  0x1923cc834 <+8>:  b.lo   0x1923cc854               ; <+40>
    0x1923cc838 <+12>: pacibsp 
    0x1923cc83c <+16>: stp    x29, x30, [sp, #-0x10]!
    0x1923cc840 <+20>: mov    x29, sp
Target 0: (test_task_arena) stopped.
Executable module set to "/Users/pete.schroeder/p4/oneTBB_2021_7_v2/build/appleclang_14.0_cxx11_64_relwithdebinfo/test_task_arena".
Architecture set to: arm64e-apple-macosx-.
(lldb) tb
No breakpoints currently set.
(lldb) thread list
Process 61782 stopped
* thread #1: tid = 0x2194fbb, 0x00000001923cc834 libsystem_kernel.dylib`__ulock_wait + 8, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  thread #2: tid = 0x2194fbc, 0x00000001923ce06c libsystem_kernel.dylib`__semwait_signal + 8
  thread #3: tid = 0x2194fbd, 0x00000001923ce06c libsystem_kernel.dylib`__semwait_signal + 8
  thread #4: tid = 0x2194fbe, 0x00000001923ce06c libsystem_kernel.dylib`__semwait_signal + 8
  thread #5: tid = 0x2194fbf, 0x00000001923ce06c libsystem_kernel.dylib`__semwait_signal + 8
  thread #6: tid = 0x2194fc0, 0x00000001923ce06c libsystem_kernel.dylib`__semwait_signal + 8
  thread #7: tid = 0x2194fc1, 0x00000001923ce06c libsystem_kernel.dylib`__semwait_signal + 8
  thread #8: tid = 0x2194fc2, 0x00000001923ca8ec libsystem_kernel.dylib`semaphore_wait_trap + 8
  thread #9: tid = 0x2194fc3, 0x00000001923ce06c libsystem_kernel.dylib`__semwait_signal + 8
  thread #10: tid = 0x2194fc4, 0x00000001923ce06c libsystem_kernel.dylib`__semwait_signal + 8
  thread #11: tid = 0x21951e1, 0x00000001923ce06c libsystem_kernel.dylib`__semwait_signal + 8
(lldb) thread select 1
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
    frame #0: 0x00000001923cc834 libsystem_kernel.dylib`__ulock_wait + 8
libsystem_kernel.dylib`:
->  0x1923cc834 <+8>:  b.lo   0x1923cc854               ; <+40>
    0x1923cc838 <+12>: pacibsp 
    0x1923cc83c <+16>: stp    x29, x30, [sp, #-0x10]!
    0x1923cc840 <+20>: mov    x29, sp
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00000001923cc834 libsystem_kernel.dylib`__ulock_wait + 8
    frame #1: 0x000000019240a5a0 libsystem_pthread.dylib`_pthread_join + 444
    frame #2: 0x0000000192363a14 libc++.1.dylib`std::__1::thread::join() + 36
    frame #3: 0x000000010481ae94 test_task_arena`TestDefaultCreatedWorkersAmount() at utils.h:85:16 [opt]
    frame #4: 0x000000010481ae3c test_task_arena`TestDefaultCreatedWorkersAmount() at test_task_arena.cpp:1464:5 [opt]
    frame #5: 0x000000010481cbdc test_task_arena`DOCTEST_ANON_FUNC_226() [inlined] TestDefaultWorkersLimit() at test_task_arena.cpp:1500:5 [opt]
    frame #6: 0x000000010481cbd8 test_task_arena`DOCTEST_ANON_FUNC_226() at test_task_arena.cpp:1840:5 [opt]
    frame #7: 0x000000010480ccc0 test_task_arena`doctest::Context::run(this=<unavailable>) at doctest.h:6724:21 [opt]
    frame #8: 0x000000010480e890 test_task_arena`main(argc=<unavailable>, argv=<unavailable>) at doctest.h:6809:71 [opt]
    frame #9: 0x0000000104cd908c dyld`start + 520
(lldb) thread select 2
* thread #2
    frame #0: 0x00000001923ce06c libsystem_kernel.dylib`__semwait_signal + 8
libsystem_kernel.dylib`:
->  0x1923ce06c <+8>:  b.lo   0x1923ce08c               ; <+40>
    0x1923ce070 <+12>: pacibsp 
    0x1923ce074 <+16>: stp    x29, x30, [sp, #-0x10]!
    0x1923ce078 <+20>: mov    x29, sp
(lldb) bt
* thread #2
  * frame #0: 0x00000001923ce06c libsystem_kernel.dylib`__semwait_signal + 8
    frame #1: 0x00000001922d6fc8 libsystem_c.dylib`nanosleep + 220
    frame #2: 0x0000000192363af8 libc++.1.dylib`std::__1::this_thread::sleep_for(std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > const&) + 84
    frame #3: 0x0000000104820e74 test_task_arena`void utils::SpinWaitWhile<void utils::SpinWaitWhileCondition<unsigned long, void utils::SpinWaitWhileEq<unsigned long, unsigned long>(std::__1::atomic<unsigned long> const&, unsigned long)::'lambda'(unsigned long)>(std::__1::atomic<unsigned long> const&, unsigned long)::'lambda'()>(unsigned long) at thread:386:9 [opt]
    frame #4: 0x0000000104820e64 test_task_arena`void utils::SpinWaitWhile<void utils::SpinWaitWhileCondition<unsigned long, void utils::SpinWaitWhileEq<unsigned long, unsigned long>(std::__1::atomic<unsigned long> const&, unsigned long)::'lambda'(unsigned long)>(std::__1::atomic<unsigned long> const&, unsigned long)::'lambda'()>(pred=(unnamed class) @ 0x0000600001df4070) at spin_barrier.h:49:13 [opt]
    frame #5: 0x0000000104820d50 test_task_arena`bool utils::SpinBarrier::customWait<utils::WaitWhileEq, utils::SpinBarrier::DummyCallback>(utils::WaitWhileEq const&, utils::SpinBarrier::DummyCallback const&) [inlined] void utils::SpinWaitWhileCondition<unsigned long, void utils::SpinWaitWhileEq<unsigned long, unsigned long>(std::__1::atomic<unsigned long> const&, unsigned long)::'lambda'(unsigned long)>(location=<unavailable>, comp=(unnamed class) @ 0x000000016ba06cc8) at spin_barrier.h:60:5 [opt]
    frame #6: 0x0000000104820d48 test_task_arena`bool utils::SpinBarrier::customWait<utils::WaitWhileEq, utils::SpinBarrier::DummyCallback>(utils::WaitWhileEq const&, utils::SpinBarrier::DummyCallback const&) [inlined] void utils::SpinWaitWhileEq<unsigned long, unsigned long>(location=<unavailable>, value=<unavailable>) at spin_barrier.h:67:5 [opt]
    frame #7: 0x0000000104820d48 test_task_arena`bool utils::SpinBarrier::customWait<utils::WaitWhileEq, utils::SpinBarrier::DummyCallback>(utils::WaitWhileEq const&, utils::SpinBarrier::DummyCallback const&) [inlined] void utils::WaitWhileEq::operator(this=<unavailable>, location=<unavailable>, value=<unavailable>)<unsigned long, unsigned long>(std::__1::atomic<unsigned long> const&, unsigned long) const at spin_barrier.h:84:9 [opt]
    frame #8: 0x0000000104820d48 test_task_arena`bool utils::SpinBarrier::customWait<utils::WaitWhileEq, utils::SpinBarrier::DummyCallback>(this=0x000000016daf2e78, onWaitCallback=0x000000016ba06d38, onOpenBarrierCallback=<unavailable>) at spin_barrier.h:140:13 [opt]
    frame #9: 0x000000010484e25c test_task_arena`tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<TestDefaultCreatedWorkersAmount()::$_27::operator()(int) const::'lambda'(int), int>, tbb::detail::d1::simple_partitioner const>::execute(tbb::detail::d1::execution_data&) [inlined] bool utils::SpinBarrier::wait<utils::SpinBarrier::DummyCallback>(this=<unavailable>, onOpenBarrierCallback=0x000000016ba06d18) at spin_barrier.h:159:16 [opt]
    frame #10: 0x000000010484e250 test_task_arena`tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<TestDefaultCreatedWorkersAmount()::$_27::operator()(int) const::'lambda'(int), int>, tbb::detail::d1::simple_partitioner const>::execute(tbb::detail::d1::execution_data&) [inlined] utils::SpinBarrier::wait(this=<unavailable>) at spin_barrier.h:163:16 [opt]
    frame #11: 0x000000010484e250 test_task_arena`tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<TestDefaultCreatedWorkersAmount()::$_27::operator()(int) const::'lambda'(int), int>, tbb::detail::d1::simple_partitioner const>::execute(tbb::detail::d1::execution_data&) [inlined] TestDefaultCreatedWorkersAmount(this=0x000000016daf2ea0, (null)=<unavailable>)::$_27::operator()(int) const::'lambda'(int)::operator()(int) const at test_task_arena.cpp:1477:33 [opt]
    frame #12: 0x000000010484df08 test_task_arena`tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<TestDefaultCreatedWorkersAmount()::$_27::operator()(int) const::'lambda'(int), int>, tbb::detail::d1::simple_partitioner const>::execute(tbb::detail::d1::execution_data&) at parallel_for.h:208:13 [opt]
    frame #13: 0x000000010484dec8 test_task_arena`tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<TestDefaultCreatedWorkersAmount()::$_27::operator()(int) const::'lambda'(int), int>, tbb::detail::d1::simple_partitioner const>::execute(tbb::detail::d1::execution_data&) [inlined] tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<TestDefaultCreatedWorkersAmount()::$_27::operator()(int) const::'lambda'(int), int>, tbb::detail::d1::simple_partitioner const>::run_body(this=0x000000010538ba00, r=0x000000010538ba40) at parallel_for.h:119:9 [opt]
    frame #14: 0x000000010484dec8 test_task_arena`tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<TestDefaultCreatedWorkersAmount()::$_27::operator()(int) const::'lambda'(int), int>, tbb::detail::d1::simple_partitioner const>::execute(tbb::detail::d1::execution_data&) [inlined] void tbb::detail::d1::simple_partition_type::execute<tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<TestDefaultCreatedWorkersAmount()::$_27::operator()(int) const::'lambda'(int), int>, tbb::detail::d1::simple_partitioner const>, tbb::detail::d1::blocked_range<int> >(this=<unavailable>, start=0x000000010538ba00, range=0x000000010538ba40, ed=0x0000000105333808)::$_27::operator()(int) const::'lambda'(int), int>, tbb::detail::d1::simple_partitioner const>&, tbb::detail::d1::blocked_range<int>&, tbb::detail::d1::execution_data&) at partitioner.h:512:15 [opt]
    frame #15: 0x000000010484ddf0 test_task_arena`tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<TestDefaultCreatedWorkersAmount()::$_27::operator()(int) const::'lambda'(int), int>, tbb::detail::d1::simple_partitioner const>::execute(this=0x000000010538ba00, ed=0x0000000105333808) at parallel_for.h:172:18 [opt]
    frame #16: 0x0000000104a43594 libtbb.12.8.dylib`tbb::detail::r1::arena::process(tbb::detail::r1::thread_data&) + 1428
    frame #17: 0x0000000104a4efec libtbb.12.8.dylib`tbb::detail::r1::market::process(rml::job&) + 108
    frame #18: 0x0000000104a5146c libtbb.12.8.dylib`tbb::detail::r1::rml::private_worker::thread_routine(void*) + 228
    frame #19: 0x000000019240826c libsystem_pthread.dylib`_pthread_start + 148
(lldb) thread select 3
* thread #3
    frame #0: 0x00000001923ce06c libsystem_kernel.dylib`__semwait_signal + 8
libsystem_kernel.dylib`:
->  0x1923ce06c <+8>:  b.lo   0x1923ce08c               ; <+40>
    0x1923ce070 <+12>: pacibsp 
    0x1923ce074 <+16>: stp    x29, x30, [sp, #-0x10]!
    0x1923ce078 <+20>: mov    x29, sp
(lldb) bt
* thread #3
  * frame #0: 0x00000001923ce06c libsystem_kernel.dylib`__semwait_signal + 8
    frame #1: 0x00000001922d6fc8 libsystem_c.dylib`nanosleep + 220
    frame #2: 0x0000000192363af8 libc++.1.dylib`std::__1::this_thread::sleep_for(std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > const&) + 84
    frame #3: 0x0000000104820e74 test_task_arena`void utils::SpinWaitWhile<void utils::SpinWaitWhileCondition<unsigned long, void utils::SpinWaitWhileEq<unsigned long, unsigned long>(std::__1::atomic<unsigned long> const&, unsigned long)::'lambda'(unsigned long)>(std::__1::atomic<unsigned long> const&, unsigned long)::'lambda'()>(unsigned long) at thread:386:9 [opt]
    frame #4: 0x0000000104820e64 test_task_arena`void utils::SpinWaitWhile<void utils::SpinWaitWhileCondition<unsigned long, void utils::SpinWaitWhileEq<unsigned long, unsigned long>(std::__1::atomic<unsigned long> const&, unsigned long)::'lambda'(unsigned long)>(std::__1::atomic<unsigned long> const&, unsigned long)::'lambda'()>(pred=(unnamed class) @ 0x0000600001dc0150) at spin_barrier.h:49:13 [opt]
    frame #5: 0x0000000104820d50 test_task_arena`bool utils::SpinBarrier::customWait<utils::WaitWhileEq, utils::SpinBarrier::DummyCallback>(utils::WaitWhileEq const&, utils::SpinBarrier::DummyCallback const&) [inlined] void utils::SpinWaitWhileCondition<unsigned long, void utils::SpinWaitWhileEq<unsigned long, unsigned long>(std::__1::atomic<unsigned long> const&, unsigned long)::'lambda'(unsigned long)>(location=<unavailable>, comp=(unnamed class) @ 0x000000016be12cc8) at spin_barrier.h:60:5 [opt]
    frame #6: 0x0000000104820d48 test_task_arena`bool utils::SpinBarrier::customWait<utils::WaitWhileEq, utils::SpinBarrier::DummyCallback>(utils::WaitWhileEq const&, utils::SpinBarrier::DummyCallback const&) [inlined] void utils::SpinWaitWhileEq<unsigned long, unsigned long>(location=<unavailable>, value=<unavailable>) at spin_barrier.h:67:5 [opt]
    frame #7: 0x0000000104820d48 test_task_arena`bool utils::SpinBarrier::customWait<utils::WaitWhileEq, utils::SpinBarrier::DummyCallback>(utils::WaitWhileEq const&, utils::SpinBarrier::DummyCallback const&) [inlined] void utils::WaitWhileEq::operator(this=<unavailable>, location=<unavailable>, value=<unavailable>)<unsigned long, unsigned long>(std::__1::atomic<unsigned long> const&, unsigned long) const at spin_barrier.h:84:9 [opt]
    frame #8: 0x0000000104820d48 test_task_arena`bool utils::SpinBarrier::customWait<utils::WaitWhileEq, utils::SpinBarrier::DummyCallback>(this=0x000000016daf2e78, onWaitCallback=0x000000016be12d38, onOpenBarrierCallback=<unavailable>) at spin_barrier.h:140:13 [opt]
    frame #9: 0x000000010484e25c test_task_arena`tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<TestDefaultCreatedWorkersAmount()::$_27::operator()(int) const::'lambda'(int), int>, tbb::detail::d1::simple_partitioner const>::execute(tbb::detail::d1::execution_data&) [inlined] bool utils::SpinBarrier::wait<utils::SpinBarrier::DummyCallback>(this=<unavailable>, onOpenBarrierCallback=0x000000016be12d18) at spin_barrier.h:159:16 [opt]
    frame #10: 0x000000010484e250 test_task_arena`tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<TestDefaultCreatedWorkersAmount()::$_27::operator()(int) const::'lambda'(int), int>, tbb::detail::d1::simple_partitioner const>::execute(tbb::detail::d1::execution_data&) [inlined] utils::SpinBarrier::wait(this=<unavailable>) at spin_barrier.h:163:16 [opt]
    frame #11: 0x000000010484e250 test_task_arena`tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<TestDefaultCreatedWorkersAmount()::$_27::operator()(int) const::'lambda'(int), int>, tbb::detail::d1::simple_partitioner const>::execute(tbb::detail::d1::execution_data&) [inlined] TestDefaultCreatedWorkersAmount(this=0x000000016daf2ea0, (null)=<unavailable>)::$_27::operator()(int) const::'lambda'(int)::operator()(int) const at test_task_arena.cpp:1477:33 [opt]
    frame #12: 0x000000010484df08 test_task_arena`tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<TestDefaultCreatedWorkersAmount()::$_27::operator()(int) const::'lambda'(int), int>, tbb::detail::d1::simple_partitioner const>::execute(tbb::detail::d1::execution_data&) at parallel_for.h:208:13 [opt]
    frame #13: 0x000000010484dec8 test_task_arena`tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<TestDefaultCreatedWorkersAmount()::$_27::operator()(int) const::'lambda'(int), int>, tbb::detail::d1::simple_partitioner const>::execute(tbb::detail::d1::execution_data&) [inlined] tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<TestDefaultCreatedWorkersAmount()::$_27::operator()(int) const::'lambda'(int), int>, tbb::detail::d1::simple_partitioner const>::run_body(this=0x0000000104f32000, r=0x0000000104f32040) at parallel_for.h:119:9 [opt]
    frame #14: 0x000000010484dec8 test_task_arena`tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<TestDefaultCreatedWorkersAmount()::$_27::operator()(int) const::'lambda'(int), int>, tbb::detail::d1::simple_partitioner const>::execute(tbb::detail::d1::execution_data&) [inlined] void tbb::detail::d1::simple_partition_type::execute<tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<TestDefaultCreatedWorkersAmount()::$_27::operator()(int) const::'lambda'(int), int>, tbb::detail::d1::simple_partitioner const>, tbb::detail::d1::blocked_range<int> >(this=<unavailable>, start=0x0000000104f32000, range=0x0000000104f32040, ed=0x0000000105333788)::$_27::operator()(int) const::'lambda'(int), int>, tbb::detail::d1::simple_partitioner const>&, tbb::detail::d1::blocked_range<int>&, tbb::detail::d1::execution_data&) at partitioner.h:512:15 [opt]
    frame #15: 0x000000010484ddf0 test_task_arena`tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<TestDefaultCreatedWorkersAmount()::$_27::operator()(int) const::'lambda'(int), int>, tbb::detail::d1::simple_partitioner const>::execute(this=0x0000000104f32000, ed=0x0000000105333788) at parallel_for.h:172:18 [opt]
    frame #16: 0x0000000104a43594 libtbb.12.8.dylib`tbb::detail::r1::arena::process(tbb::detail::r1::thread_data&) + 1428
    frame #17: 0x0000000104a4efb4 libtbb.12.8.dylib`tbb::detail::r1::market::process(rml::job&) + 52
    frame #18: 0x0000000104a5146c libtbb.12.8.dylib`tbb::detail::r1::rml::private_worker::thread_routine(void*) + 228
    frame #19: 0x000000019240826c libsystem_pthread.dylib`_pthread_start + 148
(lldb) thread select 4
* thread #4
    frame #0: 0x00000001923ce06c libsystem_kernel.dylib`__semwait_signal + 8
libsystem_kernel.dylib`:
->  0x1923ce06c <+8>:  b.lo   0x1923ce08c               ; <+40>
    0x1923ce070 <+12>: pacibsp 
    0x1923ce074 <+16>: stp    x29, x30, [sp, #-0x10]!
    0x1923ce078 <+20>: mov    x29, sp
(lldb) bt
* thread #4
  * frame #0: 0x00000001923ce06c libsystem_kernel.dylib`__semwait_signal + 8
    frame #1: 0x00000001922d6fc8 libsystem_c.dylib`nanosleep + 220
    frame #2: 0x0000000192363af8 libc++.1.dylib`std::__1::this_thread::sleep_for(std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > const&) + 84
    frame #3: 0x0000000104820e74 test_task_arena`void utils::SpinWaitWhile<void utils::SpinWaitWhileCondition<unsigned long, void utils::SpinWaitWhileEq<unsigned long, unsigned long>(std::__1::atomic<unsigned long> const&, unsigned long)::'lambda'(unsigned long)>(std::__1::atomic<unsigned long> const&, unsigned long)::'lambda'()>(unsigned long) at thread:386:9 [opt]
    frame #4: 0x0000000104820e64 test_task_arena`void utils::SpinWaitWhile<void utils::SpinWaitWhileCondition<unsigned long, void utils::SpinWaitWhileEq<unsigned long, unsigned long>(std::__1::atomic<unsigned long> const&, unsigned long)::'lambda'(unsigned long)>(std::__1::atomic<unsigned long> const&, unsigned long)::'lambda'()>(pred=(unnamed class) @ 0x0000600001dc0170) at spin_barrier.h:49:13 [opt]
    frame #5: 0x0000000104820d50 test_task_arena`bool utils::SpinBarrier::customWait<utils::WaitWhileEq, utils::SpinBarrier::DummyCallback>(utils::WaitWhileEq const&, utils::SpinBarrier::DummyCallback const&) [inlined] void utils::SpinWaitWhileCondition<unsigned long, void utils::SpinWaitWhileEq<unsigned long, unsigned long>(std::__1::atomic<unsigned long> const&, unsigned long)::'lambda'(unsigned long)>(location=<unavailable>, comp=(unnamed class) @ 0x000000016c21ecc8) at spin_barrier.h:60:5 [opt]
    frame #6: 0x0000000104820d48 test_task_arena`bool utils::SpinBarrier::customWait<utils::WaitWhileEq, utils::SpinBarrier::DummyCallback>(utils::WaitWhileEq const&, utils::SpinBarrier::DummyCallback const&) [inlined] void utils::SpinWaitWhileEq<unsigned long, unsigned long>(location=<unavailable>, value=<unavailable>) at spin_barrier.h:67:5 [opt]
    frame #7: 0x0000000104820d48 test_task_arena`bool utils::SpinBarrier::customWait<utils::WaitWhileEq, utils::SpinBarrier::DummyCallback>(utils::WaitWhileEq const&, utils::SpinBarrier::DummyCallback const&) [inlined] void utils::WaitWhileEq::operator(this=<unavailable>, location=<unavailable>, value=<unavailable>)<unsigned long, unsigned long>(std::__1::atomic<unsigned long> const&, unsigned long) const at spin_barrier.h:84:9 [opt]
    frame #8: 0x0000000104820d48 test_task_arena`bool utils::SpinBarrier::customWait<utils::WaitWhileEq, utils::SpinBarrier::DummyCallback>(this=0x000000016daf2e78, onWaitCallback=0x000000016c21ed38, onOpenBarrierCallback=<unavailable>) at spin_barrier.h:140:13 [opt]
    frame #9: 0x000000010484e25c test_task_arena`tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<TestDefaultCreatedWorkersAmount()::$_27::operator()(int) const::'lambda'(int), int>, tbb::detail::d1::simple_partitioner const>::execute(tbb::detail::d1::execution_data&) [inlined] bool utils::SpinBarrier::wait<utils::SpinBarrier::DummyCallback>(this=<unavailable>, onOpenBarrierCallback=0x000000016c21ed18) at spin_barrier.h:159:16 [opt]
    frame #10: 0x000000010484e250 test_task_arena`tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<TestDefaultCreatedWorkersAmount()::$_27::operator()(int) const::'lambda'(int), int>, tbb::detail::d1::simple_partitioner const>::execute(tbb::detail::d1::execution_data&) [inlined] utils::SpinBarrier::wait(this=<unavailable>) at spin_barrier.h:163:16 [opt]
    frame #11: 0x000000010484e250 test_task_arena`tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<TestDefaultCreatedWorkersAmount()::$_27::operator()(int) const::'lambda'(int), int>, tbb::detail::d1::simple_partitioner const>::execute(tbb::detail::d1::execution_data&) [inlined] TestDefaultCreatedWorkersAmount(this=0x000000016daf2ea0, (null)=<unavailable>)::$_27::operator()(int) const::'lambda'(int)::operator()(int) const at test_task_arena.cpp:1477:33 [opt]
    frame #12: 0x000000010484df08 test_task_arena`tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<TestDefaultCreatedWorkersAmount()::$_27::operator()(int) const::'lambda'(int), int>, tbb::detail::d1::simple_partitioner const>::execute(tbb::detail::d1::execution_data&) at parallel_for.h:208:13 [opt]
    frame #13: 0x000000010484dec8 test_task_arena`tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<TestDefaultCreatedWorkersAmount()::$_27::operator()(int) const::'lambda'(int), int>, tbb::detail::d1::simple_partitioner const>::execute(tbb::detail::d1::execution_data&) [inlined] tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<TestDefaultCreatedWorkersAmount()::$_27::operator()(int) const::'lambda'(int), int>, tbb::detail::d1::simple_partitioner const>::run_body(this=0x0000000104f1a900, r=0x0000000104f1a940) at parallel_for.h:119:9 [opt]
    frame #14: 0x000000010484dec8 test_task_arena`tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<TestDefaultCreatedWorkersAmount()::$_27::operator()(int) const::'lambda'(int), int>, tbb::detail::d1::simple_partitioner const>::execute(tbb::detail::d1::execution_data&) [inlined] void tbb::detail::d1::simple_partition_type::execute<tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<TestDefaultCreatedWorkersAmount()::$_27::operator()(int) const::'lambda'(int), int>, tbb::detail::d1::simple_partitioner const>, tbb::detail::d1::blocked_range<int> >(this=<unavailable>, start=0x0000000104f1a900, range=0x0000000104f1a940, ed=0x0000000105333988)::$_27::operator()(int) const::'lambda'(int), int>, tbb::detail::d1::simple_partitioner const>&, tbb::detail::d1::blocked_range<int>&, tbb::detail::d1::execution_data&) at partitioner.h:512:15 [opt]
    frame #15: 0x000000010484ddf0 test_task_arena`tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<TestDefaultCreatedWorkersAmount()::$_27::operator()(int) const::'lambda'(int), int>, tbb::detail::d1::simple_partitioner const>::execute(this=0x0000000104f1a900, ed=0x0000000105333988) at parallel_for.h:172:18 [opt]
    frame #16: 0x0000000104a43594 libtbb.12.8.dylib`tbb::detail::r1::arena::process(tbb::detail::r1::thread_data&) + 1428
    frame #17: 0x0000000104a4efb4 libtbb.12.8.dylib`tbb::detail::r1::market::process(rml::job&) + 52
    frame #18: 0x0000000104a5146c libtbb.12.8.dylib`tbb::detail::r1::rml::private_worker::thread_routine(void*) + 228
    frame #19: 0x000000019240826c libsystem_pthread.dylib`_pthread_start + 148
(lldb) thread select 8
* thread #8
    frame #0: 0x00000001923ca8ec libsystem_kernel.dylib`semaphore_wait_trap + 8
libsystem_kernel.dylib`semaphore_wait_trap:
->  0x1923ca8ec <+8>: ret    

libsystem_kernel.dylib`semaphore_wait_signal_trap:
    0x1923ca8f0 <+0>: mov    x16, #-0x25
    0x1923ca8f4 <+4>: svc    #0x80
    0x1923ca8f8 <+8>: ret    
(lldb) bt
* thread #8
  * frame #0: 0x00000001923ca8ec libsystem_kernel.dylib`semaphore_wait_trap + 8
    frame #1: 0x0000000104a514e0 libtbb.12.8.dylib`tbb::detail::r1::rml::private_worker::thread_routine(void*) + 344
    frame #2: 0x000000019240826c libsystem_pthread.dylib`_pthread_start + 148
(lldb) 

@phprus
Copy link
Contributor

phprus commented Dec 4, 2022

@pavelkumbrasev
I observed a similar "hang" of the test_task_arena test on Windows 7 SP1 x86_64 (2 virtual CPU core) with high CPU oversubscription (2 virtual CPU to 1 physical CPU core, on baremetal does not reproduced).

On average, the test was completed in 6-7 seconds. But some launches worked for 60 seconds or longer.

Commit: 274f68e
Source: https://github.com/oneapi-src/oneTBB/archive/274f68e529a75c3512a30dc72c963ac9b72dfa8c.zip

Build (with LTO):

cmake -DCMAKE_VERBOSE_MAKEFILE=ON -G "Visual Studio 16 2019" -A x64   -T v142,version=14.29.30133 -DCMAKE_CXX_STANDARD=20 -DCMAKE_CXX_FLAGS_INIT="/D_WIN32_WINNT=0x0601" -DCMAKE_INTERPROCEDURAL_OPTIMIZATION=ON ../..

cmake --build . --config RelWithDebInfo -v -j

Test:

ctest --build-config RelWithDebInfo --timeout 60 --output-on-failure -R test_task_arena --repeat-until-fail 200

Test project F:/tmp/tbb/oneTBB-274f68e529a75c3512a30dc72c963ac9b72dfa8c/build/vc2019-x64-lto
    Start 26: test_task_arena
    Test #26: test_task_arena ..................***Timeout  64.42 sec
TBB Warning: The number of workers is currently limited to 1. The request for 3 workers is ignored. Further requests for
 more workers will be silently ignored until the limit changes.



0% tests passed, 1 tests failed out of 1

Total Test time (real) =  64.45 sec

The following tests FAILED:
         26 - test_task_arena (Timeout)
Errors while running CTest

Based on the stack trace data, the problem is inside the TestArenaConcurrency test.

Unlike ARM, the test on x86 completes, but it takes 10-100 times longer:

...
    Start 26: test_task_arena
    Test #26: test_task_arena ..................   Passed  230.09 sec
    Start 26: test_task_arena
    Test #26: test_task_arena ..................   Passed   25.56 sec
    Start 26: test_task_arena
    Test #26: test_task_arena ..................   Passed  777.18 sec
    Start 26: test_task_arena
    Test #26: test_task_arena ..................   Passed    5.34 sec
    Start 26: test_task_arena
    Test #26: test_task_arena ..................   Passed    4.50 sec
    Start 26: test_task_arena
    Test #26: test_task_arena ..................   Passed    4.57 sec
    Start 26: test_task_arena
    Test #26: test_task_arena ..................   Passed    5.10 sec
...

Minidump with heap and pdb:
274f68e529a75c3512a30dc72c963ac9b72dfa8c.zip

@phprus
Copy link
Contributor

phprus commented Dec 4, 2022

LTO does not affect test behavior.

Without LTO:

F:\tmp\tbb\oneTBB-274f68e529a75c3512a30dc72c963ac9b72dfa8c\build\vc2019-x64>ctest --build-config RelWithDebInfo --timeou
t 60 --output-on-failure -R test_task_arena --repeat-until-fail 200
Test project F:/tmp/tbb/oneTBB-274f68e529a75c3512a30dc72c963ac9b72dfa8c/build/vc2019-x64
    Start 26: test_task_arena
    Test #26: test_task_arena ..................   Passed    7.08 sec
    Start 26: test_task_arena
    Test #26: test_task_arena ..................   Passed    6.81 sec
    Start 26: test_task_arena
    Test #26: test_task_arena ..................   Passed    6.95 sec
    Start 26: test_task_arena
    Test #26: test_task_arena ..................   Passed   26.20 sec
    Start 26: test_task_arena
    Test #26: test_task_arena ..................   Passed    6.83 sec
    Start 26: test_task_arena
    Test #26: test_task_arena ..................   Passed    6.65 sec
    Start 26: test_task_arena
    Test #26: test_task_arena ..................   Passed    7.04 sec
    Start 26: test_task_arena
    Test #26: test_task_arena ..................   Passed    6.51 sec
    Start 26: test_task_arena
    Test #26: test_task_arena ..................   Passed    6.65 sec
    Start 26: test_task_arena
    Test #26: test_task_arena ..................***Timeout  61.37 sec
TBB Warning: The number of workers is currently limited to 1. The request for 3 workers is ignored. Further requests for
 more workers will be silently ignored until the limit changes.



0% tests passed, 1 tests failed out of 1

Total Test time (real) = 142.14 sec

The following tests FAILED:
         26 - test_task_arena (Timeout)
Errors while running CTest

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants