Enable testing of new scheduler in CI #6428

hkaiser · 2024-01-31T16:49:38Z

No description provided.

StellarBot · 2024-01-31T16:58:50Z

Performance test report

HPX Performance

Comparison

BENCHMARK	FORK_JOIN_EXECUTOR	PARALLEL_EXECUTOR	SCHEDULER_EXECUTOR
For Each	(=)	??	-

Info

Property	Before	After
HPX Datetime	2023-05-10T12:07:53+00:00	2024-01-31T16:49:40+00:00
HPX Commit	`dcb5415`	`b7a3134`
Datetime	2023-05-10T14:50:18.616050-05:00	2024-01-31T10:55:40.712102-06:00
Envfile
Hostname	medusa08.rostam.cct.lsu.edu	medusa08.rostam.cct.lsu.edu
Compiler	/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1	/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1
Clustername	rostam	rostam

Comparison

BENCHMARK	NO-EXECUTOR
Future Overhead - Create Thread Hierarchical - Latch	(=)

Info

Property	Before	After
HPX Datetime	2023-05-10T12:07:53+00:00	2024-01-31T16:49:40+00:00
HPX Commit	`dcb5415`	`b7a3134`
Datetime	2023-05-10T14:52:35.047119-05:00	2024-01-31T10:57:55.614863-06:00
Envfile
Hostname	medusa08.rostam.cct.lsu.edu	medusa08.rostam.cct.lsu.edu
Compiler	/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1	/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1
Clustername	rostam	rostam

Comparison

BENCHMARK	FORK_JOIN_EXECUTOR_DEFAULT_FORK_JOIN_POLICY_ALLOCATOR	PARALLEL_EXECUTOR_DEFAULT_PARALLEL_POLICY_ALLOCATOR	SCHEDULER_EXECUTOR_DEFAULT_SCHEDULER_EXECUTOR_ALLOCATOR
Stream Benchmark - Add	(=)	(=)	(=)
Stream Benchmark - Scale	+	=	=
Stream Benchmark - Triad	(=)	(=)	(=)
Stream Benchmark - Copy	(=)	(=)	(=)

Info

Property	Before	After
HPX Datetime	2023-05-10T12:07:53+00:00	2024-01-31T16:49:40+00:00
HPX Commit	`dcb5415`	`b7a3134`
Datetime	2023-05-10T14:52:52.237641-05:00	2024-01-31T10:58:12.811457-06:00
Envfile
Hostname	medusa08.rostam.cct.lsu.edu	medusa08.rostam.cct.lsu.edu
Compiler	/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1	/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1
Clustername	rostam	rostam

Explanation of Symbols

Symbol	MEANING
=	No performance change (confidence interval within ±1%)
(=)	Probably no performance change (confidence interval within ±2%)
(+)/(-)	Very small performance improvement/degradation (≤1%)
+/-	Small performance improvement/degradation (≤5%)
++/--	Large performance improvement/degradation (≤10%)
+++/---	Very large performance improvement/degradation (>10%)
?	Probably no change, but quite large uncertainty (confidence interval within ±5%)
??	Unclear result, very large uncertainty (±10%)
???	Something unexpected…

codacy-production · 2024-01-31T21:29:33Z

Coverage summary from Codacy

Coverage variation	Diff coverage
✅ -0.09%	✅ ∅

Coverage variation details

	Coverable lines	Covered lines	Coverage
Common ancestor commit (`9c53b99`)	206733	176197	85.23%
Head commit (`78626f0`)	190772 (-15961)	162415 (-13782)	85.14% (-0.09%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>
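As a quick check of the figures in the table above, the percentages can be recomputed from the line counts. This is only a sketch: the helper names are made up for illustration, and rounding each percentage to two decimals before subtracting is an assumption about how the report is produced.

```cpp
#include <cassert>
#include <cmath>

// Coverage in percent: covered lines divided by coverable lines.
inline double coverage_percent(long covered, long coverable)
{
    assert(coverable > 0);
    return 100.0 * static_cast<double>(covered) /
        static_cast<double>(coverable);
}

// Round to two decimals, matching the precision shown in the report.
inline double round2(double x)
{
    return std::round(x * 100.0) / 100.0;
}

// <coverage of head commit> - <coverage of common ancestor commit>
// (assumption: both values are rounded before the subtraction)
inline double coverage_variation(
    long head_covered, long head_coverable,
    long base_covered, long base_coverable)
{
    return round2(coverage_percent(head_covered, head_coverable)) -
        round2(coverage_percent(base_covered, base_coverable));
}
```

With the numbers above, `coverage_percent(176197, 206733)` rounds to 85.23, `coverage_percent(162415, 190772)` rounds to 85.14, and the variation is -0.09.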

Diff coverage details

	Coverable lines	Covered lines	Diff coverage
Pull request (#6428)	0	0	∅ (not applicable)

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

hkaiser · 2024-02-05T16:05:50Z

@Pansysk75 it looks like the new scheduler doesn't work properly yet (as you suggested). The clang-12 and gcc-10 errors are most likely related.

Pansysk75 · 2024-02-07T18:47:24Z

retest

Pansysk75 · 2024-02-16T19:02:29Z

@hkaiser FYI I'm working on this one. I've reproduced some failing tests (stop_token_cb1 being one of them). So far, I see that some pending hpx thread is never getting scheduled (either starvation or some bug). I'll let you know what I find out.

Pansysk75 · 2024-02-17T12:07:26Z

@hkaiser It seems like the moodycamel queue provides FIFO guarantees only among items placed by the same thread. There is no guarantee that items queued by different threads will be dequeued in FIFO order.

From the moodycamel blog:

All elements from a given producer thread will necessarily still be seen in that same order relative to each other when dequeued (since the sub-queue preserves that order), albeit with elements from other sub-queues possibly interleaved.

And from the discussion in the comments:

gasche: ... If thread A enqueues 1, then does a write observed by thread B (synchronized through the appropriate memory fence), then B enqueues 2, then I would expect 1 to come before 2 -- and this is not racy. It seems to me that your data-structure is thus providing a weaker specification than linearizable queues. ...

Cameron: Yes, you're right. If there's external synchronization, then the final order is not related to the total order formed by the happens-before relationships of the enqueue operations themselves. ...
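To make the ordering guarantee concrete, here is a small single-threaded toy model. It is not the moodycamel implementation: the class `per_producer_queue` and the fixed scan order are assumptions made purely for illustration. It only shows how one FIFO sub-queue per producer keeps per-producer order while losing the order between producers.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>
#include <vector>

// Toy model (hypothetical, not moodycamel): one FIFO sub-queue per producer.
class per_producer_queue
{
public:
    explicit per_producer_queue(std::size_t num_producers)
      : sub_queues_(num_producers)
    {
    }

    void enqueue(std::size_t producer, int value)
    {
        assert(producer < sub_queues_.size());
        sub_queues_[producer].push_back(value);
    }

    // Scan the sub-queues in index order, pop from the first non-empty one.
    std::optional<int> try_dequeue()
    {
        for (auto& q : sub_queues_)
        {
            if (!q.empty())
            {
                int v = q.front();
                q.pop_front();
                return v;
            }
        }
        return std::nullopt;
    }

private:
    std::vector<std::deque<int>> sub_queues_;
};

// Producer 1 enqueues 1, then producer 0 enqueues 2, then producer 1
// enqueues 3. The enqueue operations are strictly ordered in time.
inline std::vector<int> demo_cross_producer_order()
{
    per_producer_queue q(2);
    q.enqueue(1, 1);
    q.enqueue(0, 2);
    q.enqueue(1, 3);

    std::vector<int> out;
    while (auto v = q.try_dequeue())
        out.push_back(*v);
    return out;    // 2, 1, 3: per-producer order kept, global order lost
}
```

Here 1 still comes before 3 (same producer), but 2 overtakes 1 even though it was enqueued later.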

This currently causes trouble (deadlocks), as we rely on the FIFO ordering to avoid starvation:

hpx/libs/core/thread_pools/include/hpx/thread_pools/scheduling_loop.hpp

Lines 325 to 332 in bd9e3cb

    
        // schedule this thread again, make sure it ends up at
        // the end of the queue
        scheduler.SchedulingPolicy::schedule_thread_last(
            HPX_MOVE(thrd),
            threads::thread_schedule_hint(
                static_cast<std::int16_t>(num_thread)),
            true);
        scheduler.SchedulingPolicy::do_some_work(num_thread);

In the failure case that I managed to reproduce, a perpetually yielding thread starves out a shared-resource-holding thread when these two happen to be in the same moodycamel thread-queue. We cannot reliably insert the yielding thread at the "back" of the queue (no FIFO guarantee), and under certain circumstances the queue behaves like a LIFO. The shared-resource-holding thread is then never picked up by the scheduler, so it never releases the held resource.
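The starvation can be illustrated with a toy run queue. This is a sketch only: `run_toy_scheduler` and `task_kind` are hypothetical names, and the model ignores work stealing and multiple worker threads. A yielding task is put back either at the back of the queue (FIFO) or at the front (the LIFO-like behaviour described above).

```cpp
#include <cassert>
#include <deque>

enum class task_kind { yielder, holder };

// Returns the number of scheduling steps until both tasks have finished,
// or -1 if max_steps was reached first (the yielder starved the holder).
inline int run_toy_scheduler(bool requeue_at_back, int max_steps)
{
    assert(max_steps > 0);
    bool resource_released = false;
    std::deque<task_kind> queue{task_kind::yielder, task_kind::holder};

    for (int step = 1; step <= max_steps; ++step)
    {
        task_kind t = queue.front();
        queue.pop_front();

        if (t == task_kind::holder)
        {
            // the holder runs once, releases the resource and finishes
            resource_released = true;
        }
        else if (!resource_released)
        {
            // the yielder is still pending: schedule it again
            if (requeue_at_back)
                queue.push_back(t);     // FIFO: the holder gets its turn
            else
                queue.push_front(t);    // LIFO-like: the yielder runs again
        }
        // else: the yielder observed the release and finishes

        if (queue.empty())
            return step;
    }
    return -1;
}
```

In this model `run_toy_scheduler(true, 1000)` finishes after 3 steps, while `run_toy_scheduler(false, 1000)` never lets the holder run and returns -1.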

hkaiser · 2024-02-17T14:36:35Z

Excellent detective work @Pansysk75! Is that also true if we use the consumer/producer token API the queue provides?

Pansysk75 · 2024-02-17T15:12:07Z

Thanks! Here's what the implementer has to say about this (TLDR it's not possible): https://www.github.com/cameron314/concurrentqueue/issues/6

hkaiser · 2024-02-18T17:13:19Z

Would this patch mitigate the issue?

diff --git a/libs/core/thread_pools/include/hpx/thread_pools/scheduling_loop.hpp b/libs/core/thread_pools/include/hpx/thread_pools/scheduling_loop.hpp
index 1ca53941ee4..d2cc74fae4e 100644
--- a/libs/core/thread_pools/include/hpx/thread_pools/scheduling_loop.hpp
+++ b/libs/core/thread_pools/include/hpx/thread_pools/scheduling_loop.hpp
@@ -315,13 +315,6 @@ namespace hpx::threads::detail {
                     if (HPX_UNLIKELY(
                             state_val == thread_schedule_state::pending))
                     {
-                        if (HPX_LIKELY(next_thrd == nullptr))
-                        {
-                            // schedule other work
-                            scheduler.wait_or_add_new(num_thread, running,
-                                idle_loop_count, enable_stealing_staged, added);
-                        }
-
                         // schedule this thread again, make sure it ends up at
                         // the end of the queue
                         scheduler.SchedulingPolicy::schedule_thread_last(
@@ -329,6 +322,14 @@ namespace hpx::threads::detail {
                             threads::thread_schedule_hint(
                                 static_cast<std::int16_t>(num_thread)),
                             true);
+
+                        if (HPX_LIKELY(next_thrd == nullptr))
+                        {
+                            // schedule other work
+                            scheduler.wait_or_add_new(num_thread, running,
+                                idle_loop_count, enable_stealing_staged, added);
+                        }
+
                         scheduler.SchedulingPolicy::do_some_work(num_thread);
                     }
                     else if (HPX_UNLIKELY(state_val ==

I think the problem is that currently, after the pending thread has been put back onto the (top of the) queue, the scheduler immediately pulls it back and retries without giving potentially waiting threads a chance of being work-requested. With other schedulers those threads get stolen eventually, which avoids the live-lock.

Pansysk75 · 2024-02-19T12:08:56Z

@hkaiser Doesn't seem to help much... Keep in mind that this issue doesn't show up on the workrequesting scheduler when using the default queue, so it's either the moodycamel queue or the stealhalf functionality that is causing the issue. I also tried setting this to false

hpx/libs/core/schedulers/include/hpx/schedulers/lockfree_queue_backends.hpp

Line 119 in bd9e3cb

static constexpr bool support_bulk_dequeue = true;

to see whether bulk dequeueing was the culprit, but it didn't fix things.

Also, I think this fix is needed (or something equivalent):

                 task_data thrds(d.num_thread_);
-                thrds.tasks_.reserve(max_num_to_steal);

 #ifdef HPX_HAVE_THREAD_STEALING_COUNTS
+                thrds.tasks_.reserve(max_num_to_steal);
                 thread_id_ref_type thrd;
                 while (max_num_to_steal-- != 0 &&
                     d.queue_->get_next_thread(thrd, false, true))
                 {
                     d.queue_->increment_num_stolen_from_pending();
                     thrds.tasks_.push_back(HPX_MOVE(thrd));
                     thrd = thread_id_ref_type{};
                 }
 #else
+                thrds.tasks_.resize(max_num_to_steal);
                 d.queue_->get_next_threads(
                     thrds.tasks_.begin(), thrds.tasks_.size(), false, true);
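For context on why the `#else` branch needs `resize` rather than `reserve`: `std::vector::reserve` only changes the capacity, so `size()` stays 0 and a bulk call that is handed `begin()` and `size()` asks for zero items. A minimal sketch of that effect follows; `fill_bulk` is a hypothetical stand-in for a bulk dequeue, not the HPX `get_next_threads` function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a bulk dequeue: writes 'count' items through
// 'dest' and returns how many items were written.
template <typename Iterator>
std::size_t fill_bulk(Iterator dest, std::size_t count)
{
    for (std::size_t i = 0; i != count; ++i)
        *dest++ = static_cast<int>(i);
    return count;
}

inline std::size_t fetched_after_reserve(std::size_t n)
{
    std::vector<int> tasks;
    tasks.reserve(n);    // capacity grows, size() is still 0
    assert(tasks.capacity() >= n);
    return fill_bulk(tasks.begin(), tasks.size());    // requests 0 items
}

inline std::size_t fetched_after_resize(std::size_t n)
{
    std::vector<int> tasks;
    tasks.resize(n);     // size() == n, elements are value-initialized
    return fill_bulk(tasks.begin(), tasks.size());    // requests n items
}
```

In this sketch `fetched_after_reserve(8)` returns 0 and `fetched_after_resize(8)` returns 8. The `push_back` loop in the `#ifdef` branch is different: it grows the vector itself, so `reserve` is the right call there.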

hkaiser · 2024-02-19T15:56:45Z

Uhh, how did it ever work? ;-)

hkaiser · 2024-02-19T16:23:12Z

That's fixed now. Thanks again!

codacy-production · 2024-02-19T21:50:34Z

Coverage summary from Codacy

Coverage variation	Diff coverage
✅ +0.04%	✅ 0.00%

Coverage variation details

	Coverable lines	Covered lines	Coverage
Common ancestor commit (`e977ecc`)	217975	185525	85.11%
Head commit (`2e610e7`)	190892 (-27083)	162547 (-22978)	85.15% (+0.04%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details

	Coverable lines	Covered lines	Diff coverage
Pull request (#6428)	1	0	0.00%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

StellarBot · 2024-04-28T16:42:36Z

Performance test report

HPX Performance

Comparison

BENCHMARK	FORK_JOIN_EXECUTOR	PARALLEL_EXECUTOR	SCHEDULER_EXECUTOR
For Each	(=)	-	-

Info

Property	Before	After
HPX Commit	`d27ac2e`	`0e8959a`
HPX Datetime	2024-03-18T14:00:30+00:00	2024-04-28T16:28:37+00:00
Hostname	medusa08.rostam.cct.lsu.edu	medusa08.rostam.cct.lsu.edu
Compiler	/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1	/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1
Envfile
Clustername	rostam	rostam
Datetime	2024-03-18T09:18:04.949759-05:00	2024-04-28T11:40:07.559807-05:00

Comparison

BENCHMARK	NO-EXECUTOR
Future Overhead - Create Thread Hierarchical - Latch	(=)

Info

Property	Before	After
HPX Commit	`d27ac2e`	`0e8959a`
HPX Datetime	2024-03-18T14:00:30+00:00	2024-04-28T16:28:37+00:00
Hostname	medusa08.rostam.cct.lsu.edu	medusa08.rostam.cct.lsu.edu
Compiler	/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1	/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1
Envfile
Clustername	rostam	rostam
Datetime	2024-03-18T09:19:53.062988-05:00	2024-04-28T11:41:54.962060-05:00

Comparison

BENCHMARK	FORK_JOIN_EXECUTOR_DEFAULT_FORK_JOIN_POLICY_ALLOCATOR	PARALLEL_EXECUTOR_DEFAULT_PARALLEL_POLICY_ALLOCATOR	SCHEDULER_EXECUTOR_DEFAULT_SCHEDULER_EXECUTOR_ALLOCATOR
Stream Benchmark - Add	(=)	(=)	(=)
Stream Benchmark - Scale	(=)	(=)	(=)
Stream Benchmark - Triad	=	(=)	(=)
Stream Benchmark - Copy	(=)	(=)	(=)

Info

Property	Before	After
HPX Commit	`d27ac2e`	`0e8959a`
HPX Datetime	2024-03-18T14:00:30+00:00	2024-04-28T16:28:37+00:00
Hostname	medusa08.rostam.cct.lsu.edu	medusa08.rostam.cct.lsu.edu
Compiler	/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1	/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1
Envfile
Clustername	rostam	rostam
Datetime	2024-03-18T09:20:13.002391-05:00	2024-04-28T11:42:13.137113-05:00

Explanation of Symbols

Symbol	MEANING
=	No performance change (confidence interval within ±1%)
(=)	Probably no performance change (confidence interval within ±2%)
(+)/(-)	Very small performance improvement/degradation (≤1%)
+/-	Small performance improvement/degradation (≤5%)
++/--	Large performance improvement/degradation (≤10%)
+++/---	Very large performance improvement/degradation (>10%)
?	Probably no change, but quite large uncertainty (confidence interval within ±5%)
??	Unclear result, very large uncertainty (±10%)
???	Something unexpected…

hkaiser added type: enhancement type: compatibility issue category: CI labels Jan 31, 2024

hkaiser added this to the 1.10.0 milestone Jan 31, 2024

hkaiser force-pushed the workrequesting_scheduler_ci branch from 78626f0 to 1b888c6 Compare February 19, 2024 16:19

hkaiser force-pushed the workrequesting_scheduler_ci branch from 1b888c6 to a0dca56 Compare February 19, 2024 17:30

Enable testing of new scheduler in CI

2e610e7

- flyby: fixing buffer allocation

hkaiser force-pushed the workrequesting_scheduler_ci branch from a0dca56 to 2e610e7 Compare April 28, 2024 16:28

hkaiser modified the milestones: 1.10.0, 1.11.0 May 3, 2024
