Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable testing of new scheduler in CI #6428

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

hkaiser
Copy link
Member

@hkaiser hkaiser commented Jan 31, 2024

No description provided.

@StellarBot
Copy link

Performance test report

HPX Performance

Comparison

BENCHMARKFORK_JOIN_EXECUTORPARALLEL_EXECUTORSCHEDULER_EXECUTOR
For Each(=)??-

Info

PropertyBeforeAfter
HPX Datetime2023-05-10T12:07:53+00:002024-01-31T16:49:40+00:00
HPX Commitdcb5415b7a3134
Datetime2023-05-10T14:50:18.616050-05:002024-01-31T10:55:40.712102-06:00
Envfile
Hostnamemedusa08.rostam.cct.lsu.edumedusa08.rostam.cct.lsu.edu
Compiler/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1
Clusternamerostamrostam

Comparison

BENCHMARKNO-EXECUTOR
Future Overhead - Create Thread Hierarchical - Latch(=)

Info

PropertyBeforeAfter
HPX Datetime2023-05-10T12:07:53+00:002024-01-31T16:49:40+00:00
HPX Commitdcb5415b7a3134
Datetime2023-05-10T14:52:35.047119-05:002024-01-31T10:57:55.614863-06:00
Envfile
Hostnamemedusa08.rostam.cct.lsu.edumedusa08.rostam.cct.lsu.edu
Compiler/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1
Clusternamerostamrostam

Comparison

BENCHMARKFORK_JOIN_EXECUTOR_DEFAULT_FORK_JOIN_POLICY_ALLOCATORPARALLEL_EXECUTOR_DEFAULT_PARALLEL_POLICY_ALLOCATORSCHEDULER_EXECUTOR_DEFAULT_SCHEDULER_EXECUTOR_ALLOCATOR
Stream Benchmark - Add(=)(=)(=)
Stream Benchmark - Scale+==
Stream Benchmark - Triad(=)(=)(=)
Stream Benchmark - Copy(=)(=)(=)

Info

PropertyBeforeAfter
HPX Datetime2023-05-10T12:07:53+00:002024-01-31T16:49:40+00:00
HPX Commitdcb5415b7a3134
Datetime2023-05-10T14:52:52.237641-05:002024-01-31T10:58:12.811457-06:00
Envfile
Hostnamemedusa08.rostam.cct.lsu.edumedusa08.rostam.cct.lsu.edu
Compiler/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1
Clusternamerostamrostam

Explanation of Symbols

SymbolMEANING
=No performance change (confidence interval within ±1%)
(=)Probably no performance change (confidence interval within ±2%)
(+)/(-)Very small performance improvement/degradation (≤1%)
+/-Small performance improvement/degradation (≤5%)
++/--Large performance improvement/degradation (≤10%)
+++/---Very large performance improvement/degradation (>10%)
?Probably no change, but quite large uncertainty (confidence interval with ±5%)
??Unclear result, very large uncertainty (±10%)
???Something unexpected…

Copy link

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation Diff coverage
-0.09%
Coverage variation details
Coverable lines Covered lines Coverage
Common ancestor commit (9c53b99) 206733 176197 85.23%
Head commit (78626f0) 190772 (-15961) 162415 (-13782) 85.14% (-0.09%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
Coverable lines Covered lines Diff coverage
Pull request (#6428) 0 0 ∅ (not applicable)

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

See your quality gate settings    Change summary preferences

You may notice some variations in coverage metrics with the latest Coverage engine update. For more details, visit the documentation

@hkaiser
Copy link
Member Author

hkaiser commented Feb 5, 2024

@Pansysk75 it looks like that the new scheduler doesn't properly work yet (as you suggested). the clang-12 and gcc-10 errors are most likely related.

@Pansysk75
Copy link
Member

retest

@Pansysk75
Copy link
Member

@hkaiser FYI I'm working on this one, I've reproduced some failing tests (stop_token_cb1 being one of them). So far, I see that some pending hpx thread is never getting scheduled (either starvation or some bug). I'll let you know of what I find out.

@Pansysk75
Copy link
Member

@hkaiser It seems like the moodycamel queue provides FIFO guarantees only among items placed by the same thread, however there is no guarantee that items queued by different threads will be dequeued in a FIFO order.

From the moodycamel blog:

All elements from a given producer thread will necessarily still be seen in that same order relative to each other when dequeued (since the sub-queue preserves that order), albeit with elements from other sub-queues possibly interleaved.

And from the discussion in the comments:

gasche: ... If thread A enqueues 1, then does a write observed by thread B (synchronized through the appropriate memory fence), then B enqueues 2, then I would expect 1 to come before 2 -- and this is not racy. It seems to me that your data-structure is thus providing a weaker specification than linearizable queues. ...

Cameron: Yes, you're right. If there's external synchronization, then the final order is not related to the total order formed by the happens-before relationships of the enqueue operations themselves. ...

This currently causes trouble (deadlocks), as we rely on the FIFO ordering to avoid starvation:

// schedule this thread again, make sure it ends up at
// the end of the queue
scheduler.SchedulingPolicy::schedule_thread_last(
HPX_MOVE(thrd),
threads::thread_schedule_hint(
static_cast<std::int16_t>(num_thread)),
true);
scheduler.SchedulingPolicy::do_some_work(num_thread);

In the fail-case that I managed to reproduce, a perpetually yielding thread starves out a shared-resource-holding thread when these two happen to be in the same moodycamel thread-queue. We cannot reliably insert the yielding thread in the"back" of the queue (no FIFO guarantee), and the queue under certain circumstances acts similar to a LIFO, thus never allowing the shared-resource-holding thread to be picked up by the scheduler and release the held resource.

@hkaiser
Copy link
Member Author

hkaiser commented Feb 17, 2024

Excellent detective work @Pansysk75! Is that true in case if we used the consumer/producer token API the queue provides as well?

@Pansysk75
Copy link
Member

Pansysk75 commented Feb 17, 2024

Excellent detective work @Pansysk75! Is that true in case if we used the consumer/producer token API the queue provides as well?

Thanks! Here's what the implementer has to say about this (TLDR it's not possible): https://www.github.com/cameron314/concurrentqueue/issues/6

@hkaiser
Copy link
Member Author

hkaiser commented Feb 18, 2024

Excellent detective work @Pansysk75! Is that true in case if we used the consumer/producer token API the queue provides as well?

Thanks! Here's what the implementer has to say about this (TLDR it's not possible): https://www.github.com/cameron314/concurrentqueue/issues/6

Would this patch mitigate the issue?

diff --git a/libs/core/thread_pools/include/hpx/thread_pools/scheduling_loop.hpp b/libs/core/thread_pools/include/hpx/thread_pools/scheduling_loop.hpp
index 1ca53941ee4..d2cc74fae4e 100644
--- a/libs/core/thread_pools/include/hpx/thread_pools/scheduling_loop.hpp
+++ b/libs/core/thread_pools/include/hpx/thread_pools/scheduling_loop.hpp
@@ -315,13 +315,6 @@ namespace hpx::threads::detail {
                     if (HPX_UNLIKELY(
                             state_val == thread_schedule_state::pending))
                     {
-                        if (HPX_LIKELY(next_thrd == nullptr))
-                        {
-                            // schedule other work
-                            scheduler.wait_or_add_new(num_thread, running,
-                                idle_loop_count, enable_stealing_staged, added);
-                        }
-
                         // schedule this thread again, make sure it ends up at
                         // the end of the queue
                         scheduler.SchedulingPolicy::schedule_thread_last(
@@ -329,6 +322,14 @@ namespace hpx::threads::detail {
                             threads::thread_schedule_hint(
                                 static_cast<std::int16_t>(num_thread)),
                             true);
+
+                        if (HPX_LIKELY(next_thrd == nullptr))
+                        {
+                            // schedule other work
+                            scheduler.wait_or_add_new(num_thread, running,
+                                idle_loop_count, enable_stealing_staged, added);
+                        }
+
                         scheduler.SchedulingPolicy::do_some_work(num_thread);
                     }
                     else if (HPX_UNLIKELY(state_val ==

I think the problem is that currently after the pending thread has been put back onto the (top of) queue the scheduler immediately pulls it back and retries without potentially waiting threads having a change of being work-requested. For other schedulers those will get stolen eventually, thus avoiding the live-lock.

@Pansysk75
Copy link
Member

@hkaiser Doesn't seem to help much... keep in mind that this issue doesn't show up on the workrequesting scheduler when using the default queue, so it's either the mc queue or the stealhalf functionality that is causing the issue (I also tried setting this to false

static constexpr bool support_bulk_dequeue = true;
to see bulk dequeueing was the culprit, but it didn't fix things.

Also, I think this fix is needed (or sth equivalent):

                task_data thrds(d.num_thread_);
-                thrds.tasks_.reserve(max_num_to_steal);


#ifdef HPX_HAVE_THREAD_STEALING_COUNTS
+              thrds.tasks_.reserve(max_num_to_steal);
                thread_id_ref_type thrd;
                while (max_num_to_steal-- != 0 &&
                    d.queue_->get_next_thread(thrd, false, true))
                {
                    d.queue_->increment_num_stolen_from_pending();
                    thrds.tasks_.push_back(HPX_MOVE(thrd));
                    thrd = thread_id_ref_type{};
                }
#else
+              thrds.tasks_.resize(max_num_to_steal);
                d.queue_->get_next_threads(
                    thrds.tasks_.begin(), thrds.tasks_.size(), false, true);

@hkaiser
Copy link
Member Author

hkaiser commented Feb 19, 2024

Also, I think this fix is needed (or sth equivalent):

                task_data thrds(d.num_thread_);
-                thrds.tasks_.reserve(max_num_to_steal);


#ifdef HPX_HAVE_THREAD_STEALING_COUNTS
+              thrds.tasks_.reserve(max_num_to_steal);
                thread_id_ref_type thrd;
                while (max_num_to_steal-- != 0 &&
                    d.queue_->get_next_thread(thrd, false, true))
                {
                    d.queue_->increment_num_stolen_from_pending();
                    thrds.tasks_.push_back(HPX_MOVE(thrd));
                    thrd = thread_id_ref_type{};
                }
#else
+              thrds.tasks_.resize(max_num_to_steal);
                d.queue_->get_next_threads(
                    thrds.tasks_.begin(), thrds.tasks_.size(), false, true);

Uhh, how did it ever work? ;-)

@hkaiser hkaiser force-pushed the workrequesting_scheduler_ci branch from 78626f0 to 1b888c6 Compare February 19, 2024 16:19
@hkaiser
Copy link
Member Author

hkaiser commented Feb 19, 2024

Also, I think this fix is needed (or sth equivalent):

                task_data thrds(d.num_thread_);
-                thrds.tasks_.reserve(max_num_to_steal);


#ifdef HPX_HAVE_THREAD_STEALING_COUNTS
+              thrds.tasks_.reserve(max_num_to_steal);
                thread_id_ref_type thrd;
                while (max_num_to_steal-- != 0 &&
                    d.queue_->get_next_thread(thrd, false, true))
                {
                    d.queue_->increment_num_stolen_from_pending();
                    thrds.tasks_.push_back(HPX_MOVE(thrd));
                    thrd = thread_id_ref_type{};
                }
#else
+              thrds.tasks_.resize(max_num_to_steal);
                d.queue_->get_next_threads(
                    thrds.tasks_.begin(), thrds.tasks_.size(), false, true);

Uhh, how did it ever work? ;-)

That's fixed now. Thanks again!

@hkaiser hkaiser force-pushed the workrequesting_scheduler_ci branch from 1b888c6 to a0dca56 Compare February 19, 2024 17:30
Copy link

codacy-production bot commented Feb 19, 2024

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation Diff coverage
+0.04% 0.00%
Coverage variation details
Coverable lines Covered lines Coverage
Common ancestor commit (e977ecc) 217975 185525 85.11%
Head commit (2e610e7) 190892 (-27083) 162547 (-22978) 85.15% (+0.04%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
Coverable lines Covered lines Diff coverage
Pull request (#6428) 1 0 0.00%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

See your quality gate settings    Change summary preferences

Codacy will stop sending the deprecated coverage status from June 5th, 2024. Learn more

- flyby: fixing buffer allocation
@hkaiser hkaiser force-pushed the workrequesting_scheduler_ci branch from a0dca56 to 2e610e7 Compare April 28, 2024 16:28
@StellarBot
Copy link

Performance test report

HPX Performance

Comparison

BENCHMARKFORK_JOIN_EXECUTORPARALLEL_EXECUTORSCHEDULER_EXECUTOR
For Each(=)--

Info

PropertyBeforeAfter
HPX Commitd27ac2e0e8959a
HPX Datetime2024-03-18T14:00:30+00:002024-04-28T16:28:37+00:00
Hostnamemedusa08.rostam.cct.lsu.edumedusa08.rostam.cct.lsu.edu
Compiler/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1
Envfile
Clusternamerostamrostam
Datetime2024-03-18T09:18:04.949759-05:002024-04-28T11:40:07.559807-05:00

Comparison

BENCHMARKNO-EXECUTOR
Future Overhead - Create Thread Hierarchical - Latch(=)

Info

PropertyBeforeAfter
HPX Commitd27ac2e0e8959a
HPX Datetime2024-03-18T14:00:30+00:002024-04-28T16:28:37+00:00
Hostnamemedusa08.rostam.cct.lsu.edumedusa08.rostam.cct.lsu.edu
Compiler/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1
Envfile
Clusternamerostamrostam
Datetime2024-03-18T09:19:53.062988-05:002024-04-28T11:41:54.962060-05:00

Comparison

BENCHMARKFORK_JOIN_EXECUTOR_DEFAULT_FORK_JOIN_POLICY_ALLOCATORPARALLEL_EXECUTOR_DEFAULT_PARALLEL_POLICY_ALLOCATORSCHEDULER_EXECUTOR_DEFAULT_SCHEDULER_EXECUTOR_ALLOCATOR
Stream Benchmark - Add(=)(=)(=)
Stream Benchmark - Scale(=)(=)(=)
Stream Benchmark - Triad=(=)(=)
Stream Benchmark - Copy(=)(=)(=)

Info

PropertyBeforeAfter
HPX Commitd27ac2e0e8959a
HPX Datetime2024-03-18T14:00:30+00:002024-04-28T16:28:37+00:00
Hostnamemedusa08.rostam.cct.lsu.edumedusa08.rostam.cct.lsu.edu
Compiler/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1/opt/apps/llvm/13.0.1/bin/clang++ 13.0.1
Envfile
Clusternamerostamrostam
Datetime2024-03-18T09:20:13.002391-05:002024-04-28T11:42:13.137113-05:00

Explanation of Symbols

SymbolMEANING
=No performance change (confidence interval within ±1%)
(=)Probably no performance change (confidence interval within ±2%)
(+)/(-)Very small performance improvement/degradation (≤1%)
+/-Small performance improvement/degradation (≤5%)
++/--Large performance improvement/degradation (≤10%)
+++/---Very large performance improvement/degradation (>10%)
?Probably no change, but quite large uncertainty (confidence interval with ±5%)
??Unclear result, very large uncertainty (±10%)
???Something unexpected…

@hkaiser hkaiser modified the milestones: 1.10.0, 1.11.0 May 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants