class: center, middle
- and how are they different from normal threads
- the dividing line between OS and application
- it requires rethinking of some codes
- how deep into the internals should you go
- and how do you find our way around it
-
It's the job of the operating system to manage 'hardware level' or 'kernel level' OS threads.
- OS threads are expensive to create/destroy
- OS threads take a timeslice of CPU time
- Creating too many of them can hamper performance
-
On Startup HPX creates one 'worker' thread per core
- On each hardware thread, HPX runs its own Task Scheduler.
-
HPX (Threads)/Tasks are executed on the HPX worker thread
- which is really an OS thread
- (you must be careful not to block the underlying thread - if you do, no tasks can run on it)
- each HPX 'task' is referred to as a lightweight thread
-
Wikipedia "A thread of execution is the smallest sequence of programmed instructions that can be managed independently by a scheduler, which is typically a part of the operating system"
-
The philosophy behind lightweight threads is to switch from one task to another as quickly as possible as soon as anything
- finishes
- needs to wait (suspend)
-
Suspended tasks
- Many tasks can run on the same worker thread (one after another)
- or intermixed as one task suspends and another resumes
-
HPX does not ever interrupt your task directly
- the OS worker thread may be suspended as it loses its time slice
- the HPX task running at the time is therefore suspended
- other (dependent) tasks might in turn suspend
-
Many small tasks are the key to sucess
- with minimal dependencies on other tasks (if possible)
- not too small (overheads of management), not too big (starvation of worker threads)
- enough tasks in the queue helps ensure that no CPU cycles are wasted
-
Similar to OpenMP/TBB, but ...
- OpenMP has parallel regions where a thread pool executes your loops/tasks
- outside those regions, code runs as usual (on normal 'OS threads')
-
With HPX, the runtime is always active
- the runtime is started on program startup
- it stays active until program termination
- there are no parallel regions
- everything is running on an HPX thread
- everything is part of a task
-
Disclaimer
- you can manually start/stop the runtime (if you really want to)
- See example in tutorials
-
It's implemented as a C++ library/framework
- provides threads/futures/locks/schedulers/etc
- allows the user to ignore the threads/schedulers/locks
- gives the user control over them too
-
The user doesn't manage threads directly (usually)
- user creates tasks
- synchronizes between/amongst them
- shouldn't need to worry about low level synchronization primitives
- (but sometimes needs to - see tutorial 'latency' example)
- Threads can be 'managed' with special executors (e.g. OpenGL etc)
-
The runtime does its best to schedule tasks
- but it can be abused easily
- either by accident (careless programming)
- or deliberately
-
A good understanding of how it all works really helps
-
The principal means of synchronization in HPX is the
future<T>
-
A future is the result of something that hasn't yet run
-
A future is set by one thread/task (directly, by using a promise)
- using
promise.set_value(stuff);
- or by returing a value from a task/call such as
future<T> x = async(...)
- using
-
The value is retrieved by another thread/task
- using
auto value = future.get();
syntax
- using
-
Futures in HPX are extended to support continuations
future2 = future1.then(something_else);
-
And
when_xxx(one_or_more_futures)
syntax -
From these building blocks one can make complex DAGs
- A different approach to writing your code
- your program should be a tree of tasks
- each task will depend on other tasks
- and have child tasks
- You don't have to have the same on all nodes (c.f. MPI)
- Make these gaps as small as possible
- Keep breaking tasks into smaller tasks so the scheduler can fill gaps
- limit of task size/granularity is a function of overheads
- context switch (actually swapping stack, registers etc)
- time to create, queue, dequeue a task
-
Breaking a program into tasks should be straightforward
-
More functional
- tasks should accept inputs and return results
- modifying global state should be avoided
- race conditions and other thread related problems (deadlocks etc)
-
the leaf nodes of the tree are the smallest bits of work you can express
- but those leaf nodes might be broken further by HPX
- even
parallel::for(...)
loops decompose into tasks - all parallel::algorithm's are made up of tasks
-
HPX differs from (most) other libraries because the same API and the same scheduling/runtime can be used for the whole heirarchy of tasks
-
We aim to replace OpenMP+MPI+Acc with a single framework
- based soundly on C++
- from top to bottom (of the task tree)
- Same code, same hardware, same compiler, same performance
- Try to provide abstractions for tasks without excessive overheads.
Comparison of DPLASMA (PaRSEC scheduler) cholesky decomposition and HPX implementation (Oct 2017)
.left-column[
- Even Microsoft are amazed!
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script> ] .right-column[ ![tweet](images/tweet-2.png) ]Dear HPX: How? pic.twitter.com/12u1n8bBdi
— Billy O'Neal (@MalwareMinigun) June 29, 2017
-
Each task goes onto one of the schedulers
-
where a task goes is controlled by executors
-
schedulers maintain a high priority and normal queue
-
schedulers can steal (some do, some don't)
-
you can choose a scheduler when you create an executor (not really though)
-
-
Tasks can be
-
Running : context is active, code is running as it would on native thread
-
Suspended : task ran, but then had to wait for something
-
Staged : task has been created, but cannot be run yet
-
Pending :it is ready to run, but waiting in a queue
-
Terminated : awaiting cleanup
-
-
A task that is running requires a value from a future
-
the future is not ready (:disappointed:)
- Use CPS - don't call get ever!
-
auto val = future->get()
would block (if we were not HPX) -
the current task cannot progress so it changes state to suspended
-
the scheduler puts it into the suspended list
-
the future that was needed has the suspended task added to its internal (shared state) waiting list
-
when that future become ready, the task will be changed (back) to pending
-
and go back onto the queue so that when a worker is free, it can run
-
-
The same process happens when a task tries to take a lock but can't get it
-
The shared state inside the mutex will
notify
the task and do the 'right thing' -
the current task cannot progress so it changes state to suspended
- Look for spinlock mutexes in the HPX source
- Spin for a bit, then suspend when some deadline reached
-
the scheduler puts it into the suspended list
-
-
This is one reason why all the
std::thread
,std::mutex
etc code has been reimplemented- You can use std::lock_guard<> etc, but not the mutexes inside them
- the locks are just wrappers around the mutexes where the real work is done
-
Like a suspended task, but it hasn't run yet
-
A staged task is what exists when a continuation or
when_xxx
creates a task but it can't run until the dependencies are satisfied -
It's a suspended task that hasn't yet started
-
When is the task actually created?
-
When the code that instantiates it is executed
-
Inside continuations this might be when another task completes
future.then(another_task.then(more_tasks));
-
Outside continuations it can be 'up-front'
- a loop generating futures and attaching to previous iterations
future[i] = future[i-1].then(another_task);
- it can be confusing
- a loop generating futures and attaching to previous iterations
-
Session (Resource management) will look again at this question
-
-
MPI uses send/receive
- (it has one sided RMA too)
- there is an implied synchronization between the endpoints
-
HPX uses active messages
- a remote function invocation + arguments
- the arguments are the 'message' data
-
A remote funtion invocation create a task on the remote node
- the task goes into the queue of the remote node
- it is no different to any other task
- it might not have any relation to other tasks running there
- it can synchronize with other tasks
-
Channels
- More about them later/tomorrow
- Work locally or remotely and are a very nice abstraction (Go?)
-
AGAS is the Active Global Address Space
- A distributed in memory kev/value store
-
Active
- Objects can move around, but their Id remains the same
- Migration of things
- execute a task on data object(id)
- the task goes to where the data is
- Objects can move around, but their Id remains the same
-
No communicators currently
- but "Components" fill this gap.
- Components are handles to objects on remote nodes
- this pointer for distributed mode
-
You can write an HPX application that only starts on locality 0
- meaning
int main
runs on locality 0 but other nodes just sit in a waiting loop - node 0 triggers a
do_work
action on remote nodes - how you coordinate these tasks is up to you
- meaning
-
You can run different binaries on differnt nodes
- provising the
actions
on each node match up
- provising the
-
Different platforms on differnt nodes are supported
- but unlikey to be used for most of our HPC codes
- HPX has a runtime
- Everything is a task (at some level)
- HPX provides a parallel STL ... and parallel for etc
auto range = boost::irange<std::size_t>(0, hpx::get_os_thread_count());
//
for_each(par.with(static_chunk_size())
, std::begin(range), std::end(range)
// For each element in the range, call the following lambda
, [id](std::size_t num_thread)
{
std::cout
<< "Hello, I am HPX Thread " << num_thread
<< " executed on Locality " << id << std::endl;
//
compute_something_interesting();
}
);
- We are still writing C++, but HPX consumes this lambda and turns it into tasks
- Those tasks can be as big or as small as you like
-
HPX is implementing the proposals for concurrency and parallelism in c++
-
What works and is successful will probably go into c++2x etc
- what doesn't, will be subsumed into something else that does
**
"It's no use going back to yesterday, because I was a different person then."
Lewis Carroll, Alice in Wonderland
- Watch the film "Memento"
https://github.com/STEllAR-GROUP/hpx
Use the issue tracker to post your bug reports.
Look at the examples and unit tests to get ideas.
"You can get support via the #ste||ar IRC channels at freenode, where you will often find knowledgeable people willing to help with your problems and questions related to HPX."
Subscribe and post your questions here [email protected] (hardly anyone does, they use IRC and github issues)
http://stellar-group.github.io/hpx/docs/html
The best place to look for a first try
class: center, middle