diff --git a/affinity/cpp-20/d0796r3.md b/affinity/cpp-20/d0796r3.md
index d287939..f34854c 100644
--- a/affinity/cpp-20/d0796r3.md
+++ b/affinity/cpp-20/d0796r3.md
@@ -16,6 +16,9 @@

### P0796r3 (SAN 2018)

+* Remove reference counting requirement from `execution_resource`.
+* Change lifetime model of `execution_resource`: it now either consistently identifies some underlying resource, or is invalid; context creation rejects an invalid resource.
+
### P0796r2 (RAP 2018)

* Introduce a free function for retrieving the execution resource underlying the current thread of execution.

@@ -38,7 +41,10 @@

This paper provides an initial meta-framework for the drive toward an execution and memory affinity model for C++. It accounts for feedback from the Toronto 2017 SG1 meeting on Data Movement in C++ [[1]][p0687r0] that we should define affinity for C++ first, before considering inaccessible memory as a solution to the separate memory problem for supporting heterogeneous and distributed computing.

-This paper is split into two main parts; firstly a series of executor properties which can be used to apply affinity requirements to bulk execution functions, and secondly an interface for discovering the execution resources within the system topology and querying relative affinity of execution resources.
+This paper is split into two main parts:
+
+1. A series of executor properties which can be used to apply affinity requirements to bulk execution functions.
+2. An interface for discovering the execution resources within the system topology and querying relative affinity of execution resources.

# Motivation

@@ -52,9 +58,9 @@ Operating systems (OSes) traditionally take responsibility for assigning threads

The affinity problem is especially challenging for applications whose behavior changes over time or is hard to predict, or when different applications interfere with each other's performance. Today, most OSes can already group processing units according to their locality and distribute processes while keeping threads close to the initial thread, or even avoid migrating threads in order to maintain a first-touch policy. Nevertheless, most programs can change their work distribution, especially in the presence of nested parallelism.

-Frequently, data are initialized at the beginning of the program by the initial thread and are used by multiple threads. While some OSes automatically migrate threads or data for better affinity, migration may have high overhead. In an optimal case, the OS may automatically detect which thread access which data most frequently, or it may replicate data which are read by multiple threads, or migrate data which are modified and used by threads residing on remote locality groups. However, the OS often does a reasonable job, if the machine is not overloaded, if the application carefully used first-touch allocation, and if the program does not change its behavior with respect to locality.
+Frequently, data are initialized at the beginning of the program by the initial thread and are used by multiple threads. While some OSes automatically migrate threads or data for better affinity, migration may have high overhead. In an optimal case, the OS may automatically detect which threads access which data most frequently, or it may replicate data which are read by multiple threads, or migrate data which are modified and used by threads residing on remote locality groups. However, the OS often does a reasonable job if the machine is not overloaded, if the application carefully uses first-touch allocation, and if the program does not change its behavior with respect to locality.
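The first-touch behavior referred to above can be illustrated with standard C++17 alone. The following is a minimal sketch, assuming an OS that places a page near the thread that first writes it and a parallel runtime that maps elements to threads consistently across calls; nothing in it is part of this proposal.

```cpp
#include <algorithm>
#include <execution>
#include <vector>

// First-touch illustration: on many OSes, a page is physically placed near
// the thread that first writes it. Initializing in parallel, rather than on
// the master thread, therefore tends to place each page near a thread that
// will later update it, assuming the runtime reuses a similar
// thread-to-element mapping across calls.
void first_touch_init(std::vector<double>& v) {
  std::for_each(std::execution::par, v.begin(), v.end(),
                [](double& x) { x = 0.0; });  // first touch happens here
}
```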
-Consider a code example *(Listing 1)* that uses the C++17 parallel STL algorithm `for_each` to modify the entries of a `valarray` `a`. The example applies a loop body in a lambda to each entry of the `valarray` `a`, using a parallel execution policy that distributes work in parallel across multiple CPU cores. We might expect this to be fast, but since `valarray` containers are initialized automatically and automatically allocated on the master thread's memory, we find that it is actually quite slow even when we have more than one thread.
+Consider a code example *(Listing 1)* that uses the C++17 parallel STL algorithm `for_each` to modify the entries of a `valarray` `a`. The example applies a loop body in a lambda to each entry of the `valarray` `a`, using an execution policy that distributes work in parallel across multiple CPU cores. We might expect this to be fast, but since `valarray` containers are automatically initialized and allocated in the master thread's memory, we find that it is actually quite slow even when we have more than one thread.

```cpp
// C++ valarray STL containers are initialized automatically.
std::valarray<double> a(N);

// Data placement is wrong, so parallel update is slow.
-std::for_each(par, std::begin(a), std::end(a),
+std::for_each(std::execution::par, std::begin(a), std::end(a),
  [=] (double& a_i) { a_i *= scalar; });

// Use future affinity interface to migrate data at next
// use and move pages closer to next accessing thread.
...

// Faster, because data are local now.
-std::for_each(par, std::begin(a), std::end(a),
+std::for_each(std::execution::par, std::begin(a), std::end(a),
  [=] (double& a_i) { a_i *= scalar; });
```
*Listing 1: Parallel vector update example*

@@ -111,18 +117,19 @@ In this paper we describe the problem space of affinity for C++, the various cha

* How to bind execution and allocation to particular execution resource(s).
* What kind and level of interface(s) should be provided by C++ for affinity.

-Wherever possible, we also evaluate how an affinity-based solution could be scaled to support both distributed and heterogeneous systems.
+Wherever possible, we also evaluate how an affinity-based solution could be scaled to support both distributed and heterogeneous systems. We have also addressed some aspects of dynamic topology discovery.

There are also some additional challenges which we have been investigating but are not yet ready to be included in this paper, and which will be presented in a future paper:

* How to migrate work and memory allocations between execution resources.
-* How to support dynamic topology discovery and fault tolerance.
+* More general cases of dynamic topology discovery.
+* Fault tolerance, as it relates to dynamic topology.

### Querying and representing the system topology

The first task in allowing C++ applications to leverage memory locality is to provide the ability to query a *system* for its *resource topology* (commonly represented as a tree or graph) and traverse its *execution resources*.
-The capability of querying underlying *execution resources* of a given *system* is particularly important towards supporting affinity control in C++. The current proposal for executors [[22]][p0443r4] leaves the *execution resource* largely unspecified. This is intentional: *execution resources* will vary greatly between one implementation and another, and it is out of the scope of the current executors proposal to define those. There is current work [[23]][p0737r0] on extending the executors proposal to describe a typical interface for an *execution context*. In this paper a typical *execution context* is defined with an interface for construction and comparison, and for retrieving an *executor*, waiting on submitted work to complete and querying the underlying *execution resource*. Extending the executors interface to provide topology information can serve as a basis for providing a unified interface to expose affinity. This interface cannot mandate a specific architectural definition, and must be generic enough that future architectural evolutions can still be expressed.
+The capability of querying underlying *execution resources* of a given *system* is particularly important towards supporting affinity control in C++. The current proposal for executors [[22]][p0443r7] mentions execution resources in passing, but leaves the term largely unspecified. This is intentional: *execution resources* will vary greatly between one implementation and another, and it is out of the scope of the current executors proposal to define those. There is current work [[23]][p0737r0] on extending the executors proposal to describe a typical interface for an *execution context*. In this paper a typical *execution context* is defined with an interface for construction and comparison, for retrieving an *executor*, for waiting on submitted work to complete, and for querying the underlying *execution resource*. Extending the executors interface to provide topology information can serve as a basis for providing a unified interface to expose affinity. This interface cannot mandate a specific architectural definition, and must be generic enough that future architectural evolutions can still be expressed.

Two important considerations when defining a unified interface for querying the *resource topology* of a *system* are (a) what level of abstraction such an interface should have, and (b) at what granularity it should describe the topology's *execution resources*. As both the level of abstraction of an *execution resource* and the granularity at which it is described will vary greatly from one implementation to another, it’s important for the interface to be generic enough to support any level of abstraction. To achieve this we propose a generic hierarchical structure of *execution resources*, each *execution resource* being composed of other *execution resources* recursively. Each *execution resource* within this hierarchy can be used to place memory (i.e., allocate memory within the *execution resource’s* memory region), place execution (i.e., bind an execution to an *execution resource’s execution agents*), or both.
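To make the recursive structure concrete, the following is a minimal sketch of a depth-first traversal over such a hierarchy. The member functions `name()` and `resources()` (returning the member *execution resources*) are placeholder names for this sketch, not part of the wording here.

```cpp
#include <iostream>
#include <string>

// Depth-first print of a resource hierarchy. ExecutionResource is any type
// exposing name() and resources(); both member names are illustrative.
template <typename ExecutionResource>
void print_topology(const ExecutionResource& resource, int depth = 0) {
  std::cout << std::string(2 * depth, ' ') << resource.name() << '\n';
  for (const auto& member : resource.resources())
    print_topology(member, depth + 1);  // recurse into member resources
}
```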
@@ -144,9 +151,9 @@ In traditional single-CPU systems, users may reason about the execution resource

This assumption, however, does not hold on newer, more complex systems, especially on heterogeneous systems. On these systems, even the type and number of high-level resources available in a particular *system* are not known until the physical hardware attached to that system has been identified by the program. This often happens as part of a run-time initialization API [[6]][opencl-2-2] [[7]][hsa] which makes the resources available through some software abstraction. Furthermore, the resources which are identified often have different levels of parallel and concurrent execution capabilities. We refer to this process of identifying resources and their capabilities as *topology discovery*, and we call the point at which this occurs the *point of discovery*.

-An interesting question which arises here is whether the *system resource topology* should be fixed at the *point of discovery*, or whether it should be allowed to change during later program execution. We can identify two main reasons for allowing the *system resource topology* to be dynamic after the *point of discovery*: (a) *online resource discovery*, and (b) *fault tolerance*.
+An interesting question which arises here is whether the *system resource topology* should be fixed at the *point of discovery*, or whether it should be allowed to change during later program execution. We can identify two main reasons for allowing the *system resource topology* to be dynamic after the *point of discovery*: (a) *dynamic resource discovery*, and (b) *fault tolerance*.

-In some systems, hardware can be attached to the system while the program is executing. For example, users may plug in a USB-compute device [[31]][movidius] while the application is running to add additional computational power, or users may have access to hardware connected over a network, but only at specific times. Support for *online resource discovery* would let programs target these situations natively and be reactive to changes to the resources available to a system.
+In some systems, hardware can be attached to the system while the program is executing. For example, users may plug in a USB-compute device [[31]][movidius] while the application is running to add additional computational power, or users may have access to hardware connected over a network, but only at specific times. Support for *dynamic resource discovery* would let programs target these situations natively and be reactive to changes to the resources available to a system.

Other applications, such as those designed for safety-critical environments, must be able to recover from hardware failures. This requires that the resources available within a system can be queried and can be expected to change at any point during the execution of a program. For example, a GPU may overheat and need to be disabled, yet the program must continue at all costs. *Fault tolerance* would let programs query the availability of resources and handle failures. This could facilitate reliable programming of heterogeneous and distributed systems.

@@ -154,17 +161,65 @@ From a historic perspective, programming models for traditional high-performance

Some of these programming models also address *fault tolerance*. In particular, PVM has native support for this, providing a mechanism [[27]][pvm-callback] which can notify a program when a resource is added or removed from a system. MPI lacks a native *fault tolerance* mechanism, but there have been efforts to implement fault tolerance on top of MPI [[28]][mpi-post-failure-recovery] or by extensions [[29]][mpi-fault-tolerance].

-Due to the complexity involved in standardizing *dynamic resource discovery* and *fault tolerance*, these are currently out of the scope of this paper.
+Due to the complexity involved in standardizing *dynamic resource discovery* and *fault tolerance*, these are currently out of the scope of this paper. However, we leave open the possibility of accommodating both in the future by not overconstraining *resources*' lifetimes (see next section).
+
+### Resource lifetime
+
+The initial solution may only target systems with a single addressable memory region. It may thus exclude devices like discrete GPUs. However, in order to maintain a unified interface going forward, the initial solution should consider these devices and be able to scale to support them in the future. In particular, in order to support heterogeneous systems, the abstraction must let the interface query the *resource topology* of the *system* in order to perform device discovery.
+
+The *resource* objects returned from the *topology discovery interface* are opaque, implementation-defined objects. They would not perform any scheduling or execution functionality which would be expected from an *execution context*, and they would not store any state related to an execution. Instead, they would simply act as an identifier to a particular partition of the *resource topology*. This means that the lifetime of a *resource* retrieved from an *execution context* must not be tied to the lifetime of that *execution context*.
+
+The lifetime of a *resource* instance refers to both validity and uniqueness. First, if a *resource* instance exists, does it point to a valid underlying hardware or software resource? That is, could an instance's validity ever change at run time? Second, could a *resource* instance ever point to a different (but still valid) underlying resource? It suffices for now to define "point to a valid underlying resource" informally. We will elaborate this idea later in this proposal.
+
+Creation of a *context* expresses intent to use the *resource*, not just to view it as part of the *resource topology*. Thus, if a *resource* could ever cease to point to a valid underlying resource, then users must not be allowed to create a *context* from the resource instance, or launch executions with that context. *Context* construction, and use of an *executor* with that *context* to launch an execution, both assert validity of the *context*'s *resource*.
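A minimal sketch of what this assertion could look like in user code follows. The error-reporting mechanism is deliberately left unspecified by this paper; an exception is assumed here purely for illustration, and `Context` stands in for whatever *execution context* type is used.

```cpp
// Context construction asserts validity of the resource; launching work
// through the context asserts it again. The exception is an assumption of
// this sketch, not a design decision of this paper.
template <typename Context, typename Resource>
bool try_create_context(const Resource& res) {
  try {
    Context ctx{res};          // fails if res no longer identifies anything
    auto ex = ctx.executor();  // launching via ex would assert validity again
    (void)ex;
    return true;
  } catch (...) {
    return false;              // res was invalid at the point of creation
  }
}
```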
+
+If a *resource* is valid, then it must always point to the same underlying thing. For example, a *resource* cannot first point to one CPU core, and then suddenly point to a different CPU core. *Contexts* can thus rely on properties like binding of operating system threads to CPU cores. However, the "thing" to which a *resource* points may be a dynamic, possibly software-managed pool of hardware. Here are three examples of this phenomenon:
+
+ 1. The "hardware" may actually be a virtual machine (VM). At any point, the VM may pause, migrate to different physical hardware, and resume. If the VM presents the same virtual hardware before and after the migration, then the *resources* that an application running on the VM sees should not change.
+ 2. The OS may maintain a pool of a varying number of CPU cores as a shared resource among different user-level processes. When a process stops using the resource, the OS may reclaim cores. It may make sense to present this pool as an *execution resource*.
+ 3. A low-level device driver on a laptop may switch between a "discrete" GPU and an "integrated" GPU, depending on utilization and power constraints. If the two GPUs have the same instruction set and can access the same memory, it may make sense to present them as a single "virtualized" *execution resource*.
+
+In summary, a *resource* either identifies a thing uniquely, or harmlessly points to nothing. The subsections that follow justify and explain this.
+
+#### Permit dynamic resource lifetime
+
+We should not assume that *resource* instances have the same lifetime as the running application. For example, some hardware accelerators like GPUs require calling an initialization function before a running application may use the accelerator, and calling a finalization function after using the accelerator. The software interface for the accelerator may not even be available at application launch time. For instance, the interface may live in a dynamic library that users may load at run time. In the case of a pool of CPU cores managed by the operating system, the application might have to request access to the pool at run time, and the operating system may have to do some work in order to reserve CPU cores and set them up for use in the pool. Applications that do not use the pool should not have to pay this setup cost. The more general cases of dynamic resource discovery and fault tolerance, which we discussed above, also call for dynamic *resource* lifetimes.
+
+#### Resources should not be reference counted
+
-### Lifetime considerations
+We considered mandating that *execution resources* use reference counting, just like `shared_ptr`. This would clearly define resources' lifetimes. However, there are several arguments against requiring reference counting.
-
-As the execution context would provide a partitioning interface which returns objects describing the components of the system topology of an execution resource, it is important to consider the lifetime of these objects.
+
+ 1. Holding a reference to the *execution resource* would prevent execution from shutting down, thus (potentially) deadlocking the program.
+ 2. Not all kinds of *resources* may have lifetimes that fit reference-counting semantics. Some kinds of GPU *resources* only exist during execution, for example; those *resources* cannot be valid if they escape the scope of code that executes on the GPU. In general, programming models that let a "host" processor launch code on a "different processor" have this issue.
+ 3. Reference counting could have unattractive overhead if accessed concurrently, especially if code wants to traverse a particular subset of the *resource topology* inside a region executing on the GPU (e.g., to access GPU scratch memory).
+ 4. Since users can construct arbitrary data structures from *resources* in a *resource hierarchy*, the proposal would need another *resource* type analogous to `weak_ptr`, in order to avoid circular dependencies that could prevent releasing *resources*.
+ 5. There is no type currently in the Standard that has reference-counting semantics, but does not have `shared_` in its name (e.g., `shared_ptr` and `shared_future`). Adding a type like this sets a bad precedent for types with hidden costs and correctness issues (see (4)).
+
-The objects returned from the partitioning interface would be opaque, implementation-defined objects that do not perform any scheduling or execution functionality which would be expected from an *execution context* and would not store any state related to an execution. Instead they would act simply as an identifier to a particular partition of the *resource topology*.
+#### What does validity of a resource mean?
-For these reasons, *resources* must always outlive any *execution context* which is constructed from them, and any *resource* retrieved from an *execution context* must not be tied to the lifetime of that *execution context*.
+
+Here, we elaborate on what it means for a *resource* to be "valid." This proposal lets users encounter a *resource* either while traversing the *resource topology*, or through a *context* that uses the *resource*. "Viewing" the *resource* in the *resource topology* implies a lower level of "commitment" or "permanence" than using the *resource* in a *context*. In particular:
+
-The initial solution should target systems with a single addressable memory region. It should thus exclude devices like discrete GPUs. In order to maintain a unified interface going forward, the initial solution should consider these devices and be able to scale to support them in the future. In particular, in order to support heterogeneous systems, the abstraction must let the interface query the *resource topology* of the *system* in order to perform device discovery.
+ 1. Querying the system topology returns a structure of opaque identifiers, the `execution_resource`s, representing a snapshot of the current state of the *system*.
+ 2. The query may require temporarily initializing underlying resources, but those underlying resources need not stay active after the query.
+ 3. The ability to iterate a *resource*'s children in the *resource topology* need not imply the ability to create a *context* from that *resource*.
+ 4. Creating a *context* from a *resource* asserts *resource* validity. If the *resource* is invalid, *context* creation must fail. (Compare to how MPI functions report an error if they are called after `MPI_Finalize` has been called on that process.)
+ 5. Use of a *context* to launch execution asserts *resource* validity, and must thus fail if the *resource* is no longer valid.
+
+Here is a concrete example. Suppose that company "Aleph" makes an accelerator that can be viewed as a *resource*, and that has its own child *resources*. Users must call `Aleph_initialize()` in order to see the accelerator and its children as *resources* in the *resource topology*. Users must call `Aleph_finalize()` when they are done using the accelerator. A code sketch of this scenario follows the answers below.
+
+ Questions:
+
+ 1. What should happen if users are traversing the *resource topology*, but never use the accelerator's *resource* (other than to iterate past it), and something else concurrently calls `Aleph_finalize()`?
+ 2. What should happen if users are traversing the accelerator's child *resources*, and something else concurrently calls `Aleph_finalize()`?
+ 3. What should happen if users try to create an *execution context* from the accelerator's *resource*, after `Aleph_finalize()` has been called?
+ 4. What should happen to outstanding *execution contexts* that use the accelerator's *resource*, if something calls `Aleph_finalize()` after the *context* was created?
+
+ Answers:
+
+ 1. Nothing bad must happen. Topology queries return a snapshot. Users must be able to iterate past an invalidated *resource*. If users are iterating a *resource* R's children and one child becomes invalid, that must not invalidate R or the iterators to its children.
+ 2. Iterating the children after invalidation of the parent must not be undefined behavior, but the child *resources* remain invalid. Attempts to view and iterate the children of the child *resources* may (but need not) fail.
+ 3. *Context* creation asserts *resource* validity. If the *resource* is invalid, *context* creation must fail. (Compare to how MPI functions report an error if they are called after `MPI_Finalize` has been called on that process.)
+ 4. Use of a *context* in an *executor* to launch execution asserts *resource* validity, and must thus fail if the *resource* is no longer valid.
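The answers above can be restated as a code sketch. `Aleph_initialize()` and `Aleph_finalize()` are the hypothetical vendor entry points from the example; the namespace placement and query names are illustrative, not normative.

```cpp
// Hypothetical vendor entry points from the "Aleph" example above.
void Aleph_initialize();
void Aleph_finalize();

void aleph_scenario() {
  Aleph_initialize();  // the accelerator and its children become visible

  // Topology queries return a snapshot of opaque identifiers (answer 1).
  auto snapshot = std::experimental::this_system::get_resources();

  Aleph_finalize();    // the accelerator's resources in the snapshot become invalid

  for (const auto& res : snapshot) {
    // Iterating past an invalidated resource must work (answer 1), and
    // iterating its children is not undefined behavior, although the
    // children remain invalid (answer 2).
    (void)res;
  }

  // Creating an execution_context from an invalidated resource must fail
  // (answer 3), as must launching execution through a context whose
  // resource has since become invalid (answer 4).
}
```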

### Querying the relative affinity of partitions

@@ -178,13 +233,13 @@ This feature could be easily scaled to heterogeneous and distributed systems, as

## Overview

-In this paper we propose an interface for querying and representing the execution resources within a system, querying the relative affinity metric between those execution resources, and then using those execution resources to allocate memory and execute work with affinity to the underlying hardware. The interface described in this paper builds on the existing interface for executors and execution contexts defined in the executors proposal [[22]][p0443r4].
+In this paper we propose an interface for querying and representing the execution resources within a system, querying the relative affinity metric between those execution resources, and then using those execution resources to allocate memory and execute work with affinity to the underlying hardware. The interface described in this paper builds on the existing interface for executors and execution contexts defined in the executors proposal [[22]][p0443r7].

### Interface granularity

The interface described in this paper is split into two main parts:

* A series of executor properties which describe desired behavior when using parallel algorithms or libraries. These properties provide a low granularity and are aimed at users who may have little or no knowledge of the system architecture.
-* A series of execution resource topology mechanisms for discovering detailed information about the system's topology and affinity properties which can be used to hand optimise parallel applications and libraries for the best performance. These mechanisms provide a high granularity and is aimed at users who have a high knowledge of the system architecture.
+* A series of execution resource topology mechanisms for discovering detailed information about the system's topology and affinity properties, which can be used to hand optimize parallel applications and libraries for the best performance. These mechanisms provide a high granularity and are aimed at users who have detailed knowledge of the system architecture.

## Executor properties

@@ -221,7 +276,7 @@ An `execution_resource` is a lightweight structure which acts as an identifier t

### System topology

-The system topology is made up of a number of system-level `execution_resource`s, which can be queried through `this_system::get_resources` which returns a `std::vector`. A run-time library may initialize the `execution_resource`s available within the system dynamically. However, this must be done before `main` is called, given that after that point, the system topology may not change.
+The system topology is made up of a number of system-level `execution_resource`s, which can be queried through `this_system::get_resources`, which returns a `std::vector`. A run-time library may initialize the `execution_resource`s available within the system dynamically. However, `this_system::get_resources` must be thread-safe and must initialize and finalize any third-party or OS state before returning.

Below *(Listing 3)* is an example of iterating over the system-level resources and printing out their capabilities.
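Listing 3 itself falls outside this diff hunk; the sketch below reconstructs its shape. The member queries `name()`, `concurrency()`, `can_place_memory()` and `can_place_agent()` are assumptions consistent with the queries this paper describes (name, supported concurrency, and allocation/execution capability), not verbatim proposal wording.

```cpp
#include <iostream>

// Reconstruction of the shape of Listing 3: iterate the system-level
// resources and print their capabilities. Member names are assumed.
void print_system_resources() {
  for (const auto& res : std::experimental::this_system::get_resources()) {
    std::cout << res.name()
              << ": concurrency " << res.concurrency()
              << ", can place memory: " << res.can_place_memory()
              << ", can place agents: " << res.can_place_agent() << '\n';
  }
}
```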

@@ -424,11 +479,11 @@ A *thread of execution* can be requested to bind to a particular `execution_reso

The `bulk_execution_affinity_t` property describes what guarantees executors provide about the binding of *execution agent*s to the underlying *execution resource*s.

-bulk_execution_affinity_t provides nested property types and objects as described below. These properties are behavioral properties as described in [[22]][p0443r4] so must adhere to the requirements of behavioral properties and the requirements described below.
+bulk_execution_affinity_t provides nested property types and objects as described below. These properties are behavioral properties as described in [[22]][p0443r7], so they must adhere to the requirements of behavioral properties and the requirements described below.

| Nested Property Type | Nested Property Name | Requirements |
|----------------------|----------------------|--------------|
-| bulk_execution_affinity_t::none_t | bulk_execution_affinity_t::none | A call to an executor's bulk execution function may or may not bind the *execution agent*s to the underlying *execution resource*s. The affinity binding pattern may or may not be consistent across invocations of the executor's bulk execution function. |
+| bulk_execution_affinity_t::none_t | bulk_execution_affinity_t::none | A call to an executor's bulk execution function may or may not bind the *execution agent*s to the underlying *execution resource*s. The affinity binding pattern may or may not be consistent across invocations of the executor's bulk execution function. |
| bulk_execution_affinity_t::scatter_t | bulk_execution_affinity_t::scatter | A call to an executor's bulk execution function must bind the *execution agent*s to the underlying *execution resource*s such that they are distributed across the *execution resource*s, where each *execution agent* is far from its preceding and following *execution agent*s. The affinity binding pattern must be consistent across invocations of the executor's bulk execution function. |
| bulk_execution_affinity_t::compact_t | bulk_execution_affinity_t::compact | A call to an executor's bulk execution function must bind the *execution agent*s to the underlying *execution resource*s such that they are in sequence across the *execution resource*s, where each *execution agent* is close to its preceding and following *execution agent*s. The affinity binding pattern must be consistent across invocations of the executor's bulk execution function. |
| bulk_execution_affinity_t::balanced_t | bulk_execution_affinity_t::balanced | A call to an executor's bulk execution function must bind the *execution agent*s to the underlying *execution resource*s such that they are in sequence and evenly spread across the *execution resource*s, where each *execution agent* is close to its preceding and following *execution agent*s and all *execution resource*s are utilized. The affinity binding pattern must be consistent across invocations of the executor's bulk execution function. |
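For illustration, the sketch below requests the `compact` pattern through the `require` customization point of the executors proposal [[22]][p0443r7]. The executor `exec` and the namespace placement are assumptions of the sketch.

```cpp
// Request compact binding of execution agents before a bulk launch.
// require() is the property-customization point from [22]; executors that
// cannot satisfy the property are rejected at compile time.
template <typename Executor>
auto with_compact_affinity(Executor exec) {
  using namespace std::experimental::execution;  // namespace is illustrative
  // Bulk invocations on the returned executor must bind each execution
  // agent close to its neighbors, consistently across invocations.
  return require(exec, bulk_execution_affinity.compact);
}
```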

@@ -443,7 +498,7 @@ bulk_execution_affinity_t provides nested property types and objects as describe

The `execution_resource` class provides an abstraction over a system's hardware that can allocate memory and/or execute lightweight execution agents. An `execution_resource` can represent further `execution_resource`s. We say that these `execution_resource`s are *members of* this `execution_resource`.

-> [*Note:* The `execution_resource` is required to be implemented such that the underlying software abstraction is initialized when the `execution_resource` is constructed, maintained through reference counting, and cleaned up on destruction of the final reference. *--end note*]
+> [*Note:* Creating an `execution_resource` may require initializing the underlying software abstraction when the `execution_resource` is constructed, in order to discover other `execution_resource`s accessible through it. However, an `execution_resource` is non-owning. *--end note*]

### `execution_resource` constructors

@@ -496,7 +551,7 @@ The `execution_context` class provides an abstraction for managing a number of l

using executor_type = see-below;

-*Requires:* `executor_type` is an implementation defined class which satisfies the general executor requires, as specified by P0443r5.
+*Requires:* `executor_type` is an implementation-defined class which satisfies the general executor requirements, as specified by [[22]][p0443r7].

using pmr_memory_resource_type = see-below;
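A minimal usage sketch follows: construct an `execution_context` from a *resource*, then obtain both the executor and the memory resource it exposes. The accessor names `executor()` and `memory_resource()` are assumptions of the sketch, as is the namespace.

```cpp
#include <memory_resource>
#include <vector>

// Bind a context to a resource, then allocate and execute "close" to it.
// executor() and memory_resource() are assumed accessor names.
template <typename ExecutionResource>
void run_on(const ExecutionResource& res) {
  std::experimental::execution_context ctx{res};  // asserts res is valid

  auto ex = ctx.executor();  // executor_type
  std::pmr::polymorphic_allocator<double> alloc{&ctx.memory_resource()};

  std::pmr::vector<double> data(1024, 0.0, alloc);  // placed with affinity to res
  (void)ex;  // work submitted through ex runs on agents bound to res
}
```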

@@ -629,6 +684,9 @@ The free function `this_thread::get_resource` is provided for retrieving the `ex

*Returns:* The `execution_resource` underlying the current thread of execution.

+> [*Note:* The `execution_resource` underlying the current thread of execution is not necessarily reachable from the "top-level" resources visible through `this_system`. *--end note*]
+

# Future Work

## How should we define the execution context?

This paper currently defines the execution context as a concrete type which provides the essential interface required to be constructed from an `execution_resource` and to provide an affinity-based `allocator` or `pmr::memory_resource` and `executor`. However, going forward, there are a few different directions the execution context could take:

-* A) The execution context could be **the** standard execution context type, which can be used polymorphically in place of any concrete execution context type in a similar way to the polymorphic executor [[22]][p0443r4]. This approach allows it to interoperate well with any concrete execution context type, however it may be very difficult to define exactly what this type should look like as the different kinds of execution contexts are still being developed and all the different requirements are still to be fully understood.
-* B) The execution context could be a concrete executor type itself, used solely for the purpose of being constructed from and managing a set of `execution_resource`s. This approach would allow the execution context to be tailored specific for it's original purpose, however it would be more difficult to support interoperability with other concrete execution context types.
-* C) The execution context could be simply a concept, similar to `OnewayExecutor` or `BulkExecutor`, for executors, where it requires the execution context type to provide the required interface for managing *execution_resource*s. This approach would allow for any concrete execution context type to support necessary interface for managing execution resources by simply implementing the requirements of the concept, and would avoid defining any concrete or generic execution context type.
+* A) The execution context could be **the** standard execution context type, which can be used polymorphically in place of any concrete execution context type, in a similar way to the polymorphic executor [[22]][p0443r7]. This approach allows it to interoperate well with any concrete execution context type; however, it may be very difficult to define exactly what this type should look like, as the different kinds of execution contexts are still being developed and all the different requirements are yet to be fully understood.
+* B) The execution context could be a concrete executor type itself, used solely for the purpose of being constructed from and managing a set of `execution_resource`s. This approach would allow the execution context to be tailored specifically for its intended purpose, but would hinder interoperability with other concrete execution context types.
+* C) The execution context could be simply a concept, similar to `OnewayExecutor` or `BulkExecutor` for executors, that requires the execution context type to provide the required interface for managing *execution_resource*s. This approach would allow for any concrete execution context type to support the necessary interface for managing execution resources by simply implementing the requirements of the concept. It would also avoid defining any concrete or generic execution context type.

| Straw Poll |
|------------|

@@ -666,25 +724,17 @@ With the ability to place memory with affinity comes the ability to define algor

## Level of abstraction

-The current proposal provides an interface for querying whether an `execution_resource` can allocate and/or execute work, it can provide the concurrency it supports and it can provide a name. We also provide the `affinity_query` structure for querying the relative affinity metrics between two `execution_resource`s. However, this may not be enough information for users to take full advantage of the system. For example, they may also want to know what kind of memory is available or the properties by which work is executed. We decided that attempting to enumerate the various hardware components would not be ideal, as that would make it harder for implementors to support new hardware. We think a better approach would be to parameterize the additional properties of hardware such that hardware queries could be much more generic.
+The current proposal provides an interface for querying whether an `execution_resource` can allocate and/or execute work, the concurrency it supports, and its name. We also provide the `affinity_query` structure for querying the relative affinity metrics between two `execution_resource`s. However, this may not be enough information for users to take full advantage of the system. For example, they may also want to know what kind of memory is available or the properties by which work is executed. We decided that attempting to enumerate the various hardware components would not be ideal, as that would make it harder for implementers to support new hardware. We think a better approach would be to parameterize the additional properties of hardware such that hardware queries could be much more generic.

-We may wish to mirror the design of the executors proposal and have a generic query interface using properties for querying information about an `execution_resource`. We expect that an implementation may provide additional nonstandard, implementation-specific queries.
+We may wish to mirror the design of the executors proposal [[22]][p0443r7] and have a generic query interface using properties for querying information about an `execution_resource`. We expect that an implementation may provide additional nonstandard, implementation-specific queries.
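As an illustration of what such a property-based interface could look like, the sketch below mirrors the `query` mechanism of the executors proposal [[22]][p0443r7]. The `memory_kind` property is invented for this sketch; the straw poll that follows asks what the real set of properties should be.

```cpp
// memory_kind_t is an invented property tag, used purely for illustration.
struct memory_kind_t {};
inline constexpr memory_kind_t memory_kind{};

// Mirror the executors proposal's query() mechanism [22]: standard
// properties could be queried generically, and implementations could add
// nonstandard, implementation-specific properties alongside them.
template <typename ExecutionResource>
auto memory_kind_of(const ExecutionResource& res) {
  return std::experimental::execution::query(res, memory_kind);
}
```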

| Straw Poll |
|------------|
| Is this the correct approach to take? If so, what should such an interface look like and what kind of hardware properties should we expose? |

-## Dynamic topology discovery
-
-The current proposal requires that all `execution_resource`s are initialized before `main` is called. This therefore does not permit an `execution_resource` to become available or go off-line at run time. We may wish to support this in the future, however this is outside of the scope of this paper at the moment.
-
-| Straw Poll |
-|------------|
-| Should we support dynamically adding and removing `execution_resource`s at run time? |
-
-# Acknowledgements
+# Acknowledgments

-Thanks to Christopher Di Bella, Toomas Remmelg and Morris Hafner for their reviews and suggestions.
+Thanks to Christopher Di Bella, Toomas Remmelg, and Morris Hafner for their reviews and suggestions.

# References

[lstopo]: https://www.open-mpi.org/projects/hwloc/lstopo/
[[21]][lstopo] Portable Hardware Locality lstopo

-[p0443r4]: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0443r4.html
-[[22]][p0443r4] A Unified Executors Proposal for C++
+[p0443r7]: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0443r7.html
+[[22]][p0443r7] A Unified Executors Proposal for C++

[p0737r0]: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0737r0.html
[[23]][p0737r0] P0737r0: Execution Context of Execution Agents