class: center, middle
Previous: Hello World! - Options and Running Applications
??? Click here to view the Presentation
- DISCLAIMER: This has no real physics or mathematical background. Left as an excercise for the reader ;)
.right-column[
- Running a 5 point stencil on each element:
new(x,y) = 0.25 * (old(x-1,y) + old(x+1,y) + old(x,y-1) + old(x,y+1)) - old(x,y)
- Boundaries are a constant
1
- High-Level Modern C++ ]
- Goal: Store Grid as
std::vector<double>
- Goal: Store Grid as
std::vector<double>
- Goal: Store Grid as
std::vector<double>
- We want to update line by line
- Each line needs its upper and lower neighbor
--
template <typename InIter, typename OutIter>
OutIter line_update(InIter begin, InIter end, OutIter result)
{
++result;
// Iterate over the interior: skip the last and first element
for(InIter it = begin + 1; it != end - 1; ++it, ++result)
{
*result = 0.25 * (it.up[-1] + it.up[+1] + it.down[-1] + it.down[+1])
- *it.middle;
}
++result;
return result;
}
- Feasible to make it hierarchical:
- One iterator that "moves" in y-Direction (row wise)
row_iterator
- One iterator that "moves" in x-Direction (operates on a line)
line_iterator
- One iterator that "moves" in y-Direction (row wise)
template <typename UpIter, typename MiddleIter, typename DownIter>
struct line_iterator
{
UpIter up;
MiddleIter middle;
DownIter down;
void increment() {
++up;
++middle;
++down;
}
void decrement() {
--up;
--middle;
--down;
}
void advance(std::ptrdiff_t n) {
up += n;
middle += n;
down += n;
}
};
template <typename UpIter, typename MiddleIter, typename DownIter>
struct line_iterator
// iterator_facade is a facade class that defines the boilerplate needed for
// a proper standard C++ iterator.
: hpx::util::iterator_facade<
// Our type:
line_iterator<UpIter, MiddleIter, DownIter>,
// Value type (When dereferencing the iterator)
double,
// Our iterator is random access.
std::random_access_iterator_tag>
{
// <snip>
bool equal(line_iterator const& other) const {
return middle == other.middle;
}
typename base_type::reference dereference() const {
return *middle;
}
std::ptrdiff_t distance_to(line_iterator const& other) const {
return other.middle - middle;
}
};
template <typename UpIter, typename MiddleIter, typename DownIter>
struct row_iterator
// iterator_facade is a facade class that defines the boilerplate needed for
// a proper standard C++ iterator.
: hpx::util::iterator_facade<
// Our type:
row_iterator<UpIter, MiddleIter, DownIter>,
// Value type (When dereferencing the iterator)
line_iterator<UpIter, MiddleIter, DownIter>,
// Our iterator is random access.
std::random_access_iterator_tag,
// Since dereferencing should return a new line_iterator, we need to
// explicitly set the reference type.
line_iterator<UpIter, MiddleIter, DownIter>
>
{
typedef line_iterator<UpIter, MiddleIter, DownIter> line_iterator_type;
row_iterator(std::size_t Nx, MiddleIter middle_)
: up_(middle - Nx)
, middle(middle_)
, down_(middle + Nx)
, Nx_(Nx)
{}
// More next slide ...
// ... Continuing
MiddleIter middle;
line_iterator<UpIter, MiddleIter, DownIter> dereference() const
{
return line_iterator<UpIter, MiddleIter, DownIter>(up_, middle, down_);
}
void advance(std::ptrdiff_t n)
{
up_ += (n * Nx_);
middle += (n * Nx_);
down_ += (n * Nx_);
}
UpIter up_;
DownIter down_;
std::size_t Nx_;
// other functions go here...
};
// Initialization
typedef std::vector<double> data_type;
std::array<data_type, 2> U;
U[0] = data_type(Nx * Ny, 0.0);
U[1] = data_type(Nx * Ny, 0.0);
init(U, Nx, Ny);
typedef row_iterator<std::vector<double>::iterator> iterator;
// Construct our column iterators. We want to begin with the second
// row to avoid out of bound accesses.
iterator curr(Nx, U[0].begin());
iterator next(Nx, U[1].begin());
for (std::size_t t = 0; t < steps; ++t)
{
// We store the result of our update in the next middle line.
// We need to skip the first row.
auto result = next.middle + Nx;
// Iterate over the interior: skip the first and last column
for(auto it = curr + 1; it != curr + Ny - 1; ++it)
{
result = line_update(*it, *it + Nx, result);
}
std::swap(curr, next);
}
- Instead of using a for loop, use a Parallel Algorithm.
- But which?
--
- One we I haven't shown:
for_loop
for_loop
// We store the result of our update in the next middle line.
hpx::parallel::for_loop(policy,
curr + 1, curr + Ny-1,
// We need to advance the result by one row each iteration
hpx::parallel::induction(next.middle + Nx, Nx),
[Nx](iterator it, data_type::iterator result)
{
line_update(*it, *it + Nx, result);
}
);
- Get the NUMA targets
- Create right allocator
- Create right data structure
- Create right executor
- Create the parallel execution policy
- Solution
- Synchronization between partitions
- Possibilities:
- Have one master that dispatches to other localities
- Use neighborhood synchronization and let every locality do the same
- Have one master that dispatches to other localities
- Synchronization between partitions
- Possibilities:
- Have one master that dispatches to other localities
- Left as an excercise for the audience ;)
- Use neighborhood synchronization and let every locality do the same
- Using a channel
- Have one master that dispatches to other localities
- A Global object you can set and get values from multiple times:
- sender uses
channel<T>::get(launch_policy, generation)
- receiver uses
channel<T>::set(launch_policy, value, generation)
- sender uses
- Generation can be used to differentiate between iterations!
- Use the source, Luke
hpx::lcos::channel<T>
is a client to a component.- GIDs and clients can be associated with symbolic names
hpx::register_with_basename(name, channel, rank);
- GIDS and clients can be found by its symbolic names
channel = hpx::find_from_basename<channel<T>>(name, rank);
-
If a partition has an upper neighbor
- It needs to send its first row up
- It needs to recieve the last row from its upper neighbor
-
If a partition has a lower neighbor
- It needs to send its last row down
- It needs to recieve the first row from its lower neighbor
- Update first row boundary
if (comm.has_neighbor(communicator_type::up))
{
// Get the first row.
auto result = next.middle;
// retrieve the row which is 'up' from our first row.
std::vector<double> up = comm.get(communicator_type::up, t).get();
// Create a row iterator with that top boundary
auto it = curr.top_boundary(up);
// After getting our missing row, we can update our first row
line_update(it, it + Nx, result);
// Finally, we can send the updated first row for our neighbor
// to consume in the next timestep. Don't send if we are on
// the last timestep
comm.set(communicator_type::up,
std::vector<double>(result, result + Nx), t + 1);
}
- Update last row boundary
// Update our lower boundary if we have an interior partition and a
// neighbor below
if (comm.has_neighbor(communicator_type::down))
{
// Get the last row.
auto result = next.middle + (Ny - 2) * Nx;
// retrieve the row which is 'down' from our last row.
std::vector<double> down = comm.get(communicator_type::down, t).get();
// Create a row iterator with that bottom boundary
auto it = (curr + Ny - 2).bottom_boundary(down);
// After getting our missing row, we can update our last row
line_update(it, it + Nx, result);
// Finally, we can send the updated last row for our neighbor
// to consume in the next timestep. Don't send if we are on
// the last timestep
comm.set(communicator_type::down,
std::vector<double>(result, result + Nx), t + 1);
}
-
The default problem size is pretty small.
- You might want to consider increasing the problem size (--Nx, --Ny command line arguments)
-
We use striping in y-Direction
- Not a very wise partitioning scheme
- Easy for demonstration purposes
class: center, middle
class: center, middle