Name	Name	Last commit message	Last commit date
parent directory ..
images	images
README.md	README.md
index.html	index.html

Worked 2D Stencil Example

From Serial to Distributed

Overview

Previous: Hello World! - Options and Running Applications

??? Click here to view the Presentation

The Problem we want to solve

"Solving" a PDE on a 2D Grid

DISCLAIMER: This has no real physics or mathematical background. Left as an excercise for the reader ;)

Running a 5 point stencil on each element:
new(x,y) = 0.25 * (old(x-1,y) + old(x+1,y) + old(x,y-1) + old(x,y+1)) - old(x,y)
Boundaries are a constant 1
High-Level Modern C++ ]

Modeling the Stencil in C++

Memory structure

Goal: Store Grid as std::vector<double>

Modeling the Stencil in C++

Memory structure

Goal: Store Grid as std::vector<double>

Modeling the Stencil in C++

Memory structure

Goal: Store Grid as std::vector<double>

Modeling the Stencil in C++

Iterating the Grid: Applying the stencil

We want to update line by line
Each line needs its upper and lower neighbor

template <typename InIter, typename OutIter>
OutIter line_update(InIter begin, InIter end, OutIter result)
{
    ++result;
    // Iterate over the interior: skip the last and first element
    for(InIter it = begin + 1; it != end - 1; ++it, ++result)
    {
        *result = 0.25 * (it.up[-1] + it.up[+1] + it.down[-1]  + it.down[+1])
            - *it.middle;
    }
    ++result;

    return result;
}

Modeling the Stencil in C++

Iterating the Grid: Designing the Iterator

Feasible to make it hierarchical:
- One iterator that "moves" in y-Direction (row wise)
  row_iterator
- One iterator that "moves" in x-Direction (operates on a line)
  line_iterator

Modeling the Stencil in C++

Iterating the Grid: The Line Iterator

template <typename UpIter, typename MiddleIter, typename DownIter>
struct line_iterator
{
    UpIter up;
    MiddleIter middle;
    DownIter down;

    void increment() {
        ++up;
        ++middle;
        ++down;
    }

    void decrement() {
        --up;
        --middle;
        --down;
    }

    void advance(std::ptrdiff_t n) {
        up += n;
        middle += n;
        down += n;
    }
};

Modeling the Stencil in C++

Iterating the Grid: The Line Iterator, make it a proper iterator

template <typename UpIter, typename MiddleIter, typename DownIter>
struct line_iterator
    // iterator_facade is a facade class that defines the boilerplate needed for
    // a proper standard C++ iterator.
  : hpx::util::iterator_facade<
        // Our type:
        line_iterator<UpIter, MiddleIter, DownIter>,
        // Value type (When dereferencing the iterator)
        double,
        // Our iterator is random access.
        std::random_access_iterator_tag>
{
    // <snip>

    bool equal(line_iterator const& other) const {
        return middle == other.middle;
    }

    typename base_type::reference dereference() const {
        return *middle;
    }

    std::ptrdiff_t distance_to(line_iterator const& other) const {
        return other.middle - middle;
    }
};

Complete code

Modeling the Stencil in C++

Iterating the Grid: The Row Iterator

template <typename UpIter, typename MiddleIter, typename DownIter>
struct row_iterator
    // iterator_facade is a facade class that defines the boilerplate needed for
    // a proper standard C++ iterator.
  : hpx::util::iterator_facade<
        // Our type:
        row_iterator<UpIter, MiddleIter, DownIter>,
        // Value type (When dereferencing the iterator)
        line_iterator<UpIter, MiddleIter, DownIter>,
        // Our iterator is random access.
        std::random_access_iterator_tag,
        // Since dereferencing should return a new line_iterator, we need to
        // explicitly set the reference type.
        line_iterator<UpIter, MiddleIter, DownIter>
    >
{
    typedef line_iterator<UpIter, MiddleIter, DownIter> line_iterator_type;

    row_iterator(std::size_t Nx, MiddleIter middle_)
      : up_(middle - Nx)
      , middle(middle_)
      , down_(middle + Nx)
      , Nx_(Nx)
    {}
    // More next slide ...

Modeling the Stencil in C++

Iterating the Grid: The Row Iterator, continued

    // ... Continuing
    MiddleIter middle;

    line_iterator<UpIter, MiddleIter, DownIter> dereference() const
    {
        return line_iterator<UpIter, MiddleIter, DownIter>(up_, middle, down_);
    }

    void advance(std::ptrdiff_t n)
    {
        up_ += (n * Nx_);
        middle += (n * Nx_);
        down_ += (n * Nx_);
    }

    UpIter up_;
    DownIter down_;
    std::size_t Nx_;

    // other functions go here...
};

Complete code

Modeling the Stencil in C++

Putting it all together - Initialization

    // Initialization
    typedef std::vector<double> data_type;

    std::array<data_type, 2> U;

    U[0] = data_type(Nx * Ny, 0.0);
    U[1] = data_type(Nx * Ny, 0.0);

    init(U, Nx, Ny);

Modeling the Stencil in C++

Putting it all together - The solver

    typedef row_iterator<std::vector<double>::iterator> iterator;
    // Construct our column iterators. We want to begin with the second
    // row to avoid out of bound accesses.
    iterator curr(Nx, U[0].begin());
    iterator next(Nx, U[1].begin());

    for (std::size_t t = 0; t < steps; ++t)
    {
        // We store the result of our update in the next middle line.
        // We need to skip the first row.
        auto result = next.middle + Nx;

        // Iterate over the interior: skip the first and last column
        for(auto it = curr + 1; it != curr + Ny - 1; ++it)
        {
            result = line_update(*it, *it + Nx, result);
        }

        std::swap(curr, next);
    }

Complete code

Modeling the Stencil in C++ with HPX

Adding shared memory parallelism

Instead of using a for loop, use a Parallel Algorithm.
But which?

One we I haven't shown: for_loop

Modeling the Stencil in C++ with HPX

Adding shared memory parallelism

for_loop

        // We store the result of our update in the next middle line.
        hpx::parallel::for_loop(policy,
            curr + 1, curr + Ny-1,
            // We need to advance the result by one row each iteration
            hpx::parallel::induction(next.middle + Nx, Nx),
            [Nx](iterator it, data_type::iterator result)
            {
                line_update(*it, *it + Nx, result);
            }
        );

Modeling the Stencil in C++ with HPX

Adding NUMA awareness

Get the NUMA targets
Create right allocator
Create right data structure
Create right executor
Create the parallel execution policy
Solution

Modeling the Stencil in C++ with HPX

Going distributed

Synchronization between partitions
Possibilities:
- Have one master that dispatches to other localities
- Use neighborhood synchronization and let every locality do the same

Modeling the Stencil in C++ with HPX

Going distributed

Synchronization between partitions
Possibilities:
- Have one master that dispatches to other localities
  - Left as an excercise for the audience ;)
- Use neighborhood synchronization and let every locality do the same
  - Using a channel

The Channel LCO

Synchronization between Threads of Control

A Global object you can set and get values from multiple times:
- sender uses channel<T>::get(launch_policy, generation)
- receiver uses channel<T>::set(launch_policy, value, generation)
Generation can be used to differentiate between iterations!
Use the source, Luke

Symbolic names

Resolving Global Objects by name

hpx::lcos::channel<T> is a client to a component.
GIDs and clients can be associated with symbolic names
- hpx::register_with_basename(name, channel, rank);
GIDS and clients can be found by its symbolic names
- channel = hpx::find_from_basename<channel<T>>(name, rank);

Read the Documentation!

Parallelization

Communication between Partitions

If a partition has an upper neighbor
- It needs to send its first row up
- It needs to recieve the last row from its upper neighbor
If a partition has a lower neighbor
- It needs to send its last row down
- It needs to recieve the first row from its lower neighbor
Full Code

Modeling the Stencil in C++ with HPX

First Distributed Version

Update first row boundary

if (comm.has_neighbor(communicator_type::up))
{
    // Get the first row.
    auto result = next.middle;
    // retrieve the row which is 'up' from our first row.
    std::vector<double> up = comm.get(communicator_type::up, t).get();
    // Create a row iterator with that top boundary
    auto it = curr.top_boundary(up);

    // After getting our missing row, we can update our first row
    line_update(it, it + Nx, result);

    // Finally, we can send the updated first row for our neighbor
    // to consume in the next timestep. Don't send if we are on
    // the last timestep
    comm.set(communicator_type::up,
        std::vector<double>(result, result + Nx), t + 1);
}

Full Code

Modeling the Stencil in C++ with HPX

First Distributed Version

Update last row boundary

// Update our lower boundary if we have an interior partition and a
// neighbor below
if (comm.has_neighbor(communicator_type::down))
{
    // Get the last row.
    auto result = next.middle + (Ny - 2) * Nx;
    // retrieve the row which is 'down' from our last row.
    std::vector<double> down = comm.get(communicator_type::down, t).get();
    // Create a row iterator with that bottom boundary
    auto it = (curr + Ny - 2).bottom_boundary(down);
    // After getting our missing row, we can update our last row
    line_update(it, it + Nx, result);

    // Finally, we can send the updated last row for our neighbor
    // to consume in the next timestep. Don't send if we are on
    // the last timestep
    comm.set(communicator_type::down,
        std::vector<double>(result, result + Nx), t + 1);
}

Full Code

Modeling the Stencil in C++ with HPX

Notes on running the example

The default problem size is pretty small.
- You might want to consider increasing the problem size (--Nx, --Ny command line arguments)
We use striping in y-Direction
- Not a very wise partitioning scheme
- Easy for demonstration purposes

Modeling the Stencil in C++ with HPX

How good does it scale?

Resource managment - Keeping things under control

Files

session5

Directory actions

More options

Directory actions

More options

Latest commit

History

session5

Folders and files

parent directory

README.md

Worked 2D Stencil Example

From Serial to Distributed

The Problem we want to solve

"Solving" a PDE on a 2D Grid

Modeling the Stencil in C++

Memory structure

Modeling the Stencil in C++

Memory structure

Modeling the Stencil in C++

Memory structure

Modeling the Stencil in C++

Iterating the Grid: Applying the stencil

Modeling the Stencil in C++

Iterating the Grid: Designing the Iterator

Modeling the Stencil in C++

Iterating the Grid: The Line Iterator

Modeling the Stencil in C++

Iterating the Grid: The Line Iterator, make it a proper iterator

Modeling the Stencil in C++

Iterating the Grid: The Row Iterator

Modeling the Stencil in C++

Iterating the Grid: The Row Iterator, continued

Modeling the Stencil in C++

Putting it all together - Initialization

Modeling the Stencil in C++

Putting it all together - The solver

Modeling the Stencil in C++ with HPX

Adding shared memory parallelism

Modeling the Stencil in C++ with HPX

Adding shared memory parallelism

Modeling the Stencil in C++ with HPX

Adding NUMA awareness

Modeling the Stencil in C++ with HPX

Going distributed

Modeling the Stencil in C++ with HPX

Going distributed

The Channel LCO

Synchronization between Threads of Control

Symbolic names

Resolving Global Objects by name

Parallelization

Communication between Partitions

Modeling the Stencil in C++ with HPX

First Distributed Version

Modeling the Stencil in C++ with HPX

First Distributed Version

Modeling the Stencil in C++ with HPX

Notes on running the example

Modeling the Stencil in C++ with HPX

How good does it scale?

Next