Testing: RMA AxpyInterface #34

Open
wants to merge 112 commits into
base: master
Changes from 109 commits
Commits
969d18d
modified axpy:localtoglobal code to mimic nblocking consensus as show…
sg0 Jun 12, 2014
b8e2e8d
forgot to add mpi-3 macros around a variable
sg0 Jun 12, 2014
61071e7
preposting receives, alternative to iprobe
sg0 Jun 12, 2014
2d52d96
contains code to avoid probe overhead for short messages, Jeffs rma i…
sg0 Jun 17, 2014
28faf25
committing after a week, removed prepost related stuff because of poo…
sg0 Jun 26, 2014
a9c658c
merging with upstream
sg0 Jun 26, 2014
b3e2ec9
forgot a const
sg0 Jun 26, 2014
1f9eb06
populating/fixing rmainterface...
sg0 Jun 26, 2014
e636c1d
intermediate commits
sg0 Jun 27, 2014
aa88f1b
updated macro names - use-barrier to use-nbc, also added provisions t…
sg0 Jun 30, 2014
e4a5d6c
comments
sg0 Jun 30, 2014
d64331e
forgot to add a constructor
sg0 Jun 30, 2014
6c844d7
added a requestfree function to free spurious requests, also added a …
sg0 Jul 1, 2014
e12bd08
fixing put
sg0 Jul 2, 2014
f55685f
modifications to fix errors in p/g/a
sg0 Jul 7, 2014
18598b7
intermediate commit, cleaning up rma interface
sg0 Jul 7, 2014
5432b3f
removed spurious instantiations
sg0 Jul 7, 2014
8a36dd5
added some checks in p/g/a
sg0 Jul 7, 2014
26034bd
intermediate commit
sg0 Jul 7, 2014
46648f9
added a comment in Get
sg0 Jul 7, 2014
e160ce6
fixed dtype of mpi-get, updated comment
sg0 Jul 7, 2014
3fa7135
intermediate commit - introduce another variable to track attachments
sg0 Jul 8, 2014
54eba3e
bugs wrt get-put flags
sg0 Jul 8, 2014
cd00d7d
fixing detach
sg0 Jul 8, 2014
add6b7d
added another flag to track detach, as dtor will call detach again, p…
sg0 Jul 8, 2014
d03441f
temporarily reverting back to something that works...the idea is, whe…
sg0 Jul 8, 2014
2a914b2
added rmainterface test in line with axpyinterface, removed all heade…
sg0 Jul 9, 2014
a759aba
called flush local after p/g/a, modified put-getvector types and init…
sg0 Jul 11, 2014
7c169ed
added disp in mpi functions
sg0 Jul 11, 2014
15263ff
figured out the displacement, Acc contains correct displacement, modi…
sg0 Jul 16, 2014
194298e
addding interface with no local flush, IMO this is temporary, for som…
sg0 Jul 16, 2014
8cbc72f
fixed acc and flush...next is get
sg0 Jul 16, 2014
77918b2
trying to fix get, tracked some problem in acc, will fix it soon
sg0 Jul 16, 2014
51efcae
removed op from acc in rmainterface as well as mpi, mpi has overloade…
sg0 Jul 16, 2014
da57bb1
intermediate commit to fixing get
sg0 Jul 16, 2014
be45ddf
fixed get
sg0 Jul 16, 2014
dfe393e
trying to fix acc...
sg0 Jul 16, 2014
b0411ba
modified one-sided interface significantly, using typemap everywhere …
sg0 Jul 18, 2014
c18aab4
fixing rma interface - intmd commit
sg0 Jul 21, 2014
3a4ca35
removed const_cast and changed type of put-get vectors from byte to T
sg0 Jul 21, 2014
47862e0
turning off codepath not required for nb consensus
sg0 Jul 22, 2014
27c8da6
some modifications for nb consensus
sg0 Jul 23, 2014
ec43dd3
restructure detach for nb consensus
sg0 Jul 23, 2014
b70731e
add a progress routine
sg0 Jul 23, 2014
5fc36e9
added strided/vector api, untested...removed scale from put
sg0 Jul 23, 2014
f76e507
committing test codes, further updates upon testing would ensue
sg0 Jul 24, 2014
5de32c4
some more testing, better debug prints
sg0 Jul 24, 2014
9f29de0
added HF proxy using original AxpyInterface, got to test well
sg0 Jul 24, 2014
c6a067b
following AXPYinterface syntax
sg0 Jul 24, 2014
79cddb9
exploring an alternate design of nbc, where notification is sent imme…
sg0 Jul 25, 2014
cbd7ff5
made a modification in nb consensus, unless there is a strong reason …
sg0 Jul 28, 2014
e88c840
commented out some of the stuff I had for nbc, it will get completely…
sg0 Jul 28, 2014
c8ac446
intermediate commit, removed code paths that required request objects…
sg0 Jul 29, 2014
e24ae01
removed scale as a parameter from RmaInterface, added a macro to enab…
sg0 Jul 29, 2014
1eea9cd
compacted acc function, got to test this
sg0 Jul 29, 2014
85dde6a
remove const from Get params, in Get matrix always need to be updatable
sg0 Jul 29, 2014
bca226c
move winfree and winunlock in try block
sg0 Jul 29, 2014
e7b44c9
removing a barrier in detach, probably not required, we'll see
sg0 Jul 31, 2014
b5e1ddb
testing an implementation for nbc, this should be slightly fast, if n…
sg0 Aug 4, 2014
bc0441a
minor
sg0 Aug 5, 2014
8436525
fixing nbc
sg0 Aug 5, 2014
062565c
cp axpyinterface to axpyint2...optimized 2 sided implementation
sg0 Aug 6, 2014
793a1a9
intermediate commit for axpy 2.0, to fix rma and orig nbc soon
sg0 Aug 9, 2014
db50fa2
intermediate commits, got to fix nbc next, then start over with axpy2
sg0 Aug 9, 2014
941a054
non blocking consensus is possible only for local to global in the cu…
sg0 Aug 10, 2014
718ca7e
intermediate commit, fixing send/recv
sg0 Aug 11, 2014
5dc3697
fixing axpy 2.0...
sg0 Aug 12, 2014
78eb494
the coordinates are wrong (to-do next), but the structure is taking s…
sg0 Aug 12, 2014
c63fda3
intermediate commit
sg0 Aug 13, 2014
fa86f05
packed/unpacked send/receive for local to global...still bugfixing
sg0 Aug 14, 2014
c58be27
remove all packed send/recv stuff, falling back to send/recving coord…
sg0 Aug 14, 2014
3b055f3
updated the nbc version of axpy original, this is better than before
sg0 Aug 14, 2014
02d3b14
modify the ssends to isend for small messages
sg0 Aug 16, 2014
757eb3c
redesigned the nextindex logic completely, I got to do the same thing…
sg0 Aug 18, 2014
924b36c
flush modified, snipped some junk code...however, we should move away…
sg0 Aug 18, 2014
e34ad17
separate struct to handle data and coord...we send coord and data sep…
sg0 Aug 18, 2014
94f8f82
tweaking rma
sg0 Aug 18, 2014
f5dba36
removing functionality to store put/acc statuses...will use flush always
sg0 Aug 18, 2014
40de290
adding local flush after put-acc
sg0 Aug 18, 2014
67624b7
remove flush local from put/acc
sg0 Aug 22, 2014
e4e7218
add a barrier after nbc stuff in original axpyinterface; replace isse…
sg0 Aug 28, 2014
6b4a6e2
axpy2: adding put/get/acc functions which are blocking, the nb ones a…
sg0 Sep 2, 2014
55e4d4b
added axpy blocking interface, added waitall in mpi...blocking functi…
sg0 Sep 10, 2014
af9f3ca
added some new functions in mpi, still fixing blocking interfaces...g…
sg0 Sep 17, 2014
a8a70a8
three small test cases that tests axpy2 blocking and nb, and rma
sg0 Sep 17, 2014
63f2f22
adding a sample makefile for tests, will remove it in future though
sg0 Sep 17, 2014
1bfa059
modified most of axpy2.0, got to fix get tomorrow
sg0 Oct 8, 2014
8d28666
fixing handle_ functions
sg0 Oct 8, 2014
e93dbc6
fixed flush implementation...more testing required
sg0 Oct 9, 2014
9c22a83
modifying Get
sg0 Oct 9, 2014
628f6b0
fixed get
sg0 Oct 10, 2014
048875b
fixed Iget, Get and Flush (for Ig/p/a)
sg0 Oct 31, 2014
ce206e9
modified blocking interfaces
sg0 Nov 8, 2014
a9c85ea
updated get/acc/put...got to test them with larger PEs
sg0 Nov 8, 2014
d6eddd6
updated readinc to localflush instead of flush remote
sg0 Nov 10, 2014
a515315
introduced rma-blocking interface for put/get and ensured local compl…
sg0 Nov 19, 2014
2f802f1
forgot to add cflush in header
sg0 Nov 19, 2014
7ba1cce
forgot to add function signatures
sg0 Nov 19, 2014
ef56b30
test on data rather than coords
sg0 Nov 20, 2014
803305c
added local completion logic
sg0 Nov 20, 2014
57093d6
changed the data structure, now we have coords and matrices as struct…
sg0 Jan 27, 2015
01a44a2
modified locally blocking routines Acc/Put - memcpying input Z to an …
sg0 Feb 11, 2015
5e96051
added const interfaces, some new functions for rmainterface for reque…
sg0 Feb 13, 2015
95c9c4b
added some const interfaces, and fixed an error for request based rma…
sg0 Feb 14, 2015
2fb071c
delete test case
sg0 Feb 23, 2015
85ffff3
merge
sg0 Feb 24, 2015
6bb2242
merge
sg0 Feb 24, 2015
86f4a80
added a macro for rma axpy, so rma axpy gets enabled when mpi version…
sg0 Feb 24, 2015
b235bd2
moved axpy related macros from header file to cmake build files
sg0 Feb 24, 2015
ebb4fa6
applied astyle to fix indentation, would do some more commits
sg0 Feb 25, 2015
0bf9fc5
final round of indentation checks
sg0 Feb 25, 2015
361de46
minor indentation
sg0 Feb 25, 2015
5 changes: 5 additions & 0 deletions CMakeLists.txt
@@ -83,6 +83,11 @@ option(EL_USE_64BIT_INTS "Use 64-bit integers where possible" OFF)
option(EL_USE_CUSTOM_ALLTOALLV "Avoid MPI_Alltoallv for performance reasons" ON)
option(EL_BARRIER_IN_ALLTOALLV "Barrier before posting non-blocking recvs" OFF)

# MPI misc.
# Enable MPI-3 routines
option(EL_ENABLE_RMA_AXPY "Choose new Rma Axpy interface implemented using MPI-3 one sided routines" ON)
option(EL_USE_IBARRIER_FOR_AXPY "Use MPI-3 IBarrier for synchronization in AxpyInterface" ON)

# If the version of METIS packaged with Elemental is to be built (the default),
# then no METIS-specific variables need to be specified, but if the user prefers
# to use their own version, then the root path of the installation should be
5 changes: 5 additions & 0 deletions cmake/config.h.cmake
@@ -79,6 +79,11 @@
#cmakedefine EL_VECTOR_WARNINGS
#cmakedefine EL_AVOID_OMP_FMA

/* MPI-3 related */
#cmakedefine EL_ENABLE_RMA_AXPY
#cmakedefine EL_USE_IBARRIER_FOR_AXPY


#cmakedefine EL_DECLSPEC
#ifdef EL_DECLSPEC
# define EL_EXPORT __declspec(dllexport)
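
When the corresponding CMake options are ON, the two `#cmakedefine` lines added in this hunk turn into plain `#define` lines in the generated config header. The AxpyInterface.hpp changes in this PR then test `EL_USE_IBARRIER_FOR_AXPY` together with `MPI_VERSION`. The following is a minimal sketch of that compile-time selection. It assumes both options are ON and fakes `MPI_VERSION` so it builds without MPI; the names `AxpyTag` and `usesEomMessages` are hypothetical and exist only for illustration.

```cpp
// Stand-ins for what the generated config header would contain with both
// options ON (assumption: a real build gets these from CMake, not by hand).
#define EL_ENABLE_RMA_AXPY
#define EL_USE_IBARRIER_FOR_AXPY

// Assumption: MPI_VERSION normally comes from <mpi.h>. It is faked here so
// the sketch compiles without an MPI installation.
#ifndef MPI_VERSION
#define MPI_VERSION 3
#endif

#if MPI_VERSION>=3 && defined(EL_USE_IBARRIER_FOR_AXPY)
// Ibarrier path: the tag list has no end-of-message entry.
enum AxpyTag { DATA_TAG=1, DATA_REQUEST_TAG=2, DATA_REPLY_TAG=3 };
constexpr bool usesEomMessages = false;
#else
// Fallback path: an explicit EOM tag is part of the protocol.
enum AxpyTag { DATA_TAG=1, EOM_TAG=2, DATA_REQUEST_TAG=3, DATA_REPLY_TAG=4 };
constexpr bool usesEomMessages = true;
#endif
```

Turning `EL_USE_IBARRIER_FOR_AXPY` off at configure time, or building against an MPI older than version 3, would select the second branch.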
2 changes: 2 additions & 0 deletions include/El/core.hpp
@@ -151,6 +151,8 @@ template<typename T=double,Dist U=MC,Dist V=MR> class BlockDistMatrix;
#include "El/core/random/decl.hpp"
#include "El/core/random/impl.hpp"
#include "El/core/AxpyInterface.hpp"
#include "El/core/RmaInterface.hpp"
#include "El/core/AxpyInterface2.0.hpp"

#include "El/core/Graph.hpp"
// TODO: Sequential map
60 changes: 39 additions & 21 deletions include/El/core/AxpyInterface.hpp
@@ -28,7 +28,7 @@ class AxpyInterface
public:
AxpyInterface();
~AxpyInterface();

AxpyInterface( AxpyType type, DistMatrix<T,MC,MR>& Z );
AxpyInterface( AxpyType type, const DistMatrix<T,MC,MR>& Z );

@@ -41,48 +41,66 @@ class AxpyInterface
void Detach();

private:
#if MPI_VERSION>=3 && defined(EL_USE_IBARRIER_FOR_AXPY)
static const Int
DATA_TAG =1,
DATA_REQUEST_TAG=2,
DATA_REPLY_TAG =3;
#else
static const Int
DATA_TAG =1,
EOM_TAG =2,
DATA_REQUEST_TAG=3,
DATA_REPLY_TAG =4;
#endif

//request object for polling on Issends
bool attachedForLocalToGlobal_, attachedForGlobalToLocal_;

DistMatrix<T,MC,MR>* localToGlobalMat_;
const DistMatrix<T,MC,MR>* globalToLocalMat_;

vector<bool> sentEomTo_, haveEomFrom_;
vector<byte> recvVector_;
vector<mpi::Request> eomSendRequests_;

vector<deque<vector<byte>>> dataVectors_, requestVectors_, replyVectors_;
vector<deque<bool>> sendingData_, sendingRequest_, sendingReply_;
vector<deque<mpi::Request>>
#if MPI_VERSION>=3 && defined(EL_USE_IBARRIER_FOR_AXPY)
#else
std::vector<bool> sentEomTo_, haveEomFrom_;
std::vector<mpi::Request> eomSendRequests_;
#endif

std::vector<std::deque<bool>>
sendingData_, sendingRequest_, sendingReply_;
std::vector<std::deque<mpi::Request>>
Review comment from a project member: std::vector and std::deque were imported into the El namespace, so it is now okay (and preferred) to write just "vector" and "deque" (the same holds for "cout", "cerr", and "endl").
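
A minimal sketch of what the comment above describes, assuming the import is done with using-declarations. The namespace `Sketch` and the function `MakeStatuses` are hypothetical stand-ins, not Elemental code.

```cpp
#include <deque>
#include <vector>

namespace Sketch {   // hypothetical stand-in for namespace El

// Import the std names once, at namespace scope.
using std::deque;
using std::vector;

// Inside the namespace, the unqualified names now refer to the std containers.
inline vector<deque<bool>> MakeStatuses( int numRanks )
{ return vector<deque<bool>>( numRanks ); }

} // namespace Sketch
```

With such declarations in place, the member declarations in this hunk could drop the `std::` prefix without changing their types.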

dataSendRequests_, requestSendRequests_, replySendRequests_;


std::vector<byte> recvVector_;
std::vector<std::deque<std::vector<byte>>>
dataVectors_, requestVectors_, replyVectors_;

byte sendDummy_, recvDummy_;

// Progress functions
#if MPI_VERSION>=3 && defined(EL_USE_IBARRIER_FOR_AXPY)
bool ReturnRequestStatuses();
#else
// Check if we are done with this attachment's work
bool Finished();

// Progress functions
void UpdateRequestStatuses();
void HandleEoms();
void HandleLocalToGlobalData();
void HandleGlobalToLocalRequest();
void StartSendingEoms();
void FinishSendingEoms();

void AxpyLocalToGlobal( T alpha, const Matrix<T>& X, Int i, Int j );
void AxpyGlobalToLocal( T alpha, Matrix<T>& Y, Int i, Int j );
void UpdateRequestStatuses();
#endif

Int ReadyForSend
( Int sendSize,
deque<vector<byte>>& sendVectors,
deque<mpi::Request>& requests,
deque<bool>& requestStatuses );
std::deque<std::vector<byte>>& sendVectors,
std::deque<mpi::Request>& requests,
std::deque<bool>& requestStatuses );

void HandleLocalToGlobalData();
void HandleGlobalToLocalRequest();

void AxpyLocalToGlobal( T alpha, const Matrix<T>& X, Int i, Int j );
void AxpyGlobalToLocal( T alpha, Matrix<T>& Y, Int i, Int j );
};

} // namespace El

#endif // ifndef EL_AXPYINTERFACE_HPP
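
ReadyForSend, declared in the header above, takes a send size plus three parallel deques (send buffers, requests, and request statuses) and returns an index. One plausible reading of that signature is a slot pool: reuse a buffer whose earlier send has completed, otherwise grow all three deques together. The sketch below illustrates that idea only; it is not the PR's implementation. `FakeRequest`, `Byte`, and `ReadyForSendSketch` are hypothetical names, and the `done` flag stands in for a completion test on a real MPI request.

```cpp
#include <deque>
#include <vector>

using Byte = unsigned char;

// Hypothetical stand-in for mpi::Request: 'done' plays the role of a
// successful completion test on a real request.
struct FakeRequest { bool done; };

// Returns the index of a send buffer of 'sendSize' bytes that is safe to fill.
int ReadyForSendSketch
( int sendSize,
  std::deque<std::vector<Byte>>& sendVectors,
  std::deque<FakeRequest>& requests,
  std::deque<bool>& requestStatuses )
{
    const int numCreated = static_cast<int>( sendVectors.size() );
    for( int i=0; i<numCreated; ++i )
    {
        // A slot is free if it is not marked busy or its request has finished.
        const bool isFree = !requestStatuses[i] || requests[i].done;
        if( isFree )
        {
            requestStatuses[i] = true;   // mark the slot busy again
            requests[i].done = false;    // a new send is now "in flight"
            sendVectors[i].resize( sendSize );
            return i;
        }
    }
    // No free slot: grow all three deques in lockstep.
    sendVectors.push_back( std::vector<Byte>( sendSize ) );
    requests.push_back( FakeRequest{ false } );
    requestStatuses.push_back( true );
    return numCreated;
}
```

Keeping the three deques the same length means one index identifies a buffer, its request, and its busy flag.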
147 changes: 147 additions & 0 deletions include/El/core/AxpyInterface2.0.hpp
@@ -0,0 +1,147 @@
/*
This file is part of Elemental and is under the BSD 2-Clause License,
which can be found in the LICENSE file in the root directory, or at
http://opensource.org/licenses/BSD-2-Clause
*/
#pragma once
#ifndef EL_AXPYINTERFACE2_HPP
#define EL_AXPYINTERFACE2_HPP

namespace El {
template<typename T>
class AxpyInterface2
{
public:
AxpyInterface2();
~AxpyInterface2();

AxpyInterface2( DistMatrix<T,MC,MR>& Z );
AxpyInterface2( const DistMatrix<T,MC,MR>& Z );

// collective epoch initialization routines
void Attach( DistMatrix<T,MC,MR>& Z );
void Attach( const DistMatrix<T,MC,MR>& Z );
void Detach();

// remote update routines

// requires Flush for local+remote
// completion
void Iput( Matrix<T>& Z, Int i, Int j );
void Iput( const Matrix<T>& Z, Int i, Int j );

void Iget( Matrix<T>& Z, Int i, Int j );

void Iacc( Matrix<T>& Z, Int i, Int j );
void Iacc( const Matrix<T>& Z, Int i, Int j );

// locally blocking update routines
// reuse input buffer when returns
void Acc( Matrix<T>& Z, Int i, Int j );
void Acc( const Matrix<T>& Z, Int i, Int j );

void Put( Matrix<T>& Z, Int i, Int j );
void Put( const Matrix<T>& Z, Int i, Int j );

// End to End blocking
// will be deprecated soon
void Eacc( Matrix<T>& Z, Int i, Int j );
void Eacc( const Matrix<T>& Z, Int i, Int j );

void Eput( Matrix<T>& Z, Int i, Int j );
void Eput( const Matrix<T>& Z, Int i, Int j );

void Get( Matrix<T>& Z, Int i, Int j );

// synchronization routines
void Flush( Matrix<T>& Z );
void Flush( const Matrix<T>& Z );

private:

static const Int
DATA_PUT_TAG =1,
DATA_GET_TAG =2,
DATA_ACC_TAG =3,
REQUEST_GET_TAG =4,
COORD_ACC_TAG =5,
COORD_PUT_TAG =6;

// struct for passing data
struct matrix_params_
{
const void *base_;
std::vector<std::deque<std::vector<T>>>
data_;
std::vector<std::deque<mpi::Request>>
requests_;
std::vector<std::deque<bool>>
statuses_;
};

std::vector<struct matrix_params_> matrices_;

// struct for passing coordinates
struct coord_params_
{
const void *base_;
std::vector<std::deque<std::array<Int, 3>>>
coord_;
std::vector<std::deque<mpi::Request>>
requests_;
std::vector<std::deque<bool>>
statuses_;
};

std::vector<struct coord_params_> coords_;

// for blocking interface
// copying input buffer in this
// intermediate buffer so that input
// buffer could be reused
std::vector<std::vector<std::vector< T >>>
dataVectors_;

DistMatrix<T,MC,MR>* GlobalArrayPut_;
const DistMatrix<T,MC,MR>* GlobalArrayGet_;

bool toBeAttachedForPut_, toBeAttachedForGet_,
attached_, detached_;

// next index for data and coord
Int NextIndexData (
Int target,
Int dataSize,
const void* base_address,
Int *mindex);

Int NextIndexCoord (
Int i, Int j,
Int target,
const void* base_address,
Int *cindex);

bool Testall();
bool Test( Matrix<T>& Z );
bool Test( const Matrix<T>& Z );
bool TestAny( Matrix<T>& Z );
bool TestAny( const Matrix<T>& Z );

void Waitall();
void Wait( Matrix<T>& Z );
void Wait( const Matrix<T>& Z );
void WaitAny( Matrix<T>& Z );
void WaitAny( const Matrix<T>& Z );

// these are only used for nonblocking
// update routines
void HandleGlobalToLocalData( Matrix<T>& Z );

void HandleLocalToGlobalData( Matrix<T>& Z, Int source );
void HandleLocalToGlobalAcc( Matrix<T>& Z, Int source );

void HandleLocalToGlobalData( const Matrix<T>& Z, Int source );
void HandleLocalToGlobalAcc( const Matrix<T>& Z, Int source );
};
} // namespace El
#endif // ifndef EL_AXPYINTERFACE2_HPP
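
The header comments above say that the locally blocking Acc/Put routines let the caller reuse the input buffer on return, and that the input is copied into an intermediate buffer for that purpose. The sketch below shows that staging idea in isolation, under stated assumptions: `PendingSend` and `StagedSender` are hypothetical names, no MPI call is made, and the "send" is represented only by the pointer a real transfer would read from.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical stand-in for an in-flight nonblocking send: it only remembers
// which memory the transfer would read from.
struct PendingSend { const double* source; std::size_t count; };

class StagedSender
{
public:
    // Copies 'input' into an internal staging buffer and "posts" the send
    // from that copy, so the caller may overwrite 'input' immediately.
    PendingSend Post( const std::vector<double>& input )
    {
        staging_.push_back( input );   // deep copy of the caller's data
        const std::vector<double>& copy = staging_.back();
        return PendingSend{ copy.data(), copy.size() };
    }

private:
    // A deque keeps references to existing elements valid across push_back.
    std::deque<std::vector<double>> staging_;
};
```

The cost of this design is one extra copy per call; in exchange, the caller does not have to wait for the transfer before touching its own buffer.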