Adding shmem_malloc_with_hints interface #259
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,90 @@ | ||
|
|
||
| \apisummary{ | ||
| Collective memory allocation routine with support for providing hints. | ||
| } | ||
|
|
||
| \begin{apidefinition} | ||
|
|
||
| \begin{Csynopsis} | ||
| void *@\FuncDecl{shmem\_malloc\_with\_hints}@(size_t size, long hints); | ||
| \end{Csynopsis} | ||
|
|
||
| \begin{apiarguments} | ||
| \apiargument{IN}{size}{The size, in bytes, of a block to be | ||
| allocated from the symmetric heap. This argument is of type \CTYPE{size\_t}.} | ||
| \apiargument{IN}{hints}{A bit array of hints provided by the user to the implementation} | ||
| \end{apiarguments} | ||
|
|
||
|
|
||
| \apidescription{ | ||
|
|
||
| The \FUNC{shmem\_malloc\_with\_hints} routine, like \FUNC{shmem\_malloc}, returns a pointer to a block of at least | ||
| \VAR{size} bytes, which shall be suitably aligned so that it may be | ||
| assigned to a pointer to any type of object. This space is allocated from | ||
| the symmetric heap (similar to \FUNC{shmem\_malloc}). When the \VAR{size} is zero, | ||
| the \FUNC{shmem\_malloc\_with\_hints} routine performs no action and returns a null pointer. | ||
|
|
||
| In addition to the \VAR{size} argument, the user provides the \VAR{hints} argument. | ||
| The \VAR{hints} argument describes the expected manner in which the \openshmem program may use the allocated memory. | ||
| The valid usage hints are described in Table~\ref{usagehints}. Multiple hints may be requested by combining them with a bitwise \CONST{OR} operation. | ||
| A value of zero may be given if no hints are requested. | ||
|
|
||
| The implementation uses the information provided by the \VAR{hints} argument to optimize for performance. | ||
| If the implementation cannot optimize, the behavior is the same as that of \FUNC{shmem\_malloc}. | ||
| If more than one hint is provided, the implementation will make a best effort to use one or more of the hints | ||
| to optimize performance. | ||
|
|
||
| The \FUNC{shmem\_malloc\_with\_hints} routine is provided so that multiple \acp{PE} in a program can allocate symmetric, | ||
| remotely accessible memory blocks. When no action is performed, this | ||
| routine returns without performing a barrier. Otherwise, the routine will call a barrier on exit. | ||
|
What does this mean? That the function returns NULL for all PEs, no memory has been allocated, and no implicit barrier has occurred? Some applications may use

When no allocation is done, “dropping the implicit barrier” is the behavior we have for shmem_malloc in OpenSHMEM 1.4; please refer to page 26, lines 40-41. The proposal aims to maintain the same behavior for that case.

Oops. In OpenSHMEM 1.4, that was not the case. This behavior was changed between 1.4 and now in #201.

Ah… Thanks for correcting that. We debated this so long ago that I did not realize it is relatively new. I was looking at my git copy. :) Can we consider this issue resolved? |
||
| This ensures that all \acp{PE} participate in the memory allocation, and that the memory on other | ||
| \acp{PE} can be used as soon as the local \ac{PE} returns. The implicit barrier performed by this routine will quiet the | ||
| default context. It is the user's responsibility to ensure that no communication operations involving the given memory block are pending on | ||
| other contexts prior to calling the \FUNC{shmem\_free} or \FUNC{shmem\_realloc} routines. | ||
manjugv marked this conversation as resolved.
|
||
| The user is also responsible for calling this routine with identical arguments on all | ||
| \acp{PE}; if differing \VAR{size} or \VAR{hints} arguments are used, the behavior of the call | ||
| and any subsequent \openshmem calls is undefined. | ||
| } | ||
|
|
||
| \apireturnvalues{ | ||
| The \FUNC{shmem\_malloc\_with\_hints} routine returns a pointer to the allocated space; | ||
| otherwise, it returns a null pointer. | ||
| } | ||
|
|
||
| \begin{longtable}{|p{0.45\textwidth}|p{0.5\textwidth}|} | ||
| \hline | ||
| \textbf{Hints} & \textbf{Usage hint} | ||
| \tabularnewline \hline | ||
| \endhead | ||
| %% | ||
| \newline | ||
| \CONST{0} & | ||
| \newline | ||
| Behavior same as \FUNC{shmem\_malloc} | ||
| \tabularnewline \hline | ||
|
|
||
|
|
||
| \LibConstDecl{SHMEM\_HINT\_ATOMICS\_REMOTE} & | ||
| \newline | ||
| Memory used for \VAR{atomic} operations | ||
| \tabularnewline \hline | ||
|
|
||
| \LibConstDecl{SHMEM\_HINT\_SIGNAL} & | ||
| \newline | ||
| Memory used for \VAR{signal} operations | ||
|
What happens when the flag was not used, but a signal operation was used?

Hmm... Which flag are you referring to? Can you please elaborate?

Above |
||
| \tabularnewline \hline | ||
|
|
||
| \TableCaptionRef{Memory usage hints} | ||
| \label{usagehints} | ||
| \end{longtable} | ||
|
Also, we should state that neither alignment requirements nor memory properties, such as cache line size, are impacted by the hint. We shall also state that the memory semantics for local access (load and store) are not impacted by the hint.

AFAIU, for the shmem_malloc routine the implementation is free to allocate memory that is either cache aligned or not. One of the constraints is that it should be word-aligned. Similarly, the memory access model (which is yet to be defined or clarified here #229) will provide certain access guarantees for the memory allocated by shmem_malloc. In both cases, the proposal intends to follow the semantics of shmem_malloc.

The memory eventually gets mapped into the core. The core architecture defines multiple ways in which it can be mapped. Each of the mappings has its own semantics and constraints. For example, the semantics of normal cacheable (write back) and non-cacheable (WC) mappings are very different. Since the user has direct access through the pointer, code that worked on one machine will break on another with an exception or even data corruption.

In order to support something like this, you have to remove shmem_ptr and prohibit any direct access to the memory through load and store semantics. Next, you would have to introduce a shmem_memcpy function to copy data into and out of the shmem_malloced region. |
||
|
|
||
| \apinotes{ | ||
| \openshmem programs should allocate memory with | ||
| \CONST{SHMEM\_HINT\_ATOMICS\_REMOTE} when the majority of | ||
| operations performed on this memory are atomic operations, and the origin | ||
| and target \acp{PE} of the atomic operations do not share a memory domain, | ||
| i.e., symmetric objects on the target \ac{PE} are not accessible using | ||
| load/store operations from the origin \ac{PE} or vice versa. | ||
|
Minor: I would consider to move the text under

If I understand the definition of remote vs local wrt this change, I'm not sure whether there is a need to differentiate the local side of the signal. All signals from the source PE are passed by value to the remote signal address buffer. A buffer created with SHMEM_HINT_SIGNAL will mostly be updated by some remote PE.

void shmem_put_signal(shmem_ctx_t ctx, TYPE *dest, const TYPE *source, size_t nelems,
                      uint64_t *sig_addr, uint64_t signal, int sig_op, int pe); |
||
| } | ||
| \end{apidefinition} | ||
| \newpage | ||
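As a rough illustration of the semantics described in the diff above, the following single-process C sketch models only the local part of the call: a zero size returns a null pointer without a barrier, hints are combined with a bitwise OR, and an implementation that cannot act on a hint falls back to an ordinary allocation. Everything in it is a stand-in. `demo_malloc_with_hints`, the `DEMO_HINT_*` bit values, and the barrier counter are hypothetical names chosen for this sketch; they are not part of the OpenSHMEM API or of the proposed text, and no real symmetric heap or collective synchronization is involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bit values; the proposal does not fix numeric values. */
#define DEMO_HINT_ATOMICS_REMOTE (1L << 0)
#define DEMO_HINT_SIGNAL         (1L << 1)

/* Counts how often the modelled barrier on exit would have run. */
static int demo_barrier_count = 0;

/* Single-process model of the described behavior; not a real OpenSHMEM call. */
static void *demo_malloc_with_hints(size_t size, long hints)
{
    void *ptr;

    if (size == 0) {
        return NULL;          /* no action is performed, so no barrier */
    }

    /* This model cannot optimize for any hint, so it ignores them and
     * behaves like a plain allocation, which the fallback rule allows. */
    (void)hints;
    ptr = malloc(size);

    demo_barrier_count++;     /* stands in for the barrier on exit */
    return ptr;
}

/* Usage: exercises the zero-size path and a request with two hints. */
static void demo_self_check(void)
{
    long hints = DEMO_HINT_ATOMICS_REMOTE | DEMO_HINT_SIGNAL;
    void *none = demo_malloc_with_hints(0, hints);
    void *buf = demo_malloc_with_hints(64, hints);

    assert(none == NULL);
    assert(buf != NULL);
    free(buf);
}
```

Calling `demo_self_check()` from a `main` function runs both paths. In a real program every PE would have to pass the same `size` and `hints` values, which this single-process model cannot show.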
Must all PEs allocate from the same special memory? It's not clear if asymmetry can exist. Does this impose additional implicit synchronization for each subset configuration of hints if it cannot satisfy the entire hint list?
Also, what happens if you OR `SHMEM_HINT_NONE` with other hint behaviors?
Good question.
All use cases that I have thought of require the memory to be symmetric (same kind of memory).
Regarding extra synchronization, it depends on the implementation. If an implementation maintains asymmetric memory sizes on the PEs (say each PE starts with a different amount of special memory), you might need the extra synchronization for agreement. Otherwise, I do not see a need. In a way, it is similar to current DRAM allocations. Also, for the implementation we explored, we did not need extra synchronization.
I’m reluctant to add such a constraint. Without such a constraint, implementations are free to explore either approach (symmetric or asymmetric).
Do you see value in specifying it one way or the other?
To clarify, what would using `long hint = SHMEM_HINT_NONE | SHMEM_HINT_LOW_LAT_MEM | SHMEM_HINT_HIGH_BW_MEM` do?
Dropping `SHMEM_HINT_NONE`, what if the platform could provide `SHMEM_HINT_LOW_LAT_MEM` or `SHMEM_HINT_HIGH_BW_MEM` but not both simultaneously? Will we see application code marked up like this because who doesn't want to use low latency and high bandwidth memory for their application? Does the "best effort" default to a platform-specific precedence? The only feedback is that an allocation succeeded or it did not.
Though this is a legal usage, it does not make sense to use. The implementations are allowed to default to shmem_malloc in this case.
My intention with this statement was to give implementations the flexibility to optimize as they wish when the user provides multiple hints. Obviously, some combinations of hints might not make sense. In such cases, if an implementation wants to give one hint precedence over others, the proposal allows it. Assigning priorities to hints is one way to implement it, but not the only way.
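One way to picture the precedence approach mentioned in this thread is the C sketch below. It is only an illustration under stated assumptions: the `DEMO_HINT_*` values, the assumption that the "none" hint is zero (so OR-ing it with other hints changes nothing), the `supported` mask, and the fixed precedence order are all hypothetical. The proposal does not prescribe any of them, and an implementation is free to resolve hint combinations differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical values; the "none" hint is assumed to be zero. */
#define DEMO_HINT_NONE        0L
#define DEMO_HINT_LOW_LAT_MEM (1L << 2)
#define DEMO_HINT_HIGH_BW_MEM (1L << 3)

/* Picks the single hint that a hypothetical implementation would honor.
 * 'requested' is the OR of the user's hints; 'supported' is the OR of the
 * hints the platform can satisfy. Returns DEMO_HINT_NONE when nothing
 * matches, which models falling back to plain shmem_malloc behavior. */
static long demo_resolve_hints(long requested, long supported)
{
    /* Assumed precedence: low latency first, then high bandwidth. */
    static const long precedence[] = {
        DEMO_HINT_LOW_LAT_MEM,
        DEMO_HINT_HIGH_BW_MEM
    };
    long usable = requested & supported;
    size_t i;

    for (i = 0; i < sizeof precedence / sizeof precedence[0]; i++) {
        if (usable & precedence[i]) {
            return precedence[i];
        }
    }
    return DEMO_HINT_NONE;
}

/* Usage: a platform that offers only high-bandwidth memory. */
static void demo_precedence_check(void)
{
    long requested = DEMO_HINT_NONE | DEMO_HINT_LOW_LAT_MEM | DEMO_HINT_HIGH_BW_MEM;
    long chosen = demo_resolve_hints(requested, DEMO_HINT_HIGH_BW_MEM);

    assert(chosen == DEMO_HINT_HIGH_BW_MEM);
}
```

Note that the caller still only learns whether the allocation succeeded; nothing in this sketch reports which hint was honored, which matches the concern raised above about limited feedback.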