forked from memkind/memkind
-
Notifications
You must be signed in to change notification settings - Fork 0
/
README
390 lines (294 loc) · 16.3 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
# **MEMKIND**
[![Build Status](https://travis-ci.org/memkind/memkind.svg)](https://travis-ci.org/memkind/memkind)
[![MEMKIND version](https://img.shields.io/github/tag/memkind/memkind.svg)](https://github.com/memkind/memkind/releases/latest)
[![Coverage Status](http://codecov.io/github/memkind/memkind/coverage.svg?branch=master)](http://codecov.io/gh/memkind/memkind?branch=master)
[![Packaging status](https://repology.org/badge/tiny-repos/memkind.svg)](https://repology.org/project/memkind/versions)
The memkind library is a user extensible heap manager built on top of
jemalloc which enables control of memory characteristics and a
partitioning of the heap between kinds of memory. The kinds of memory
are defined by operating system memory policies that have been applied
to virtual address ranges. Memory characteristics supported by
memkind without user extension include control of NUMA and page size
features. The jemalloc non-standard interface has been extended to
enable specialized arenas to make requests for virtual memory from the
operating system through the memkind partition interface. Through the
other memkind interfaces the user can control and extend memory
partition features and allocate memory while selecting enabled
features. Memkind interface allows to create and control file-backed memory
(PMEM kind) on specified device.
## Contents
1. [Interfaces](#interfaces)
2. [Dependencies](#dependencies)
3. [Building and Installing](#building-and-installing)
* [jemalloc](#jemalloc)
4. [Run Requirements](#run-requirements)
5. [Kind Requirements](#kind-requirements)
6. [Kernel](#kernel)
7. [NVDIMM volatile usage](#nvdimm-volatile-usage)
* [DAX device](#dax-device)
* [DAX filesystem](#dax-filesystem)
8. [The Detection Mechanism of the Kind](#the-detection-mechanism-of-the-kind)
9. [Setting Logging Mechanism](#setting-logging-mechanism)
10. [Setting Heap Manager](#setting-heap-manager)
11. [Testing](#testing)
12. [Simulating High Bandwidth Memory](#simulating-high-bandwidth-memory)
13. [Notes](#notes)
14. [Status](#status)
15. [Disclaimer](#disclaimer)
## Interfaces
The memkind library delivers four interfaces:
* hbwmalloc.h - recommended for high-bandwidth memory use cases (stable)
* memkind.h - generic interface for more complex use cases (stable)
* pmem_allocator.h - the C++ allocator that satisfies the C++ Allocator
[requirements](https://en.cppreference.com/w/cpp/named_req/Allocator)
used for PMEM memory use cases (stable)
* memkind_allocator.h - the C++ allocator that satisfies the C++ Allocator
[requirements](https://en.cppreference.com/w/cpp/named_req/Allocator)
used for static kinds (stable)
For more detailed information about those interfaces see
corresponding manpages (located in man/ subdir):
man memkind
man hbwmalloc
man pmemallocator
man memkindallocator
## Dependencies
You will need to install required packages on the build system:
* **autoconf**
* **automake**
* **gcc-c++**
* **libnuma-devel**
* **libtool**
* **numactl-devel**
* **unzip**
For using automatic recognition of PMEM NUMA in MEMKIND_DAX_KMEM:
* **libdaxctl-devel** (v66 or later)
## Building and Installing
The memkind library has a dependency on a related fork of jemalloc.
The configure scripts and gtest source code are distributed with the
source tarball included in the source RPM, and this tarball is created
with the memkind "make dist" target. In contrast to the distributed source
tarball, the git repository does not include any generated files.
For this reason some additional steps are required when building
from a checkout of the git repo. Those steps include running
the bash script called "autogen.sh" prior to configure. This script
will populate a VERSION file based on "git describe", and use
autoreconf to generate a configure script.
Building and installing memkind in standard system location can be as simple as
typing the following while in the root directory of the source tree:
./autogen.sh
./configure
make
make install
To install this library into **other locations**, you can use the prefix variable, e.g.:
./autogen.sh
./configure --prefix=/usr
make
make install
This will install files to /usr/lib, /usr/include, /usr/share/doc/, usr/share/man.
See the output of:
./configure --help
for more information about either the memkind or the jemalloc configuration options.
## jemalloc
The jemalloc source was forked from jemalloc version 5.2.1. This source tree
is located within the jemalloc subdirectory of the memkind source. The jemalloc
source code has been kept close to the original form, and in particular
the build system has been lightly modified.
## Run Requirements
You will need to install required packages for applications,
which are using the memkind library for dynamic linking at run time:
* **libnuma**
* **numactl**
* **pthread**
## Kind Requirements
| Memory kind | NUMA | HBW Memory | Hugepages | Device DAX | Filesystem supporting hole punching |
| ----------------------------- |:-----:|:-----------:|:---------:|:----------:|:-----------------------------------:|
| MEMKIND_DEFAULT | | | | | |
| MEMKIND_HUGETLB | X | X | X | | |
| MEMKIND_HBW | X | X | | | |
| MEMKIND_HBW_ALL | X | X | | | |
| MEMKIND_HBW_HUGETLB | X | X | X | | |
| MEMKIND_HBW_ALL_HUGETLB | X | X | X | | |
| MEMKIND_HBW_PREFERRED | X | X | | | |
| MEMKIND_HBW_PREFERRED_HUGETLB | X | X | X | | |
| MEMKIND_HBW_INTERLEAVE | X | X | | | |
| MEMKIND_REGULAR | X | | | | |
| MEMKIND_DAX_KMEM | X | | | X | |
| MEMKIND_DAX_KMEM_ALL | X | | | X | |
| MEMKIND_DAX_KMEM_PREFERRED | X | | | X | |
| PMEM kind | | | | | X |
## Kernel
To correctly control of NUMA, huge pages and file-backed memory following
requirements regarding Linux kernel must be satisfied:
* **NUMA**
Requires kernel
[patch](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3964acd0dbec123aa0a621973a2a0580034b4788)
introduced in Linux v3.11 that impacts
functionality of the NUMA system calls. Red Hat has back-ported this patch to the
v3.10 kernel in the RHEL 7.0 GA release, so RHEL 7.0 onward supports
memkind even though this kernel version predates v3.11.
* **Hugepages**
Functionality related to hugepages allocation require patches
[patch1](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e0ec90ee7e6f6cbaa6d59ffb48d2a7af5e80e61d) and
[patch2](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=099730d67417dfee273e9b10ac2560ca7fac7eb9)
Without them physical memory may end up being located on incorrect NUMA node.
* **2MB Pages**
To use the interfaces for obtaining 2MB pages please be sure to follow the instructions
[here](https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt) and pay particular
attention to the use of the procfs files:
/proc/sys/vm/nr_hugepages
/proc/sys/vm/nr_overcommit_hugepages
for enabling the kernel's huge page pool.
* **Filesystem supporting hole punching**
To use the PMEM kind, please be sure that filesystem which is used for PMEM creation supports
[FALLOC_FL_PUNCH_HOLE](http://man7.org/linux/man-pages/man2/fallocate.2.html) flag.
* **Device DAX**
To use MEMKIND_DAX_KMEM_* kinds you need at least Linux Kernel 5.1 (with enabled DEV_DAX_KMEM Kernel option) and created DAX device.
If you have loaded dax_pmem_compat module instead of dax_pmem, please read
[this](https://pmem.io/ndctl/daxctl-migrate-device-model.html) article and migrate device model
to use alternative device dax drivers, e.g. kmem.
If you want to migrate device model type:
daxctl migrate-device-model
To create a new DAX device you can type:
ndctl create-namespace --mode=devdax --map=mem
To display a list of created devices:
ndctl list
If you have already created device in another mode, you can reconfigure the device using:
ndctl create-namespace --mode=devdax --map=mem --force -e namespaceX.Y
Where namespaceX.Y means namespace you want to reconfigure.
For more details about creating new namespace read [this](https://pmem.io/ndctl/ndctl-create-namespace.html).
Conversion from device dax to NUMA node can be performed using following commands:
daxctl reconfigure-device daxX.Y --mode=system-ram
Where daxX.Y is DAX device you want to reconfigure.
This will migrate device from device_dax to kmem driver. After this step daxctl should be able to
see devices in system-ram mode:
daxctl list
Example output after reconfigure dax2.0 as system-ram:
[
{
"chardev":"dax1.0",
"size":263182090240,
"target_node":3,
"mode":"devdax"
},
{
"chardev":"dax2.0",
"size":263182090240,
"target_node":4,
"mode":"system-ram"
}
]
Also you should be able to check new NUMA node configuration by:
numactl -H
Example output, where NUMA node 4 is NUMA node created from persistent memory:
available: 3 nodes (0-1,4)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83
node 0 size: 192112 MB
node 0 free: 185575 MB
node 1 cpus: 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111
node 1 size: 193522 MB
node 1 free: 193107 MB
node 4 cpus:
node 4 size: 250880 MB
node 4 free: 250879 MB
node distances:
node 0 1 4
0: 10 21 17
1: 21 10 28
4: 17 28 10
## NVDIMM volatile usage
Memkind supports using persistent memory as an extension of DRAM.
This volatile memory solution is provided by the library with two separate ways described below.
## DAX device
NVDIMM memory as DAX device is supported by MEMKIND_DAX_KMEM_* kinds. With this solution
persistent memory will be seen in OS as separate NUMA nodes.
Memkind allows two ways to use this kind:
- first implicitly, by allowing memkind library for automatic recognition of NUMA nodes
created from persistent memory using [libdaxctl-devel](#dependencies)
- secondary explicitly, by using MEMKIND_DAX_KMEM_NODES environment variable set to
comma separated list of NUMA nodes which will be treated as NUMA nodes created from persistent memory,
this solution overrides the first one
## DAX filesystem
PMEM kind supports the traditional malloc/free interfaces on a memory mapped file.
This allows the use of [persistent memory](https://pmem.io/)
as volatile memory, for cases where the region of persistent memory
is useful to an application, but when the application doesn't need it to be persistent.
PMEM kind is most useful when used with Direct Access storage
[DAX](https://www.kernel.org/doc/Documentation/filesystems/dax.txt), which is
memory-addressable persistent storage that supports load/store access without being paged via the system page cache.
Application using memkind library supports managing a data placement:
| Data placement | Memory kind |
|----------------|---------------------------|
| PMEM (fsdax) | PMEM kind |
| PMEM (devdax) | MEMKIND_DAX_KMEM kind |
| DRAM | e.g. MEMKIND_DEFAULT kind |
Currently, the PMEM kind is supported only by the jemalloc heap manager.
## The Detection Mechanism of the Kind
One of the notable features of the memkind is to detect the correct kind of previously allocated memory.
| Operations | Memkind API function |
|---------------------------------------------|----------------------------------------|
| Freeing memory | memkind_free(kind, ptr) |
| Reallocating memory | memkind_realloc(kind, ptr, size) |
| Obtaining the size of allocated memory | memkind_malloc_usable_size(kind, ptr) |
| Reallocating memory to reduce fragmentation | memkind_defrag_reallocate(kind, ptr) |
Operations above could be unified for all used memory kinds
by passing a NULL value as a kind to the functions mentioned above.
For more details, please see the following
[example](https://github.com/memkind/memkind/blob/master/examples/pmem_free_with_unknown_kind.c).
**Important Notes:**
The look up for correct kind could result in serious performance penalty,
which can be avoided by specifying a correct kind explicitly.
## Setting Logging Mechanism
In memkind library logging mechanism could be enabled by setting MEMKIND_DEBUG
environment variable. Setting MEMKIND_DEBUG to "1" enables printing messages
like errors and general information about environment to stderr.
## Setting Heap Manager
In memkind library heap management can be adjusted with MEMKIND_HEAP_MANAGER
environment variable, which allows for switching to one of the available
heap managers.
Values:
- JEMALLOC - sets the jemalloc heap manager
- TBB - sets Intel Threading Building Blocks heap manager. This option requires installed
Intel Threading Building Blocks library.
If the MEMKIND_HEAP_MANAGER is not set then the jemalloc heap manager will be used by default.
## Testing
All existing tests pass. For more information on how to execute tests
see the [CONTRIBUTING](CONTRIBUTING) file.
When tests are run on a NUMA platform without high bandwidth memory
the MEMKIND_HBW_NODES environment variable is used in conjunction with
"numactl --membind" to force standard allocations to one NUMA node and
high bandwidth allocations through a different NUMA node. See next
section for more details.
## Simulating High Bandwidth Memory
A method for testing for the benefit of high bandwidth memory on a
dual socket Intel(R) Xeon(TM) system is to use the QPI bus to simulate
slow memory. This is not an accurate model of the bandwidth and
latency characteristics of the Intel's 2nd generation Intel(R) Xeon Phi(TM)
Product Family on package memory, but is a reasonable way to determine
which data structures rely critically on bandwidth.
If the application a.out has been modified to use high bandwidth
memory with the memkind library then this can be done with numactl as
follows with the bash shell:
export MEMKIND_HBW_NODES=0
numactl --membind=1 --cpunodebind=0 a.out
or with csh:
setenv MEMKIND_HBW_NODES 0
numactl --membind=1 --cpunodebind=0 a.out
The MEMKIND_HBW_NODES environment variable set to zero will bind high
bandwidth allocations to NUMA node 0. The --membind=1 flag to numactl
will bind standard allocations, static and stack variables to NUMA
node 1. The --cpunodebind=0 option to numactl will bind the process
threads to CPUs associated with NUMA node 0. With this configuration
standard allocations will be fetched across the QPI bus, and high
bandwidth allocations will be local to the process CPU.
## Notes
* Using memkind with Transparent Huge Pages enabled may result in
undesirably high memory footprint. To avoid that disable THP using following
[instruction](https://www.kernel.org/doc/Documentation/vm/transhuge.txt)
## Status
Different interfaces can represent different maturity level
(as described in corresponding man pages).
Feedback on design and implementation is greatly appreciated.
## Disclaimer
SEE [COPYING](COPYING) FILE FOR LICENSE INFORMATION.
THIS SOFTWARE IS PROVIDED AS A DEVELOPMENT SNAPSHOT TO AID
COLLABORATION AND WAS NOT ISSUED AS A RELEASED PRODUCT BY INTEL.