@@ -18,8 +18,8 @@ ball tree.

Technically, this project is a library which exports the two functions
defined in `kmcuda.h`: `kmeans_cuda` and `knn_cuda`.
- It has the built-in Python3 native extension support, so you can
- `from libKMCUDA import kmeans_cuda`.
+ It has built-in Python3 and R native extension support, so you can
+ `from libKMCUDA import kmeans_cuda` or `dyn.load("libKMCUDA.so")`.

[![source{d}](img/sourced.png)](http://sourced.tech)
<p align="right"><a href="img/kmeans_image.ipynb">How was this created?</a></p>
@@ -33,16 +33,23 @@ Table of contents
* [macOS](#macos)
* [Testing](#testing)
* [Benchmarks](#benchmarks)
- * [100000x256@1024](#100000x2561024)
+ * [100,000x256@1024](#100000x2561024)
* [Configuration](#configuration)
* [Contestants](#contestants)
* [Data](#data)
* [Notes](#notes-1)
+ * [8,000,000x256@1024](#8000000x2561024)
+ * [Data](#data-1)
+ * [Notes](#notes-2)
* [Python examples](#python-examples)
* [K-means, L2 (Euclidean) distance](#k-means-l2-euclidean-distance)
- * [K-means, angular (cosine) distance average](#k-means-angular-cosine-distance--average)
+ * [K-means, angular (cosine) distance + average](#k-means-angular-cosine-distance--average)
* [K-nn](#k-nn-1)
* [Python API](#python-api)
+ * [R examples](#r-examples)
+ * [K-means](#k-means-1)
+ * [K-nn](#k-nn-2)
+ * [R API](#r-api)
* [C examples](#c-examples)
* [C API](#c-api)
* [License](#license)
@@ -123,6 +130,7 @@ It requires cudart 8.0 / Pascal and OpenMP 4.0 capable compiler. The build has
been tested primarily on Linux, but it works on macOS too with some extra effort
(see the "macOS" subsection).
If you do not want to build the Python native module, add `-D DISABLE_PYTHON=y`.
+ If you do not want to build the R native module, add `-D DISABLE_R=y`.
If CUDA is not automatically found, add `-D CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-8.0`
(change the path to the actual one). By default, CUDA kernels are compiled for
architecture 60 (Pascal). It is possible to override it via `-D CUDA_ARCH=52`,
@@ -167,8 +175,6 @@ Benchmarks
----------

### 100000x256@1024
- Comparison of some KMeans implementations:
-
| | sklearn KMeans | KMeansRex | KMeansRex OpenMP | Serban | kmcuda | kmcuda 2 GPU |
|------------|----------------|-----------|------------------|--------|--------|--------------|
| time, s | 164 | 36 | 20 | 10.6 | 9.2 | 5.5 |
@@ -193,6 +199,21 @@ Comparison of some KMeans implementations:
#### Notes
100000 is the maximum size Serban KMeans can handle.

+ ### 8000000x256@1024
+ | | sklearn KMeans | KMeansRex | KMeansRex OpenMP | Serban | kmcuda 2 GPU | kmcuda Yinyang 2 GPU |
+ |------------|----------------|-----------|------------------|--------|--------------|----------------------|
+ | time | please no | - | 6h 34m | fail | 44m | 36m |
+ | memory, GB | - | - | 205 | fail | 8.7 | 10.4 |
+
+ kmeans++ initialization, 93 iterations (equivalent to a 1% reassignments tolerance).
+
+ #### Data
+ 8,000,000 secret production samples.
+
+ #### Notes
+ KMeansRex ate 205 GB of RAM at peak; it uses dynamic memory, so it constantly
+ bounced between 100 GB and 200 GB.
+

Python examples
---------------

@@ -276,7 +297,7 @@ calculated 0.276552 of all the distances
Python API
----------
```python
- def kmeans_cuda(samples, clusters, tolerance=0.0, init="k-means++",
+ def kmeans_cuda(samples, clusters, tolerance=0.01, init="k-means++",
                 yinyang_t=0.1, metric="L2", average_distance=False,
                 seed=time(), device=0, verbosity=0)
```
@@ -289,18 +310,20 @@ def kmeans_cuda(samples, clusters, tolerance=0.0, init="k-means++",

**clusters** integer, the number of clusters.

- **tolerance** float, if the relative number of reassignments drops below this value, stop.
+ **tolerance** float, if the relative number of reassignments drops below this value,
+ the algorithm stops.

**init** string or numpy array, sets the method for centroids initialization,
- may be "k-means++"/"kmeans++", "random" or numpy array of shape
+ may be "k-means++", "afk-mc2", "random" or numpy array of shape
\[**clusters**, number of features\]. dtype must be float32.

**yinyang_t** float, the relative number of cluster groups, usually 0.1.
+ 0 disables Yinyang refinement.

**metric** str, the name of the distance metric to use. The default is Euclidean (L2),
- can be changed to "cos" to behave as Spherical K-means with the
- angular distance. Please note that samples *must* be normalized in that
- case.
+ it can be changed to "cos" to switch to Spherical K-means with the
+ angular distance. Please note that samples *must* be normalized
+ in the latter case.

**average_distance** boolean, the value indicating whether to calculate
the average distance between cluster elements and
@@ -309,17 +332,18 @@ def kmeans_cuda(samples, clusters, tolerance=0.0, init="k-means++",

**seed** integer, random generator seed for reproducible results.

- **device** integer, bitwise OR-ed CUDA device indices, e.g. 1 means first device, 2 means second device,
- 3 means using first and second device. Special value 0 enables all available devices.
- The default is 0.
+ **device** integer, bitwise OR-ed CUDA device indices, e.g. 1 means the first device,
+ 2 means the second device, 3 means using the first and second devices. The special
+ value 0 enables all available devices. The default is 0.

**verbosity** integer, 0 means complete silence, 1 means mere progress logging,
2 means lots of output.

- **return** tuple(centroids, assignments). If **samples** was a numpy array or
- a host pointer tuple, the types are numpy arrays, otherwise, raw pointers
- (integers) allocated on the same device. If **samples** are float16,
- the returned centroids are float16 too.
+ **return** tuple(centroids, assignments\[, average_distance\]).
+ If **samples** was a numpy array or a host pointer tuple, the types
+ are numpy arrays, otherwise, raw pointers (integers) allocated on the
+ same device. If **samples** are float16, the returned centroids are
+ float16 too.
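
For illustration, a minimal sketch of a call that requests the optional average
distance, assuming synthetic random data and that `libKMCUDA` is importable:

```python
import numpy
from libKMCUDA import kmeans_cuda

# synthetic float32 samples: 10000 points with 32 features each
numpy.random.seed(0)
samples = numpy.random.rand(10000, 32).astype(numpy.float32)

# average_distance=True asks for the optional third element of the returned tuple
centroids, assignments, avg_distance = kmeans_cuda(
    samples, 50, tolerance=0.01, init="k-means++", yinyang_t=0.1,
    metric="L2", average_distance=True, seed=777, verbosity=1)

print(centroids.shape)    # (50, 32)
print(assignments.shape)  # (10000,)
print(avg_distance)       # average distance between samples and their centroids
```

Dropping `average_distance=True` makes the call return just the
(centroids, assignments) pair.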

```python
def knn_cuda(k, samples, centroids, assignments, metric="L2", device=0, verbosity=0)
@@ -342,6 +366,108 @@ def knn_cuda(k, samples, centroids, assignments, metric="L2", device=0, verbosit
to be compatible with uint32. If **samples** is a tuple then
**assignments** is a pointer. The shape is (number of samples,).

+ **metric** str, the name of the distance metric to use. The default is Euclidean (L2),
+ it can be changed to "cos" to switch to Spherical K-means with the
+ angular distance. Please note that samples *must* be normalized
+ in the latter case.
+
+ **device** integer, bitwise OR-ed CUDA device indices, e.g. 1 means the first device,
+ 2 means the second device, 3 means using the first and second devices. The special
+ value 0 enables all available devices. The default is 0.
+
+ **verbosity** integer, 0 means complete silence, 1 means mere progress logging,
+ 2 means lots of output.
+
+ **return** neighbor indices. If **samples** was a numpy array or
+ a host pointer tuple, the return type is numpy array, otherwise, a
+ raw pointer (integer) allocated on the same device. The shape is
+ (number of samples, k).
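
For illustration, a sketch of the usual pipeline, clustering first and then
querying neighbors, assuming synthetic random data and that `libKMCUDA` is
importable:

```python
import numpy
from libKMCUDA import kmeans_cuda, knn_cuda

# synthetic float32 samples: 10000 points with 32 features each
numpy.random.seed(0)
samples = numpy.random.rand(10000, 32).astype(numpy.float32)

# knn_cuda reuses the centroids and assignments produced by kmeans_cuda
centroids, assignments = kmeans_cuda(samples, 50, seed=777, verbosity=1)
neighbors = knn_cuda(10, samples, centroids, assignments,
                     metric="L2", verbosity=1)

print(neighbors.shape)  # (10000, 10): 10 nearest neighbor indices per sample
```

Each row of the returned matrix holds the k neighbor indices of the
corresponding sample, in the same order as **samples**.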
+
+ R examples
+ ----------
+ #### K-means
+ ```R
+ dyn.load("libKMCUDA.so")
+ samples = replicate(4, runif(16000))
+ result = .External("kmeans_cuda", samples, 50, tolerance=0.01,
+                    seed=777, verbosity=1, average_distance=TRUE)
+ print(result$average_distance)
+ print(result$centroids[1:10,])
+ print(result$assignments[1:10])
+ ```
+
+ #### K-nn
+ ```R
+ dyn.load("libKMCUDA.so")
+ samples = replicate(4, runif(16000))
+ cls = .External("kmeans_cuda", samples, 50, tolerance=0.01,
+                 seed=777, verbosity=1)
+ result = .External("knn_cuda", 20, samples, cls$centroids, cls$assignments,
+                    verbosity=1)
+ print(result[1:10,])
+ ```
+
+ R API
+ -----
+ ```R
+ function kmeans_cuda(
+     samples, clusters, tolerance=0.01, init="k-means++", yinyang_t=0.1,
+     metric="L2", average_distance=FALSE, seed=Sys.time(), device=0, verbosity=0)
+ ```
+ **samples** real matrix of shape \[number of samples, number of features\]
+ or a list of real matrices which are rbind()-ed internally. No more
+ than INT32_MAX samples and UINT16_MAX features are supported.
+
+ **clusters** integer, the number of clusters.
+
+ **tolerance** real, if the relative number of reassignments drops below this value,
+ the algorithm stops.
+
+ **init** character vector or real matrix, sets the method for centroids initialization,
+ may be "k-means++", "afk-mc2", "random" or a real matrix of shape
+ \[**clusters**, number of features\].
+
+ **yinyang_t** real, the relative number of cluster groups, usually 0.1.
+ 0 disables Yinyang refinement.
+
+ **metric** character vector, the name of the distance metric to use. The default
+ is Euclidean (L2), it can be changed to "cos" to switch to Spherical
+ K-means with the angular distance. Please note that
+ samples *must* be normalized in the latter case.
+
+ **average_distance** logical, the value indicating whether to calculate
+ the average distance between cluster elements and
+ the corresponding centroids. Useful for finding
+ the best K. Returned as the third list element.
+
+ **seed** integer, random generator seed for reproducible results.
+
+ **device** integer, bitwise OR-ed CUDA device indices, e.g. 1 means the first device,
+ 2 means the second device, 3 means using the first and second devices. The special
+ value 0 enables all available devices. The default is 0.
+
+ **verbosity** integer, 0 means complete silence, 1 means mere progress logging,
+ 2 means lots of output.
+
+ **return** list(centroids, assignments\[, average_distance\]). Indices in
+ assignments start from 1.
+
+ ```R
+ function knn_cuda(k, samples, centroids, assignments, metric="L2", device=0, verbosity=0)
+ ```
+ **k** integer, the number of neighbors to search for each sample. Must be ≤ 2<sup>16</sup>.
+
+ **samples** real matrix of shape \[number of samples, number of features\]
+ or a list of real matrices which are rbind()-ed internally.
+ In the latter case, it is possible to pass in more than INT32_MAX
+ samples.
+
+ **centroids** real matrix with precalculated clusters' centroids (e.g., using
+ kmeans() or kmeans_cuda()).
+
+ **assignments** integer vector with sample-cluster associations. Indices start
+ from 1.
+
**metric** str, the name of the distance metric to use. The default is Euclidean (L2),
can be changed to "cos" to behave as Spherical K-means with the
angular distance. Please note that samples *must* be normalized in that
@@ -354,10 +480,8 @@ def knn_cuda(k, samples, centroids, assignments, metric="L2", device=0, verbosit
**verbosity** integer, 0 means complete silence, 1 means mere progress logging,
2 means lots of output.

- **return** neighbor indices. If **samples** was a numpy array or
- a host pointer tuple, the return type is numpy array, otherwise, a
- raw pointer (integer) allocated on the same device. The shape is
- (number of samples, k).
+ **return** integer matrix with neighbor indices. The shape is (number of samples, k).
+ Indices start from 1.

C examples
----------