Commit d4535d2

reduce memory_footprint for sparse PCA transform (#5964)
The sparse PCA still densified `X` during the transform step, which in a sense defeats the purpose of a sparse PCA. However,

```
precomputed_mean_impact = self.mean_ @ self.components_.T
mean_impact = cp.ones((X.shape[0], 1)) @ precomputed_mean_impact.reshape(1, -1)
X_transformed = X.dot(self.components_.T) -mean_impact
```

is the same as

```
X = X - self.mean_
X_transformed = X.dot(self.components_.T)
```

The new implementation is faster (mainly because we no longer have to rely on cupy's `to_array()`) and uses a lot less memory.

Authors:
  - Severin Dicks (https://github.com/Intron7)
  - Dante Gama Dessavre (https://github.com/dantegd)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #5964
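The equivalence claimed in the commit message follows from distributing the projection over the centering: `(X - mean) @ C.T` equals `X @ C.T - mean @ C.T`, where the second term is a single row repeated for every sample. A minimal numeric check is sketched below. It assumes NumPy on the CPU as a stand-in for CuPy, uses a dense, mostly-zero array in place of a real sparse matrix, and all sizes and random values are made up for illustration; it is not the cuML code path itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, n_components = 6, 5, 2  # hypothetical sizes

# Mostly-zero data standing in for a sparse X (assumption: dense NumPy array).
X = rng.random((n_samples, n_features))
X[X < 0.7] = 0.0

mean_ = X.mean(axis=0)                                 # per-feature mean, shape (n_features,)
components_ = rng.random((n_components, n_features))   # stand-in for fitted components

# Old approach: center first (this is what forces a dense X), then project.
old = (X - mean_).dot(components_.T)

# New approach: project the uncentered X, then subtract the projected mean.
precomputed_mean_impact = mean_ @ components_.T        # shape (n_components,)
mean_impact = np.ones((n_samples, 1)) @ precomputed_mean_impact.reshape(1, -1)
new = X.dot(components_.T) - mean_impact               # shape (n_samples, n_components)

same = np.allclose(old, new)
print(same)
```

Only `X.dot(self.components_.T)` touches `X` in the new form, so a sparse `X` can stay sparse; the subtraction happens on the much smaller projected result.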
1 parent a8fda19 commit d4535d2

1 file changed, +4 -3 lines changed

python/cuml/cuml/decomposition/pca.pyx

Lines changed: 4 additions & 3 deletions
@@ -1,5 +1,5 @@
 #
-# Copyright (c) 2019-2023, NVIDIA CORPORATION.
+# Copyright (c) 2019-2024, NVIDIA CORPORATION.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -632,8 +632,9 @@ class PCA(UniversalBase,
 self.components_ *= cp.sqrt(self.n_samples_ - 1)
 self.components_ /= self.singular_values_.reshape((-1, 1))
 
-X = X - self.mean_
-X_transformed = X.dot(self.components_.T)
+precomputed_mean_impact = self.mean_ @ self.components_.T
+mean_impact = cp.ones((X.shape[0], 1)) @ precomputed_mean_impact.reshape(1, -1)
+X_transformed = X.dot(self.components_.T) -mean_impact
 
 if self.whiten:
 self.components_ *= self.singular_values_.reshape((-1, 1))
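
To give a rough sense of why the change reduces memory, the sketch below compares the size of a fully densified `X` with the size of the `mean_impact` array that the new code allocates instead. The sizes are hypothetical and not taken from the commit. It assumes `X` is float32 (4 bytes per element) and that `mean_impact` is float64 (8 bytes), since `cp.ones` defaults to float64 as in NumPy; the helper `dense_bytes` is introduced here only for illustration.

```python
def dense_bytes(rows, cols, itemsize):
    """Bytes needed for a dense rows x cols array with the given element size."""
    return rows * cols * itemsize

# Hypothetical problem size (assumption, not from the commit).
n_samples, n_features, n_components = 100_000, 20_000, 50

# Old path: X - mean_ materializes a dense (n_samples, n_features) array.
densified_x = dense_bytes(n_samples, n_features, itemsize=4)

# New path: only a dense (n_samples, n_components) correction is allocated.
mean_impact = dense_bytes(n_samples, n_components, itemsize=8)

ratio = densified_x // mean_impact
print(densified_x, mean_impact, ratio)  # 8000000000 40000000 200
```

Under these assumptions the extra dense allocation shrinks by roughly the ratio of `n_features` to `n_components`, which is large whenever far fewer components are kept than there are input features.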
