After reading the paper, I believe a KAN should have $O(N^2 L (G + k))$ parameters, i.e. only larger than a comparable MLP ($O(N^2 L)$) by a factor of $(G + k)$, which should still be manageable in terms of size (at least in certain settings).
However, when I ran `model = KAN(width=[8,16,32,16,8,1], grid=15, k=3, seed=1, device=device)` with an input batch size of 90, my GPU with 24 GB of VRAM unexpectedly ran out of memory. Granted, the KAN model needs somewhat more parameters than the bare $(G + k)$ factor suggests, but with the settings above it should be nowhere near 24 GB, right? I suspect this is due to the implementation, but the code is too complex to analyze on short notice. Has anyone found anything in the implementation that would explain such high VRAM usage? I'd appreciate any insights.
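For reference, here's a quick back-of-the-envelope check of the parameter count for that exact configuration. This is only a sketch based on the $O(N^2 L (G + k))$ estimate above; the actual pykan layers store a few extra scalars per edge, but that shouldn't change the order of magnitude:

```python
# Rough parameter count for the KAN config in question, assuming each
# edge (input-output pair per layer) carries about (G + k) spline
# coefficients, per the O(N^2 L (G + k)) estimate from the paper.
width = [8, 16, 32, 16, 8, 1]
grid, k = 15, 3

# Total number of edges across all layers.
edges = sum(n_in * n_out for n_in, n_out in zip(width[:-1], width[1:]))
params = edges * (grid + k)

print(f"edges: {edges}")                                 # 1288
print(f"approx. parameters: {params}")                   # ~23k
print(f"approx. fp32 size: {4 * params / 1e6:.3f} MB")   # well under 1 MB
```

Even with generous overhead, the weights come out well under a megabyte, so whatever is consuming 24 GB is presumably intermediate activations or other buffers in the forward pass, not the parameters themselves.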