
Implementation seems to be inefficient #485

Open
Hope1337 opened this issue Oct 24, 2024 · 0 comments

Hope1337 commented Oct 24, 2024

After reading the paper, I believe the KAN model should have $O(N^2 L (G + k))$ parameters, which is only a factor of $(G + k)$ larger than a comparable MLP and should still be manageable in size (in certain specific settings).
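To make the estimate above concrete, here is a back-of-the-envelope parameter count for the configuration discussed below. This is a sketch, not taken from the pykan source: it assumes each edge activation carries $(G + k)$ B-spline coefficients, and real implementations may add a few extra scalars per edge (base weight, scale, bias).

```python
# Hypothetical parameter estimate following the paper's O(N^2 * L * (G + k))
# count; assumes (G + k) spline coefficients per edge activation.
def kan_param_estimate(widths, grid, k):
    # One learnable activation per edge between consecutive layers.
    edges = sum(a * b for a, b in zip(widths, widths[1:]))
    return edges * (grid + k)

widths = [8, 16, 32, 16, 8, 1]
edges = sum(a * b for a, b in zip(widths, widths[1:]))
params = kan_param_estimate(widths, grid=15, k=3)
print(edges, params)  # 1288 edges, 23184 spline coefficients
```

At roughly 23k float32 coefficients (~90 KB), the parameters themselves are nowhere near 24 GB.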

However, when I ran model = KAN(width=[8,16,32,16,8,1], grid=15, k=3, seed=1, device=device) with an input batch size of 90, my 24 GB GPU unexpectedly ran out of memory. Yes, the KAN model requires more than just a factor of (G + k) extra parameters, but with the above settings it shouldn't come anywhere near 24 GB of VRAM, right? I suspect this is due to the implementation, but the code is too complex for me to analyze in a short period of time. Has anyone found what in the implementation causes such high VRAM usage? I would appreciate any insights.
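A rough activation-memory estimate supports the suspicion that the implementation, not the model size, is at fault. The shape below is an assumption, not traced from the pykan code: suppose each layer materializes a basis tensor of shape (batch, in_dim, out_dim, G + 2k) in float32, which is already a pessimistic expansion.

```python
# Hypothetical per-layer activation footprint: assumes the spline evaluation
# expands to a (batch, in_dim, out_dim, G + 2k) float32 tensor per layer.
BYTES_F32 = 4

def layer_activation_bytes(batch, in_dim, out_dim, grid, k):
    return batch * in_dim * out_dim * (grid + 2 * k) * BYTES_F32

widths = [8, 16, 32, 16, 8, 1]
total = sum(layer_activation_bytes(90, a, b, 15, 3)
            for a, b in zip(widths, widths[1:]))
print(f"{total / 2**20:.1f} MiB")  # single-digit MiB, far below 24 GB
```

Even under this pessimistic assumption the forward pass stays in the megabyte range, so an OOM at batch size 90 suggests the implementation is holding on to much larger intermediates (for example, cached per-sample activations).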
