-
Notifications
You must be signed in to change notification settings - Fork 251
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GPU Memory Management Issue in Multi-Shank Configuration with Kilosort 4.0.16 #771
Comments
I get the same problem, did you try clear_cache=True? it helps sometimes, not in my case though. |
I’ve tried the clear_cache option but encountered the same error. |
@HiroMiyawaki Can you please uploading |
kilosort4.log I'm OK to share the data, which is ~87GB. Do you have a preferred method for transferring the data? |
The log file has some garbage on the last thirds, please ignore them. |
@HiroMiyawaki Any kind of link you can post that I can download the data from is fine. Most people have been sending google drive or dropbox links. You can post it here if you're comfortable with that, or e-mail it to me at [email protected] if you don't want the link to be publicly visible. |
@jacobpennington I've just send an e-mail to you. |
Hi, I'm getting a similar error when running KS4, was this cuda memory issue ever resolved? |
Still working on it. Can you please give some more details @Sara-Brooke, like attaching kilosort4.log? |
I'm having the same issue using a single NP2.0 in 2- and even 1-shank configurations. The 2-shank sorting attempt got to 39% complete during the "kilosort.spikedetect: Re-computing universal templates from data" phase before stopping due to CUDA out of memory error. The 1-shank attempt got to the "first clustering" phase before stopping. I should also mention that just loading the data into the kilosort gui takes up ~3gb of my 8 gb dedicated GPU memory. Recording size: 90 min, Kilosort version: 4.0.17, "Clear PyTorch Cache" = True. |
I'm using NP2.0 in a four shank configuration, a recording of ~25 minutes, and I got the "cuda out of memory" error at the start of spike detection. I am trying to set up my spike sorting still so I don't have any successful runs to go off of unfortunately. I am using a 12GB GPU (GeForce rtx 3060), running ks4 from terminal in a conda environment on data collected in spikeGLX and preprocessed with CatGT. Python 3.9.19 |
Okay I actually got mine to work! I had to manually find the most up-to-date nvidia driver on their website (the device manager lied to me, it was not actually up to date). Having the new driver on my GPU allowed me to install the newest cuda version (compatibility checked by typing nvidia-smi in the conda terminal). Log File: So final (working) versions/equipment/packages: |
Great, thanks for letting us know! |
Hi @HiroMiyawaki, Can you please try sorting again with the latest version (v4.0.19)? There was a bug in the way template positions were generated for multi-shank probes, and fixing the bug reduced memory usage on your dataset by 75% for me. |
Hello @jacobpennington, KS 4.0.19 successfully processed a relatively short (~70 min) 4-shank recording, which was not possible with v4.0.16. However, for a longer (~390 min) 4-shank recording, KS 4.0.19 ran into an “out of memory” error (I’ve attached the log file). I’m not sure whether this indicates that there is another bug or if a 390-min recording at 30 kHz is simply too large for my GPU (which has 16GB RAM). Note that the same data can be processed with KS 4.0.16 if each shank processed separarely. |
I had a similar error. You can try the version on the only open pull request to see if it fixes your problem too. You can see I am the author. |
Hi @RobertoDF It has been quite hectic for a while, but I finally had a chance to try your modification. In short, it works! Here are the details: I cloned the latest version a few days ago (the log indicates that it’s version 4.0.21.dev8+g44252a2.d20241115) and applied the modification as outlined in your pull request. The modified version successfully processed the ~390-minute, 4-shank dataset, and the results appear to be fine, at least in the Phy software. I’ve attached the log file just in case. Thanks a lot! |
Happy to hear that 🚀 |
@HiroMiyawaki Would you be willing to share your data again, for the longer recording? Just to help me test some memory improvements on a dataset that I know is running into this problem. |
Describe the issue:
I am encountering what appears to be a GPU memory management issue when using the multi-shank configuration in Kilosort 4.0.16. Specifically, when processing data from a Neuropixels 2.0 probe in a 4-shank configuration (384 channels in total, sampled at 30 kHz) for approximately 60 minutes, I receive an error indicating a shortage of GPU memory (detailed error message provided below).
However, when I run Kilosort on data of similar duration (~60 minutes) but in a one-shank configuration (still 384 channels), it processes without any issues. Additionally, when I split the 4-shank dataset into individual shanks and process them separately (96 channels each), the operation also completes successfully, even for longer recordings (>300 minutes).
Given this, I suspect that the multi-shank configuration might require significantly more GPU memory. Could you please confirm if this is the case? If so, is there a guideline for estimating the amount of GPU memory required based on the number of shanks and/or the length of the recording?
Reproduce the bug:
Error message:
Version information:
python: 3.9.19
Kilosort version: 4.0.16
os: Windows 11 Home
CUDA toolkit: 11.8
The text was updated successfully, but these errors were encountered: