-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Convert Python Package from Pandas to Polars #159
Comments
This seems like a good move. It'd be nice to move all the build scripts to polars as well, but that will break a lot. |
Great, I fully agree with that too. |
Just to be explicit (I thought it was assumed), we should make sure that the forward facing code is still in pandas, so that users dont have to learn polars to use the package. |
I thought the goal was to migrate the package from pandas to polars? Pandas is slower and has memory issues. If users have to learn one of the two, they might as well learn the one that is more optimized for our data? Everything in the package that uses pandas is user-facing. |
As this code is nearly complete, we could have two versions if we wanted? - DatasetLoader_pd() and DatasetLoader_pl()? But having two versions would also increase time for future code development such as the upcoming DataTypeLoader. |
no, that'd be a nightmare. Just add a flag called 'use_polars' and set the default to False. HOw you imagine this working when #111 is complete? |
Oh yeah, thats a much better solution (using a flag). For #111, I assume we should also create a pandas and polars version as well (with the same flag option)? |
I was hoping it'd speed up things on the backend, but polars is too new to really rely on at this stage. Basically check the flag and add in a |
@sgosline We've briefly spoken about this before and it came up after my comp bio presentation as well. I'd like to do this before addressing #111.
Currently the package is build from Python, converting it all to Polars will speed up operations and make it less susceptible to memory issues as we continue to add more data. Even now, I could see users on low memory devices potentially having issues.
This update should address the following 7 files:
The text was updated successfully, but these errors were encountered: