🚚 Move PertCurator from wetlab here and add CellxGene Curator test #2408
@Zethson, do you have an idea for how to speed the cxg and pertcurator tests up more and turn them more into proper unit tests? (I hope that with the last commit things are passing; they've been passing locally in isolation all along, but these notebook-derived tests seem to create entanglement.)
@Zethson, do you have an idea for how to speed the cxg and pertcurator tests up more and turn them more into proper unit tests?
All of our XCurator tests are currently written like pipelines. That is the easiest to implement and acts as an integration test for the different XCurator functions, but it won't ever allow us to parallelize them.
These tests only need super simple datasets with only 1 var index and 2 categoricals each. You already have simple datasets here but maybe we can downsample and reduce them even more.
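A minimal sketch of what such a reduced dataset could look like, assuming plain pandas/NumPy objects stand in for the real `AnnData`. The function names, the `perturbation_type` column, and the gene labels are hypothetical and only illustrate "1 var index and 2 categoricals" plus a deterministic downsampling step:

```python
import numpy as np
import pandas as pd


def make_micro_dataset(n_obs: int = 12, n_vars: int = 3, seed: int = 0):
    """Build a tiny (X, obs, var) triple: one var index, two categorical obs columns."""
    rng = np.random.default_rng(seed)
    # Single var index (hypothetical gene labels).
    var = pd.DataFrame(
        index=pd.Index([f"gene_{i}" for i in range(n_vars)], name="symbol")
    )
    # Two categorical columns; "perturbation_type" is an assumed column name.
    labels = ["genetic", "compound"]
    obs = pd.DataFrame(
        {
            "perturbation_type": pd.Categorical([labels[i % 2] for i in range(n_obs)]),
            "tissue_type": pd.Categorical(["cell culture"] * n_obs),
        },
        index=[f"cell_{i}" for i in range(n_obs)],
    )
    X = rng.poisson(1.0, size=(n_obs, n_vars))
    return X, obs, var


def downsample(X, obs, by, n_per_group: int = 2):
    """Keep the first n_per_group observations of every category combination."""
    keep = obs.groupby(by, observed=True).head(n_per_group).index
    mask = obs.index.isin(keep)
    return X[mask], obs.loc[mask]


X, obs, var = make_micro_dataset()
X_small, obs_small = downsample(X, obs, ["perturbation_type", "tissue_type"])
```

Keeping only a couple of rows per category combination preserves every label the curator has to validate while shrinking the number of records it has to process.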
Honestly, I think we need to sit down and find ways to speed up the fundamental DataFrameCurator. Maybe polars, maybe somehow more bulk transactions, maybe we can somehow cache more? Spending time on that is probably better than optimizing these tests now, no?
I don't have further simple ideas.
{"CRISPR": "genetic", "drug": "compound"}
)

adata.obs["tissue_type"] = "cell culture"
Lines 26 to 41 don't need to be in the test. Just set that as default for the micro dataset.
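One way to read this suggestion, sketched under assumptions: the relabeling and the constant columns move out of the test body into a helper that the micro dataset applies by default. The helper name `prepare_obs`, the `perturbation_type` column, and the two module-level constants are hypothetical; only the mapping values and the `tissue_type` assignment come from the diff above.

```python
import pandas as pd

# Assumed defaults, taken from the values visible in the reviewed diff.
PERTURBATION_LABELS = {"CRISPR": "genetic", "drug": "compound"}
DEFAULT_OBS_COLUMNS = {"tissue_type": "cell culture"}


def prepare_obs(obs: pd.DataFrame, column: str = "perturbation_type") -> pd.DataFrame:
    """Return a copy of obs with labels remapped and default columns filled in."""
    out = obs.copy()
    # Relabel raw perturbation names; unmapped values are left untouched.
    out[column] = out[column].replace(PERTURBATION_LABELS)
    # Only add a default column if the dataset does not already define it.
    for name, value in DEFAULT_OBS_COLUMNS.items():
        if name not in out.columns:
            out[name] = value
    return out


raw_obs = pd.DataFrame({"perturbation_type": ["CRISPR", "drug"]})
prepared = prepare_obs(raw_obs)
```

With the setup baked into the dataset, a test would only need to load the micro dataset and call the curator.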
Thanks and understood! It's a chicken-and-egg problem: I want fast tests to do the refactor, and the refactor will start to speed things up. I will see what more can be done.
Pasting our current run times; total is ~5 min:
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2408      +/-   ##
==========================================
- Coverage   91.71%   91.54%   -0.18%
==========================================
  Files          62       62
  Lines        9138     9646     +508
==========================================
+ Hits         8381     8830     +449
- Misses        757      816      +59
The commits before already sped up things to ~3 min total, bringing down the CxG test from 130 sec to 50 sec:
More to come in the next PR with the first refactor. So far, everything has merely been copied and pasted.
In the next PR, the CxG test runtime goes further down to 20 sec by removing the coupling to
Is part of a sequence of PRs that refactors the curators:
- CellxGene schema #2412
- CellxGene Curator from cellxgene-lamin here #2403
- DataFrameCurator #2388
Made a "micro-farland2020" dataset to speed up the tests that were running in the wetlab repository: https://lamin.ai/laminlabs/lamindata/transform/JreAW9tkAHrc
- PertCurator from wetlab here and add CellxGene Curator test #2408
Also added a test for the CxG curator based on the notebook in cellxgene-lamin.
- Curator to lamindb cellxgene-lamin#120