-
Notifications
You must be signed in to change notification settings - Fork 157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(feat): allow join=outer
for concat_on_disk
#1504
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #1504 +/- ##
==========================================
+ Coverage 84.44% 84.45% +0.01%
==========================================
Files 36 36
Lines 5747 5745 -2
==========================================
- Hits 4853 4852 -1
+ Misses 894 893 -1
|
if join != "inner": | ||
raise NotImplementedError("only inner join is currently supported") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am trying to become more familiar with the code, but it's shocking this was all that was needed. That being said, the tests do pass and I looked at the results so 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, looking at the code path, it makes some sense. concat_on_disk
constructs sensible-looking reindexers for the alt_dim
axis and then passes those on so there shouldn't be a need for the join
argument anywhere else despite the option to use it
In a first run, the object didn't contain the "var" dimension. Could you check whether you can replicate it? |
@zimea #1505 is what you want but there are clearly some problems there. So if you need you my_adatas = []
for path in my_paths:
adata = ad.read_h5ad(path)
del adata.X
my_adatas += [adata]
merged_in_memory = concat(my_adatas, merge="...")
merged_on_disk.var = merged_in_memory.var which should be manageable unless your |
Ok @ivirshup I think I have a slightly better sense of what you mean when you talk about For example, I have been looking into adding anndata/src/anndata/experimental/merge.py Lines 60 to 90 in 105f354
merge types we should merge #1469 and then integrate read_dask_as_elem here. Sound good?
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Happy to see this was somewhat smooth.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. It tests correctness, and if performance turns out to be a problem, we can do that later
Owee, I'm MrMeeseeks, Look at me. There seem to be a conflict, please backport manually. Here are approximate instructions:
And apply the correct labels and milestones. Congratulations — you did some good work! Hopefully your backport PR will be tested by the continuous integration and merged soon! Remember to remove the If these instructions are inaccurate, feel free to suggest an improvement. |
concat_on_disk
outer join #1503