[python] tiledbsoma.io.add_obs_layer helper function #3625
base: main
This is very interesting -- thanks for sharing, @maarten-devries !!
Let me think about this a bit 🙏
@@ -1811,6 +1811,52 @@ def update_matrix(
_util.format_elapsed(s, f"FINISH UPDATING {soma_ndarray.uri}"),
)

def add_obs_layer(
The add_X_layer function can be used to write a new component in the X collection -- e.g. if X["raw"] exists, you can create X["norm"] -- since X is a Collection of SparseNDArray.
Here, there's just one obs -- since obs is a DataFrame, not a Collection of DataFrames.
This is really a variant of update_obs.
What this does that tiledbsoma.io.update_obs doesn't:
- It accepts a pyarrow.Table (which is awesome!)
What tiledbsoma.io.update_obs does that this doesn't:
- Check old and new schema, and evolve the schema if necessary
  - That's fine here as long as it's clear, but it might benefit us to assert that the old and new schemas match, rather than deferring to DataFrame.write to raise an exception.
- Check that the old and new row counts are the same -- this function allows row-count changes
  - If the row count is increasing here, that's probably a good thing (although it needs a warning label that this may require a referential-integrity update on the X, obsm, and obsp matrices -- caveat emptor).
  - If the row count is decreasing here -- say from 100 rows to 98 -- then those last two rows from the previous write will still be visible. We'd need to write through the TileDB Core deletion API (which is solid and well-tested, simply not yet wired into tiledbsoma) to mark them as deleted. In this case, too, we'd want a warning label about referential integrity.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3625      +/-   ##
==========================================
- Coverage   86.23%   86.12%   -0.11%
==========================================
  Files          55       55
  Lines        6378     6387       +9
==========================================
+ Hits         5500     5501       +1
- Misses        878      886       +8
Flags with carried forward coverage won't be shown.
Issue and/or context:
Changes:
Notes for Reviewer: