[python] tiledbsoma.io.add_obs_layer helper function #3625

Open
wants to merge 1 commit into
base: main

maarten-devries

Issue and/or context:

Changes:

Notes for Reviewer:

@johnkerl johnkerl changed the title add_obs_layer helper function [python] tiledbsoma.io.add_obs_layer helper function Jan 24, 2025
Member

@johnkerl johnkerl left a comment

This is very interesting -- thanks for sharing, @maarten-devries !!

Let me think about this a bit 🙏

@@ -1811,6 +1811,52 @@ def update_matrix(
_util.format_elapsed(s, f"FINISH UPDATING {soma_ndarray.uri}"),
)

def add_obs_layer(
Member

The add_X_layer function can be used to write a new component in the X collection -- e.g. if X["raw"] exists, you can create X["norm"] -- as X is a Collection of SparseNDArray.

Here, there's just one obs -- since obs is a DataFrame, not a Collection of DataFrame.

This is really a variant of update_obs.

What this does that tiledbsoma.io.update_obs doesn't:

  • It accepts a pyarrow.Table (which is awesome!)

What tiledbsoma.io.update_obs does that this doesn't:

  • Check old and new schema, and evolve the schema if necessary
    • That's fine here as long as it's clear, but it might benefit us to assert that the old and new schema match -- rather than deferring to DataFrame.write to raise an exception.
  • Check that the old and new row counts are the same -- this function, by contrast, allows row-count changes
    • If the row-count is increasing here, that's probably a good thing (although it needs a warning label that this may require a referential-integrity update on X, obsm, and obsp matrices -- caveat emptor)
    • If the row-count is decreasing here -- say from 100 rows to 98 -- then those last two rows from the previous write will still be visible. We'd need to write through the TileDB Core deletion API (which is solid and well-tested, simply not yet wired into tiledbsoma) to mark them as deleted. And in this case, too, a warning label about referential integrity.
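
To make the two checks above concrete, here is a minimal sketch of a pre-write guard. It is an illustration only: `check_obs_write` and `TableLike` are hypothetical names, not part of tiledbsoma, and the schema is a toy tuple. The check reads only `.schema` and `.num_rows`, both of which `pyarrow.Table` also exposes, so the same function should work on real tables. Raising on a shrinking row count is one possible policy, assumed here because deletes are not yet wired into tiledbsoma.

```python
import warnings
from typing import Any, NamedTuple


class TableLike(NamedTuple):
    """Stand-in with the two attributes the check reads.

    pyarrow.Table exposes both `schema` and `num_rows`, so a real table
    could be passed instead of this placeholder.
    """

    schema: Any
    num_rows: int


def check_obs_write(existing: Any, incoming: Any) -> None:
    """Hypothetical pre-write check; not part of tiledbsoma."""
    # Fail early on a schema mismatch instead of relying on the write to raise.
    if existing.schema != incoming.schema:
        raise ValueError(
            f"schema mismatch: existing={existing.schema!r}, new={incoming.schema!r}"
        )

    # Fewer rows: the trailing rows from the previous write would stay visible.
    if incoming.num_rows < existing.num_rows:
        raise ValueError(
            f"row count would shrink from {existing.num_rows} to "
            f"{incoming.num_rows}; rows past the new end would remain readable"
        )

    # More rows: allowed, but X, obsm, and obsp may need matching updates.
    if incoming.num_rows > existing.num_rows:
        warnings.warn(
            f"row count grows from {existing.num_rows} to {incoming.num_rows}; "
            "X, obsm, and obsp may need a referential-integrity update",
            stacklevel=2,
        )


# Toy schema as a tuple of (name, type) pairs, purely for illustration.
schema = (("soma_joinid", "int64"), ("cell_type", "string"))
check_obs_write(TableLike(schema, 100), TableLike(schema, 100))  # passes silently
```

With a guard like this, the 100-to-98 case raises before anything is written, and a growing table still goes through but with a visible warning.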

codecov bot commented Feb 11, 2025

Codecov Report

Attention: Patch coverage is 11.11111% with 8 lines in your changes missing coverage. Please review.

Project coverage is 86.12%. Comparing base (fe8e64d) to head (248c657).
Report is 27 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3625      +/-   ##
==========================================
- Coverage   86.23%   86.12%   -0.11%     
==========================================
  Files          55       55              
  Lines        6378     6387       +9     
==========================================
+ Hits         5500     5501       +1     
- Misses        878      886       +8     
Flag Coverage Δ
python 86.12% <11.11%> (-0.11%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
python_api 86.12% <11.11%> (-0.11%) ⬇️
libtiledbsoma ∅ <ø> (∅)


2 participants