♻️ Re-purpose BaseCurator as Curator and introduce CatCurator #2416

Merged (21 commits) on Feb 2, 2025

falexwolf (Member) commented Feb 2, 2025

This PR is part of a sequence of PRs that refactor the curators:

Consolidate under Curator

Rename BaseCurator to Curator and eliminate the previous Curator 1080aed

The private attributes of Curator enable the most basic curation functionality.

from typing import Any

class Curator:
    def __init__(self, dataset: Any):
        self._dataset: Any = dataset  # pass the dataset as a UPathStr or data object
        self._artifact: Artifact | None = None  # pass the dataset as a (non-curated) artifact
        self._cat_curator: CatCurator | None = None
        self._validated: bool = False
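
As a rough illustration of how these private attributes can support basic curation, here is a minimal sketch. It is a toy stand-in, not the lamindb implementation: the `validate()` and `save_artifact()` bodies, the `ValidationError` name, and the rule that saving requires a prior successful validation are all assumptions made for illustration.

```python
from typing import Any, Optional


class ValidationError(Exception):
    """Hypothetical error raised when saving before validation succeeded."""


class Curator:
    """Toy stand-in for the base class; not the lamindb implementation."""

    def __init__(self, dataset: Any):
        self._dataset: Any = dataset
        self._artifact: Optional[dict] = None  # stand-in for an Artifact record
        self._validated: bool = False

    def validate(self) -> bool:
        # Assumption: subclasses would override this with real checks.
        self._validated = self._dataset is not None
        return self._validated

    def save_artifact(self) -> dict:
        # Assumption: saving is only allowed after validate() returned True.
        if not self._validated:
            raise ValidationError("Dataset is not validated; call validate() first.")
        self._artifact = {"data": self._dataset, "curated": True}
        return self._artifact


curator = Curator({"cell_type": ["T cell"]})
curator.validate()
artifact = curator.save_artifact()
```

Keeping `_validated` on the base class means every subclass can share the same guard in `save_artifact()` without re-implementing it.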

Consolidate under CatCurator 99e145b

Rename .fields and ._fields to .categoricals and ._categoricals, respectively

The private attributes of CatCurator extend Curator to enable curating categoricals.

class CatCurator(Curator):
    def __init__(
        self, *, dataset, categoricals, sources, organism, exclude, columns_field=None
    ):
        super().__init__(dataset=dataset)
        self._categoricals = categoricals or {}
        self._non_validated = None
        self._organism = organism
        self._sources = sources or {}
        self._exclude = exclude or {}
        self._columns_field = columns_field
        self._validate_category_error_messages: str = ""

DataFrameCatCurator, AnnDataCatCurator, and MuDataCatCurator now all leverage these fields. They also all leverage CatCurator.save_artifact().
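
The consolidation described above can be sketched roughly as follows. This is a simplified, self-contained toy rather than the actual lamindb code: datasets are plain dicts, `categoricals` maps a column name to a set of allowed values (in lamindb it maps to registry fields), and the `_column_values()` hook plus the bodies of `validate()` and `save_artifact()` are hypothetical.

```python
from typing import Any, Dict, List, Optional


class Curator:
    def __init__(self, dataset: Any):
        self._dataset = dataset
        self._artifact: Optional[dict] = None
        self._validated = False


class CatCurator(Curator):
    """Toy stand-in: shared categorical state and a single save_artifact()."""

    def __init__(self, *, dataset, categoricals=None, sources=None,
                 organism=None, exclude=None, columns_field=None):
        super().__init__(dataset=dataset)
        self._categoricals = categoricals or {}
        self._non_validated: Optional[Dict[str, List[str]]] = None
        self._organism = organism
        self._sources = sources or {}
        self._exclude = exclude or {}
        self._columns_field = columns_field

    @property
    def categoricals(self) -> dict:
        # Public view of the attribute formerly exposed as .fields
        return self._categoricals

    def _column_values(self, key: str) -> list:
        # Hypothetical hook: each subclass knows where its columns live.
        raise NotImplementedError

    def validate(self) -> bool:
        # Collect, per column, the values that are not in the allowed set.
        self._non_validated = {}
        for key, allowed in self._categoricals.items():
            unknown = sorted(set(self._column_values(key)) - set(allowed))
            if unknown:
                self._non_validated[key] = unknown
        self._validated = not self._non_validated
        return self._validated

    def save_artifact(self) -> dict:
        # Defined once here and inherited unchanged by the subclasses below.
        if not self._validated:
            raise ValueError("Dataset is not validated; call validate() first.")
        self._artifact = {"categoricals": sorted(self._categoricals)}
        return self._artifact


class DataFrameCatCurator(CatCurator):
    def _column_values(self, key: str) -> list:
        return list(self._dataset[key])  # dataset: {column: [values]}


class AnnDataCatCurator(CatCurator):
    def _column_values(self, key: str) -> list:
        return list(self._dataset["obs"][key])  # dataset: {"obs": {column: [values]}}
```

In this sketch the subclasses only say where the categorical columns live; state handling and saving stay in `CatCurator`.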

TiledbsomaCatCurator and SpatialDataCatCurator both have more custom logic that should be consolidated in another PR.

falexwolf changed the title from "Classstructure" to "♻️ Re-purpose BaseCurator as Curator and enforce properly inherited signatures" on Feb 2, 2025
falexwolf changed the title from "♻️ Re-purpose BaseCurator as Curator and enforce properly inherited signatures" to "♻️ Re-purpose BaseCurator as Curator, introduce CatCurator and consolidate shared logic under CatCurator" on Feb 2, 2025
codecov bot commented Feb 2, 2025

Codecov Report

Attention: Patch coverage is 95.83333% with 7 lines in your changes missing coverage. Please review.

Project coverage is 91.54%. Comparing base (d503387) to head (cc65789).
Report is 35 commits behind head on main.

Files with missing lines        Patch %   Lines
lamindb/curators/__init__.py    95.80%    7 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2416      +/-   ##
==========================================
- Coverage   91.71%   91.54%   -0.17%     
==========================================
  Files          62       62              
  Lines        9138     9572     +434     
==========================================
+ Hits         8381     8763     +382     
- Misses        757      809      +52     

falexwolf merged commit 23d1d35 into main on Feb 2, 2025 (12 checks passed)
falexwolf deleted the classstructure branch on February 2, 2025, 12:57
falexwolf changed the title from "♻️ Re-purpose BaseCurator as Curator, introduce CatCurator and consolidate shared logic under CatCurator" to "♻️ Re-purpose BaseCurator as Curator and introduce CatCurator" on Feb 2, 2025