Implement scverse datastucture #356

Merged on Apr 7, 2023 (173 commits)
a34afc1
Create Awkward AnnData instead of putting everything in obs
grst Aug 13, 2022
e65c138
add todo
grst Aug 16, 2022
e1646b7
Get chain indices for primary and secondary chains
grst Aug 18, 2022
799f611
WIP get module
grst Aug 18, 2022
da0b096
Implement ir.get.airr
grst Aug 19, 2022
1a3e4d7
Clean up AirrCell
grst Aug 19, 2022
c7280b5
WIP restructure IO module
grst Aug 19, 2022
6b36de0
fix imports
grst Aug 19, 2022
d13c254
Add helper function for unit tests
grst Aug 19, 2022
f3d82fb
tl.chain_qc successfully runs on the new datastructure
grst Aug 19, 2022
966d17d
Update convert anndata
grst Oct 3, 2022
4e20248
Merge branch 'master' into scverse_datastructure
grst Oct 10, 2022
6be7e85
Merge branch 'master' into scverse_datastructure
grst Jan 4, 2023
423f447
switch to obsm-based data structure
grst Jan 4, 2023
6924d2a
update get module
grst Jan 4, 2023
bd6bb86
Update anndata schema check and _make_adata util function.
grst Jan 4, 2023
7fe9ac9
fix _make_adata
grst Jan 4, 2023
2dadc53
update fixtures
grst Jan 4, 2023
b906fce
Fix a couple of tests
grst Jan 4, 2023
fe150e0
Re-add to_airr_cells
grst Jan 4, 2023
5ba4891
Fix couple more tests
grst Jan 4, 2023
bc859c4
Fix more IO tests
grst Jan 5, 2023
cad4733
More IO tests [skip ci]
grst Jan 5, 2023
08341a4
Merge remote-tracking branch 'origin/master' into scverse_datastructure
grst Jan 5, 2023
6204a09
Cleanup has_ir
grst Jan 5, 2023
6ccf781
WIP fix clonotype neighbors [skip ci]
grst Jan 5, 2023
9a091f2
WIP fix distance tests
grst Jan 16, 2023
5a185d1
WIP fix clonotype cluster tests
grst Jan 16, 2023
3ba6051
Fix spectratype functions [skip ci]
grst Jan 16, 2023
f1c63de
Fix more tests
grst Jan 17, 2023
7803969
Fix IR dist tests [skip ci]
grst Jan 17, 2023
855b02e
Fix tests for ir dist
grst Jan 17, 2023
920ba4f
Fix spectratype test [skip ci]
grst Jan 17, 2023
39cd7d0
Tests for new upgrade_schema function [skip ci]
grst Jan 17, 2023
e5fb9af
Workaround for group_abundance plot without has_ir column
grst Jan 19, 2023
0fc159b
Cleanup has_ir
grst Jan 19, 2023
767c977
Clean multi_chain [skip ci]
grst Jan 19, 2023
90d8fd3
stub new index_chains function
grst Jan 19, 2023
5b2666e
WIP index_chain function [skip ci]
grst Jan 19, 2023
b410494
Add stub test for index_chains
grst Jan 23, 2023
98d044b
Stub second test for index_chains
grst Jan 23, 2023
f3694d5
Complete second test for index_chains [skip ci]
grst Jan 23, 2023
3978984
index_chains tests
grst Jan 26, 2023
171db6d
Update target version to v0.13 [skip ci]
grst Jan 27, 2023
a1f2aa5
Merge branch 'master' into scverse_datastructure
grst Jan 27, 2023
541377c
Merge remote-tracking branch 'origin/master' into scverse_datastructu…
grst Jan 27, 2023
f731677
add isort and autoflake
grst Jan 27, 2023
15467a8
Fix circular import
grst Jan 27, 2023
e38f801
Fix multichain handling (implement get._has_ir)
grst Jan 27, 2023
3925111
re-add fixtures
grst Jan 27, 2023
b20c7bb
isort on tests [skip ci]
grst Jan 27, 2023
5abd0d1
fix remaining IO tests
grst Jan 27, 2023
7b6071b
update todo flags [skip ci]
grst Jan 27, 2023
9203af0
_is_na input sanitization already in AirrCell module [skip ci]
grst Jan 29, 2023
a026a59
Fix issue with plotting; get rid of merge_with_ir [skip ci]
grst Jan 29, 2023
bcd3c50
Remove test for merge_with_ir [skip ci]
grst Jan 29, 2023
1dabb61
Ensure consistent ordering or chains in merge_airr
grst Feb 2, 2023
0fa0dd8
Complete unit tests for merge_airr [skip ci]
grst Feb 2, 2023
9392fb6
Use pre-commit.ci for black formatting
grst Feb 14, 2023
d52f912
Merge branch 'pre-commit' into scverse_datastructure
grst Feb 14, 2023
f63735e
Bump minimum python version to 3.8
grst Feb 14, 2023
b78687b
Bump minimum python version to 3.8
grst Feb 14, 2023
a866d86
bump python version in CI tests
grst Feb 14, 2023
88bc0ae
update imports of Literal
grst Feb 14, 2023
c925490
Merge branch 'pre-commit' into scverse_datastructure
grst Feb 14, 2023
c4ffc67
update pre-commit config [skip ci]
grst Feb 14, 2023
9adb9ac
fix compat
grst Feb 14, 2023
a749f71
WIP new chain_indices format
grst Feb 14, 2023
ba59d4e
Fix get module
grst Feb 14, 2023
09fd846
WIP fix tests
grst Feb 14, 2023
b581dba
Fix tests [skip ci]
grst Feb 14, 2023
3f356fb
Fix dandelion tests
grst Feb 14, 2023
a455209
Update workflow tests
grst Feb 15, 2023
e00aa11
update min anndata version
grst Feb 15, 2023
23c9610
Deprecate include_fields parameter and pass kwargs to from_airr_cells…
grst Feb 15, 2023
46a2562
WIP update example datasets
grst Feb 15, 2023
bcdeca9
update wu dataset generation
grst Feb 16, 2023
538fb30
Update wu2020 dataset to mudata (preliminary)
grst Feb 17, 2023
938d13b
First attempt to make tutorial work with mudata
grst Feb 17, 2023
b44ae5c
fix issue with slicing awkward array when slice mask is empty
grst Feb 17, 2023
0456bff
Change clonotype calling behavior for missing cdr3 sequences
grst Feb 17, 2023
cfa77ce
fix awkward type conversion in index_chains
grst Feb 17, 2023
a364630
Get rid of tqdm workaround which is not needed anymore
grst Feb 17, 2023
2bb2570
Update API in tutorial to what it *should* look like in the future
grst Feb 25, 2023
83a15cd
Stub parameter validation [skip ci]
grst Feb 25, 2023
d845557
implement params check class
grst Feb 26, 2023
dbd11cb
update API docs
grst Feb 26, 2023
5515c89
Apply new params check to first function
grst Feb 26, 2023
6029b2b
document params check
grst Feb 26, 2023
da7f235
Remove anndata version check decorators
grst Feb 26, 2023
a89bd3f
Restructure to fix cirular import [skip ci]
grst Feb 26, 2023
0d93df7
Unit tests for parms check
grst Feb 26, 2023
d96ded6
Fix notebook pairing
grst Feb 26, 2023
2007624
Params check in index_chains
grst Feb 26, 2023
e5363ff
update ir_dist with paramscheck [skip ci]
grst Feb 26, 2023
4a8335d
Merge remote-tracking branch 'origin/master' into scverse_datastructure
grst Feb 26, 2023
d6bb7c8
Apply pre-commit hooks to all files [skip ci]
grst Feb 26, 2023
2e43d19
Refactor ParamsCheck class
grst Mar 5, 2023
f869a12
Refactor chain_qc
grst Mar 5, 2023
e55f28f
WIP implement param checks
grst Mar 7, 2023
8a71854
Update type hints
grst Mar 7, 2023
fade4c4
Improve _ParamsCheck class [skip ci]
grst Mar 7, 2023
66cf588
Fix typing in a couple of files.
grst Mar 8, 2023
e24a336
Iterate on tutorial [skip ci]
grst Mar 8, 2023
c8ba45f
Iterate on tutorial
grst Mar 8, 2023
7ca6be1
Rename _ParamsCheck to DataHandler
grst Mar 8, 2023
92738b8
Implement get_obs in DataHandler
grst Mar 8, 2023
236bf21
WIP fix clonotype_network
grst Mar 8, 2023
334090f
Fix clonotype_network plot [skip ci]
grst Mar 8, 2023
ce72f7d
Update clonal_expansion
grst Mar 9, 2023
4de2301
Fix alpha diversity
grst Mar 9, 2023
eeab3f8
Fix repertoire overlap and spectratype
grst Mar 9, 2023
e829f57
Fix clonotype modularity
grst Mar 9, 2023
3bd28ec
Fix ir_query [skip ci]
grst Mar 9, 2023
682f741
Fix clonotype convergence
grst Mar 9, 2023
343e683
Fix clonotype imbalance
grst Mar 9, 2023
4b01299
Fix clonotype imbalance
grst Mar 9, 2023
0a10e22
Update processing scripts for Wu2020
grst Mar 9, 2023
808f60a
Update maynard loading script
grst Mar 9, 2023
2809759
Merge branch 'scverse_datastructure' of github.com:icbi-lab/scirpy in…
grst Mar 9, 2023
6233d61
disable check for same fields in AirrCell [skip ci]
grst Mar 9, 2023
4c6bf83
Update maynard processing script
grst Mar 10, 2023
cf4974d
WIP tests with mudata
grst Mar 12, 2023
517210b
Update example datasets [skip ci]
grst Mar 12, 2023
a282229
Fix test for clonotype convergence
grst Mar 13, 2023
ddf5718
Experimental: use wrapper class for fixture
grst Mar 15, 2023
81217cb
Remove outdated TODO statements
grst Mar 16, 2023
cd0bdc2
Revert "Experimental: use wrapper class for fixture"
grst Mar 16, 2023
e43574e
Implement inplace logic in DataHandler
grst Mar 16, 2023
a37ce9f
Parametrize fixtures to represent both AnnData and MuData [skip ci]
grst Mar 16, 2023
e60b12f
Use DataHandler to write results to obs.
grst Mar 16, 2023
ea7cd34
WIP fix tests
grst Mar 20, 2023
75be783
Fix _get_colors [skip ci]
grst Mar 20, 2023
d30084d
Fix tests
grst Mar 20, 2023
eef3341
Fix test_get_color
grst Mar 20, 2023
6a9811a
Implement context managers in `get` module
grst Mar 21, 2023
e8317e1
Fix clustermap
grst Mar 21, 2023
597bff1
Fix normalize in spectratype
grst Mar 21, 2023
a04b461
Tutorial again complete :tada:
grst Mar 21, 2023
c96fa09
Fix some open TODOs
grst Mar 21, 2023
4a86605
Add tests for get context managers
grst Mar 21, 2023
d313c03
update datasets module
grst Mar 23, 2023
09b96ee
Remove function cdr_convergence, which was never publicly documented …
grst Mar 23, 2023
6b44b88
Update some docstrings
grst Mar 23, 2023
3212230
remove erroneous import [skip ci]
grst Mar 23, 2023
b720c83
WIP update docs
grst Mar 24, 2023
788e31f
Update usage principles and data structure
grst Mar 24, 2023
17e7a17
Update MuData section [skip ci]
grst Mar 24, 2023
984441e
WIP update IO tutorial
grst Mar 28, 2023
bf2b11b
Update IO tutorial
grst Mar 28, 2023
6284f6f
Update datastructure section with info about single AnnData object
grst Mar 28, 2023
e75dde5
Update main tutorial
grst Mar 29, 2023
c7aaf1d
Update API docs page
grst Mar 29, 2023
e31484c
Minor doc amendments
grst Mar 29, 2023
d81c4ad
WIP update docstrings
grst Mar 29, 2023
0a288ed
Fix docstrings
grst Mar 29, 2023
c278c05
Fix TODOs
grst Mar 29, 2023
be67fab
Fix sphinx warnings
grst Mar 29, 2023
ce30a82
update isort
grst Mar 29, 2023
8ad2e8b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 29, 2023
cd0a9fc
constrain pandas
grst Apr 6, 2023
6e19241
Pandas workarounds
grst Apr 6, 2023
0c3640e
Revert "Pandas workarounds"
grst Apr 6, 2023
718125b
pandas version
grst Apr 6, 2023
bcc5a09
Fix problem with color by gene in clonotype_network
grst Apr 6, 2023
f1831d3
fix missing import in datasets
grst Apr 6, 2023
ccf2604
cancel previous CI jobs automaticallY
grst Apr 6, 2023
b0e6807
test ci
grst Apr 6, 2023
70d7bea
Concurrency should be outside 'jobs'
grst Apr 6, 2023
04a9062
test ci
grst Apr 6, 2023
945c328
Merge remote-tracking branch 'origin/master' into scverse_datastructure
grst Apr 7, 2023
3df7f5f
Update dependencies
grst Apr 7, 2023
7c80137
Update conda dependencies
grst Apr 7, 2023
15 changes: 9 additions & 6 deletions .conda/meta.yaml
@@ -16,30 +16,33 @@ build:

requirements:
host:
- python >=3.7
- python >=3.8
- pip!=22.1 # https://github.com/pypa/pip/issues/11110
- flit
- setuptools_scm
- pytoml
- importlib_metadata

run:
- python >=3.7
- anndata >=0.7.6
- scanpy >=1.6.0
- python >=3.8
- anndata >=0.9rc1
- awkward >=2.1.0
- mudata >=0.2.2
- scanpy >=1.9.3
- pandas >=1.5,<2
- numpy >=1.17.0
- scipy
- parasail-python
- scikit-learn
- python-levenshtein
- python-igraph !=0.10.0,!=0.10.1
- adjusttext >=0.7
- networkx >=2.5
- squarify
- tqdm >=4.44.1
- airr >=1.2
- tqdm >=4.63
- adjusttext >=0.7
- numba >=0.41.0
- pooch >=1.7.0

test:
source_files:
4 changes: 4 additions & 0 deletions .github/workflows/conda.yml
@@ -6,6 +6,10 @@ on:
pull_request:
branches: [master]

concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true

jobs:
tests:
if: "!contains(github.event.head_commit.message, 'skip ci')"
15 changes: 9 additions & 6 deletions .github/workflows/docs.yml
@@ -8,6 +8,10 @@ on:
release:
types: [created]

concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true

jobs:
docs:
if: "!contains(github.event.head_commit.message, 'skip ci')"
@@ -17,9 +21,10 @@ jobs:
matrix:
python-version: [3.9]
os:
- ubuntu-latest
# - macos-latest
- windows-latest
- ubuntu-latest
# - macos-latest
- windows-latest

steps:
- uses: actions/checkout@v2
with:
@@ -67,9 +72,7 @@ jobs:
pip install .[doc,test,rpack,dandelion]
- name: run sphinx
run: |
# cd docs && make html SPHINXOPTS="-W --keep-going"
# TODO do not ignore sphinx warnings
cd docs && make html
cd docs && make html SPHINXOPTS="-W --keep-going"

- name: Get target folder for page deploy from github ref
if: ( matrix.os == 'ubuntu-latest' ) && ( matrix.python-version == '3.8' )
4 changes: 4 additions & 0 deletions .github/workflows/test.yml
@@ -8,6 +8,10 @@ on:
schedule:
- cron: "0 5 * * 0"

concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true

jobs:
test:
if: "!contains(github.event.head_commit.message, 'skip ci')"
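
The `concurrency` blocks added to the three workflows group runs by workflow name and Git ref, and cancel any run that is still in progress when a newer one starts in the same group. The following is a minimal Python sketch of that behavior, not GitHub's implementation. The `Run` and `Scheduler` classes, the workflow name `tests`, and the ref strings are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Run:
    """A hypothetical workflow run, identified by workflow name and Git ref."""

    run_id: int
    workflow: str
    ref: str
    status: str = "in_progress"


@dataclass
class Scheduler:
    """Toy model of `concurrency` with `cancel-in-progress: true`."""

    active: Dict[str, Run] = field(default_factory=dict)

    def start(self, run: Run) -> None:
        # Mirrors the group key `${{ github.workflow }}-${{ github.ref }}`.
        group = f"{run.workflow}-{run.ref}"
        previous = self.active.get(group)
        # A newer run in the same group cancels the one still in progress.
        if previous is not None and previous.status == "in_progress":
            previous.status = "cancelled"
        self.active[group] = run


scheduler = Scheduler()
first = Run(1, "tests", "refs/pull/356/merge")
second = Run(2, "tests", "refs/pull/356/merge")  # same workflow, same ref
other = Run(3, "tests", "refs/heads/master")  # same workflow, different ref

for run in (first, second, other):
    scheduler.start(run)

statuses = {r.run_id: r.status for r in (first, second, other)}
```

Because the ref is part of the group key, pushing again to the same pull request supersedes the earlier run, while runs on other branches are unaffected.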
17 changes: 17 additions & 0 deletions .pre-commit-config.yaml
@@ -4,3 +4,20 @@ repos:
hooks:
- id: black
language_version: python3.10
- repo: https://github.com/PyCQA/isort
rev: 5.12.0
hooks:
- id: isort
- repo: https://github.com/myint/autoflake
rev: v1.4
hooks:
- id: autoflake
args:
- --in-place
- --remove-all-unused-imports
- --remove-unused-variable
- --ignore-init-module-imports
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.4.0
hooks:
- id: check-merge-conflict
2 changes: 1 addition & 1 deletion README.rst
@@ -50,7 +50,7 @@ The case study from our paper is available `here <https://icbi-lab.github.io/sci

Installation
^^^^^^^^^^^^
You need to have Python 3.7 or newer installed on your system. If you don't have
You need to have Python 3.8 or newer installed on your system. If you don't have
Python installed, we recommend installing `Miniconda <https://docs.conda.io/en/latest/miniconda.html>`_.

There are several alternative options to install scirpy:
41 changes: 35 additions & 6 deletions docs/api.rst
@@ -1,3 +1,5 @@
.. _api:

API
===

@@ -20,10 +22,17 @@ Input/Output: `io`
.. currentmodule:: scirpy

.. note::
In scirpy v0.7.0 the way VDJ data is stored in `adata.obs` has changed to
be fully compliant with the `AIRR Rearrangement <https://docs.airr-community.org/en/latest/datarep/rearrangements.html#productive>`__
schema. Please use :func:`~scirpy.io.upgrade_schema` to make `AnnData` objects
from previous scirpy versions compatible with the most recent scirpy workflow.
**scirpy's data structure has been updated in v0.13.0.**

Previously, receptor data was expanded into columns of `adata.obs`; it is now stored as an :term:`awkward array` in `adata.obsm["airr"]`.
Moreover, we now use :class:`~mudata.MuData` to handle paired transcriptomics and :term:`AIRR` data.

:class:`~anndata.AnnData` objects created with older versions of scirpy can be upgraded with :func:`scirpy.io.upgrade_schema` to be compatible with the latest version of scirpy.

Please check out

* the `release notes <https://github.com/scverse/scirpy/releases/tag/v0.13.0>`_ for details about the changes and
* the documentation about :ref:`Scirpy's data structure <data-structure>`

.. autosummary::
:toctree: ./generated
@@ -37,6 +46,7 @@ formats.
.. autosummary::
:toctree: ./generated

io.read_h5mu
io.read_h5ad
io.read_10x_vdj
io.read_tracer
@@ -75,10 +85,25 @@ Preprocessing: `pp`
.. autosummary::
:toctree: ./generated

pp.merge_with_ir
pp.merge_airr_chains
pp.index_chains
pp.merge_airr
pp.ir_dist

Get: `get`
----------

The `get` module allows retrieving :term:`AIRR` data stored in `adata.obsm["airr"]` as a per-cell :class:`~pandas.DataFrame`
or :class:`~pandas.Series`.

.. module:: scirpy.get
.. currentmodule:: scirpy

.. autosummary::
:toctree: ./generated

get.airr
get.obs_context
get.airr_context

Tools: `tl`
-----------
@@ -211,6 +236,9 @@ Datasets: `datasets`
.. module:: scirpy.datasets
.. currentmodule:: scirpy

Example datasets
^^^^^^^^^^^^^^^^

.. autosummary::
:toctree: ./generated

@@ -241,6 +269,7 @@ Utility functions: `util`
.. autosummary::
:toctree: ./generated

util.DataHandler
util.graph.layout_components
util.graph.layout_fr_size_aware
util.graph.igraph_from_sparse_matrix
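
The note in `docs/api.rst` describes the move from one `adata.obs` column per receptor field to a ragged, per-cell structure in `adata.obsm["airr"]`. The sketch below uses plain Python lists and dicts as a stand-in for the awkward array, only to illustrate the shape of the data: one variable-length list of chain records per cell. It does not use scirpy or awkward. The helper `first_chain_field` is hypothetical, the sequences are made up, and the field names `locus` and `junction_aa` are taken from the AIRR Rearrangement schema.

```python
from typing import Dict, List, Optional

# A chain record maps AIRR field names to values (illustrative subset only).
Chain = Dict[str, Optional[str]]

# One entry per cell; each cell holds a variable-length list of chain records.
airr_per_cell: List[List[Chain]] = [
    [
        {"locus": "TRA", "junction_aa": "CAVSDF"},
        {"locus": "TRB", "junction_aa": "CASSLG"},
    ],
    [],  # a cell without any detected receptor chain
    [{"locus": "TRB", "junction_aa": "CASSPG"}],
]


def first_chain_field(
    cells: List[List[Chain]], locus: str, field: str
) -> List[Optional[str]]:
    """Return, per cell, `field` of the first chain with the given locus, or None."""
    values: List[Optional[str]] = []
    for chains in cells:
        match = next((c for c in chains if c["locus"] == locus), None)
        values.append(match[field] if match is not None else None)
    return values


# Flatten the ragged structure into one value per cell, similar in spirit
# to retrieving a per-cell Series from the stored AIRR data.
trb_junctions = first_chain_field(airr_per_cell, "TRB", "junction_aa")
chains_per_cell = [len(chains) for chains in airr_per_cell]
```

Keeping all chains per cell in one ragged structure avoids fixing the number of chains in advance; per-cell views such as `trb_junctions` are derived on demand rather than stored as separate columns.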
11 changes: 8 additions & 3 deletions docs/conf.py
@@ -76,6 +76,10 @@
sklearn=("https://scikit-learn.org/stable/", None),
networkx=("https://networkx.org/documentation/networkx-1.10/", None),
dandelion=("https://sc-dandelion.readthedocs.io/en/latest/", None),
muon=("https://muon.readthedocs.io/en/latest", None),
mudata=("https://mudata.readthedocs.io/en/latest/", None),
awkward=("https://awkward-array.org/doc/main/", None),
pooch=("https://www.fatiando.org/pooch/latest/", None),
)


@@ -130,7 +134,8 @@ def setup(app):
("py:class", "D.get(k,d), also set D[k]=d if k not in D"),
("py:class", "None. Update D from mapping/iterable E and F."),
("py:class", "an object providing a view on D's values"),
# Will work once scipy 1.8 is released
("py:class", "scipy.sparse.base.spmatrix"),
("py:class", "scipy.sparse.csr.csr_matrix"),
# don't know why these are not working
("py:class", "seaborn.matrix.ClusterGrid"),
("py:meth", "mudata.MuData.update"),
("py:class", "awkward.highlevel.Array"),
]