Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add conditional print method for objects that contain alternative experiments #89

Open
wants to merge 148 commits into
base: master
Choose a base branch
from

Conversation

Biomiha
Copy link

@Biomiha Biomiha commented Aug 29, 2023

Hi Stefano,

I've now made the printing conditional and have amended the show() unit test to assure that the default print method works. For the unit test I've added a random normcounts matrix to the existing pbmc_small sce object and named it ADT
When printed to the console the header now looks like this:

   > pbmc_small
    `# A SingleCellExperiment-tibble abstraction: 80 × 17
    `# Features=230 | Cells=80 | Assays=counts, logcounts, ADT-normcounts

The print method takes any number of alternative experiments and appends the name to the assayNames contained therein, separated by a hyphen. I think this looks the tidiest. I hope you agree.
If no alternative experiments exist, the print output is as before.

I'll start working on how to pull features from the alternative experiment slots into the tibble with join_features.

@stemangiola
Copy link
Owner

stemangiola commented Aug 29, 2023

Hi Stefano,

I've now made the printing conditional and have amended the show() unit test to assure that the default print method works. For the unit test I've added a random normcounts matrix to the existing pbmc_small sce object and named it ADT When printed to the console the header now looks like this:

   > pbmc_small
    `# A SingleCellExperiment-tibble abstraction: 80 × 17
    `# Features=230 | Cells=80 | Assays=counts, logcounts, ADT-normcounts

The print method takes any number of alternative experiments and appends the name to the assayNames contained therein, separated by a hyphen. I think this looks the tidiest. I hope you agree. If no alternative experiments exist, the print output is as before.

I'll start working on how to pull features from the alternative experiment slots into the tibble with join_features.

Amazing!

So if I can imagine a complex SCE with 3 expriments

  • base: counts, logcounts
  • ADT: counts, logcounts
  • Hashtag demultiplex: counts, logcounts

we would have

`# A SingleCellExperiment-tibble abstraction: 80 × 17
`# Features=230 | Cells=80 | Assays=counts, logcounts, ADT-counts, ADT-logcounts, Hashtag demultiplex-counts, Hashtag demultiplex-logcounts

Am I interpreting correctly?

@Biomiha
Copy link
Author

Biomiha commented Aug 29, 2023

Yes, exactly. When I tested on a more complicated example I noticed that it creates a new line if the maximum line width is exceeded, but it still prints everything.

@stemangiola
Copy link
Owner

Yes, exactly.

Your solution looks great, and very intuitive, especially if experiment names are short.

Just to explore all possibilities, could you think about a solution that avoids the repetition of experiment names, so as to increase the scalability of our printout to many long experiment names?

@Biomiha
Copy link
Author

Biomiha commented Aug 29, 2023

Printing the name of the experiment instead, followed by the assayNames and separated by a semi-colon is easy enough.
That would look something like:

     `# A SingleCellExperiment-tibble abstraction: 7,865 × 3
     `# Features=33538 | Cells=7865 | Assays=base: counts, logcounts; Antibody Capture: counts, logcounts; Hashtag demultiplex: counts, logcounts

However, like I said in my (edited) comment above, the native tibble print method will break lines if the character count gets too long (I can actually see a difference in how it's printed on my laptop vs. my wide-screen display).
I think it's a matter of personal preference really. I find the colons and semi-colons a bit much and prefer the wider print but we can ask others and find a consensus. I also think we would need to find another name for base because it implies the base R vs. tidy R semantics, whereas here it's actually the main experiment and alternative experiments (and I find even this nomenclature pretty bad to be honest).

@stemangiola
Copy link
Owner

stemangiola commented Aug 29, 2023

I find the colons and semi-colons a bit much and prefer the wider print but we can ask others and find a consensus.

I like the idea. Let's present both an "easy" and "hard" scenario to the community.

I also think we would need to find another name for base because it implies the base R vs. tidy R semantics

Agree.

I think slack Bioc tidy is the best place where to ask our community. But feel free to use any tool.

@stemangiola
Copy link
Owner

stemangiola commented Aug 29, 2023

@Biomiha Thinking about it. You can leave it as is; maybe edit the example in the unit test to be the worst-case scenario. If it looks good in the worst-case scenario, there is no reason to change for the time being.

And instead, you can focus on the features such as join_feature, aggregate_cells, etc...

New, more comprehensive unit tests
@Biomiha
Copy link
Author

Biomiha commented Aug 30, 2023

Okay, I've now mocked up the expression matrices for the additional alternative experiments and added that to the unit tests. In my hands it works as expected. Will move on to joining features next.

Add altExp functionality to `get_abundance_sc_wide` and `get_abundance_sc_long`.
@Biomiha
Copy link
Author

Biomiha commented Sep 2, 2023

So, I think the join_features method is good to go. The way I envisaged it is that you can pick any feature for the long format and it automatically pulls the abundance for all the assays in that particular experiment. So if the features are from the ADT altExp it will just silently pull the values from all assays in the ADT altExp. The one thing I chose not to allow is the mixing of features from different experiments, which I think should be avoided for other reasons.
For the wide shape, you obviously need to pick which assay to pull from and that makes things pretty straightforward. To avoid pulling assays with same name from different experiments I have chosen to use the same nomenclature that I used in the print method, e.g. logcounts, ADT-logcounts or HTO-logcounts.
I have noticed however that there is a name.repair argument somewhere because at least for me features that have an illegal character get it changed to a "." in the wide format, for example my mock Ab features Ab-1 and Ab-2, etc... get changed to Ab.1, Ab.2, ... I think that could prove problematic because there are quite a few gene names that contain hyphens (like all mithchondrial genes).

Split out functions
@Biomiha
Copy link
Author

Biomiha commented Feb 4, 2024

I've now consolidated the differences since the two repos were diverging. For RNA only experiments it should produce the same results and for experiments with CITE-seq / Abseq data it should seamlessly add them to the output.
If you don't mind I was going to ask the hive mind to kick the tyres a bit and test it on their data to see if there any edge cases that I have not accounted for.
On an unrelated note - I have noticed that in cases where the parent sce object has multiple reducedDims and a lot of colData entries, it takes an exceedingly long time to even print the tidySCE object. Worth considering how to speed this up because I expect users might be turned away by this.

@stemangiola
Copy link
Owner

The first thing, is to see if your PR passes unit tests (the master branch does). Then, I will review the code to ensure it keeps a consistent style with the main repo (I might ask to change few things in case).

If you don't mind I was going to ask the hive mind to kick the tyres a bit and test it on their data to see if there any edge cases that I have not accounted for.

Sure go ahead!

On an unrelated note - I have noticed that in cases where the parent sce object has multiple reducedDims and a lot of colData entries, it takes an exceedingly long time to even print the tidySCE object. Worth considering how to speed this up because I expect users might be turned away by this.

Yes this effort is part of the challenges. But nobody has taken them. We need to optimise tidySCE and tidySE.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

tidySCE Functionality to see and access the altExp slot
2 participants