Issue 1: Flat tree when training on one dataset
Indeed, when you only use the train_tree function, no hierarchy is learned. This function only trains the classifiers for the tree you pass in, so if you follow the GitHub issue you mentioned and input a flat tree, the output will also be a flat tree. The basic tutorial explains how to input a hierarchy (e.g. based on prior knowledge) using the Newick format. If you do this, make sure that at least the names of the leaf nodes correspond exactly to the cell type labels in your dataset. scHPL can only learn the hierarchy automatically when multiple datasets are used as input.
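Since the leaf names have to match the labels exactly, it can help to check a Newick string against the dataset's labels before training. Below is a minimal sketch in plain Python. `newick_leaves` and `check_leaves` are hypothetical helpers (not part of scHPL), the example tree and labels are made up, and the parser assumes simple names without branch lengths, quotes, or commas/parentheses inside names.

```python
def newick_leaves(newick):
    """Return the leaf names of a simple Newick string, in order of appearance."""
    leaves = []
    token = []
    prev_delim = None  # last structural character seen before the current name
    for ch in newick.strip().rstrip(";"):
        if ch in "(),":
            name = "".join(token).strip()
            # A name right after '(' or ',' is a leaf;
            # a name right after ')' labels an internal node.
            if name and prev_delim in ("(", ","):
                leaves.append(name)
            token = []
            prev_delim = ch
        else:
            token.append(ch)
    return leaves


def check_leaves(newick, labels):
    """Compare the leaf names of the tree with the labels found in the data."""
    leaves = set(newick_leaves(newick))
    observed = set(labels)
    return {
        "leaves_not_in_data": sorted(leaves - observed),
        "labels_not_in_tree": sorted(observed - leaves),
    }


# Hypothetical hierarchy and labels, for illustration only.
tree = "((CD4 T cell, CD8 T cell)T cell, B cell)root"
labels = ["CD4 T cell", "CD8 T cell", "b cell"]  # note the lowercase 'b'
print(check_leaves(tree, labels))
```

Both lists should be empty before the tree is used for training; a case or whitespace difference, as in the example labels, is enough to break the match.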
Issue 2: Incorrect tree
Did you check if the data is integrated correctly? If the integration doesn't look good, scHPL won't be able to match the cell types correctly either.
What kind of mistakes are made? Are there missing cell types or are there weird matches between cell types? Sometimes weird matches can be explained by wrong original annotations. For instance, in our original publication, we saw that cell-type labels of some populations got swapped. You could visualize marker genes for the wrongly matched cell types and see if that is the case.
If datasets 2 and 3 work well together, you could also swap the order of the datasets. Usually, this does not have a big influence, but if the first dataset is the problematic one, things might improve slightly. You could also try to play around with the parameters (e.g. a different value of k), but if there are weird matches I doubt that this will make a difference.
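If you want to try the alternative orders systematically, a small sketch like the one below lists every order in which the suspected problematic dataset does not come first. The helper `candidate_orders` and the dataset names are placeholders, and how each order is passed to the tree-learning step is left out because it depends on your pipeline.

```python
from itertools import permutations


def candidate_orders(batches, problematic):
    """List all dataset orders that do not start with the problematic dataset."""
    return [list(order) for order in permutations(batches) if order[0] != problematic]


# Placeholder dataset names, for illustration only.
batches = ["tabula_sapiens", "thymus_atlas", "unpublished"]
for order in candidate_orders(batches, problematic="tabula_sapiens"):
    print(order)
```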
scHPL apparently breaks with pandas 2.0. I have now added pandas < 2.0 as a requirement in the requirements file. If you have pandas 2.0 installed, I would suggest downgrading it.
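To see quickly whether your environment is affected, you can check the installed pandas version. This is a small sketch using only the standard library; `pandas_is_pre_2` and `installed_pandas_version` are hypothetical helper names, and the check only looks at the major version number.

```python
from importlib import metadata


def pandas_is_pre_2(version):
    """Return True if the given pandas version string has a major version below 2."""
    major = int(version.split(".")[0])
    return major < 2


def installed_pandas_version():
    """Return the installed pandas version string, or None if pandas is not installed."""
    try:
        return metadata.version("pandas")
    except metadata.PackageNotFoundError:
        return None


found = installed_pandas_version()
if found is None:
    print("pandas is not installed in this environment")
elif pandas_is_pre_2(found):
    print(f"pandas {found} satisfies the pandas < 2.0 requirement")
else:
    print(f"pandas {found} is 2.0 or newer; consider downgrading, e.g. pip install 'pandas<2.0'")
```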
I am using treeArches to train a tree for a single-tissue (thymus) subset of the Tabula Sapiens dataset (https://tabula-sapiens-portal.ds.czbiohub.org/).
I followed the training steps listed here, but the scHPL.train.train_tree() function only returned a flat tree.
Additionally, I am using treeArches to integrate the Tabula Sapiens data with another published dataset, "A cell atlas of human thymic development", to generate an integrated reference. Both data subsets are from the same tissue. However, the tree that is generated is incorrect and does not properly represent the relationships between known cell types.
My code for the integration of the two published datasets is below:
Do you have any suggestions for improving the tree generated by treeArches, either for the analysis of the single dataset only or for the integration of the two? I am able to successfully integrate the second dataset ("A cell atlas of human thymic development") with a third (unpublished) dataset, and the tree is also generated successfully in that case, so I believe the issue is likely related to running the analysis on the Tabula Sapiens dataset.
Are there any specific factors to consider when using treeArches on a large cell atlas, like Tabula Sapiens?