Potential directions #2

Open
schlegelp opened this issue Apr 1, 2021 · 1 comment

Comments

@schlegelp

@bdpedigo et al. (I'm afraid I don't have the others' GitHub handles)

Just to quickly follow up on our short discussion just now.

I'm afraid I was a bit scatter-brained and didn't do a good job explaining the question I'm (personally) most keen on but you actually summed it up quite nicely into a single word: "stability". Basically: what's the highest granularity we can reach while making sure that groups/clusters can still be reliably recovered across data sets? So it's both a matching and a grouping/clustering problem.

Let's say, for example, you have 5 neurons A, B, C, D and E on FAFB left that fall into two obvious clusters (A, B, C) and (D, E). In a first step you would try to find matches for these 5 neurons in FAFB right and hemibrain (keeping in mind that it might not be a 1:1:1 matching). Once we have that, we can ask whether we see the same clusters in the other two data sets, or whether we instead see e.g. (A, B) and (C, D, E) in the hemibrain and (A, D) and (B, C, E) in FAFB right. If the clusters are conserved, that supports the view that (A, B, C) and (D, E) likely represent two cell types. If they are not, the five neurons are more likely to represent a single cell type (A, B, C, D, E). To my mind, it's critical to include all three data sets in that comparison (e.g. to have a tie-breaker).
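To make the comparison above concrete, here is a minimal sketch that scores how well two clusterings agree by counting neuron pairs that are treated the same way in both (a plain Rand index). It assumes the matching step has already been done and happens to be 1:1:1, so that every neuron can be referred to by its FAFB left name; the function names are made up for illustration.

```python
from itertools import combinations


def co_clustered_pairs(clusters):
    """Return the set of unordered neuron pairs that share a cluster."""
    pairs = set()
    for cluster in clusters:
        pairs.update(frozenset(p) for p in combinations(sorted(cluster), 2))
    return pairs


def rand_index(clusters_a, clusters_b):
    """Fraction of neuron pairs on which two clusterings agree.

    A pair "agrees" if it is co-clustered in both clusterings or
    separated in both. Assumes both clusterings cover the same neurons.
    """
    neurons = sorted(set().union(*clusters_a))
    all_pairs = [frozenset(p) for p in combinations(neurons, 2)]
    together_a = co_clustered_pairs(clusters_a)
    together_b = co_clustered_pairs(clusters_b)
    agree = sum((p in together_a) == (p in together_b) for p in all_pairs)
    return agree / len(all_pairs)


# Clusterings from the example, with matched neurons renamed to A..E
fafb_left = [{"A", "B", "C"}, {"D", "E"}]
hemibrain = [{"A", "B"}, {"C", "D", "E"}]
fafb_right = [{"A", "D"}, {"B", "C", "E"}]

print(rand_index(fafb_left, hemibrain))   # 0.6
print(rand_index(fafb_left, fafb_right))  # 0.4
```

A chance-corrected variant (adjusted Rand index) would probably be a better choice in practice, and the 1:1:1 assumption would need to be relaxed for real data.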

In practical terms for potential next steps: I'd be very curious to see how the highly granular hemibrain labels behave after matching the hemibrain neurons to FAFB left and right. To illustrate with another example: let's say you have five hemibrain mPNs falling into two types (labels) - 2 x M_lvPNm25 and 3 x M_lvPNm26 - and you find matches for all 5 in FAFB left and FAFB right. When you then look at the 2 M_lvPNm25 and 3 M_lvPNm26 candidates in FAFB left and FAFB right: are they more similar to each other within type (i.e. M_lvPNm25 <-> M_lvPNm25 and M_lvPNm26 <-> M_lvPNm26) than across type, or do you see cases where a putative M_lvPNm25 match is actually more similar to an M_lvPNm26 candidate?
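One simple way to ask the within-type vs across-type question is sketched below: for each matched candidate, look up its most similar other candidate and check whether the two carry the same transferred hemibrain label. The similarity values are invented purely for illustration, and the choice of similarity measure is left open; the function name is hypothetical.

```python
def nearest_neighbour_labels(sim, labels):
    """For each neuron, return (own label, label of its most similar other neuron)."""
    result = []
    for i, row in enumerate(sim):
        others = [k for k in range(len(row)) if k != i]
        nearest = max(others, key=lambda k: row[k])
        result.append((labels[i], labels[nearest]))
    return result


# Labels transferred from the hemibrain to five candidates in one FAFB hemisphere
labels = ["M_lvPNm25", "M_lvPNm25", "M_lvPNm26", "M_lvPNm26", "M_lvPNm26"]

# Made-up symmetric pairwise similarity scores (higher = more similar)
sim = [
    [1.0, 0.6, 0.3, 0.2, 0.3],
    [0.6, 1.0, 0.7, 0.4, 0.3],
    [0.3, 0.7, 1.0, 0.8, 0.6],
    [0.2, 0.4, 0.8, 1.0, 0.7],
    [0.3, 0.3, 0.6, 0.7, 1.0],
]

pairs = nearest_neighbour_labels(sim, labels)
# Indices of candidates whose nearest neighbour has a different label
mismatches = [i for i, (own, nearest) in enumerate(pairs) if own != nearest]
print(mismatches)  # [1]
```

In this toy case the second M_lvPNm25 candidate sits closer to an M_lvPNm26 candidate than to its own type, which is the kind of case the question is asking about.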

Rephrasing the above: leveraging not just one but three data sets, do you see any indication that e.g. M_lvPNm25 and M_lvPNm26 should really have the same label? Or conversely: maybe M_lvPNm25 actually breaks into multiple groups in FAFB left and right.

I hope this makes some sense. As Greg mentioned, in our recent preprint I used a rather naive approach with only across- but not within-dataset matches to try and address this, but you guys are obviously much more experienced with that kind of thing. I also imagine that it will be difficult to get clear-cut answers to the above questions, but maybe you can think of a way to get something like a "stability score" - i.e. something that describes how well a given group can be recovered in another dataset.
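As a very rough starting point for what such a "stability score" could look like (only one of many possible definitions, and not what we did in the preprint): map a group onto its matches in the other dataset, then take the best Jaccard overlap between that mapped set and any cluster found there. The neuron names, matches and function name below are all hypothetical.

```python
def stability_score(group, matches, other_clusters):
    """Best Jaccard overlap between the matched image of `group`
    and any cluster in the other dataset (1.0 = perfectly recovered).

    `matches` maps each neuron to a list of its matches in the other
    dataset, so matchings that are not 1:1 are allowed.
    """
    image = set()
    for neuron in group:
        image.update(matches.get(neuron, ()))
    if not image:
        return 0.0
    return max(
        (len(image & cluster) / len(image | cluster) for cluster in other_clusters),
        default=0.0,
    )


# Hypothetical matches from FAFB left (upper case) to another dataset (lower case);
# E has two candidate matches
matches = {"A": ["a"], "B": ["b"], "C": ["c"], "D": ["d"], "E": ["e1", "e2"]}

# Hypothetical clusters found independently in the other dataset
other_clusters = [{"a", "b"}, {"c", "d", "e1", "e2"}]

print(stability_score({"A", "B", "C"}, matches, other_clusters))  # 2/3
print(stability_score({"D", "E"}, matches, other_clusters))       # 0.75
```

Averaging such a score over both other datasets would bring in the third data set as a tie-breaker.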

@bdpedigo

bdpedigo commented Apr 1, 2021

@jovo @asaadeldin11 @tliu68 lots for us to think about above ^

Thanks @schlegelp, we'll look closely at this.
