@bdpedigo et al. (I'm afraid I don't have the others' GitHub handles)
Just to quickly follow up on our short discussion just now.
I'm afraid I was a bit scatter-brained and didn't do a good job explaining the question I'm (personally) most keen on, but you actually summed it up quite nicely in a single word: "stability". Basically: what's the highest granularity we can reach while making sure that groups/clusters can still be reliably recovered across data sets? So it's both a matching and a grouping/clustering problem.
Let's say, for example, you have 5 neurons `A`, `B`, `C`, `D` and `E` on FAFB left that fall into two obvious clusters `(A, B, C)` and `(D, E)`. In a first step you would try to find matches for these 5 neurons in FAFB right and hemibrain (keeping in mind that it might not be a 1:1:1 matching). Once we have that, we can ask whether we see the same clusters in the other two data sets, or whether we see e.g. `(A, B)` and `(C, D, E)` in the hemibrain and `(A, D)` and `(B, C, E)` in FAFB right. A conservation of clusters/groups supports the view that `(A, B, C)` and `(D, E)` likely represent two cell types. In the latter case they are more likely to represent a single cell type `(A, B, C, D, E)`. To my mind, it's critical to include all three data sets in that comparison (e.g. to have a tie-breaker).
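To make the comparison concrete, here is a minimal sketch of one possible way to quantify "do we see the same clusters?" for the toy example above. It uses the plain Rand index (the fraction of neuron pairs on which two clusterings agree), which is my choice for illustration and not something we settled on. It also assumes a clean 1:1:1 matching so that the same letters can be used in all three data sets, which, as noted, may not hold in practice.

```python
from itertools import combinations


def co_clustered_pairs(clusters):
    """Return the set of unordered neuron pairs that share a cluster."""
    pairs = set()
    for cluster in clusters:
        pairs.update(frozenset(p) for p in combinations(sorted(cluster), 2))
    return pairs


def rand_index(clusters_a, clusters_b):
    """Fraction of neuron pairs treated the same way by both clusterings.

    A pair counts as an agreement if it is together in both clusterings
    or apart in both. Assumes both clusterings cover the same neurons.
    """
    items = sorted(set().union(*clusters_a))
    if set(items) != set().union(*clusters_b):
        raise ValueError("clusterings must cover the same neurons")
    pairs_a = co_clustered_pairs(clusters_a)
    pairs_b = co_clustered_pairs(clusters_b)
    all_pairs = [frozenset(p) for p in combinations(items, 2)]
    agree = sum((p in pairs_a) == (p in pairs_b) for p in all_pairs)
    return agree / len(all_pairs)


# Toy clusterings from the example (letters stand for matched neurons).
fafb_left = [{"A", "B", "C"}, {"D", "E"}]
hemibrain = [{"A", "B"}, {"C", "D", "E"}]
fafb_right = [{"A", "D"}, {"B", "C", "E"}]

print(rand_index(fafb_left, hemibrain))   # 0.6
print(rand_index(fafb_left, fafb_right))  # 0.4
print(rand_index(fafb_left, fafb_left))   # 1.0
```

With only five neurons the raw numbers are not very informative; a chance-corrected variant would probably be needed for anything real.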
In practical terms for potential next steps: I'd be very curious to see how the highly granular hemibrain `labels` behave after matching the hemibrain neurons to FAFB left and right. To illustrate with another example: let's say you have five hemibrain mPNs falling into two types (labels) - 2 x `M_lvPNm25` and 3 x `M_lvPNm26` - and you find matches for all 5 in FAFB left and FAFB right. When you then look at the 2 `M_lvPNm25` and 3 `M_lvPNm26` candidates in FAFB left and FAFB right: are they more similar to each other within type (i.e. `M_lvPNm25 <-> M_lvPNm25` and `M_lvPNm26 <-> M_lvPNm26`) than across type, or do you see cases where a putative `M_lvPNm25` match is actually more similar to an `M_lvPNm26` candidate?
Rephrasing the above: leveraging not just one but three data sets, do you see any indication that e.g. `M_lvPNm25` and `M_lvPNm26` should really have the same label? Or conversely: maybe `M_lvPNm25` actually breaks into multiple groups in FAFB left and right.
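As a rough sketch of the within- vs. across-type check described above: given some pairwise similarity score between the matched candidates in one data set, flag every candidate whose most similar neighbour carries the other label. The neuron names (`m25_a`, ...) and all scores below are made up purely for illustration, and the function name is hypothetical; any real similarity measure could be plugged in.

```python
def nearest_neighbour_mismatches(labels, similarity):
    """List (neuron, nearest neighbour) pairs whose labels differ.

    labels:     dict mapping neuron name -> putative type label
    similarity: dict mapping frozenset({a, b}) -> similarity score
    """
    mismatches = []
    for neuron, label in labels.items():
        others = [m for m in labels if m != neuron]
        best = max(others, key=lambda m: similarity[frozenset((neuron, m))])
        if labels[best] != label:
            mismatches.append((neuron, best))
    return mismatches


# Hypothetical candidates in FAFB left, labelled via their hemibrain match.
labels = {
    "m25_a": "M_lvPNm25", "m25_b": "M_lvPNm25",
    "m26_a": "M_lvPNm26", "m26_b": "M_lvPNm26", "m26_c": "M_lvPNm26",
}

# Made-up similarity scores (higher = more similar).
raw_scores = [
    ("m25_a", "m25_b", 0.8), ("m25_a", "m26_a", 0.3), ("m25_a", "m26_b", 0.2),
    ("m25_a", "m26_c", 0.3), ("m25_b", "m26_a", 0.9), ("m25_b", "m26_b", 0.2),
    ("m25_b", "m26_c", 0.1), ("m26_a", "m26_b", 0.7), ("m26_a", "m26_c", 0.6),
    ("m26_b", "m26_c", 0.8),
]
similarity = {frozenset((a, b)): s for a, b, s in raw_scores}

print(nearest_neighbour_mismatches(labels, similarity))
# [('m25_b', 'm26_a'), ('m26_a', 'm25_b')]
```

In this toy case one putative `M_lvPNm25` and one `M_lvPNm26` candidate are each other's best hit, which is the kind of pattern that would hint at the two labels not being cleanly separable.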
I hope this makes some sense. As Greg mentioned, in our recent preprint I used a rather naive approach with only across- but not within-dataset matches to try and address this, but you guys are obviously much more experienced with that kind of thing. I also imagine that it will be difficult to get clear-cut answers to the above questions, but maybe you can think of a way to get something like a "stability score" - i.e. something that describes how well a given group can be recovered in another dataset.
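Just to illustrate what I mean by a "stability score", here is one very simple possibility, offered as a sketch and not a proposal: map a group through the cross-dataset matches and take the best Jaccard overlap with any cluster found independently in the target data set. The function name, the match dictionary and the lower-case hemibrain IDs are all hypothetical; the matching is allowed to be one-to-many.

```python
def stability_score(group, matches, target_clusters):
    """Best Jaccard overlap between a mapped group and any target cluster.

    group:           set of neurons in the reference data set
    matches:         dict mapping reference neuron -> set of matched
                     neurons in the target data set (may be one-to-many)
    target_clusters: list of sets, clusters found in the target data set
    Returns a value in [0, 1]; 1 means the group is recovered exactly.
    """
    mapped = set()
    for neuron in group:
        mapped.update(matches.get(neuron, ()))
    if not mapped:
        return 0.0
    return max(
        (len(mapped & c) / len(mapped | c) for c in target_clusters),
        default=0.0,
    )


# Hypothetical FAFB left -> hemibrain matches; C has two candidate matches.
matches = {"A": {"a"}, "B": {"b"}, "C": {"c1", "c2"}, "D": {"d"}, "E": {"e"}}

conserved = [{"a", "b", "c1", "c2"}, {"d", "e"}]
regrouped = [{"a", "b"}, {"c1", "c2", "d", "e"}]

print(stability_score({"A", "B", "C"}, matches, conserved))  # 1.0
print(stability_score({"A", "B", "C"}, matches, regrouped))  # 0.5
```

Averaging such a score over both other data sets would be one way to bring in the third data set as a tie-breaker, but how best to combine them is an open question.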