LFQBenchmark experiment - multiple organisms #568

Open
brvpuyve opened this issue Aug 9, 2021 · 11 comments

@brvpuyve

brvpuyve commented Aug 9, 2021

Hi everyone,

I generated an updated LFQbenchmark dataset, similar to the one from Navarro et al. (https://pubmed.ncbi.nlm.nih.gov/27701404/). I was wondering how I could best annotate the mixtures (as pooled samples)? Can I mention more than one organism in the characteristics[organism] column?
Additionally, would it be beneficial to add a comment section to define the ratios of the three proteomes?

Looking forward to your suggestions!

Best,

Bart Van Puyvelde

@mlocardpaulet
Collaborator

Hi Bart, did you get any help with this?
I suspect you could use the field characteristics[pooled sample] and list in it all the samples that are pooled (SN=sample 1,sample 2, … sample 9, where "sample n" is the value of the corresponding sample in source name).
For the relative quantities I am not sure. Others may have better ideas. Maybe you could use the key QY= to indicate relative quantity (like in characteristics[spiked compound]), but I am not sure how to make the sample names correspond to the quantities.
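
A minimal sketch of how such a characteristics[pooled sample] value could be built and read back, assuming the `SN=` prefix followed by comma-separated source names as described above. The helper names `pooled_sample_value` and `parse_pooled_sample` are hypothetical and not part of any existing tool.

```python
def pooled_sample_value(source_names):
    """Join source names into an 'SN=...' value (format taken from the comment above)."""
    return "SN=" + ",".join(source_names)


def parse_pooled_sample(value):
    """Return the source names listed in an 'SN=...' value, or an empty list otherwise."""
    if not value.startswith("SN="):
        return []
    return [name.strip() for name in value[len("SN="):].split(",") if name.strip()]


value = pooled_sample_value(["sample 1", "sample 2", "sample 9"])
print(value)
print(parse_pooled_sample(value))
```

This covers only the list of pooled samples; it does not address how relative quantities would be attached to each name.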

@mlocardpaulet
Collaborator

Also, I don't know how to handle the case where one of the pooled samples is not analysed alone (so there is no .raw file associated with one of the sample names).

@ypriverol
Member

ypriverol commented Aug 30, 2021

Hi @mlocardpaulet @brvpuyve :

First, my apologies for the late reply; I was off for a couple of weeks. Some weeks ago I was discussing with @anjaf how to represent multiplexed samples in an experiment.

We have two options here:

1- Represent each sample as an independent sample, adding a characteristic called characteristics[concentration of] to each one, and link every sample to the same data file. The characteristics[organism] will be different for each sample. This is a clean representation because each sample has its own row and can carry further characteristics. It differs from the current pooled approach mentioned by @mlocardpaulet, because in the pooled approach samples are used multiple times: once in their corresponding channel and again in the pool.

It will be something like:

| source name | characteristics[organism] | characteristics[organism part] | characteristics[biological replicate] | characteristics[concentration of] | assay name | comment[technical replicate] | comment[fraction identifier] | comment[label] | comment[data file] | characteristics[concentration of] |
|---|---|---|---|---|---|---|---|---|---|---|
| Sample-1 | homo sapiens | heart | 1 | 70% | ms_run 1 | 1 | 1 | label free sample | file1.raw | 70% |
| Sample-2 | e coli | liver | 1 | 60% | ms_run 1 | 1 | 1 | label free sample | file1.raw | 60% |

As you can see, the assay name is the same for both rows, meaning that the file and the label conditions are the same.
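
A minimal sketch of how the rows for option 1 could be generated, assuming one row per organism in the mixture and a shared assay name and data file. The function names, the reduced column set, and the percentages are illustrative placeholders, not the actual ratios of any dataset and not part of an existing SDRF tool.

```python
import csv
import io

COLUMNS = [
    "source name",
    "characteristics[organism]",
    "characteristics[concentration of]",
    "assay name",
    "comment[label]",
    "comment[data file]",
]


def mixture_rows(data_file, assay_name, composition):
    """Return one row per organism; all rows share the assay name and data file."""
    rows = []
    for number, (organism, share) in enumerate(composition.items(), start=1):
        rows.append({
            "source name": f"Sample-{number}",
            "characteristics[organism]": organism,
            "characteristics[concentration of]": share,
            "assay name": assay_name,
            "comment[label]": "label free sample",
            "comment[data file]": data_file,
        })
    return rows


def to_tsv(rows):
    """Serialise the rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder composition, for illustration only.
composition = {
    "homo sapiens": "65%",
    "saccharomyces cerevisiae": "30%",
    "escherichia coli": "5%",
}
rows = mixture_rows("file1.raw", "ms_run 1", composition)
print(to_tsv(rows))
```

Generating the extra rows from a small composition table like this keeps the manual work low when many raw files share the same mixture.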

2- @anjaf mentioned before the idea of having a characteristics[organism] value called mixed; we could then represent all the species in the sample in characteristics[pooled sample] as key-value pairs with concentrations.

Would be great to have your opinion @anjaf @jgriss @mvaudel @mlocardpaulet @ALL @bigbio/collaborators

@mlocardpaulet
Collaborator

Hi @ypriverol
thanks a lot. I like option 1- very much. So to be clear: there will be duplicated file names?

@brvpuyve
Author

Option 1 is maybe the best approach, although it will be some work for me to add the extra lines :-) Let me know what is decided and I will create the SDRFs.

Thanks for the comments!

@ypriverol
Member

> Hi @ypriverol
> thanks a lot. I like option 1- very much. So to be clear: there will be duplicated file names?

Yes. We have the same case when multiple samples are multiplexed in the same RAW file.

@enryH
Contributor

enryH commented Aug 31, 2021

I guess option one is fine if the Python client can identify such a case?

  • Is each raw file that is not unique a mixture?
  • Should the concentrations add up to 100% (to be valid)?
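
A minimal sketch of how a client could detect such a case, assuming rows are available as dictionaries keyed by SDRF column name. Grouping by both data file and label is an assumption made here so that ordinary multiplexed channels in the same file are not flagged; `find_mixtures` is a hypothetical helper, not an existing function of the Python client.

```python
from collections import defaultdict


def find_mixtures(rows):
    """Return {(data file, label): [source names]} for groups with more than one sample."""
    groups = defaultdict(list)
    for row in rows:
        key = (row["comment[data file]"], row["comment[label]"])
        if row["source name"] not in groups[key]:
            groups[key].append(row["source name"])
    return {key: names for key, names in groups.items() if len(names) > 1}


example_rows = [
    {"source name": "Sample-1", "comment[label]": "label free sample", "comment[data file]": "file1.raw"},
    {"source name": "Sample-2", "comment[label]": "label free sample", "comment[data file]": "file1.raw"},
    {"source name": "Sample-3", "comment[label]": "label free sample", "comment[data file]": "file2.raw"},
]
print(find_mixtures(example_rows))
```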

@jgriss
Contributor

jgriss commented Aug 31, 2021

Hi all,

We already have this case covered to some extent for isobarically labelled experiments (see PXD017799 as an example). Here, we also have mixtures of multiple, independent samples in one raw file.

I therefore strongly suggest staying consistent with the design approach that was chosen there, which essentially is what @ypriverol mentioned as option 1.

In the case of isobarically labelled experiments, this could even be extended to have multiple rows referencing the same channel in the raw file.

@enryH

  • Personally, I think that characteristics[concentration of] should be optional, but if provided, the values must add up to 100% to be valid
  • In isobarically labelled experiments we also refer to each raw file multiple times, indicating that it's a mixture. But we might not always have or need, e.g., the individual sample concentrations; just to keep this case in mind as well
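
A minimal sketch of the validation rule suggested in the first bullet, assuming percentages are written like `70%` and summed per data file. Rows without a concentration are skipped, so files with no values at all are never flagged. `check_concentrations` and its `tolerance` parameter are hypothetical.

```python
from collections import defaultdict


def check_concentrations(rows, tolerance=0.01):
    """Return {data file: total} for files whose provided percentages do not sum to 100."""
    totals = defaultdict(float)
    for row in rows:
        value = (row.get("characteristics[concentration of]") or "").strip()
        if not value:
            continue  # the column is optional, so empty values are ignored
        totals[row["comment[data file]"]] += float(value.rstrip("%"))
    return {name: total for name, total in totals.items() if abs(total - 100.0) > tolerance}


checked_rows = [
    {"comment[data file]": "file1.raw", "characteristics[concentration of]": "70%"},
    {"comment[data file]": "file1.raw", "characteristics[concentration of]": "60%"},
    {"comment[data file]": "file2.raw", "characteristics[concentration of]": ""},
    {"comment[data file]": "file2.raw"},
]
print(check_concentrations(checked_rows))
```

With the 70% and 60% values from the example table earlier in the thread, `file1.raw` would be reported because its total is 130%.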

@mlocardpaulet
Collaborator

Hello again, sorry it took me so long to come back to this.
I am looking at the headers that have been used in the SDRF files generated to date, and I see that characteristics[concentration of] is used to define the concentration of compounds defined in characteristics[compound]. So if we go with option 1 (if I understood well: one row per sample in the pool, with the respective quantities annotated in characteristics[concentration of]), can you distinguish the 2 usages of characteristics[concentration of]?
Could this be an issue?

@enryH
Contributor

enryH commented Sep 22, 2021

Hmm. If there is characteristics[organism] and characteristics[compound] then I guess it has to be ordered, but I am not 100% sure about this:

characteristics[organism] characteristics[concentration of] characteristics[compound] characteristics[concentration of]

Could you explain the type of experiment where this is an issue?

But I agree that this could be an issue if it leads to ambiguous interpretations.
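
A minimal sketch of the ordering idea, assuming each characteristics[concentration of] column describes the nearest characteristics column to its left. This pairing rule is only the guess made above, not a confirmed convention, and `pair_concentration_columns` is a hypothetical helper.

```python
CONCENTRATION = "characteristics[concentration of]"


def pair_concentration_columns(header):
    """Return (described column name, concentration column index) pairs, by column order."""
    pairs = []
    previous = None  # stays None if no characteristics column precedes a concentration column
    for index, name in enumerate(header):
        if name == CONCENTRATION:
            pairs.append((previous, index))
        elif name.startswith("characteristics["):
            previous = name
    return pairs


header = [
    "characteristics[organism]",
    "characteristics[concentration of]",
    "characteristics[compound]",
    "characteristics[concentration of]",
]
print(pair_concentration_columns(header))
```

A pair whose first element is `None` would mark a concentration column with nothing to attach to, which is one way the ambiguity could be surfaced.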

@mlocardpaulet
Collaborator

Hello,

I guess you are right, I cannot see an example where it would be used.

@ypriverol ypriverol self-assigned this May 3, 2022

5 participants