LipidFinder Dataframe #7

Open
ksachikonye opened this issue Dec 18, 2021 · 1 comment

@ksachikonye

I have tried my best to understand how the PeakFilter CSV file is constructed and I just cannot wrap my head around it. Let's say I extract data from an mzML file into a dataframe that looks like this:
| mz | intensity | rt | sample |
| --- | --- | --- | --- |
| 178.88143920898438 | 1953013.75 | 0.0054765489 | liver01 |
| 215.01097106933594 | 1146770.0 | 0.0054765489 | liver01 |
| 180.87908935546875 | 1083634.375 | 0.0054765489 | liver01 |
| 248.96165466308594 | 591902.75 | 0.0054765489 | liver01 |

This is just a small example for simplicity's sake; the actual number of columns is much higher (including scan number, scan time, dda event index, spec index, dda rank). My questions are:

  1. Where are these "row ids" coming from? Is it a value that can be extracted from a file?
  2. From my understanding, to produce the required PeakFilter CSV, I have to somehow transform each file that I have so that the sample name becomes a column? If that's the case, how were the rt and mz values in the mz and time columns generated? Does that mean that every sample should always have a point that matches the time and mz values? I am lost, and I hope these questions don't sound silly, but I just cannot see how a machine as precise as a mass spec can produce such values. Please take me through the process if you will.
@JAlvarezJarreta
Collaborator

JAlvarezJarreta commented Jan 25, 2022

Hi @ksachikonye, based on your second question I guess you are asking about the input CSV file required by PeakFilter. If so:

> Where are these "row ids" coming from? Is it a value that can be extracted from a file?

These can be generated by you manually if not provided by the preprocessing tool you are using. For instance, they can be just the row number.
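As a minimal sketch of the "row number as ID" idea, the snippet below prepends a sequential ID to each row of a small feature table and writes it out as CSV. The column names (`id`, `mz`, `rt`, `liver01`) and the helper `add_row_ids` are hypothetical and only illustrate the approach; check the LipidFinder documentation for the exact header PeakFilter expects.

```python
import csv
import io


def add_row_ids(rows, id_column="id", start=1):
    """Return copies of the rows with a sequential ID as the first column."""
    return [{id_column: i, **row} for i, row in enumerate(rows, start=start)]


# Hypothetical feature table: one row per (m/z, retention time) feature.
# Column names are assumptions, not the official PeakFilter header.
features = [
    {"mz": 178.8814, "rt": 0.0055, "liver01": 1953013.75},
    {"mz": 215.0110, "rt": 0.0055, "liver01": 1146770.0},
]

with_ids = add_row_ids(features)

# Write to an in-memory buffer; swap in open("input.csv", "w", newline="")
# to produce a real file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(with_ids[0].keys()))
writer.writeheader()
writer.writerows(with_ids)
csv_text = buffer.getvalue()
print(csv_text)
```

Any unique value per row would serve the same purpose; the row number is simply the easiest one to generate.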

> From my understanding, to produce the required PeakFilter CSV, I have to somehow transform each file that I have so that the sample name becomes a column? If that's the case, how were the rt and mz values in the mz and time columns generated? Does that mean that every sample should always have a point that matches the time and mz values? I am lost, and I hope these questions don't sound silly, but I just cannot see how a machine as precise as a mass spec can produce such values. Please take me through the process if you will.

With tools like XCMS, all samples are processed together and a single file is generated. It lists the relevant m/z and retention time (rt) pairs (in a sense, the peaks common across samples) together with their intensity in each sample (and solvent), which matches the CSV format described in the documentation. XCMS also removes some artefacts to clean non-lipid elements from your dataset, but many more remain, hence the need for LipidFinder.
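To make the "one row per feature, one column per sample" layout more concrete, here is a toy sketch that pivots long-format peaks (like the dataframe in the question) into a wide table. It is not what XCMS does: real preprocessing tools perform peak detection, retention time alignment and tolerance-based grouping, whereas this example simply rounds m/z and rt to group nearby peaks. The function `pivot_peaks`, the rounding precision, the sample values and the use of `0.0` for a sample with no matching peak are all assumptions made for illustration.

```python
from collections import defaultdict


def pivot_peaks(peaks, mz_decimals=2, rt_decimals=1):
    """Naively group long-format peaks into a wide feature table.

    Peaks whose rounded m/z and rt coincide are treated as the same feature.
    This is a toy stand-in for real peak grouping and alignment.
    """
    samples = sorted({p["sample"] for p in peaks})
    # Each feature starts with an intensity of 0.0 in every sample.
    table = defaultdict(lambda: dict.fromkeys(samples, 0.0))
    for p in peaks:
        key = (round(p["mz"], mz_decimals), round(p["rt"], rt_decimals))
        # Keep the strongest signal if a sample hits the same bin twice.
        table[key][p["sample"]] = max(table[key][p["sample"]], p["intensity"])
    rows = []
    for row_id, ((mz, rt), intensities) in enumerate(sorted(table.items()), start=1):
        rows.append({"id": row_id, "mz": mz, "rt": rt, **intensities})
    return rows


# Hypothetical peaks from two samples; the values are made up for the example.
peaks = [
    {"mz": 178.8814, "rt": 0.51, "intensity": 1953013.75, "sample": "liver01"},
    {"mz": 178.8821, "rt": 0.53, "intensity": 1800000.0, "sample": "liver02"},
    {"mz": 215.0110, "rt": 0.52, "intensity": 1146770.0, "sample": "liver01"},
]

wide = pivot_peaks(peaks)
for row in wide:
    print(row)
```

In this sketch the m/z and rt reported for a feature are representative values for the group, not a reading from one particular scan, and a sample without a matching peak still gets a cell in that row.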

Hope this helps. Let me know if you have further questions.
