LipidFinder Dataframe #7

ksachikonye · 2021-12-18T14:43:13Z

I have tried my best to understand how the PeakFilter CSV file is constructed, and I just cannot wrap my head around it. Let's say I extract data from an mzML file into a dataframe that looks like this:

| mz | intensity | rt | sample |
| --- | --- | --- | --- |
| 178.88143920898438 | 1953013.75 | 0.0054765489 | liver01 |
| 215.01097106933594 | 1146770.0 | 0.0054765489 | liver01 |
| 180.87908935546875 | 1083634.375 | 0.0054765489 | liver01 |
| 248.96165466308594 | 591902.75 | 0.0054765489 | liver01 |

This is just a small example for simplicity's sake; the real data has many more columns (including scan number, scan time, DDA event index, spec index, and DDA rank). My questions are:

Where are these "row ids" coming from? Is it a value that can be extracted from a file?
From my understanding, to come up with the required PeakFilter CSV, I have to somehow transform each file that I have so that the sample name becomes a column? If that's the case, how were the rt and mz values in the mz and time columns generated? Does that mean that every sample should always have a point that matches the time and mz values? I am lost, and I hope these questions don't sound silly, but I just cannot see how a machine as precise as a mass spec can produce such values. Please take me through the process if you will.

JAlvarezJarreta · 2022-01-25T18:58:15Z

Hi @ksachikonye, based on your second question I guess you are asking about the input CSV file required by PeakFilter. If so:

> Where are these "row ids" coming from? Is it a value that can be extracted from a file?

You can generate these yourself if the preprocessing tool you are using does not provide them. For instance, they can be just the row number.
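As a minimal sketch of the "just the row number" option, the snippet below prepends a 1-based ID column to a CSV using only the Python standard library. The column name `id`, the helper `add_row_ids`, and the sample values are assumptions made for illustration; check the LipidFinder documentation for the exact header PeakFilter expects.

```python
import csv
import io


def add_row_ids(csv_text, id_column="id"):
    """Prepend a 1-based row-number column to CSV text.

    The column name "id" is a placeholder, not taken from LipidFinder.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([id_column] + header)
    # Number the data rows starting from 1.
    for row_number, row in enumerate(reader, start=1):
        writer.writerow([row_number] + row)
    return out.getvalue()


# Illustrative data only: two features, two samples.
example = (
    "mz,time,liver01,liver02\n"
    "178.8814,0.33,1953013.75,1800000.0\n"
    "215.0110,0.33,1146770.0,0\n"
)
with_ids = add_row_ids(example)
print(with_ids)
```

The same idea works on a file on disk by swapping the `io.StringIO` objects for opened file handles.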

> From my understanding, to come up with the required PeakFilter CSV, I have to somehow transform each file that I have so that the sample name becomes a column? If that's the case, how were the rt and mz values in the mz and time columns generated? Does that mean that every sample should always have a point that matches the time and mz values? I am lost, and I hope these questions don't sound silly, but I just cannot see how a machine as precise as a mass spec can produce such values. Please take me through the process if you will.

With tools like XCMS, all samples are processed and a single file is generated that contains the relevant mz and rt elements (common peaks across samples, in a way) along with their intensity in each sample (and solvent), which matches the CSV format described in the documentation. So samples do not need to share exactly the same mz and time values: the tool decides which peaks across samples correspond to the same element and reports one mz and one time for it. XCMS will also do its bit by removing some artefacts to clean non-lipid elements from your dataset (but many more are left, hence the need for LipidFinder).
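To make the idea of "common peaks across samples" more concrete, here is a toy sketch of how peaks from several samples could be matched within tolerances and laid out as one row per feature with one intensity column per sample. This is not XCMS's algorithm, which is considerably more sophisticated; the function `align_peaks`, the tolerances, the column names, the `liver02` peak, and the choice of 0.0 for a missing peak are all assumptions made for illustration.

```python
from statistics import mean


def align_peaks(peaks, mz_tol=0.01, rt_tol=0.1):
    """Toy alignment: a peak joins the first group whose reference peak lies
    within mz_tol and rt_tol of it; otherwise it starts a new group."""
    groups = []  # each group is a list of peak dicts
    for peak in sorted(peaks, key=lambda p: (p["mz"], p["rt"])):
        for group in groups:
            ref = group[0]
            if (abs(peak["mz"] - ref["mz"]) <= mz_tol
                    and abs(peak["rt"] - ref["rt"]) <= rt_tol):
                group.append(peak)
                break
        else:
            groups.append([peak])

    samples = sorted({p["sample"] for p in peaks})
    table = []
    for feature_id, group in enumerate(groups, start=1):
        # One representative mz and time per feature (here: the mean).
        row = {
            "id": feature_id,
            "mz": round(mean(p["mz"] for p in group), 4),
            "time": round(mean(p["rt"] for p in group), 4),
        }
        # One intensity column per sample; 0.0 if the sample has no peak here.
        for sample in samples:
            hits = [p["intensity"] for p in group if p["sample"] == sample]
            row[sample] = max(hits) if hits else 0.0
        table.append(row)
    return table


# Illustrative long-format peaks; the liver02 entry is made up.
peaks = [
    {"mz": 178.8814, "rt": 0.0055, "intensity": 1953013.75, "sample": "liver01"},
    {"mz": 178.8819, "rt": 0.0061, "intensity": 1700000.0, "sample": "liver02"},
    {"mz": 215.0110, "rt": 0.0055, "intensity": 1146770.0, "sample": "liver01"},
]
features = align_peaks(peaks)
for row in features:
    print(row)
```

In this sketch the two peaks near m/z 178.88 end up in the same row even though their mz and rt values differ slightly, while the peak at m/z 215.011 gets its own row with no signal recorded for `liver02`.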

Hope this helps. Let me know if you have further questions.
