I have tried my best to understand how the PeakFilter CSV file is constructed, and I just cannot wrap my head around it. Let's say I extract data from an mzML file into a dataframe that looks like this:
| mz | intensity | rt | sample |
| --- | --- | --- | --- |
| 178.88143920898438 | 1953013.75 | 0.0054765489 | liver01 |
| 215.01097106933594 | 1146770.0 | 0.0054765489 | liver01 |
| 180.87908935546875 | 1083634.375 | 0.0054765489 | liver01 |
| 248.96165466308594 | 591902.75 | 0.0054765489 | liver01 |
This is just a small example for simplicity's sake; the actual number of columns is much higher (including scan number, scan time, DDA event index, spec index, DDA rank). My questions are:
Where are these "row ids" coming from? Is it a value that can be extracted from a file?
From my understanding, to come up with the required PeakFilter CSV, I have to somehow transform each file I have so that the sample name becomes a column? If that's the case, how were the rt and mz values in the mz and time columns generated? Does that mean every sample must always have a point that matches the time and mz values? I am lost, and I hope these questions don't sound silly, but I just cannot see how a machine as precise as a mass spec can produce such values. Please take me through the process if you will.
Hi @ksachikonye, based on your second question I guess you are asking about the input CSV file required by PeakFilter. If so:
Where are these "row ids" coming from? Is it a value that can be extracted from a file?
These can be generated by you manually if not provided by the preprocessing tool you are using. For instance, they can be just the row number.
From my understanding, to come up with the required PeakFilter CSV, I have to somehow transform each file I have so that the sample name becomes a column? If that's the case, how were the rt and mz values in the mz and time columns generated? Does that mean every sample must always have a point that matches the time and mz values? I am lost, and I hope these questions don't sound silly, but I just cannot see how a machine as precise as a mass spec can produce such values. Please take me through the process if you will.
Tools like XCMS process all samples together and generate a single file listing the relevant mz and rt features (roughly, the peaks common across samples) along with the intensity of each feature in every sample (and solvent), which matches the CSV format described in the documentation. XCMS also removes some artefacts, cleaning part of the non-lipid elements from your dataset, but many more are left, hence the need for LipidFinder.
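To make that layout concrete, here is a minimal Python sketch of the reshaping step only. It assumes you already have aligned features (one mz/rt pair per peak, with an intensity per sample), which is the part XCMS does for you; raw scan points like the ones in your dataframe would not work directly. The values, the `liver02` and `solvent01` sample names, the `id`/`mz`/`rt` header names and the choice of `0.0` for a missing intensity are all placeholders for illustration, so please check the LipidFinder documentation for the exact column names PeakFilter expects.

```python
import csv
import io

# Hypothetical, already-aligned features in "long" format: one record per
# (feature, sample). In practice these come from a tool such as XCMS.
long_rows = [
    {"mz": 178.8814, "rt": 0.55, "sample": "liver01", "intensity": 1953013.75},
    {"mz": 178.8814, "rt": 0.55, "sample": "liver02", "intensity": 1800000.0},
    {"mz": 215.0110, "rt": 0.55, "sample": "liver01", "intensity": 1146770.0},
    {"mz": 215.0110, "rt": 0.55, "sample": "solvent01", "intensity": 50000.0},
]


def to_wide(rows):
    """Turn long-format features into one row per feature, one column per sample."""
    samples = sorted({r["sample"] for r in rows})

    # Group intensities by feature, identified here by its (mz, rt) pair.
    features = {}
    for r in rows:
        features.setdefault((r["mz"], r["rt"]), {})[r["sample"]] = r["intensity"]

    # Placeholder header names; the row id is simply the row number.
    table = [["id", "mz", "rt"] + samples]
    for row_id, key in enumerate(sorted(features), start=1):
        mz, rt = key
        # Assumption: a feature not detected in a sample gets intensity 0.0.
        intensities = [features[key].get(s, 0.0) for s in samples]
        table.append([row_id, mz, rt] + intensities)
    return table


wide = to_wide(long_rows)

# Write the table as CSV text (swap io.StringIO for a real file to save it).
buffer = io.StringIO()
csv.writer(buffer).writerows(wide)
print(buffer.getvalue())
```

So not every sample needs a point at every mz/rt: each row is a feature, and a sample that lacks it simply gets a low or empty intensity in its column.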
Hope this helps. Let me know if you have further questions.