Standardized bioinformatic pipeline #6

SSuominen1 · 2021-03-18T16:14:31Z

What is the best way to record the bioinformatic tools/pipelines that were used?

I understand there are some developments on this in Ocean Best Practices; we should look into that.

Through the PacMAN project, OBIS will also be developing a pipeline, or researching how output from existing pipelines should be formatted for DwC-A. Is there a need for this from other users?

cpavloud · 2021-11-26T12:37:02Z

Could we use the term "identificationRemarks" to specify the pipeline used (along with all its relevant, user-selected parameters, separated by a vertical bar with a space on each side ( | )) and the "identificationReferences" term for the reference/citation/URL of the pipeline?
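
As a rough illustration of this proposal, the sketch below joins a pipeline name, version, and user-selected parameters with " | " to form an identificationRemarks value. The function name, the pipeline name `examplePipeline`, the parameter names, and the URL are all hypothetical placeholders, not part of any standard or real tool.

```python
def build_identification_remarks(pipeline, version, parameters):
    """Join pipeline name/version and user-selected parameters with ' | '."""
    parts = [f"{pipeline} {version}"]
    # One "name=value" entry per parameter, in the order given.
    parts.extend(f"{name}={value}" for name, value in parameters.items())
    return " | ".join(parts)


# Hypothetical pipeline and parameters, for illustration only.
record = {
    "identificationRemarks": build_identification_remarks(
        "examplePipeline", "1.0", {"min_quality": 20, "min_overlap": 30}
    ),
    "identificationReferences": "https://example.org/examplePipeline",
}
print(record["identificationRemarks"])
```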

dschigel · 2021-11-26T13:18:24Z

Looks like our DNA guide recommends identificationReferences, see https://docs.gbif.org/publishing-dna-derived-data/1.0/en/#mapping-metabarcoding-edna-and-barcoding-data @thomasstjerne please take a look: I think the issue is that we have remarks and references, but no clear place to paste the pipeline name. One may claim that the reference includes the name and number, but perhaps this is not good enough for @cpavloud?

pieterprovoost · 2021-11-26T13:30:21Z

Just thinking out loud here, but for many pipelines a run with a specific set of parameters will be defined by a custom configuration file or makefile. Perhaps the recommendation should be that this file is committed to source control (GitHub or other) and included as one of the identificationReferences. I think that would benefit reproducibility.
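
A minimal sketch of what that could look like, assuming the configuration file lives on GitHub and that multiple identificationReferences are separated with " | ". The owner, repository, commit hash, file path, and pipeline URL below are hypothetical; pinning the link to a commit rather than a branch means the reference keeps pointing at the exact parameters used.

```python
def config_permalink(owner, repo, commit, path):
    """Build a GitHub link to a file as it existed at a specific commit."""
    return f"https://github.com/{owner}/{repo}/blob/{commit}/{path}"


# All values below are placeholders, for illustration only.
references = " | ".join([
    "https://example.org/examplePipeline",
    config_permalink("example-org", "example-analysis", "0123abc", "config/run.yaml"),
])
print(references)
```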

cpavloud · 2021-11-26T13:38:13Z

@dschigel
My issue is that
a) in the case that a pipeline is used (e.g. QIIME2), providing just the name is not enough. The parameters that were selected by the user for each step of the bioinformatic analysis should be documented, so that the analysis is replicable.
b) in the case that different individual tools are used (one for each step of the analysis, e.g. sickle for the quality filtering, pandaseq for the merging, UCHIME for the chimera removal etc.), the identificationReferences should contain more than the name, and (again) the parameters that were selected by the user for each tool should be documented.

@pieterprovoost yes, this is a good idea and it can be used for certain pipelines. Also, maybe the sop term can be used for full documentation of the analysis instead of the identificationReferences? In this case (again), the user/data provider should have deposited the SOP in a (GitHub or other) repository.

thomasstjerne · 2021-11-26T13:59:17Z

@cpavloud in the DNA derived data extension there are dedicated fields for (at least some) individual pipeline steps.
For example the field chimera_check is supposed to have a value like uchime;v4.1;default parameters.
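
To illustrate that value format, here is a small sketch that splits such a field into software, version, and parameters. It assumes exactly three semicolon-separated parts, as in the example above; `StepValue` and `parse_step_value` are hypothetical names, not part of the extension or of MIxS.

```python
from typing import NamedTuple


class StepValue(NamedTuple):
    software: str
    version: str
    parameters: str


def parse_step_value(value):
    """Split a 'software;version;parameters' string into its three parts."""
    parts = [part.strip() for part in value.split(";")]
    if len(parts) != 3:
        raise ValueError(f"expected 3 semicolon-separated parts, got {len(parts)}")
    return StepValue(*parts)


step = parse_step_value("uchime;v4.1;default parameters")
print(step.software, step.version, step.parameters)
```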

These fields originate from the MIxS standard, and I think it would be fair to ask whether e.g. the seq_quality_check field is appropriate for information about quality filtering, and also whether there is a field intended for the merging.

But I think that it would always be desirable to have a link in the sop field to a structured pipeline description.
