documentation refinement suggestions #51

vicfabienne · 2021-06-04T05:36:31Z

Introduction

"of all reads which are not aligned to SARS-CoV-2." is a counter-intuitive formulation; "unaligned reads" would be clearer

Workflow

we should include the flow chart
"LoFreq" instead of "LoFreg"
VEP --> sarscov2 database
"frequencies" --> "abundance"
"For documentation how to set this up, see: Prepare databases. For a better and interactive visualization of all species present in the wastewater Krona is used. Also here a small step of setting up a database is needed before running the pipeline, see: Prepare databases." - can and should be simplified

Output

Merged Report --> Summary report, overview, timeseries
what do those reports show, what are they for?
When we are this specific about SAM/BAM files, we should also be specific about everything else, like the VCF files and the VEP output. Maybe we can make a table for that.

Installation

"going to be packaged soon" ---> "IS packaged"
"from source", but also with Guix. Why? Is it possible to use anything else? If so, how; if not, why not?
We should provide clear step-by-step instructions on how the setup works with Guix and, if that is possible, without it. In the latter case we should always mention why that is not favorable, and provide direct links on how to quickly set up Guix on local machines first, if that is possible.
mention that it is ok, that setting the environment can take up to 10min
also mention that the terminal prefix has "sh-4.2$" then - this is super confusing
The vep test fails !!!
mention to use "cat" to investigate logs
or can we maybe have an editor?

Getting started

"shell variable PIGX_UNINSTALLED to any value" - Why?
"Then afterwards," - redundant
" source directory." what is the source directory?
The test should be explained much more clearly. Instead of just giving the command, we should explain the structure of the test data, and also that the provided settings and sample files actually serve as templates. In the case of settings.yaml, it is the file that runs the default mode, and everything else should be set there. (And link to the more detailed info further below.)

Preparing the input

- [ ] ! Double-check and state whether there are still any restrictions on the names of the read files (e.g. nothing allowed between _R1.fastq)

It should be mentioned right at the beginning which data the pipeline can actually be used for or has been tested on, and that people can contribute adjustments for other data.
Let's use tables instead of bullets for the description of the files.
"single-end data is not yet supported", "compressed (.gz) reads are not yet supported": this should be stated much earlier, somewhere at the beginning where we talk about which data the pipeline can be used for.

Pipeline help

"This is a pipeline for analyzing data from sequenced wastewater": sequenced how? This ties in with the "for which data can this pipeline be used" point.

Prepare databases

"Depending on the size of the databases.." let's specify which of them (It's also only kraken, right?)
"The folder structure is suggested like this and pre-filled accordingly in the settings file." This sentence doesn't make much sense to me.
"folder structure is suggested like this" --> Link doesn't work
"First download and unpack the database in the databases/kraken_db/": this directory is not provided yet. Can it be generated automatically while building? Otherwise, this sentence could be formulated a bit differently so it's clear that this folder does not exist yet.


rekado · 2021-06-04T07:55:07Z

This is a weird format for discussing changes :-/

Anyway, here are some comments:

"mention that it is ok, that setting the environment can take up to 10min"

Only on the MDC NFS server. It can also take days, though, if you choose to build everything from source (including compiler toolchains). So I'd prefer not to mention time at all.

also mention that the terminal prefix has "sh-4.2$" then - this is super confusing

No. We are unsetting the prompt variable. This is the default prompt of Bash. It's not unusual, it's just unconfigured. Mentioning that environment variables are unset with --pure is sufficient.

The vep test fails !!!

It no longer fails because we include diffutils.

mention to use "cat" to investigate logs

No, users can use whatever tools they want. We only restrict the environment for development, so that we don't accidentally depend on tools at runtime that are not part of the reproducible environment. Users can set PATH to whatever value they want (it's just reset like all the other variables because of --pure), or they can use the absolute file name of whatever executable they want to use.

or can we maybe have an editor?

No, see above. Use absolute file names.

vicfabienne added the type:documentation Improvements or additions to documentation label Jun 4, 2021

al2na added the question Further information is requested label Jun 4, 2021

vicfabienne commented Jun 4, 2021 • edited by mirifax