documentation refinement suggestions #51

Open
17 of 30 tasks
vicfabienne opened this issue Jun 4, 2021 · 1 comment
Labels
question Further information is requested type:documentation Improvements or additions to documentation

Comments

@vicfabienne
Collaborator

vicfabienne commented Jun 4, 2021

210604_documentation_feedback.md

Introduction

"of all reads which are not aligned to SARS-CoV-2." counterintuitive formulation - "unaligned reads" would be clearer

Workflow

  • we should include the flow chart
  • "LoFreq" instead of "LoFreg"
  • VEP --> sarscov2 database
  • "frequencies" --> "abundance"
  • "For documentation how to set this up, see: Prepare databases. For a better and interactive visualization of all species present in the wastewater Krona is used. Also here a small step of setting up a database is needed before running the pipeline, see: Prepare databases." - can and should be simplified

Output

  • Merged Report --> Summary report, overview, timeseries
  • what do those reports show, what are they for?
  • When we are so specific about SAM/BAM files, then we should also be specific about everything else, like the VCF files and the VEP output. Maybe we can make a table for that

Installation

  • "going to be packaged soon" ---> "IS packaged"

  • "from source" - but also with guix. Why? Is it possible to use anything else? How and/or Why not?

  • we should provide clear step-by-step instructions on how the setup works with Guix, and without Guix if that is possible. We should of course always mention why the latter is not favorable, and give direct links on how to quickly set up Guix on a local machine first, if that is possible.

  • mention that it is ok, that setting the environment can take up to 10min

  • also mention that the terminal prefix has "sh-4.2$" then - this is super confusing

  • The vep test fails !!!

  • mention to use "cat" to investigate logs

  • or can we maybe have an editor?

Getting started

  • "shell variable PIGX_UNINSTALLED to any value" - Why?
  • "Then afterwards," - redundant
  • " source directory." what is the source directory?
  • The test should be explained much more clearly. Instead of just giving the command, we should explain the structure of the test data, and also that the provided settings and sample files actually serve as templates. In the case of settings.yaml, it is the file that runs the default mode, and everything else should be set in there. (And link to the more detailed information further below.)

Preparing the input

  • "the user must supply" - we already switched to active voice earlier, so we should be consistent with that.

  • "type in the shell" --> "give command", "use the following command", or just something with "command line interface" or "terminal"

  • what is a boilerplate? --- "boilerplate" is a bunch of standard unspecific text

  • can or should I do something with that now? It only describes things, but not whether I have to do anything with them.

  • where do I find them now?

  • !! How about the actual working directory?

  • !! I can't create a separate working directory and use --init there - can I somehow? --- yes, you can, just don't use PIGX_UNINSTALLED

  • !! also that environment doesn't show the directory I'm in, this is confusing --- that's up to your shell settings

  • !! document where this --init sample sheet comes from, intuitively I would assume that this is already dynamically created

  • !! since it is not dynamic - can we make a template which does not contain all the 12 test samples?

- [ ] ! double check and state whether there are still any restrictions on the names of the read files (e.g. nothing allowed between _R1.fastq)

  • it should be mentioned right at the beginning which data the pipeline can actually be used for, or is tested for, and that people can contribute adjustments for other data
  • please let's use tables for the description of the files instead of bullets
  • "single-end data is not yet supported", "compressed (.gz) reads are not yet supported" that should be stated way earlier - somewhere in the beginning talking about which data can be used for it
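
To make the read-file naming question above concrete, here is a minimal sketch of what such a check could look like. It is not taken from the pipeline: the rule it encodes (names must end in exactly `_R1.fastq` or `_R2.fastq`, uncompressed, and every sample needs both mates) is only an assumption based on the points in this list, and `check_read_name` and `find_unpaired` are hypothetical helper names.

```python
import re

# ASSUMED rule, not confirmed by the pipeline: "<sample>_R1.fastq" or
# "<sample>_R2.fastq", with nothing between "_R1"/"_R2" and ".fastq".
READ_NAME = re.compile(r"^(?P<sample>.+)_R(?P<mate>[12])\.fastq$")


def check_read_name(filename):
    """Return (sample, mate) if the name fits the assumed rule, else None."""
    match = READ_NAME.match(filename)
    if match is None:
        return None
    return match.group("sample"), int(match.group("mate"))


def find_unpaired(filenames):
    """Return the sample names that lack either the R1 or the R2 file."""
    mates = {}
    for name in filenames:
        parsed = check_read_name(name)
        if parsed:
            sample, mate = parsed
            mates.setdefault(sample, set()).add(mate)
    return sorted(s for s, found in mates.items() if found != {1, 2})
```

Under this assumed rule, `check_read_name("Test0_R1_001.fastq")` and `check_read_name("Test0_R1.fastq.gz")` both return `None`. Whatever the real restrictions turn out to be, stating them this explicitly in the documentation would answer the question.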

Pipeline help

  • "This is a pipeline for analyzing data from sequenced wastewater" - sequenced how --> goes with the "for which data can this pipeline be used" thing?

Prepare databases

  • "Depending on the size of the databases.." let's specify which of them (It's also only kraken, right?)
  • "The folder structure is suggested like this and pre-filled accordingly in the settings file." This sentence doesn't make much sense to me.
  • "folder structure is suggested like this" --> Link doesn't work
  • "First download and unpack the database in the databases/kraken_db/" This directory is not provided yet. Can it be generated automatically while building? Otherwise this sentence could maybe be formulated a bit differently so it's clear that this folder is not there yet
@vicfabienne vicfabienne added the type:documentation Improvements or additions to documentation label Jun 4, 2021
@rekado
Member

rekado commented Jun 4, 2021

This is a weird format for discussing changes :-/

Anyway, here are some comments:

  • "mention that it is ok, that setting the environment can take up to 10min"

Only on the MDC NFS server. It can also take days, though, if you choose to build everything from source (including compiler toolchains). So I'd prefer not to mention time at all.

  • also mention that the terminal prefix has "sh-4.2$" then - this is super confusing

No. We are unsetting the prompt variable. This is the default prompt of Bash. It's not unusual, it's just unconfigured. Mentioning that environment variables are unset with --pure is sufficient.

  • The vep test fails !!!

It no longer fails because we include diffutils.

  • mention to use "cat" to investigate logs

No, users can use whatever tools they want. We only restrict the environment for development, so that we don't accidentally depend on tools at runtime that are not part of the reproducible environment. Users can set PATH to whatever value they want (it's just reset like all the other variables because of --pure), or they can use the absolute file name of whatever executable they want to use.

  • or can we maybe have an editor?

No, see above. Use absolute file names.
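
To illustrate the point about absolute file names, here is a small sketch in Python rather than in the Guix shell itself. It only imitates the effect of a cleared environment by passing an empty `env` to the child process; it does not use Guix or `--pure`. The helper name `run_with_empty_env` and the log content are made up for the example, and it assumes a POSIX system where `cat` is installed.

```python
import os
import shutil
import subprocess
import tempfile


def run_with_empty_env(argv):
    """Run argv with no inherited environment variables (PATH included)."""
    return subprocess.run(argv, env={}, capture_output=True, text=True, check=True)


# Resolve the absolute file name while the normal PATH is still available.
cat_path = shutil.which("cat")

# Write a stand-in log file to inspect (the content is made up).
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as log:
    log.write("rule finished\n")

try:
    # The absolute file name works even though the child has no PATH set.
    result = run_with_empty_env([cat_path, log.name])
    print(result.stdout, end="")
finally:
    os.remove(log.name)
```

The same idea carries over to the restricted shell: either call the tool by its absolute file name, or set PATH yourself after entering the environment.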

@al2na al2na added the question Further information is requested label Jun 4, 2021
3 participants