Test and document broken_paired_reader #1838

Open
2 tasks
standage opened this issue Feb 12, 2018 · 1 comment

Comments

@standage
Member

I am considering integrating the broken_paired_reader into kevlar (see kevlar-dev/kevlar#207). I understand at a high level what it's intended to do, but I don't understand the specifics of how it actually works or the circumstances under which it will or will not fail.

We have a couple of relevant tests in test_read_handling.py, but these exercise the reader only at the functional/script level. I suggest we need:

  • documentation for the broken paired reader
    • its intended purpose
    • specifics of what will and will not work
  • some clear unit tests that invoke the broken_paired_reader function directly to enforce the advertised behavior

@standage
Member Author

The current docstring actually documents the function pretty well.

    """Read pairs from a stream.

    A generator that yields singletons and pairs from a stream of FASTA/FASTQ
    records (yielded by 'screed_iter').  Yields (n, is_pair, r1, r2) where
    'r2' is None if is_pair is False.

    The input stream can be fully single-ended reads, interleaved paired-end
    reads, or paired-end reads with orphans, a.k.a. "broken paired".

    Usage::

       for n, is_pair, read1, read2 in broken_paired_reader(...):
          ...

    Note that 'n' behaves like enumerate() and starts at 0, but tracks
    the number of records read from the input stream, so is
    incremented by 2 for a pair of reads.

    If 'min_length' is set, all reads under this length are ignored (even
    if they are pairs).

    If 'force_single' is True, all reads are returned as singletons.
    """

I guess we can still consider integrating this into the developer docs.
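
To make the documented contract concrete, here is a minimal, self-contained sketch of the behavior the docstring describes. It is not khmer's implementation: `Record`, `is_pair_sketch`, and `paired_reader_sketch` are hypothetical stand-ins, the pairing rule (a name ending in `/1` followed by the same name ending in `/2`) is a simplifying assumption, and short reads are dropped before pairing, which may differ from what the real function does in edge cases.

```python
from collections import namedtuple

# Hypothetical stand-in for a screed record; only these two fields are used.
Record = namedtuple("Record", ["name", "sequence"])


def is_pair_sketch(r1, r2):
    """Assumed pairing rule: 'x/1' immediately followed by 'x/2'."""
    return (r1.name.endswith("/1") and r2.name.endswith("/2")
            and r1.name[:-2] == r2.name[:-2])


def paired_reader_sketch(records, min_length=None, force_single=False):
    """Yield (n, is_pair, r1, r2); r2 is None when is_pair is False."""
    n = 0        # index of the next yielded record; grows by 2 for a pair
    prev = None  # record still waiting to see whether its mate follows
    for rec in records:
        # Assumption: reads shorter than min_length are skipped up front.
        if min_length is not None and len(rec.sequence) < min_length:
            continue
        if prev is None:
            prev = rec
            continue
        if not force_single and is_pair_sketch(prev, rec):
            yield n, True, prev, rec
            n += 2
            prev = None
        else:
            # 'prev' has no mate here, so it is emitted as a singleton.
            yield n, False, prev, None
            n += 1
            prev = rec
    if prev is not None:
        # The final record cannot have a mate.
        yield n, False, prev, None


# A "broken paired" stream: pair, orphan, pair.
stream = [
    Record("a/1", "ACGTACGT"), Record("a/2", "TTGCAATG"),
    Record("b/1", "AC"),
    Record("c/1", "GGGTACCA"), Record("c/2", "CCATGGTA"),
]

for n, is_pair, read1, read2 in paired_reader_sketch(stream):
    print(n, is_pair, read1.name, read2.name if read2 else None)
```

Unit tests against the real function could follow the same shape: build a small in-memory stream, collect the yielded tuples, and assert on `n`, `is_pair`, and which records come back for each of `min_length` and `force_single`.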
