Add API functions to parse and format SEQ and QUAL fields #1974
jmarshall wants to merge 5 commits into samtools:develop from …
The optimised COPY_MINUS_N() implementation declares its own locals. Make the simple COPY_MINUS_N() declare its own local i rather than shadowing a local of that name from its caller.
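Here is a minimal sketch of the macro-hygiene point. It shows a simple byte-copying macro that declares its own `size_t` loop counter inside a `do { } while (0)` block, so it neither relies on nor modifies an `i` belonging to the caller. The name `COPY_MINUS_N_SKETCH` and its argument list are hypothetical. This is not HTSlib's actual COPY_MINUS_N() definition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified macro: copy l bytes from `from` to `to`,
 * subtracting n from each byte.  The size_t counter i is declared inside
 * the do { } while (0) block, so any variable named i in the caller is
 * neither used nor modified.  As with any such macro, the arguments
 * themselves must not refer to a variable named i. */
#define COPY_MINUS_N_SKETCH(to, from, n, l) do {        \
        size_t i;                                       \
        for (i = 0; i < (size_t) (l); i++)              \
            (to)[i] = (from)[i] - (n);                  \
    } while (0)
```

Because the counter lives in its own block scope, a caller's `int i` keeps its value across the macro call.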
e413e21 to 9de8e4f
This mostly looks OK to me. It might be good to mention in the documentation exactly how big the destination buffer needs to be, especially for …
* State that bam_format_seq() and bam_format_qual() destination buffers must be at least b->core.l_qseq bytes long
* Note that ASCII values follow the SAM QUAL format
* Explain how sam_parse_qual() can fail
* sam_parse_seq() always succeeds, so it never returns a negative error value
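This sketch shows what formatting involves and why the destination needs at least `b->core.l_qseq` bytes: one output byte is written per base. BAM stores SEQ as 4-bit codes, two bases per byte with the first base in the high nibble. It stores QUAL as raw Phred scores, which become SAM QUAL characters when 33 ('!') is added. `format_seq_sketch()` and `format_qual_sketch()` are hypothetical names with assumed signatures, with `len` standing in for `l_qseq`. They are not HTSlib's bam_format_seq()/bam_format_qual().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* BAM 4-bit base codes 0..15, in SAM specification order. */
static const char nt16_str_sketch[] = "=ACMGRSVTWYHKDBN";

/* Hypothetical: unpack `len` bases (two per byte, high nibble first) from
 * `packed` into `dest`.  `dest` must hold at least `len` bytes; no NUL
 * terminator is written. */
static void format_seq_sketch(char *dest, const uint8_t *packed, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        uint8_t byte = packed[i / 2];
        uint8_t code = (i % 2 == 0) ? (byte >> 4) : (byte & 0x0f);
        dest[i] = nt16_str_sketch[code];
    }
}

/* Hypothetical: turn raw Phred scores into SAM QUAL characters by adding
 * 33 ('!').  `dest` must hold at least `len` bytes; no NUL terminator is
 * written.  The 0xFF "QUAL absent" marker is not handled here. */
static void format_qual_sketch(char *dest, const uint8_t *qual, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dest[i] = (char) (qual[i] + 33);
}
```

For example, the packed bytes `0x12 0x48 0xf0` with `len == 5` unpack to `ACGTN`.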
sam_parse_seq() and sam_parse_qual() both take `size_t len` and on success return the number of bytes written, so the return type needs to be able to handle a similar magnitude. As sam_parse_seq() cannot fail, it can return size_t. As sam_parse_qual() may need to report an error, it is updated to return ssize_t. In addition, change the second COPY_MINUS_N() implementation to use size_t for its loop variable (the first one already used it).
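The return-type reasoning can be sketched in plain C. Packing SEQ maps any unrecognised character to N, so it cannot fail and an unsigned `size_t` byte count is enough. Converting QUAL can meet an invalid character, so it needs a signed `ssize_t` to return -1. The function names, signatures, and the failure rule used here (any character below '!') are assumptions for illustration. They are not HTSlib's sam_parse_seq()/sam_parse_qual(). `ssize_t` comes from POSIX `<sys/types.h>`.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>  /* ssize_t (POSIX) */

/* Map an ASCII base to its 4-bit BAM code; unrecognised characters become
 * 15 ('N'), which is why packing can never fail. */
static uint8_t nt16_code_sketch(char c)
{
    static const char codes[] = "=ACMGRSVTWYHKDBN";
    int u = toupper((unsigned char) c);
    for (uint8_t k = 0; k < 16; k++)
        if (codes[k] == u)
            return k;
    return 15;
}

/* Hypothetical: pack `len` ASCII bases into `dest`, two per byte, high
 * nibble first.  `dest` must hold at least (len + 1) / 2 bytes.
 * Returns the number of bytes written; there is no error case. */
static size_t parse_seq_sketch(uint8_t *dest, const char *seq, size_t len)
{
    size_t i;
    for (i = 0; i + 1 < len; i += 2)
        dest[i / 2] = (uint8_t) ((nt16_code_sketch(seq[i]) << 4)
                                 | nt16_code_sketch(seq[i + 1]));
    if (i < len)  /* odd length: last base in the high nibble, low nibble 0 */
        dest[i / 2] = (uint8_t) (nt16_code_sketch(seq[i]) << 4);
    return (len + 1) / 2;
}

/* Hypothetical: convert SAM QUAL characters to raw Phred scores by
 * subtracting 33 ('!').  Returns the number of bytes written, or -1 if a
 * character is below '!'; hence the signed ssize_t return type.
 * The "*" (QUAL absent) case is not handled here. */
static ssize_t parse_qual_sketch(uint8_t *dest, const char *qual, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char) qual[i];
        if (c < '!')
            return -1;
        dest[i] = (uint8_t) (c - '!');
    }
    return (ssize_t) len;
}
```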
Sorry to be a while getting back to this. I've taken the liberty of adding commits that update the documentation and change the return types of … I've added documentation to …
I don't particularly like the function names, as they're somewhat ambiguous. We currently use a mix of conventions. Typically it looks like the first part of our function names is where we're writing to if we're encoding data, and where we're reading from if we're simply returning it or processing it (e.g. pileup). We typically don't have the name of the other encoding type anywhere in the function name. So … This works for … However … Maybe we just need to rename both functions to be explicit, e.g. …
The other thought is that maybe it's the … Really, what I'm getting at is simply having both …
This adds functions to the public API to pack and unpack the SEQ and QUAL fields individually, enabling third-party code to take advantage of the optimised and SIMD-optimised implementations of this functionality that HTSlib provides.
The `format` ones were the motivation for this; in particular they will be immediately useful for pysam. The `parse` ones are perhaps of less widespread use (at least in their current form), as usually if writing to a `bam1_t` there’ll need to be some memory reallocating going on too. But I think pysam would benefit from accessing HTSlib’s implementations of these too.