Error creating WisecondorFF reference #1
Comments
The log file shows the problem: all regions are masked when selecting regions for the fragment size. This selection is based on both the absolute and the normalized read count (in my experience, a read count that is too low within a region yields unreliable fragment size estimates). The absolute read count is used here as a lower bound, whereas the normalized cutoff may be higher (it varies per sample).
In any case, I pushed an update that lets you specify parameters for tweaking the absolute and normalized cutoffs (which naturally depend on the chosen region size and sample coverage). With a smaller region size (250 kb), it may be necessary to set RC_CLIP_ABS to a lower value (in general I don't think you need to touch RC_CLIP_NORM). I recommend either lowering RC_CLIP_ABS so that enough regions remain to actually build a reference, or using a bigger region size (the fragment size is a lot more reliable at a bigger size), after which this should no longer be an issue.
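To illustrate why a smaller region size can mask every region, here is a minimal sketch of a two-cutoff filter. It is not the WisecondorFF implementation: the function `keep_regions`, the normalization (count divided by the sample's mean count per region), the read counts, and the cutoff values are all assumptions made up for this example, and the values are not the tool's defaults.

```python
def keep_regions(read_counts, clip_abs, clip_norm):
    """Return one boolean per region: True if kept, False if masked.

    Hypothetical rule: a region is kept only if its raw count reaches the
    absolute cutoff AND its count relative to the sample mean reaches the
    normalized cutoff.
    """
    total = sum(read_counts)
    if total == 0:
        return [False] * len(read_counts)
    mean = total / len(read_counts)
    kept = []
    for count in read_counts:
        normalized = count / mean
        kept.append(count >= clip_abs and normalized >= clip_norm)
    return kept


# Made-up counts for the same sample at two region sizes: 250 kb regions
# hold roughly a quarter of the reads of 1 Mb regions.
counts_1mb = [4000, 4200, 3900, 150]
counts_250kb = [1000, 1050, 975, 40]

mask_1mb = keep_regions(counts_1mb, clip_abs=2000, clip_norm=0.5)
mask_250kb = keep_regions(counts_250kb, clip_abs=2000, clip_norm=0.5)
mask_250kb_low = keep_regions(counts_250kb, clip_abs=500, clip_norm=0.5)

# Number of regions kept in each case
print(sum(mask_1mb), sum(mask_250kb), sum(mask_250kb_low))
```

With these made-up numbers, the absolute cutoff alone removes every 250 kb region, while the normalized cutoff behaves the same at both sizes because it scales with the sample. That matches the suggestion to adjust only the absolute cutoff when the region size shrinks.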
The error persists with the new commit. I've reduced |
OK, that's rather strange. Did you use paired-end reads, and which aligner was used? If you can, could you send one of the NPZ files that you're using to build the reference? I can then have a look at the read distributions.
This particular dataset was single-end reads aligned with BWA. Unfortunately, I cannot send the NPZ files without going through red tape, as it is confidential patient data. However, we recently received data from a new cohort (19 NK reference samples) where the reference seemed to build without issue, even without specifying I've attached the new cohort log file, where you can see that the observation thresholds for removal are determined per NPZ file:
So you definitely need paired-end reads: the method, in its current form, relies on the distance (insert size) measured between aligned reads in order to compare insert size distributions between regions. I assume the new data is paired, since it actually passes the construction phase, and the log looks as you would expect.
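As a rough illustration of why single-end data carries no insert size information, the sketch below collects template lengths from SAM-formatted text lines. It is not how WisecondorFF reads its input; the helper `insert_sizes` and the toy records are assumptions for this example. It relies only on standard SAM fields: the FLAG bits 0x1 (paired) and 0x2 (properly paired), and the TLEN column, which the SAM specification sets to 0 for single-segment templates.

```python
def insert_sizes(sam_lines):
    """Collect one insert size per properly paired template."""
    sizes = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2: FLAG
        tlen = int(fields[8])  # column 9: TLEN (signed template length)
        is_paired = flag & 0x1
        is_proper = flag & 0x2
        # TLEN is positive for the leftmost mate and negative for the
        # rightmost, so tlen > 0 counts each pair exactly once.
        if is_paired and is_proper and tlen > 0:
            sizes.append(tlen)
    return sizes


# Toy records (made up): one proper pair and one single-end read.
mate1 = "\t".join(["r1", "99", "chr1", "100", "60", "50M", "=", "216", "166", "*", "*"])
mate2 = "\t".join(["r1", "147", "chr1", "216", "60", "50M", "=", "100", "-166", "*", "*"])
single = "\t".join(["r2", "0", "chr1", "500", "60", "50M", "*", "0", "0", "*", "*"])

paired_sizes = insert_sizes([mate1, mate2])
single_sizes = insert_sizes([single])
print(paired_sizes, single_sizes)
```

The single-end record contributes nothing, so a dataset made up only of such reads leaves every region without fragment size observations.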
I'm having some issues building the WisecondorFF reference, but no issues converting from BAM to NPZ. I've run it with a log level for debugging. For context, I'm creating the reference from 97 normal samples (97 NPZ files). Snippet below, log file attached.
peg2_ref-n519-29682026.txt