ERROR: Could not parse name #9

Open
xiekunwhy opened this issue Apr 6, 2021 · 4 comments


@xiekunwhy

Hi,

I got the following error when running doppelmark (latest binary version) on a ~210Gb BAM file (doppelmark --bam HX.clean.bam -output HX.dedup1.bam -parallelism 10 --clip-padding=1000 -scratch-dir tmp1 -disk-mate-shards 1000):
...
I0406 06:51:53.611818 56185 mark_duplicates.go:855] shard[43689] info: &{{ 0 2147483647 0 0 0 43689} 0 0 1899883503 1899883503}
E0406 06:51:56.743854 56185 optical_detector.go:124] Could not parse name: E100007937L1C015R0342816153, expected 5, 7, or 8 fields separated by ':'

Does anyone know why?

Best,
Kun

@yipal
Contributor

yipal commented Apr 6, 2021

Try using the command line argument --optical-distance=-1

Also, is there a reason you're setting --clip-padding=1000? How long are your reads?
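
On the parse error itself: the log line says the optical detector expects read names with 5, 7, or 8 fields separated by ':', and the name in the log contains no ':' at all. Below is a minimal Python sketch of that field-count check, not doppelmark's actual code. The function names are hypothetical, the second read name is a made-up colon-separated example, and the real parser may validate more than the field count.

```python
# Field counts taken from the error message: "expected 5, 7, or 8 fields".
EXPECTED_FIELD_COUNTS = {5, 7, 8}


def colon_field_count(read_name: str) -> int:
    """Number of ':'-separated fields in a read name."""
    return len(read_name.split(":"))


def has_expected_field_count(read_name: str) -> bool:
    """True if the name has one of the field counts the error message lists."""
    return colon_field_count(read_name) in EXPECTED_FIELD_COUNTS


# Read name copied from the log in this issue: no ':' at all, so one field.
name_from_log = "E100007937L1C015R0342816153"

# Hypothetical colon-separated name with 7 fields, for contrast only.
hypothetical_name = "INSTR:1:FLOWCELL:1:1101:1000:2000"

print(colon_field_count(name_from_log), has_expected_field_count(name_from_log))
print(colon_field_count(hypothetical_name), has_expected_field_count(hypothetical_name))
```

Names like the one in the log cannot yield the fields that optical duplicate detection needs, which is consistent with the suggestion to pass --optical-distance=-1.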

@xiekunwhy
Author

xiekunwhy commented Apr 6, 2021

Hi @yipal ,

The software worked well after using --optical-distance=-1.

My reads are 150bp long, but with the default value I got "5' alignment distance(150) exceeds padding(143)", and with a value of 152 I got "5' alignment distance(180) exceeds padding(152)". So I used an extremely large value. Will this value affect the results?

Best,
Kun

@yipal
Contributor

yipal commented Apr 7, 2021

Setting the clip-padding to 1000 should not cause wrong results, but it will cost you in computational efficiency. I'm confused as to how you have a 5' alignment distance of 180 when your read length is 150. Could you share the read that causes the clip-padding error?

@xiekunwhy
Author

Hi @yipal

In the end, doppelmark reported that the largest value is 219 (second line of the metrics file):

bio-mark-duplicates

maximum 5' alignment distance: 219

I really don't know why. Could it be caused by gap opening during mapping?

I don't know how to extract such strange reads quickly from a large BAM file. Do you have any suggestions?

Best,
Kun
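
One possible way to look for such reads is to scan SAM text records and compute a distance from the FLAG and CIGAR fields. The sketch below is an assumption-laden illustration, not doppelmark's implementation: it assumes the "5' alignment distance" is measured from the alignment start (POS) to the unclipped 5' end, so that forward reads contribute only their leading clips while reverse reads contribute their reference span plus trailing clips. The exact definition and any off-by-one convention are not confirmed in this thread, and the function names are hypothetical.

```python
import re

# One CIGAR operation: a length followed by an operator from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def five_prime_distance(flag: int, cigar: str) -> int:
    """Distance from POS to the unclipped 5' end (ASSUMED definition)."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    if flag & 0x10:
        # Reverse strand: the 5' end is on the right, so count the
        # reference-consuming operations plus any trailing clips.
        ref_len = sum(n for n, op in ops if op in "MDN=X")
        trailing = 0
        for n, op in reversed(ops):
            if op not in "SH":
                break
            trailing += n
        return ref_len + trailing
    # Forward strand: only leading soft/hard clips lie before POS.
    leading = 0
    for n, op in ops:
        if op not in "SH":
            break
        leading += n
    return leading


def reads_exceeding(sam_lines, threshold: int):
    """Yield (distance, record) for SAM records whose distance > threshold."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6 or fields[5] == "*":
            continue  # malformed record or no alignment
        distance = five_prime_distance(int(fields[1]), fields[5])
        if distance > threshold:
            yield distance, line.rstrip("\n")


# Under this assumed definition, a 150bp reverse-strand read whose alignment
# contains a 30bp deletion spans 180 reference bases.
print(five_prime_distance(16, "150M"))        # 150
print(five_prime_distance(16, "100M30D50M"))  # 180
```

To use it on a BAM file, the text output of `samtools view HX.clean.bam` could be piped into a small script that passes `sys.stdin` to `reads_exceeding` with the padding value as the threshold. If the assumption holds, long deletions or skipped regions (CIGAR `D` or `N`) in reverse-strand reads would be one way to get a distance larger than the read length.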
