vcztools view: INFO field computation performance #79
Comments
We already have a C extension module, so it wouldn't be that hard to update it to include computing AC and AN.
See #77 (comment) for details on the slowdown.
Will-Tyler changed the title from "INFO field computation performance" to "vcztools view: INFO field computation performance" on Oct 2, 2024
Putting this in the initial release milestone for now, can triage out later if it's not critical.
Removing this from initial release as sample subsetting is temporarily dropped in #120.
Description
When the user specifies a sample selection in vcztools view, vcztools recalculates the AC and AN INFO fields. This is consistent with bcftools' behavior. vcztools calculates these INFO fields using all of the samples in a variant-wise chunk of genotype data. The current implementation is pure Python using NumPy, which may be slow and incur significant memory overhead. The goal of this issue is to improve its computational and memory efficiency. The solution may require calculating AC and AN in a C extension module.
The original code was added in #77.
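For context, here is a minimal NumPy sketch of what recomputing AC and AN for one variant chunk involves. It is an illustration only, not the code from #77: the function name `compute_ac_an`, its parameters, the `(variants, samples, ploidy)` genotype layout, and the use of negative values for missing calls are all assumptions made for this example.

```python
import numpy as np


def compute_ac_an(genotypes, sample_mask, num_alt):
    """Recompute AC and AN for one variant chunk after sample subsetting.

    Hypothetical helper for illustration; not part of vcztools.

    genotypes:   int array of shape (variants, samples, ploidy).
                 Assumed encoding: 0 = REF, 1..num_alt = ALT alleles,
                 negative values = missing call or padding.
    sample_mask: boolean array of shape (samples,), True for kept samples.
    num_alt:     maximum number of ALT alleles per variant in the chunk.
    """
    # Boolean indexing copies the selected samples into a new array.
    gt = genotypes[:, sample_mask, :]

    # AN: number of called (non-negative) alleles per variant.
    an = np.sum(gt >= 0, axis=(1, 2))

    # AC: one pass per ALT allele, so the only temporaries are boolean
    # arrays with the same shape as the subsetted chunk.
    ac = np.zeros((gt.shape[0], num_alt), dtype=np.int64)
    for allele in range(1, num_alt + 1):
        ac[:, allele - 1] = np.sum(gt == allele, axis=(1, 2))
    return ac, an


# Two variants, three diploid samples; the second sample is dropped.
genotypes = np.array(
    [
        [[0, 1], [1, 1], [-1, -1]],
        [[0, 2], [1, 2], [0, 0]],
    ],
    dtype=np.int8,
)
sample_mask = np.array([True, False, True])
ac, an = compute_ac_an(genotypes, sample_mask, num_alt=2)
```

Even in this small form, each comparison allocates a temporary boolean array the size of the subsetted chunk, and the sample subsetting itself makes a copy. A C loop could instead accumulate both counts in a single pass over the genotype buffer without those temporaries.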