vcztools view: INFO field computation performance #79

Open
Will-Tyler opened this issue Sep 9, 2024 · 4 comments

Comments

@Will-Tyler
Contributor

Description

When the user specifies a sample selection in vcztools view, vcztools recalculates the AC and AN INFO fields. This is consistent with bcftools' behavior. vcztools calculates these INFO fields from all of the selected samples' genotypes in each variant-wise chunk of genotype data. The current implementation, written in pure Python with NumPy, may be slow and incur significant memory overhead. This issue tracks improving its computational and memory efficiency. The solution may require calculating AC and AN in a C extension module.

The original code was added in #77.
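
For readers unfamiliar with the two fields, here is a minimal sketch of what recomputing AC and AN over a sample subset involves. It is not the code from #77: the function name `compute_ac_an`, its parameters, and the assumption that genotypes are stored as a `(variants, samples, ploidy)` integer array with negative values marking missing or padded calls are all illustrative.

```python
import numpy as np


def compute_ac_an(genotypes, num_alt_alleles):
    """Hypothetical helper: recompute AC and AN for a chunk of genotypes.

    genotypes: integer array of shape (variants, samples, ploidy).
        Assumption: 0 is the reference allele, 1..k are alternate alleles,
        and negative values mark missing or padded calls.
    Returns (ac, an) where ac has shape (variants, num_alt_alleles)
    and an has shape (variants,).
    """
    genotypes = np.asarray(genotypes)
    # AN: number of called (non-missing) alleles per variant.
    an = (genotypes >= 0).sum(axis=(1, 2))
    # AC: number of copies of each alternate allele per variant.
    ac = np.zeros((genotypes.shape[0], num_alt_alleles), dtype=np.int64)
    for allele in range(1, num_alt_alleles + 1):
        # Each comparison allocates a temporary boolean array the size of
        # the chunk, which is one source of memory overhead in this style.
        ac[:, allele - 1] = (genotypes == allele).sum(axis=(1, 2))
    return ac, an


# Two variants, three diploid samples, up to two alternate alleles.
gt = np.array(
    [
        [[0, 1], [1, 1], [-1, -1]],
        [[0, 0], [0, 2], [1, -1]],
    ]
)

# Sample selection: keep the first and third samples only.
subset = gt[:, [0, 2]]
ac, an = compute_ac_an(subset, num_alt_alleles=2)
print(ac.tolist())  # [[1, 0], [1, 0]]
print(an.tolist())  # [2, 3]
```

A sketch like this makes one pass over the chunk per allele and materialises a boolean temporary each time; a single-pass loop in compiled code would avoid both, which is the kind of saving a C extension could offer.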

References

@jeromekelleher
Contributor

We already have a C extension module, so it wouldn't be that hard to update it to include computing AC and AN.

@jeromekelleher
Contributor

See #77 (comment) for details on slowdown

@Will-Tyler changed the title from "INFO field computation performance" to "vcztools view: INFO field computation performance" on Oct 2, 2024
@jeromekelleher added this to the Initial release milestone on Nov 19, 2024
@jeromekelleher
Contributor

Putting this in the initial release milestone for now; we can triage it out later if it's not critical.

@jeromekelleher
Contributor

Removing this from the initial release milestone, as sample subsetting is temporarily dropped in #120.

@jeromekelleher removed this from the Initial release milestone on Jan 14, 2025