Add AC and AN after tskit to zarr conversion #386

Open
duncanMR opened this issue May 13, 2025 · 1 comment
Labels
enhancement New feature or request

Comments

@duncanMR

Currently, allele counts are not computed when converting from tskit:

Image

It would be helpful to have AC and AN available in the zarr for filtering. It seems like sgkit's count_call_alleles function would be helpful here:

from bio2zarr import tskit as ts2zarr
import numpy as np
import tskit
import sgkit

zarr_path = "data/zarr/test.zarr"
ts_path = "data/simulated/test.trees"

arg = tskit.load(ts_path)
# Map each individual to its two sample nodes (assumes every individual is diploid)
individuals_nodes = np.zeros((arg.num_individuals, 2), dtype=int)
for individual in arg.individuals():
    individuals_nodes[individual.id] = individual.nodes

ts2zarr.convert(ts_path=ts_path, zarr_path=zarr_path, individuals_nodes=individuals_nodes)
ds = sgkit.load_dataset(zarr_path, consolidated=False)

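# Sum the per-sample counts over samples: one count per allele (reference included) for each variant
AC = sgkit.count_call_alleles(ds)["call_allele_count"].sum(dim="samples").values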
@jeromekelleher
Contributor

I think we should compute these on all conversions (unless they are already present, or explicitly disabled). We can add some code based on what we're using in vcztools to do this, using the same strategy as for computing local alleles.
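For illustration only, here is a minimal NumPy sketch of what computing AC and AN from a genotype array could look like. It is not the vcztools code mentioned above, and the helper name `compute_ac_an` is hypothetical. It assumes a `(variants, samples, ploidy)` integer array in which negative values mark missing or padded calls, and it reports AC for alternate alleles only, following the usual VCF convention.

```python
import numpy as np


def compute_ac_an(call_genotype, num_alleles):
    """Hypothetical helper: return (AC, AN) for a (variants, samples, ploidy) array.

    Negative entries are treated as missing or padding and are not counted.
    AC has shape (variants, num_alleles - 1): one column per alternate allele.
    AN has shape (variants,): the number of called alleles per variant.
    """
    gt = np.asarray(call_genotype)
    num_variants = gt.shape[0]
    # Flatten samples and ploidy so each row holds all allele calls for a variant
    flat = gt.reshape(num_variants, -1)
    # AN counts every non-missing call, reference included
    an = (flat >= 0).sum(axis=1)
    ac = np.zeros((num_variants, num_alleles - 1), dtype=np.int64)
    for allele in range(1, num_alleles):
        ac[:, allele - 1] = (flat == allele).sum(axis=1)
    return ac, an


# Two variants, three diploid samples; the third sample is missing at variant 0
gt = np.array(
    [
        [[0, 1], [1, 1], [-1, -1]],
        [[0, 0], [0, 2], [1, 0]],
    ]
)
ac, an = compute_ac_an(gt, num_alleles=3)
```

A real implementation would presumably process the genotype array chunk by chunk rather than loading it whole; that part is left out here.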

jeromekelleher added the enhancement label May 13, 2025