GenotypeDisplay outputted to Matrix/DataFrame #1040
Replies: 2 comments 6 replies
-
Hi Zach, Calling To get the genotype data as an integer NumPy array you would call Hope that helps,
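Here is a rough sketch of how such an integer array maps onto the `0/0`-style matrix asked about. It assumes the layout of sgkit's `call_genotype` variable: shape (variants, samples, ploidy), alleles stored as integers, and -1 marking a missing allele. The array values and the `genotype_strings` helper are hypothetical, not part of sgkit's API.

```python
import numpy as np

# Hypothetical stand-in for the integer genotype array (e.g. ds.call_genotype.values):
# shape (variants, samples, ploidy); -1 marks a missing allele call.
call_genotype = np.array(
    [
        [[0, 0], [1, 0], [-1, -1]],
        [[1, 1], [0, 1], [0, 0]],
    ],
    dtype=np.int8,
)


def genotype_strings(gt: np.ndarray) -> np.ndarray:
    """Render a (variants, samples, ploidy) integer array as 'a/b' strings."""
    # Turn each allele into text, writing '.' for missing (negative) alleles.
    alleles = np.where(gt < 0, ".", gt.astype(str))
    # Join the ploidy axis with '/' so each variant/sample pair becomes one string.
    out = alleles[..., 0]
    for k in range(1, gt.shape[-1]):
        out = np.char.add(np.char.add(out, "/"), alleles[..., k])
    return out


matrix = genotype_strings(call_genotype)  # rows = variants, columns = samples
print(matrix)
```

For downstream analysis the integer array itself is usually the better thing to keep; the string matrix is mainly useful for display or export.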
-
Hi Zach, You are absolutely right that it's better to work with integers rather than strings for downstream analysis. sgkit data structures are built on xarray, so you can use the xarray API to manipulate them, including converting final results to pandas or passing them to PyData visualization libraries. The
The mask is needed to take missing calls into account. There are more examples like this in the sgkit docs at https://pystatgen.github.io/sgkit/latest/getting_started.html and https://pystatgen.github.io/sgkit/latest/examples/gwas_tutorial.html. Aggregating (collapsing) across the ploidy dimension seems to be a fairly common operation. We do it inside the sgkit code, but don't expose it as a user function. @timothymillar, any thoughts on whether this would be useful, and if so, what an API might look like?
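To illustrate the masked collapse across the ploidy dimension, here is a minimal NumPy sketch. It assumes the same layout as sgkit's `call_genotype` (variants × samples × ploidy, -1 for missing) and computes the mask as `gt < 0`, which is what `call_genotype_mask` represents. The helper name `alt_allele_count` and the choice to report -1 for any call with a missing allele are illustrative assumptions, not sgkit behaviour.

```python
import numpy as np

# Hypothetical genotype array: (variants, samples, ploidy), -1 = missing allele.
call_genotype = np.array(
    [
        [[0, 0], [1, 0], [-1, -1]],
        [[1, 1], [0, 1], [0, 0]],
    ],
    dtype=np.int8,
)
# Equivalent of a genotype mask: True wherever an allele call is missing.
call_genotype_mask = call_genotype < 0


def alt_allele_count(gt: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Collapse the ploidy axis into a count of non-reference alleles per call."""
    # Count alleles > 0 (any alternate allele); masked alleles contribute 0.
    counts = np.where(mask, 0, gt > 0).sum(axis=-1)
    # Flag calls with any missing allele as -1 so they are not mistaken for 0/0.
    return np.where(mask.any(axis=-1), -1, counts)


dosage = alt_allele_count(call_genotype, call_genotype_mask)
print(dosage)  # rows = variants, columns = samples
```

With an actual sgkit dataset the same arithmetic could be written on the xarray variables (reducing over the `ploidy` dimension), and a 2-D xarray `DataArray` result can be turned into a pandas `DataFrame` with its `to_pandas()` method.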
-
Hi,
Is there a simple way to get the output of sgkit.display_genotype into a DataFrame or NumPy array? I want each variant_id as a row, each sample_id as a column, and each entry to be the genotype (in my case 0/0, 1/0, 1/1, or ./.). The display_genotype method shows me exactly what I want, but I want that output as a matrix. Since the output is a 'GenotypeDisplay' object, typical Python packages aren't letting me work with it appropriately.
Thank you for the help,
Zach