How do I extract the list of gene names in each tissue? #39

Open
orrl16 opened this issue Nov 30, 2022 · 8 comments
Comments

orrl16 commented Nov 30, 2022

I am working with the h5ad.

An additional question: what is the difference between the preprocessing of the raw.X and the X datasets?

Contributor

aopisco commented Nov 30, 2022

The gene names are in adata.var
raw.X is normalized, .X is normalized and scaled

Author

orrl16 commented Nov 30, 2022

Thanks a lot!
Where can I download the 'adata.var' file?

Contributor

aopisco commented Nov 30, 2022

it's part of the file, like you access .X or .raw.X you also have .var

Author

orrl16 commented Dec 1, 2022 via email

Author

orrl16 commented Dec 1, 2022

my email is orr.levy et Yale.edu

Author

orrl16 commented Dec 2, 2022

> it's part of the file, like you access .X or .raw.X you also have .var

I tried looking at .var in these files, but there was no information about the gene list:

https://figshare.com/articles/dataset/Processed_files_to_use_with_scanpy_/8273102/2

Contributor

aopisco commented Dec 2, 2022

@orrl16 the h5ad objects follow the anndata (adata for short) structure: https://anndata.readthedocs.io/en/latest/index.html

Author

orrl16 commented Dec 3, 2022

Thanks again!
I looked at 'var' and found a list of gene names of length 22899.
Dataset 'var'
Size: 22899
MaxSize: 22899
I can easily extract the list of gene names from that structure. However, the relationship between X (the gene expression table) and the index list is still not clear to me. In both .X and .raw.X there are 33538 genes.
How do I match the 22899 indices to the 33538 genes in the gene expression table?
Best regards and thanks again!
