How to check which genes are in the database? #24

Open
linxy29 opened this issue Aug 25, 2022 · 4 comments
Comments

@linxy29

linxy29 commented Aug 25, 2022

Hi,

I'm using pySCENIC to analyze human iPSC data. We are interested in some genes and have the following questions:

  1. We cannot find TBXT in the pySCENIC databases. Instead, we found T, which is another name for TBXT. Can we regard T as TBXT?
  2. We are also interested in HOPX and SLIT2, but we cannot find any information on these two genes either. I found a related issue, "Genes not found in cistarget database" #21. Is there any way we can get gene regulatory information for these two genes, or would we not get meaningful information even if we added them to the database?

The databases we used are 'hg38__refseq-r80__10kb_up_and_down_tss.mc9nr.feather', 'hg38__refseq-r80__500bp_up_and_100bp_down_tss.mc9nr.feather', and 'motifs-v9-nr.hgnc-m0.001-o0.0.tbl'.

Thank you for your help.

Best

@ghuls
Member

ghuls commented Sep 19, 2022

You can use:

# cd create_cisTarget_databases
import feather_v1_or_v2

all_columns_in_ctx_db = feather_v1_or_v2.get_all_column_names_from_feather(feather_file="hg38__refseq-r80__10kb_up_and_down_tss.mc9nr.feather")

all_columns_in_ctx_db

Gene names for hg38 are HGNC symbols as linked to RefSeq r80.
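A minimal sketch for checking specific genes, assuming (as stated above) that the gene columns are named by HGNC symbols. The helpers `find_genes` and `load_column_names` are hypothetical, not part of pySCENIC or create_cisTarget_databases. `load_column_names` uses `pandas.read_feather`, which reads the whole database into memory; when `feather_v1_or_v2` is importable, `all_columns_in_ctx_db` can be passed to `find_genes` instead. T is included because it is the previous HGNC symbol for TBXT, and a database linked to RefSeq r80 may still use the older name.

```python
def find_genes(column_names, genes_of_interest):
    """Split genes_of_interest into (found, missing), preserving input order."""
    available = set(column_names)
    found = [gene for gene in genes_of_interest if gene in available]
    missing = [gene for gene in genes_of_interest if gene not in available]
    return found, missing


def load_column_names(feather_file):
    """Return all column names of a feather file (loads the full table)."""
    # Imported here so find_genes() can be used without pandas installed.
    import pandas as pd

    return list(pd.read_feather(feather_file).columns)
```

For example, `found, missing = find_genes(all_columns_in_ctx_db, ["TBXT", "T", "HOPX", "SLIT2"])`. A symbol listed in `missing` has no column in that database under that exact name.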

@linxy29
Author

linxy29 commented Oct 12, 2022

Hi @ghuls ,

Thank you very much for your help. I'm still having some trouble getting what I want.

  1. I went into the 'create_cisTarget_databases' folder and ran the code you posted, but got the error: NameError: name 'get_all_column_names_from_feather' is not defined.

  2. Then I tried to install create_cisTarget_databases by following the installation guide, but got the error: ld returned 1 exit status. I tried several things to debug it, but I still could not install the create_cisTarget_databases module.

  3. I googled HGNC and RefSeq r80, but I still cannot tell whether TBXT, HOPX, and SLIT2 are in the database.

I checked the website 'https://resources.aertslab.org/cistarget/' and found a tf_lists/allTFs_hg38.txt file.

I'm wondering 1) whether this 'allTFs_hg38.txt' file contains all the genes in the database, or 2) what I should do to make the 'get_all_column_names_from_feather' function work.

Thank you for your help.

@ghuls
Member

ghuls commented Dec 15, 2022

You don't need to compile Cluster-Buster to be able to check the feather databases.
You just need to create a conda environment with the Python dependencies and then, from inside this cloned repo, import feather_v1_or_v2.

In the worst case, you can even load the whole feather database with pandas:

import pandas as pd

df = pd.read_feather("hg38__refseq-r80__10kb_up_and_down_tss.mc9nr.feather")

df.columns
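When an exact symbol is not found, a pattern search over `df.columns` can surface related or renamed symbols. A minimal sketch using Python's standard `re` module; `search_columns` is a hypothetical helper, not part of pySCENIC or create_cisTarget_databases.

```python
import re


def search_columns(column_names, pattern):
    """Return column names matching a regular expression (case-insensitive)."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [name for name in column_names if regex.search(str(name))]
```

For example, `search_columns(df.columns, r"^SLIT")` lists every SLIT-family symbol present, and `search_columns(df.columns, r"^(T|TBXT)$")` checks for both the old and the current TBXT symbol at once.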

@ChenJH-scau

ChenJH-scau commented Jul 29, 2023

Hello, I would like to ask how to obtain the gene IDs from the .feather file in a Linux terminal. I would greatly appreciate any suggestions.
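A minimal sketch of one way to do this, assuming Python with pandas and pyarrow is installed on the Linux machine (pandas needs pyarrow to read Feather files). The script name `print_feather_columns.py` and its functions are hypothetical. It prints every column name of the database, one per line, so standard tools such as `grep` and `wc` can work on the output; the output may also include a non-gene column (for example, one holding motif names).

```python
import sys


def format_names(names):
    """Join names with newlines so the output can be piped to grep or wc -l."""
    return "\n".join(str(name) for name in names)


def print_column_names(feather_file, out=sys.stdout):
    """Print every column name of a feather database, one per line."""
    # Lazy import: pandas (with pyarrow) is only needed when reading the file,
    # and read_feather loads the whole database into memory.
    import pandas as pd

    columns = pd.read_feather(feather_file).columns
    print(format_names(columns), file=out)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python print_feather_columns.py <database>.feather
    print_column_names(sys.argv[1])
```

For example: `python print_feather_columns.py hg38__refseq-r80__10kb_up_and_down_tss.mc9nr.feather | grep -x -e TBXT -e T -e HOPX -e SLIT2` prints only the requested symbols that exist as columns.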
