Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to set regional plot option: region_chromatin_files #121

Open
jacklin9703 opened this issue Dec 9, 2024 · 9 comments
Open

How to set regional plot option: region_chromatin_files #121

jacklin9703 opened this issue Dec 9, 2024 · 9 comments
Labels
bug Something isn't working

Comments

@jacklin9703
Copy link

Hi Dr. He,
Firstly, thanks for your remarkable work on developing "gwaslab" tools! It helps me a lot in my study.
I'm trying regional plot visualization with plot_mqq(mode="r") function, but getting into trouble of one option, region_chromatin_files. From the documentation, this option would add an epigenetic track to show chromatin accessibility with a list of Roadmap 15_coreMarks_mnemonics.bed.gz files. However, when applying this option, I met a "TypeError" exception like this:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-6eb4aaf4e415> in <module>
----> 1 sumstats_AD_common_b37.plot_mqq(mode="r", region=(1, 25989606-250000, 25989606+250000), vcf_path=gl.get_path("1kg_eas_hg19"), anno=True, anno_set=["rs12410410", "rs11247857"], anno_alias={"rs12410410":"chr1:25989606:A:T_rs12410410", "rs11247857":"chr1:25944741:T:C_rs11247857"}, anno_args={"rotation":0}, anno_style="tight", fontsize=12, anno_fontsize=15, region_step=6, region_chromatin_files=["/public/group_share_data/Comp_psy/jrlin/supported_data/Roadmap/E065_15_coreMarks_mnemonics.bed.gz"], save="/public/group_share_data/zhengyan_lab/jrlin/project_data/AD-WGS/results/20241203/plot/RegionPlot.b37.rs12410410.pdf")

~/software/anaconda/anaconda3/lib/python3.9/site-packages/gwaslab/g_Sumstats.py in plot_mqq(self, build, **kwargs)
    608             build = self.meta["gwaslab"]["genome_build"]
    609 
--> 610         plot = mqqplot(self.data,
    611                        snpid=snpid,
    612                        chrom=chrom,

TypeError: mqqplot() got an unexpected keyword argument 'region_chromatin_files'

Note:
The version of "gwaslab" package is 3.5.0. The chromatin bed file was downloaded from https://egg2.wustl.edu/roadmap/data/byFileType/chromhmmSegmentations/ChmmModels/coreMarks/jointModel/final/

It seems that gwaslab is unable to recognize "region_chromatin_files" option. I'd appreciate any advice you can give me.

Thanks,
Jack Lin

@Cloufield
Copy link
Owner

Hi Jack,

Thanks for your feedback. This option is available for .plot_stacked_mqq() but not for .plot_mqq(). I should have made this clearer. Please see the example. The reason is that I plan to add more tracks to the regional plots in the future, such as fine-mapping results or other related annotations using plot_stacked_mqq().

gl.plot_stacked_mqq(objects=[gl1],
                    vcfs=[gl.get_path("1kg_eas_hg19")],
                    region=(19,46214297 - 100000, 46214297 + 100000),
                    build="19",
                    anno="SNPID",
                    fontsize=20,
                    mode="r",
                    
                    region_chromatin_files=["E098_15_coreMarks_mnemonics.bed.gz"],
                    region_chromatin_labels=["Your label"],
                    
                    cbar_fontsize=11,
                    marker_size=(50,100),
                    titles=["Male"],
                    title_args={"size":20},
                    anno_args={"rotation":0,"fontsize":12}, 
                    verbose=False, check=False)

image

@jacklin9703
Copy link
Author

Hi Dr. He,
Thanks for your quick reply! I've followed your example and achieved new progress, while several details have been raised.

  • When applying .plot_stacked_mqq() function, the option titles must be assigned explicitly, even if I don't have any title to show in the plot. That is, one should specify titles=[] to avoid the "TypeError" exception: TypeError: 'NoneType' object is not iterable. Luckily, this does not affect the whole.
  • Unfortunately, I couldn't replicate the chromatin state plot while no errors took place. The output chromatin state track remained blank. Here is my code and output plot:
In [14]: gl.plot_stacked_mqq(objects=[sumstats_AD_common_b37], 
    ...:                                   vcfs=[gl.get_path("1kg_eas_hg19")], 
    ...:                                   region=(1, 25989606-250000, 25989606+250000), 
    ...:                                   build="19", 
    ...:                                   region_ref=["rs12410410"], 
    ...:                                   region_ld_colors=["#E4E4E4","#020080","#86CEF9","#24FF02","#FDA400","#FF0000","#A034F0"], 
    ...:                                   anno=True, 
    ...:                                   anno_set=["rs12410410", "rs11247857"], 
    ...:                                   anno_alias={"rs12410410":"chr1:25989606:A:T\nrs12410410", "rs11247857":"chr1:25944741:T:C\nrs11247857"}, 
    ...:                                   fontsize=20, 
    ...:                                   mode="r", 
    ...:                                   region_chromatin_files=["E065_15_coreMarks_mnemonics.bed.gz"], 
    ...:                                   region_chromatin_labels=["E065.15\nAorta"], 
    ...:                                   cbar_fontsize=11, 
    ...:                                   titles=[""], 
    ...:                                   anno_args={"rotation":0, "fontsize":15}, 
    ...:                                   anno_style="tight", 
    ...:                                   region_anno_bbox_args={"ec":"None","fc":"None"}, 
    ...:                                   region_step=6, 
    ...:                                   save="RegionPlot.b37.rs12410410.v2.pdf")
2024/12/10 14:31:06 Start to create stacked mqq plot by iteratively calling plot_mqq:
2024/12/10 14:31:07 Start to create MQQ plot...v3.5.0:
2024/12/10 14:31:07  -Genomic coordinates version: 19...
2024/12/10 14:31:07  -Genome-wide significance level to plot is set to 5e-08 ...
2024/12/10 14:31:07  -Raw input contains 4711082 variants...
2024/12/10 14:31:07  -MQQ plot layout mode is : r
2024/12/10 14:31:07  -Region to plot : chr1:25739606-26239606.
2024/12/10 14:31:08  -Checking chromosome notations in VCF/BCF files...
2024/12/10 14:31:09  -Checking prefix for chromosomes in VCF/BCF files...
2024/12/10 14:31:09  -No prefix for chromosomes in the VCF/BCF files.
2024/12/10 14:31:13  -Extract SNPs in region : chr1:25739606-26239606...
2024/12/10 14:31:14  -Extract SNPs in specified regions: 534
2024/12/10 14:31:15 Finished loading specified columns from the sumstats.
2024/12/10 14:31:15 Start data conversion and sanity check:
2024/12/10 14:31:15  -Sanity check will be skipped.
2024/12/10 14:31:15  -Sanity check after conversion: 0 variants with P value outside of (0,1] will be removed...
2024/12/10 14:31:15  -Sumstats P values are being converted to -log10(P)...
2024/12/10 14:31:15  -Sanity check: 0 na/inf/-inf variants will be removed...
2024/12/10 14:31:18  -Converting data above cut line...
2024/12/10 14:31:19  -Maximum -log10(P) value is 8.987871717433675 .
2024/12/10 14:31:19 Finished data conversion and sanity check.
2024/12/10 14:31:19 Start to create MQQ plot with 534 variants...
2024/12/10 14:31:19  -tabix will be used: /public/home/Comp_psy/jrlin/software/anaconda/anaconda3/bin/tabix
2024/12/10 14:31:19 Start to load reference genotype...
2024/12/10 14:31:19  -reference vcf path : /public/group_share_data/Comp_psy/jrlin/supported_data/gwaslab_reference_data/EAS.ALL.split_norm_af.1kgp3v5.hg19.vcf.gz
2024/12/10 14:31:21  -Retrieving index...
2024/12/10 14:31:21  -Ref variants in the region: 13814
2024/12/10 14:31:21  -Matching variants using POS, NEA, EA ...
2024/12/10 14:31:24  -Reference variant ID: rs12410410 - 44682
2024/12/10 14:31:24  -Calculating Rsq...
2024/12/10 14:31:24 Finished loading reference genotype successfully!
2024/12/10 14:31:26  -Creating background plot...
2024/12/10 14:31:27  -Reference variant ID: rs12410410 - 304
2024/12/10 14:31:29  -Loading gtf files from:default
INFO:root:Extracted GTF attributes: ['gene_id', 'gene_name', 'gene_biotype']
2024/12/10 14:32:39  -plotting gene track..
2024/12/10 14:32:40  -Finished plotting gene track..
WARNING:matplotlib.font_manager:findfont: Font family 'Arial' not found.
2024/12/10 14:32:41 Finished creating MQQ plot successfully!
2024/12/10 14:32:41 Start to extract variants for annotation...
2024/12/10 14:32:42  -Found 2 specified variants to annotate...
2024/12/10 14:32:42 Finished extracting variants for annotation...
2024/12/10 14:32:42 Start to process figure arts.
2024/12/10 14:32:42  -Processing X labels...
2024/12/10 14:32:42  -Processing Y labels...
2024/12/10 14:32:42  -Processing Y tick lables...
2024/12/10 14:32:42  -Processing Y labels...
2024/12/10 14:32:42  -Processing color bar...
2024/12/10 14:32:42  -Processing lines...
2024/12/10 14:32:42 Finished processing figure arts.
2024/12/10 14:32:42 Start to annotate variants...
2024/12/10 14:32:43  -Annotating using column CHR:POS...
2024/12/10 14:32:43  -Adjusting text positions with repel_force=0.03...
2024/12/10 14:32:43  -Auto-adjusting text positions...
WARNING:matplotlib.font_manager:findfont: Font family 'Arial' not found.
2024/12/10 14:32:43 Finished annotating variants.
2024/12/10 14:32:43 Start to save figure...
2024/12/10 14:32:43  -Skip saving figure!
2024/12/10 14:32:43 Finished saving figure...
2024/12/10 14:32:46  -Loading : /public/group_share_data/Comp_psy/jrlin/supported_data/Roadmap/E065_15_coreMarks_mnemonics.bed.gz
2024/12/10 14:32:58   -Number of records in specified region: 114
2024/12/10 14:32:58 Start to save figure...
WARNING:matplotlib.font_manager:findfont: Font family 'Arial' not found.
2024/12/10 14:33:00  -Saved to /public/group_share_data/zhengyan_lab/jrlin/project_data/AD-WGS/results/20241203/plot/RegionPlot.b37.rs12410410.v2.pdf successfully! (pdf, overwrite)
2024/12/10 14:33:00 Finished saving figure...
2024/12/10 14:33:00 Finished creating stacked mqq plot by iteratively calling plot_mqq.
Out[14]: (<Figure size 3200x2400 with 5 Axes>, <gwaslab.g_Log.Log at 0x7fb0ee9e30a0>)

The chromatin state file is downloaded from the website mentioned above, and the top lines are:

zcat E065_15_coreMarks_mnemonics.bed.gz | head -5
chr10	0	119600	15_Quies
chr10	119600	120200	1_TssA
chr10	120200	122000	5_TxWk
chr10	122000	122600	1_TssA
chr10	122600	140800	15_Quies

The output plot shows below, with a blank chromatin state track.
1733815208159

Could you please help me find the problem? Thank you very much!

Sincerely,
Jack Lin

@Cloufield
Copy link
Owner

Hi Jack,
I tested your code with sumstats but couldn't replicate the problem.
Would you please let me know your python version and matplotlib version? Thanks!

image

@jacklin9703
Copy link
Author

Hi Dr. He,
The Python version is 3.9.21 (running on IPython platform 8.12.0).
The matplotlib version is 3.8.4.

Thanks!

@jacklin9703
Copy link
Author

Hi Dr.He,

Sorry to bother you again.
I'm looking forward to hearing back from you about the "blank chromatin track" problem, while the same problem still occurred on my server until now. If the crux lies in the versions of Python and matplotlib packages, could you please share your package versions that can run successfully? I'll try to use the correct packages to resolve the problem.
Thank you very much!

Sincerely,
Jack Lin

@Cloufield
Copy link
Owner

Hi Jack,

Sorry for my late reply.
I finally located the problem. This only happens when you save the figure as pdf. If you save it as png, it works well.
It might take some time to figure out how to fix it. I will let you know when I fix this. (probably within this week)

@jacklin9703
Copy link
Author

Hi Dr. He,

Your reply raises me! Looking forward to the final good news!
Thanks!

Sincerely,
Jack Lin

@Cloufield
Copy link
Owner

Hi Jack,

Sorry for my late reply.
This has already been fixed in the latest version.

Cheers!

@Cloufield Cloufield added the bug Something isn't working label Jan 2, 2025
@jacklin9703
Copy link
Author

Thank you very much, He!
Sorry for my absence for one week. The final version of gwaslab is elegant, and the tool returns the figures that I need.

Cheers!
Jack Lin

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants