run isoquant on several samples #180

Closed
linalu1121 opened this issue May 3, 2024 · 4 comments
Labels
question (Further information is requested), weird results (Something looks odd in the resulting files)

Comments

@linalu1121

Hi,
When I run the 10 samples separately, the percentage of novel isoforms with CAGE support is about 50% in each sample.
However, I want to generate a single GTF file, so I ran all 10 samples together. The results look strange to me: the percentage of novel isoforms with CAGE support dropped considerably, to 30%.
I have no idea why this happens. Do you have any suggestions?
When I input the 10 BAM files together, IsoQuant processed them as one sample and reported that the sample has 10 BAM files.

Thanks.
Lina

linalu1121 changed the title from "when combine all samples run is-quant" to "run isoquant on several samples" on May 3, 2024
@andrewprzh
Collaborator

Dear @linalu1121

How do you measure CAGE support?

Could you send me some logs from your runs, both individual and joint?

Best
Andrey

andrewprzh added the labels "question" and "weird results" on May 7, 2024
@linalu1121
Author

Hi Andrey,

Thank you for replying.

I use the transcript_models.gtf as input to SQANTI3 to calculate the CAGE support.
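Since the question compares percentages between runs, it helps to compute them the same way each time and to keep the absolute counts next to the percentage. A minimal sketch, assuming a tab-separated SQANTI3 classification table with the columns `structural_category` and `within_CAGE_peak` (values `TRUE`/`FALSE`), and treating `novel_in_catalog` plus `novel_not_in_catalog` as "novel"; these column names, values, and the category choice are assumptions to check against your SQANTI3 version's output header. The function name `cage_support_stats` is hypothetical.

```python
import csv
import io
from typing import TextIO, Tuple

# Assumption: these SQANTI3 structural categories count as "novel" isoforms.
NOVEL_CATEGORIES = {"novel_in_catalog", "novel_not_in_catalog"}


def cage_support_stats(classification: TextIO,
                       category_col: str = "structural_category",
                       cage_col: str = "within_CAGE_peak") -> Tuple[int, int, float]:
    """Return (supported, total, percent) for novel isoforms in a
    tab-separated SQANTI3 classification table with a header row.

    The column names are assumptions; adjust them to match the header
    written by your SQANTI3 version.
    """
    reader = csv.DictReader(classification, delimiter="\t")
    total = 0
    supported = 0
    for row in reader:
        if row[category_col] not in NOVEL_CATEGORIES:
            continue  # skip known (FSM/ISM) and all other categories
        total += 1
        # Assumption: CAGE overlap is stored as the text "TRUE"/"FALSE".
        if row[cage_col].strip().upper() == "TRUE":
            supported += 1
    percent = 100.0 * supported / total if total else 0.0
    return supported, total, percent


# Toy table for illustration only (not data from this thread).
toy = io.StringIO(
    "isoform\tstructural_category\twithin_CAGE_peak\n"
    "t1\tnovel_in_catalog\tTRUE\n"
    "t2\tnovel_not_in_catalog\tFALSE\n"
    "t3\tfull-splice_match\tTRUE\n"
    "t4\tnovel_not_in_catalog\tTRUE\n"
)
supported, total, percent = cage_support_stats(toy)
print(f"{supported}/{total} novel isoforms with CAGE support ({percent:.1f}%)")
# prints "2/3 novel isoforms with CAGE support (66.7%)"
```

Reporting `supported` and `total` alongside the percentage shows whether a lower percentage in the joint run comes from fewer CAGE-supported novel isoforms or from more novel isoforms overall.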

I ran all 10 samples together and obtained the following log:
isoquant.py -d nanopore --bam_list bam_file_list.txt --read_group tag:CB --genedb genes.gtf --complete_genedb --reference genome.fa --output allsamples --prefix allsamples --threads 20 --clean_start
2024-04-19 14:21:20,860 - INFO - Running IsoQuant version 3.3.1
2024-04-19 14:21:33,937 - INFO - Novel unspliced transcripts will not be reported, set --report_novel_unspliced true to discover them
2024-04-19 14:21:33,937 - INFO - === IsoQuant pipeline started ===
2024-04-19 14:21:33,938 - INFO - Converting gene annotation file to .db format (takes a while)...
2024-04-19 14:29:08,993 - INFO - Gene database written to /allsamples/genes.db
2024-04-19 14:29:08,993 - INFO - Provide this database next time to avoid excessive conversion
2024-04-19 14:29:08,994 - INFO - Loading gene database from /allsamples/genes.db
2024-04-19 14:29:08,994 - INFO - Loading reference genome from /all/genome.fa
2024-04-19 14:29:08,996 - INFO - Processing 1 sample
2024-04-19 14:29:08,996 - INFO - Processing sample allsamples
2024-04-19 14:29:08,996 - INFO - Sample has 10 BAM files:
sample1.bam, sample2.bam, sample3.bam, sample4.bam, sample5.bam, sample6.bam, sample7.bam, sample8.bam, sample9.bam, sample10.bam

These 10 BAM files actually represent 10 different samples. However, I aim to generate a single GTF file, so I ran the analysis on all 10 samples together in one run.
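A list file like `bam_file_list.txt` can be generated rather than written by hand. A minimal sketch, assuming the IsoQuant 3.x `--bam_list` format of one BAM path per line, where files listed in one contiguous block (no blank lines) are grouped into a single sample; that grouping is consistent with the "Sample has 10 BAM files" line in the log, but check the IsoQuant documentation for your version. The function name `write_bam_list` is hypothetical.

```python
from pathlib import Path
from typing import Iterable, List


def write_bam_list(bam_paths: Iterable[str],
                   list_path: str = "bam_file_list.txt") -> List[str]:
    """Write one BAM path per line with no blank lines in between.

    Assumption: IsoQuant 3.x --bam_list treats a contiguous block of
    paths as one sample, so all files end up in a single joint run.
    """
    lines = [str(Path(p)) for p in bam_paths]
    Path(list_path).write_text("\n".join(lines) + "\n")
    return lines
```

For the files in this thread, `write_bam_list([f"sample{i}.bam" for i in range(1, 11)])` would list `sample1.bam` through `sample10.bam` in the same order as the log.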

Below is the command I used to process each sample individually:
isoquant.py -d nanopore --bam sample1.bam --read_group tag:CB --genedb genes.gtf --complete_genedb --reference genome.fa --output sample1 --prefix sample1 --threads 20 --clean_start
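To repeat the per-sample run for all ten samples, a small wrapper can build the same command for each sample. A minimal sketch, assuming BAMs named `sample1.bam` through `sample10.bam` in the working directory and `isoquant.py` on `PATH`; the function names `isoquant_command` and `run_all` are hypothetical, and the flags are copied verbatim from the per-sample command in this comment. By default it only prints the commands (dry run).

```python
import shlex
import subprocess
from typing import Iterable, List


def isoquant_command(bam: str, sample: str, genedb: str = "genes.gtf",
                     reference: str = "genome.fa", threads: int = 20) -> List[str]:
    """Build the per-sample IsoQuant command from this thread as an argument list."""
    return [
        "isoquant.py", "-d", "nanopore",
        "--bam", bam,
        "--read_group", "tag:CB",
        "--genedb", genedb, "--complete_genedb",
        "--reference", reference,
        "--output", sample, "--prefix", sample,
        "--threads", str(threads),
        "--clean_start",
    ]


def run_all(samples: Iterable[str], dry_run: bool = True) -> None:
    """Print (and, if dry_run is False, execute) one IsoQuant run per sample."""
    for sample in samples:
        cmd = isoquant_command(f"{sample}.bam", sample)
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop on the first failing run


run_all(f"sample{i}" for i in range(1, 11))
```

Since each run converts `genes.gtf` to a `.db` file, which the log says "takes a while", passing a previously generated `genes.db` through the `genedb` parameter may save time, as the log itself suggests; whether `--complete_genedb` is still needed in that case is worth checking in the IsoQuant documentation.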

@andrewprzh
Collaborator

@linalu1121

Yes, to get a single GTF, it makes sense to provide all BAMs together, so this part is correct. It doesn't really matter that the log reports one sample with 10 BAM files.

Could you send me the entire log files? I'm mostly interested in the statistics on discovered transcripts at the end of the log.

Best
Andrey

@andrewprzh
Collaborator

The new IsoQuant 3.5 should use considerably less RAM, especially when processing multiple samples.

I'll close this issue for now; please reopen it if needed.


2 participants