run isoquant on several samples #180

linalu1121 · 2024-05-03T21:43:38Z

Hi,
When I run the 10 samples separately, about 50% of the novel isoforms in each sample have CAGE support.
However, I want to generate a single GTF file, so I ran the 10 samples together. The results look strange to me: the percentage of novel isoforms with CAGE support dropped considerably, to 30%.
I have no idea why. Do you have any suggestions?
When I input the 10 BAM files together, IsoQuant processed them as 1 sample and reported that the sample has 10 BAM files.

Thanks.
Lina

andrewprzh · 2024-05-07T23:00:07Z

Dear @linalu1121

How do you measure CAGE support?

Could you send me some logs from your runs, both individual and joint?

Best
Andrey

linalu1121 · 2024-05-07T23:23:28Z

Hi Andrey,

Thank you for replying.

I use the transcript_models.gtf as input to SQANTI3 to calculate the CAGE support.
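
For reference, here is a minimal sketch of how such a percentage could be computed from a SQANTI3 classification table. It is not necessarily the exact calculation used here. It assumes the table is tab-separated, that the columns are named `structural_category` and `within_CAGE_peak` (the names may differ between SQANTI3 versions), and that "novel" means the `novel_in_catalog` and `novel_not_in_catalog` categories. The function name and its parameters are hypothetical.

```python
import csv


def cage_support_percentage(
    path,
    novel_categories=("novel_in_catalog", "novel_not_in_catalog"),  # assumed definition of "novel"
    category_col="structural_category",  # assumed column name
    cage_col="within_CAGE_peak",  # assumed column name; may vary by SQANTI3 version
):
    """Return the percentage of novel isoforms flagged as lying within a CAGE peak."""
    novel = 0
    supported = 0
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            # Skip isoforms that are not in one of the "novel" categories.
            if row[category_col] not in novel_categories:
                continue
            novel += 1
            # Accept "True"/"TRUE"/"true"; anything else (False, NA) counts as unsupported.
            if row[cage_col].strip().lower() == "true":
                supported += 1
    # Avoid division by zero when the table has no novel isoforms.
    return 100.0 * supported / novel if novel else 0.0
```

Running the same function on each per-sample table and on the table from the joint run would at least make sure both percentages are computed with the same definition of "novel" and the same denominator.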

I ran 10 samples together, then I obtained the following log:
isoquant.py -d nanopore --bam_list bam_file_list.txt --read_group tag:CB --genedb genes.gtf --complete_genedb --reference genome.fa --output allsamples --prefix allsamples --threads 20 --clean_start
2024-04-19 14:21:20,860 - INFO - Running IsoQuant version 3.3.1
2024-04-19 14:21:33,937 - INFO - Novel unspliced transcripts will not be reported, set --report_novel_unspliced true to discover them
2024-04-19 14:21:33,937 - INFO - === IsoQuant pipeline started ===
2024-04-19 14:21:33,938 - INFO - Converting gene annotation file to .db format (takes a while)...
2024-04-19 14:29:08,993 - INFO - Gene database written to /allsamples/genes.db
2024-04-19 14:29:08,993 - INFO - Provide this database next time to avoid excessive conversion
2024-04-19 14:29:08,994 - INFO - Loading gene database from /allsamples/genes.db
2024-04-19 14:29:08,994 - INFO - Loading reference genome from /all/genome.fa
2024-04-19 14:29:08,996 - INFO - Processing 1 sample
2024-04-19 14:29:08,996 - INFO - Processing sample allsamples
2024-04-19 14:29:08,996 - INFO - Sample has 10 BAM files:
sample1.bam, sample2.bam, sample3.bam, sample4.bam, sample5.bam, sample6.bam, sample7.bam, sample8.bam, sample9.bam, sample10.bam

Actually, these 10 BAM files represent 10 different samples. However, I want to generate a single GTF file, so I ran the analysis on all 10 samples together.

Below is the command I used to process each sample individually:
isoquant.py -d nanopore --bam sample1.bam --read_group tag:CB --genedb genes.gtf --complete_genedb --reference genome.fa --output sample1 --prefix sample1 --threads 20 --clean_start

andrewprzh · 2024-05-07T23:50:17Z

@linalu1121

Yes, to get a single GTF it makes sense to provide all BAMs together, so everything is correct in this part. It doesn't really matter that the log says that a sample has 10 BAM files.

Could you send me the entire log files? I'm more interested in the statistics at the end of the log, with respect to discovered transcripts.

Best
Andrey

andrewprzh · 2024-08-03T11:14:56Z

The new IsoQuant 3.5 should use considerably less RAM, especially when processing multiple samples.

I'll close this issue for now; please re-open it if needed.

linalu1121 changed the title ~~when combine all samples run is-quant~~ run isoquant on several samples May 3, 2024

andrewprzh added the labels "question" (Further information is requested) and "weird results" (Something looks odd in the resulting files) May 7, 2024

andrewprzh closed this as completed Aug 3, 2024
