The contents of the quant.genes.sf file and the quant.sf file are identical. #962

Open
happypiggyzjx opened this issue Sep 23, 2024 · 1 comment

@happypiggyzjx

Is the bug primarily related to salmon (bulk mode) or alevin (single-cell mode)?
salmon (bulk mode)
Describe the bug
The contents of the quant.genes.sf file and the quant.sf file are identical.

quant.genes.sf should contain gene-level quantification, but it contains transcript-level quantification instead. That is, quant.genes.sf is exactly the same as quant.sf, except that the transcript names in the Name column appear in a different order.
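
A minimal sketch of how this symptom can be checked: compare the Name columns of the two files after sorting, so that row order is ignored. The function name `compare_names` is made up for illustration, and the sketch assumes both files are tab-separated with a single header line, as in the salmon output shown in this thread.

```shell
# Compare the Name columns (first field) of two salmon output tables,
# ignoring the header line and the order of the rows.
# Prints "identical" if both files list the same names, otherwise "different".
compare_names() {
    a=$(tail -n +2 "$1" | cut -f 1 | sort)
    b=$(tail -n +2 "$2" | cut -f 1 | sort)
    if [ "$a" = "$b" ]; then
        echo identical
    else
        echo different
    fi
}
```

Calling `compare_names quant.sf quant.genes.sf` inside the output directory and getting `identical` would confirm that the gene-level file lists transcript names only.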

To Reproduce

  • Which version of salmon was used?
    1.10.3

  • How was salmon installed (compiled, downloaded executable, through bioconda)?
    bioconda

  • Which reference (e.g. transcriptome) was used?
    /hot_warm_data/wangduo/reference/Homo_sapiens.GRCh38.109.gtf
    /hot_warm_data/wangduo/reference/Homo_sapiens.GRCh38.dna.primary_assembly.fa
    /hot_warm_data/wangduo/reference/Homo_sapiens.GRCh38.cdna.all.fa

  • My options:

  • generate decoys.txt:
    grep "^>" < /hot_warm_data/wangduo/reference/Homo_sapiens.GRCh38.dna.primary_assembly.fa | cut -d " " -f 1 > /cold_data/zhaojiaxin/ensembl/salmon/decoys.txt
    sed -i.bak -e 's/>//g' /cold_data/zhaojiaxin/ensembl/salmon/decoys.txt
    cat /hot_warm_data/wangduo/reference/Homo_sapiens.GRCh38.cdna.all.fa /hot_warm_data/wangduo/reference/Homo_sapiens.GRCh38.dna.primary_assembly.fa > /cold_data/zhaojiaxin/ensembl/salmon/salmon_gentrome.fa

  • salmon index
    /home/zhaojiaxin/anaconda3/envs/salmon/bin/salmon index
    --transcripts /cold_data/zhaojiaxin/ensembl/salmon/salmon_gentrome.fa
    --kmerLen 31
    --index /cold_data/zhaojiaxin/ensembl/salmon/transcripts_index
    --decoy /cold_data/zhaojiaxin/ensembl/salmon/decoys.txt
    --keepDuplicates
    --threads 50

  • salmon quant
    /home/zhaojiaxin/anaconda3/envs/salmon/bin/salmon quant
    --libType A
    --index /cold_data/zhaojiaxin/ensembl/salmon/transcripts_index
    --unmatedReads /cold_data/zhaojiaxin/long_read/GRM/upstream-analysis/trim_polya/GRM/ONT_GRM_R01/ONT_GRM_R01.full_length.trim_polyA.filter.fasta
    --output /cold_data/zhaojiaxin/ensembl/salmon/transcripts_quant
    --seqBias
    --gcBias
    --posBias
    --geneMap /hot_warm_data/wangduo/reference/Homo_sapiens.GRCh38.109.gtf
    --auxDir aux_info
    --incompatPrior 0
    --threads 50
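
Because `--geneMap` points at the Ensembl GTF while the index was built from the Ensembl cDNA FASTA, one thing that may be worth checking is whether the names in quant.sf occur as `transcript_id` values in that GTF. This is an untested guess, not something established in this report: as far as I know, Ensembl cDNA headers carry versioned IDs, while the Ensembl GTF keeps the version in a separate `transcript_version` attribute. The sketch below only counts exact matches; `count_gtf_matches` is a hypothetical helper name.

```shell
# Count how many transcript names in a quant.sf file also occur, verbatim,
# as a transcript_id value in a GTF file.
# Usage: count_gtf_matches quant.sf annotation.gtf
# Prints "<matched> <total>".
count_gtf_matches() {
    quant=$1
    gtf=$2
    ids=$(mktemp)
    # Extract every transcript_id "..." value from the GTF, skipping header lines.
    grep -v '^#' "$gtf" \
        | sed -n 's/.*transcript_id "\([^"]*\)".*/\1/p' \
        | sort -u > "$ids"
    # Number of quantified transcripts (all rows except the header).
    total=$(tail -n +2 "$quant" | wc -l | tr -d ' ')
    # Whole-line, fixed-string matching; grep -c exits non-zero on 0 matches,
    # so "|| true" keeps the function usable under "set -e".
    matched=$(tail -n +2 "$quant" | cut -f 1 | sort -u | grep -c -x -F -f "$ids" || true)
    rm -f "$ids"
    echo "$matched $total"
}
```

If the first number is zero or far below the second, the names used at the quant step and the names in the GTF disagree, which would be consistent with the symptom described above.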

Expected behavior
quant.genes.sf should contain gene-level quantification results.

Desktop (please complete the following information):

  • OS : Ubuntu Linux
  • Version
    uname -a:
    Linux 1302ubuntu 6.8.0-40-generic #40~22.04.3-Ubuntu SMP PREEMPT_DYNAMIC Tue Jul 30 17:30:19 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

lsb_release -a:
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.4 LTS
Release: 22.04
Codename: jammy

Additional context
I found a similar issue, #569, but it did not solve my problem. As I understand it, one of the two people there fixed the problem by updating salmon, and the other by using tximport. However, my salmon version is already the latest (1.10.3), and tximport is an R package for processing salmon output files, so it may not address my problem. Looking forward to your reply and help!

@freekvh

freekvh commented Feb 26, 2025

I just stumbled across this question and it prompted me to check. FWIW, I use the same salmon version, but my two files differ:

$ head quant.genes.sf
Name    Length  EffectiveLength TPM     NumReads
ENSG00000210194.1       69      2.865   308.915 5
ENSG00000210191.1       71      2.9     0       0
ENSG00000210184.1       59      2.702   0       0
ENSG00000210176.1       69      2.865   0       0
ENSG00000198886.2       1378    1128    2093.56 13339.3
ENSG00000212907.2       297     48.689  1581.82 435.04
ENSG00000210174.1       65      2.798   0       0
ENSG00000210164.1       68      2.848   0       0
ENSG00000198938.2       784     534     8351.98 25192.4

$ head quant.sf
Name    Length  EffectiveLength TPM     NumReads
ENST00000456328.2       1657    1407.000        0.000000        0.000
ENST00000450305.2       632     382.000 0.000000        0.000
ENST00000488147.2       1380    1130.000        0.000000        0.000
ENST00000619216.1       68      2.848   0.000000        0.000
ENST00000473358.1       712     462.000 0.000000        0.000
ENST00000469289.1       535     285.000 0.000000        0.000
ENST00000607096.1       138     4.663   0.000000        0.000
ENST00000417324.1       1187    937.000 0.000000        0.000
ENST00000461467.1       590     340.000 0.000000        0.000

My GTF file looks like this:

$ head gencode.v46.annotation.gtf
##description: evidence-based annotation of the human genome (GRCh38), version 46 (Ensembl 112)
##provider: GENCODE
##contact: [email protected]
##format: gtf
##date: 2024-03-26
chr1    HAVANA  gene    11869   14409   .       +       .       gene_id "ENSG00000290825.1"; gene_type "lncRNA"; gene_name "DDX11L2"; level 2; tag "overlaps_pseudogene";
chr1    HAVANA  transcript      11869   14409   .       +       .       gene_id "ENSG00000290825.1"; transcript_id "ENST00000456328.2"; gene_type "lncRNA"; gene_name "DDX11L2"; transcript_type "lncRNA"; transcript_name "DDX11L2-202"; level 2; transcript_support_level "1"; tag "basic"; tag "Ensembl_canonical"; havana_transcript "OTTHUMT00000362751.1";
chr1    HAVANA  exon    11869   12227   .       +       .       gene_id "ENSG00000290825.1"; transcript_id "ENST00000456328.2"; gene_type "lncRNA"; gene_name "DDX11L2"; transcript_type "lncRNA"; transcript_name "DDX11L2-202"; exon_number 1; exon_id "ENSE00002234944.1"; level 2; transcript_support_level "1"; tag "basic"; tag "Ensembl_canonical"; havana_transcript "OTTHUMT00000362751.1";
chr1    HAVANA  exon    12613   12721   .       +       .       gene_id "ENSG00000290825.1"; transcript_id "ENST00000456328.2"; gene_type "lncRNA"; gene_name "DDX11L2"; transcript_type "lncRNA"; transcript_name "DDX11L2-202"; exon_number 2; exon_id "ENSE00003582793.1"; level 2; transcript_support_level "1"; tag "basic"; tag "Ensembl_canonical"; havana_transcript "OTTHUMT00000362751.1";
chr1    HAVANA  exon    13221   14409   .       +       .       gene_id "ENSG00000290825.1"; transcript_id "ENST00000456328.2"; gene_type "lncRNA"; gene_name "DDX11L2"; transcript_type "lncRNA"; transcript_name "DDX11L2-202"; exon_number 3; exon_id "ENSE00002312635.1"; level 2; transcript_support_level "1"; tag "basic"; tag "Ensembl_canonical"; havana_transcript "OTTHUMT00000362751.1";

At a quick glance, I generate my indexes the same way you did (no GTF file is used when building them, only in the quant step).
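
Since the GTF only comes in at the quant step, another option may be to hand `--geneMap` an explicit mapping instead. As far as I recall from salmon's help text, `--geneMap` accepts either a GTF or a simple tab-delimited file with a transcript name and its gene on each line. The sketch below builds such a file from the `transcript` records of a GTF. It assumes that when a separate `transcript_version` attribute is present and the `transcript_id` has no version suffix, the two should be joined with a dot to match versioned FASTA headers; `gtf_to_tx2gene` is a hypothetical name, and the parser splits attributes on `;`, so it would misread values that themselves contain a semicolon.

```shell
# Print a two-column, tab-separated transcript-to-gene map built from the
# "transcript" records of a GTF file.
# Usage: gtf_to_tx2gene annotation.gtf > tx2gene.tsv
gtf_to_tx2gene() {
    awk -F '\t' '
        $3 == "transcript" {
            tid = ""; tver = ""; gid = ""
            # Column 9 holds attributes of the form: key "value"; key "value"; ...
            n = split($9, attrs, ";")
            for (i = 1; i <= n; i++) {
                a = attrs[i]
                sub(/^ +/, "", a)
                if (a == "") continue
                key = a; sub(/ .*/, "", key)
                val = a; sub(/^[^ ]+ /, "", val); gsub(/"/, "", val)
                if (key == "transcript_id") tid = val
                else if (key == "transcript_version") tver = val
                else if (key == "gene_id") gid = val
            }
            if (tid == "" || gid == "") next
            # Assumption: append the version only when the ID is unversioned.
            if (tver != "" && tid !~ /\./) tid = tid "." tver
            print tid "\t" gid
        }
    ' "$1"
}
```

The resulting file (here called `tx2gene.tsv`, an arbitrary name) could then be passed to `--geneMap` in place of the GTF.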

Maybe this helps.
