The contents of the quant.genes.sf file and the quant.sf file are identical. #962

happypiggyzjx · 2024-09-23T07:36:44Z

Is the bug primarily related to salmon (bulk mode) or alevin (single-cell mode)?
salmon (bulk mode)
Describe the bug
The contents of the quant.genes.sf file and the quant.sf file are identical.

The file quant.genes.sf should contain gene-level quantification, but it contains transcript-level quantification instead. That is, quant.genes.sf is exactly the same as quant.sf; only the order of the transcript names in the Name column differs.
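A quick way to confirm this symptom is to compare the Name columns of the two files. The sketch below assumes both files are the tab-separated tables salmon writes, with a header row that contains a Name column; the function names are illustrative and not part of salmon.

```python
import csv


def read_names(path):
    """Return the set of values in the Name column of a salmon .sf file."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row["Name"] for row in reader}


def compare_names(quant_sf, quant_genes_sf):
    """Count the names in each file and how many appear in both."""
    transcripts = read_names(quant_sf)
    genes = read_names(quant_genes_sf)
    return {
        "transcripts": len(transcripts),
        "genes": len(genes),
        "shared": len(transcripts & genes),
    }
```

If all three counts are equal, every "gene" in quant.genes.sf is actually a transcript name, which matches the behavior described here.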

To Reproduce

Which version of salmon was used?
1.10.3
How was salmon installed (compiled, downloaded executable, through bioconda)?
bioconda
Which reference (e.g. transcriptome) was used?
/hot_warm_data/wangduo/reference/Homo_sapiens.GRCh38.109.gtf
/hot_warm_data/wangduo/reference/Homo_sapiens.GRCh38.dna.primary_assembly.fa
/hot_warm_data/wangduo/reference/Homo_sapiens.GRCh38.cdna.all.fa
My options:
Generate decoys.txt:
grep "^>" < /hot_warm_data/wangduo/reference/Homo_sapiens.GRCh38.dna.primary_assembly.fa | cut -d " " -f 1 > /cold_data/zhaojiaxin/ensembl/salmon/decoys.txt
sed -i.bak -e 's/>//g' /cold_data/zhaojiaxin/ensembl/salmon/decoys.txt
cat /hot_warm_data/wangduo/reference/Homo_sapiens.GRCh38.cdna.all.fa /hot_warm_data/wangduo/reference/Homo_sapiens.GRCh38.dna.primary_assembly.fa > /cold_data/zhaojiaxin/ensembl/salmon/salmon_gentrome.fa

salmon index:
/home/zhaojiaxin/anaconda3/envs/salmon/bin/salmon index \
    --transcripts /cold_data/zhaojiaxin/ensembl/salmon/salmon_gentrome.fa \
    --kmerLen 31 \
    --index /cold_data/zhaojiaxin/ensembl/salmon/transcripts_index \
    --decoy /cold_data/zhaojiaxin/ensembl/salmon/decoys.txt \
    --keepDuplicates \
    --threads 50

salmon quant:
/home/zhaojiaxin/anaconda3/envs/salmon/bin/salmon quant \
    --libType A \
    --index /cold_data/zhaojiaxin/ensembl/salmon/transcripts_index \
    --unmatedReads /cold_data/zhaojiaxin/long_read/GRM/upstream-analysis/trim_polya/GRM/ONT_GRM_R01/ONT_GRM_R01.full_length.trim_polyA.filter.fasta \
    --output /cold_data/zhaojiaxin/ensembl/salmon/transcripts_quant \
    --seqBias \
    --gcBias \
    --posBias \
    --geneMap /hot_warm_data/wangduo/reference/Homo_sapiens.GRCh38.109.gtf \
    --auxDir aux_info \
    --incompatPrior 0 \
    --threads 50
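
The thread does not establish a cause. One thing worth ruling out, offered as an assumption rather than a diagnosis, is an ID mismatch between the FASTA and the GTF passed to --geneMap. Ensembl cDNA FASTA headers typically carry a version suffix (for example ENST00000456328.2), while Ensembl GTF files typically keep the version in a separate transcript_version attribute. If salmon cannot find a transcript in the map, reporting it as its own gene would produce the output described. The sketch below counts how many FASTA names match GTF transcript_id values exactly and how many match only after the version suffix is dropped. The function names are hypothetical, and it should be run on the cDNA FASTA rather than the concatenated gentrome.

```python
import re

# Matches the transcript_id attribute in the ninth column of a GTF line.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def fasta_ids(fasta_path):
    """Collect sequence names: the header text up to the first whitespace."""
    ids = set()
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    ids.add(fields[0])
    return ids


def gtf_transcript_ids(gtf_path):
    """Collect every transcript_id value that appears in the GTF."""
    ids = set()
    with open(gtf_path) as handle:
        for line in handle:
            if line.startswith("#"):
                continue
            match = TRANSCRIPT_ID.search(line)
            if match:
                ids.add(match.group(1))
    return ids


def id_overlap(fasta_path, gtf_path):
    """Compare FASTA names with GTF transcript IDs, with and without versions."""
    fa = fasta_ids(fasta_path)
    gtf = gtf_transcript_ids(gtf_path)
    return {
        "fasta_ids": len(fa),
        "exact_matches": len(fa & gtf),
        "matches_after_dropping_version": sum(
            1 for name in fa if name.rsplit(".", 1)[0] in gtf
        ),
    }
```

If exact_matches is close to zero while matches_after_dropping_version is close to fasta_ids, the two files differ only in how they write the version, and that would be the first thing to fix.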

Expected behavior
I expect gene-level quantification results to appear in quant.genes.sf.

Desktop (please complete the following information):

OS : Ubuntu Linux
Version
uname -a:
Linux 1302ubuntu 6.8.0-40-generic #40~22.04.3-Ubuntu SMP PREEMPT_DYNAMIC Tue Jul 30 17:30:19 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

lsb_release -a:
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.4 LTS
Release: 22.04
Codename: jammy

Additional context
I found a similar issue, #569, but it did not resolve my problem. As I understand it, one of the two posters there fixed it by updating salmon, and the other by using tximport. However, I am already on the latest salmon version, 1.10.3, and tximport is an R package that post-processes salmon output files, so it may not address my problem. I look forward to your reply and help!

freekvh · 2025-02-26T13:15:42Z

I just stumbled across this question and it prompted me to check. FWIW, I use the same salmon version, but my files are different:

$ head quant.genes.sf
Name    Length  EffectiveLength TPM     NumReads
ENSG00000210194.1       69      2.865   308.915 5
ENSG00000210191.1       71      2.9     0       0
ENSG00000210184.1       59      2.702   0       0
ENSG00000210176.1       69      2.865   0       0
ENSG00000198886.2       1378    1128    2093.56 13339.3
ENSG00000212907.2       297     48.689  1581.82 435.04
ENSG00000210174.1       65      2.798   0       0
ENSG00000210164.1       68      2.848   0       0
ENSG00000198938.2       784     534     8351.98 25192.4

$ head quant.sf
Name    Length  EffectiveLength TPM     NumReads
ENST00000456328.2       1657    1407.000        0.000000        0.000
ENST00000450305.2       632     382.000 0.000000        0.000
ENST00000488147.2       1380    1130.000        0.000000        0.000
ENST00000619216.1       68      2.848   0.000000        0.000
ENST00000473358.1       712     462.000 0.000000        0.000
ENST00000469289.1       535     285.000 0.000000        0.000
ENST00000607096.1       138     4.663   0.000000        0.000
ENST00000417324.1       1187    937.000 0.000000        0.000
ENST00000461467.1       590     340.000 0.000000        0.000

My GTF file looks like this:

$ head gencode.v46.annotation.gtf
##description: evidence-based annotation of the human genome (GRCh38), version 46 (Ensembl 112)
##provider: GENCODE
##contact: [email protected]
##format: gtf
##date: 2024-03-26
chr1    HAVANA  gene    11869   14409   .       +       .       gene_id "ENSG00000290825.1"; gene_type "lncRNA"; gene_name "DDX11L2"; level 2; tag "overlaps_pseudogene";
chr1    HAVANA  transcript      11869   14409   .       +       .       gene_id "ENSG00000290825.1"; transcript_id "ENST00000456328.2"; gene_type "lncRNA"; gene_name "DDX11L2"; transcript_type "lncRNA"; transcript_name "DDX11L2-202"; level 2; transcript_support_level "1"; tag "basic"; tag "Ensembl_canonical"; havana_transcript "OTTHUMT00000362751.1";
chr1    HAVANA  exon    11869   12227   .       +       .       gene_id "ENSG00000290825.1"; transcript_id "ENST00000456328.2"; gene_type "lncRNA"; gene_name "DDX11L2"; transcript_type "lncRNA"; transcript_name "DDX11L2-202"; exon_number 1; exon_id "ENSE00002234944.1"; level 2; transcript_support_level "1"; tag "basic"; tag "Ensembl_canonical"; havana_transcript "OTTHUMT00000362751.1";
chr1    HAVANA  exon    12613   12721   .       +       .       gene_id "ENSG00000290825.1"; transcript_id "ENST00000456328.2"; gene_type "lncRNA"; gene_name "DDX11L2"; transcript_type "lncRNA"; transcript_name "DDX11L2-202"; exon_number 2; exon_id "ENSE00003582793.1"; level 2; transcript_support_level "1"; tag "basic"; tag "Ensembl_canonical"; havana_transcript "OTTHUMT00000362751.1";
chr1    HAVANA  exon    13221   14409   .       +       .       gene_id "ENSG00000290825.1"; transcript_id "ENST00000456328.2"; gene_type "lncRNA"; gene_name "DDX11L2"; transcript_type "lncRNA"; transcript_name "DDX11L2-202"; exon_number 3; exon_id "ENSE00002312635.1"; level 2; transcript_support_level "1"; tag "basic"; tag "Ensembl_canonical"; havana_transcript "OTTHUMT00000362751.1";

At a quick glance, I generate my indexes the same way you did (no GTF file is used when building them, only in the quant step).

Maybe it helps.
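
In the GENCODE GTF shown above, the version is part of transcript_id, so those IDs match the names in quant.sf directly. If the Ensembl GTF from the original report keeps the version in a separate transcript_version attribute (an assumption; the thread does not show that file), one possible workaround is to build a plain tab-separated transcript-to-gene file with versioned IDs. The sketch below does that; the function names and the output filename are hypothetical.

```python
import re

# Captures quoted key/value pairs such as: transcript_id "ENST00000456328"
ATTRIBUTE = re.compile(r'(\w+) "([^"]*)"')


def transcript_to_gene(gtf_path):
    """Map each transcript ID (with version, if given separately) to its gene_id."""
    mapping = {}
    with open(gtf_path) as handle:
        for line in handle:
            if line.startswith("#"):
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 9 or fields[2] != "transcript":
                continue
            attrs = dict(ATTRIBUTE.findall(fields[8]))
            transcript = attrs.get("transcript_id")
            gene = attrs.get("gene_id")
            if not transcript or not gene:
                continue
            # Ensembl-style files: append the separately stored version.
            # GENCODE-style files have no such attribute and are left as-is.
            if "transcript_version" in attrs:
                transcript = f"{transcript}.{attrs['transcript_version']}"
            mapping[transcript] = gene
    return mapping


def write_tx2gene(gtf_path, out_path):
    """Write one 'transcript<TAB>gene' line per transcript; return the count."""
    mapping = transcript_to_gene(gtf_path)
    with open(out_path, "w") as out:
        for transcript, gene in sorted(mapping.items()):
            out.write(f"{transcript}\t{gene}\n")
    return len(mapping)
```

The resulting file, say tx2gene.tsv, could then be passed to --geneMap in place of the GTF. As far as I understand the salmon documentation, a file without a .gtf, .gff, or .gff3 extension is read as a simple two-column transcript-to-gene table, but it is worth checking this against the documentation for your version.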
