The contents of the quant.genes.sf file and the quant.sf file are identical. #962
Comments
I just stumbled across this question and it prompted me to check. FWIW, I use the same Salmon version, but my files are different:
$ head quant.genes.sf
Name Length EffectiveLength TPM NumReads
ENSG00000210194.1 69 2.865 308.915 5
ENSG00000210191.1 71 2.9 0 0
ENSG00000210184.1 59 2.702 0 0
ENSG00000210176.1 69 2.865 0 0
ENSG00000198886.2 1378 1128 2093.56 13339.3
ENSG00000212907.2 297 48.689 1581.82 435.04
ENSG00000210174.1 65 2.798 0 0
ENSG00000210164.1 68 2.848 0 0
ENSG00000198938.2 784 534 8351.98 25192.4
$ head quant.sf
Name Length EffectiveLength TPM NumReads
ENST00000456328.2 1657 1407.000 0.000000 0.000
ENST00000450305.2 632 382.000 0.000000 0.000
ENST00000488147.2 1380 1130.000 0.000000 0.000
ENST00000619216.1 68 2.848 0.000000 0.000
ENST00000473358.1 712 462.000 0.000000 0.000
ENST00000469289.1 535 285.000 0.000000 0.000
ENST00000607096.1 138 4.663 0.000000 0.000
ENST00000417324.1 1187 937.000 0.000000 0.000
ENST00000461467.1 590 340.000 0.000000 0.000
My GTF file looks like this:
$ head gencode.v46.annotation.gtf
##description: evidence-based annotation of the human genome (GRCh38), version 46 (Ensembl 112)
##provider: GENCODE
##contact: [email protected]
##format: gtf
##date: 2024-03-26
chr1 HAVANA gene 11869 14409 . + . gene_id "ENSG00000290825.1"; gene_type "lncRNA"; gene_name "DDX11L2"; level 2; tag "overlaps_pseudogene";
chr1 HAVANA transcript 11869 14409 . + . gene_id "ENSG00000290825.1"; transcript_id "ENST00000456328.2"; gene_type "lncRNA"; gene_name "DDX11L2"; transcript_type "lncRNA"; transcript_name "DDX11L2-202"; level 2; transcript_support_level "1"; tag "basic"; tag "Ensembl_canonical"; havana_transcript "OTTHUMT00000362751.1";
chr1 HAVANA exon 11869 12227 . + . gene_id "ENSG00000290825.1"; transcript_id "ENST00000456328.2"; gene_type "lncRNA"; gene_name "DDX11L2"; transcript_type "lncRNA"; transcript_name "DDX11L2-202"; exon_number 1; exon_id "ENSE00002234944.1"; level 2; transcript_support_level "1"; tag "basic"; tag "Ensembl_canonical"; havana_transcript "OTTHUMT00000362751.1";
chr1 HAVANA exon 12613 12721 . + . gene_id "ENSG00000290825.1"; transcript_id "ENST00000456328.2"; gene_type "lncRNA"; gene_name "DDX11L2"; transcript_type "lncRNA"; transcript_name "DDX11L2-202"; exon_number 2; exon_id "ENSE00003582793.1"; level 2; transcript_support_level "1"; tag "basic"; tag "Ensembl_canonical"; havana_transcript "OTTHUMT00000362751.1";
chr1 HAVANA exon 13221 14409 . + . gene_id "ENSG00000290825.1"; transcript_id "ENST00000456328.2"; gene_type "lncRNA"; gene_name "DDX11L2"; transcript_type "lncRNA"; transcript_name "DDX11L2-202"; exon_number 3; exon_id "ENSE00002312635.1"; level 2; transcript_support_level "1"; tag "basic"; tag "Ensembl_canonical"; havana_transcript "OTTHUMT00000362751.1";
At a quick glance, I generate my indexes the same way you did (no GTF file is used when building them, only in the quant step). Maybe this helps.
Is the bug primarily related to salmon (bulk mode) or alevin (single-cell mode)?
salmon (bulk mode)
Describe the bug
The contents of the quant.genes.sf file and the quant.sf file are identical.
The file quant.genes.sf should contain gene-level quantification, but it contains transcript-level quantification instead. That is, quant.genes.sf is exactly the same as quant.sf, except that the transcript names in the Name column appear in a different order.
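A quick way to confirm this symptom is to compare the Name columns of the two files. The sketch below is a minimal check, assuming both files are tab-separated with a header row that includes a Name column, as in the output shown in this thread. The function names are illustrative and not part of salmon. It compares identifiers only, not the numeric columns.

```python
import csv


def read_names(path):
    """Return the set of values in the Name column of a salmon .sf file."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row["Name"] for row in reader}


def same_names(quant_sf, quant_genes_sf):
    """True if both files list exactly the same identifiers, ignoring row order."""
    return read_names(quant_sf) == read_names(quant_genes_sf)
```

Calling `same_names("quant.sf", "quant.genes.sf")` on the two output files returns `True` when the gene-level file still lists transcript identifiers, which matches the behavior described above.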
To Reproduce
Which version of salmon was used?
1.10.3
How was salmon installed (compiled, downloaded executable, through bioconda)?
bioconda
Which reference (e.g. transcriptome) was used?
/hot_warm_data/wangduo/reference/Homo_sapiens.GRCh38.109.gtf
/hot_warm_data/wangduo/reference/Homo_sapiens.GRCh38.dna.primary_assembly.fa
/hot_warm_data/wangduo/reference/Homo_sapiens.GRCh38.cdna.all.fa
My options:
generate decoys.txt:
grep "^>" < /hot_warm_data/wangduo/reference/Homo_sapiens.GRCh38.dna.primary_assembly.fa | cut -d " " -f 1 > \
  /cold_data/zhaojiaxin/ensembl/salmon/decoys.txt
sed -i.bak -e 's/>//g' /cold_data/zhaojiaxin/ensembl/salmon/decoys.txt
cat /hot_warm_data/wangduo/reference/Homo_sapiens.GRCh38.cdna.all.fa \
  /hot_warm_data/wangduo/reference/Homo_sapiens.GRCh38.dna.primary_assembly.fa > \
  /cold_data/zhaojiaxin/ensembl/salmon/salmon_gentrome.fa
salmon index
/home/zhaojiaxin/anaconda3/envs/salmon/bin/salmon index \
  --transcripts /cold_data/zhaojiaxin/ensembl/salmon/salmon_gentrome.fa \
  --kmerLen 31 \
  --index /cold_data/zhaojiaxin/ensembl/salmon/transcripts_index \
  --decoy /cold_data/zhaojiaxin/ensembl/salmon/decoys.txt \
  --keepDuplicates \
  --threads 50
salmon quant
/home/zhaojiaxin/anaconda3/envs/salmon/bin/salmon quant \
  --libType A \
  --index /cold_data/zhaojiaxin/ensembl/salmon/transcripts_index \
  --unmatedReads /cold_data/zhaojiaxin/long_read/GRM/upstream-analysis/trim_polya/GRM/ONT_GRM_R01/ONT_GRM_R01.full_length.trim_polyA.filter.fasta \
  --output /cold_data/zhaojiaxin/ensembl/salmon/transcripts_quant \
  --seqBias \
  --gcBias \
  --posBias \
  --geneMap /hot_warm_data/wangduo/reference/Homo_sapiens.GRCh38.109.gtf \
  --auxDir aux_info \
  --incompatPrior 0 \
  --threads 50
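Gene-level aggregation depends on the transcript names in the index matching the transcript IDs in the file passed to --geneMap, so one thing worth checking is whether the two sets of identifiers overlap. The following is a minimal diagnostic sketch, not a confirmed diagnosis of this issue. It assumes a GTF with `transcript` feature rows carrying `transcript_id "..."` and `gene_id "..."` attributes in column 9, and a tab-separated quant.sf. The function names are hypothetical. It also counts how many names would match if a trailing `.N` version suffix were ignored, on the assumption that a version suffix present in only one of the two files could explain a mismatch.

```python
import csv
import re

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')
GENE_ID = re.compile(r'gene_id "([^"]+)"')


def gtf_tx2gene(gtf_path):
    """Map transcript_id -> gene_id using the 'transcript' rows of a GTF."""
    mapping = {}
    with open(gtf_path) as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip header/comment lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 9 or fields[2] != "transcript":
                continue
            tx = TRANSCRIPT_ID.search(fields[8])
            gene = GENE_ID.search(fields[8])
            if tx and gene:
                mapping[tx.group(1)] = gene.group(1)
    return mapping


def strip_version(identifier):
    """Drop a trailing '.N' version suffix, e.g. 'ENST00000456328.2'."""
    return re.sub(r"\.\d+$", "", identifier)


def id_overlap(quant_sf, gtf_path):
    """Count quant.sf names found in the GTF, exactly and ignoring versions."""
    with open(quant_sf, newline="") as handle:
        names = [row["Name"] for row in csv.DictReader(handle, delimiter="\t")]
    gtf_ids = set(gtf_tx2gene(gtf_path))
    unversioned = {strip_version(i) for i in gtf_ids}
    return {
        "total": len(names),
        "exact": sum(n in gtf_ids for n in names),
        "ignoring_version": sum(strip_version(n) in unversioned for n in names),
    }
```

If `exact` is near zero while `ignoring_version` is close to `total`, the identifiers differ only by their version suffix. In that case, one option to try is writing the transcript-to-gene pairs, with names adjusted to match the index, to a two-column tab-separated file and passing that to --geneMap in place of the GTF.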
Expected behavior
I expect the gene-level quantification results to appear in the file quant.genes.sf.
Desktop (please complete the following information):
uname -a:
Linux 1302ubuntu 6.8.0-40-generic #40~22.04.3-Ubuntu SMP PREEMPT_DYNAMIC Tue Jul 30 17:30:19 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
lsb_release -a:
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.4 LTS
Release: 22.04
Codename: jammy
Additional context
I've already found a similar question, #569, but it did not solve my problem. As I understand it, one of the two people asking there solved the problem by updating salmon, and the other by using tximport. However, my salmon version is already the latest (1.10.3), and as I understand it tximport is an R package for processing salmon output files, so it may not solve my problem. Looking forward to your professional reply and help!
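For context on the tximport route mentioned above: gene-level summaries can also be computed after the fact from quant.sf plus a transcript-to-gene table. The sketch below is a deliberately simplified illustration in Python, not tximport's actual implementation. It only sums TPM and NumReads per gene, ignores Length and EffectiveLength, and keeps transcripts that are missing from the map under their own name. The function name and the `tx2gene` dictionary are assumptions made for illustration.

```python
import csv
from collections import defaultdict


def aggregate_to_genes(quant_sf, tx2gene):
    """Sum TPM and NumReads per gene from a transcript-level quant.sf."""
    totals = defaultdict(lambda: {"TPM": 0.0, "NumReads": 0.0})
    with open(quant_sf, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            # Unmapped transcripts fall back to their own name.
            gene = tx2gene.get(row["Name"], row["Name"])
            totals[gene]["TPM"] += float(row["TPM"])
            totals[gene]["NumReads"] += float(row["NumReads"])
    return dict(totals)
```

Here `tx2gene` is a plain dictionary from transcript name to gene name, with keys spelled exactly as in the Name column of quant.sf.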