I have been trying to use ExOrthist and although I have tried several things, talking to @fedemantica included, I am still unable to make it run.
Although the title (and the error) of this issue is almost the same as the one I opened a month ago, my current problem is different. In that previous issue I was having problems with the testing data because I had not removed the brackets in the command. That problem was solved, and I managed to run the testing data without any problems.
However, now that I am trying to use the whole genome and annotations (not subsetted), I am getting the same error, even though I am writing the command correctly.
executor > local (14)
[d8/0c4443] process > check_input (1) [100%] 1 of 1 ✔
[cf/4db79c] process > generate_annotations (hg38) [100%] 2 of 2 ✔
[2c/2c53dd] process > split_clusters_by_species_p... [100%] 1 of 1 ✔
[bc/450e28] process > split_clusters_in_chunks (h... [100%] 1 of 1 ✔
[59/860fab] process > parse_IPA_prot_aln (hg38-mm... [100%] 1 of 1 ✔
[02/61c530] process > split_EX_pairs_to_realign (1) [100%] 1 of 1 ✔
[83/3e953e] process > realign_EX_pairs (1) [100%] 1 of 1 ✔
[24/c2b623] process > merge_PROT_EX_INT_aln_info ... [100%] 1 of 1 ✔
[2d/a65731] process > score_EX_matches (hg38-mm10) [100%] 1 of 1 ✔
[86/669d1b] process > filter_and_select_best_EX_m... [100%] 1 of 1 ✔
[11/faabcb] process > join_filtered_EX_matches [100%] 1 of 1 ✔
[87/831d4a] process > collapse_overlapping_matches [100%] 1 of 1 ✔
[8d/174a47] process > format_EX_clusters_input [ 0%] 0 of 1
[- ] process > cluster_EXs -
[- ] process > format_EX_clusters_output -
[- ] process > recluster_genes_by_species_... -
[- ] process > recluster_EXs_by_species_pair -
Error executing process > 'format_EX_clusters_input'
Caused by:
Missing output file(s) `PART_*-cluster_input.tab` expected by process `format_EX_clusters_input`
Command executed:
if [ `echo mm10_hg38_v100_fromBroccoli.tab | grep ".gz"` ]; then
zcat mm10_hg38_v100_fromBroccoli.tab > cluster_file
D1_format_EX_clusters_input.pl cluster_file filtered_best_scored_EX_matches_by_targetgene-NoOverlap.tab 500
rm cluster_file
else
D1_format_EX_clusters_input.pl mm10_hg38_v100_fromBroccoli.tab filtered_best_scored_EX_matches_by_targetgene-NoOverlap.tab 500
fi
Command exit status:
0
Command output:
(empty)
Command error:
INFO: Convert SIF file to sandbox...
Number of parts: 0
INFO: Cleaning up image...
Work dir:
/mnt/lustre/scratch/nlsas/home/usc/gr/eer/Tools/ExOrthist/work/8d/174a47b714b384f3d6036eb6b93cd6
Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
Failed to invoke `workflow.onComplete` event handler
-- Check script 'main.nf' at line: 676 or see '.nextflow.log' file for more details
--- Pipeline BIOCORE@CRG ExOrthist ---
Started at 2023-04-13T14:26:58.654647070+02:00
Finished at 2023-04-13T14:30:42.002558998+02:00
Time elapsed: 3m 43s
Execution status: failed
executor > local (14)
[d8/0c4443] process > check_input (1) [100%] 1 of 1 ✔
[cf/4db79c] process > generate_annotations (hg38) [100%] 2 of 2 ✔
[2c/2c53dd] process > split_clusters_by_species_p... [100%] 1 of 1 ✔
[bc/450e28] process > split_clusters_in_chunks (h... [100%] 1 of 1 ✔
[59/860fab] process > parse_IPA_prot_aln (hg38-mm... [100%] 1 of 1 ✔
[02/61c530] process > split_EX_pairs_to_realign (1) [100%] 1 of 1 ✔
[83/3e953e] process > realign_EX_pairs (1) [100%] 1 of 1 ✔
[24/c2b623] process > merge_PROT_EX_INT_aln_info ... [100%] 1 of 1 ✔
[2d/a65731] process > score_EX_matches (hg38-mm10) [100%] 1 of 1 ✔
[86/669d1b] process > filter_and_select_best_EX_m... [100%] 1 of 1 ✔
[11/faabcb] process > join_filtered_EX_matches [100%] 1 of 1 ✔
[87/831d4a] process > collapse_overlapping_matches [100%] 1 of 1 ✔
[8d/174a47] process > format_EX_clusters_input [100%] 1 of 1, failed: 1 ✘
[- ] process > cluster_EXs -
[- ] process > format_EX_clusters_output -
[- ] process > recluster_genes_by_species_... -
[- ] process > recluster_EXs_by_species_pair -
Error executing process > 'format_EX_clusters_input'
Caused by:
Missing output file(s) `PART_*-cluster_input.tab` expected by process `format_EX_clusters_input`
Command executed:
if [ `echo mm10_hg38_v100_fromBroccoli.tab | grep ".gz"` ]; then
zcat mm10_hg38_v100_fromBroccoli.tab > cluster_file
D1_format_EX_clusters_input.pl cluster_file filtered_best_scored_EX_matches_by_targetgene-NoOverlap.tab 500
rm cluster_file
else
D1_format_EX_clusters_input.pl mm10_hg38_v100_fromBroccoli.tab filtered_best_scored_EX_matches_by_targetgene-NoOverlap.tab 500
fi
Command exit status:
0
Command output:
(empty)
Command error:
INFO: Convert SIF file to sandbox...
Number of parts: 0
INFO: Cleaning up image...
Work dir:
/mnt/lustre/scratch/nlsas/home/usc/gr/eer/Tools/ExOrthist/work/8d/174a47b714b384f3d6036eb6b93cd6
Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
module load singularity
singularity/3.6.3 unloaded
go/1.17.8 loaded
singularity/3.9.7 loaded
The following have been reloaded with a version change:
1) singularity/3.6.3 => singularity/3.9.7
You might wonder why I loaded both. At first I was only loading nextflow but, just in case, I tried loading singularity afterwards, and now I am using both.
After talking with @fedemantica, I also tried to load mafft (`module unload cesga/2020 gcccore/system mafft/7.475-with-extensions`) in case that was the problem, but it seems that it is not.
This is how the "params.config" file looks:
Note that I also tried to use NXF_VER=20.04.1, as I did for the testing data (which works): `NXF_VER=20.04.1 nextflow run main.nf -with-singularity > test_log.txt`
but it gives me the same error.
Also note that:
a) I am using the Ensembl release 100 for both (genome and GTF), from here:
GENOME:
With those two files, I loaded them into R and did the following: removed orthogroups containing more than 20 genes, merged my orthogroups with the info file from the FASTA…, kept only the genes that are protein_coding, removed duplicates (there can be several PEP entries for the same gene) and did some formatting to strip the version from the gene ID. From this I generated the file mm10_hg38_v100_fromBroccoli.tab with the following format:
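The filtering steps described above can be sketched in Python for anyone who wants to reproduce them outside R. This is only an illustration, not the R script used in this issue: the input layout (tuples of cluster ID, species, gene ID, biotype), the function names and the example IDs are all made up, and the size filter is applied after de-duplication, which is an assumption about the order. Whether the resulting three-column rows match what ExOrthist expects should be checked against its documentation.

```python
from collections import Counter


def strip_version(gene_id):
    """Drop a trailing version suffix, e.g. 'ENSG00000000001.2' -> 'ENSG00000000001'."""
    return gene_id.split(".", 1)[0]


def build_cluster_rows(records, max_genes=20):
    """Filter (cluster_id, species, gene_id, biotype) tuples (hypothetical layout).

    Keeps protein_coding genes, strips gene ID versions, removes duplicate
    rows, then drops every cluster left with more than `max_genes` genes.
    """
    seen = set()
    rows = []
    for cluster_id, species, gene_id, biotype in records:
        if biotype != "protein_coding":
            continue
        row = (cluster_id, species, strip_version(gene_id))
        if row not in seen:
            seen.add(row)
            rows.append(row)
    # Count genes per cluster only after de-duplication (assumed order).
    sizes = Counter(cluster_id for cluster_id, _, _ in rows)
    return [row for row in rows if sizes[row[0]] <= max_genes]


# Made-up records, for illustration only.
example_records = [
    ("OG_1", "hg38", "ENSG00000000001.2", "protein_coding"),
    ("OG_1", "hg38", "ENSG00000000001.2", "protein_coding"),
    ("OG_1", "mm10", "ENSMUSG00000000002.1", "protein_coding"),
    ("OG_1", "mm10", "ENSMUSG00000000003.4", "lncRNA"),
]
for row in build_cluster_rows(example_records):
    print("\t".join(row))
```

With the example records, the duplicated human gene is collapsed into one row and the lncRNA gene is dropped, leaving two rows for `OG_1`.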
I do not know how to proceed so that I can start working with the tool. Could anybody help me, please?
Sorry if I posted too much information, but I wanted to give you enough information to work out what is going on. Of course, if you need to check any particular file or need more information, let me know. Also, if you prefer to meet through a video call, feel free to contact me and we can set it up.
Thanks very much in advance
Kind Regards,
Eva
I just had the same error and fixed it by creating a separate conda environment with a fresh mafft installation (I also needed to install the hashmap R package from GitHub).
Did you check whether you have empty alignments in the "parse_IPA_prot_aln" step? In my case this resulted in the same error as yours in the "format_EX_clusters_input" step. I had to go back step by step, and all the output files were empty (headers only).
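Going back through the steps by hand is tedious, so here is a minimal Python sketch that lists files containing at most one non-blank line (that is, empty or header-only). It is a generic helper, not part of ExOrthist. The function name and the `*.tab` pattern are assumptions, and gzipped files are not handled.

```python
from pathlib import Path


def header_only_files(root, pattern="*.tab"):
    """Return files under `root` matching `pattern` with at most one non-blank line."""
    root = Path(root)
    if not root.is_dir():
        return []
    flagged = []
    for path in sorted(root.rglob(pattern)):
        if not path.is_file():
            continue  # skips directories and broken symlinks
        non_blank = 0
        with path.open("r", errors="replace") as handle:
            for line in handle:
                if line.strip():
                    non_blank += 1
                    if non_blank > 1:
                        break  # file has data beyond a header
        if non_blank <= 1:
            flagged.append(path)
    return flagged


if __name__ == "__main__":
    # "work" is the Nextflow work directory shown under "Work dir" in the log.
    for path in header_only_files("work"):
        print(path)
```

Any path printed this way is a candidate for the first step where the data went missing. The task hash in the path (for example `8d/174a47...`) identifies the process that produced it.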
Hello,
Find the hidden log attached (.nextflow.log).
hidden.nextflow.log
This is the script that I am using:
Note that I am working on a cluster (slurm environment) and when I load:
Also note that:
a) I am using the Ensembl release 100 for both (genome and GTF), from here:
GENOME:
(both were renamed to hg38_gDNA.fasta.gz & mm10_gDNA.fasta.gz)
Just in case you want to know the format of each one (to compare with the GTF file):
GTF:
(both were renamed to hg38_annot.gtf.gz & mm10_annot.gtf.gz)
Format:
b) The gene orthogroups file was generated with Broccoli in the following way:
Run Broccoli with the proteome FASTA files (Ensembl version 100):
HUMAN: https://ftp.ensembl.org/pub/release-100/fasta/homo_sapiens/pep/Homo_sapiens.GRCh38.pep.all.fa.gz
MOUSE: https://ftp.ensembl.org/pub/release-100/fasta/mus_musculus/pep/Mus_musculus.GRCm38.pep.all.fa.gz
I downloaded the file orthologous_groups.txt from dir3.
I created a txt file with some information extracted from the proteome FASTA files (the PEP, GeneID, Species and Biotype), like this:
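As an illustration of how such a table could be derived, here is a minimal Python sketch that parses one Ensembl peptide FASTA header. It is not the code used in this issue. It assumes the usual Ensembl `pep` header layout of space-separated `key:value` fields (`gene:`, `gene_biotype:`, ...), the function name is hypothetical, and the example header is made up.

```python
def parse_pep_header(header, species):
    """Parse an Ensembl-style peptide FASTA header into (pep_id, gene_id, species, biotype).

    `species` is supplied by the caller (e.g. 'hg38' or 'mm10'); missing
    fields come back as None.
    """
    fields = header.lstrip(">").split()
    pep_id = fields[0]
    tags = {}
    for field in fields[1:]:
        key, sep, value = field.partition(":")
        if key == "description":
            break  # free text follows; stop before it can confuse the parsing
        if sep:
            tags.setdefault(key, value)  # first occurrence of a key wins
    return pep_id, tags.get("gene"), species, tags.get("gene_biotype")


# Made-up header in the assumed layout, for illustration only.
example_header = (
    ">ENSP00000000001.1 pep chromosome:GRCh38:1:100:200:1 "
    "gene:ENSG00000000001.2 transcript:ENST00000000001.3 "
    "gene_biotype:protein_coding transcript_biotype:protein_coding "
    "gene_symbol:EXAMPLE description:made-up record [Source:none]"
)
print("\t".join(parse_pep_header(example_header, "hg38")))
```

The IDs keep their version suffix here (`.1`, `.2`), so stripping the version remains a separate step.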