PE Data on HPI clusters #589

Open
PAguado-Ramsay opened this issue Jan 30, 2025 · 1 comment

Comments

@PAguado-Ramsay

Hi,

I'm trying to run ipyrad on my computer with a reference genome and paired-end GBS data:

------- ipyrad params file (v.0.9.103)------------------------------------------
prueba-lewinskya ## [0] [assembly_name]: Assembly name. Used to name output directories for assembly steps
/media/pablichu/DiscuDuroPablo2022/PRUEBA_IPYRAD_HYBSEQ/Lewinskya ## [1] [project_dir]: Project dir (made in curdir if not present)
## [2] [raw_fastq_path]: Location of raw non-demultiplexed fastq files
/media/pablichu/DiscuDuroPablo2022/PRUEBA_IPYRAD_HYBSEQ/Lewinskya/lewinskya-barcodes.txt ## [3] [barcodes_path]: Location of barcodes file
/media/pablichu/DiscuDuroPablo2022/PRUEBA_IPYRAD_HYBSEQ/Lewinskya/trimmed_seqs/*.fastq.gz ## [4] [sorted_fastq_path]: Location of demultiplexed/sorted fastq files
reference ## [5] [assembly_method]: Assembly method (denovo, reference)
/media/pablichu/DiscuDuroPablo2022/PRUEBA_IPYRAD_HYBSEQ/Lewinskya/ena_PRJEB71549_sequence.fasta ## [6] [reference_sequence]: Location of reference sequence file
pairgbs ## [7] [datatype]: Datatype (see docs): rad, gbs, ddrad, etc.
TGCAG, ## [8] [restriction_overhang]: Restriction overhang (cut1,) or (cut1, cut2)
5 ## [9] [max_low_qual_bases]: Max low quality base calls (Q<20) in a read
33 ## [10] [phred_Qscore_offset]: phred Q score offset (33 is default and very standard)
6 ## [11] [mindepth_statistical]: Min depth for statistical base calling
6 ## [12] [mindepth_majrule]: Min depth for majority-rule base calling
10000 ## [13] [maxdepth]: Max cluster depth within samples
0.85 ## [14] [clust_threshold]: Clustering threshold for de novo assembly
0 ## [15] [max_barcode_mismatch]: Max number of allowable mismatches in barcodes
2 ## [16] [filter_adapters]: Filter for adapters/primers (1 or 2=stricter)
35 ## [17] [filter_min_trim_len]: Min length of reads after adapter trim
1 ## [18] [max_alleles_consens]: Max alleles per site in consensus sequences
0.05 ## [19] [max_Ns_consens]: Max N's (uncalled bases) in consensus
0.05 ## [20] [max_Hs_consens]: Max Hs (heterozygotes) in consensus
4 ## [21] [min_samples_locus]: Min # samples per locus for output
0.2 ## [22] [max_SNPs_locus]: Max # SNPs per locus
8 ## [23] [max_Indels_locus]: Max # of indels per locus
0.5 ## [24] [max_shared_Hs_locus]: Max # heterozygous sites per locus
0, 0, 0, 0 ## [25] [trim_reads]: Trim raw read edges (R1>, <R1, R2>, <R2) (see docs)
0, 0, 0, 0 ## [26] [trim_loci]: Trim locus edges (see docs) (R1>, <R1, R2>, <R2)
* ## [27] [output_formats]: Output formats (see docs)
                                                                                              ## [28] [pop_assign_file]: Path to population assignment file
                                                                                              ## [29] [reference_as_filter]: Reads mapped to this reference are removed in step 3
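
Since the same params file is used on two machines with different directory layouts, it can help to read the values back and see which ones are set. The following is a minimal sketch, assuming each parameter line has the form `value ## [index] [name]: description` as in the file above. The helper names `parse_params` and `empty_params` are hypothetical and are not part of ipyrad.

```python
import re

# Hypothetical helpers, not part of ipyrad. Assumes each parameter line looks like
#   value ## [index] [name]: description
PARAM_LINE = re.compile(
    r"^(?P<value>.*?)\s*##\s*\[(?P<index>\d+)\]\s*\[(?P<name>[^\]]+)\]"
)


def parse_params(text):
    """Return {name: value} for every parameter line found in text."""
    params = {}
    for line in text.splitlines():
        match = PARAM_LINE.match(line)
        if match:
            params[match.group("name")] = match.group("value").strip()
    return params


def empty_params(params):
    """Return the sorted names of parameters whose value is blank."""
    return sorted(name for name, value in params.items() if not value)
```

Passing the parsed `project_dir`, `sorted_fastq_path` and `reference_sequence` values to `glob.glob` or `os.path.exists` on the cluster would show whether any of them still point at a local path.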
    

The most common error I get (it is intermittent: sometimes it happens, sometimes it doesn't) is that the program shuts down in the middle of Step 3. I suspect it is running out of disk space or RAM. Some kind of feedback would be nice.

Because of this error, I'm trying to run the data on a cluster instead (v.0.9.95). But now I'm getting this one:

Step 1: Loading sorted fastq data to Samples
No PE fastq pairs detected based on filenames, assuming SE data.
fastq file name (L.firma_morf2_RDLF021_Tanzania_GBS_R2_.fastq.gz) has a filename that suggests it may be an R2 read, but its paired R1 file could not be found. Paired files should have matching names except for _1 _2, _R1 _R2, or any of these followed by a '_' or '.'.

I can't find the source of the problem:

ls /lustre/proyectos/plantphylo/paguram/lewinskya/trimmed_seqs/*.fastq.gz
/lustre/proyectos/plantphylo/paguram/lewinskya/trimmed_seqs/L.acuminata_PAR052_Goflag_R1_.fastq.gz
/lustre/proyectos/plantphylo/paguram/lewinskya/trimmed_seqs/L.acuminata_PAR052_Goflag_R2_.fastq.gz
/lustre/proyectos/plantphylo/paguram/lewinskya/trimmed_seqs/L.acuminata_RDLF056b_GBS_R1_.fastq.gz
/lustre/proyectos/plantphylo/paguram/lewinskya/trimmed_seqs/L.acuminata_RDLF056b_GBS_R2_.fastq.gz
/lustre/proyectos/plantphylo/paguram/lewinskya/trimmed_seqs/L.acuminata_RDP011_ToLprobe_R1_.fastq.gz
/lustre/proyectos/plantphylo/paguram/lewinskya/trimmed_seqs/L.acuminata_RDP011_ToLprobe_R2_.fastq.gz
/lustre/proyectos/plantphylo/paguram/lewinskya/trimmed_seqs/L.acuminata_WA01_Goflag_R1_.fastq.gz
/lustre/proyectos/plantphylo/paguram/lewinskya/trimmed_seqs/L.acuminata_WA01_Goflag_R2_.fastq.gz
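
One way to double-check the file names independently of ipyrad is to apply the naming rule quoted in the error message by hand. The sketch below is one reading of that rule (a `_1`/`_2` or `_R1`/`_R2` tag followed by `_` or `.`), not ipyrad's actual implementation. The function names are hypothetical.

```python
import re
from collections import defaultdict

# Assumed convention, taken from the wording of the error message:
# a read tag is "_1", "_2", "_R1" or "_R2" followed by "_" or ".".
READ_TAG = re.compile(r"_(R?)([12])(?=[_.])")


def split_read_tag(filename):
    """Return (name_without_tag, read_number), or (filename, None) if untagged."""
    matches = list(READ_TAG.finditer(filename))
    if not matches:
        return filename, None
    last = matches[-1]  # right-most tag, since sample names may contain _1 or _2
    key = filename[:last.start()] + filename[last.end():]
    return key, int(last.group(2))


def find_unpaired(filenames):
    """Return the tag-stripped names that lack either an R1 or an R2 file."""
    reads = defaultdict(set)
    for name in filenames:
        key, read = split_read_tag(name)
        if read is not None:
            reads[key].add(read)
    return sorted(key for key, found in reads.items() if found != {1, 2})
```

Under this reading, the eight files listed above form four complete pairs, so `find_unpaired` returns an empty list for them. Running it over the full directory listing would show whether any sample really is missing its R1 file.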

barcode file:
L.acuminata_PAR052_Goflag CGGT
L.acuminata_RDLF056b_GBS TGCG
L.acuminata_RDP011_ToLprobe GTAT
L.acuminata_WA01_Goflag AACCA
L.affinis_PAR040_Goflag CCACG
L.affinis_PAR041_Goflag TATAA

I found someone with the same error: https://community.france-bioinformatique.fr/t/probleme-ipyrad/5051
Could this one also be a disk space or RAM problem?

Thanks in advance

@isaacovercast
Collaborator

Hello, for the first error if it is intermittent it does sound like a resource issue, either RAM or running out of disk space. If you post the exact error message I can tell you which one it is, but both are common in step 3 so I couldn't guess without seeing the error message.
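
While waiting for the exact error message, a quick look at free disk space and memory just before step 3 can narrow things down. This is a rough sketch using only the Python standard library. The function name `resource_snapshot` is hypothetical, and the memory figures rely on `os.sysconf` keys that are available on Linux but may be missing on other systems.

```python
import os
import shutil


def resource_snapshot(path="."):
    """Report free disk space under path and, where supported, RAM figures in GB."""
    usage = shutil.disk_usage(path)
    snapshot = {"disk_free_gb": usage.free / 1e9}
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        snapshot["ram_total_gb"] = os.sysconf("SC_PHYS_PAGES") * page_size / 1e9
        # Free pages only; memory used for caches is not counted as free here.
        snapshot["ram_free_gb"] = os.sysconf("SC_AVPHYS_PAGES") * page_size / 1e9
    except (AttributeError, ValueError, OSError):
        # os.sysconf or these keys are not available on this platform.
        pass
    return snapshot
```

Calling `resource_snapshot` with the project directory as `path` reports the free space on the drive that holds the assembly, which is the number that matters if step 3 is filling the disk.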

For the second error, please update to the most recent version of ipyrad (0.9.104) as this problem has already been solved in a more recent version than you are running on the HPC. Let me know if this doesn't fix it for you.
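
When checking which installation is older, note that comparing version strings as plain text gives the wrong answer here, because "0.9.95" sorts after "0.9.104" character by character. A small sketch of a numeric comparison follows. The helper names are hypothetical, and it assumes purely numeric, dot-separated version strings.

```python
def version_tuple(version):
    """Turn a dotted version such as '0.9.104' into (0, 9, 104)."""
    return tuple(int(part) for part in version.split("."))


def is_older(installed, required):
    """True if the installed version is numerically lower than the required one."""
    return version_tuple(installed) < version_tuple(required)


# The cluster version from this thread against the suggested release.
print(is_older("0.9.95", "0.9.104"))
```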
