PE Data on HPI clusters #589

PAguado-Ramsay · 2025-01-30T12:01:02Z

Hi,

I'm trying to run ipyrad on my computer with a reference genome and paired-end GBS data:

------- ipyrad params file (v.0.9.103)------------------------------------------
prueba-lewinskya ## [0] [assembly_name]: Assembly name. Used to name output directories for assembly steps
/media/pablichu/DiscuDuroPablo2022/PRUEBA_IPYRAD_HYBSEQ/Lewinskya ## [1] [project_dir]: Project dir (made in curdir if not present)
## [2] [raw_fastq_path]: Location of raw non-demultiplexed fastq files
/media/pablichu/DiscuDuroPablo2022/PRUEBA_IPYRAD_HYBSEQ/Lewinskya/lewinskya-barcodes.txt ## [3] [barcodes_path]: Location of barcodes file
/media/pablichu/DiscuDuroPablo2022/PRUEBA_IPYRAD_HYBSEQ/Lewinskya/trimmed_seqs/*.fastq.gz ## [4] [sorted_fastq_path]: Location of demultiplexed/sorted fastq files
reference ## [5] [assembly_method]: Assembly method (denovo, reference)
/media/pablichu/DiscuDuroPablo2022/PRUEBA_IPYRAD_HYBSEQ/Lewinskya/ena_PRJEB71549_sequence.fasta ## [6] [reference_sequence]: Location of reference sequence file
pairgbs ## [7] [datatype]: Datatype (see docs): rad, gbs, ddrad, etc.
TGCAG, ## [8] [restriction_overhang]: Restriction overhang (cut1,) or (cut1, cut2)
5 ## [9] [max_low_qual_bases]: Max low quality base calls (Q<20) in a read
33 ## [10] [phred_Qscore_offset]: phred Q score offset (33 is default and very standard)
6 ## [11] [mindepth_statistical]: Min depth for statistical base calling
6 ## [12] [mindepth_majrule]: Min depth for majority-rule base calling
10000 ## [13] [maxdepth]: Max cluster depth within samples
0.85 ## [14] [clust_threshold]: Clustering threshold for de novo assembly
0 ## [15] [max_barcode_mismatch]: Max number of allowable mismatches in barcodes
2 ## [16] [filter_adapters]: Filter for adapters/primers (1 or 2=stricter)
35 ## [17] [filter_min_trim_len]: Min length of reads after adapter trim
1 ## [18] [max_alleles_consens]: Max alleles per site in consensus sequences
0.05 ## [19] [max_Ns_consens]: Max N's (uncalled bases) in consensus
0.05 ## [20] [max_Hs_consens]: Max Hs (heterozygotes) in consensus
4 ## [21] [min_samples_locus]: Min # samples per locus for output
0.2 ## [22] [max_SNPs_locus]: Max # SNPs per locus
8 ## [23] [max_Indels_locus]: Max # of indels per locus
0.5 ## [24] [max_shared_Hs_locus]: Max # heterozygous sites per locus
0, 0, 0, 0 ## [25] [trim_reads]: Trim raw read edges (R1>, <R1, R2>, <R2) (see docs)
0, 0, 0, 0 ## [26] [trim_loci]: Trim locus edges (see docs) (R1>, <R1, R2>, <R2)

                                                                                          ## [27] [output_formats]: Output formats (see docs)
                                                                                          ## [28] [pop_assign_file]: Path to population assignment file
                                                                                          ## [29] [reference_as_filter]: Reads mapped to this reference are removed in step 3

The most common error I get (it happens intermittently, not on every run) is that the program shuts down in the middle of step 3. I think it might be running out of disk space or RAM? Some kind of feedback from the program would be nice.

Because of this error, I'm trying to run the data on a cluster instead (v.0.9.95). But now I'm getting this one:

Step 1: Loading sorted fastq data to Samples
No PE fastq pairs detected based on filenames, assuming SE data.
fastq file name (L.firma_morf2_RDLF021_Tanzania_GBS_R2_.fastq.gz) has a filename that suggests it may be an R2 read, but its paired R1 file could not be found. Paired files should have matching names except for _1 _2, _R1 R2, or any of these followed by a '' or '.'.

I can't find the source of the problem:

ls /lustre/proyectos/plantphylo/paguram/lewinskya/trimmed_seqs/*.fastq.gz
/lustre/proyectos/plantphylo/paguram/lewinskya/trimmed_seqs/L.acuminata_PAR052_Goflag_R1_.fastq.gz
/lustre/proyectos/plantphylo/paguram/lewinskya/trimmed_seqs/L.acuminata_PAR052_Goflag_R2_.fastq.gz
/lustre/proyectos/plantphylo/paguram/lewinskya/trimmed_seqs/L.acuminata_RDLF056b_GBS_R1_.fastq.gz
/lustre/proyectos/plantphylo/paguram/lewinskya/trimmed_seqs/L.acuminata_RDLF056b_GBS_R2_.fastq.gz
/lustre/proyectos/plantphylo/paguram/lewinskya/trimmed_seqs/L.acuminata_RDP011_ToLprobe_R1_.fastq.gz
/lustre/proyectos/plantphylo/paguram/lewinskya/trimmed_seqs/L.acuminata_RDP011_ToLprobe_R2_.fastq.gz
/lustre/proyectos/plantphylo/paguram/lewinskya/trimmed_seqs/L.acuminata_WA01_Goflag_R1_.fastq.gz
/lustre/proyectos/plantphylo/paguram/lewinskya/trimmed_seqs/L.acuminata_WA01_Goflag_R2_.fastq.gz
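
One way to rule out a genuinely missing mate file before suspecting the software is to group the file names by sample and check that each has both an R1 and an R2. The following is a minimal sketch, not ipyrad's actual matching code: the regular expression `READ_TAG` and the helpers `split_read_tag` and `find_unpaired` are hypothetical names, and the pattern is only a loose reading of the rule quoted in the error message (a `_1`/`_2` or `_R1`/`_R2` tag followed by `_` or `.`).

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed pattern (illustration only, NOT ipyrad's code): "_R1", "_R2",
# "_1" or "_2", immediately followed by "_" or ".".
READ_TAG = re.compile(r"_(R?)([12])(?=[_.])")


def split_read_tag(filename):
    """Return (sample_key, read_number), or None if no read tag is found."""
    matches = list(READ_TAG.finditer(filename))
    if not matches:
        return None
    last = matches[-1]  # use the right-most tag, closest to the extension
    # The key is the file name with the read tag cut out, so both mates
    # of a pair map to the same key.
    key = filename[:last.start()] + filename[last.end():]
    return key, int(last.group(2))


def find_unpaired(filenames):
    """Return the sorted sample keys that lack either an R1 or an R2 file."""
    groups = defaultdict(set)
    for name in filenames:
        parsed = split_read_tag(Path(name).name)
        if parsed is not None:
            key, read = parsed
            groups[key].add(read)
    return sorted(key for key, reads in groups.items() if reads != {1, 2})
```

Calling `find_unpaired(glob.glob("/lustre/proyectos/plantphylo/paguram/lewinskya/trimmed_seqs/*.fastq.gz"))` (with `import glob`) and getting an empty list back would suggest that every file has its mate on disk, which points away from the data and towards the file-name detection itself.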

Barcode file:
L.acuminata_PAR052_Goflag CGGT
L.acuminata_RDLF056b_GBS TGCG
L.acuminata_RDP011_ToLprobe GTAT
L.acuminata_WA01_Goflag AACCA
L.affinis_PAR040_Goflag CCACG
L.affinis_PAR041_Goflag TATAA

I found someone with the same error: https://community.france-bioinformatique.fr/t/probleme-ipyrad/5051
I think this one might also be a disk space or RAM problem?

Thanks in advance


isaacovercast · 2025-01-30T14:14:35Z

Hello. If the first error is intermittent, it does sound like a resource issue: either running out of RAM or running out of disk space. If you post the exact error message I can tell you which one it is. Both are common in step 3, so I can't guess without seeing the message.
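
While waiting for the exact error message, a quick look at the free disk space and memory on the machine can help narrow things down. This is a rough sketch using only the Python standard library; it is not part of ipyrad, the function name `resource_snapshot` is made up for illustration, and the RAM figures rely on `os.sysconf` names that are assumed to be available (they normally are on Linux, and the function returns `None` for RAM where they are not).

```python
import os
import shutil

GIB = 1024 ** 3


def resource_snapshot(project_dir="."):
    """Report disk and RAM figures in GiB for the given directory's filesystem."""
    disk = shutil.disk_usage(project_dir)
    snapshot = {
        "disk_total_gib": disk.total / GIB,
        "disk_free_gib": disk.free / GIB,
        "ram_total_gib": None,
        "ram_available_gib": None,
    }
    # These sysconf names exist on Linux; skip the RAM figures elsewhere.
    needed = ("SC_PAGE_SIZE", "SC_PHYS_PAGES", "SC_AVPHYS_PAGES")
    if hasattr(os, "sysconf") and all(n in os.sysconf_names for n in needed):
        page = os.sysconf("SC_PAGE_SIZE")
        snapshot["ram_total_gib"] = page * os.sysconf("SC_PHYS_PAGES") / GIB
        snapshot["ram_available_gib"] = page * os.sysconf("SC_AVPHYS_PAGES") / GIB
    return snapshot


if __name__ == "__main__":
    for name, value in resource_snapshot(".").items():
        print(name, "n/a" if value is None else round(value, 1))
```

Pass the project directory from the params file as `project_dir` so that the disk figures refer to the drive where step 3 writes its files. A snapshot taken before the run only shows the starting point; checking again while step 3 is running gives a better idea of which resource is running out.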

For the second error, please update to the most recent version of ipyrad (0.9.104), as this problem has already been solved in a more recent version than the one you are running on the HPC. Let me know if this doesn't fix it for you.
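
To confirm which version is actually active in the cluster environment, something like the sketch below can be run in the same Python environment as ipyrad. It assumes ipyrad was installed with package metadata that `importlib.metadata` can read, and it uses 0.9.104 as the target only because that is the version recommended above; the exact release in which the fix landed is not stated here. The helper names are illustrative.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn '0.9.104' into (0, 9, 104) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def needs_update(installed, minimum="0.9.104"):
    """True if the installed version is older than the recommended one."""
    return parse_version(installed) < parse_version(minimum)


def installed_ipyrad_version():
    """Return the installed ipyrad version string, or None if not found."""
    try:
        return version("ipyrad")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_ipyrad_version()
    if found is None:
        print("ipyrad metadata not found in this environment")
    else:
        print(found, "-> update recommended" if needs_update(found) else "-> ok")
```

The numeric comparison matters because comparing the strings directly would rank "0.9.95" above "0.9.104".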
