
What code is run automatically following the step performed in the terminal for seq2print #3

mason-sweat1 opened this issue Mar 15, 2025 · 1 comment


@mason-sweat1

Hello,

Great software! I am excited to try it on my data; thank you very much for your hard work developing it.

I have made it to the following step:
"
import torch
from torch.cuda.amp import GradScaler

Patch torch.amp to include GradScaler

torch.amp.GradScaler = GradScaler

for path in ['temp','model']:
if not os.path.exists(os.path.join(work_dir, path)):
os.makedirs(os.path.join(work_dir, path))

for sample in samples:
scp.tl.launch_seq2print(model_config_path=f'{work_dir}/configs/PBMC_bulkATAC_{sample}_fold{fold}.JSON',
temp_dir=f'{work_dir}/temp',
model_dir=f'{work_dir}/model',
data_dir=work_dir,
gpus=samples.index(sample),
wandb_project='scPrinter_seq_PBMC_bulkATAC', # wandb helps you manage loggins
verbose=True,
launch=False # launch=True, this command would launch the scripts directly,
# otherwise, it will just display the commands, you should copy them and run them.
)
"

Since I am using a SLURM-managed system, I ran the code using sbatch for two samples, aCM1final_tsv and aCM2final_tsv.

The job eventually completed (exit code 0), but I cannot complete the next step in Jupyter; it fails with the following error:

"
FileNotFoundError Traceback (most recent call last)
Cell In[14], line 4
2 adata_tfbs = {}
3 for sample_ind, sample in enumerate(samples):
----> 4 adata_tfbs[sample] = scp.tl.seq_tfbs_seq2print(seq_attr_count=None,
5 seq_attr_footprint=None,
6 genome=printer.genome,
7 region_path=f'{work_dir}/regions_test.bed',
8 gpus=[1], # change it to the available gpus
9 model_type='seq2print',
10 model_path=model_path_dict[sample], # For now we just run on one fold but you can provide a list of paths to all 5 folds
11 lora_config=json.load(open(f'{work_dir}/configs/PBMC_bulkATAC_{sample}_fold0.JSON', 'r')),
12 group_names=[sample],
13 verbose=False,
14 launch=True,
15 return_adata=True, # turn this as True
16 overwrite_seqattr=True,
17 post_normalize=False,
18 save_key=f'PBMC_bulkATAC_{sample}', # and input a save_key
19 save_path=work_dir)

File /lab-share/Cardio-Pu-e2/Public/ch220446_Mason2/CHD4/scPrinter/scprinter/tools.py:2430, in seq_tfbs_seq2print(seq_attr_count, seq_attr_footprint, genome, region_path, gpus, model_type, model_path, lora_config, group_names, save_group_names, save_path, overwrite_seqattr, post_normalize, verbose, launch, return_adata, save_key)
2428 regions = regionparser(region_path, printer=None, width=800)
2429 region_identifiers = df2regionidentifier(regions)
-> 2430 results = np.load(f"{save_key}_TFBS.npz")["tfbs"]
2432 print("obs=groups, var=regions")
2433 lora_ids_str = save_group_names

File /lab-share/Cardio-Pu-e2/Public/ch220446_Mason2/conda/lib/python3.9/site-packages/numpy/lib/npyio.py:427, in load(file, mmap_mode, allow_pickle, fix_imports, encoding, max_header_size)
425 own_fid = False
426 else:
--> 427 fid = stack.enter_context(open(os_fspath(file), "rb"))
428 own_fid = True
430 # Code to distinguish from NumPy binary files and pickles.

FileNotFoundError: [Errno 2] No such file or directory: '/lab-share/Cardio-Pu-e2/Public/ch220446_Mason2/CHD4/scPrinter/seq2print/PBMC_bulkATAC_aCM1final_tsv_TFBS.npz'
"

When I look at the job's log, this is the final output that I see:

"
wandb: 🚀 View run cheese-brulee-16 at: https://wandb.ai/masonsweat-boston-children-s-hospital/scPrinter_seq_PBMC_bulkATAC/runs/mzkvu5fx
wandb: ⭐️ View project at: https://wandb.ai/masonsweat-boston-children-s-hospital/scPrinter_seq_PBMC_bulkATAC
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20250313_222634-mzkvu5fx/logs

50%|█████ | 15109/30000 [1:50:15<51:26, 4.82it/s]
Using preset, the following parameters would be overwritten
using wrapper: count
using nth_output: 0
using decay: 0.85
Launching the following command now (no action needed from your side)
seq2print_attr --pt /lab-share/Cardio-Pu-e2/Public/ch220446_Mason2/CHD4/scPrinter//seq2print/model/PBMC_bulkATAC_aCM1final_tsv_fold0-cheese-brulee-16.pt --peaks /lab-share/Cardio-Pu-e2/Public/ch220446_Mason2/CHD4/scPrinter//seq2print/seq2print_cleaned_narrowPeak.bed --method shap_hypo --wrapper count --nth_output 0 --gpus 0 --genome mm10 --decay 0.85 --save_key deepshap --overwrite --model_norm count --sample 30000 --save_norm
Using preset, the following parameters would be overwritten
using wrapper: just_sum
using nth_output: 0-30
using decay: 0.85
Launching the following command now (no action needed from your side)
seq2print_attr --pt /lab-share/Cardio-Pu-e2/Public/ch220446_Mason2/CHD4/scPrinter//seq2print/model/PBMC_bulkATAC_aCM1final_tsv_fold0-cheese-brulee-16.pt --peaks /lab-share/Cardio-Pu-e2/Public/ch220446_Mason2/CHD4/scPrinter//seq2print/seq2print_cleaned_narrowPeak.bed --method shap_hypo --wrapper just_sum --nth_output 0-30 --gpus 0 --genome mm10 --decay 0.85 --save_key deepshap --overwrite --model_norm footprint --sample 30000 --save_norm
count head normalization factor -0.003202724503353238 5.8794023061636835e-05 0.004055202309973542
footprint head normalization factor -13.501025199890137 -0.43559572100639343 18.708645439147986

50%|█████ | 15110/30000 [1:50:15<51:25, 4.83it/s]
...
50%|█████ | 15120/30000 [1:50:17<51:12, 4.84it/s]
"

and that's it.

I believe the modeling finished completely, but I'm not sure how to tell.

Could you please help me determine whether model training was successful, and which commands I can run in the terminal to complete the file creation, so that I do not need to repeat the entire modeling process?
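In case it helps, this is the rough check I ran myself (a sketch under the assumption that each checkpoint is named as in the log above, e.g. `PBMC_bulkATAC_<sample>_fold0-<run-name>.pt`):

```python
import os

def missing_checkpoints(model_dir, samples, fold=0):
    """Return the samples that have no matching .pt checkpoint in
    model_dir, assuming filenames contain '_<sample>_fold<fold>'."""
    files = os.listdir(model_dir)
    return [s for s in samples
            if not any(f"_{s}_fold{fold}" in f and f.endswith(".pt")
                       for f in files)]
```

I'm not certain this naming assumption covers every file training should produce, which is part of what I'm asking about.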

Thank you very much,

Mason

@mason-sweat1 (Author)

Also, if you could please post a list of the files that should appear in the model folder, so I know I have them all at the end, I would really appreciate it!
