
What code is run automatically following the step performed in the terminal for seq2print #3

mason-sweat1 opened this issue Mar 15, 2025 · 1 comment


@mason-sweat1

Hello,

Great software! I am excited to try it on my data; thank you very much for your hard work developing it.

I have made it to the following step:
"
import torch
from torch.cuda.amp import GradScaler

Patch torch.amp to include GradScaler

torch.amp.GradScaler = GradScaler

for path in ['temp','model']:
if not os.path.exists(os.path.join(work_dir, path)):
os.makedirs(os.path.join(work_dir, path))

for sample in samples:
scp.tl.launch_seq2print(model_config_path=f'{work_dir}/configs/PBMC_bulkATAC_{sample}_fold{fold}.JSON',
temp_dir=f'{work_dir}/temp',
model_dir=f'{work_dir}/model',
data_dir=work_dir,
gpus=samples.index(sample),
wandb_project='scPrinter_seq_PBMC_bulkATAC', # wandb helps you manage loggins
verbose=True,
launch=False # launch=True, this command would launch the scripts directly,
# otherwise, it will just display the commands, you should copy them and run them.
)
"

Since I am using a SLURM-managed system, I ran the code using sbatch for two samples, aCM1final_tsv and aCM2final_tsv.

The job eventually completed (exit code 0), but I cannot complete the next step in Jupyter; it fails with the following error:

"
FileNotFoundError Traceback (most recent call last)
Cell In[14], line 4
2 adata_tfbs = {}
3 for sample_ind, sample in enumerate(samples):
----> 4 adata_tfbs[sample] = scp.tl.seq_tfbs_seq2print(seq_attr_count=None,
5 seq_attr_footprint=None,
6 genome=printer.genome,
7 region_path=f'{work_dir}/regions_test.bed',
8 gpus=[1], # change it to the available gpus
9 model_type='seq2print',
10 model_path=model_path_dict[sample], # For now we just run on one fold but you can provide a list of paths to all 5 folds
11 lora_config=json.load(open(f'{work_dir}/configs/PBMC_bulkATAC_{sample}_fold0.JSON', 'r')),
12 group_names=[sample],
13 verbose=False,
14 launch=True,
15 return_adata=True, # turn this as True
16 overwrite_seqattr=True,
17 post_normalize=False,
18 save_key=f'PBMC_bulkATAC_{sample}', # and input a save_key
19 save_path=work_dir)

File /lab-share/Cardio-Pu-e2/Public/ch220446_Mason2/CHD4/scPrinter/scprinter/tools.py:2430, in seq_tfbs_seq2print(seq_attr_count, seq_attr_footprint, genome, region_path, gpus, model_type, model_path, lora_config, group_names, save_group_names, save_path, overwrite_seqattr, post_normalize, verbose, launch, return_adata, save_key)
2428 regions = regionparser(region_path, printer=None, width=800)
2429 region_identifiers = df2regionidentifier(regions)
-> 2430 results = np.load(f"{save_key}_TFBS.npz")["tfbs"]
2432 print("obs=groups, var=regions")
2433 lora_ids_str = save_group_names

File /lab-share/Cardio-Pu-e2/Public/ch220446_Mason2/conda/lib/python3.9/site-packages/numpy/lib/npyio.py:427, in load(file, mmap_mode, allow_pickle, fix_imports, encoding, max_header_size)
425 own_fid = False
426 else:
--> 427 fid = stack.enter_context(open(os_fspath(file), "rb"))
428 own_fid = True
430 # Code to distinguish from NumPy binary files and pickles.

FileNotFoundError: [Errno 2] No such file or directory: '/lab-share/Cardio-Pu-e2/Public/ch220446_Mason2/CHD4/scPrinter/seq2print/PBMC_bulkATAC_aCM1final_tsv_TFBS.npz'
"

When I look at the job's log, this is the final output that I see:

"
wandb: 🚀 View run cheese-brulee-16 at: https://wandb.ai/masonsweat-boston-children-s-hospital/scPrinter_seq_PBMC_bulkATAC/runs/mzkvu5fx
wandb: ⭐️ View project at: https://wandb.ai/masonsweat-boston-children-s-hospital/scPrinter_seq_PBMC_bulkATAC
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20250313_222634-mzkvu5fx/logs

50%|█████ | 15109/30000 [1:50:15<51:26, 4.82it/s]
Using preset, the following parameters would be overwritten
using wrapper: count
using nth_output: 0
using decay: 0.85
Launching the following command now (no action needed from your side)
seq2print_attr --pt /lab-share/Cardio-Pu-e2/Public/ch220446_Mason2/CHD4/scPrinter//seq2print/model/PBMC_bulkATAC_aCM1final_tsv_fold0-cheese-brulee-16.pt --peaks /lab-share/Cardio-Pu-e2/Public/ch220446_Mason2/CHD4/scPrinter//seq2print/seq2print_cleaned_narrowPeak.bed --method shap_hypo --wrapper count --nth_output 0 --gpus 0 --genome mm10 --decay 0.85 --save_key deepshap --overwrite --model_norm count --sample 30000 --save_norm
Using preset, the following parameters would be overwritten
using wrapper: just_sum
using nth_output: 0-30
using decay: 0.85
Launching the following command now (no action needed from your side)
seq2print_attr --pt /lab-share/Cardio-Pu-e2/Public/ch220446_Mason2/CHD4/scPrinter//seq2print/model/PBMC_bulkATAC_aCM1final_tsv_fold0-cheese-brulee-16.pt --peaks /lab-share/Cardio-Pu-e2/Public/ch220446_Mason2/CHD4/scPrinter//seq2print/seq2print_cleaned_narrowPeak.bed --method shap_hypo --wrapper just_sum --nth_output 0-30 --gpus 0 --genome mm10 --decay 0.85 --save_key deepshap --overwrite --model_norm footprint --sample 30000 --save_norm
count head normalization factor -0.003202724503353238 5.8794023061636835e-05 0.004055202309973542
footprint head normalization factor -13.501025199890137 -0.43559572100639343 18.708645439147986

50%|█████ | 15110/30000 [1:50:15<51:25, 4.83it/s]
...
50%|█████ | 15120/30000 [1:50:17<51:12, 4.84it/s]
"

and that's it.

I believe the modeling finished completely, but I'm not sure how to tell.

Could you please help me determine whether model training was successful, and which commands I can run in the terminal to complete the file creation, so that I do not need to repeat the entire modeling process?
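In case it helps, this is the rough check I ran myself (a sketch under the assumption that each checkpoint is named as in the log above, e.g. `PBMC_bulkATAC_<sample>_fold0-<run-name>.pt`):

```python
import os

def missing_checkpoints(model_dir, samples, fold=0):
    """Return the samples that have no matching .pt checkpoint in
    model_dir, assuming filenames contain '_<sample>_fold<fold>'."""
    files = os.listdir(model_dir)
    return [s for s in samples
            if not any(f"_{s}_fold{fold}" in f and f.endswith(".pt")
                       for f in files)]
```

I'm not certain this naming assumption covers every file training should produce, which is part of what I'm asking about.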

Thank you very much,

Mason

@mason-sweat1 (Author)

Also, if you could please post a list of the files that should appear in the model folder, so I know I have them all at the end, I would really appreciate it!
