Hello,
Great software! I am excited to try it on my data. Thank you very much for your hard work in developing it.
I have made it to the following step:
"
import torch
from torch.cuda.amp import GradScaler
Patch torch.amp to include GradScaler
torch.amp.GradScaler = GradScaler
for path in ['temp','model']:
if not os.path.exists(os.path.join(work_dir, path)):
os.makedirs(os.path.join(work_dir, path))
for sample in samples:
scp.tl.launch_seq2print(model_config_path=f'{work_dir}/configs/PBMC_bulkATAC_{sample}_fold{fold}.JSON',
temp_dir=f'{work_dir}/temp',
model_dir=f'{work_dir}/model',
data_dir=work_dir,
gpus=samples.index(sample),
wandb_project='scPrinter_seq_PBMC_bulkATAC', # wandb helps you manage loggins
verbose=True,
launch=False # launch=True, this command would launch the scripts directly,
# otherwise, it will just display the commands, you should copy them and run them.
)
"
Since I am using a SLURM-managed system, I ran the code using sbatch for two samples, aCM1final_tsv and aCM2final_tsv.
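Concretely, since launch=False only prints the training commands, I wrapped each printed command in a generated sbatch script, roughly like the sketch below; the SBATCH resource lines and the conda environment name are placeholders for my cluster, not anything scPrinter requires:

```python
# Sketch of the wrapper I used to turn each command printed by
# launch_seq2print(..., launch=False) into an sbatch script.
# The SBATCH directives and env name are placeholders for my cluster.
SBATCH_TEMPLATE = """#!/bin/bash
#SBATCH --job-name=seq2print_{sample}
#SBATCH --gres=gpu:1
#SBATCH --time=48:00:00

source activate scprinter
{command}
"""

def write_sbatch(sample, command, out_path):
    """Write one sbatch script that runs the given training command."""
    with open(out_path, "w") as f:
        f.write(SBATCH_TEMPLATE.format(sample=sample, command=command))
    return out_path
```

I then submitted each generated script with `sbatch`.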
The job eventually completed (exit code 0), but I cannot complete the next step in Jupyter; it fails with the following error:
"
FileNotFoundError Traceback (most recent call last)
Cell In[14], line 4
2 adata_tfbs = {}
3 for sample_ind, sample in enumerate(samples):
----> 4 adata_tfbs[sample] = scp.tl.seq_tfbs_seq2print(seq_attr_count=None,
5 seq_attr_footprint=None,
6 genome=printer.genome,
7 region_path=f'{work_dir}/regions_test.bed',
8 gpus=[1], # change it to the available gpus
9 model_type='seq2print',
10 model_path=model_path_dict[sample], # For now we just run on one fold but you can provide a list of paths to all 5 folds
11 lora_config=json.load(open(f'{work_dir}/configs/PBMC_bulkATAC_{sample}fold0.JSON', 'r')),
12 group_names=[sample],
13 verbose=False,
14 launch=True,
15 return_adata=True, # turn this as True
16 overwrite_seqattr=True,
17 post_normalize=False,
18 save_key=f'PBMC_bulkATAC{sample}', # and input a save_key
19 save_path=work_dir)
File /lab-share/Cardio-Pu-e2/Public/ch220446_Mason2/CHD4/scPrinter/scprinter/tools.py:2430, in seq_tfbs_seq2print(seq_attr_count, seq_attr_footprint, genome, region_path, gpus, model_type, model_path, lora_config, group_names, save_group_names, save_path, overwrite_seqattr, post_normalize, verbose, launch, return_adata, save_key)
2428 regions = regionparser(region_path, printer=None, width=800)
2429 region_identifiers = df2regionidentifier(regions)
-> 2430 results = np.load(f"{save_key}_TFBS.npz")["tfbs"]
2432 print("obs=groups, var=regions")
2433 lora_ids_str = save_group_names
File /lab-share/Cardio-Pu-e2/Public/ch220446_Mason2/conda/lib/python3.9/site-packages/numpy/lib/npyio.py:427, in load(file, mmap_mode, allow_pickle, fix_imports, encoding, max_header_size)
425 own_fid = False
426 else:
--> 427 fid = stack.enter_context(open(os_fspath(file), "rb"))
428 own_fid = True
430 # Code to distinguish from NumPy binary files and pickles.
FileNotFoundError: [Errno 2] No such file or directory: '/lab-share/Cardio-Pu-e2/Public/ch220446_Mason2/CHD4/scPrinter/seq2print/PBMC_bulkATAC_aCM1final_tsv_TFBS.npz'
"
When I look at the job's log, this is the final output that I see:
wandb: 🚀 View run cheese-brulee-16 at: https://wandb.ai/masonsweat-boston-children-s-hospital/scPrinter_seq_PBMC_bulkATAC/runs/mzkvu5fx
wandb: ⭐️ View project at: https://wandb.ai/masonsweat-boston-children-s-hospital/scPrinter_seq_PBMC_bulkATAC
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20250313_222634-mzkvu5fx/logs
50%|█████ | 15109/30000 [1:50:15<51:26, 4.82it/s]
Using preset, the following parameters would be overwritten
using wrapper: count
using nth_output: 0
using decay: 0.85
Launching the following command now (no action needed from your side)
seq2print_attr --pt /lab-share/Cardio-Pu-e2/Public/ch220446_Mason2/CHD4/scPrinter//seq2print/model/PBMC_bulkATAC_aCM1final_tsv_fold0-cheese-brulee-16.pt --peaks /lab-share/Cardio-Pu-e2/Public/ch220446_Mason2/CHD4/scPrinter//seq2print/seq2print_cleaned_narrowPeak.bed --method shap_hypo --wrapper count --nth_output 0 --gpus 0 --genome mm10 --decay 0.85 --save_key deepshap --overwrite --model_norm count --sample 30000 --save_norm
Using preset, the following parameters would be overwritten
using wrapper: just_sum
using nth_output: 0-30
using decay: 0.85
Launching the following command now (no action needed from your side)
seq2print_attr --pt /lab-share/Cardio-Pu-e2/Public/ch220446_Mason2/CHD4/scPrinter//seq2print/model/PBMC_bulkATAC_aCM1final_tsv_fold0-cheese-brulee-16.pt --peaks /lab-share/Cardio-Pu-e2/Public/ch220446_Mason2/CHD4/scPrinter//seq2print/seq2print_cleaned_narrowPeak.bed --method shap_hypo --wrapper just_sum --nth_output 0-30 --gpus 0 --genome mm10 --decay 0.85 --save_key deepshap --overwrite --model_norm footprint --sample 30000 --save_norm
count head normalization factor -0.003202724503353238 5.8794023061636835e-05 0.004055202309973542
footprint head normalization factor -13.501025199890137 -0.43559572100639343 18.708645439147986
50%|█████ | 15120/30000 [1:50:17<51:12, 4.84it/s]
and that's it.
I believe the modeling finished completely, but I am not sure how to tell.
Could you please help me determine whether model training was successful, and what commands I can run through the terminal to complete the file creation, so I do not need to repeat the entire modeling process?
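For reference, these are the two attribution commands from my log, reconstructed so I could try re-launching them by hand. The checkpoint suffix (cheese-brulee-16) comes from this sample's wandb run name, so I assume it differs per sample, and I am not sure these are the right commands to resume from:

```python
# Rebuild the seq2print_attr commands exactly as they appeared in my log,
# so I can re-run them in a terminal without repeating training.
BASE = "/lab-share/Cardio-Pu-e2/Public/ch220446_Mason2/CHD4/scPrinter/seq2print"

def attr_command(checkpoint, wrapper, nth_output, model_norm, gpu=0):
    """Assemble one seq2print_attr call; flags copied verbatim from the log."""
    return (
        f"seq2print_attr --pt {BASE}/model/{checkpoint} "
        f"--peaks {BASE}/seq2print_cleaned_narrowPeak.bed "
        f"--method shap_hypo --wrapper {wrapper} --nth_output {nth_output} "
        f"--gpus {gpu} --genome mm10 --decay 0.85 --save_key deepshap "
        f"--overwrite --model_norm {model_norm} --sample 30000 --save_norm"
    )

ckpt = "PBMC_bulkATAC_aCM1final_tsv_fold0-cheese-brulee-16.pt"
print(attr_command(ckpt, "count", "0", "count"))            # count head
print(attr_command(ckpt, "just_sum", "0-30", "footprint"))  # footprint head
```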
Thank you very much,
Mason
Also, if you could please post a list of the files that should appear in the model folder, so I can confirm I have them all at the end, I would really appreciate it!