Nan values in training #19

folkaholic · 2024-01-17T09:39:58Z

Hi there,

Thank you for sharing your nice work!
When I try to train your model, the loss and risk come back as NaN, as shown below:
batch 99, loss: nan, label: 1, event_time: 14.6800, risk: nan, bag_size:
batch 199, loss: nan, label: 1, event_time: 20.1700, risk: nan, bag_size:
batch 299, loss: nan, label: 2, event_time: 29.3000, risk: nan, bag_size:

The error traceback is:
File "/opt/anaconda3/envs/mcat/lib/python3.11/site-packages/sksurv/metrics.py", line 214, in concordance_index_censored
event_indicator, event_time, estimate = _check_inputs(event_indicator, event_time, estimate)

File "/opt/anaconda3/envs/mcat/lib/python3.11/site-packages/sksurv/metrics.py", line 47, in _check_inputs
estimate = _check_estimate_1d(estimate, event_time)

File "/opt/anaconda3/envs/mcat/lib/python3.11/site-packages/sksurv/metrics.py", line 36, in _check_estimate_1d
estimate = check_array(estimate, ensure_2d=False, input_name="estimate")

File "/opt/anaconda3/envs/mcat/lib/python3.11/site-packages/sklearn/utils/validation.py", line 921, in check_array
_assert_all_finite(
File "/opt/anaconda3/envs/mcat/lib/python3.11/site-packages/sklearn/utils/validation.py", line 161, in _assert_all_finite
raise ValueError(msg_err)
ValueError: Input estimate contains NaN.

I checked the model's inputs and outputs and found many NaN values in both the WSI and omic features, which lead to NaN values in the hazards and S outputs. I followed the instructions you provided exactly, and I am confused about why these NaN values appear. If you have run into this problem before, could you tell me how to solve it?

Thank you!

Best.
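
For anyone reproducing this, a minimal sketch of how one might count non-finite values in the feature arrays before they reach the model. The helper `count_non_finite` and the toy arrays `wsi_features` and `omic_features` are hypothetical stand-ins, not names from the repository; it assumes the features can be converted to NumPy arrays.

```python
import numpy as np


def count_non_finite(features):
    """Return (n_nan, n_inf) for a numeric array-like."""
    arr = np.asarray(features, dtype=np.float64)
    return int(np.isnan(arr).sum()), int(np.isinf(arr).sum())


# Toy stand-ins for one sample's features (hypothetical values).
wsi_features = np.array([[0.1, 0.2], [np.nan, 0.4]])
omic_features = np.array([1.0, np.inf, 3.0])

for name, feats in [("wsi", wsi_features), ("omic", omic_features)]:
    n_nan, n_inf = count_non_finite(feats)
    print(f"{name}: {n_nan} NaN, {n_inf} Inf")
```

Running a check like this per sample should show whether the NaN values are already present in the stored features or only appear during training.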

H-Q-N · 2024-06-07T01:32:56Z

Hello, I also have the same problem. I got the same error at the 130th epoch, using the tcga_brca dataset.
{"batch_idx": 763, "batch_loss": NaN, "label": 3, "event_time": 97.4, "risk": NaN, "bag_size": 50139, "_timestamp": 1717683816.283351, "_runtime": 12853.92314505577, "_step": 100978, "epoch": 130, "train_loss_surv": 0.012389047435808234, "train_loss": 0.012389047435808234, "train_c_index": 0.9479772888573457, "_wandb": {"runtime": 12852}}

script-Yang · 2024-07-10T09:48:47Z

Hi, I recently encountered a similar issue. After a thorough review of the data table, I discovered that there were NaN values present in the dataset. Once I removed these NaN values, I was able to train the model without any further instances of the loss becoming NaN. I hope this insight can be of assistance to you in resolving your issue.
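
The check described above could look roughly like the sketch below. It assumes the data table can be loaded into a pandas DataFrame; the helper `summarize_and_drop_nan` and the column names in the toy table are hypothetical. Dropping whole rows is only one option, since the comment does not say exactly how the NaN values were removed.

```python
import numpy as np
import pandas as pd


def summarize_and_drop_nan(table, columns=None):
    """Return (nan_counts, cleaned_table).

    nan_counts lists only the columns that contain NaN values.
    Rows with NaN in `columns` (all columns if None) are dropped.
    """
    nan_counts = table.isna().sum()
    nan_counts = nan_counts[nan_counts > 0]
    cleaned = table.dropna(subset=columns).reset_index(drop=True)
    return nan_counts, cleaned


# Toy table with hypothetical column names and values.
toy = pd.DataFrame({
    "case_id": ["a", "b", "c"],
    "survival_months": [14.68, np.nan, 29.30],
    "gene_x": [0.5, 1.2, np.nan],
})

nan_counts, cleaned = summarize_and_drop_nan(toy)
print(nan_counts)
print(cleaned)
```

If dropping rows discards too many cases, restricting `columns` to the fields the model actually reads, or imputing the missing values instead, may be worth considering.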
