Nan values in training #19

Open
folkaholic opened this issue Jan 17, 2024 · 2 comments
folkaholic commented Jan 17, 2024

Hi there,

Thank you for sharing your nice work!
I ran into a problem when trying to train your model: it returns NaN loss and risk, as shown below:
batch 99, loss: nan, label: 1, event_time: 14.6800, risk: nan, bag_size:
batch 199, loss: nan, label: 1, event_time: 20.1700, risk: nan, bag_size:
batch 299, loss: nan, label: 2, event_time: 29.3000, risk: nan, bag_size:

The error traceback is:
File "/opt/anaconda3/envs/mcat/lib/python3.11/site-packages/sksurv/metrics.py", line 214, in concordance_index_censored
event_indicator, event_time, estimate = _check_inputs(event_indicator, event_time, estimate)

File "/opt/anaconda3/envs/mcat/lib/python3.11/site-packages/sksurv/metrics.py", line 47, in _check_inputs
estimate = _check_estimate_1d(estimate, event_time)

File "/opt/anaconda3/envs/mcat/lib/python3.11/site-packages/sksurv/metrics.py", line 36, in _check_estimate_1d
estimate = check_array(estimate, ensure_2d=False, input_name="estimate")

File "/opt/anaconda3/envs/mcat/lib/python3.11/site-packages/sklearn/utils/validation.py", line 921, in check_array
_assert_all_finite(
File "/opt/anaconda3/envs/mcat/lib/python3.11/site-packages/sklearn/utils/validation.py", line 161, in _assert_all_finite
raise ValueError(msg_err)
ValueError: Input estimate contains NaN.

I checked the model's inputs and outputs and found many NaN values in both the WSI and omic features, which lead to NaN outputs for the hazards and S. I strictly followed the instructions you provided and am really confused about why these NaN values appear. If you have run into this problem before, could you tell me how to solve it?
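One way to confirm where the NaN values enter is to count non-finite entries in each feature array before it reaches the model. The following is a minimal sketch, not code from this repository: `count_nonfinite`, `wsi_features`, and `omic_features` are hypothetical names, and the arrays are small stand-ins for the real WSI and omic features.

```python
import numpy as np


def count_nonfinite(features):
    """Return (nan_count, inf_count) for an array-like of floats."""
    arr = np.asarray(features, dtype=np.float64)
    return int(np.isnan(arr).sum()), int(np.isinf(arr).sum())


# Hypothetical stand-ins for one bag of WSI patch features and one omic vector.
wsi_features = np.array([[0.1, 0.2], [np.nan, 0.4]])
omic_features = np.array([1.0, np.inf, 3.0])

for name, feats in [("wsi", wsi_features), ("omic", omic_features)]:
    n_nan, n_inf = count_nonfinite(feats)
    print(f"{name}: {n_nan} NaN, {n_inf} inf out of {np.size(feats)} values")
```

Running a check like this per sample, rather than on the whole dataset at once, makes it easier to see which cases carry the bad values.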

Thank you!

Best.


H-Q-N commented Jun 7, 2024

Hello, I also have the same problem. I got the same error at the 130th epoch, using the tcga_brca dataset.
{"batch_idx": 763, "batch_loss": NaN, "label": 3, "event_time": 97.4, "risk": NaN, "bag_size": 50139, "_timestamp": 1717683816.283351, "_runtime": 12853.92314505577, "_step": 100978, "epoch": 130, "train_loss_surv": 0.012389047435808234, "train_loss": 0.012389047435808234, "train_c_index": 0.9479772888573457, "_wandb": {"runtime": 12852}}
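When the NaN only shows up late in training, as in the log above, it can help to stop at the first non-finite batch loss so that the offending sample can be inspected. The sketch below uses only the standard library; `first_nonfinite_batch` is a hypothetical helper and the list of losses is made-up illustration data, not output from this model.

```python
import math


def first_nonfinite_batch(batch_losses):
    """Return the index of the first NaN or infinite loss, or None if all are finite."""
    for batch_idx, loss in enumerate(batch_losses):
        if not math.isfinite(loss):
            return batch_idx
    return None


# Made-up per-batch losses; in a real loop each value would be loss.item().
example_losses = [0.52, 0.47, float("nan"), 0.45]

bad_idx = first_nonfinite_batch(example_losses)
if bad_idx is not None:
    print(f"first non-finite loss at batch {bad_idx}")
```

In an actual training loop the same `math.isfinite` test could be applied to each batch loss as it is computed, raising an error that names the batch index instead of continuing with NaN weights.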

script-Yang commented

Hi, I recently ran into a similar issue. After reviewing the data table, I found NaN values in the dataset. Once I removed them, the model trained without the loss becoming NaN again. I hope this helps you resolve your issue.
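The cleanup described above could look roughly like the following sketch, assuming the data table is loaded with pandas. The column names (`case_id`, `survival_months`, `gene_a`) and the helper `drop_nan_rows` are hypothetical, and the small table is illustration data, not the actual dataset.

```python
import pandas as pd


def drop_nan_rows(table, columns=None):
    """Split a table into rows without NaN and rows with NaN in the given columns.

    If columns is None, every column is checked.
    """
    subset = table if columns is None else table[columns]
    nan_mask = subset.isna().any(axis=1)
    clean = table.loc[~nan_mask].reset_index(drop=True)
    dropped = table.loc[nan_mask]
    return clean, dropped


# Hypothetical table with one missing survival time.
table = pd.DataFrame(
    {
        "case_id": ["case_1", "case_2", "case_3"],
        "survival_months": [14.68, float("nan"), 29.3],
        "gene_a": [0.5, 1.2, -0.3],
    }
)

clean, dropped = drop_nan_rows(table)
print(f"kept {len(clean)} rows, dropped {len(dropped)} rows with NaN")
```

Returning the dropped rows alongside the clean table makes it possible to review which cases were removed before retraining.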
