Loss dropping to 1e-6 #49
Since WTTE 1.1 the loss is clipped by default to 1e-6, so a loss stuck at 1e-6 is synonymous with NaN. Working from the assumption that it's a NaN problem, there are a lot of possible reasons for it; see for example my response and the links in ragulpr/wtte-rnn-examples#2. What cures 99% of problems, use:
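A minimal sketch of that kind of fix (the exact snippet from the thread isn't preserved, so `init_alpha`, the optimizer settings, and `max_beta_value` here are illustrative assumptions built on the library's `wtte.output_lambda` / `wtte.loss` API):

```python
import wtte.wtte as wtte
from keras.layers import Dense, Lambda
from keras.optimizers import Adam

# Bound beta and start alpha near a rough mean of the training TTEs,
# so early updates cannot blow up the likelihood. init_alpha is assumed
# to be precomputed from your (uncensored) training data.
model.add(Dense(2))
model.add(Lambda(wtte.output_lambda,
                 arguments={"init_alpha": init_alpha,
                            "max_beta_value": 4.0}))

# The log-likelihood is clipped by default since WTTE 1.1 (clip_prob=1e-6),
# which is why NaN surfaces as a loss frozen at 1e-6.
loss = wtte.loss(kind='discrete', clip_prob=1e-6).loss_function

# A small learning rate plus gradient clipping handles most remaining NaNs.
model.compile(loss=loss, optimizer=Adam(lr=0.001, clipvalue=0.5))
```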
If we assume it's not a numerical (NaN) problem and you actually achieve zero loss, that can only be done with a perfect fit. If all your samples are censored, such a "perfect" fit is degenerate: predicting an ever-larger alpha drives the survival probability of every censored step to 1 and the loss to zero.

When to use discrete and when to use continuous loss? Continuous data is very rare. If you have a very large range of time-to-event values you might treat it as continuous. The rule of thumb is that real-world data is discrete, but if a histogram of the TTE is very smooth with no zeros you can consider it continuous. I've always used discrete in real-world applications.
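In code the choice is just the `kind` argument of the wtte loss (a small sketch):

```python
import wtte.wtte as wtte

# Integer, gappy real-world TTEs -> discrete (the usual choice).
loss = wtte.loss(kind='discrete').loss_function

# Smooth, zero-free TTE histogram over a large range -> continuous.
# loss = wtte.loss(kind='continuous').loss_function
```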
Thanks for the detailed response. You seem to have nice discrete data with very little censoring. Here are some thoughts:
Also, LSTMs may work on long sequences (20k steps or more), but whether it makes sense to have sequences that long (like the 576 suggested) is a data/domain-specific question, sorry. NLP folks would probably say it's a lot, but it all depends. I usually use GRU for sequences of up to 2000 timesteps; not sure how wise that is.
You can try:
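(The snippet isn't preserved; below is a sketch of the kind of layer being suggested, with the unit count as a placeholder.)

```python
from keras.layers import GRU

# Keep the layer that feeds the Weibull output on a bounded activation;
# tanh saturates instead of exploding the pre-activations of alpha/beta.
model.add(GRU(32, activation='tanh',   # rather than activation='relu'
              return_sequences=True))
```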
Even though ReLU will be inherently less stable, I suggest using tanh (or sigmoid or similar) for this layer; but I think this is only a small part of the problem. The problem is in the data.

Whether your definitions are suitable all depends on what you're trying to predict. Maybe you need to elaborate on what predicting the next event means. Here you defined it as the mode, or "the point where it's most likely that the event should happen". Thing is, if it's a decreasing hazard (beta <= 1) that point is always 0, i.e. "right now" is always the most likely. That's not very informative or useful.

However you twist or turn it, predicting the next event is an arbitrary question. You could predict the mean, the median ("the point by which the event is predicted to have happened in 50% of cases"), or similar, or use the prediction to randomly sample the current TTE; see the sketch below. What makes sense is dependent on the application :)

I'm not sure what you mean by "starting point for next prediction". Are you trying to generate new sequences of events?
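To make those options concrete, here is a sketch of the standard Weibull point estimates (`a`, `b` are the predicted alpha/beta arrays; the library's own `wtte.weibull` module provides similar helpers):

```python
import numpy as np
from scipy.special import gamma

def weibull_mode(a, b):
    # Most likely TTE; identically 0 whenever beta <= 1 (decreasing hazard).
    return np.where(b > 1, a * (np.maximum(b - 1, 0) / b) ** (1.0 / b), 0.0)

def weibull_median(a, b):
    # The point by which the event has happened in 50% of cases.
    return a * np.log(2) ** (1.0 / b)

def weibull_mean(a, b):
    return a * gamma(1.0 + 1.0 / b)

def weibull_sample(a, b):
    # Randomly sample a TTE by inverse-transform sampling.
    u = np.random.uniform(size=np.shape(a))
    return a * (-np.log(u)) ** (1.0 / b)
```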
Also, since the history you seem to care about is well under 1000 steps, I suggest you use a dilated causal CNN instead of an RNN! It's also much faster to train. This is a wavenet-ish network with a 256-timestep memory:
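(The original snippet isn't preserved; a sketch under the assumption of kernel size 2 with dilation rates doubling from 1 to 128, which gives exactly a 256-step causal receptive field. Filter counts and `n_features` are placeholders.)

```python
from keras.models import Sequential
from keras.layers import Conv1D, Dense, Lambda
import wtte.wtte as wtte

model = Sequential()
model.add(Conv1D(32, kernel_size=2, dilation_rate=1, padding='causal',
                 activation='tanh', input_shape=(None, n_features)))
for d in [2, 4, 8, 16, 32, 64, 128]:
    # Receptive field grows as 1 + sum(dilations) = 256 timesteps.
    model.add(Conv1D(32, kernel_size=2, dilation_rate=d,
                     padding='causal', activation='tanh'))
model.add(Dense(2))
model.add(Lambda(wtte.output_lambda,
                 arguments={"init_alpha": init_alpha, "max_beta_value": 2.0}))
model.compile(loss=wtte.loss(kind='discrete').loss_function, optimizer='adam')
```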
Awesome, thank you very much! May I buy you a beer or something? I fixed my data, and now it looks like this:

However, I still get NaN loss with and without recurrent layers. Your suggestion with convolutional layers works, but it also fails when I add an additional Conv1D with …

I would like to predict the events in the next 24h, or to generate a new sequence of events. I am aware that I am losing the probability value of the TTE, but that is ok. I was thinking about using the mode for this task, since it is still defined for beta <= 1, but it gives me 84 events per day instead of 5 (in a test day).
Hi,
thank you for your framework. I am trying to use it for charge-event prediction at charging stations.
For this, I have downsampled the data to 5-minute steps and pass up to one week of history (2016 timesteps) to the network.
This is my network (copied from one of your examples):
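(The pasted model isn't preserved in the thread; the wtte-rnn examples it refers to look roughly like this sketch, with layer sizes and `n_timesteps` / `n_features` / `init_alpha` as placeholders from the data pipeline.)

```python
from keras.models import Sequential
from keras.layers import Dense, GRU, Lambda, Masking
import wtte.wtte as wtte

model = Sequential()
model.add(Masking(mask_value=0.0, input_shape=(n_timesteps, n_features)))
model.add(GRU(20, activation='tanh', return_sequences=True))
model.add(Dense(2))
model.add(Lambda(wtte.output_lambda,
                 arguments={"init_alpha": init_alpha,
                            "max_beta_value": 4.0}))
model.compile(loss=wtte.loss(kind='discrete').loss_function,
              optimizer='adam')
```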
However, when I train the network, the loss sometimes drops to 1e-6. Once it has reached this value, the loss does not change anymore. I guess there is something wrong with the data?
Sometimes it helps to change the kind of loss function from discrete to continuous (by the way, when should I use discrete and when continuous?). What is the general problem here? I tried to reduce the network size, but had no success.