Different "knobs" to improve accuracy #36
The knob space is huge. It should not be possible to do worse than a simple exponential/Cox regression model (since that's basically a wtte with beta=1 and only a single linear layer to alpha). What's the baseline that you're comparing to? By class imbalanced, do you mean that many sequences don't contain observed events (i.e. all timesteps are censored)?

Unfortunately you have every knob there is in the neural network world 😄 Typically feature engineering is what'll give you the extra mile (seasonality features? helping out with a countdown since last event? categorical features? etc.) and I'm not the best to answer about what's the latest cool thing. But do try increasing depth, trying different activation functions for the dense layers at the top, tweaking parameters in batch normalization, and if you want to go wild, consider the Keras Phased LSTM layer, which has a countdown built into it. It's exceptionally good at learning temporal features like your dominant one. But before all these things, remove the RNN layers altogether and only use dense layers with your favorite features, to get a sense of what the RNN really learns and what the improvements are.

For wtte-specific things: if you're thinking about evaluation, consider #9; if you're getting right-shifted (exploding) distributions or NaNs, consider #30 and #33; for ideas about incorporating seasonal features, check #31. Also remember that performance always comes with the risk of overfitting, so be wary, and good luck! Let me know if you have any more questions and I'm happy to hear your results.
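For the dense-only baseline idea, a minimal sketch assuming Keras plus the `wtte.output_lambda` / `wtte.loss` helpers from this repo (layer sizes, shapes and `init_alpha` are placeholders; check your installed version's exact signatures):

```python
# Hedged sketch: a WTTE model with no recurrent layers, to measure what the
# RNN actually adds. Shapes, layer sizes and init_alpha are assumptions.
from keras.models import Sequential
from keras.layers import Dense, Lambda, TimeDistributed
import wtte.wtte as wtte

n_timesteps, n_features = 100, 10   # placeholder shapes
init_alpha = 1.0                    # typically set from mean time-to-event in training data

model = Sequential()
model.add(TimeDistributed(Dense(16, activation='relu'),
                          input_shape=(n_timesteps, n_features)))  # per-timestep features only
model.add(TimeDistributed(Dense(2)))                               # linear 2-output layer
model.add(Lambda(wtte.output_lambda,
                 arguments={'init_alpha': init_alpha, 'max_beta_value': 4.0}))
model.compile(loss=wtte.loss(kind='discrete').loss_function, optimizer='adam')
```

Comparing this against the RNN variant on the same features gives a rough sense of how much the recurrent layers contribute.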
My baseline is the "vanilla" model - for each time period
Each of my sequences either has 1 event (and that's the sequence end) or 0 events (censored). By class imbalance I mean that in each "snapshot" in time, most subjects will "survive" this timestep (or the next 2-3 timesteps) - therefore predicting who's going to "churn" is an imbalanced prediction problem.
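For concreteness, a small numpy sketch (a hypothetical helper, not part of wtte-rnn) of how such a sequence maps onto per-timestep targets, where column 0 is the time to the event (or to the end of observation) and column 1 is the censoring indicator:

```python
# Hypothetical helper illustrating the target encoding described above:
# each timestep gets [time-to-event, observed-flag]; observed-flag = 0 means
# the timestep is censored (we only know the subject survived until the end
# of the observation window). Assumes the event, if any, ends the sequence.
import numpy as np

def sequence_to_targets(seq_len, event_observed=False):
    y = np.zeros((seq_len, 2))
    for t in range(seq_len):
        y[t, 0] = (seq_len - 1) - t          # countdown to event / end of window
        y[t, 1] = 1.0 if event_observed else 0.0
    return y

print(sequence_to_targets(5, event_observed=True))   # event at the sequence end
print(sequence_to_targets(5, event_observed=False))  # no event: fully censored
```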
At least for me, the biggest motivation for using an RNN was to automate (at least to some degree) the issue of feature engineering. My initial model was a vanilla logistic regression, which was OK, but since I have more than a few features and I assume they interact, I decided to try the RNN. Does that make sense?
I'll keep trying and will update once I find something that works. :-) I'm having some trouble with NaNs when trying out different layers - so far, using
I'm not sure I'm following the setup of the testing, but I'm sure it's specific to your domain so maybe not relevant. But the question of who's going to churn depends on how you align/compare timesteps, e.g. "who's going to die next?" asked on a specific calendar day vs. asked with alignment by age are completely different questions. I haven't really tried evaluating on similar metrics (e.g. concordance index). You can also choose a bunch of things as the prediction to evaluate, e.g. predicted expected value, predicted median, probability of churning in
But I guess you're trying to predict when they will churn and then comparing the prediction (probability of event within 1 timestep?) from the wtte-rnn to that of your vanilla model (binary classification problem?). RNNs/ANNs will usually help out with the first few miles of feature engineering for sure, but for the extra miles great architecture + great feature engineering will help 😄
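As an illustration of those candidate prediction targets, a hedged sketch using the continuous Weibull formulas (adjust if you train with the discrete log-likelihood; `lifelines` is just one option for a concordance index):

```python
# Sketch: turning per-timestep Weibull parameters (alpha, beta) predicted by
# the network into scalar predictions to evaluate against a baseline.
import numpy as np
from scipy.special import gamma

def prob_event_within(alpha, beta, t=1.0):
    """P(event within t timesteps): continuous Weibull CDF."""
    return 1.0 - np.exp(-(t / alpha) ** beta)

def predicted_median(alpha, beta):
    return alpha * np.log(2.0) ** (1.0 / beta)

def predicted_mean(alpha, beta):
    return alpha * gamma(1.0 + 1.0 / beta)

# e.g. compare prob_event_within(alpha, beta, t=1) with the vanilla model's
# churn probability, or feed predicted_median into
# lifelines.utils.concordance_index(observed_times, predicted_median, event_observed).
```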
My general workflow is as follows:
Does this sound like a reasonable way of using the model and evaluating the performance? At the moment I'm trying different architectures and hyperparameters in order to improve this precision, but I keep getting NaNs in pretty much any architecture that's not the one from the example notebook...
That sounds very reasonable, thanks for the description! The best trick to fend off NaNs is to give the final layer before the linear dense 2-output activation layer a tanh activation. That gives you more freedom. But NaNs and overfitting are always a risk of moving fast. Also, if you post the plots from the wtte callback I could get a better picture. I would really appreciate feedback on what seems to work and not work, and we can discuss it here.
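A sketch of that suggestion with assumed layer sizes (everything except the tanh placement is a placeholder):

```python
# Hedged sketch: tanh-activated dense layer right before the linear
# 2-output layer that feeds the Weibull activation; the bounded activation
# helps keep alpha/beta from exploding into NaNs.
from keras.models import Sequential
from keras.layers import GRU, Dense, Lambda, TimeDistributed
import wtte.wtte as wtte

n_features = 10     # placeholder
init_alpha = 1.0    # placeholder; usually set from the training data

model = Sequential()
model.add(GRU(20, input_shape=(None, n_features), return_sequences=True))
model.add(TimeDistributed(Dense(10, activation='tanh')))  # the tanh layer suggested above
model.add(TimeDistributed(Dense(2)))                      # linear 2-output layer
model.add(Lambda(wtte.output_lambda,
                 arguments={'init_alpha': init_alpha, 'max_beta_value': 4.0}))
model.compile(loss=wtte.loss(kind='discrete').loss_function, optimizer='adam')
```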
Small update - I decided to try a much bigger network: instead of the 1-layer, 3-unit GRU from the example, I tried a 3-layer, 128-unit LSTM architecture, and it seems to work much better. I also started using keras
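In case it helps others, the variant described above would look roughly like this (a hedged sketch; everything except the 3x128 LSTM stack is assumed):

```python
# Rough sketch of a 3-layer, 128-unit LSTM WTTE model as described above.
from keras.models import Sequential
from keras.layers import LSTM, Dense, Lambda, TimeDistributed
import wtte.wtte as wtte

n_features = 10     # placeholder
init_alpha = 1.0    # placeholder

model = Sequential()
model.add(LSTM(128, input_shape=(None, n_features), return_sequences=True))
model.add(LSTM(128, return_sequences=True))
model.add(LSTM(128, return_sequences=True))
model.add(TimeDistributed(Dense(2)))
model.add(Lambda(wtte.output_lambda,
                 arguments={'init_alpha': init_alpha, 'max_beta_value': 4.0}))
model.compile(loss=wtte.loss(kind='discrete').loss_function, optimizer='adam')
```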
Similar to #32, I'm also trying to use wtte-rnn for prediction on real data; I'm not getting very good performance, and I'm trying to understand which "knobs" I can play with to improve prediction.
Some general info:
I've tried using more GRUs, changing them to LSTMs, adding an initial dense layer, etc., but the whole thing feels too random. Any ideas on what to tweak and how would be appreciated.