Training Objective #3
Hi Aj, in this project I experimented with the loss function. Indeed, I drew inspiration from the variational auto encoder. However, in the variational auto encoder we penalize the mu and sigma of each individual data point, as you can read in the paper. From a Bayesian modeling perspective this makes sense: we model the generative process as starting with a sample from a unit Gaussian, and during training we maximize the lower bound, which corresponds to the KL divergence between these samples and the unit Gaussian. I wanted to experiment with this because I couldn't understand why every point should have zero mean. I aimed for an interpretable latent space, and all vectors at zero isn't so interesting. Hence, I designed a loss function that, on average, puts all the latent points in a unit Gaussian around the origin.
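A minimal PyTorch-style sketch of that batch-level objective (the function name and details here are illustrative, not the repo's actual implementation):

```python
import torch

def batch_unit_gaussian_loss(z, eps=1e-8):
    # z: (batch, latent_dim) latent codes from the encoder.
    # Instead of pushing every individual code toward zero, penalize the
    # *batch statistics* so the codes are, on average, distributed as a
    # unit Gaussian around the origin.
    mu = z.mean(dim=0)                     # per-dimension mean over the batch
    var = z.var(dim=0, unbiased=False)     # per-dimension variance over the batch
    # Closed-form KL( N(mu, var) || N(0, I) ), summed over latent dimensions
    return 0.5 * torch.sum(var + mu.pow(2) - 1.0 - torch.log(var + eps))
```

The total training loss would then be the reconstruction term plus a weighted version of this batch-level penalty.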
Hi Rob, thank you very much for the kind explanation 👍 I think I've heard of a similar idea of sampling from the unit Gaussian sphere in GAN models, but I'm not sure I understood it. It does sound very reasonable, allowing a non-zero bias, and it seems to produce a clear latent vector separation (judging by the t-SNE plot). Is there any paper which describes this in mathematical detail? I think that would help me be sure I understand. I'm interested in multivariate behavioural and physiological streaming data, like activity levels, heart rate, temperature, sleep patterns and the like. Unfortunately there aren't many such multivariate datasets available, but I'll try to extend your code to reconstruct multiple streams. There's a simple dataset in the UCR archive, uWaveGestureLibrary_X, Y and Z, which I'll try asap.
Have you seen these datasets? Here and here. I don't know of a paper describing this in mathematical detail, but the deep learning book features a good chapter on auto encoders.
Hi Rob, thanks for the links! I've applied for the first one of those datasets; hopefully I'll get access in a couple of days. May I ask if you've got any experience of working with MIMIC-III? This benchmark seems interesting, though it's in Theano, and has some code to load the dataset. I guess for a fresh/raw unlabelled multivariate dataset, a simple sequence-to-sequence (non-variational) auto-encoder would be the simplest place to start? I'm working in PyTorch now as it's easier to debug than TF; there's a recurrent auto-encoder with attention available here. I just need to modify it for real values instead of discrete tokens/words, and experiment with different methods to make its latent representation sparse/structured, like you've done in this repository. As the book says, it's not really that interesting to perfectly reconstruct the data (i.e. using a VAE loss); rather, my main goal is finding a good compact/sparse representation/features that are useful for semi-supervised classification.
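For what it's worth, the usual modification when moving from discrete tokens to real values is to replace the embedding lookup and softmax/cross-entropy with a linear projection and an MSE reconstruction loss. A rough sketch under those assumptions (module and dimension names are hypothetical, not from any of the linked repos):

```python
import torch
import torch.nn as nn

class RealValuedSeqAE(nn.Module):
    # Sequence auto-encoder for real-valued multivariate streams:
    # a linear input/output projection replaces the embedding layer,
    # and MSE replaces cross-entropy over tokens.
    def __init__(self, n_channels, hidden_size, latent_size):
        super().__init__()
        self.encoder = nn.GRU(n_channels, hidden_size, batch_first=True)
        self.to_latent = nn.Linear(hidden_size, latent_size)
        self.from_latent = nn.Linear(latent_size, hidden_size)
        self.decoder = nn.GRU(n_channels, hidden_size, batch_first=True)
        self.readout = nn.Linear(hidden_size, n_channels)  # regression head, not softmax

    def forward(self, x):
        _, h = self.encoder(x)                 # h: (1, batch, hidden)
        z = self.to_latent(h[-1])              # one latent code per sequence
        h0 = self.from_latent(z).unsqueeze(0)  # initialize the decoder from z
        out, _ = self.decoder(x, h0)           # teacher forcing on the inputs
        return self.readout(out), z

# reconstruction loss: MSE instead of cross-entropy
# x_hat, z = model(x); loss = nn.functional.mse_loss(x_hat, x)
```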
No, I haven't worked with those datasets. And yes, an auto encoder is probably your best start. But what is your aim with this? |
As the data is "raw" and multi-modal and collected from individuals, the aim would be, in some sense, to separate/cluster the individuals in the latent space into those who remained healthy and those who developed a disease. For example, say we've collected multivariate behavioural and physiological streaming data (activity levels, heart rate, temperature, sleep patterns) and symptom scores for some disease, over a long period, say a year. As an example, say the disease is major depression. We could then subset these into weekly or longer time-series and examine the characteristics of onset for the disease. At what point can we find clusters in the latent space, among those who later developed the disease, that are easily detectable? In some sense it's like using exploratory data analysis for an early detection problem. Does that help - I hope it's clear?
Wow, that sounds interesting. I think it could be influential if you can make it work. One piece of advice: I would advise you to also get a few labels. I know labels are hard to get in the medical world, but they might help you search for clusters. I think the characteristics you are looking for are not dominantly present in the data, so any auto encoder trained with an L2 loss might not necessarily pick up on these signals, even though a small pattern could be present. With some labelled data points, you could train a linear classifier on top of the encoder. Training with the gradient of this classifier will push apart the data samples from the different classes. That might make your latent space more interpretable. For this, you might find the SSL with VAE paper interesting, or the Ladder Network.
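A rough sketch of that classifier-on-encoder idea (the encoder, layer sizes and names here are hypothetical; in practice you would reuse the trained auto-encoder's encoder):

```python
import torch
import torch.nn as nn

# Hypothetical encoder mapping 128-d inputs to a 16-d latent space
encoder = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 16))
classifier = nn.Linear(16, 2)  # a linear probe on top of the latent space

optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(classifier.parameters()), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def supervised_step(x_labelled, y):
    # Back-propagating the classifier loss through the encoder pushes
    # latent codes of different classes apart.
    z = encoder(x_labelled)
    loss = criterion(classifier(z), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```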
Hi @RobRomijnders, am I right that you are following the VRAE approach (https://arxiv.org/abs/1412.6581), with a modification to where the sampling from the latent vector happens? I still don't clearly understand the motivation. You are optimizing: 1.) for a minimal KL divergence between the posterior of the encoder and the prior on z (which is the standard normal distribution), and 2.) for a minimal reconstruction loss.
Yes, that's right. The framework for variational auto encoders includes both those terms in the cost function. You can read about this in the original paper, Auto-Encoding Variational Bayes. If you find that it doesn't train on your dataset, then try to introduce the KL-cost term only later in training. I think people have published blogs and papers on tips and tricks for training variational auto encoders.
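One common way to do that (often called KL annealing or warm-up; this sketch is illustrative, not code from this repo) is to scale the KL term by a weight that ramps from 0 to 1 over the first training steps:

```python
def kl_weight(step, warmup_steps=10000):
    # Linearly anneal the KL-cost weight from 0 to 1, so early training
    # focuses on reconstruction before the prior is enforced.
    return min(1.0, step / warmup_steps)

# total_loss = loss_reconstruction + kl_weight(global_step) * loss_kl
```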
Hi @AjayTalati, Thanks. |
To respond to @AjayTalati's question at the start, I would like to point out:
Code in PyTorch:
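A minimal version of the closed-form KL term that Appendix B derives (variable names are illustrative):

```python
import torch

def kl_divergence(mu, log_var):
    # Closed-form KL( N(mu, sigma^2) || N(0, 1) ) for a diagonal Gaussian,
    # as derived in Appendix B of Auto-Encoding Variational Bayes:
    #   -0.5 * sum(1 + log(sigma^2) - mu^2 - sigma^2)
    return -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
```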
For more information, see Appendix B of the VAE paper. CC: @RobRomijnders @ketyi
Hi, thanks for posting this code!
I'm not sure I understand the training objective you're using - is it a variational auto-encoder?
Is loss_seq the Kullback-Leibler divergence (line 141) and loss_lat_batch a reconstruction loss (line 113)? If you've got a link to a paper or book which describes your code, that would be really appreciated.
Thanks very much for your help,
Aj