This repository contains a PyTorch implementation of Conditional Variational Autoencoders (CVAE) using Convolutional Networks, with a focus on deep generative models for subsurface modeling tasks.
A CVAE is a form of variational autoencoder that is conditioned on an observation; in our case, the observation is a function y.
The autoencoders from which variational autoencoders are derived are typically used for problems involving image reconstruction and/or dimensionality reduction.
An autoencoder is composed of two neural networks, an encoder and a decoder. The encoder takes an input defined by the user and compresses it into a low-dimensional representation known as the latent space; the decoder then learns to map this latent representation back to the original input size, reconstructing the data.
In variational autoencoders, the latent space is interpreted as a set of parameters governing statistical distributions. Before proceeding to the decoder network, samples z are drawn randomly from these distributions and fed into the decoder, adding an element of variation to the process. In this way, variational autoencoders can generate samples that are similar to a given input x.
A conditional variational autoencoder goes one step further: a condition is added that must be fulfilled, and this condition is passed through both the encoder and the decoder.
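The sampling step described above is usually implemented with the reparameterization trick. Here is a minimal NumPy sketch (the split of the latent vector into a mean and a log-variance follows the standard VAE formulation; the function name and shapes are illustrative):

```python
import numpy as np

def reparameterize(mu, log_var, rng=np.random.default_rng(0)):
    """Draw z ~ N(mu, sigma^2) via z = mu + sigma * eps, eps ~ N(0, I).

    Sampling this way keeps the path from (mu, log_var) to z
    differentiable, which is what lets the encoder be trained.
    """
    sigma = np.exp(0.5 * log_var)   # log_var = log(sigma^2)
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps

# Example: a 2-D latent space with unit variance in both dimensions
mu = np.array([0.0, 1.0])
log_var = np.array([0.0, 0.0])      # sigma = 1
z = reparameterize(mu, log_var)
print(z.shape)                      # (2,)
```

Because the randomness lives in eps rather than in the network's outputs, gradients flow through mu and log_var as usual.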
For data_1, please refer to Starter_Notebook.
For data_2, please refer to Starter Notebook1.
There are two variants of this model:
- One in which the condition is passed through the encoder
- One in which the condition is not passed through the encoder
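The difference between the two variants comes down to whether y is concatenated with x before encoding. A minimal PyTorch sketch (class names and layer sizes here are illustrative assumptions, not the repo's exact ones):

```python
import torch
import torch.nn as nn

class EncoderWithCondition(nn.Module):
    """Variant 1: the condition y is concatenated with x before encoding."""
    def __init__(self, x_dim=64, y_dim=16, z_dim=8):
        super().__init__()
        self.fc = nn.Linear(x_dim + y_dim, 2 * z_dim)  # outputs mu and log_var

    def forward(self, x, y):
        h = self.fc(torch.cat([x, y], dim=-1))         # condition enters here
        mu, log_var = h.chunk(2, dim=-1)
        return mu, log_var

class EncoderWithoutCondition(nn.Module):
    """Variant 2: the encoder sees x alone; y only enters the decoder."""
    def __init__(self, x_dim=64, z_dim=8):
        super().__init__()
        self.fc = nn.Linear(x_dim, 2 * z_dim)

    def forward(self, x):
        mu, log_var = self.fc(x).chunk(2, dim=-1)
        return mu, log_var

x = torch.randn(4, 64)
y = torch.randn(4, 16)
mu, log_var = EncoderWithCondition()(x, y)
print(mu.shape)   # torch.Size([4, 8])
```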
Neural Network | Number of CNN Layers | Number of Linear Layers | Activation Function |
---|---|---|---|
Encoder | 2 | 1 | ReLU |
Decoder | 2 | 1 | LeakyReLU (0.01) |
Neural Network | Number of CNN Layers | Number of Linear Layers | Activation Function |
---|---|---|---|
Encoder | 3 | 1 | ReLU |
Decoder | 3 | 1 | LeakyReLU (0.01) |
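As a rough sketch of what the second table describes (3 convolutional layers plus 1 linear layer with ReLU in the encoder), assuming illustrative channel counts, kernel sizes, and a 1x32x32 input, none of which are taken from the repo:

```python
import torch
import torch.nn as nn

class ConvEncoder(nn.Module):
    """3 conv layers + 1 linear layer, ReLU activations (sizes are assumptions)."""
    def __init__(self, z_dim=8):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),   # 32 -> 16
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # 16 -> 8
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),  # 8 -> 4
        )
        self.fc = nn.Linear(64 * 4 * 4, 2 * z_dim)  # mu and log_var

    def forward(self, x):
        h = self.conv(x).flatten(1)
        mu, log_var = self.fc(h).chunk(2, dim=-1)
        return mu, log_var

mu, log_var = ConvEncoder()(torch.randn(2, 1, 32, 32))
print(mu.shape)  # torch.Size([2, 8])
```

The decoder would mirror this with transposed convolutions and LeakyReLU, as per the table.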
The Dual Encoder CVAE is an advanced model that incorporates insights from a research paper. It is designed with a dual-encoder system that separately processes different aspects of the input data, enhancing the generative capabilities of the model.
- Encoder network (q_\phi): processes the observations (y) together with the input data (x) to generate a latent representation (z_q). This encoder is responsible for capturing the characteristics of the input data.
- Encoder network (r_{\theta_1}): receives only the observations (y) and is tasked with inferring the input data, outputting a latent representation (z_r).
- KL divergence: the Kullback-Leibler divergence is computed between the latent distributions ((\mu_q) and (\mu_r)) produced by the two encoders, forming a crucial component of the loss function that specifically targets the divergence between the two latent spaces.
- Decoder network (r_{\theta_2}): uses the latent representation (z_q) generated by (q_\phi) together with the observations (y) to reconstruct the input data's true parameters ((\mu_{r_2})).
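The training-time data flow through these four components can be sketched as follows (linear layers and dimensions here are illustrative stand-ins for the repo's actual convolutional networks):

```python
import torch
import torch.nn as nn

class DualEncoderCVAE(nn.Module):
    """Minimal sketch of the dual-encoder CVAE's training-time data flow."""
    def __init__(self, x_dim=32, y_dim=8, z_dim=4):
        super().__init__()
        self.q_phi = nn.Linear(x_dim + y_dim, 2 * z_dim)   # q_phi(z | x, y)
        self.r_theta1 = nn.Linear(y_dim, 2 * z_dim)        # r_theta1(z | y)
        self.r_theta2 = nn.Linear(z_dim + y_dim, x_dim)    # decoder r_theta2(x | z, y)

    def encode(self, layer, inp):
        mu, log_var = layer(inp).chunk(2, dim=-1)
        z = mu + (0.5 * log_var).exp() * torch.randn_like(mu)
        return mu, log_var, z

    def forward(self, x, y):
        mu_q, lv_q, z_q = self.encode(self.q_phi, torch.cat([x, y], -1))
        mu_r, lv_r, _ = self.encode(self.r_theta1, y)      # used only in the KL term
        x_recon = self.r_theta2(torch.cat([z_q, y], -1))   # decode from z_q and y
        return x_recon, (mu_q, lv_q), (mu_r, lv_r)

x, y = torch.randn(4, 32), torch.randn(4, 8)
x_recon, (mu_q, _), (mu_r, _) = DualEncoderCVAE()(x, y)
print(x_recon.shape)  # torch.Size([4, 32])
```

At test time, the same model would draw z from r_theta1(y) alone and decode it, since x is not available.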
The inclusion of two encoders allows the model to separate the encoding of the input data together with the condition from the encoding of the condition alone, leading to a more robust representation in the latent space. The KL divergence between the two latent representations encourages the model to learn an efficient, informative latent space, which is critical for generating high-quality data samples.
- In the testing phase, only the (r_{\theta_1}) encoder and the (r_{\theta_2}) decoder are used. The model generates new samples (x_{samp}) by drawing from the latent space conditioned on the observations, approximating the true posterior p(x|y).
This architecture is particularly beneficial for problems where the data is noisy and the goal is to extract clean, conditioned data. The dual-encoder approach effectively separates the noise from the signal, resulting in improved reconstruction and generation of data samples during testing.
We then introduce a latent loss to measure the difference between the two encodings: the KL divergence between the Gaussian distributions produced by the two encoders. The reconstruction loss is the MSE between x and the reconstructed x, plus the MSE between y and the reconstructed y.
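The latent loss described here is the KL divergence between two diagonal Gaussians, which has a closed form. A sketch of the full loss under these definitions (the beta weighting on the KL term is an assumption, not necessarily what the repo uses):

```python
import torch
import torch.nn.functional as F

def kl_two_gaussians(mu_q, log_var_q, mu_r, log_var_r):
    """KL( N(mu_q, var_q) || N(mu_r, var_r) ) for diagonal Gaussians,
    summed over latent dimensions and averaged over the batch."""
    var_q, var_r = log_var_q.exp(), log_var_r.exp()
    kl = 0.5 * (log_var_r - log_var_q
                + (var_q + (mu_q - mu_r) ** 2) / var_r
                - 1.0)
    return kl.sum(dim=-1).mean()

def cvae_loss(x, x_recon, y, y_recon,
              mu_q, log_var_q, mu_r, log_var_r, beta=1.0):
    # Reconstruction: MSE on both x and y, as described above.
    recon = F.mse_loss(x_recon, x) + F.mse_loss(y_recon, y)
    return recon + beta * kl_two_gaussians(mu_q, log_var_q, mu_r, log_var_r)

# Sanity check: the KL of a distribution with itself is zero.
mu, lv = torch.randn(4, 8), torch.randn(4, 8)
print(kl_two_gaussians(mu, lv, mu, lv).item())  # 0.0
```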
Component | Number of CNN Layers | Number of Linear Layers | Activation Functions Used |
---|---|---|---|
Encoder1 | 2 | 2 | ReLU |
Encoder2 | 0 | 2 | - |
Decoder | 2 | 1 | LeakyReLU |
Refer to CVAE_example.py for the model code and structure names.
If you want to know more about the structure, you can go through the research paper.
The following tables compare four evaluation metrics across our models with different modifications:
For Data_2:
Model Name | Reconstruction Error 0 | Reconstruction Error 1 | Linearity Score 0 | Linearity Score 1 |
---|---|---|---|---|
1 | 1.0055453777313232 | 0.7383081316947937 | 0.5882912775095918 | 0.46959004066297066 |
2 | 1.00575852394104 | 0.7738876342773438 | 0.9033710532064656 | 0.7145611655175658 |
3 | 1.0112000703811646 | 0.7002468109130859 | 0.5656934694989804 | 0.6126348065565795 |
4 | 1.0123666524887085 | 0.7670043706893921 | 0.5854454046035665 | 0.5289340156171182 |
5 | 1.0218342542648315 | 0.7545475363731384 | 0.5626903763526705 | 0.5523329465236612 |
For Data_1:
Model Name | Reconstruction Error 0 | Reconstruction Error 1 | Linearity Score 0 | Linearity Score 1 |
---|---|---|---|---|
1 | 0.9056908488273621 | 0.9132744073867798 | 0.43599866281478555 | 0.3757868429697992 |
2 | 0.9499203562736511 | 0.9159829020500183 | 0.4400628508703995 | 0.3077663393314897 |
3 | 0.960154116153717 | 0.9249348044395447 | 0.3784102558394344 | 0.35162808539496276 |
4 | 0.9263702034950256 | 0.9233485460281372 | 0.4437989663112639 | 0.34744446804415535 |
5 | 0.8241713047027588 | 0.8532126545906067 | 0.48042877309334925 | 0.33161642855826046 |
Note: You can find more result files in the results folder. I tried many modifications of these models and could not include them all, so I included only these standard models; I would advise going through the notebooks.
- Python
- PyTorch
- NumPy
- matplotlib
- CVAE_example: contains code for the different models
- CVAE_functions: contains code for the training, testing, and loss functions
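A generic CVAE training step of the kind CVAE_functions implements might look like this (the `train_step` signature and the `TinyCVAE` stand-in are hypothetical, not the repo's actual API):

```python
import torch

def train_step(model, optimizer, x, y):
    """One optimization step: reconstruct, compute MSE + KL, backprop."""
    model.train()
    optimizer.zero_grad()
    x_recon, mu, log_var = model(x, y)
    recon = torch.nn.functional.mse_loss(x_recon, x)
    # Standard VAE KL against a unit Gaussian prior.
    kl = -0.5 * torch.mean((1 + log_var - mu.pow(2) - log_var.exp()).sum(-1))
    loss = recon + kl
    loss.backward()
    optimizer.step()
    return loss.item()

# Tiny stand-in model so the step can be exercised end to end:
class TinyCVAE(torch.nn.Module):
    def __init__(self, x_dim=16, y_dim=4, z_dim=2):
        super().__init__()
        self.enc = torch.nn.Linear(x_dim + y_dim, 2 * z_dim)
        self.dec = torch.nn.Linear(z_dim + y_dim, x_dim)

    def forward(self, x, y):
        mu, log_var = self.enc(torch.cat([x, y], -1)).chunk(2, -1)
        z = mu + (0.5 * log_var).exp() * torch.randn_like(mu)
        return self.dec(torch.cat([z, y], -1)), mu, log_var

model = TinyCVAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss = train_step(model, opt, torch.randn(8, 16), torch.randn(8, 4))
print(loss > 0)  # True
```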
Some of the findings that improved my model:
- Concatenating the input and condition data in the encoder improved my scores, and right after that, changing the activation function to LeakyReLU was a great choice in every model.
- Adding one more convolution layer also improved my scores, though not by much; to avoid overfitting I added batch normalization to each layer.
- I used cross-entropy loss for y and only later found out that it is intended for classification problems, yet it worked better than the other losses.
This was a tough and new kind of problem for me, as it was related to generative AI. Through this journey I gained many insights into how autoencoders work, and in particular how conditional variational autoencoders generate samples. There were many hurdles in making this project perform better, so I will share a few and how I solved them:
- The first problem was hyperparameter tuning. I was using Optuna for this but was not getting good results, so I dropped the idea of hyperparameter tuning and moved towards upgrading the structure of the given model, which I thought would be more relevant.
- I modified the given model by adding LeakyReLU, which greatly reduced the overall loss, and I also passed the condition through the encoder, since originally it did not go through the encoder.
- I also modified the given structure by adding one more convolution layer; you can see this code in CVAE_example.py.
- I took on building a model like the one from the research paper, with two encoders and one decoder. This task was very hard but exciting. I changed the KLD loss to include the divergence between the two latent representations. Building this model took a lot of time and effort because it was complicated. In the end I was able to create it, and I learned a lot from the experience. It is still a baseline model, and I think that with further modifications it would become the best model of all.