BioGAN is a novel generative framework that integrates Graph Neural Networks (GNNs) into a Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP) to generate biologically realistic synthetic transcriptomics data. The model leverages known gene-gene relationships to guide the generation process, aiming to improve the biological plausibility, fairness, and privacy of synthetic omics data.
This project accompanies the manuscript:
"BioGAN: Enhancing Transcriptomic Data Generation with Biological Knowledges", by Francesca Pia Panaccione, Sofia Mongardi, Marco Masseroli and Pietro Pinoli (under review)
Modern AI tools in genomics are limited by data scarcity, heterogeneity, acquisition costs, and privacy regulations (e.g., GDPR, HIPAA). BioGAN addresses these issues by:
- Synthesizing realistic transcriptomic profiles conditioned on biological structure.
- Incorporating prior biological knowledge through GNNs to improve fidelity.
- Providing a versatile architecture adaptable to various omics settings.
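
To make the idea of injecting gene-gene relationships through a GNN more concrete, the following is a minimal NumPy sketch of a single graph convolution over a gene graph. It is illustrative only and is not taken from the BioGAN code: the function names `normalize_adjacency` and `gcn_layer`, the toy 4-gene graph, and the feature sizes are all assumptions, and the actual model may use a different layer or framework.

```python
import numpy as np

def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, the symmetric normalization commonly used by GCN layers."""
    a_hat = adj + np.eye(adj.shape[0])        # add self-loops so each gene keeps its own signal
    deg = a_hat.sum(axis=1)                   # node degrees, self-loop included
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(x, adj_norm, weight):
    """One graph convolution: aggregate neighbor features, project them, apply ReLU."""
    return np.maximum(adj_norm @ x @ weight, 0.0)

# Hypothetical toy gene graph: 4 genes connected in a chain (0-1, 1-2, 2-3).
adj = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]],
    dtype=float,
)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))        # one 3-dimensional hidden vector per gene
weight = rng.normal(size=(3, 2))   # projection to 2 output features (random here, learned in practice)

adj_norm = normalize_adjacency(adj)
h = gcn_layer(x, adj_norm, weight)  # shape (4, 2): updated per-gene features
```

Because genes 0 and 3 are not adjacent, the corresponding entry of `adj_norm` is zero, so information between them only propagates if several such layers are stacked.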
- Graph-aware generator: Integrates GNN layers to model regulatory interactions.
- High-dimensional support: Designed for transcriptomic data with tens of thousands of features.
- Robust validation: Multi-faceted evaluation pipeline including:
  - Classification accuracy on downstream tasks
  - Distributional similarity metrics (e.g., Wasserstein distance)
  - Feature-level consistency checks
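
As an illustration of the distributional similarity check listed above, here is a small sketch that computes a per-gene 1-D Wasserstein distance between a real and a synthetic expression matrix. It is not the project's evaluation code: `per_gene_wasserstein` is a hypothetical helper, the data are random placeholders, and the shortcut used (mean absolute difference of sorted values) assumes both matrices contain the same number of samples. For unequal sample sizes, `scipy.stats.wasserstein_distance` handles the general case.

```python
import numpy as np

def per_gene_wasserstein(real, synth):
    """Wasserstein-1 distance between real and synthetic values, computed per gene (column).

    Assumes both matrices have shape (n_samples, n_genes) with the same n_samples,
    in which case the 1-D distance equals the mean absolute difference of sorted values.
    """
    if real.shape != synth.shape:
        raise ValueError("real and synth must have the same shape in this sketch")
    return np.abs(np.sort(real, axis=0) - np.sort(synth, axis=0)).mean(axis=0)

# Placeholder data standing in for expression matrices (200 samples, 5 genes).
rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(200, 5))
synth_close = rng.normal(0.0, 1.0, size=(200, 5))  # drawn from the same distribution
synth_shifted = real + 2.0                          # every gene shifted by 2

dist_close = per_gene_wasserstein(real, synth_close)
dist_shifted = per_gene_wasserstein(real, synth_shifted)
```

A constant shift of every value by 2 yields a distance of 2 for every gene, whereas samples drawn from the same distribution give much smaller values, which is the behavior one would want from a fidelity metric.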
src/
├── BIOGAN_H2GCN/ # Experimental variant of BioGAN using H2GCN architecture
├── BioGAN_GCN/ # Core BioGAN model implementation based on GCN
├── wpgan/ # Implementation of Wasserstein GAN with Gradient Penalty
├── metrics/ # Evaluation metrics for synthetic data quality
├── utils/ # Utility functions (e.g., graph processing, logging)
├── data_loader.py # Scripts for preprocessing transcriptomic data
├── losses.py # Custom loss functions (Wasserstein, gradient penalty, etc.)
└── train_model.py # Main training script for the BioGAN model
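
For reference, the Wasserstein and gradient penalty terms mentioned for `losses.py` usually follow the standard WGAN-GP formulation shown below. This is the textbook objective, not necessarily the exact form implemented in this repository. The symbols are assumptions introduced here: $D$ is the critic, $G$ the generator, $P_r$ and $P_g$ the real and generated distributions, and $\lambda$ the penalty weight.

```math
\begin{aligned}
L_D &= \mathbb{E}_{\tilde{x} \sim P_g}\big[D(\tilde{x})\big]
     - \mathbb{E}_{x \sim P_r}\big[D(x)\big]
     + \lambda \, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}\Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big], \\
L_G &= - \mathbb{E}_{\tilde{x} \sim P_g}\big[D(\tilde{x})\big], \\
\hat{x} &= \epsilon x + (1 - \epsilon)\,\tilde{x}, \qquad \epsilon \sim U[0, 1].
\end{aligned}
```

The penalty term pushes the norm of the critic's gradient toward 1 at points interpolated between real and generated samples, which is how WGAN-GP softly enforces the Lipschitz constraint without weight clipping.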