Advanced Computer Vision Platform for AI-Powered Image Processing
Comprehensive reproduction of state-of-the-art neural network architectures for practical deployment
DeepFX Studio represents a comprehensive platform that bridges cutting-edge computer vision research with practical deployment. Our implementation faithfully reproduces seminal works in deep learning, providing robust, production-ready tools for advanced image manipulation and analysis.
⭐ If you find this project helpful, please consider giving it a star! Your support helps us continue developing cutting-edge AI tools and motivates us to keep improving the platform.
| Role | Contributor | Primary Responsibilities |
|---|---|---|
| Lead Developer & DL Engineer | XBastille | Model Implementation, Training Pipeline Development, Research Reproduction |
| Full-Stack Engineer | Abhinab Choudhary | System Architecture, Backend Infrastructure, API Development |
| Frontend Developer | Soap-mac | User Interface Design, Frontend Implementation, UX Development |
Our platform reproduces state-of-the-art computer vision models from peer-reviewed research, implementing them with careful attention to architectural details and training procedures.
- Project: DeOldify (Open-Source)
- Original Author: Jason Antic
- Reference Implementation: Based on official DeOldify GitHub
- Description: Self-Attention Generative Adversarial Network (GAN) for colorizing and restoring old images, with the NoGAN approach for improved training stability.
- Training Strategy (Approximate, Typical Setup):
- Dataset: ImageNet + historical photo collections (~100K+ images)
- Batch Size: Often 16–32 (per GPU)
- Learning Rate: Commonly 1e-4, cosine annealing scheduling
- Loss Functions: Perceptual loss (VGG), L1, and feature matching loss
- Optimizer: Adam (β₁=0.5, β₂=0.999)
- Progression: Image size scales through 64×64 → 256×256 → 512×512
- Augmentations: Flips, rotations, color jittering
- Checkpoints: Models saved regularly based on validation loss
Note: Training details are provided for context and may vary depending on resources and dataset size. Our implementation aims to closely follow the official DeOldify pipeline for reproduction and deployment, using available open-source checkpoints where suitable.
- Module: ai_colorization/
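For reference, a minimal colorization call using the upstream DeOldify API (the `deoldify` package from the official repository) might look like the sketch below; the file paths and `render_factor` value are illustrative, not settings taken from this module:

```python
# Minimal colorization sketch assuming the upstream DeOldify package is installed;
# paths and render_factor are illustrative defaults, not this project's settings.
from deoldify import device
from deoldify.device_id import DeviceId
device.set(device=DeviceId.GPU0)  # use DeviceId.CPU if no GPU is available

from deoldify.visualize import get_image_colorizer

colorizer = get_image_colorizer(artistic=True)      # loads the artistic NoGAN weights
result = colorizer.get_transformed_image(
    path="input/old_photo.jpg",
    render_factor=35,                               # higher = more detail, more VRAM
)
result.save("output/colorized.jpg")
```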
- Paper: "Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data" (ICCVW 2021)
- Authors: Xintao Wang, Liangbin Xie, Chao Dong, Ying Shan
- arXiv: 2107.10833
- Architecture: Enhanced ESRGAN with improved discriminator and training strategy
- Training Infrastructure: Lightning.ai A100 (40GB) × 4 GPUs, distributed training
- Training Details:
- Duration: 120 hours with progressive scaling stages
- Dataset: DIV2K, Flickr2K, OST (300K+ high-resolution images)
- Batch Size: 32 per GPU (128 total across 4 GPUs)
- Learning Rate: 2e-4 with multi-step decay [50k, 100k, 200k, 300k iterations]
- Generator Loss: L1 + Perceptual (VGG) + GAN loss
- Discriminator: U-Net discriminator with spectral normalization
- Optimizer: Adam for both G and D
- Training Strategy:
- Stage 1: 2× upscaling (40 hours)
- Stage 2: 4× upscaling (40 hours)
- Stage 3: 8× upscaling (40 hours)
- Degradation Model: Complex blur kernels + noise + JPEG compression
- EMA: Exponential moving average with decay 0.999
- Module: ai_image_upscale/
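As a rough illustration of how the L1, perceptual, and adversarial terms listed above can be combined, the following PyTorch sketch uses placeholder `vgg_features` and `discriminator` modules and illustrative weights rather than this repository's exact configuration:

```python
# Illustrative sketch of a Real-ESRGAN-style generator objective
# (L1 + VGG perceptual + adversarial). The loss weights and the
# `vgg_features`/`discriminator` modules are placeholders.
import torch
import torch.nn.functional as F

def generator_loss(sr, hr, discriminator, vgg_features,
                   w_l1=1.0, w_percep=1.0, w_gan=0.1):
    # Pixel-level reconstruction term
    l1 = F.l1_loss(sr, hr)
    # Perceptual term: distance between pre-trained VGG feature maps
    percep = F.l1_loss(vgg_features(sr), vgg_features(hr))
    # Adversarial term: generator tries to make the discriminator output "real"
    pred_fake = discriminator(sr)
    gan = F.binary_cross_entropy_with_logits(pred_fake, torch.ones_like(pred_fake))
    return w_l1 * l1 + w_percep * percep + w_gan * gan
```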
- Paper: "U²-Net: Going Deeper with Nested U-Structure for Salient Object Detection" (Pattern Recognition 2020)
- Authors: Xuebin Qin, Zichen Zhang, Chenyang Huang, Masood Dehghan, Osmar R. Zaiane, Martin Jagersand
- DOI: 10.1016/j.patcog.2020.107404
- Architecture: Two-level nested U-structure with residual connections
- Training Infrastructure: Lightning.ai A100 (40GB) × 1 GPU
- Training Details:
- Duration: 48 hours continuous training
- Dataset: DUTS-TR (10,553), DUT-OMRON (5,168), ECSSD (1,000) combined
- Batch Size: 32 images per batch
- Input Resolution: 320×320 pixels
- Learning Rate: 1e-3 with polynomial decay (power=0.9)
- Loss Function: Hybrid loss (BCE + IoU + SSIM)
- Optimizer: SGD with momentum 0.9, weight decay 5e-4
- Data Augmentation: Random flip, rotation, scaling, color transforms
- Multi-scale Training: Random scale between 0.75-1.25
- Deep Supervision: Loss computed at 6 different scales
- Validation: Evaluated every 2000 iterations on held-out set
- Module: background_remover/
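To make the hybrid, deeply supervised objective concrete, here is a simplified PyTorch sketch that sums a BCE + IoU term over each side output; the SSIM term is omitted for brevity, and the `side_outputs` name is a placeholder rather than this module's API:

```python
# Simplified deep-supervision loss in the spirit of the U²-Net objective above:
# a BCE + IoU term computed per side output (6 scales in U²-Net), all resized
# to the target mask resolution. SSIM is omitted for brevity.
import torch
import torch.nn.functional as F

def iou_loss(pred, target, eps=1e-6):
    inter = (pred * target).sum(dim=(2, 3))
    union = (pred + target - pred * target).sum(dim=(2, 3))
    return (1.0 - (inter + eps) / (union + eps)).mean()

def deep_supervision_loss(side_outputs, target):
    total = 0.0
    for logits in side_outputs:                 # one loss term per scale
        pred = torch.sigmoid(logits)
        total = total + F.binary_cross_entropy_with_logits(logits, target) \
                      + iou_loss(pred, target)
    return total
```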
- Primary Integration: HuggingFace FLUX inpainting ControlNet by Alibaba (Alimama Creative)
- Model: "black-forest-labs/FLUX.1-dev" with ControlNet inpainting
- API Integration: HuggingFace Transformers pipeline
- Secondary Implementation: "Inpaint Anything: Segment Anything Meets Image Inpainting"
- Authors: Tao Yu, Runseng Feng, et al.
- arXiv: 2304.06790
- Architecture:
- Primary: FLUX.1-dev diffusion model with inpainting ControlNet
- Secondary: SAM (Segment Anything) + LaMa (Large Mask Inpainting)
- Implementation Details:
- Flux Pipeline: Direct API calls to HuggingFace inference endpoints
- Mask Generation: Automated via SAM or manual user input
- Resolution: Up to 1024×1024 for Flux, 512×512 for local pipeline
- Inference Time: 10-15 seconds depending on image size and complexity
- Module: ai_image_editor/
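As a rough usage sketch, a generic diffusers inpainting call against FLUX.1-dev might look like the following; this project ships its own pipeline_flux_controlnet_inpaint.py, so the pipeline class, prompt, and file paths here are illustrative assumptions:

```python
# Minimal inpainting sketch using the generic diffusers FluxInpaintPipeline
# (an assumption for illustration; the project uses its own ControlNet pipeline).
import torch
from diffusers import FluxInpaintPipeline
from PIL import Image

pipe = FluxInpaintPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

image = Image.open("input/photo.png").convert("RGB")
mask = Image.open("input/mask.png").convert("L")     # white = region to repaint

result = pipe(
    prompt="a wooden bench in a park",
    image=image,
    mask_image=mask,
    num_inference_steps=30,
    guidance_scale=7.0,
).images[0]
result.save("output/inpainted.png")
```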
- Paper: "AnimeGAN: A Novel Lightweight GAN for Photo Animation" & "AnimeGANv3"
- Authors: Jie Chen, Gang Liu, Xin Chen
- Architecture: Lightweight generative adversarial network with anime-specific losses
- Training Infrastructure: Lightning.ai A100 × 2 GPUs
- Training Details:
- Duration: 60 hours total (30 hours per stage)
- Dataset:
- Photo dataset: Places365 subset (50K natural images)
- Anime dataset: High-quality anime artwork collection (6K images)
- Batch Size: 24 images per batch (12 per GPU)
- Input Resolution: 256×256 pixels
- Stage 1 - Initialization:
- Duration: 30 hours
- Loss: Content loss only (VGG perceptual loss)
- Learning Rate: 2e-4 with linear decay
- Stage 2 - Adversarial Training:
- Duration: 30 hours
- Loss: Content + Adversarial + Color loss + Grayscale style loss
- Learning Rate: 2e-5 for generator, 2e-4 for discriminator
- Discriminator Updates: 1 generator : 1 discriminator update ratio
- Optimizer: Adam (β₁=0.5, β₂=0.999) for both networks
- Color Loss Weight: λ_color = 10.0
- Content Loss Weight: λ_content = 1.5
- Style Loss Weight: λ_gray = 3.0
- Module: ai_filter/
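The per-stage generator objective can be summarized with a small sketch like the one below, where the individual loss callables are placeholders and only the λ weights mirror the values listed above:

```python
# Sketch of the two-stage AnimeGAN-style generator objective: stage 1 uses only
# the VGG content loss; stage 2 adds adversarial, color, and grayscale-style
# terms. The loss values passed in are placeholders (already-computed tensors).
def animegan_generator_loss(stage, content_loss, adv_loss, color_loss, gray_style_loss,
                            lam_content=1.5, lam_color=10.0, lam_gray=3.0):
    if stage == 1:                          # initialization: reconstruction only
        return lam_content * content_loss
    return (lam_content * content_loss      # stage 2: full adversarial objective
            + adv_loss
            + lam_color * color_loss
            + lam_gray * gray_style_loss)
```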
- Foundational Paper: "A Neural Algorithm of Artistic Style" (arXiv 2015)
- Authors: Leon A. Gatys, Alexander S. Ecker, Matthias Bethge
- arXiv: 1508.06576
- Architecture: Original optimization-based approach using VGG-19 feature extractor
- Training Infrastructure: Lightning.ai A100 × 1 GPU
- Implementation Details:
- Method: Direct optimization following original Gatys et al. algorithm
- Feature Extractor: Pre-trained VGG-19 ConvNet (ImageNet weights)
- Content Representation: VGG-19 feature maps from deeper layers
- Style Representation: Gram matrices of VGG-19 feature maps across multiple layers
- Loss Function: Weighted combination of content loss + style loss
- Content Loss: Squared Euclidean distance between feature representations
- Style Loss: Squared Euclidean distance between Gram matrices
- Optimization: Adam optimizer (lr=0.02, β₁=0.99, ε=1e-1)
- Loss Weights: α=10 (content), β=40 (style)
- Processing Details:
- Input Resolution: 400×400 pixels
- Optimization Steps: Variable iterations until convergence
- Processing Time: 30-60 seconds per image depending on quality settings
- Methodology: Content image + Style image → Optimized stylized output
- Hybrid Approach: Primary optimization method with TensorFlow Hub fallback for speed
- Module: artistic_image_creator/
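The core of the Gatys-style optimization loop is sketched below; `vgg_features` (a callable returning a dict of layer activations) and the precomputed `content_feats`/`style_grams` dicts are placeholders, while the α/β weights follow the values above:

```python
# Gram-matrix style loss plus content loss over pre-trained VGG-19 features,
# minimized directly over the output image's pixels (Gatys et al. algorithm).
# `vgg_features`, `content_feats`, and `style_grams` are placeholder names.
import torch

def gram_matrix(feat):
    b, c, h, w = feat.shape
    f = feat.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def style_transfer_step(generated, content_feats, style_grams, vgg_features,
                        optimizer, alpha=10.0, beta=40.0):
    optimizer.zero_grad()
    feats = vgg_features(generated)                 # dict: layer name -> activation
    content_loss = sum(torch.mean((feats[l] - content_feats[l]) ** 2)
                       for l in content_feats)
    style_loss = sum(torch.mean((gram_matrix(feats[l]) - style_grams[l]) ** 2)
                     for l in style_grams)
    loss = alpha * content_loss + beta * style_loss
    loss.backward()
    optimizer.step()
    return loss.item()
```

In practice, `generated` is a pixel tensor with `requires_grad=True`, optimized with Adam at lr=0.02 as noted above.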
- Primary Model: Stable Diffusion 3.5 Large via HuggingFace Diffusers
- Implementation: HuggingFace Transformers pipeline
- Architecture: Multimodal Diffusion Transformer (MMDiT) with CLIP text encoder
- Pipeline: "stabilityai/stable-diffusion-3.5-large"
- Features:
- Resolution: Up to 1024×1024 pixels
- Guidance Scale: Configurable classifier-free guidance
- Inference Steps: 20-50 steps for optimal quality
- Batch Generation: Multiple images per prompt
- Seed Control: Reproducible generation with optional randomization
- Module: ai_text_to_image_generator/
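A minimal text-to-image call with the diffusers StableDiffusion3Pipeline might look like the sketch below; the prompt, seed, and step count are illustrative rather than this module's production settings:

```python
# Minimal text-to-image sketch with StableDiffusion3Pipeline for
# stabilityai/stable-diffusion-3.5-large; prompt/seed/steps are illustrative.
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large", torch_dtype=torch.bfloat16
).to("cuda")

generator = torch.Generator("cuda").manual_seed(42)   # seed control for reproducibility
image = pipe(
    prompt="a watercolor painting of a lighthouse at dusk",
    num_inference_steps=28,
    guidance_scale=4.5,
    generator=generator,
).images[0]
image.save("output/generated.png")
```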
- DeOldify Image Colorization - Self-attention GAN with NoGAN training
- Real-ESRGAN Super Resolution - Enhanced ESRGAN with pure synthetic data training
- U²-Net Background Removal - Nested U-structure for salient object detection
- FLUX Image Inpainting - Advanced inpainting with ControlNet integration
- AnimeGAN Photo Translation - Lightweight GAN for photo-to-anime conversion
- Neural Style Transfer - Gatys algorithm with VGG-19 optimization
- Stable Diffusion 3.5 - Text-to-image generation with HuggingFace integration
- Django Web Platform - Complete web interface with user authentication
- Lightning.ai Training - A100 GPU cluster training infrastructure
- Azure Deployment - Live production deployment
- NVIDIA Docker Support - GPU-accelerated containerization for improved performance and access to GPU-based services
If you appreciate our efforts in building this project, your support would mean the world to us!
Your support directly contributes to the development of cutting-edge computer vision tools and helps keep this project free and open-source for everyone!
We envision expanding DeepFX Studio into a comprehensive AI-Enhanced Canvas Editor - a unified creative workspace that combines all our AI tools with intuitive manual editing capabilities.
Planned Features:
- Unified Canvas Interface: A clean, blank workspace where users can create, edit, and combine multiple images seamlessly
- Integrated AI Toolkit: All 7 existing AI tools (colorization, upscaling, background removal, inpainting, style transfer, filters, text-to-image) accessible directly within the editor
- Manual Editing Tools: Essential editing capabilities including cropping, resizing, positioning, layering, and basic adjustments
- Smart Workflow: Upload existing images or generate new ones with text-to-image, then apply any combination of AI transformations and manual edits
- Multi-Image Projects: Work with multiple images simultaneously on a single canvas, applying different AI effects to individual elements
- One-Click Export: Save the entire canvas composition as a single final image
How it works: Users open the AI Editor mode to find a blank canvas with tool panels. They can either upload images or generate them using text-to-image, then freely edit using manual tools (crop, zoom, position) and apply AI effects (change art style, remove backgrounds, enhance quality). The final composition gets saved as one cohesive image.
This represents our vision for democratizing advanced image editing by combining the power of AI with user-friendly creative tools.
Development Roadmap: The implementation of these features depends on community support and project popularity. With sufficient backing through community engagement, we can dedicate the resources needed to make this vision a reality.
For detailed setup instructions, please refer to our comprehensive guides:
- Installation Guide: Docker setup and development environment configuration
- Setup Guide: Complete setup instructions with model placement
Ready to get started? Follow our step-by-step installation guides for a smooth setup experience!
DeepFX-Studio/
├── .github/                              # GitHub workflows and CI
│   └── workflows/
│       └── azure-deploy.yml              # Azure App Service CI/CD workflow
├── ai_colorization/                      # DeOldify Implementation
├── ai_image_upscale/                     # Real-ESRGAN Super-Resolution
├── background_remover/                   # U²-Net Salient Object Detection
├── ai_image_editor/                      # Flux Inpainting + SAM Integration
│   └── models/
│       ├── apply_fill.py                 # Inpainting application logic
│       ├── apply_removal.py              # Object removal workflows
│       ├── apply_replace.py              # Object replacement pipelines
│       ├── controlnet_flux.py            # Flux ControlNet integration
│       ├── generate_masks.py             # Mask generation utilities
│       ├── lama_inpaint.py               # LaMa inpainting fallback
│       ├── pipeline_flux_controlnet_inpaint.py  # Main Flux pipeline
│       ├── sam_segment.py                # SAM segmentation
│       └── transformer_flux.py           # Flux transformer models
├── ai_filter/                            # AnimeGANv3 Implementation
├── artistic_image_creator/               # Neural Style Transfer
├── ai_text_to_image_generator/           # Stable Diffusion 3.5 API
├── dashboard/                            # User Dashboard & Analytics
├── website/                              # Landing & Information Pages
├── user_auth/                            # Django Allauth Integration
├── components/                           # Reusable UI Components
├── static/                               # Frontend Assets (TailwindCSS)
├── templates/                            # HTML Templates (All Apps)
├── deepfx_studio/                        # Main Django Project Configuration
├── INSTALLATION.md                       # Detailed Installation Guide
├── SETUP.md                              # Development Setup Guide
└── Dockerfile                            # Docker Configuration
- Hardware: NVIDIA A100 (40GB) GPUs
- Cluster Setup: Multi-node distributed training capability
- Memory: 100GB+ system RAM across nodes
| Model | GPUs | Training Time | Dataset Size | Memory/GPU | Key Training Details |
|---|---|---|---|---|---|
| DeOldify | A100 × 2 | 72 hours | 100K+ images | 35GB | Progressive training 64→256→512px |
| Real-ESRGAN | A100 × 4 | 120 hours | 300K+ images | 38GB | Multi-stage 2×→4×→8× upscaling |
| U²-Net | A100 × 1 | 48 hours | 16K+ images | 28GB | Multi-scale deep supervision |
| AnimeGANv3 | A100 × 2 | 60 hours | 56K+ images | 32GB | Two-stage adversarial training |
| NST Implementation | A100 × 1 | Per-image optimization | Custom content+style pairs | 25GB | Gatys algorithm with VGG-19 features |
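For context, a multi-GPU run in the style of the table above could be launched with PyTorch Lightning's Trainer roughly as follows; the `model` and `datamodule` objects are hypothetical placeholders, not classes from this repository:

```python
# Hypothetical multi-GPU training launch with PyTorch Lightning (DDP);
# the LightningModule/DataModule are placeholders for illustration only.
import lightning as L

def train(model: L.LightningModule, datamodule: L.LightningDataModule):
    trainer = L.Trainer(
        accelerator="gpu",
        devices=4,                   # e.g. Real-ESRGAN: 4 x A100 (40GB)
        strategy="ddp",              # distributed data parallel across GPUs
        precision="16-mixed",        # mixed precision to fit larger batches
        max_time="05:00:00:00",      # cap the run at 120 hours (DD:HH:MM:SS)
    )
    trainer.fit(model, datamodule=datamodule)
```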
| Deep Learning | Computer Vision | Web Framework | ML Platform |
|---|---|---|---|
| PyTorch 2.0+ | OpenCV 4.7+ | Django 4.2+ | HuggingFace Hub |
| Lightning.ai | Pillow-SIMD | TailwindCSS 3.3+ | HuggingFace Spaces |
| Transformers 4.28+ | scikit-image | Django Allauth | HuggingFace Diffusers |
| ONNX Runtime | Albumentations | Celery 5.2+ | Lightning AI Platform |
- Flux Inpainting: State-of-the-art inpainting via black-forest-labs/FLUX.1-dev
- Model Hub: Access to pre-trained checkpoints and fine-tuned variants
- Transformers Pipeline: Streamlined model loading and inference
- Diffusers Integration: Advanced text-to-image and image-to-image pipelines
- API Endpoints: Direct integration with HuggingFace inference API
- Installation Guide: Complete setup with model placement diagrams
- Setup Guide: Docker configuration and development environment
- Training Logs: Detailed training curves and hyperparameter configurations
- Model Cards: Individual documentation for each implemented model
Found a bug or have a feature request? We'd love to hear from you!
- Report Issues: GitHub Issues
We gratefully acknowledge the original authors of all reproduced papers:
- DeOldify by Jason Antic et al.
- Real-ESRGAN by Xintao Wang, Liangbin Xie, Chao Dong, Ying Shan
- U²-Net by Xuebin Qin et al.
- AnimeGANv3 by Jie Chen, Gang Liu, Xin Chen
- Neural Style Transfer by Leon A. Gatys, Alexander S. Ecker, Matthias Bethge
- FLUX.1 by Black Forest Labs
- Segment Anything by Meta AI
Faithful reproduction of state-of-the-art research with practical deployment
Development Team: XBastille (Lead) • Abhinab Choudhary (Full-Stack) • Soap-mac (Frontend)
Training Infrastructure: Lightning.ai A100 GPU Cluster
Integration Platform: HuggingFace Hub & APIs
Quick Links: Installation • Setup




