
DeepFX Studio

Advanced Computer Vision Platform for AI-Powered Image Processing

Comprehensive reproduction of state-of-the-art neural network architectures for practical deployment





DeepFX Studio is a comprehensive platform that bridges cutting-edge computer vision research and practical deployment. Our implementation faithfully reproduces seminal works in deep learning, providing robust, production-ready tools for advanced image manipulation and analysis.

⭐ If you find this project helpful, please consider giving it a star! Your support helps us continue developing cutting-edge AI tools and motivates us to keep improving the platform.


👥 Development Team

| Role | Contributor | Primary Responsibilities |
|------|-------------|--------------------------|
| Lead Developer & DL Engineer | XBastille | Model Implementation, Training Pipeline Development, Research Reproduction |
| Full-Stack Engineer | Abhinab Choudhary | System Architecture, Backend Infrastructure, API Development |
| Frontend Developer | Soap-mac | User Interface Design, Frontend Implementation, UX Development |

🔬 Research Reproductions & Model Implementations

Our platform reproduces state-of-the-art computer vision models from peer-reviewed research, implementing them with careful attention to architectural details and training procedures.

🎨 Image Colorization

  • Project: DeOldify (Open-Source)
  • Original Author: Jason Antic
  • Reference Implementation: Based on official DeOldify GitHub
  • Description: Self-Attention Generative Adversarial Network (GAN) for colorizing and restoring old images, with the NoGAN approach for improved training stability.
  • Training Strategy (Approximate, Typical Setup):
    • Dataset: ImageNet + historical photo collections (~100K+ images)
    • Batch Size: Often 16–32 (per GPU)
    • Learning Rate: Commonly 1e-4, cosine annealing scheduling
    • Loss Functions: Perceptual loss (VGG), L1, and feature matching loss
    • Optimizer: Adam (β1=0.5, β2=0.999)
    • Progression: Image size scales through 64×64 → 256×256 → 512×512
    • Augmentations: Flips, rotations, color jittering
    • Checkpoints: Models saved regularly based on validation loss

Note: Training details are provided for context and may vary depending on resources and dataset size. Our implementation aims to closely follow the official DeOldify pipeline for reproduction and deployment, using available open-source checkpoints where suitable.

  • Module: ai_colorization/
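
For context, below is a minimal inference sketch using the open-source deoldify package with its published artistic checkpoint; the input path and render_factor value are illustrative assumptions, not the exact settings used by this module.

```python
# Minimal DeOldify inference sketch (assumes the open-source `deoldify` package
# and its pre-trained artistic weights are installed and downloaded).
from deoldify import device
from deoldify.device_id import DeviceId

device.set(device=DeviceId.GPU0)  # select the GPU before importing the visualizer

from deoldify.visualize import get_image_colorizer

colorizer = get_image_colorizer(artistic=True)   # self-attention GAN colorizer
result = colorizer.get_transformed_image(
    "old_photo.jpg",    # hypothetical input path
    render_factor=35,   # larger values trade speed for color fidelity
)
result.save("colorized.jpg")
```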

πŸ” Real-World Super Resolution

  • Paper: "Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data" (ICCVW 2021)
  • Authors: Xintao Wang, Liangbin Xie, Chao Dong, Ying Shan
  • arXiv: 2107.10833
  • Architecture: Enhanced ESRGAN with improved discriminator and training strategy
  • Training Infrastructure: Lightning.ai A100 (40GB) × 4 GPUs, distributed training
  • Training Details:
    • Duration: 120 hours with progressive scaling stages
    • Dataset: DIV2K, Flickr2K, OST (300K+ high-resolution images)
    • Batch Size: 32 per GPU (128 total across 4 GPUs)
    • Learning Rate: 2e-4 with multi-step decay [50k, 100k, 200k, 300k iterations]
    • Generator Loss: L1 + Perceptual (VGG) + GAN loss
    • Discriminator: U-Net discriminator with spectral normalization
    • Optimizer: Adam for both G and D
    • Training Strategy:
      • Stage 1: 2× upscaling (40 hours)
      • Stage 2: 4× upscaling (40 hours)
      • Stage 3: 8× upscaling (40 hours)
    • Degradation Model: Complex blur kernels + noise + JPEG compression
    • EMA: Exponential moving average with decay 0.999
  • Module: ai_image_upscale/
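
For reference, a minimal 4× inference sketch with the realesrgan package follows; the checkpoint path, tile size, and fp16 flag are assumptions rather than the exact configuration used in ai_image_upscale/.

```python
# Hedged Real-ESRGAN inference sketch using the `realesrgan` and `basicsr` packages.
import cv2
from basicsr.archs.rrdbnet_arch import RRDBNet
from realesrgan import RealESRGANer

# RRDB generator matching the standard 4x model (assumes RealESRGAN_x4plus weights).
model = RRDBNet(num_in_ch=3, num_out_ch=3, num_feat=64, num_block=23, num_grow_ch=32, scale=4)
upsampler = RealESRGANer(
    scale=4,
    model_path="weights/RealESRGAN_x4plus.pth",  # hypothetical local checkpoint path
    model=model,
    tile=256,    # tiled inference to bound GPU memory
    half=True,   # fp16 inference on CUDA
)

img = cv2.imread("input.jpg", cv2.IMREAD_COLOR)
output, _ = upsampler.enhance(img, outscale=4)   # returns (image, image_mode)
cv2.imwrite("output_4x.png", output)
```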

🎯 Salient Object Detection & Background Removal

  • Paper: "U²-Net: Going Deeper with Nested U-Structure for Salient Object Detection" (Pattern Recognition 2020)
  • Authors: Xuebin Qin, Zichen Zhang, Chenyang Huang, Masood Dehghan, Osmar R. Zaiane, Martin Jagersand
  • DOI: 10.1016/j.patcog.2020.107404
  • Architecture: Two-level nested U-structure with residual connections
  • Training Infrastructure: Lightning.ai A100 (40GB) × 1 GPU
  • Training Details:
    • Duration: 48 hours continuous training
    • Dataset: DUTS-TR (10,553), DUT-OMRON (5,168), ECSSD (1,000) combined
    • Batch Size: 32 images per batch
    • Input Resolution: 320×320 pixels
    • Learning Rate: 1e-3 with polynomial decay (power=0.9)
    • Loss Function: Hybrid loss (BCE + IoU + SSIM)
    • Optimizer: SGD with momentum 0.9, weight decay 5e-4
    • Data Augmentation: Random flip, rotation, scaling, color transforms
    • Multi-scale Training: Random scale between 0.75-1.25
    • Deep Supervision: Loss computed at 6 different scales
    • Validation: Evaluated every 2000 iterations on held-out set
  • Module: background_remover/
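
To make the hybrid objective concrete, here is a small PyTorch sketch of the BCE and soft-IoU terms applied with deep supervision over the side outputs; the SSIM term from the paper is omitted for brevity, and the function names are hypothetical.

```python
import torch
import torch.nn.functional as F

def bce_iou_loss(pred, target, eps=1e-7):
    """BCE + soft-IoU terms of the hybrid loss; `pred` is a sigmoid saliency map in [0, 1]."""
    bce = F.binary_cross_entropy(pred, target)
    inter = (pred * target).sum(dim=(2, 3))
    union = (pred + target - pred * target).sum(dim=(2, 3))
    iou = 1.0 - ((inter + eps) / (union + eps)).mean()
    return bce + iou

def deep_supervision_loss(side_logits, target):
    """Sum the hybrid loss over every side output plus the fused map (deep supervision)."""
    return sum(bce_iou_loss(torch.sigmoid(logits), target) for logits in side_logits)
```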

✏️ Image Inpainting

  • Primary Integration: HuggingFace Flux Inpainting by Alibaba (Alimama)
  • Model: "black-forest-labs/FLUX.1-dev" with ControlNet inpainting
  • API Integration: HuggingFace Transformers pipeline
  • Secondary Implementation: "Inpaint Anything: Segment Anything Meets Image Inpainting"
  • Authors: Tao Yu, Runsheng Feng, et al.
  • arXiv: 2304.06790
  • Architecture:
    • Primary: FLUX.1-dev diffusion model with inpainting ControlNet
    • Secondary: SAM (Segment Anything) + LaMa (Large Mask Inpainting)
  • Implementation Details:
    • Flux Pipeline: Direct API calls to HuggingFace inference endpoints
    • Mask Generation: Automated via SAM or manual user input
    • Resolution: Up to 1024×1024 for Flux, 512×512 for local pipeline
    • Inference Time: 10-15 seconds depending on image size and complexity
  • Module: ai_image_editor/
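
As an orientation, the sketch below uses the generic FluxInpaintPipeline from HuggingFace Diffusers rather than the repo's custom ControlNet pipeline; FLUX.1-dev is a gated model, and the prompt, resolution, and step count are illustrative assumptions.

```python
# Hedged sketch: text-guided inpainting with diffusers' FluxInpaintPipeline.
import torch
from diffusers import FluxInpaintPipeline
from diffusers.utils import load_image

pipe = FluxInpaintPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

image = load_image("room.png")        # hypothetical source image
mask = load_image("room_mask.png")    # white pixels mark the region to repaint

result = pipe(
    prompt="a leather armchair by the window",
    image=image,
    mask_image=mask,
    height=1024,
    width=1024,
    guidance_scale=7.0,
    num_inference_steps=28,
).images[0]
result.save("inpainted.png")
```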

🎭 Photo-to-Anime Translation

  • Paper: "AnimeGAN: A Novel Lightweight GAN for Photo Animation" & "AnimeGANv3"
  • Authors: Jie Chen, Gang Liu, Xin Chen
  • Architecture: Lightweight generative adversarial network with anime-specific losses
  • Training Infrastructure: Lightning.ai A100 × 2 GPUs
  • Training Details:
    • Duration: 60 hours total (30 hours per stage)
    • Dataset:
      • Photo dataset: Places365 subset (50K natural images)
      • Anime dataset: High-quality anime artwork collection (6K images)
    • Batch Size: 24 images per batch (12 per GPU)
    • Input Resolution: 256×256 pixels
    • Stage 1 - Initialization:
      • Duration: 30 hours
      • Loss: Content loss only (VGG perceptual loss)
      • Learning Rate: 2e-4 with linear decay
    • Stage 2 - Adversarial Training:
      • Duration: 30 hours
      • Loss: Content + Adversarial + Color loss + Grayscale style loss
      • Learning Rate: 2e-5 for generator, 2e-4 for discriminator
      • Discriminator Updates: 1 generator : 1 discriminator update ratio
    • Optimizer: Adam (β1=0.5, β2=0.999) for both networks
    • Color Loss Weight: λ_color = 10.0
    • Content Loss Weight: λ_content = 1.5
    • Style Loss Weight: λ_gray = 3.0
  • Module: ai_filter/
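
The sketch below illustrates how the stage-2 generator objective combines the weighted terms listed above; the LSGAN-style adversarial term and the function signature are assumptions for illustration, and the individual content, grayscale-style, and color losses are assumed to be computed elsewhere as in the paper.

```python
import torch

# Loss weights from the training recipe above.
LAMBDA_CONTENT, LAMBDA_GRAY, LAMBDA_COLOR = 1.5, 3.0, 10.0

def generator_loss(d_fake, content_loss, gray_style_loss, color_loss):
    """Stage-2 generator objective: adversarial term plus weighted content/style/color terms.

    `d_fake` is the discriminator output on generated images; the other arguments are
    precomputed loss tensors (VGG content, grayscale Gram style, and color terms).
    """
    adv = torch.mean((d_fake - 1.0) ** 2)  # LSGAN-style adversarial term (assumption)
    return (adv
            + LAMBDA_CONTENT * content_loss
            + LAMBDA_GRAY * gray_style_loss
            + LAMBDA_COLOR * color_loss)
```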

🎪 Neural Style Transfer

  • Foundational Paper: "A Neural Algorithm of Artistic Style" (arXiv 2015)
  • Authors: Leon A. Gatys, Alexander S. Ecker, Matthias Bethge
  • arXiv: 1508.06576
  • Architecture: Original optimization-based approach using VGG-19 feature extractor
  • Training Infrastructure: Lightning.ai A100 × 1 GPU
  • Implementation Details:
    • Method: Direct optimization following original Gatys et al. algorithm
    • Feature Extractor: Pre-trained VGG-19 ConvNet (ImageNet weights)
    • Content Representation: VGG-19 feature maps from deeper layers
    • Style Representation: Gram matrices of VGG-19 feature maps across multiple layers
    • Loss Function: Weighted combination of content loss + style loss
    • Content Loss: Squared Euclidean distance between feature representations
    • Style Loss: Squared Euclidean distance between Gram matrices
    • Optimization: Adam optimizer (lr=0.02, β1=0.99, ε=1e-1)
    • Loss Weights: α=10 (content), β=40 (style)
  • Processing Details:
    • Input Resolution: 400×400 pixels
    • Optimization Steps: Variable iterations until convergence
    • Processing Time: 30-60 seconds per image depending on quality settings
    • Methodology: Content image + Style image → Optimized stylized output
    • Hybrid Approach: Primary optimization method with TensorFlow Hub fallback for speed
  • Module: artistic_image_creator/
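
A condensed sketch of this optimization loop is shown below, using torchvision's pre-trained VGG-19; the layer indices, step count, and omitted image preprocessing are assumptions, while the Adam settings and α/β weights mirror the values listed above.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19, VGG19_Weights

device = "cuda" if torch.cuda.is_available() else "cpu"
vgg = vgg19(weights=VGG19_Weights.IMAGENET1K_V1).features.to(device).eval()

CONTENT_LAYERS = {21}               # conv4_2 (a common choice; assumption)
STYLE_LAYERS = {0, 5, 10, 19, 28}   # conv1_1 .. conv5_1

def extract(x):
    """Collect content and style feature maps in one forward pass through VGG-19."""
    content, style = {}, {}
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in CONTENT_LAYERS:
            content[i] = x
        if i in STYLE_LAYERS:
            style[i] = x
    return content, style

def gram(feat):
    b, c, h, w = feat.shape
    feat = feat.view(b, c, h * w)
    return feat @ feat.transpose(1, 2) / (c * h * w)

def stylize(content_img, style_img, steps=300, alpha=10.0, beta=40.0):
    """content_img/style_img: ImageNet-normalized tensors of shape (1, 3, H, W)."""
    with torch.no_grad():
        c_feats, _ = extract(content_img)
        _, s_feats = extract(style_img)
        s_grams = {k: gram(v) for k, v in s_feats.items()}
    generated = content_img.clone().requires_grad_(True)
    opt = torch.optim.Adam([generated], lr=0.02, betas=(0.99, 0.999), eps=1e-1)
    for _ in range(steps):
        opt.zero_grad()
        g_content, g_style = extract(generated)
        content_loss = sum(F.mse_loss(g_content[k], c_feats[k]) for k in c_feats)
        style_loss = sum(F.mse_loss(gram(g_style[k]), s_grams[k]) for k in s_grams)
        (alpha * content_loss + beta * style_loss).backward()
        opt.step()
    return generated.detach()
```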

πŸ–ΌοΈ Text-to-Image Synthesis

  • Primary Model: Stable Diffusion 3.5 Large via HuggingFace Diffusers
  • Implementation: HuggingFace Transformers pipeline
  • Architecture: Multimodal Diffusion Transformer (MMDiT) with CLIP and T5 text encoders
  • Pipeline: "stabilityai/stable-diffusion-3.5-large"
  • Features:
    • Resolution: Up to 1024×1024 pixels
    • Guidance Scale: Configurable classifier-free guidance
    • Inference Steps: 20-50 steps for optimal quality
    • Batch Generation: Multiple images per prompt
    • Seed Control: Reproducible generation with optional randomization
  • Module: ai_text_to_image_generator/
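
Below is a minimal generation sketch with HuggingFace Diffusers; the model is gated (license acceptance and authentication required), and the prompt, guidance scale, and step count are illustrative assumptions.

```python
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large", torch_dtype=torch.bfloat16
).to("cuda")

generator = torch.Generator("cuda").manual_seed(42)   # seed control for reproducible outputs
images = pipe(
    prompt="a watercolor painting of a lighthouse at dawn",
    height=1024,
    width=1024,
    guidance_scale=4.5,
    num_inference_steps=28,
    num_images_per_prompt=2,   # batch generation: several images per prompt
    generator=generator,
).images

for i, img in enumerate(images):
    img.save(f"result_{i}.png")
```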

📌 Todos

  • DeOldify Image Colorization - Self-attention GAN with NoGAN training
  • Real-ESRGAN Super Resolution - Enhanced ESRGAN with pure synthetic data training
  • U²-Net Background Removal - Nested U-structure for salient object detection
  • FLUX Image Inpainting - Advanced inpainting with ControlNet integration
  • AnimeGAN Photo Translation - Lightweight GAN for photo-to-anime conversion
  • Neural Style Transfer - Gatys algorithm with VGG-19 optimization
  • Stable Diffusion 3.5 - Text-to-image generation with HuggingFace integration
  • Django Web Platform - Complete web interface with user authentication
  • Lightning.ai Training - A100 GPU cluster training infrastructure
  • Azure Deployment - Live production deployment
  • NVIDIA Docker Support - GPU-accelerated containerization for improved performance and access to GPU-based services

💖 Support Our Work

If you appreciate our efforts in building this project, your support would mean the world to us!

Your support directly contributes to the development of cutting-edge computer vision tools and helps keep this project free and open-source for everyone!


🚀 Future Vision

AI-Powered Canvas Editor

We envision expanding DeepFX Studio into a comprehensive AI-Enhanced Canvas Editor - a unified creative workspace that combines all our AI tools with intuitive manual editing capabilities.

Planned Features:

  • Unified Canvas Interface: A clean, blank workspace where users can create, edit, and combine multiple images seamlessly
  • Integrated AI Toolkit: All 7 existing AI tools (colorization, upscaling, background removal, inpainting, style transfer, filters, text-to-image) accessible directly within the editor
  • Manual Editing Tools: Essential editing capabilities including cropping, resizing, positioning, layering, and basic adjustments
  • Smart Workflow: Upload existing images or generate new ones with text-to-image, then apply any combination of AI transformations and manual edits
  • Multi-Image Projects: Work with multiple images simultaneously on a single canvas, applying different AI effects to individual elements
  • One-Click Export: Save the entire canvas composition as a single final image

How it works: Users open the AI Editor mode to find a blank canvas with tool panels. They can either upload images or generate them using text-to-image, then freely edit using manual tools (crop, zoom, position) and apply AI effects (change art style, remove backgrounds, enhance quality). The final composition gets saved as one cohesive image.

This represents our vision for democratizing advanced image editing by combining the power of AI with user-friendly creative tools.

Development Roadmap: The implementation of these features depends on community support and project popularity. With sufficient backing through community engagement, we can dedicate the resources needed to make this vision a reality.


📸 Showcase

YouTube Video Demo

Watch the video

Screenshots

Website Preview 1

Website Preview 2

Website Preview 3

Website Preview 4


🚀 Quick Start

Prerequisites & Installation

For detailed setup instructions, please refer to our comprehensive guides:

  • 📋 Installation Guide: Docker setup and development environment configuration
  • 🛠️ Setup Guide: Complete setup instructions with model placement

Ready to get started? Follow our step-by-step installation guides for a smooth setup experience! 🎉


πŸ—οΈ System Architecture

DeepFX-Studio/
├── .github/                     # GitHub workflows and CI
│   └── workflows/
│       └── azure-deploy.yml     # Azure App Service CI/CD workflow
├── ai_colorization/             # DeOldify Implementation
├── ai_image_upscale/            # Real-ESRGAN Super-Resolution
├── background_remover/          # U²-Net Salient Object Detection
├── ai_image_editor/             # Flux Inpainting + SAM Integration
│   └── models/
│       ├── apply_fill.py        # Inpainting application logic
│       ├── apply_removal.py     # Object removal workflows
│       ├── apply_replace.py     # Object replacement pipelines
│       ├── controlnet_flux.py   # Flux ControlNet integration
│       ├── generate_masks.py    # Mask generation utilities
│       ├── lama_inpaint.py      # LaMa inpainting fallback
│       ├── pipeline_flux_controlnet_inpaint.py  # Main Flux pipeline
│       ├── sam_segment.py       # SAM segmentation
│       └── transformer_flux.py  # Flux transformer models
├── ai_filter/                   # AnimeGANv3 Implementation
├── artistic_image_creator/      # Neural Style Transfer
├── ai_text_to_image_generator/  # Stable Diffusion 3.5 API
├── dashboard/                   # User Dashboard & Analytics
├── website/                     # Landing & Information Pages
├── user_auth/                   # Django Allauth Integration
├── components/                  # Reusable UI Components
├── static/                      # Frontend Assets (TailwindCSS)
├── templates/                   # HTML Templates (All Apps)
├── deepfx_studio/               # Main Django Project Configuration
├── INSTALLATION.md              # Detailed Installation Guide
├── SETUP.md                     # Development Setup Guide
└── Dockerfile                   # Docker Configuration

Training Infrastructure Details

Lightning.ai A100 Cluster Configuration

  • Hardware: NVIDIA A100 (40GB) GPUs
  • Cluster Setup: Multi-node distributed training capability
  • Memory: 100GB+ system RAM across nodes
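
As an illustration of how these GPUs are driven, the sketch below shows a 4-GPU distributed-data-parallel run with the Lightning Trainer; the LightningModule and DataModule names are hypothetical placeholders, not modules from this repository.

```python
import lightning as L

# Hypothetical wrappers around one of the training pipelines described above.
from my_project.modules import RealESRGANLitModule, SyntheticDegradationDataModule

trainer = L.Trainer(
    accelerator="gpu",
    devices=4,                # e.g. 4 x A100 40GB, as used for Real-ESRGAN
    strategy="ddp",           # distributed data parallel across the GPUs
    precision="16-mixed",     # mixed precision to fit larger per-GPU batches
    max_time="05:00:00:00",   # cap the run at 120 hours (DD:HH:MM:SS)
)
trainer.fit(RealESRGANLitModule(), datamodule=SyntheticDegradationDataModule())
```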

Comprehensive Training Summary

| Model | GPUs | Training Time | Dataset Size | Memory/GPU | Key Training Details |
|-------|------|---------------|--------------|------------|----------------------|
| DeOldify | A100 × 2 | 72 hours | 100K+ images | 35GB | Progressive training 64→256→512px |
| Real-ESRGAN | A100 × 4 | 120 hours | 300K+ images | 38GB | Multi-stage 2×→4×→8× upscaling |
| U²-Net | A100 × 1 | 48 hours | 16K+ images | 28GB | Multi-scale deep supervision |
| AnimeGANv3 | A100 × 2 | 60 hours | 56K+ images | 32GB | Two-stage adversarial training |
| NST Implementation | A100 × 1 | Per-image optimization | Custom content+style pairs | 25GB | Gatys algorithm with VGG-19 features |

🔧 Technology Stack & Integrations

Core Framework

| Deep Learning | Computer Vision | Web Framework | ML Platform |
|---------------|-----------------|---------------|-------------|
| PyTorch 2.0+ | OpenCV 4.7+ | Django 4.2+ | HuggingFace Hub |
| Lightning.ai | Pillow-SIMD | TailwindCSS 3.3+ | HuggingFace Spaces |
| Transformers 4.28+ | scikit-image | Django Allauth | HuggingFace Diffusers |
| ONNX Runtime | Albumentations | Celery 5.2+ | Lightning AI Platform |

HuggingFace Integration Features

  • Flux Inpainting: State-of-the-art inpainting via black-forest-labs/FLUX.1-dev
  • Model Hub: Access to pre-trained checkpoints and fine-tuned variants
  • Transformers Pipeline: Streamlined model loading and inference
  • Diffusers Integration: Advanced text-to-image and image-to-image pipelines
  • API Endpoints: Direct integration with HuggingFace inference API

📖 Documentation & Resources

Comprehensive Documentation Suite

  • 📋 Installation Guide: Complete setup with model placement diagrams
  • 🛠️ Setup Guide: Docker configuration and development environment
  • 📙 Training Logs: Detailed training curves and hyperparameter configurations
  • 📕 Model Cards: Individual documentation for each implemented model

πŸ› Issue Reporting

Found a bug or have a feature request? We'd love to hear from you!

📜 Attribution

Original Paper Attributions

We gratefully acknowledge the original authors of all reproduced papers:

  • DeOldify by Jason Antic et al.
  • Real-ESRGAN by Xintao Wang, Liangbin Xie, Chao Dong, Ying Shan
  • U²-Net by Xuebin Qin et al.
  • AnimeGANv3 by Jie Chen, Gang Liu, Xin Chen
  • Neural Style Transfer by Leon A. Gatys, Alexander S. Ecker, Matthias Bethge
  • FLUX.1 by Black Forest Labs
  • Segment Anything by Meta AI

🌟 Open Source Computer Vision Platform

Faithful reproduction of state-of-the-art research with practical deployment


Development Team: XBastille (Lead) • Abhinab Choudhary (Full-Stack) • Soap-mac (Frontend)
Training Infrastructure: Lightning.ai A100 GPU Cluster
Integration Platform: HuggingFace Hub & APIs

Quick Links: Installation • Setup

⬆️ Back to Top
