Burn Deep Learning Ecosystem #2893

Open
44 tasks
salimmghari opened this issue Mar 11, 2025 · 0 comments

Feature description & motivation

(Previously posted, then removed and reposted for better visibility.) Rust has Burn for deep learning, but Burn's ecosystem is missing dedicated support for four domains: vision, audio, text, and 3D. I suggest that we developers build four crates to grow the Burn ecosystem: burn-vision, burn-audio, burn-text, and burn-3d. That scope is very broad, so we should tackle the crates one at a time. For now, we can focus on building a complete burn-vision crate as a counterpart to torchvision in PyTorch.

Feature technical details

burn-vision would provide the essential tools for deep learning in vision, grouped as follows:

  1. Image Transforms & Preprocessing:
  • Resize – Adjust image size while maintaining aspect ratio.
  • CenterCrop / RandomCrop – Extract fixed-size regions from images.
  • Normalize – Scale pixel values based on mean and standard deviation.
  • ToTensor – Convert image data to tensor format.
  • RandomRotation / Flip / Perspective Transform – Data augmentation techniques.
  • Color Jitter – Adjust brightness, contrast, saturation, and hue.
  • Gaussian Blur / Sharpening – Apply blur and sharpening filters.
  • Convert Image Mode – Convert between RGB, Grayscale, YCbCr, etc.
  • Random Erasing – Hide random parts of images to simulate occlusion.
  • MixUp / CutMix – Advanced augmentation for improved generalization.
  2. Pretrained Vision Models:
  • ResNet (18, 34, 50, 101, 152)
  • EfficientNet (B0 - B7)
  • MobileNet (V2, V3)
  • Vision Transformer (ViT)
  • DenseNet
  • VGG (11, 13, 16, 19)
  • SqueezeNet
  • Swin Transformer
  • YOLO / SSD / Faster R-CNN – Object detection models
  • Mask R-CNN / U-Net – Segmentation models
  3. Datasets & Data Loaders:
  • CIFAR-10 / CIFAR-100
  • MNIST / FashionMNIST
  • ImageNet
  • COCO (Common Objects in Context)
  • Pascal VOC
  • Cityscapes
  • LFW (Labeled Faces in the Wild)
  • OpenImages
  4. Object Detection & Segmentation Utilities:
  • Bounding Box Utilities – Resize, convert, visualize bounding boxes.
  • IoU (Intersection over Union) – Compute overlap for object detection evaluation.
  • Mask Transformations – Convert segmentation masks to tensors.
  • Keypoint Detection – Process landmark-based annotations (e.g., face keypoints).
  5. Image I/O & Visualization:
  • Image Loading & Saving – Support PNG, JPEG, BMP, TIFF, etc.
  • Show Image Tensors – Convert tensors to displayable images.
  • Grid Visualization – Display multiple images in a grid format.
  • Draw Bounding Boxes / Masks – Overlay bounding boxes and segmentation masks.
  6. Video Processing & Streaming Support:
  • Read Video Frames – Load frames from video files.
  • Stream Processing – Process live video frames for real-time AI applications.
  • Optical Flow Estimation – Track motion between frames.
  • Frame Extraction & Augmentation – Manipulate frames like static images.
  7. Efficient Training Utilities:
  • Mixed Precision Training – Use FP16 for faster model training.
  • AutoML / Hyperparameter Optimization – Automated tuning for vision models.
  • Model Quantization – Reduce model size for deployment.
  • Model Pruning – Remove unnecessary connections for efficiency.
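
To make two of the items above more concrete, here is a minimal sketch of what Normalize and IoU could look like. It is written in plain Rust over `f32` slices rather than against Burn's tensor API, so nothing here reflects how Burn or a future burn-vision would actually expose these. The names `BBox`, `iou`, and `normalize_chw` are hypothetical, and the channel-first (CHW) layout and `(x_min, y_min, x_max, y_max)` box format are assumptions chosen for illustration.

```rust
/// Axis-aligned bounding box in (x_min, y_min, x_max, y_max) coordinates.
/// Hypothetical type for illustration only; not part of Burn.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct BBox {
    pub x_min: f32,
    pub y_min: f32,
    pub x_max: f32,
    pub y_max: f32,
}

impl BBox {
    /// Area of the box; zero if the box is empty or inverted.
    pub fn area(&self) -> f32 {
        (self.x_max - self.x_min).max(0.0) * (self.y_max - self.y_min).max(0.0)
    }
}

/// Intersection over Union of two boxes, a value in [0, 1].
/// Returns 0.0 when both boxes have zero area.
pub fn iou(a: &BBox, b: &BBox) -> f32 {
    // The intersection is itself a box; it is inverted (area 0) when a and b are disjoint.
    let inter = BBox {
        x_min: a.x_min.max(b.x_min),
        y_min: a.y_min.max(b.y_min),
        x_max: a.x_max.min(b.x_max),
        y_max: a.y_max.min(b.y_max),
    };
    let inter_area = inter.area();
    let union_area = a.area() + b.area() - inter_area;
    if union_area > 0.0 {
        inter_area / union_area
    } else {
        0.0
    }
}

/// In-place per-channel normalization of a channel-first (CHW) buffer:
/// pixel = (pixel - mean[c]) / std[c].
/// Assumes every channel plane has the same number of pixels.
pub fn normalize_chw(pixels: &mut [f32], channels: usize, mean: &[f32], std: &[f32]) {
    assert!(channels > 0, "at least one channel is required");
    assert_eq!(mean.len(), channels, "one mean per channel");
    assert_eq!(std.len(), channels, "one std per channel");
    assert_eq!(pixels.len() % channels, 0, "buffer must split evenly into channels");

    let plane = pixels.len() / channels;
    if plane == 0 {
        return; // empty image, nothing to normalize
    }
    // Each chunk of `plane` values is one channel in CHW layout.
    for (c, chunk) in pixels.chunks_mut(plane).enumerate() {
        for p in chunk.iter_mut() {
            *p = (*p - mean[c]) / std[c];
        }
    }
}
```

For example, two 2×2 boxes offset by one pixel in each direction overlap in a 1×1 region, so `iou` gives 1 / (4 + 4 - 1) = 1/7. A real crate would most likely operate on batched Burn tensors so the work can run on any backend; the scalar version here is only meant to pin down the arithmetic.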

Feature Solution

Reusing existing work from torchvision, if that is permitted, could help us complete the first of the four crates: burn-vision.
