Instructor: Deva Ramanan
Semester: Fall 2022
This course introduces the fundamental techniques used in computer vision, that is, the analysis of patterns in visual images to reconstruct and understand the objects and scenes that generated them. Topics covered include image formation and representation, camera geometry and calibration, computational imaging, multi-view geometry, stereo, 3D reconstruction from images, motion analysis, physics-based vision, image segmentation, and object recognition. The material is based on graduate-level texts, augmented with research papers as appropriate.
Topics Covered:
- Feature Extraction based on Filter Banks
- K Means Clustering
- Visual Word Dictionary
- Scene Classification
- Hyperparameter Tuning
Description:
Visual words for three sample images from the SUN database.
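As a rough illustration of the K-means and visual-word topics above, the sketch below clusters filter-bank responses into a dictionary and then describes an image as a histogram of its nearest visual words. It assumes the responses are already available as an `(N, D)` NumPy array; the function names `kmeans` and `word_histogram` are hypothetical and not taken from the course code.

```python
import numpy as np

def kmeans(features, k, n_iters=20, seed=0):
    """Plain Lloyd's algorithm; returns a (k, D) array of cluster centers."""
    rng = np.random.default_rng(seed)
    # Initialize the centers with k distinct, randomly chosen features.
    centers = features[rng.choice(len(features), size=k, replace=False)].copy()
    for _ in range(n_iters):
        # Squared distance from every feature to every center: shape (N, k).
        d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers

def word_histogram(features, centers):
    """Assign each feature to its nearest visual word; return an L1-normalized histogram."""
    d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(centers)).astype(float)
    return hist / hist.sum()
```

In a scene-classification pipeline, the resulting histograms would typically be compared across images, for example with a nearest-neighbor rule; the dictionary size `k` is one of the hyperparameters to tune.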
Topics Covered:
- Simple Lucas & Kanade Tracker with Naive Template Update
- Lucas & Kanade Tracker with Template Correction
- Two-dimensional Tracking with a Pure Translation Warp Function
- Two-dimensional Tracking with a Plane Affine Warp Function
- Lucas & Kanade Forward Additive Approach
- Lucas & Kanade Inverse Compositional Approach
Description:
Lucas-Kanade tracking using Naive Template Update (purple) versus Template Correction (red).
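To make the pure-translation case concrete, here is a minimal sketch of a forward-additive Lucas-Kanade step loop. It assumes grayscale frames stored as 2D float arrays and a rectangle `(x1, y1, x2, y2)` in pixel coordinates; the helper `bilinear_sample` and the function name `lucas_kanade_translation` are illustrative choices, and image gradients are approximated with `np.gradient` rather than any particular filter from the course.

```python
import numpy as np

def bilinear_sample(img, xs, ys):
    """Sample img at float coordinates (xs, ys) using bilinear interpolation."""
    h, w = img.shape
    xs = np.clip(xs, 0, w - 1.001)
    ys = np.clip(ys, 0, h - 1.001)
    x0 = np.floor(xs).astype(int)
    y0 = np.floor(ys).astype(int)
    ax, ay = xs - x0, ys - y0
    top = (1 - ax) * img[y0, x0] + ax * img[y0, x0 + 1]
    bot = (1 - ax) * img[y0 + 1, x0] + ax * img[y0 + 1, x0 + 1]
    return (1 - ay) * top + ay * bot

def lucas_kanade_translation(template_img, current_img, rect, n_iters=50, tol=1e-3):
    """Estimate p = (dx, dy) so that current_img(x + p) matches template_img(x) inside rect."""
    x1, y1, x2, y2 = rect
    xs, ys = np.meshgrid(np.arange(x1, x2 + 1, dtype=float),
                         np.arange(y1, y2 + 1, dtype=float))
    template = bilinear_sample(template_img, xs, ys)
    gy, gx = np.gradient(current_img)  # derivatives along rows (y) and columns (x)
    p = np.zeros(2)
    for _ in range(n_iters):
        warped = bilinear_sample(current_img, xs + p[0], ys + p[1])
        error = (template - warped).ravel()
        # For a pure translation the warp Jacobian is the identity, so the
        # steepest-descent images are simply the gradients at the warped points.
        A = np.stack([bilinear_sample(gx, xs + p[0], ys + p[1]).ravel(),
                      bilinear_sample(gy, xs + p[0], ys + p[1]).ravel()], axis=1)
        dp, *_ = np.linalg.lstsq(A, error, rcond=None)
        p += dp
        if np.linalg.norm(dp) < tol:
            break
    return p
```

A naive template update would re-sample the template from each new frame, which lets small errors accumulate as drift; template correction instead re-aligns against the first-frame template to limit that drift.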
Topics Covered:
- Direct Linear Transform
- Matrix Decomposition to calculate Homography
- Limitations of Planar Homography
- FAST Detector and BRIEF Descriptors
- Feature Matching
- Compute Homography via RANSAC
- Automated Homography Estimation and Warping
- Augmented Reality Application using Homography
- Real-Time Augmented Reality with High FPS
- Panorama Generation based on Homography
Description:
Augmented reality clip superimposing a video sequence onto a book cover using planar homographies.
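The Direct Linear Transform and RANSAC topics above can be sketched as follows. This is a minimal version under stated assumptions: correspondences are given as `(N, 2)` arrays, the convention is `x1 ~ H @ x2` in homogeneous coordinates, and no coordinate normalization is applied before the SVD (a common refinement omitted here for brevity). The function names, iteration count, and inlier tolerance are illustrative.

```python
import numpy as np

def compute_homography(x1, x2):
    """DLT: return H (up to scale) with x1 ~ H @ x2, from N >= 4 correspondences."""
    rows = []
    for (u, v), (x, y) in zip(x1, x2):
        # Each correspondence contributes two linear constraints on the 9 entries of H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.asarray(rows, dtype=float)
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    return vt[-1].reshape(3, 3)

def ransac_homography(x1, x2, n_iters=500, tol=2.0, seed=0):
    """Fit H on random 4-point samples, keep the largest inlier set, then refit on it."""
    rng = np.random.default_rng(seed)
    x2_h = np.hstack([x2, np.ones((len(x2), 1))])
    best_inliers = np.zeros(len(x1), dtype=bool)
    for _ in range(n_iters):
        idx = rng.choice(len(x1), size=4, replace=False)
        H = compute_homography(x1[idx], x2[idx])
        proj = x2_h @ H.T
        with np.errstate(divide="ignore", invalid="ignore"):
            proj = proj[:, :2] / proj[:, 2:3]
        # Points whose projection is NaN or inf compare as False and count as outliers.
        inliers = np.linalg.norm(proj - x1, axis=1) < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return compute_homography(x1[best_inliers], x2[best_inliers]), best_inliers
```

The returned `H` is defined only up to scale, so dividing by `H[2, 2]` (when it is nonzero) gives a conventional normalized form. Warping a video frame with the estimated homography onto the book cover is the step that produces the augmented reality effect.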
Topics Covered:
- Fundamental Matrix Estimation using Point Correspondence
- Metric Reconstruction
- Retrieval of Camera Matrices up to a Scale and Four-Fold Rotation Ambiguity
- Triangulation using the Homogeneous Least Squares Solution
- 3D Visualization from a Stereo-Pair by Triangulation and 3D Locations Rendering
- Bundle Adjustment
- Fundamental Matrix Estimation through RANSAC for Noisy Correspondences
- Joint Optimization of Reprojection Error w.r.t. Estimated 3D Points and Camera Matrices
- Non-linear Optimization using the SciPy Least-Squares Optimizer
Description:
Temple (top) reconstructed in 3D (bottom).
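As an illustration of triangulation by homogeneous least squares, the sketch below recovers 3D points from two views and evaluates the reprojection error that bundle adjustment would go on to minimize. It assumes known `3 x 4` camera matrices `P1` and `P2` and matched pixel coordinates as `(N, 2)` arrays; the function names are hypothetical, and the non-linear refinement step itself is not shown.

```python
import numpy as np

def project(P, X):
    """Project (N, 3) points with a 3x4 camera matrix; returns (N, 2) pixel coordinates."""
    X_h = np.hstack([X, np.ones((len(X), 1))])
    x = X_h @ P.T
    return x[:, :2] / x[:, 2:3]

def triangulate(P1, pts1, P2, pts2):
    """Linear triangulation: solve A X = 0 for each correspondence via SVD."""
    X = np.zeros((len(pts1), 3))
    for i, ((u1, v1), (u2, v2)) in enumerate(zip(pts1, pts2)):
        # Each view contributes two rows derived from x cross (P X) = 0.
        A = np.stack([
            u1 * P1[2] - P1[0],
            v1 * P1[2] - P1[1],
            u2 * P2[2] - P2[0],
            v2 * P2[2] - P2[1],
        ])
        _, _, vt = np.linalg.svd(A)
        X_h = vt[-1]
        X[i] = X_h[:3] / X_h[3]  # back from homogeneous coordinates
    return X

def reprojection_error(P1, pts1, P2, pts2, X):
    """Sum of squared pixel distances between observed and reprojected points in both views."""
    e1 = project(P1, X) - pts1
    e2 = project(P2, X) - pts2
    return float((e1 ** 2).sum() + (e2 ** 2).sum())
```

When the second camera is recovered from an essential matrix, the four candidate poses are usually disambiguated by triangulating with each and keeping the one that places the most points in front of both cameras.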
Topics Covered:
- Manual Implementation of a Fully Connected Network
- Text Extraction from Images of Handwritten Characters
- PyTorch Implementation of a Convolutional Neural Network
- Fine Tuning of SqueezeNet in PyTorch
- Comparison between Fine Tuning and Training from Scratch
Description:
Neural network text recognition results, based on raw images (example on top).
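For the manually implemented fully connected network, a minimal NumPy sketch of the forward and backward passes might look like the following. It assumes a single sigmoid hidden layer, a softmax output, one-hot labels, and a mean cross-entropy loss; the layer sizes, activation choice, and function names are assumptions made for illustration rather than details from the assignment.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    # Subtract the row-wise maximum for numerical stability.
    x = x - x.max(axis=1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=1, keepdims=True)

def forward(X, W1, b1, W2, b2):
    """Return hidden activations and class probabilities for a batch X of shape (N, D)."""
    h = sigmoid(X @ W1 + b1)
    probs = softmax(h @ W2 + b2)
    return h, probs

def loss_and_grads(X, Y, W1, b1, W2, b2):
    """Mean cross-entropy loss and its gradients; Y holds one-hot labels of shape (N, C)."""
    n = len(X)
    h, probs = forward(X, W1, b1, W2, b2)
    loss = -np.sum(Y * np.log(probs)) / n
    # Softmax combined with cross-entropy gives this simple gradient at the logits.
    d_logits = (probs - Y) / n
    gW2 = h.T @ d_logits
    gb2 = d_logits.sum(axis=0)
    # Backpropagate through the sigmoid: its derivative is h * (1 - h).
    d_pre = (d_logits @ W2.T) * h * (1.0 - h)
    gW1 = X.T @ d_pre
    gb1 = d_pre.sum(axis=0)
    return loss, (gW1, gb1, gW2, gb2)
```

A gradient-descent step then subtracts a learning rate times each gradient from the matching parameter. Comparing these analytic gradients against finite differences is a common sanity check before training.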