
# Regularization

A technique to prevent overfitting by adding a penalty term (such as L1 or L2) to the loss function, discouraging overly complex models and thereby controlling model complexity. The sketch below illustrates how a penalty term enters the loss, and the table that follows compares common penalty terms and related techniques.
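To make the penalty-term idea concrete, here is a minimal sketch for a linear model; the function name `regularized_mse` and the penalty strengths are illustrative assumptions, not part of the original glossary.

```python
import numpy as np

def regularized_mse(w, X, y, l1=0.0, l2=0.0):
    """Mean squared error plus optional L1/L2 penalty terms (illustrative)."""
    residuals = X @ w - y
    mse = np.mean(residuals ** 2)
    penalty = l1 * np.sum(np.abs(w)) + l2 * np.sum(w ** 2)
    return mse + penalty

# Toy data: only two of the five true coefficients are non-zero.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_w = np.array([1.0, 0.0, 0.0, 2.0, 0.0])
y = X @ true_w + rng.normal(scale=0.1, size=100)

w = rng.normal(size=5)
print(regularized_mse(w, X, y))          # plain MSE
print(regularized_mse(w, X, y, l1=0.1))  # with an L1 (Lasso-style) penalty
print(regularized_mse(w, X, y, l2=0.1))  # with an L2 (Ridge-style) penalty
```

Minimizing the L1-penalized loss tends to drive some weights exactly to zero, while the L2 penalty only shrinks them toward zero, which is the behavior summarized in the table below.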

| Penalty Term | Mathematical Form | Key Concept | Effect on Model | Use Case |
| --- | --- | --- | --- | --- |
| L1 (Lasso) | $\lambda \sum_i \lvert w_i \rvert$ | Encourages sparsity by penalizing the sum of absolute values of the weights | Drives some weights to exactly zero, performing feature selection | Useful when only a few features are important |
| L2 (Ridge) | $\lambda \sum_i w_i^2$ | Penalizes large weights by summing their squares | Shrinks weights smoothly but keeps them non-zero | Suitable when all features have some relevance; helps with multicollinearity |
| Elastic Net | $\lambda_1 \sum_i \lvert w_i \rvert + \lambda_2 \sum_i w_i^2$ | Combines L1 and L2 regularization | Balances sparsity (L1) and shrinkage (L2) | Ideal when some features are redundant and others sparse; works at the individual feature level |
| $L_{2,1}$ Norm | $\lambda \lVert W \rVert_{2,1} = \lambda \sum_i \lVert W_{i\cdot} \rVert_2$ | Group-based regularization: the L2 norm is applied within each group and the L1 norm across groups | Encourages structured sparsity by selecting entire groups of features (rows of a matrix) together; if one element of a group goes to zero, the whole group tends to follow | Multi-task learning; feature selection with structured dependencies among features (e.g. grouped variables) |
| Group Lasso | $\lambda \sum_g \lVert w_g \rVert_2$ | Applies L1-style regularization to groups of related features | Selects or discards entire feature groups | Useful when features are grouped and dependent |
| Dropout | Not an explicit penalty term: units are randomly masked with probability $p$ | Randomly sets a fraction of neuron activations to zero during training | Introduces randomness to reduce overfitting and prevents neurons from co-adapting | Widely used in deep neural networks |
| Max-Norm | Hard constraint $\lVert w \rVert_2 \le c$ | Constrains the magnitude of weight norms | Prevents exploding weights | Common in neural networks for stable learning |
| Total Variation (TV) | $\lambda \sum_i \lvert w_{i+1} - w_i \rvert$ | Penalizes differences between neighboring parameters | Smooths solutions by reducing oscillations | Used in image processing tasks |
| Frobenius Norm | $\lambda \lVert W \rVert_F^2 = \lambda \sum_{i,j} W_{ij}^2$ | Penalizes large values across all elements of a matrix (e.g. the weight matrices of a neural network) | Encourages smaller, smoother weights, preventing overfitting by controlling the overall scale of the matrix | Relevant to models built on matrix operations: regularizing weight matrices in neural networks, collaborative filtering (e.g. matrix factorization models), and multi-task learning, where each task's parameters form a matrix |
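As a quick illustration of the first three rows of the table, the following sketch fits scikit-learn's `Lasso`, `Ridge`, and `ElasticNet` estimators on synthetic data and counts how many coefficients each penalty drives to zero; the `alpha` and `l1_ratio` values are arbitrary, not tuned.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, Ridge

# Synthetic regression data where only 5 of 20 features carry signal.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

models = {
    "L1 (Lasso)": Lasso(alpha=1.0),
    "L2 (Ridge)": Ridge(alpha=1.0),
    "Elastic Net": ElasticNet(alpha=1.0, l1_ratio=0.5),
}

for name, model in models.items():
    model.fit(X, y)
    zeroed = int((model.coef_ == 0).sum())
    print(f"{name}: {zeroed} of {model.coef_.size} coefficients set to zero")
```

Consistent with the table, Ridge typically keeps every coefficient non-zero, while Lasso and Elastic Net zero out many of the uninformative ones. Group-structured penalties (Group Lasso, $L_{2,1}$), dropout, and max-norm constraints usually require dedicated packages or a deep-learning framework and are not shown here.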