Saturday, August 9, 2025

Regularization

L1 vs L2 Regularization — The Complete Guide (with Elastic Net)

When building machine learning models, it’s easy to fall into the overfitting trap — where your model learns noise instead of real patterns.
Regularization is one of the best ways to fight this.

Two of the most widely used regularization techniques are:

  • L1 Regularization (Lasso)

  • L2 Regularization (Ridge)

Both add a penalty term to the loss function, discouraging overly complex models. Let’s break them down.


1. L1 Regularization (Lasso)

Definition:
Adds the absolute value of the weights as a penalty term to the loss function:

Loss = Original\_Loss + \lambda \sum_{i} |w_i|

Where:

  • w_i = weight of the i-th feature

  • \lambda = regularization strength (higher = stronger penalty)

Key Characteristics:

  • Encourages sparsity (many weights become exactly zero)

  • Naturally performs feature selection

  • Works best when only a subset of features is truly relevant

When to Use:

  • High-dimensional datasets (e.g., text classification, genetics)

  • When you expect many features to be irrelevant

Example:
Predicting house prices with 100 features → L1 might keep only the 10 most important ones (e.g., square footage, location) and set the rest to zero.
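
To see this sparsity effect concretely, here is a minimal sketch on synthetic data; the dataset and the alpha value are illustrative, not from the example above:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data: 100 features, only 10 of which actually influence the target
X, y = make_regression(n_samples=500, n_features=100, n_informative=10,
                       noise=5.0, random_state=0)

lasso = Lasso(alpha=1.0)  # alpha plays the role of λ
lasso.fit(X, y)

# Lasso should zero out most of the 90 uninformative features
print("Non-zero coefficients:", np.sum(lasso.coef_ != 0), "out of 100")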


2. L2 Regularization (Ridge)

Definition:
Adds the squared value of the weights as a penalty term to the loss function:

Loss = Original\_Loss + \lambda \sum_{i} w_i^2

Key Characteristics:

  • Encourages small weights (closer to zero but not exactly zero)

  • Reduces the influence of any single feature without removing it entirely

  • Works best when all features are useful

When to Use:

  • You believe all features have some predictive power

  • You want to avoid overfitting but keep every feature in play

  • Useful for correlated features

Example:
Predicting house prices → All features (square footage, bedrooms, bathrooms, etc.) contribute, but L2 ensures no single one dominates.
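
A quick sketch of the shrinkage behaviour, comparing plain linear regression with Ridge on the same synthetic data (all values illustrative):

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge

X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)  # alpha plays the role of λ

# Ridge pulls every coefficient toward zero but sets none exactly to zero
print("Largest coefficient, OLS:  ", np.abs(ols.coef_).max())
print("Largest coefficient, Ridge:", np.abs(ridge.coef_).max())
print("Exact zeros under Ridge:   ", np.sum(ridge.coef_ == 0))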


3. Side-by-Side: L1 vs L2

| Aspect | L1 (Lasso) | L2 (Ridge) |
| --- | --- | --- |
| Penalty Term | λ × sum of absolute weights | λ × sum of squared weights |
| Effect on Weights | Many become exactly zero | All become small, non-zero |
| Feature Selection | ✅ Yes | ❌ No |
| Optimization | Harder (non-differentiable at zero) | Easier (fully differentiable) |
| Best For | Sparse models, irrelevant features | Regularizing all features |

4. Elastic Net — The Best of Both Worlds

Elastic Net combines L1 and L2 penalties:

Loss = Original\_Loss + \alpha \lambda \sum_{i} |w_i| + (1 - \alpha) \lambda \sum_{i} w_i^2

Why use it?

  • Retains the feature selection benefits of L1

  • Keeps the weight shrinkage benefits of L2

  • Especially helpful when features are correlated
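
Note that scikit-learn parameterizes the same idea slightly differently: in ElasticNet, alpha sets the overall penalty strength and l1_ratio sets the share given to the L1 term. A minimal sketch on data with deliberately correlated columns (dataset and parameter values are illustrative):

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet

# Base data plus near-duplicate (highly correlated) copies of the first 5 columns
X, y = make_regression(n_samples=300, n_features=20, n_informative=5,
                       noise=5.0, random_state=0)
rng = np.random.default_rng(0)
X = np.hstack([X, X[:, :5] + rng.normal(scale=0.01, size=(300, 5))])

# alpha ~ overall strength, l1_ratio ~ share of the penalty that is L1
enet = ElasticNet(alpha=0.5, l1_ratio=0.5)
enet.fit(X, y)
print("Non-zero coefficients:", np.sum(enet.coef_ != 0), "of", X.shape[1])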


5. Visual Intuition

  • L1 (Lasso): Diamond-shaped constraint → optimization often lands on corners → many weights exactly zero (sparse solution)

  • L2 (Ridge): Circular constraint → the optimum typically lands on the smooth boundary, away from the axes → all weights shrink, but none become exactly zero
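
If you want to draw the two constraint regions yourself, here is a small matplotlib sketch (purely illustrative):

import numpy as np
import matplotlib.pyplot as plt

theta = np.linspace(0, 2 * np.pi, 400)
fig, axes = plt.subplots(1, 2, figsize=(8, 4))

# L1 ball |w1| + |w2| <= 1: a diamond whose corners sit on the axes
diamond = np.array([[1, 0], [0, 1], [-1, 0], [0, -1], [1, 0]])
axes[0].plot(diamond[:, 0], diamond[:, 1])
axes[0].set_title("L1 constraint (diamond)")

# L2 ball w1^2 + w2^2 <= 1: a circle with no corners
axes[1].plot(np.cos(theta), np.sin(theta))
axes[1].set_title("L2 constraint (circle)")

for ax in axes:
    ax.set_aspect("equal")
    ax.axhline(0, linewidth=0.5)
    ax.axvline(0, linewidth=0.5)
plt.show()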


6. Choosing the Right Regularization

Use L1 when:

  • You want a sparse model

  • You expect many irrelevant features

  • You need automatic feature selection

Use L2 when:

  • All features likely matter

  • You want to control coefficient size without removing features

  • You have multicollinearity (correlated features)

Use Elastic Net when:

  • You want a mix of sparsity + stability

  • You have many correlated features

  • You want to avoid L1’s instability on correlated data
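
Whichever penalty you pick, the regularization strength itself is usually tuned by cross-validation. A minimal sketch with scikit-learn's built-in CV estimators (the data and parameter grids are illustrative):

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV, RidgeCV, ElasticNetCV

X, y = make_regression(n_samples=300, n_features=50, n_informative=10,
                       noise=5.0, random_state=0)

lasso_cv = LassoCV(cv=5).fit(X, y)
ridge_cv = RidgeCV(alphas=np.logspace(-3, 3, 13), cv=5).fit(X, y)
enet_cv = ElasticNetCV(l1_ratio=[0.2, 0.5, 0.8], cv=5).fit(X, y)

print("Lasso best alpha:     ", lasso_cv.alpha_)
print("Ridge best alpha:     ", ridge_cv.alpha_)
print("ElasticNet best alpha:", enet_cv.alpha_, "l1_ratio:", enet_cv.l1_ratio_)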


7. Python Implementation

from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge, ElasticNet
from sklearn.model_selection import train_test_split

# Example training data (substitute your own feature matrix X and target y)
X, y = make_regression(n_samples=200, n_features=20, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# L1 Regularization (Lasso)
lasso = Lasso(alpha=0.1)  # alpha plays the role of λ
lasso.fit(X_train, y_train)

# L2 Regularization (Ridge)
ridge = Ridge(alpha=0.1)
ridge.fit(X_train, y_train)

# Elastic Net
elastic_net = ElasticNet(alpha=0.1, l1_ratio=0.5)  # l1_ratio balances L1 vs. L2
elastic_net.fit(X_train, y_train)
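
After fitting, you can inspect the learned coefficients to confirm the behaviour described above (this continues from the snippet, where lasso, ridge, and elastic_net are already fitted):

import numpy as np

# Sparsity check: Lasso typically produces exact zeros, Ridge does not
print("Lasso zero coefficients:      ", np.sum(lasso.coef_ == 0))
print("Ridge zero coefficients:      ", np.sum(ridge.coef_ == 0))
print("Elastic Net zero coefficients:", np.sum(elastic_net.coef_ == 0))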

8. Summary Table

| Regularization | Main Effect | Removes Features? | Best For |
| --- | --- | --- | --- |
| L1 | Sparse weights (zeros) | ✅ Yes | High-dimensional data, irrelevant features |
| L2 | Small, non-zero weights | ❌ No | All features relevant, control magnitude |
| Elastic Net | Mix of L1 & L2 benefits | Partial | Correlated features + feature selection |

💡 Takeaway:

  • Use L1 for feature selection

  • Use L2 for controlling weight magnitude

  • Use Elastic Net for a balanced approach


