Regularization¶
Regularization prevents overfitting — when a model memorizes training data instead of learning general patterns.
What is Overfitting?¶
Training accuracy: 99% ← model memorized training data
Test accuracy: 60% ← fails on new data
= OVERFITTING
Training accuracy: 85%
Test accuracy: 83% ← similar performance
= GOOD GENERALIZATION
Regularization adds a penalty to the loss function that discourages complex models.
Types of Regularization¶
L1 Regularization (Lasso)¶
Pushes some weights to exactly zero — acts as feature selection.
from neurogebra import MathForge
forge = MathForge()
l1 = forge.get("l1_regularizer")
print(l1.explain())
print(l1.eval(w=0.5, lambda_=0.01))
When to use: You suspect many features are irrelevant and want automatic feature selection.
L2 Regularization (Ridge)¶
Pushes all weights toward small values — prevents any one weight from dominating.
When to use: All features might be relevant but you want to prevent large weights.
Elastic Net¶
Combines L1 and L2 — best of both worlds:
elastic = forge.get("elastic_net")
print(elastic.explain())
print(elastic.eval(w=0.5, lambda1=0.01, lambda2=0.01))
When to use: When you want both feature selection (L1) and small weights (L2).
Comparison¶
| Type | Effect on Weights | Feature Selection | Best For |
|---|---|---|---|
| L1 (Lasso) | Some become 0 | ✅ Yes | Sparse models |
| L2 (Ridge) | All become small | ❌ No | Preventing large weights |
| Elastic Net | Mix of both | ✅ Partial | General use |
Adding Regularization to Training¶
Step-by-Step¶
import numpy as np
from neurogebra import MathForge, Expression
from neurogebra.core.trainer import Trainer
forge = MathForge()
# 1. Create model
model = Expression(
"linear_model",
"w1*x1 + w2*x2 + w3*x3 + b",
params={"w1": 0.5, "w2": 0.5, "w3": 0.5, "b": 0.0},
trainable_params=["w1", "w2", "w3", "b"]
)
# 2. Get loss and regularizer
mse = forge.get("mse")
l2 = forge.get("l2_regularizer")
# 3. Create regularized loss (manually)
# total_loss = mse_loss + lambda * sum(w^2)
lambda_reg = 0.01
# 4. Train with the combined loss
trainer = Trainer(model, mse, optimizer="adam", lr=0.01)
Regularization Strength (\(\lambda\))¶
The \(\lambda\) parameter controls how much regularization to apply:
| \(\lambda\) Value | Effect |
|---|---|
| 0.0 | No regularization (may overfit) |
| 0.001 | Light regularization (usually good start) |
| 0.01 | Moderate regularization |
| 0.1 | Strong regularization (may underfit) |
| 1.0 | Very strong (likely underfitting) |
# Experiment with different lambda values
for lam in [0.0, 0.001, 0.01, 0.1]:
penalty = l2.eval(w=0.5, lambda_=lam)
print(f"λ={lam:.3f} → L2 penalty = {penalty:.6f}")
Rule of thumb: Start with \(\lambda = 0.001\) and adjust based on validation performance.
Dropout (Concept)¶
Another form of regularization — randomly "turns off" neurons during training:
from neurogebra.builders.model_builder import ModelBuilder
builder = ModelBuilder()
model = builder.sequential([
{"type": "dense", "units": 128, "activation": "relu"},
{"type": "dropout", "rate": 0.3}, # 30% of neurons turned off randomly
{"type": "dense", "units": 64, "activation": "relu"},
{"type": "dropout", "rate": 0.2}, # 20% of neurons turned off
{"type": "dense", "units": 10, "activation": "softmax"}
])
Quick Decision Guide¶
Is your model overfitting?
├── YES → Add regularization
│ ├── Too many features? → Use L1 (Lasso)
│ ├── Weights too large? → Use L2 (Ridge)
│ ├── Not sure? → Use Elastic Net or L2
│ └── Neural network? → Use Dropout + L2
└── NO → You might not need regularization
Next: Optimization →