Regularization¶

Regularization prevents overfitting — when a model memorizes training data instead of learning general patterns.

What is Overfitting?¶

Training accuracy: 99%    ← model memorized training data
Test accuracy:     60%    ← fails on new data
                          = OVERFITTING

Training accuracy: 85%
Test accuracy:     83%    ← similar performance
                          = GOOD GENERALIZATION

Regularization adds a penalty to the loss function that discourages complex models.

Types of Regularization¶

L1 Regularization (Lasso)¶

Pushes some weights to exactly zero — acts as feature selection.

\[L_{total} = L_{data} + \lambda \sum |w_i|\]

from neurogebra import MathForge

forge = MathForge()
l1 = forge.get("l1_regularizer")

print(l1.explain())
print(l1.eval(w=0.5, lambda_=0.01))

When to use: You suspect many features are irrelevant and want automatic feature selection.

L2 Regularization (Ridge)¶

Pushes all weights toward small values — prevents any one weight from dominating.

\[L_{total} = L_{data} + \lambda \sum w_i^2\]

l2 = forge.get("l2_regularizer")

print(l2.explain())
print(l2.eval(w=0.5, lambda_=0.01))

When to use: All features might be relevant but you want to prevent large weights.

Elastic Net¶

Combines L1 and L2 — best of both worlds:

\[L_{total} = L_{data} + \lambda_1 \sum |w_i| + \lambda_2 \sum w_i^2\]

elastic = forge.get("elastic_net")

print(elastic.explain())
print(elastic.eval(w=0.5, lambda1=0.01, lambda2=0.01))

When to use: When you want both feature selection (L1) and small weights (L2).

Comparison¶

Type	Effect on Weights	Feature Selection	Best For
L1 (Lasso)	Some become 0	✅ Yes	Sparse models
L2 (Ridge)	All become small	❌ No	Preventing large weights
Elastic Net	Mix of both	✅ Partial	General use

Adding Regularization to Training¶

Step-by-Step¶

import numpy as np
from neurogebra import MathForge, Expression
from neurogebra.core.trainer import Trainer

forge = MathForge()

# 1. Create model
model = Expression(
    "linear_model",
    "w1*x1 + w2*x2 + w3*x3 + b",
    params={"w1": 0.5, "w2": 0.5, "w3": 0.5, "b": 0.0},
    trainable_params=["w1", "w2", "w3", "b"]
)

# 2. Get loss and regularizer
mse = forge.get("mse")
l2 = forge.get("l2_regularizer")

# 3. Create regularized loss (manually)
# total_loss = mse_loss + lambda * sum(w^2)
lambda_reg = 0.01

# 4. Train with the combined loss
trainer = Trainer(model, mse, optimizer="adam", lr=0.01)

Regularization Strength (\(\lambda\))¶

The \(\lambda\) parameter controls how much regularization to apply:

\(\lambda\) Value	Effect
0.0	No regularization (may overfit)
0.001	Light regularization (usually good start)
0.01	Moderate regularization
0.1	Strong regularization (may underfit)
1.0	Very strong (likely underfitting)

# Experiment with different lambda values
for lam in [0.0, 0.001, 0.01, 0.1]:
    penalty = l2.eval(w=0.5, lambda_=lam)
    print(f"λ={lam:.3f} → L2 penalty = {penalty:.6f}")

Rule of thumb: Start with \(\lambda = 0.001\) and adjust based on validation performance.

Dropout (Concept)¶

Another form of regularization — randomly "turns off" neurons during training:

from neurogebra.builders.model_builder import ModelBuilder

builder = ModelBuilder()
model = builder.sequential([
    {"type": "dense", "units": 128, "activation": "relu"},
    {"type": "dropout", "rate": 0.3},       # 30% of neurons turned off randomly
    {"type": "dense", "units": 64, "activation": "relu"},
    {"type": "dropout", "rate": 0.2},       # 20% of neurons turned off
    {"type": "dense", "units": 10, "activation": "softmax"}
])

Quick Decision Guide¶

Is your model overfitting?
├── YES → Add regularization
│   ├── Too many features? → Use L1 (Lasso)
│   ├── Weights too large? → Use L2 (Ridge)
│   ├── Not sure? → Use Elastic Net or L2
│   └── Neural network? → Use Dropout + L2
└── NO → You might not need regularization

Next: Optimization →