Loss Functions¶

A loss function measures how wrong your model's predictions are. The goal of training is to minimize the loss.

The Concept¶

Model Prediction: ŷ = 4.5
Actual Value:     y  = 5.0

Loss = some_function(ŷ, y)
      = how_wrong_is_the_prediction

If loss is high → model is bad → adjust weights
If loss is low  → model is good → we're done (or close)

Exploring Loss Functions¶

from neurogebra import MathForge

forge = MathForge()
losses = forge.list_all(category="loss")
print(losses)
# ['mse', 'mae', 'binary_crossentropy', 'huber', 'log_cosh', 'hinge', ...]

MSE — Mean Squared Error¶

The most common loss for regression tasks.

\[\text{MSE} = (y_{pred} - y_{true})^2\]

mse = forge.get("mse")

# Close prediction → small loss
print(mse.eval(y_pred=4.8, y_true=5.0))  # 0.04

# Bad prediction → large loss
print(mse.eval(y_pred=2.0, y_true=5.0))  # 9.0

# Perfect prediction → zero loss
print(mse.eval(y_pred=5.0, y_true=5.0))  # 0.0

Properties:

Squares the error → penalizes large errors heavily
Always non-negative
Smooth and differentiable → nice gradients

# The gradient tells the model HOW to adjust
grad = mse.gradient("y_pred")
print(grad.symbolic_expr)  # 2*(y_pred - y_true)
# If pred > true: gradient is positive → decrease prediction
# If pred < true: gradient is negative → increase prediction

When to use MSE

Regression tasks. Best when outliers are rare and you want to penalize large errors.

MAE — Mean Absolute Error¶

More robust to outliers than MSE.

\[\text{MAE} = |y_{pred} - y_{true}|\]

mae = forge.get("mae")

print(mae.eval(y_pred=4.8, y_true=5.0))  # 0.2
print(mae.eval(y_pred=2.0, y_true=5.0))  # 3.0
print(mae.eval(y_pred=5.0, y_true=5.0))  # 0.0

MSE vs MAE:

Error	MSE	MAE
Small (0.2)	0.04	0.2
Medium (3.0)	9.0	3.0
Large (10.0)	100.0	10.0

MSE squares errors, so it punishes outliers much more. MAE treats all errors linearly.

When to use MAE

Regression with outliers. More robust than MSE but converges slower.

Binary Cross-Entropy¶

The standard loss for binary classification (yes/no problems).

\[\text{BCE} = -[y_{true} \cdot \log(y_{pred}) + (1-y_{true}) \cdot \log(1-y_{pred})]\]

bce = forge.get("binary_crossentropy")

# Model says 0.9 probability, actual is 1 (correct!) → low loss
print(bce.eval(y_pred=0.9, y_true=1.0))  # ≈ 0.105

# Model says 0.1 probability, actual is 1 (wrong!) → high loss
print(bce.eval(y_pred=0.1, y_true=1.0))  # ≈ 2.303

# Model says 0.9, actual is 0 (wrong!) → high loss
print(bce.eval(y_pred=0.9, y_true=0.0))  # ≈ 2.303

When to use Binary Cross-Entropy

Binary classification (spam/not-spam, disease/healthy). Always pair with sigmoid output.

Huber Loss¶

Best of both worlds — MSE for small errors, MAE for large errors.

\[L_\delta = \begin{cases} \frac{1}{2}(y_{pred} - y_{true})^2 & \text{if } |y_{pred} - y_{true}| \leq \delta \\ \delta|y_{pred} - y_{true}| - \frac{1}{2}\delta^2 & \text{otherwise} \end{cases}\]

huber = forge.get("huber")

# Small error → behaves like MSE (smooth)
# Large error → behaves like MAE (robust)
print(huber.eval(y_pred=4.8, y_true=5.0))  # ≈ 0.02 (MSE-like)
print(huber.eval(y_pred=0.0, y_true=5.0))  # ≈ 4.5  (MAE-like, not 25!)

When to use Huber

Regression with occasional outliers. Great balance between MSE and MAE. Popular in reinforcement learning.

Hinge Loss¶

For SVM-style classification with margin.

\[L = \max(0, 1 - y_{true} \cdot y_{pred})\]

hinge = forge.get("hinge")

# Correct with margin → zero loss
print(hinge.eval(y_pred=2.0, y_true=1.0))   # 0 (correct & confident)

# Correct but no margin → some loss  
print(hinge.eval(y_pred=0.5, y_true=1.0))  # 0.5

# Wrong → high loss
print(hinge.eval(y_pred=-1.0, y_true=1.0))  # 2.0

Log-Cosh Loss¶

Smooth alternative to Huber Loss.

\[L = \log(\cosh(y_{pred} - y_{true}))\]

log_cosh = forge.get("log_cosh")

print(log_cosh.eval(y_pred=4.8, y_true=5.0))  # ≈ 0.020
print(log_cosh.eval(y_pred=0.0, y_true=5.0))  # ≈ 4.307

Choosing the Right Loss Function¶

What's your task?
│
├── REGRESSION (predicting a number)
│   ├── No outliers       → MSE
│   ├── Has outliers      → MAE or Huber
│   └── Smooth & robust   → Log-Cosh
│
├── BINARY CLASSIFICATION (yes/no)
│   └── → Binary Cross-Entropy + Sigmoid
│
├── MULTI-CLASS CLASSIFICATION (cat/dog/bird)
│   └── → Cross-Entropy + Softmax
│
└── MARGIN-BASED (SVM-style)
    └── → Hinge Loss

Comparing Loss Functions¶

from neurogebra import MathForge
import numpy as np

forge = MathForge()

# Different error levels
errors = [0.1, 0.5, 1.0, 2.0, 5.0, 10.0]

print(f"{'Error':>6} | {'MSE':>8} | {'MAE':>8} | {'Huber':>8}")
print("-" * 40)

for err in errors:
    mse_val = forge.get("mse").eval(y_pred=err, y_true=0)
    mae_val = forge.get("mae").eval(y_pred=err, y_true=0)
    # Huber needs delta param
    huber_val = min(0.5 * err**2, abs(err) - 0.5)  # delta=1

    print(f"{err:>6.1f} | {mse_val:>8.3f} | {mae_val:>8.3f} | {huber_val:>8.3f}")

Composing Custom Losses¶

# Weighted combination
hybrid = forge.compose("mse + 0.1*mae")

# Evaluate
result = hybrid.eval(y_pred=3.0, y_true=5.0)
print(f"Hybrid loss: {result}")

Try It Yourself!¶

Exercise

Compute MSE, MAE, and Huber loss for predictions [1, 2, 3] vs targets [1.5, 2.5, 3.5]
Which loss function is most sensitive to the error (pred=0, true=10)?
Create a custom weighted loss: 0.8*mse + 0.2*mae
Compute the gradient of MSE with respect to y_pred

Next: Gradients & Differentiation →