Training Expressions¶
This is where the magic happens — teaching expressions to learn from data.
The Concept¶
"Training" means finding the parameter values that make the expression match the data as closely as possible.
Before training: y = 0.0*x + 0.0 (random — useless)
After training: y = 2.0*x + 1.0 (learned from data!)
Step-by-Step: Training a Linear Model¶
Step 1: Create a Trainable Expression¶
from neurogebra import Expression
model = Expression(
"linear_model",
"m*x + b",
params={"m": 0.0, "b": 0.0}, # Start with zeros
trainable_params=["m", "b"] # These will be learned
)
print(f"Before training: y = {model.params['m']}*x + {model.params['b']}")
# Before training: y = 0.0*x + 0.0
Step 2: Prepare Data¶
import numpy as np
# True relationship: y = 2x + 1
np.random.seed(42)
X = np.linspace(0, 10, 100)
y = 2 * X + 1 + np.random.normal(0, 0.5, 100) # Add some noise
Step 3: Create a Trainer¶
from neurogebra.core.trainer import Trainer
trainer = Trainer(
model,
learning_rate=0.01, # How big each adjustment step is
optimizer="sgd" # Stochastic Gradient Descent
)
Step 4: Train!¶
history = trainer.fit(
X, y,
epochs=200, # How many times to loop through the data
verbose=True # Print progress
)
Output:
Epoch 0/200: Loss = 25.431200
Epoch 20/200: Loss = 3.214100
Epoch 40/200: Loss = 0.891230
Epoch 60/200: Loss = 0.412340
...
Epoch 200/200: Loss = 0.251230
Step 5: Check Results¶
print(f"After training: y = {model.params['m']:.2f}*x + {model.params['b']:.2f}")
# After training: y = 2.01*x + 0.98
# Very close to the true y = 2x + 1!
Understanding the Trainer¶
Optimizers¶
| Optimizer | Description | When to Use |
|---|---|---|
"sgd" |
Stochastic Gradient Descent | Simple, educational, basic tasks |
"adam" |
Adaptive Moment Estimation | Default choice, works almost always |
# SGD — simple but sometimes slow
trainer_sgd = Trainer(model, learning_rate=0.01, optimizer="sgd")
# Adam — adaptive learning rate, usually faster
trainer_adam = Trainer(model, learning_rate=0.01, optimizer="adam")
Learning Rate¶
The learning rate controls how much parameters change each step:
# Too high → overshoots, loss oscillates or explodes
trainer = Trainer(model, learning_rate=1.0) # Bad!
# Too low → takes forever to converge
trainer = Trainer(model, learning_rate=0.00001) # Very slow!
# Just right → smooth convergence
trainer = Trainer(model, learning_rate=0.01) # Good starting point
Loss Functions¶
# Default: MSE (mean squared error)
history = trainer.fit(X, y, loss_fn="mse")
# Alternative: MAE (mean absolute error)
history = trainer.fit(X, y, loss_fn="mae")
# Alternative: Huber (robust to outliers)
history = trainer.fit(X, y, loss_fn="huber")
Mini-Batch Training¶
# Full batch (default) — uses all data each step
history = trainer.fit(X, y, batch_size=None)
# Mini-batch — uses small chunks (faster for large datasets)
history = trainer.fit(X, y, batch_size=32)
Training History¶
The fit() method returns a history dictionary:
history = trainer.fit(X, y, epochs=100)
# Loss over time
print(history["loss"][:5]) # First 5 losses
print(history["loss"][-5:]) # Last 5 losses
# Parameters over time
print(history["params"][0]) # Parameters at epoch 0
print(history["params"][-1]) # Parameters at last epoch
Example: Training a Quadratic Model¶
import numpy as np
from neurogebra import Expression
from neurogebra.core.trainer import Trainer
# True function: y = x² - 2x + 1
X = np.linspace(-3, 3, 100)
y = X**2 - 2*X + 1 + np.random.normal(0, 0.3, 100)
# Model with unknown coefficients
model = Expression(
"quadratic",
"a*x**2 + b*x + c",
params={"a": 0.0, "b": 0.0, "c": 0.0},
trainable_params=["a", "b", "c"]
)
# Train
trainer = Trainer(model, learning_rate=0.001, optimizer="adam")
history = trainer.fit(X, y, epochs=500, verbose=True)
print(f"\nLearned: y = {model.params['a']:.2f}x² + ({model.params['b']:.2f})x + {model.params['c']:.2f}")
# Expected: y ≈ 1.00x² + (-2.00)x + 1.00
Example: Using a Callback¶
def my_callback(epoch, loss, params):
"""Called after each epoch."""
if loss < 0.5:
print(f" [Early stop possible] Epoch {epoch}: loss = {loss:.4f}")
trainer = Trainer(model, learning_rate=0.01, optimizer="adam")
history = trainer.fit(X, y, epochs=200, callback=my_callback)
Training Tips¶
Best Practices
Start with Adam optimizer — it handles most situations well.
Use learning rate 0.01 as a starting point. If loss oscillates, decrease it. If loss decreases too slowly, increase it.
Watch the loss curve:
- Smooth decrease → good
- Oscillating → learning rate too high
- Flat (no decrease) → learning rate too low or model too simple
- Sudden explosion → learning rate WAY too high
Normalize your data before training.
Use enough epochs — but not too many (overfitting risk).
Complete Training Pipeline¶
import numpy as np
from neurogebra import Expression
from neurogebra.core.trainer import Trainer
# 1. Generate data
np.random.seed(42)
X = np.linspace(-5, 5, 200)
y = 0.5 * X**2 + 2 * X - 3 + np.random.normal(0, 1, 200)
# 2. Split data (80% train, 20% test)
split = int(0.8 * len(X))
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]
# 3. Define model
model = Expression(
"polynomial",
"a*x**2 + b*x + c",
params={"a": 0.0, "b": 0.0, "c": 0.0},
trainable_params=["a", "b", "c"]
)
# 4. Train
trainer = Trainer(model, learning_rate=0.001, optimizer="adam")
history = trainer.fit(X_train, y_train, epochs=500, verbose=True)
# 5. Evaluate on test set
predictions = np.array([model.eval(x=xi) for xi in X_test])
test_mse = np.mean((predictions - y_test) ** 2)
print(f"\nLearned: y = {model.params['a']:.2f}x² + {model.params['b']:.2f}x + {model.params['c']:.2f}")
print(f"Test MSE: {test_mse:.4f}")
print(f"True: y = 0.50x² + 2.00x - 3.00")
Next: Autograd Engine →