# Project 1: Linear Regression (Neurogebra vs PyTorch)
Build a complete linear regression model to predict house prices. We'll implement the exact same thing in both Neurogebra and PyTorch so you can see the differences.
## 🎯 Goal

Given a house's size (sq ft), predict its price (in $1000s).
## Step 1: Create the Dataset
Neurogebra:

```python
import numpy as np

# Generate synthetic house data
np.random.seed(42)

# Feature: house size in sq ft, drawn uniformly from 500 to 3500 and then
# standardized to zero mean and unit standard deviation (not scaled to 0-1)
X = np.random.uniform(500, 3500, 100)
X_normalized = (X - X.mean()) / X.std()

# Target: price in $1000s. True relationship, in terms of the standardized size:
# price = 200 * size + 50 + Gaussian noise (std 10)
y_true = 200 * X_normalized + 50 + np.random.normal(0, 10, 100)

# Train/test split: first 80 samples for training, last 20 for testing
X_train, X_test = X_normalized[:80], X_normalized[80:]
y_train, y_test = y_true[:80], y_true[80:]

print(f"Training samples: {len(X_train)}")
print(f"Test samples: {len(X_test)}")
print(f"X range: [{X_normalized.min():.2f}, {X_normalized.max():.2f}]")
print(f"y range: [{y_true.min():.1f}, {y_true.max():.1f}]")
```
PyTorch:

```python
import numpy as np
import torch

# Generate synthetic house data (same as the Neurogebra version)
np.random.seed(42)
X = np.random.uniform(500, 3500, 100)
X_normalized = (X - X.mean()) / X.std()
y_true = 200 * X_normalized + 50 + np.random.normal(0, 10, 100)
X_train, X_test = X_normalized[:80], X_normalized[80:]
y_train, y_test = y_true[:80], y_true[80:]

# PyTorch needs tensors; unsqueeze(1) reshapes (N,) to (N, 1),
# the (batch, features) layout that nn.Linear expects
X_train_t = torch.tensor(X_train, dtype=torch.float32).unsqueeze(1)
y_train_t = torch.tensor(y_train, dtype=torch.float32).unsqueeze(1)
X_test_t = torch.tensor(X_test, dtype=torch.float32).unsqueeze(1)
y_test_t = torch.tensor(y_test, dtype=torch.float32).unsqueeze(1)

print(f"Training samples: {len(X_train)}")
print(f"Test samples: {len(X_test)}")
```
Key Difference #1
Neurogebra works directly with NumPy arrays — no conversion needed. PyTorch requires converting to torch.Tensor objects first.
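The `X_normalized` line in Step 1 computes z-scores: it subtracts the mean and divides by the standard deviation. The result has mean 0 and standard deviation 1, and it includes negative values. It does not fall in the range 0 to 1. The following is a minimal dependency-free sketch of the same transformation. It assumes Python's `random` module in place of NumPy, so the individual values differ from Step 1, and `standardize` is an illustrative helper name:

```python
import random
import statistics

def standardize(values):
    """Return z-scores (v - mean) / std, using the population std like NumPy's default X.std()."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

random.seed(42)
sizes = [random.uniform(500, 3500) for _ in range(100)]  # raw sizes in sq ft
z = standardize(sizes)

print(f"raw range:          [{min(sizes):.0f}, {max(sizes):.0f}]")
print(f"standardized mean:  {statistics.fmean(z):.6f}")   # approximately 0
print(f"standardized std:   {statistics.pstdev(z):.6f}")  # approximately 1
print(f"standardized range: [{min(z):.2f}, {max(z):.2f}]")  # includes negatives
```

Standardizing keeps the feature near unit scale. This is why the learned weight and bias in this project end up near 200 and 50, with no huge or tiny values.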
## Step 2: Define the Model
Key Difference #2
Neurogebra: You write the actual math formula w * x + b. You can read and understand it.
PyTorch: You specify nn.Linear(1, 1) — the math is hidden inside the module.
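Both versions compute the same function. `nn.Linear(1, 1)` stores a 1×1 weight matrix and a bias, and it computes `x @ W.T + b`. With one input feature and one output, this reduces to `w * x + b`. The sketch below shows that forward pass without either library. `LinearModel` and its fields are hypothetical names, not the Neurogebra or PyTorch API:

```python
from dataclasses import dataclass

@dataclass
class LinearModel:
    """Hypothetical 1-input, 1-output linear model: y = w * x + b."""
    w: float = 0.0
    b: float = 0.0

    def forward(self, xs):
        # Apply the formula to each input value (a flat list, like a NumPy 1-D array).
        return [self.w * x + self.b for x in xs]

    def forward_batch(self, batch):
        # Mirrors nn.Linear on a batch shaped (N, 1): one feature per row, one output per row.
        return [[row[0] * self.w + self.b] for row in batch]

demo_model = LinearModel(w=2.0, b=1.0)
print(demo_model.forward([0.0, 1.0, -1.5]))                # [1.0, 3.0, -2.0]
print(demo_model.forward_batch([[0.0], [1.0], [-1.5]]))    # [[1.0], [3.0], [-2.0]]
```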
## Step 3: Set Up Training
Key Difference #3
Neurogebra: One Trainer object handles everything.
PyTorch: You need separate criterion (loss) and optimizer objects.
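The two extra PyTorch objects each do one job:

- The criterion reduces the predictions to a single number, the mean squared error. This is what `nn.MSELoss` computes with its default `reduction='mean'`.
- The optimizer turns gradients into parameter updates.

Below is a pure-Python sketch of both. It assumes the Adam update with the documented `torch.optim.Adam` defaults (lr = 1e-3, β1 = 0.9, β2 = 0.999, ε = 1e-8). The learning rate of 0.1 in the demo is an arbitrary choice, and `mse_loss` and `AdamState` are illustrative names, not library APIs:

```python
import math

def mse_loss(preds, targets):
    """Mean squared error: the average of (prediction - target)^2."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)

class AdamState:
    """Adam update for a single scalar parameter (illustrative sketch)."""

    def __init__(self, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = 0.0  # running average of gradients (first moment)
        self.v = 0.0  # running average of squared gradients (second moment)
        self.t = 0    # number of steps taken so far

    def step(self, param, grad):
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2
        # Bias correction: m and v start at 0, so early averages are too small.
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return param - self.lr * m_hat / (math.sqrt(v_hat) + self.eps)

print(mse_loss([1.0, 2.0], [1.0, 4.0]))  # (0 + 4) / 2 = 2.0

adam = AdamState(lr=0.1)
w = adam.step(0.0, grad=5.0)
print(f"{w:.4f}")  # -0.1000: the first step moves about lr, against the gradient
```

Adam divides by the running magnitude of the gradient. As a result, each early step moves a parameter by roughly `lr`, whatever the size of the raw gradient.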
## Step 4: Train the Model
PyTorch:

```python
# Training loop (model, criterion and optimizer come from Steps 2 and 3)
history = {"loss": []}
for epoch in range(200):
    # Forward pass
    predictions = model(X_train_t)
    loss = criterion(predictions, y_train_t)

    # Backward pass: clear old gradients, backpropagate, update parameters
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Record history
    history["loss"].append(loss.item())

    # Print progress
    if epoch % 20 == 0 or epoch == 199:
        print(f"Epoch {epoch:>4d}/200: Loss = {loss.item():.6f}")

print(f"\nLearned: w = {model.weight.item():.4f} (true: 200)")
print(f"Learned: b = {model.bias.item():.4f} (true: 50)")
print(f"Final loss: {history['loss'][-1]:.4f}")
```
Key Difference #4
Neurogebra: One line — trainer.fit(X, y, epochs=200). Everything is handled for you.
PyTorch: You write the full training loop manually:
- Forward pass
- Compute loss
- Zero gradients
- Backward pass
- Optimizer step
- Record metrics
This gives PyTorch more flexibility, but Neurogebra is much simpler to learn.
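You can trace the six steps listed above without any framework. The sketch below is minimal and differs from the project setup in two assumed ways: it uses plain gradient descent in place of Adam, which keeps the update to one line, and it uses a learning rate of 0.1 picked for this example. The data is generated with Python's `random` module, so it is not the Step 1 dataset, and all names are illustrative:

```python
import random

# Illustrative data in the same form as Step 1: y = 200 * x + 50 + noise (std 10)
random.seed(0)
xs = [random.uniform(-2.0, 2.0) for _ in range(80)]
ys = [200 * x + 50 + random.gauss(0, 10) for x in xs]

w, b = 0.0, 0.0   # parameters to learn
lr = 0.1          # learning rate (assumed for this sketch)
n = len(xs)
history = {"loss": []}

for epoch in range(200):
    # 1. Forward pass
    preds = [w * x + b for x in xs]
    # 2. Compute loss (MSE)
    errors = [p - y for p, y in zip(preds, ys)]
    loss = sum(e * e for e in errors) / n
    # 3-4. Gradients of the MSE, written out by hand. Nothing accumulates
    #      between epochs, so there is no separate "zero gradients" step.
    grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
    grad_b = 2 * sum(errors) / n
    # 5. Optimizer step (plain gradient descent, not Adam)
    w -= lr * grad_w
    b -= lr * grad_b
    # 6. Record metrics
    history["loss"].append(loss)
    if epoch % 50 == 0 or epoch == 199:
        print(f"Epoch {epoch:>3d}: loss = {loss:.4f}")

print(f"Learned: w = {w:.2f} (true: 200), b = {b:.2f} (true: 50)")
```

The final loss levels off near the noise variance (about 10² = 100), not at zero, because the noise cannot be predicted from x.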
## Step 5: Evaluate on Test Data
Neurogebra:

```python
# Predict on test data
y_pred = np.array([model.eval(x=float(xi)) for xi in X_test])

# Calculate test MSE
test_mse = np.mean((y_pred - y_test) ** 2)

# Calculate R² score
ss_res = np.sum((y_test - y_pred) ** 2)
ss_tot = np.sum((y_test - np.mean(y_test)) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"Test MSE: {test_mse:.4f}")
print(f"R² Score: {r2:.4f}")
print("\nSample predictions:")
for i in range(5):
    print(f"  x={X_test[i]:.2f} → predicted={y_pred[i]:.1f}, actual={y_test[i]:.1f}")
```
PyTorch:

```python
# Predict on test data without tracking gradients
with torch.no_grad():
    y_pred_t = model(X_test_t)
y_pred = y_pred_t.numpy().flatten()

# Calculate test MSE
test_mse = np.mean((y_pred - y_test) ** 2)

# Calculate R² score
ss_res = np.sum((y_test - y_pred) ** 2)
ss_tot = np.sum((y_test - np.mean(y_test)) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"Test MSE: {test_mse:.4f}")
print(f"R² Score: {r2:.4f}")
print("\nSample predictions:")
for i in range(5):
    print(f"  x={X_test[i]:.2f} → predicted={y_pred[i]:.1f}, actual={y_test[i]:.1f}")
```
Key Difference #5
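PyTorch: predictions come back as tensors, so you convert them to NumPy with .numpy(). Running inference inside torch.no_grad() turns off gradient tracking. Without it, you would need to call .detach() first, because .numpy() raises an error on tensors that require gradients.
Neurogebra: evaluates directly, with no special context or conversion needed.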
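Both versions compute the two metrics in Step 5 with the same formulas once the predictions are NumPy arrays. The following small dependency-free sketch makes the R² scale concrete:

- 1 means perfect predictions.
- 0 means no better than always predicting the mean of the targets.
- Negative values mean worse than that.

The helper names are illustrative:

```python
def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_score(y_true, y_pred):
    """R² = 1 - SS_res / SS_tot, the same formula used in Step 5."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y = [10.0, 20.0, 30.0]
print(r2_score(y, [10.0, 20.0, 30.0]))  # perfect predictions: 1.0
print(r2_score(y, [20.0, 20.0, 20.0]))  # always predicting the mean: 0.0
print(r2_score(y, [30.0, 20.0, 10.0]))  # worse than the mean: -3.0
print(f"{mse(y, [12.0, 20.0, 28.0]):.3f}")  # (4 + 0 + 4) / 3 = 2.667
```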
## Step 6: Visualize Results
Neurogebra:

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Plot 1: Training Loss
axes[0].plot(history["loss"], linewidth=2)
axes[0].set_xlabel("Epoch")
axes[0].set_ylabel("Loss (MSE)")
axes[0].set_title("Training Loss (Neurogebra)")
axes[0].set_yscale("log")
axes[0].grid(True, alpha=0.3)

# Plot 2: Predictions vs Actual
x_line = np.linspace(X_test.min(), X_test.max(), 100)
y_line = np.array([model.eval(x=float(xi)) for xi in x_line])
axes[1].scatter(X_test, y_test, alpha=0.7, label="Actual", color="blue")
axes[1].plot(x_line, y_line, color="red", linewidth=2, label="Predicted")
axes[1].set_xlabel("House Size (normalized)")
axes[1].set_ylabel("Price ($1000s)")
axes[1].set_title("Predictions (Neurogebra)")
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
```
PyTorch:

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Plot 1: Training Loss
axes[0].plot(history["loss"], linewidth=2)
axes[0].set_xlabel("Epoch")
axes[0].set_ylabel("Loss (MSE)")
axes[0].set_title("Training Loss (PyTorch)")
axes[0].set_yscale("log")
axes[0].grid(True, alpha=0.3)

# Plot 2: Predictions vs Actual
x_line = np.linspace(X_test.min(), X_test.max(), 100)
x_line_t = torch.tensor(x_line, dtype=torch.float32).unsqueeze(1)
with torch.no_grad():
    y_line = model(x_line_t).numpy().flatten()
axes[1].scatter(X_test, y_test, alpha=0.7, label="Actual", color="blue")
axes[1].plot(x_line, y_line, color="red", linewidth=2, label="Predicted")
axes[1].set_xlabel("House Size (normalized)")
axes[1].set_ylabel("Price ($1000s)")
axes[1].set_title("Predictions (PyTorch)")
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
```
## Step 7: Understand What You Built (Neurogebra Bonus)
This is where Neurogebra really shines — understanding and introspection:
```python
from neurogebra import MathForge

forge = MathForge()

# Explain the model
print("=== Your Model ===")
print(f"Formula: y = {model.params['w']:.2f} * x + {model.params['b']:.2f}")
print(f"Symbolic: {model.symbolic_expr}")
print()

# Examine the gradient. x is standardized, so one unit of x is
# one standard deviation of house size.
grad = model.gradient("x")
print(f"Gradient dy/dx = {grad.symbolic_expr}")
print(f"This means: for every 1 unit increase in x, y increases by {model.params['w']:.2f}")
print()

# Examine the loss function
mse = forge.get("mse")
print(f"Loss function: {mse.symbolic_expr}")
print(f"Loss gradient: {mse.gradient('y_pred').symbolic_expr}")
print()

# Activations you could add to a larger model
print("=== Available Activations You Could Add ===")
for name in ["relu", "sigmoid", "tanh"]:
    act = forge.get(name)
    print(f"  {name}: {act.symbolic_expr}")
```
Educational Value
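PyTorch computes gradients numerically through autograd: after loss.backward(), each parameter's .grad holds a number. It does not show you the symbolic formula behind that number. Neurogebra shows you exactly what's happening at every step, formula by formula.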
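A central finite difference can check Step 7's reading of the gradient (y rises by w for every unit increase in x) without relying on either library. Because x is standardized, one unit of x corresponds to one standard deviation of house size. The sketch below uses hypothetical parameter values in place of the trained ones, and `predict` and `central_difference` are illustrative helpers:

```python
def predict(x, w, b):
    """Linear model y = w * x + b."""
    return w * x + b

def central_difference(f, x, h=1e-4):
    """Approximate df/dx at x as (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)

w, b = 199.5, 50.3  # hypothetical learned values, for illustration only
slopes = []
for x in [-1.0, 0.0, 1.5]:
    slope = central_difference(lambda v: predict(v, w, b), x)
    slopes.append(slope)
    print(f"x = {x:>4}: dy/dx ≈ {slope:.4f}")  # 199.5000 at every x
```

For a linear model the slope is the same everywhere. The numerical estimate matches w up to floating-point rounding, which agrees with the symbolic result dy/dx = w.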
## Full Side-by-Side Comparison
| Aspect | Neurogebra | PyTorch |
|---|---|---|
| Data format | NumPy arrays | `torch.Tensor` |
| Model definition | Math formula: `"w*x + b"` | `nn.Linear(1, 1)` |
| Lines for training | 1 (`trainer.fit(...)`) | ~10 (manual loop) |
| See the formula | ✅ `model.symbolic_expr` | ❌ Hidden in module |
| See gradients | ✅ `model.gradient("x")` | ❌ Only numerical values |
| Total lines of code | ~15 | ~35 |
| Learning curve | Gentle | Steep |
| Intended use | Education | Production |
| GPU support | Via bridges | Native |
## What You Learned
- Linear regression fits `y = wx + b` to data
- Training means adjusting `w` and `b` to minimize the loss
- MSE loss measures the average squared error between predictions and targets
- The Adam optimizer updates the parameters efficiently
- Neurogebra lets you see and understand every step
- PyTorch gives more control but requires more code
Next Project: Image Classifier →