Project 3: Neural Network from Scratch — Neurogebra vs PyTorch¶

Build a complete neural network from scratch to solve a non-linear classification problem, and see how each component works along the way: neurons, layers, the loss, backpropagation, and the update rule.

🎯 Goal¶

Create a neural network that classifies points into spiral categories — a problem that cannot be solved with linear models.

Input:  (x, y) coordinate on a 2D plane
Output: Which spiral (0, 1, or 2) the point belongs to

This is a classic non-linear benchmark: the three classes wind around each other, so the network has to learn curved decision boundaries.

Step 1: Generate the Spiral Dataset¶

import numpy as np
import matplotlib.pyplot as plt

def generate_spirals(n_points=100, n_classes=3, noise=0.1):
    """Generate spiral dataset — a classic non-linear classification problem."""
    X = np.zeros((n_points * n_classes, 2))
    y = np.zeros(n_points * n_classes, dtype=int)

    for class_idx in range(n_classes):
        start = n_points * class_idx
        end = n_points * (class_idx + 1)

        r = np.linspace(0.0, 1.0, n_points)
        theta = np.linspace(
            class_idx * 4, (class_idx + 1) * 4, n_points
        ) + np.random.randn(n_points) * noise

        X[start:end, 0] = r * np.sin(theta)
        X[start:end, 1] = r * np.cos(theta)
        y[start:end] = class_idx

    return X, y

# Generate data
np.random.seed(42)
X, y = generate_spirals(n_points=100, n_classes=3, noise=0.15)

# Train/test split
indices = np.random.permutation(len(X))
split = int(0.8 * len(X))
X_train, X_test = X[indices[:split]], X[indices[split:]]
y_train, y_test = y[indices[:split]], y[indices[split:]]

# Visualize
plt.figure(figsize=(8, 8))
colors = ['#FF6B6B', '#4ECDC4', '#45B7D1']
for i in range(3):
    mask = y_train == i
    plt.scatter(X_train[mask, 0], X_train[mask, 1], 
               c=colors[i], label=f'Class {i}', alpha=0.7, s=30)
plt.title("Spiral Dataset — Can Your Neural Network Solve This?", fontsize=14)
plt.xlabel("x₁")
plt.ylabel("x₂")
plt.legend()
plt.grid(True, alpha=0.3)
plt.axis('equal')
plt.show()

print(f"Training: {len(X_train)} samples")
print(f"Testing:  {len(X_test)} samples")
print(f"Classes:  3 spirals")
print(f"Features: 2 (x, y coordinates)")

Why Spirals?

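A linear model separates classes with straight lines. The spiral arms wrap around each other, so no set of straight lines can separate them; you need a network with non-linear activations. High accuracy on this dataset is therefore good evidence that the non-linear parts of your network are doing their job.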
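To make this concrete, here is a minimal sketch of a linear baseline: a least-squares classifier on one-hot targets, fitted with NumPy. It repeats the spiral recipe from `generate_spirals` so the block runs on its own. The names `make_spirals` and `linear_baseline_accuracy` are hypothetical, and least squares is only a convenient stand-in for "a linear model"; the project itself does not use it.

```python
import numpy as np

def make_spirals(n_points=100, n_classes=3, noise=0.15, seed=0):
    """Same recipe as generate_spirals above, with a local random generator."""
    rng = np.random.default_rng(seed)
    X = np.zeros((n_points * n_classes, 2))
    y = np.zeros(n_points * n_classes, dtype=int)
    for c in range(n_classes):
        rows = slice(n_points * c, n_points * (c + 1))
        r = np.linspace(0.0, 1.0, n_points)
        theta = np.linspace(c * 4, (c + 1) * 4, n_points)
        theta = theta + rng.standard_normal(n_points) * noise
        X[rows, 0] = r * np.sin(theta)
        X[rows, 1] = r * np.cos(theta)
        y[rows] = c
    return X, y

def linear_baseline_accuracy(X, y, n_classes=3):
    """Fit scores = [x1, x2, 1] @ W by least squares on one-hot targets."""
    A = np.hstack([X, np.ones((len(X), 1))])    # add a bias column
    T = np.eye(n_classes)[y]                     # one-hot targets
    W, *_ = np.linalg.lstsq(A, T, rcond=None)    # closed-form linear fit
    predictions = np.argmax(A @ W, axis=1)
    return float(np.mean(predictions == y))

X_demo, y_demo = make_spirals()
linear_acc = linear_baseline_accuracy(X_demo, y_demo)
print(f"Linear baseline accuracy: {linear_acc:.1%}")
```

The accuracy is measured on the same points the model was fitted to, so it is an optimistic number for the linear model. The exact value depends on the seed, but it should land well short of what the networks below reach.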

Step 2: Build the Neural Network¶

Neurogebra — Every Component Visible¶

from neurogebra.core.autograd import Value
import random

random.seed(42)

class Neuron:
    """A single neuron: computes w·x + b, then applies activation."""

    def __init__(self, n_inputs, activation='relu'):
        # Xavier-style uniform initialization, treating each neuron as
        # fan_in = n_inputs and fan_out = 1
        limit = (6 / (n_inputs + 1)) ** 0.5
        self.w = [Value(random.uniform(-limit, limit)) for _ in range(n_inputs)]
        self.b = Value(0.0)
        self.activation = activation

    def __call__(self, x):
        # Linear: w·x + b
        raw = sum((wi * xi for wi, xi in zip(self.w, x)), self.b)

        # Activation
        if self.activation == 'relu':
            return raw.relu()
        elif self.activation == 'tanh':
            return raw.tanh()
        elif self.activation == 'linear':
            return raw
        raise ValueError(f"Unknown activation: {self.activation!r}")

    def parameters(self):
        return self.w + [self.b]


class Layer:
    """A layer of neurons."""

    def __init__(self, n_inputs, n_outputs, activation='relu'):
        self.neurons = [Neuron(n_inputs, activation) for _ in range(n_outputs)]

    def __call__(self, x):
        return [neuron(x) for neuron in self.neurons]

    def parameters(self):
        return [p for neuron in self.neurons for p in neuron.parameters()]


class NeuralNetwork:
    """Complete neural network with multiple layers."""

    def __init__(self, layer_sizes, activations=None):
        """
        Args:
            layer_sizes: [input_size, hidden1, hidden2, ..., output_size]
            activations: activation for each layer (default: relu, linear for last)
        """
        if activations is None:
            activations = ['relu'] * (len(layer_sizes) - 2) + ['linear']

        self.layers = []
        for i in range(len(layer_sizes) - 1):
            self.layers.append(
                Layer(layer_sizes[i], layer_sizes[i+1], activations[i])
            )

        n_params = len(self.parameters())
        print(f"Neural Network created:")
        print(f"  Architecture: {' → '.join(map(str, layer_sizes))}")
        print(f"  Activations:  {activations}")
        print(f"  Parameters:   {n_params:,}")

    def __call__(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

    def parameters(self):
        return [p for layer in self.layers for p in layer.parameters()]


# Create neural network: 2 inputs → 16 hidden → 16 hidden → 3 outputs
nn_neuro = NeuralNetwork(
    layer_sizes=[2, 16, 16, 3],
    activations=['relu', 'relu', 'linear']
)

PyTorch — The Standard Way¶

import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(42)

class SpiralNet(nn.Module):
    """Neural network for spiral classification."""

    def __init__(self):
        super().__init__()
        self.network = nn.Sequential(
            nn.Linear(2, 16),
            nn.ReLU(),
            nn.Linear(16, 16),
            nn.ReLU(),
            nn.Linear(16, 3)
        )

    def forward(self, x):
        return self.network(x)

nn_torch = SpiralNet()
total_params = sum(p.numel() for p in nn_torch.parameters())
print(f"\nPyTorch Neural Network:")
print(f"  Architecture: 2 → 16 → 16 → 3")
print(f"  Parameters:   {total_params:,}")

Architecture Comparison

Both networks have the same structure: 2 → 16 → 16 → 3.
Neurogebra: ~60 lines to define (you write, and can read, every line).
PyTorch: ~15 lines (uses pre-built components).
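A quick way to check that the two models really match is to compare parameter counts. The sketch below computes the expected count for a fully connected network from its layer sizes; `count_dense_parameters` is a hypothetical helper, not part of Neurogebra or PyTorch.

```python
def count_dense_parameters(layer_sizes):
    """Weights (n_in * n_out) plus biases (n_out) for each fully connected layer."""
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out
    return total

param_count = count_dense_parameters([2, 16, 16, 3])
print(param_count)  # 48 + 272 + 51 = 371
```

Both "Parameters:" printouts above should report this number. If they differ, the two architectures are not the same.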

Step 3: Implement Softmax & Cross-Entropy Loss¶

Neurogebra

def softmax_cross_entropy(scores, target):
    """
    Compute softmax probabilities and cross-entropy loss.

    This is mathematically equivalent to what PyTorch's CrossEntropyLoss
    computes for a single sample, but here you can see every step!

    Steps:
    1. Find max score (for numerical stability)
    2. Compute exp(score - max) for each class
    3. Normalize to get probabilities (softmax)
    4. Loss = -log(probability of correct class)
    """
    # Step 1: Numerical stability — subtract max
    max_score = max(s.data for s in scores)

    # Step 2: Compute exponentials
    exp_scores = [(s - Value(max_score)).exp() for s in scores]

    # Step 3: Softmax — normalize to probabilities
    sum_exp = sum(exp_scores)
    probabilities = [e / sum_exp for e in exp_scores]

    # Step 4: Cross-entropy loss
    # -log(P(correct class))
    loss = -(probabilities[target].log())

    return loss, probabilities

# Example: verify it works
dummy_scores = [Value(2.0), Value(1.0), Value(0.1)]  # Raw scores for 3 classes
loss, probs = softmax_cross_entropy(dummy_scores, target=0)  # Target = class 0

print("=== Softmax + Cross-Entropy Demo ===")
print(f"Raw scores: [{', '.join(f'{s.data:.1f}' for s in dummy_scores)}]")
print(f"Probabilities: [{', '.join(f'{p.data:.3f}' for p in probs)}]")
print(f"Sum of probs: {sum(p.data for p in probs):.3f} (should be 1.0)")
print(f"Loss (target=class 0): {loss.data:.4f}")
print(f"(Lower loss = higher confidence in correct class)")

PyTorch

# PyTorch does this in ONE line:
criterion = nn.CrossEntropyLoss()

# That single line contains all the softmax + cross-entropy math!
# Convenient, but you don't see the internals.

# Demo:
dummy_scores = torch.tensor([[2.0, 1.0, 0.1]])
dummy_target = torch.tensor([0])
loss = criterion(dummy_scores, dummy_target)

probs = torch.softmax(dummy_scores, dim=1)
print(f"Raw scores:    {dummy_scores.numpy()}")
print(f"Probabilities: {probs.numpy()}")
print(f"Loss:          {loss.item():.4f}")
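
The same demo can be reproduced with plain floats, which makes the numbers easy to check by hand. This is a minimal sketch using only the standard library; the function names are illustrative. It also returns the gradient of the loss with respect to the raw scores, which for softmax plus cross-entropy is the predicted probabilities minus the one-hot target.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    shift = max(scores)                        # subtracting the max leaves the result unchanged
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(scores, target):
    """Loss and its gradient with respect to the raw scores."""
    probs = softmax(scores)
    loss = -math.log(probs[target])
    # d(loss)/d(score_k) = p_k - 1 for the target class, p_k otherwise
    grad = [p - (1.0 if k == target else 0.0) for k, p in enumerate(probs)]
    return loss, probs, grad

loss_demo, probs_demo, grad_demo = cross_entropy([2.0, 1.0, 0.1], target=0)
print([round(p, 3) for p in probs_demo])  # [0.659, 0.242, 0.099]
print(round(loss_demo, 4))                # 0.417

# Shifting by the max is what keeps large scores from overflowing exp()
big_probs = softmax([1000.0, 999.0, 998.1])
```

Without the shift, `math.exp(1000.0)` raises `OverflowError`. With it, `big_probs` matches the probabilities for `[2.0, 1.0, 0.1]`, because softmax only depends on the differences between scores.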

Step 4: Train Both Networks¶

Neurogebra Training¶

# ═══════════════════════════════════════════════
# NEUROGEBRA TRAINING — See every gradient flow!
# ═══════════════════════════════════════════════

learning_rate = 0.05
epochs = 30
batch_size = 16
neuro_history = {"loss": [], "accuracy": []}

print("Training Neurogebra Neural Network...")
print("=" * 50)

for epoch in range(epochs):
    # Shuffle training data
    perm = np.random.permutation(len(X_train))

    epoch_loss = 0.0
    correct = 0
    total = 0

    # Mini-batch training
    for start in range(0, len(X_train), batch_size):
        batch_idx = perm[start:start + batch_size]
        batch_loss = Value(0.0)

        for idx in batch_idx:
            # Convert input to Value objects
            x_input = [Value(float(X_train[idx, 0])),
                       Value(float(X_train[idx, 1]))]
            target = int(y_train[idx])

            # Forward pass — compute scores
            scores = nn_neuro(x_input)

            # Compute loss
            loss, probs = softmax_cross_entropy(scores, target)
            batch_loss = batch_loss + loss

            # Track accuracy
            predicted = max(range(3), key=lambda i: scores[i].data)
            correct += (predicted == target)
            total += 1

        # Average batch loss
        batch_loss = batch_loss / len(batch_idx)

        # ==============================
        # BACKWARD PASS — The Magic Part
        # ==============================

        # 1. Zero all gradients
        for p in nn_neuro.parameters():
            p.grad = 0.0

        # 2. Backpropagate — compute dLoss/dParam for every parameter
        batch_loss.backward()

        # 3. Update every parameter: param = param - lr * gradient
        for p in nn_neuro.parameters():
            p.data -= learning_rate * p.grad

        epoch_loss += batch_loss.data

    accuracy = correct / total
    neuro_history["loss"].append(epoch_loss)
    neuro_history["accuracy"].append(accuracy)

    if epoch % 5 == 0 or epoch == epochs - 1:
        print(f"  Epoch {epoch:>3d}/{epochs}: Loss = {epoch_loss:.4f}, Accuracy = {accuracy:.1%}")

print(f"\nFinal Training Accuracy: {neuro_history['accuracy'][-1]:.1%}")
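
To see what a call like `batch_loss.backward()` has to do, here is a minimal reverse-mode autodiff sketch. The `Scalar` class is a hypothetical stand-in written for this illustration; it is not Neurogebra's `Value`, whose internals may differ. It supports only addition, multiplication, and ReLU.

```python
class Scalar:
    """Tiny reverse-mode autodiff node (illustrative stand-in, not Neurogebra's Value)."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None    # leaf nodes have nothing to propagate

    def __add__(self, other):
        out = Scalar(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad        # d(a+b)/da = 1
            other.grad += out.grad       # d(a+b)/db = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Scalar(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad   # d(a*b)/da = b
            other.grad += self.data * out.grad   # d(a*b)/db = a
        out._backward = _backward
        return out

    def relu(self):
        out = Scalar(max(0.0, self.data), (self,))
        def _backward():
            self.grad += (1.0 if self.data > 0 else 0.0) * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Build a topological order, then walk it in reverse so every node
        # is processed only after the nodes that consume its output
        order, seen = [], set()
        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)
        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()

# One neuron: out = relu(w * x + b)
w, x, b = Scalar(0.5), Scalar(3.0), Scalar(-1.0)
out = (w * x + b).relu()
out.backward()
print(out.data, w.grad, x.grad, b.grad)  # 0.5 3.0 0.5 1.0
```

Gradients are accumulated with `+=`, so a second `backward()` call would add to the existing values. That is why the training loop above resets `p.grad = 0.0` before each backward pass.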

PyTorch Training¶

# ═══════════════════════════════════════════════
# PYTORCH TRAINING — Fast and optimized
# ═══════════════════════════════════════════════

X_train_t = torch.tensor(X_train, dtype=torch.float32)
y_train_t = torch.tensor(y_train, dtype=torch.long)

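# Note: Adam (lr=0.01) here vs. plain SGD (lr=0.05) in the Neurogebra loop,
# so the curves in Step 5 reflect the optimizer as well as the framework.
optimizer = optim.Adam(nn_torch.parameters(), lr=0.01)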
criterion = nn.CrossEntropyLoss()

torch_history = {"loss": [], "accuracy": []}

print("Training PyTorch Neural Network...")
print("=" * 50)

for epoch in range(epochs):
    perm = torch.randperm(len(X_train_t))

    epoch_loss = 0.0
    correct = 0
    total = 0

    for start in range(0, len(X_train_t), batch_size):
        X_batch = X_train_t[perm[start:start + batch_size]]
        y_batch = y_train_t[perm[start:start + batch_size]]

        # Forward pass
        outputs = nn_torch(X_batch)
        loss = criterion(outputs, y_batch)

        # Backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        epoch_loss += loss.item()
        _, predicted = torch.max(outputs, 1)
        correct += (predicted == y_batch).sum().item()
        total += y_batch.size(0)

    accuracy = correct / total
    torch_history["loss"].append(epoch_loss)
    torch_history["accuracy"].append(accuracy)

    if epoch % 5 == 0 or epoch == epochs - 1:
        print(f"  Epoch {epoch:>3d}/{epochs}: Loss = {epoch_loss:.4f}, Accuracy = {accuracy:.1%}")

print(f"\nFinal Training Accuracy: {torch_history['accuracy'][-1]:.1%}")

Step 5: Compare Training Progress¶

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Loss comparison
axes[0].plot(neuro_history["loss"], label="Neurogebra", linewidth=2, color='#FF6B6B')
axes[0].plot(torch_history["loss"], label="PyTorch", linewidth=2, color='#4ECDC4')
axes[0].set_xlabel("Epoch")
axes[0].set_ylabel("Loss")
axes[0].set_title("Training Loss Comparison")
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Accuracy comparison
axes[1].plot(neuro_history["accuracy"], label="Neurogebra", linewidth=2, color='#FF6B6B')
axes[1].plot(torch_history["accuracy"], label="PyTorch", linewidth=2, color='#4ECDC4')
axes[1].set_xlabel("Epoch")
axes[1].set_ylabel("Accuracy")
axes[1].set_title("Training Accuracy Comparison")
axes[1].legend()
axes[1].grid(True, alpha=0.3)
axes[1].set_ylim(0, 1.05)

plt.tight_layout()
plt.show()

Step 6: Evaluate on Test Data¶

Neurogebra

# Neurogebra test evaluation
neuro_correct = 0
neuro_predictions = []

for i in range(len(X_test)):
    x_input = [Value(float(X_test[i, 0])), Value(float(X_test[i, 1]))]
    scores = nn_neuro(x_input)
    predicted = max(range(3), key=lambda j: scores[j].data)
    neuro_predictions.append(predicted)
    neuro_correct += (predicted == y_test[i])

neuro_test_acc = neuro_correct / len(X_test)
print(f"Neurogebra Test Accuracy: {neuro_test_acc:.1%}")

PyTorch

# PyTorch test evaluation
X_test_t = torch.tensor(X_test, dtype=torch.float32)
y_test_t = torch.tensor(y_test, dtype=torch.long)

with torch.no_grad():
    outputs = nn_torch(X_test_t)
    _, torch_predictions = torch.max(outputs, 1)

torch_test_acc = (torch_predictions == y_test_t).float().mean().item()
torch_predictions = torch_predictions.numpy()
print(f"PyTorch Test Accuracy: {torch_test_acc:.1%}")
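
A single accuracy number can hide one spiral that the network consistently gets wrong. A confusion matrix shows the per-class breakdown. The sketch below uses a hypothetical helper, `confusion_matrix`, and toy labels that are for illustration only; they are not results from the networks above.

```python
def confusion_matrix(y_true, y_pred, n_classes=3):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[int(true_label)][int(predicted_label)] += 1
    return matrix

# Toy labels for illustration only (not output from the networks above)
toy_true = [0, 0, 1, 1, 2, 2]
toy_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(toy_true, toy_pred)
for row in cm:
    print(row)
```

With the variables from this step, the calls would be `confusion_matrix(y_test, neuro_predictions)` and `confusion_matrix(y_test, torch_predictions)`. Correct predictions sit on the diagonal.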

Step 7: Visualize Decision Boundaries¶

This is the most satisfying visualization — see how the network learned to separate spirals:

def plot_decision_boundary(predict_fn, X, y, title, ax):
    """Plot the decision boundary of a classifier."""
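    # Grid step size. Smaller values give smoother plots, but predict_fn is
    # called once per grid point, so the Neurogebra plot gets slow quickly.
    h = 0.02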
    x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
    y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5

    xx, yy = np.meshgrid(
        np.arange(x_min, x_max, h),
        np.arange(y_min, y_max, h)
    )

    # Predict on grid
    grid_points = np.c_[xx.ravel(), yy.ravel()]
    Z = np.array([predict_fn(p) for p in grid_points])
    Z = Z.reshape(xx.shape)

    # Plot
    ax.contourf(xx, yy, Z, alpha=0.3, cmap='RdYlBu')

    colors = ['#FF6B6B', '#4ECDC4', '#45B7D1']
    for i in range(3):
        mask = y == i
        ax.scatter(X[mask, 0], X[mask, 1], c=colors[i], 
                  label=f'Class {i}', edgecolors='k', linewidth=0.5, s=30)

    ax.set_title(title, fontsize=13, fontweight='bold')
    ax.legend(loc='upper right')
    ax.set_xlabel("x₁")
    ax.set_ylabel("x₂")

# Prediction functions
def neuro_predict(point):
    x = [Value(float(point[0])), Value(float(point[1]))]
    scores = nn_neuro(x)
    return max(range(3), key=lambda i: scores[i].data)

def torch_predict(point):
    x = torch.tensor(point, dtype=torch.float32).unsqueeze(0)
    with torch.no_grad():
        scores = nn_torch(x)
    return torch.argmax(scores).item()

# Plot both decision boundaries
fig, axes = plt.subplots(1, 2, figsize=(16, 7))

plot_decision_boundary(neuro_predict, X_test, y_test, 
                       f"Neurogebra (Acc: {neuro_test_acc:.0%})", axes[0])
plot_decision_boundary(torch_predict, X_test, y_test,
                       f"PyTorch (Acc: {torch_test_acc:.0%})", axes[1])

plt.suptitle("Decision Boundaries — Neural Network Learned to Separate Spirals!", 
             fontsize=15, fontweight='bold')
plt.tight_layout()
plt.show()

Step 8: Deep Dive — What Did the Network Learn?¶

Inspect Neurogebra's Parameters¶

from neurogebra import MathForge

forge = MathForge()

print("=" * 60)
print("INSPECTING WHAT THE NETWORK LEARNED")
print("=" * 60)

# 1. Layer-by-layer parameter statistics
for i, layer in enumerate(nn_neuro.layers):
    weights = [p.data for neuron in layer.neurons for p in neuron.w]
    biases = [neuron.b.data for neuron in layer.neurons]

    print(f"\nLayer {i+1}:")
    print(f"  Neurons:      {len(layer.neurons)}")
    print(f"  Weights:      mean={np.mean(weights):.4f}, std={np.std(weights):.4f}")
    print(f"  Weight range: [{min(weights):.4f}, {max(weights):.4f}]")
    print(f"  Biases:       mean={np.mean(biases):.4f}, std={np.std(biases):.4f}")

# 2. Understanding with MathForge
print("\n" + "=" * 60)
print("THE MATH BEHIND YOUR NETWORK")
print("=" * 60)

relu = forge.get("relu")
print(f"\nReLU activation:     {relu.symbolic_expr}")
print(f"ReLU gradient:       {relu.gradient('x').symbolic_expr}")
print(f"ReLU at x=2:         {relu.eval(x=2.0)}")
print(f"ReLU at x=-2:        {relu.eval(x=-2.0)}")
print(f"ReLU grad at x=2:    {relu.gradient('x').eval(x=2.0)}")
print(f"ReLU grad at x=-2:   {relu.gradient('x').eval(x=-2.0)}")

print(f"\nEach neuron computes:")
print(f"  output = ReLU(w₁·x₁ + w₂·x₂ + b)")
print(f"  = ReLU(weighted_sum + bias)")
print(f"  = max(0, weighted_sum + bias)")

print(f"\nThe final layer computes 3 scores (one per class).")
print(f"Softmax converts scores to probabilities.")
print(f"We predict the class with highest probability.")

Visualize Individual Neuron Activations¶

# See what each neuron in the first layer "looks at"
fig, axes = plt.subplots(2, 4, figsize=(16, 8))

h = 0.05
x_range = np.arange(-1.5, 1.5, h)
y_range = np.arange(-1.5, 1.5, h)
xx, yy = np.meshgrid(x_range, y_range)

for idx, ax in enumerate(axes.flat[:min(8, len(nn_neuro.layers[0].neurons))]):
    neuron = nn_neuro.layers[0].neurons[idx]

    # Compute activation for each point
    Z = np.zeros_like(xx)
    for i in range(len(x_range)):
        for j in range(len(y_range)):
            x_val = Value(float(xx[j, i]))
            y_val = Value(float(yy[j, i]))
            result = neuron([x_val, y_val])
            Z[j, i] = result.data

    im = ax.contourf(xx, yy, Z, levels=20, cmap='viridis')
    ax.set_title(f"Neuron {idx+1}")
    ax.set_xlabel("x₁")
    ax.set_ylabel("x₂")

plt.suptitle("What Each Neuron in Layer 1 Responds To", fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

Educational Insight

Each first-layer neuron learns a different linear boundary (a line in 2D space). The ReLU activation makes it respond only on one side of that line. Later layers combine these half-plane responses into piecewise-linear boundaries that approximate curves, which is how the network separates the spirals!
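
The "one side of a line" behaviour can be checked with a single hand-written neuron. The weights below are illustrative and are not taken from the trained network.

```python
def relu_neuron(x1, x2, w1, w2, b):
    """output = max(0, w1*x1 + w2*x2 + b)"""
    return max(0.0, w1 * x1 + w2 * x2 + b)

# Illustrative weights (not taken from the trained network):
# the boundary is the line x1 + x2 - 0.5 = 0
w1, w2, b = 1.0, 1.0, -0.5

inactive = relu_neuron(0.0, 0.0, w1, w2, b)    # below the line: pre-activation -0.5
on_line = relu_neuron(0.25, 0.25, w1, w2, b)   # exactly on the line: pre-activation 0
active = relu_neuron(1.0, 1.0, w1, w2, b)      # above the line: pre-activation 1.5
print(inactive, on_line, active)
```

The neuron outputs zero everywhere on or below its line and grows linearly above it. This is the flat region next to a ramp that each panel of the plot above shows.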

Final Comparison¶

Code Complexity¶

Component	Neurogebra (lines)	PyTorch (lines)
Neural network definition	60	15
Softmax + cross-entropy	15	1
Training loop	35	20
Evaluation	8	5
Total	~120	~40

Understanding Gained¶

What You Understand	Neurogebra	PyTorch
How neurons compute outputs	✅ You wrote it	⚠️ Hidden in `nn.Linear`
How gradients flow backward	✅ You see `.backward()` flow	⚠️ Happens inside autograd
How softmax works	✅ You implemented it	⚠️ Inside `CrossEntropyLoss`
How weights get updated	✅ `p.data -= lr * p.grad`	⚠️ Inside `optimizer.step()`
How decision boundaries form	✅ You can inspect neurons	⚠️ Requires extra tools

When to Use Each¶

Scenario	Use Neurogebra	Use PyTorch
Learning ML concepts	✅
Understanding backprop	✅
Course assignments	✅
Research prototyping	✅	✅
Production ML systems		✅
Large-scale training		✅
GPU acceleration		✅
Pre-trained models		✅

What You Learned in This Project¶

Neural networks are layers of simple neurons stacked together
Each neuron computes: output = activation(w·x + b)
Backpropagation computes gradients by following the chain rule backward
Softmax converts raw scores to probabilities
Cross-entropy measures how wrong the predicted probabilities are
Non-linear activations (ReLU) allow networks to learn curved boundaries
Multiple layers allow increasingly complex decision boundaries
Neurogebra makes every step visible and educational
PyTorch provides speed and convenience for production
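
The chain-rule point can be verified by hand on a single neuron. The sketch below computes gradients analytically, then compares them with finite differences. It uses squared error instead of cross-entropy to keep the arithmetic short; that substitution, and all names here, are assumptions made for the illustration.

```python
def forward(w, b, x, y):
    """Squared error of a one-input ReLU neuron."""
    z = w * x + b             # linear part
    a = max(0.0, z)           # ReLU
    return (a - y) ** 2

def chain_rule_gradients(w, b, x, y):
    """Walk backward: dL/da -> da/dz -> dz/dw and dz/db."""
    z = w * x + b
    a = max(0.0, z)
    dL_da = 2.0 * (a - y)
    da_dz = 1.0 if z > 0 else 0.0
    dL_dz = dL_da * da_dz
    return dL_dz * x, dL_dz   # (dL/dw, dL/db)

w, b, x, y = 0.8, 0.1, 2.0, 1.0
grad_w, grad_b = chain_rule_gradients(w, b, x, y)

# Central finite differences as an independent check
eps = 1e-6
numeric_w = (forward(w + eps, b, x, y) - forward(w - eps, b, x, y)) / (2 * eps)
numeric_b = (forward(w, b + eps, x, y) - forward(w, b - eps, x, y)) / (2 * eps)
print(grad_w, numeric_w)
print(grad_b, numeric_b)
```

Here z = 1.7, so the ReLU is active and the gradients are dL/dw = 2 · 0.7 · 2 = 2.8 and dL/db = 1.4. The finite-difference estimates should agree to several decimal places.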

Congratulations! 🎉¶

You've completed all three projects! You now understand:

Linear Regression — the foundation of ML
Image Classification — how networks see images
Neural Networks from Scratch — how every component works

Your Learning Path from Here¶

You are here ──────────────────────────────►

Neurogebra (understanding)      PyTorch (production)
├── ✅ Expressions              ├── torchvision
├── ✅ Autograd                 ├── DataLoader
├── ✅ Training                 ├── GPU training
├── ✅ Neural Networks          ├── Pre-trained models
└── ✅ Loss & Optimization      └── Deployment

You have a solid foundation. Whether you continue with Neurogebra for deeper understanding or move to PyTorch for production work, you now know what's actually happening inside the black box.
