Autograd Engine¶
Neurogebra includes a built-in automatic differentiation engine that tracks computations and computes gradients automatically — just like PyTorch's autograd, but simpler and more educational.
What is Autograd?¶
Autograd = Automatic Gradient computation.
Instead of computing derivatives by hand or symbolically, autograd:
- Records every operation you perform (builds a computation graph)
- When you call
.backward(), it walks backwards through the graph - Computes the gradient of each value using the chain rule
The Value Class¶
Value wraps a number and tracks its gradient:
from neurogebra.core.autograd import Value
# Create values
a = Value(2.0)
b = Value(3.0)
print(a) # Value(data=2.0, grad=0.0)
print(b) # Value(data=3.0, grad=0.0)
Forward Pass — Build the Graph¶
from neurogebra.core.autograd import Value
a = Value(2.0)
b = Value(3.0)
# Each operation creates a new Value and records the connection
c = a + b # c = 5.0
d = a * b # d = 6.0
e = c + d # e = 11.0
print(f"a = {a.data}") # 2.0
print(f"b = {b.data}") # 3.0
print(f"c = a + b = {c.data}") # 5.0
print(f"d = a * b = {d.data}") # 6.0
print(f"e = c + d = {e.data}") # 11.0
The computation graph looks like:
a (2.0) ──┬──[+]── c (5.0) ──┐
│ [+]── e (11.0)
b (3.0) ──┼──[+]── c │
│ │
a (2.0) ──┤ │
└──[*]── d (6.0) ──┘
b (3.0) ──┘
Backward Pass — Compute Gradients¶
# Compute de/da and de/db
e.backward()
print(f"de/da = {a.grad}") # 4.0 (from + path: 1, from * path: b=3 → total: 1+3=4)
print(f"de/db = {b.grad}") # 3.0 (from + path: 1, from * path: a=2 → total: 1+2=3)
Let's verify manually:
- \(e = (a + b) + (a \cdot b) = a + b + ab\)
- \(\frac{\partial e}{\partial a} = 1 + b = 1 + 3 = 4\) ✅
- \(\frac{\partial e}{\partial b} = 1 + a = 1 + 2 = 3\) ✅
Supported Operations¶
from neurogebra.core.autograd import Value
x = Value(2.0)
# Basic arithmetic
y = x + 3 # Addition
y = x * 3 # Multiplication
y = x ** 2 # Power
y = x - 1 # Subtraction
y = x / 2 # Division
y = -x # Negation
# Activation functions
y = x.relu() # ReLU
y = x.sigmoid() # Sigmoid
y = x.tanh() # Tanh
y = x.exp() # Exponential
y = x.log() # Natural log
Building a Neuron¶
A single neuron is: \(output = activation(w_1 x_1 + w_2 x_2 + b)\)
from neurogebra.core.autograd import Value
# Inputs
x1 = Value(2.0)
x2 = Value(3.0)
# Learnable parameters
w1 = Value(0.5)
w2 = Value(-0.3)
b = Value(0.1)
# Forward pass
z = w1 * x1 + w2 * x2 + b # Linear: 0.5*2 + (-0.3)*3 + 0.1 = 0.2
output = z.sigmoid() # Activation: σ(0.2) ≈ 0.5498
# Backward pass
output.backward()
print(f"Output: {output.data:.4f}")
print(f"Gradients:")
print(f" dout/dw1 = {w1.grad:.4f}")
print(f" dout/dw2 = {w2.grad:.4f}")
print(f" dout/db = {b.grad:.4f}")
Manual Training Loop with Autograd¶
This is how PyTorch works internally — and now you can see every step:
from neurogebra.core.autograd import Value
import random
# Training data: y = 2x + 1
data = [(1, 3), (2, 5), (3, 7), (4, 9)]
# Learnable parameters
w = Value(0.0)
b = Value(0.0)
learning_rate = 0.01
for epoch in range(100):
total_loss = Value(0.0)
for x_val, y_val in data:
# Forward pass
x = Value(x_val)
y_pred = w * x + b
# Loss (MSE for one sample)
loss = (y_pred - Value(y_val)) ** 2
total_loss = total_loss + loss
# Backward pass
total_loss.backward()
# Update parameters (gradient descent)
w.data -= learning_rate * w.grad
b.data -= learning_rate * b.grad
# Reset gradients for next epoch
w.grad = 0.0
b.grad = 0.0
if epoch % 20 == 0:
print(f"Epoch {epoch:>3}: loss = {total_loss.data:.4f}, w = {w.data:.4f}, b = {b.data:.4f}")
print(f"\nLearned: y = {w.data:.2f}x + {b.data:.2f}")
# Expected: y ≈ 2.00x + 1.00
Building a Mini Neural Network¶
from neurogebra.core.autograd import Value
import random
class Neuron:
def __init__(self, n_inputs):
self.w = [Value(random.uniform(-1, 1)) for _ in range(n_inputs)]
self.b = Value(0.0)
def __call__(self, x):
# w · x + b
act = sum((wi*xi for wi, xi in zip(self.w, x)), self.b)
return act.relu()
def parameters(self):
return self.w + [self.b]
class MLP:
def __init__(self, n_inputs, layer_sizes):
sizes = [n_inputs] + layer_sizes
self.layers = []
for i in range(len(layer_sizes)):
neurons = [Neuron(sizes[i]) for _ in range(sizes[i+1])]
self.layers.append(neurons)
def __call__(self, x):
for layer in self.layers:
x = [neuron(x) for neuron in layer]
return x[0] if len(x) == 1 else x
def parameters(self):
return [p for layer in self.layers for neuron in layer for p in neuron.parameters()]
# Create a tiny network: 2 inputs → 4 hidden → 1 output
random.seed(42)
model = MLP(2, [4, 1])
print(f"Total parameters: {len(model.parameters())}")
# (2+1)*4 + (4+1)*1 = 12 + 5 = 17 parameters
Zero Grad — Why It Matters¶
x = Value(3.0)
y = x ** 2
y.backward()
print(f"After first backward: x.grad = {x.grad}") # 6.0
# If we compute again WITHOUT zeroing:
y = x ** 2
y.backward()
print(f"After second backward: x.grad = {x.grad}") # 12.0 — WRONG! Accumulated!
# Always zero gradients between iterations:
x.zero_grad()
y = x ** 2
y.backward()
print(f"After zero + backward: x.grad = {x.grad}") # 6.0 — Correct!
Summary¶
| Step | What Happens | Code |
|---|---|---|
| Create values | Wrap numbers | x = Value(2.0) |
| Forward pass | Compute result | y = w * x + b |
| Backward pass | Compute gradients | y.backward() |
| Read gradients | See derivatives | w.grad |
| Update weights | Learn | w.data -= lr * w.grad |
| Zero gradients | Reset for next iteration | w.zero_grad() |
Next: Tensors →