๐ญ Training Observatory¶
New in v1.2.1 โ See every neuron fire. Watch every gradient flow. Understand every weight update โ in colour.
The Training Observatory is an advanced training logging and visualization system that brings unprecedented mathematical transparency to neural network training.
Quick Start¶
Add one argument to model.compile() and the Observatory is active:
from neurogebra.builders.model_builder import ModelBuilder
builder = ModelBuilder()
model = builder.Sequential([
builder.Dense(64, activation="relu"),
builder.Dense(32, activation="tanh"),
builder.Dense(1, activation="sigmoid"),
], name="my_model")
model.compile(
loss="binary_crossentropy",
optimizer="adam",
learning_rate=0.01,
log_level="expert", # โ enables the Observatory
)
model.fit(X_train, y_train, epochs=20, batch_size=32)
Log Levels¶
| Level | Value | What You See |
|---|---|---|
"silent" |
0 | Nothing (turn off all logging) |
"basic" |
1 | Epoch-level loss/accuracy, training start/end |
"detailed" |
2 | + Batch-level progress, timing information |
"expert" |
3 | + Layer-by-layer formulas, gradient norms, weight stats |
"debug" |
4 | + Every tensor shape, raw statistics, full computation trace |
Colour Coding¶
The Observatory uses colour to communicate at a glance:
- ๐ข Green โ Healthy: loss decreasing, gradients stable, metrics improving
- ๐ก Yellow โ Warning: something needs attention (high variance, early saturation)
- ๐ด Red โ Danger: vanishing/exploding gradients, diverging loss
- ๐จ White-on-Red โ Critical: NaN/Inf detected, training corrupted
- ๐ฃ Magenta โ Mathematical formulas (forward/backward equations)
- ๐ต Blue โ Informational (progress messages)
- โฌ Dim white โ Supplementary details
Preset Configurations¶
Instead of setting log_level, you can pass a full LogConfig object:
from neurogebra.logging.config import LogConfig
# Choose a preset
config = LogConfig.minimal() # Just epoch progress
config = LogConfig.standard() # + timing + health checks
config = LogConfig.verbose() # Full math depth โ every formula & gradient
config = LogConfig.research() # Everything + auto-export to files
model.compile(loss="mse", optimizer="adam", log_config=config)
Customising a preset¶
config = LogConfig.verbose()
config.health_check_interval = 5 # run diagnostics every 5 epochs
config.export_formats = ["json", "html"]
config.export_dir = "./my_logs"
model.compile(loss="mse", optimizer="adam", log_config=config)
What the Observatory Shows¶
Forward Pass Formulas¶
Forward: aโ = relu(Wโยทx + bโ) โ shape: (32, 64) โ (32, 32)
Forward: aโ = tanh(Wโยทaโ + bโ) โ shape: (32, 32) โ (32, 16)
Forward: ลท = ฯ(Wโยทaโ + bโ) โ shape: (32, 16) โ (32, 1)
Backward Pass Formulas¶
Backward: โL/โWโ = โL/โลท โ ฯ'(zโ) ยท aโแต
Backward: โL/โWโ = โL/โaโ โ tanh'(zโ) ยท aโแต
Backward: โL/โWโ = โL/โaโ โ relu'(zโ) ยท xแต
Health Diagnostics¶
๐จ [CRITICAL] NaN/Inf Detected
NaN values found in training loss!
โ Check for division by zero in your data
โ Reduce learning rate (try 1e-4)
โ Add gradient clipping
โ ๏ธ [WARNING] Overfitting Detected
Validation loss increasing while training loss decreases
โ Add dropout layers (rate 0.2-0.5)
โ Reduce model complexity
โ Increase training data
Export Formats¶
| Format | File | Contents |
|---|---|---|
| JSON | training_log.json |
Full structured event log |
| CSV | metrics.csv |
Epoch-level metrics table |
| HTML | report.html |
Self-contained report with Chart.js graphs |
| Markdown | report.md |
Human-readable training report |
config = LogConfig.research()
config.export_formats = ["json", "csv", "html", "markdown"]
config.export_dir = "./training_logs"
model.compile(loss="mse", optimizer="adam", log_config=config)
model.fit(X, y, epochs=50)
# โ Files saved to ./training_logs/
Standalone Usage¶
You can use the monitoring tools without a model:
from neurogebra.logging.monitors import GradientMonitor
from neurogebra.logging.health_checks import SmartHealthChecker
import numpy as np
# Check gradient health
gm = GradientMonitor()
stats = gm.record("my_layer", np.random.randn(64, 32) * 0.001)
print(stats["status"]) # "healthy" | "danger" | "critical"
# Diagnose training history
checker = SmartHealthChecker()
alerts = checker.run_all(
epoch=10,
train_losses=[1.0, 0.8, 0.5, 0.3, 0.15],
val_losses=[1.0, 0.9, 0.85, 0.95, 1.1],
)
for alert in alerts:
print(f"[{alert.severity}] {alert.message}")
Next Steps¶
- See the Observatory Deep Dive for advanced usage
- Run the full demo:
python examples/training_observatory_demo.py - Read the API Reference for complete method documentation