🔭 Training Observatory

New in v1.2.1: See every neuron fire. Watch every gradient flow. Understand every weight update, in colour.

The Training Observatory is a training logging and visualization system that exposes the mathematics of neural network training: layer-by-layer formulas, gradient norms, weight statistics, and health diagnostics.


Quick Start

Add one argument to model.compile() and the Observatory is active:

from neurogebra.builders.model_builder import ModelBuilder

builder = ModelBuilder()
model = builder.Sequential([
    builder.Dense(64, activation="relu"),
    builder.Dense(32, activation="tanh"),
    builder.Dense(1, activation="sigmoid"),
], name="my_model")

model.compile(
    loss="binary_crossentropy",
    optimizer="adam",
    learning_rate=0.01,
    log_level="expert",          # ← enables the Observatory
)

model.fit(X_train, y_train, epochs=20, batch_size=32)

Log Levels

| Level | Value | What You See |
|---|---|---|
| "silent" | 0 | Nothing (turn off all logging) |
| "basic" | 1 | Epoch-level loss/accuracy, training start/end |
| "detailed" | 2 | + Batch-level progress, timing information |
| "expert" | 3 | + Layer-by-layer formulas, gradient norms, weight stats |
| "debug" | 4 | + Every tensor shape, raw statistics, full computation trace |

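The "+" entries in the table suggest a cumulative rule: a message is shown when the active level is at least the level the message belongs to. A minimal sketch of that rule follows. It assumes nothing about Neurogebra's internals; the `LOG_LEVELS` mapping copies the table, and `should_emit` is a hypothetical helper, not a library function.

```python
# Level names and values copied from the table above.
LOG_LEVELS = {"silent": 0, "basic": 1, "detailed": 2, "expert": 3, "debug": 4}


def should_emit(active_level, message_level):
    """Return True when a message tagged `message_level` would be shown.

    Hypothetical helper: assumes levels are cumulative, as the "+" entries
    in the table imply. No message belongs to "silent", so nothing is
    emitted for that tag.
    """
    active = LOG_LEVELS[active_level]
    required = LOG_LEVELS[message_level]
    return required > 0 and active >= required


# Gradient norms are listed under "expert", so "detailed" hides them.
print(should_emit("detailed", "expert"))  # False
print(should_emit("debug", "expert"))     # True
```
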
Colour Coding

The Observatory uses colour to communicate at a glance:

  • 🟢 Green (healthy): loss decreasing, gradients stable, metrics improving
  • 🟡 Yellow (warning): something needs attention, such as high variance or early saturation
  • 🔴 Red (danger): vanishing/exploding gradients, diverging loss
  • 🚨 White-on-red (critical): NaN/Inf detected, training corrupted
  • 🟣 Magenta: mathematical formulas (forward/backward equations)
  • 🔵 Blue: informational (progress messages)
  • ⬜ Dim white: supplementary details
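
Terminal colour of this kind is usually produced with ANSI escape codes. The sketch below shows one way the seven categories could map onto them. The escape codes are standard, but the status keys and the `colourise` helper are assumptions made for illustration, not the Observatory's actual implementation.

```python
# Standard ANSI SGR escape codes. The mapping from status name to colour
# follows the list above; the key names themselves are hypothetical.
ANSI = {
    "healthy": "\033[32m",      # green
    "warning": "\033[33m",      # yellow
    "danger": "\033[31m",       # red
    "critical": "\033[97;41m",  # bright white text on a red background
    "formula": "\033[35m",      # magenta
    "info": "\033[34m",         # blue
    "detail": "\033[2;37m",     # dim white
}
RESET = "\033[0m"


def colourise(status, text):
    """Wrap `text` in the colour for `status` (hypothetical helper)."""
    return f"{ANSI[status]}{text}{RESET}"


print(colourise("healthy", "loss decreasing"))
print(colourise("critical", "NaN detected in loss"))
```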

Preset Configurations

Instead of setting log_level, you can pass a full LogConfig object:

from neurogebra.logging.config import LogConfig

# Choose one preset (each line below is an alternative)
config = LogConfig.minimal()     # Just epoch progress
config = LogConfig.standard()    # + timing + health checks
config = LogConfig.verbose()     # Full math depth: every formula & gradient
config = LogConfig.research()    # Everything + auto-export to files

model.compile(loss="mse", optimizer="adam", log_config=config)

Customising a preset

config = LogConfig.verbose()
config.health_check_interval = 5    # run diagnostics every 5 epochs
config.export_formats = ["json", "html"]
config.export_dir = "./my_logs"

model.compile(loss="mse", optimizer="adam", log_config=config)
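
To make the effect of `health_check_interval` concrete, the sketch below lists the epochs on which diagnostics would fire. It assumes 1-based epoch numbers and a check on every multiple of the interval; the real scheduler may count from 0 or also check the final epoch. `health_check_epochs` is a hypothetical helper.

```python
def health_check_epochs(total_epochs, interval):
    """Epochs (1-based) on which diagnostics would run.

    Assumption: a check fires whenever the epoch number is a multiple of
    `interval`.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    return [epoch for epoch in range(1, total_epochs + 1) if epoch % interval == 0]


print(health_check_epochs(total_epochs=20, interval=5))  # [5, 10, 15, 20]
```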

What the Observatory Shows

Forward Pass Formulas

Forward:  a₁ = relu(W₁·x + b₁)    │ shape: (32, 64) → (32, 32)
Forward:  a₂ = tanh(W₂·a₁ + b₂)   │ shape: (32, 32) → (32, 16)
Forward:  ŷ  = σ(W₃·a₂ + b₃)      │ shape: (32, 16) → (32, 1)
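
Each line of this output pairs a layer's formula with its input and output shape, where the first dimension is the batch size. The shape column can be reproduced with a few lines of bookkeeping. The sketch below uses widths read off the sample output (64 input features, then 32, 16 and 1 units), not the Quick Start model; `trace_dense_shapes` is a hypothetical helper.

```python
def trace_dense_shapes(batch_size, input_features, layer_units):
    """Return (input_shape, output_shape) for each dense layer in order.

    A dense layer maps (batch, in_features) to (batch, units).
    """
    shapes = []
    in_features = input_features
    for units in layer_units:
        shapes.append(((batch_size, in_features), (batch_size, units)))
        in_features = units  # this layer's output feeds the next layer
    return shapes


# Widths taken from the sample output above.
for in_shape, out_shape in trace_dense_shapes(32, 64, [32, 16, 1]):
    print(f"shape: {in_shape} -> {out_shape}")
# shape: (32, 64) -> (32, 32)
# shape: (32, 32) -> (32, 16)
# shape: (32, 16) -> (32, 1)
```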

Backward Pass Formulas

Backward: ∂L/∂W₃ = ∂L/∂ŷ ⊙ σ'(z₃) · a₂ᵀ
Backward: ∂L/∂W₂ = ∂L/∂a₂ ⊙ tanh'(z₂) · a₁ᵀ
Backward: ∂L/∂W₁ = ∂L/∂a₁ ⊙ relu'(z₁) · xᵀ
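
Every backward line has the same chain-rule structure: upstream gradient, times the activation's derivative, times the layer's input. The sketch below checks the scalar version of the last-layer formula for a single sigmoid neuron against a central finite difference. The squared-error loss and the numeric values are illustrative assumptions, chosen only to keep the arithmetic short.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(w, b, a_prev, y):
    """Squared error of a single sigmoid neuron: L = 0.5 * (y_hat - y)^2."""
    y_hat = sigmoid(w * a_prev + b)
    return 0.5 * (y_hat - y) ** 2


def analytic_grad_w(w, b, a_prev, y):
    """dL/dw = dL/dy_hat * sigmoid'(z) * a_prev (scalar chain rule)."""
    y_hat = sigmoid(w * a_prev + b)
    dL_dy_hat = y_hat - y
    dsigmoid = y_hat * (1.0 - y_hat)  # sigmoid'(z) = s(z) * (1 - s(z))
    return dL_dy_hat * dsigmoid * a_prev


def numeric_grad_w(w, b, a_prev, y, eps=1e-6):
    """Central finite difference, used only to check the analytic result."""
    return (loss(w + eps, b, a_prev, y) - loss(w - eps, b, a_prev, y)) / (2 * eps)


w, b, a_prev, y = 0.7, -0.2, 1.5, 1.0
gap = abs(analytic_grad_w(w, b, a_prev, y) - numeric_grad_w(w, b, a_prev, y))
print(gap < 1e-8)  # True
```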

Health Diagnostics

🚨 [CRITICAL] NaN/Inf Detected
   NaN values found in training loss!
   → Check for division by zero in your data
   → Reduce learning rate (try 1e-4)
   → Add gradient clipping

⚠️  [WARNING] Overfitting Detected
   Validation loss increasing while training loss decreases
   → Add dropout layers (rate 0.2-0.5)
   → Reduce model complexity
   → Increase training data
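
The two alerts above correspond to checks that are easy to express on a plain list of losses. The sketch below is one possible heuristic, not the Observatory's actual logic; `find_nonfinite`, `looks_overfit` and the `window` parameter are hypothetical names introduced for illustration.

```python
import math


def find_nonfinite(losses):
    """Indices of NaN or +/-Inf entries in a list of loss values."""
    return [i for i, value in enumerate(losses) if not math.isfinite(value)]


def looks_overfit(train_losses, val_losses, window=2):
    """Heuristic: over the last `window` steps, training loss fell at every
    step while validation loss rose at every step.

    A production check would likely smooth the curves and use a tolerance.
    """
    if min(len(train_losses), len(val_losses)) < window + 1:
        return False
    train_tail = train_losses[-(window + 1):]
    val_tail = val_losses[-(window + 1):]
    train_falling = all(b < a for a, b in zip(train_tail, train_tail[1:]))
    val_rising = all(b > a for a, b in zip(val_tail, val_tail[1:]))
    return train_falling and val_rising


print(find_nonfinite([0.9, 0.7, float("nan"), 0.6]))  # [2]
print(looks_overfit([1.0, 0.8, 0.5, 0.3, 0.15], [1.0, 0.9, 0.85, 0.95, 1.1]))  # True
```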

Export Formats

| Format | File | Contents |
|---|---|---|
| JSON | training_log.json | Full structured event log |
| CSV | metrics.csv | Epoch-level metrics table |
| HTML | report.html | Self-contained report with Chart.js graphs |
| Markdown | report.md | Human-readable training report |

config = LogConfig.research()
config.export_formats = ["json", "csv", "html", "markdown"]
config.export_dir = "./training_logs"

model.compile(loss="mse", optimizer="adam", log_config=config)
model.fit(X, y, epochs=50)
# → Files saved to ./training_logs/
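
For a sense of what the CSV and JSON exports contain, the sketch below writes a small metrics history with the standard library alone. The file names follow the table above, but the column names and JSON layout are assumptions, not the Observatory's actual schema; `export_history` is a hypothetical helper.

```python
import csv
import json
import tempfile
from pathlib import Path


def export_history(history, export_dir):
    """Write epoch metrics to metrics.csv and training_log.json.

    `history` is a list of dicts with identical keys, one per epoch.
    """
    out = Path(export_dir)
    out.mkdir(parents=True, exist_ok=True)

    csv_path = out / "metrics.csv"
    with csv_path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(history[0].keys()))
        writer.writeheader()
        writer.writerows(history)

    json_path = out / "training_log.json"
    json_path.write_text(json.dumps({"epochs": history}, indent=2))
    return csv_path, json_path


# Illustrative values only.
history = [
    {"epoch": 1, "loss": 0.91, "val_loss": 0.95},
    {"epoch": 2, "loss": 0.62, "val_loss": 0.80},
]
with tempfile.TemporaryDirectory() as tmp:
    csv_path, json_path = export_history(history, tmp)
    print(csv_path.read_text().splitlines()[0])  # epoch,loss,val_loss
```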

Standalone Usage

You can use the monitoring tools without a model:

from neurogebra.logging.monitors import GradientMonitor
from neurogebra.logging.health_checks import SmartHealthChecker
import numpy as np

# Check gradient health
gm = GradientMonitor()
stats = gm.record("my_layer", np.random.randn(64, 32) * 0.001)
print(stats["status"])   # "healthy" | "danger" | "critical"

# Diagnose training history
checker = SmartHealthChecker()
alerts = checker.run_all(
    epoch=10,
    train_losses=[1.0, 0.8, 0.5, 0.3, 0.15],
    val_losses=[1.0, 0.9, 0.85, 0.95, 1.1],
)
for alert in alerts:
    print(f"[{alert.severity}] {alert.message}")
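
A gradient status of this kind is typically derived from the gradient's norm. The sketch below classifies a flat list of gradient values using the three status names from the comment in the snippet above. Both thresholds are hypothetical defaults chosen for illustration, and `gradient_status` is not part of the library.

```python
import math


def gradient_status(values, vanish_below=1e-7, explode_above=1e3):
    """Classify a flat list of gradient values by their L2 norm."""
    if any(not math.isfinite(v) for v in values):
        return "critical"  # NaN or Inf anywhere in the gradient
    norm = math.sqrt(sum(v * v for v in values))
    if norm < vanish_below or norm > explode_above:
        return "danger"    # vanishing or exploding
    return "healthy"


print(gradient_status([1e-3, -2e-3, 5e-4]))  # healthy
print(gradient_status([1e-9, -1e-9]))        # danger
print(gradient_status([0.1, float("nan")]))  # critical
```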

Next Steps

  • See the Observatory Deep Dive for advanced usage
  • Run the full demo: python examples/training_observatory_demo.py
  • Read the API Reference for complete method documentation