Telemetry and Run Reports

Scalable v2.0.0 includes a deterministic run history store for manifest-driven sessions. Every run records structured telemetry for debugging, auditing, resource advising, and ML model training.

Run directory layout

Each run is recorded under .scalable/runs/:

.scalable/
  runs/
    run-<timestamp>-<project>-<hash>/
      manifest.yaml
      plan.json
      manifest.lock
      run.json
      tasks.jsonl
      resources.jsonl
      workers.jsonl
      failures.jsonl
      cache.jsonl
      artifacts.jsonl
      cost.jsonl
      summary.json

JSONL is the canonical storage format. Optional Parquet snapshots are emitted only when SCALABLE_TELEMETRY_PARQUET is enabled (see Configuration).
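Because each telemetry file is plain JSONL, a run can be inspected with nothing more than the standard library. The following is a minimal sketch, not part of Scalable itself: the helper names iter_jsonl and load_run_file are hypothetical, and no record fields are assumed, since the record schema is not described here.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterator, List


def iter_jsonl(path: Path) -> Iterator[Dict[str, Any]]:
    """Yield one parsed record per non-empty line of a JSONL file."""
    with path.open("r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines between records
                yield json.loads(line)


def load_run_file(run_dir: Path, name: str) -> List[Dict[str, Any]]:
    """Load every record from one telemetry file, or [] if the file is absent."""
    path = run_dir / name
    return list(iter_jsonl(path)) if path.exists() else []
```

For example, load_run_file(Path(".scalable/runs") / run_id, "tasks.jsonl") would return the task records of one run as a list of dictionaries.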

Event types

The telemetry system records the following event categories:

  • Task events — submission, start, completion, failure, retry

  • Worker events — launch, ready, lost, removed

  • Resource events — CPU/memory allocation and usage

  • Cache events — hit/miss for @cacheable decorated functions

  • Failure events — error classification and stack traces

  • Artifact events — output registration and storage references

  • Cost events — cloud provider cost estimates
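A quick way to see which categories a run produced is to count the records in each JSONL file. The sketch below assumes that each event category maps to the similarly named file in the run directory (for example, task events to tasks.jsonl); that mapping is inferred from the file names and is not confirmed here. CATEGORY_FILES and count_events are hypothetical names.

```python
from pathlib import Path
from typing import Dict

# Assumed category-to-file mapping, inferred from the run directory layout.
CATEGORY_FILES = {
    "task": "tasks.jsonl",
    "worker": "workers.jsonl",
    "resource": "resources.jsonl",
    "cache": "cache.jsonl",
    "failure": "failures.jsonl",
    "artifact": "artifacts.jsonl",
    "cost": "cost.jsonl",
}


def count_events(run_dir: Path) -> Dict[str, int]:
    """Count non-empty lines (one event per line) in each category file."""
    counts: Dict[str, int] = {}
    for category, filename in CATEGORY_FILES.items():
        path = run_dir / filename
        if not path.exists():
            counts[category] = 0  # a missing file is treated as zero events
            continue
        with path.open("r", encoding="utf-8") as handle:
            counts[category] = sum(1 for line in handle if line.strip())
    return counts
```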

CLI reporting

Generate a report from the most recent run:

scalable report --latest

Machine-readable report output:

scalable report --latest --format json --output report.json

Report from a specific run:

scalable report --run-id run-20260519T120000Z-project-abc

Report options:

  • --runs-dir — Custom runs directory (default: .scalable/runs)

  • --run-id — Specific run identifier

  • --latest — Use most recent run (default when no run-id given)

  • --format — Output format (text or json)

  • --output — Write to file instead of stdout
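Scripts that need to locate a run without calling the CLI can parse the directory names directly. The sketch below assumes the run-<timestamp>-<project>-<hash> naming shown in the run directory layout, with a UTC timestamp shaped like 20260519T120000Z and a hash that contains no hyphens. How scalable report --latest actually selects a run is not specified here, so treat latest_run as an approximation. All names in the block are hypothetical.

```python
import re
from datetime import datetime, timezone
from pathlib import Path
from typing import NamedTuple, Optional

# Assumed shape: run-<YYYYMMDD>T<HHMMSS>Z-<project>-<hash>
RUN_ID_PATTERN = re.compile(
    r"^run-(?P<timestamp>\d{8}T\d{6}Z)-(?P<project>.+)-(?P<hash>[^-]+)$"
)


class RunId(NamedTuple):
    timestamp: datetime
    project: str
    digest: str


def parse_run_id(run_id: str) -> Optional[RunId]:
    """Split a run identifier into its parts, or return None if it does not match."""
    match = RUN_ID_PATTERN.match(run_id)
    if match is None:
        return None
    stamp = datetime.strptime(match["timestamp"], "%Y%m%dT%H%M%SZ")
    return RunId(stamp.replace(tzinfo=timezone.utc), match["project"], match["hash"])


def latest_run(runs_dir: Path) -> Optional[Path]:
    """Return the run directory with the newest timestamp in its name."""
    candidates = []
    for entry in runs_dir.iterdir():
        parsed = parse_run_id(entry.name) if entry.is_dir() else None
        if parsed is not None:
            candidates.append((parsed.timestamp, entry.name, entry))
    if not candidates:
        return None
    # Ties on timestamp are broken by directory name for a stable result.
    return max(candidates, key=lambda item: item[:2])[2]
```

The project group is matched greedily, so a project name that itself contains hyphens stays intact and only the final segment is taken as the hash.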

Session integration

ScalableSession automatically initializes and finalizes telemetry for manifest-driven runs:

from scalable import ScalableSession

session = ScalableSession.from_yaml("scalable.yaml", target="local")
# Telemetry is automatically recorded during the session lifecycle

# Record custom artifacts
session.record_artifact("output.csv", kind="result")

ScalableClient.submit and ScalableClient.map emit task lifecycle telemetry through future callbacks when telemetry is active.
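The future-callback pattern can be illustrated with the standard library alone. The sketch below is not Scalable's implementation: it uses concurrent.futures in place of ScalableClient, and TaskEventLog, its methods, and the record fields (task_id, event, time, error) are hypothetical names chosen for the example.

```python
import json
import time
from concurrent.futures import Future, ThreadPoolExecutor
from pathlib import Path
from threading import Lock


class TaskEventLog:
    """Append task lifecycle events to a JSONL file (illustrative only)."""

    def __init__(self, path: Path) -> None:
        self._path = path
        self._lock = Lock()  # callbacks may fire from worker threads

    def emit(self, task_id: str, event: str, **fields) -> None:
        record = {"task_id": task_id, "event": event, "time": time.time(), **fields}
        with self._lock, self._path.open("a", encoding="utf-8") as handle:
            handle.write(json.dumps(record) + "\n")

    def track(self, task_id: str, future: Future) -> Future:
        """Log submission now and the outcome when the future finishes."""
        self.emit(task_id, "submitted")

        def on_done(done: Future) -> None:
            if done.cancelled():
                self.emit(task_id, "cancelled")
                return
            error = done.exception()
            if error is None:
                self.emit(task_id, "completed")
            else:
                self.emit(task_id, "failed", error=repr(error))

        future.add_done_callback(on_done)
        return future


if __name__ == "__main__":
    log = TaskEventLog(Path("tasks.jsonl"))
    with ThreadPoolExecutor(max_workers=2) as pool:
        log.track("square", pool.submit(pow, 3, 2))
```

Registering the callback after the submission event is written keeps the "submitted" record ahead of the outcome record for the same task, even when the task finishes almost immediately.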

Configuration

The telemetry system supports these environment variables:

  • SCALABLE_RUNS_DIR — Local runs directory (default: .scalable/runs)

  • SCALABLE_TELEMETRY — Enable/disable telemetry (default: 1)

  • SCALABLE_TELEMETRY_PARQUET — Emit parquet snapshots (default: 0)

  • SCALABLE_RUNS_DIR_REMOTE — Remote storage for telemetry sync (optional)
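Tooling built around the run store may want to resolve the same variables with the same defaults. The sketch below reads the four variables listed above and applies the documented defaults. The accepted spellings for a disabled flag ("0", "false", "no", "off", or empty) are an assumption, since only the values 1 and 0 are documented, and TelemetrySettings and load_settings are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class TelemetrySettings:
    runs_dir: str
    enabled: bool
    parquet: bool
    remote_runs_dir: Optional[str]


def _flag(value: str) -> bool:
    # Assumption: these spellings disable a flag; anything else enables it.
    return value.strip().lower() not in {"", "0", "false", "no", "off"}


def load_settings(env: Mapping[str, str] = os.environ) -> TelemetrySettings:
    """Resolve telemetry settings from the environment using documented defaults."""
    return TelemetrySettings(
        runs_dir=env.get("SCALABLE_RUNS_DIR", ".scalable/runs"),
        enabled=_flag(env.get("SCALABLE_TELEMETRY", "1")),
        parquet=_flag(env.get("SCALABLE_TELEMETRY_PARQUET", "0")),
        remote_runs_dir=env.get("SCALABLE_RUNS_DIR_REMOTE") or None,
    )
```

Passing a plain dictionary instead of os.environ makes the function easy to exercise in tests without touching the real environment.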

Downstream consumers

Telemetry data feeds the following consumers:

  • Deterministic Resource Advising — heuristic resource recommendations from run history

  • ML Optimization — ML-backed prediction models trained on telemetry features

  • scalable diagnose — failure classification and fix suggestions

  • scalable report — run summary and cost reporting