Telemetry and Run Reports ========================= Scalable v2.0.0 includes a deterministic run history store for manifest-driven sessions. Every run records structured telemetry for debugging, auditing, resource advising, and ML model training. Run directory layout -------------------- Each run is recorded under ``.scalable/runs/``: .. code-block:: text .scalable/ runs/ run---/ manifest.yaml plan.json manifest.lock run.json tasks.jsonl resources.jsonl workers.jsonl failures.jsonl cache.jsonl artifacts.jsonl cost.jsonl summary.json JSONL is the canonical storage format. Optional parquet snapshots are emitted when telemetry parquet support is enabled. Event types ----------- The telemetry system records the following event categories: - **Task events** — submission, start, completion, failure, retry - **Worker events** — launch, ready, lost, removed - **Resource events** — CPU/memory allocation and usage - **Cache events** — hit/miss for ``@cacheable`` decorated functions - **Failure events** — error classification and stack traces - **Artifact events** — output registration and storage references - **Cost events** — cloud provider cost estimates CLI reporting ------------- Generate a report from the most recent run: .. code-block:: bash scalable report --latest Machine-readable report output: .. code-block:: bash scalable report --latest --format json --output report.json Report from a specific run: .. code-block:: bash scalable report --run-id run-20260519T120000Z-project-abc Report options: - ``--runs-dir`` — Custom runs directory (default: ``.scalable/runs``) - ``--run-id`` — Specific run identifier - ``--latest`` — Use most recent run (default when no run-id given) - ``--format`` — Output format (``text`` or ``json``) - ``--output`` — Write to file instead of stdout Session integration ------------------- ``ScalableSession`` automatically initializes and finalizes telemetry for manifest-driven runs: .. code-block:: python from scalable import ScalableSession session = ScalableSession.from_yaml("scalable.yaml", target="local") # Telemetry is automatically recorded during the session lifecycle # Record custom artifacts session.record_artifact("output.csv", kind="result") ``ScalableClient.submit`` and ``ScalableClient.map`` emit task lifecycle telemetry through future callbacks when telemetry is active. Configuration ------------- The telemetry system supports these environment variables: * ``SCALABLE_RUNS_DIR`` — Local runs directory (default: ``.scalable/runs``) * ``SCALABLE_TELEMETRY`` — Enable/disable telemetry (default: ``1``) * ``SCALABLE_TELEMETRY_PARQUET`` — Emit parquet snapshots (default: ``0``) * ``SCALABLE_RUNS_DIR_REMOTE`` — Remote storage for telemetry sync (optional) Downstream consumers -------------------- Telemetry data feeds: * :doc:`advising` — heuristic resource recommendations from run history * :doc:`ml` — ML-backed prediction models trained on telemetry features * ``scalable diagnose`` — failure classification and fix suggestions * ``scalable report`` — run summary and cost reporting