Artifact Store

The scalable.artifacts module provides a protocol-based abstraction for storing and retrieving workflow artifacts across local and remote backends.

Overview

  • ArtifactStore — protocol interface

  • LocalArtifactStore — filesystem backend

  • FsspecArtifactStore — fsspec-based backend for S3, GCS, and in-memory storage

  • build_artifact_store() — URI-based factory
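As a rough illustration of what a protocol-based store can look like, the sketch below defines a structural interface and one filesystem backend that satisfies it. Everything in it is an assumption made for illustration: ArtifactRef, LocalStore, and upload() are hypothetical names, the put() signature and the uri/digest/size_bytes fields are inferred from the Usage example, and SHA-256 is a guess at the digest algorithm. It is not the library's implementation.

```python
import hashlib
import shutil
from dataclasses import dataclass
from pathlib import Path
from typing import Protocol


@dataclass(frozen=True)
class ArtifactRef:
    """Hypothetical reference type; field names mirror the Usage example."""

    uri: str
    digest: str
    size_bytes: int


class ArtifactStore(Protocol):
    """Structural interface: any class with a matching put() satisfies it."""

    def put(self, source: str, key: str) -> ArtifactRef: ...


class LocalStore:
    """Illustrative filesystem backend, not the library's LocalArtifactStore."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)

    def put(self, source: str, key: str) -> ArtifactRef:
        dest = self.root / key
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(source, dest)
        data = dest.read_bytes()
        # SHA-256 is an assumption; the real digest algorithm is not stated here.
        return ArtifactRef(
            uri=dest.resolve().as_uri(),
            digest=hashlib.sha256(data).hexdigest(),
            size_bytes=len(data),
        )


def upload(store: ArtifactStore, source: str, key: str) -> ArtifactRef:
    """Callers depend only on the protocol, so backends are interchangeable."""
    return store.put(source, key)
```

Because the interface is structural, LocalStore never has to inherit from ArtifactStore; a remote backend with the same put() method could be passed to upload() unchanged.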

Usage

from scalable.artifacts import build_artifact_store

# Local storage
store = build_artifact_store("./artifacts")
ref = store.put("output.csv", "runs/run-001/output.csv")
print(ref.uri, ref.digest, ref.size_bytes)

# S3 storage (requires scalable[cloud])
store = build_artifact_store("s3://my-bucket/artifacts/")
ref = store.put("model_output/", "runs/run-001/model_output")

# GCS storage
store = build_artifact_store("gs://my-bucket/artifacts/")
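
The three calls above differ only in the URI they pass. One plausible way for a URI-based factory to choose a backend is to dispatch on the URI scheme, as in the sketch below. pick_backend() is a hypothetical helper, and the scheme-to-backend mapping is inferred from the Overview list; the real build_artifact_store() may decide differently.

```python
from urllib.parse import urlparse


def pick_backend(uri: str) -> str:
    """Return the name of the backend a storage URI would select.

    Hypothetical helper: it mirrors the mapping suggested by the Overview
    and is not scalable's actual factory logic.
    """
    scheme = urlparse(uri).scheme
    # No scheme (a plain path) or file:// is treated as the local filesystem.
    if scheme in ("", "file"):
        return "LocalArtifactStore"
    # Remote and in-memory schemes are assumed to go through fsspec.
    if scheme in ("s3", "gs", "memory"):
        return "FsspecArtifactStore"
    raise ValueError(f"unsupported storage URI scheme: {scheme!r}")


print(pick_backend("./artifacts"))
print(pick_backend("s3://my-bucket/artifacts/"))
```

Note that this naive parse would misread a Windows drive path such as C:\artifacts as a URI with scheme "c", so a production factory would need an extra check for that case.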

Manifest Integration

Set project.default_storage in your manifest to configure where artifacts are stored:

project:
  name: my-project
  default_storage: s3://my-bucket/scalable-runs/

Alternatively, override the manifest value by setting the SCALABLE_DEFAULT_STORAGE environment variable.
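
The precedence described here (environment variable first, manifest second) can be sketched as follows. resolve_default_storage() is a hypothetical helper operating on an already-parsed manifest dictionary; treating an empty variable as unset is an assumption, not documented behavior.

```python
import os
from typing import Any, Mapping, Optional


def resolve_default_storage(
    manifest: Mapping[str, Any],
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Pick the storage URI: environment override first, then the manifest."""
    env = os.environ if environ is None else environ
    override = env.get("SCALABLE_DEFAULT_STORAGE")
    # Assumption: an empty variable is treated the same as an unset one.
    if override:
        return override
    project = manifest.get("project") or {}
    return project.get("default_storage")


manifest = {
    "project": {
        "name": "my-project",
        "default_storage": "s3://my-bucket/scalable-runs/",
    }
}
print(resolve_default_storage(manifest, environ={}))
```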

Remote Cache

The artifact store layer also powers the remote cache backend. Enable the remote cache by setting SCALABLE_CACHE_REMOTE to a storage URI:

export SCALABLE_CACHE_REMOTE=s3://my-bucket/cache/

When this variable is set, cached results are written to the remote location in addition to the local diskcache store, so multiple machines can share the same cache.
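
The sharing behavior can be pictured as a two-tier, write-through cache. The sketch below uses plain dictionaries in place of diskcache and the remote store, and TwoTierCache is a hypothetical class. The write path follows the description above; the read path (local first, then remote, then copy back to local) is an assumption about how such a cache would typically behave.

```python
from typing import Any, MutableMapping, Optional


class TwoTierCache:
    """Illustrative write-through cache with a local and a shared remote tier."""

    def __init__(
        self,
        local: MutableMapping[str, Any],
        remote: MutableMapping[str, Any],
    ) -> None:
        self.local = local
        self.remote = remote

    def set(self, key: str, value: Any) -> None:
        # Write-through: every result goes to both tiers.
        self.local[key] = value
        self.remote[key] = value

    def get(self, key: str) -> Optional[Any]:
        if key in self.local:
            return self.local[key]
        if key in self.remote:
            # Remote hit: copy into the local tier for later lookups.
            value = self.remote[key]
            self.local[key] = value
            return value
        return None


# Two "machines" with separate local tiers and one shared remote tier.
shared_remote: dict = {}
machine_a = TwoTierCache({}, shared_remote)
machine_b = TwoTierCache({}, shared_remote)
machine_a.set("task-1", {"rows": 100})
print(machine_b.get("task-1"))
```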

Session Integration

Record artifacts during a session for provenance tracking:

from scalable import ScalableSession

session = ScalableSession.from_yaml("scalable.yaml", target="local")
# ... run tasks ...
session.record_artifact("output.csv", kind="result")

Artifact metadata is recorded in the artifacts.jsonl telemetry stream.
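
Because the stream is JSONL (one JSON object per line), it can be inspected with the standard library alone. The sketch below is a minimal reader; read_artifact_records() is a hypothetical helper, and the file's location and record fields are not specified here, so the caller supplies the path and should not rely on any particular keys.

```python
import json
from pathlib import Path
from typing import Iterator


def read_artifact_records(path: str) -> Iterator[dict]:
    """Yield one dict per non-empty line of a JSONL file."""
    with Path(path).open(encoding="utf-8") as stream:
        for line in stream:
            line = line.strip()
            if line:
                yield json.loads(line)
```

A typical use would be filtering records after a run, for example collecting every record whose kind field (assuming one exists) equals "result".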

See Also