Artifact Store¶
The scalable.artifacts module provides a protocol-based abstraction
for storing and retrieving workflow artifacts across local and remote
backends.
Overview¶
ArtifactStore: protocol interface
LocalArtifactStore: filesystem backend
FsspecArtifactStore: S3/GCS/memory
build_artifact_store(): URI-based factory
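To illustrate what "protocol-based" can mean here, the following is a minimal sketch using typing.Protocol. It is not the module's actual source: the method signatures, the get() method, the "sha256:" digest prefix, and the SketchLocalStore class are assumptions made for illustration. Only the uri, digest, and size_bytes attributes on the returned reference come from this page.

```python
import hashlib
import shutil
from dataclasses import dataclass
from pathlib import Path
from typing import Protocol


@dataclass(frozen=True)
class ArtifactRef:
    """Illustrative reference type; only these three attributes appear in the docs."""

    uri: str
    digest: str
    size_bytes: int


class ArtifactStore(Protocol):
    """Hypothetical protocol shape: any object with these methods qualifies."""

    def put(self, local_path: str, key: str) -> ArtifactRef: ...

    def get(self, key: str, local_path: str) -> Path: ...


class SketchLocalStore:
    """Toy filesystem backend for illustration; not the library's LocalArtifactStore."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)

    def put(self, local_path: str, key: str) -> ArtifactRef:
        # Copy the file under the store root, then describe the stored copy.
        dest = self.root / key
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(local_path, dest)
        data = dest.read_bytes()
        return ArtifactRef(
            uri=dest.resolve().as_uri(),
            digest="sha256:" + hashlib.sha256(data).hexdigest(),  # assumed digest format
            size_bytes=len(data),
        )

    def get(self, key: str, local_path: str) -> Path:
        # Copy the stored artifact back out to a caller-chosen location.
        target = Path(local_path)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(self.root / key, target)
        return target
```

Because the protocol is structural, SketchLocalStore satisfies ArtifactStore without inheriting from it, which is what would let local and remote backends be swapped behind one interface.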
Usage¶
from scalable.artifacts import build_artifact_store
# Local storage
store = build_artifact_store("./artifacts")
ref = store.put("output.csv", "runs/run-001/output.csv")
print(ref.uri, ref.digest, ref.size_bytes)
# S3 storage (requires scalable[cloud])
store = build_artifact_store("s3://my-bucket/artifacts/")
ref = store.put("model_output/", "runs/run-001/model_output")
# GCS storage
store = build_artifact_store("gs://my-bucket/artifacts/")
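The calls above suggest that the factory chooses a backend from the URI scheme. A rough sketch of that kind of dispatch, using only the standard library, might look like the following. The pick_backend helper and its scheme table are hypothetical; the real build_artifact_store() may resolve backends differently.

```python
from urllib.parse import urlsplit

# Assumed scheme table, based on the backends listed in the overview.
_REMOTE_SCHEMES = {"s3", "gs", "memory"}


def pick_backend(uri: str) -> str:
    """Return the name of the backend class a URI would map to (illustrative only)."""
    scheme = urlsplit(uri).scheme.lower()
    # No scheme, file://, or a single-letter scheme (Windows drive) means a local path.
    if scheme in ("", "file") or len(scheme) == 1:
        return "LocalArtifactStore"
    if scheme in _REMOTE_SCHEMES:
        return "FsspecArtifactStore"
    raise ValueError(f"unsupported storage scheme: {scheme!r}")


for uri in ("./artifacts", "s3://my-bucket/artifacts/", "gs://my-bucket/artifacts/"):
    print(uri, "->", pick_backend(uri))
```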
Manifest Integration¶
Set project.default_storage in your manifest to configure where artifacts
are stored:
project:
  name: my-project
  default_storage: s3://my-bucket/scalable-runs/
Alternatively, override this setting with the SCALABLE_DEFAULT_STORAGE environment variable.
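The precedence between the two settings can be pictured as follows. This is a sketch under two assumptions: the environment variable wins whenever it is set to a non-empty value, and the manifest has already been parsed into a dictionary. The resolve_default_storage helper is hypothetical and not part of the library.

```python
import os
from typing import Any, Mapping, Optional


def resolve_default_storage(
    manifest: Mapping[str, Any],
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Pick the storage URI: environment override first, then the manifest value."""
    env = os.environ if environ is None else environ
    override = env.get("SCALABLE_DEFAULT_STORAGE")
    if override:  # assumption: an empty value is treated as unset
        return override
    return manifest.get("project", {}).get("default_storage")


manifest = {
    "project": {
        "name": "my-project",
        "default_storage": "s3://my-bucket/scalable-runs/",
    }
}
print(resolve_default_storage(manifest, environ={}))
print(resolve_default_storage(manifest, environ={"SCALABLE_DEFAULT_STORAGE": "./artifacts"}))
```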
Remote Cache¶
The artifact store layer also powers the remote cache backend. Enable it with:
export SCALABLE_CACHE_REMOTE=s3://my-bucket/cache/
When enabled, cache results are written to the remote store in addition to the local diskcache, so that multiple machines can share one cache.
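One way to picture this local-plus-remote arrangement is a two-tier cache. The sketch below is an assumption about the general pattern, not the library's implementation: TwoTierCache is a hypothetical class, plain dictionaries stand in for diskcache and the remote store, and the read-through behaviour on a local miss is a guess about how sharing would work.

```python
from typing import Any, MutableMapping, Optional


class TwoTierCache:
    """Write to both tiers; read locally first, then fall back to the remote tier."""

    def __init__(
        self,
        local: MutableMapping[str, Any],
        remote: Optional[MutableMapping[str, Any]] = None,
    ) -> None:
        self.local = local
        self.remote = remote  # None models SCALABLE_CACHE_REMOTE being unset

    def set(self, key: str, value: Any) -> None:
        self.local[key] = value
        if self.remote is not None:
            self.remote[key] = value

    def get(self, key: str, default: Any = None) -> Any:
        if key in self.local:
            return self.local[key]
        if self.remote is not None and key in self.remote:
            value = self.remote[key]
            self.local[key] = value  # populate the local tier for later reads
            return value
        return default


shared_remote: dict = {}
machine_a = TwoTierCache(local={}, remote=shared_remote)
machine_b = TwoTierCache(local={}, remote=shared_remote)

machine_a.set("task-123", {"rows": 42})
print(machine_b.get("task-123"))  # served from the shared remote tier
```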
Session Integration¶
Record artifacts during a session for provenance tracking:
from scalable import ScalableSession
session = ScalableSession.from_yaml("scalable.yaml", target="local")
# ... run tasks ...
session.record_artifact("output.csv", kind="result")
Artifact metadata is recorded in the artifacts.jsonl telemetry stream.
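For readers unfamiliar with the JSONL format, the sketch below shows how a stream such as artifacts.jsonl can be appended to and read back, one JSON object per line. The field names (path, kind) and both helper functions are hypothetical; the actual schema written by record_artifact is not specified on this page.

```python
import json
import os
import tempfile
from typing import Any, Dict, List


def append_artifact_event(stream_path: str, record: Dict[str, Any]) -> None:
    """Append one record as a single JSON line."""
    with open(stream_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")


def read_artifact_events(stream_path: str) -> List[Dict[str, Any]]:
    """Parse every non-empty line of the stream as a JSON object."""
    with open(stream_path, "r", encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    stream = os.path.join(tmp, "artifacts.jsonl")
    # Field names here are assumed for illustration.
    append_artifact_event(stream, {"path": "output.csv", "kind": "result"})
    append_artifact_event(stream, {"path": "model_output/", "kind": "result"})
    for event in read_artifact_events(stream):
        print(event["kind"], event["path"])
```

Because each line is an independent JSON document, the stream can be appended to during a run and tailed or filtered without parsing the whole file.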
See Also¶
Cloud Providers: cloud providers with remote storage support
Telemetry and Run Reports: artifact events in run telemetry
Manifest-Driven Workflows: configuring project.default_storage