Getting Started with Scalable
This guide covers installation, baseline host requirements, and the bootstrap flow used to prepare local and HPC environments.
Installation
Install from PyPI using pip.
pip install scalable
For development or local source installs, clone the repository and install it from the checkout.
git clone https://github.com/JGCRI/scalable.git
pip install ./scalable
Development Install (Editable Mode)
For local development — where you want code changes to take effect immediately
without reinstalling — clone the repository and install in editable mode
(-e) inside a virtual environment.
# Clone the repository
git clone https://github.com/JGCRI/scalable.git
cd scalable
# Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate # Linux / macOS
# .venv\Scripts\activate # Windows (cmd / PowerShell)
# Install in editable mode with dev/test dependencies
pip install -e ".[dev]"
The -e flag (short for --editable) creates a link from the virtual
environment’s site-packages back to your working tree so that any edits to
source files under scalable/ are reflected immediately — no reinstall
required.
Why use a virtual environment?
A virtual environment isolates project dependencies from your system Python and other projects. This prevents version conflicts and makes dependency management reproducible. Always activate the environment before working on the project:
source .venv/bin/activate # each new terminal session
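To confirm from Python that a virtual environment is active, one option is to compare sys.prefix with sys.base_prefix, which differ inside an environment created by python -m venv. The following is a minimal sketch; the helper name in_virtualenv is illustrative and not part of Scalable.

```python
import sys


def in_virtualenv(prefix: str = sys.prefix, base_prefix: str = sys.base_prefix) -> bool:
    """Return True when the interpreter prefix differs from the base install.

    Inside a venv, sys.prefix points at the environment directory while
    sys.base_prefix still points at the underlying Python installation.
    """
    return prefix != base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```

If this prints False in a new terminal, the environment has most likely not been activated in that session.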
After installation, verify the setup:
# Confirm the package is installed in editable mode
pip show scalable # Location should point to your clone
python -c "import scalable; print(scalable.__version__)"
# Run the test suite
pytest
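The same check can be scripted with the standard library. This sketch uses importlib.metadata to look up the installed version of a distribution and returns None when it is missing; the helper name installed_version is hypothetical, and the distribution name scalable is taken from the pip commands above.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("scalable")
    if version is None:
        print("scalable is not installed in this environment")
    else:
        print(f"scalable {version}")
```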
Tip
If you only need to run Scalable (not develop it), a plain
pip install ./scalable inside a virtual environment is sufficient and
avoids installing test/lint tooling.
Available extras for development:
| Extra | Contents |
|---|---|
| | Everything in |
| | |
| ai | AI assistant dependencies |
| ml | ML optimization |
| cloud | Cloud providers |
| kubernetes | Kubernetes provider |
You can combine extras:
pip install -e ".[dev,ai,ml]"
Optional Extras
Scalable provides optional dependency groups for extended features:
# AI assistant features (init-component, diagnose, explain, compose, migrate)
pip install scalable[ai]
# Cloud providers (AWS, GCP) and remote artifact storage
pip install scalable[cloud]
# Kubernetes provider (Dask Kubernetes Operator)
pip install scalable[kubernetes]
# ML optimization (LearnedAdvisor, AdaptiveScaler)
pip install scalable[ml]
# All optional dependencies
pip install scalable[all]
If installation reports that the scripts directory is not in PATH, add the
reported directory to your shell profile.
WARNING: The script scalable_bootstrap.exe is installed in '/path/to/python/scripts' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
For example:
echo "export PATH=\$PATH:/path/to/python/scripts" >> <shell_profile>
source <shell_profile>
This only needs to be done once per environment.
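To check whether the reported directory is already on PATH before editing a shell profile, a small helper like the one below may be useful. It is a sketch only: dir_on_path is a hypothetical name, and the comparison is a simple normalized string match that does not resolve symlinks.

```python
import os
import shutil


def dir_on_path(directory: str, path_value: str) -> bool:
    """Return True if `directory` is one of the entries in a PATH string."""
    target = os.path.normpath(directory)
    entries = [entry for entry in path_value.split(os.pathsep) if entry]
    return any(os.path.normpath(entry) == target for entry in entries)


if __name__ == "__main__":
    # shutil.which returns the full path of the script, or None if it
    # cannot be found on the current PATH.
    print("scalable_bootstrap:", shutil.which("scalable_bootstrap"))
    print("on PATH:", dir_on_path("/path/to/python/scripts", os.environ.get("PATH", "")))
```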
Compatibility Requirements
Required and supported tooling:
Local host: Docker (optional for local provider)
HPC scheduler: Slurm
HPC container runtime: Apptainer
Cloud: AWS (Fargate/EC2), GCP (scaffold)
Orchestration: Kubernetes with Dask Operator
Bootstrapping is designed for POSIX-like shells. On Windows, Git Bash is recommended.
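A quick way to see which of these tools are visible on a given host is to probe PATH with shutil.which. The sketch below is illustrative: missing_tools is a hypothetical helper, and the command names (docker, sbatch for Slurm, apptainer, kubectl) are assumptions about a typical installation. Run it on the host where each tool is expected, for example the HPC login node for Slurm and Apptainer.

```python
import shutil
from typing import Callable, Iterable, List, Optional


def missing_tools(
    names: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the command names that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    # Assumed command names; adjust to match the local installation.
    print("missing:", missing_tools(["docker", "sbatch", "apptainer", "kubectl"]))
```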
Work Directory Setup
A dedicated work directory on the HPC host keeps dependencies, runtime assets,
and outputs in a consistent layout. The scalable_bootstrap script prepares
that directory and builds required worker containers.
Using key-based SSH authentication is strongly recommended because bootstrap may open multiple remote sessions. A setup guide is available on this website.
From a local working directory, run:
cd <local_work_dir>
scalable_bootstrap
Follow the interactive prompts. Bootstrap downloads and builds dependencies on both local and HPC systems, then opens an SSH session into the configured HPC work directory.
Inside the prepared environment, python3 starts an interactive session with
Scalable dependencies available. You can also execute scripts directly.
Only files under the configured HPC work directory (and its subdirectories) are available in this execution model.
python3
python3 <filename>.py
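Because only files under the work directory are visible, it can help to validate input paths before a run. The following sketch checks whether a path falls inside a given directory after resolving relative segments; is_inside is a hypothetical helper, and the example paths are placeholders.

```python
from pathlib import Path


def is_inside(work_dir: str, candidate: str) -> bool:
    """Return True if `candidate` is the work directory or lies beneath it.

    Both paths are resolved first, so segments such as ".." cannot be used
    to step outside the work directory unnoticed.
    """
    root = Path(work_dir).resolve()
    target = Path(candidate).resolve()
    return target == root or root in target.parents


if __name__ == "__main__":
    print(is_inside("/tmp/work", "/tmp/work/inputs/config.xml"))
    print(is_inside("/tmp/work", "/tmp/work/../elsewhere/config.xml"))
```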
If bootstrap is interrupted, rerun scalable_bootstrap. It resumes from the
last valid step and skips completed setup where possible.
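The internals of scalable_bootstrap are not described here, but the general resume-and-skip pattern can be illustrated with marker files: each step records its completion, and a rerun skips any step whose marker exists. This is a generic sketch under that assumption, not the actual bootstrap implementation; run_resumable and the .done marker naming are hypothetical.

```python
from pathlib import Path
from typing import Callable, List, Sequence, Tuple

Step = Tuple[str, Callable[[], None]]


def run_resumable(steps: Sequence[Step], state_dir: Path) -> List[str]:
    """Run steps in order, skipping any whose marker file already exists.

    Returns the names of the steps that were executed in this call.
    """
    state_dir.mkdir(parents=True, exist_ok=True)
    executed: List[str] = []
    for name, action in steps:
        marker = state_dir / f"{name}.done"
        if marker.exists():
            continue  # completed in an earlier run
        action()  # an exception here leaves no marker, so a rerun retries
        marker.touch()
        executed.append(name)
    return executed
```

With this pattern, an interrupted run leaves markers only for the steps that finished, so calling the function again picks up at the first unfinished step.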
Environment Configuration
Scalable uses a .env file in your working directory to centralize runtime
configuration — especially AI provider credentials, cache paths, and telemetry
settings.
How .env Loading Works
Whenever the scalable package is imported (or any CLI command is run), the
scalable.common module automatically loads .env from the current
working directory using python-dotenv
with override=True. Values in .env therefore take precedence over
pre-existing system environment variables.
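The effect of override=True can be seen in a small standard-library model of the behavior. This is a simplified stand-in for python-dotenv (whose load_dotenv defaults to override=False), written only to illustrate precedence; parse_env_lines and apply_env are hypothetical helpers, and the parser handles plain KEY=VALUE lines only.

```python
from typing import Dict, MutableMapping


def parse_env_lines(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def apply_env(
    values: Dict[str, str],
    environ: MutableMapping[str, str],
    override: bool,
) -> None:
    """Copy parsed values into `environ`; existing keys win unless override."""
    for key, value in values.items():
        if override or key not in environ:
            environ[key] = value


if __name__ == "__main__":
    env = {"AI_PROVIDER": "from-shell"}  # placeholder pre-existing value
    apply_env(parse_env_lines("AI_PROVIDER=openai\n"), env, override=True)
    print(env["AI_PROVIDER"])  # the .env value replaces the shell value
```

With override=False, the same call would leave the pre-existing shell value in place.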
Setup Steps
Copy the example file from the repository root into your project directory:
cp .env.example .env
Edit .env and set the values you need. At minimum, configure AI_PROVIDER and AI_API_KEY to enable AI features:
AI_PROVIDER=openai
AI_API_KEY=sk-your-key-here
LLM_MODEL_NAME=gpt-4o
Run Scalable from the directory containing .env:
cd /path/to/your/project # directory containing .env
scalable validate ./scalable.yaml
scalable compose "Run GCAM then Stitches"
Or in Python:
# .env is loaded automatically on import
from scalable import ScalableSession
Where to Place the .env File
The file must be in the current working directory at the time Scalable is first imported. Common scenarios:
CLI usage: the directory you cd into before running scalable commands.
Python scripts: the directory from which you run python your_script.py.
Jupyter notebooks: the notebook's working directory (check with os.getcwd()).
If your working directory differs from where .env lives (for example, in
notebooks that os.chdir() into temporary directories), use the programmatic
helper before changing directories:
from scalable.common import load_env
load_env("/absolute/path/to/your/.env")
Override Priority
Environment variable resolution follows this order (highest → lowest):
SCALABLE_AI_* variables (e.g., SCALABLE_AI_BACKEND): Scalable-specific overrides.
Generic AI_* / LLM_* variables (e.g., AI_PROVIDER, LLM_MODEL_NAME): typically set in .env.
Provider-specific keys (e.g., OPENAI_API_KEY): used as fallback for AI_API_KEY.
Built-in defaults (e.g., AI_PROVIDER=none, SCALABLE_CACHE_DIR=./cache).
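The resolution order above amounts to "first variable that is set wins, else the default". The sketch below models that rule; first_set is a hypothetical helper, and pairing SCALABLE_AI_BACKEND with AI_PROVIDER in one chain is an assumption made for illustration rather than a statement of how Scalable groups these variables internally.

```python
from typing import Mapping, Optional, Sequence


def first_set(
    env: Mapping[str, str],
    names: Sequence[str],
    default: Optional[str] = None,
) -> Optional[str]:
    """Return the value of the first variable in `names` that is set."""
    for name in names:
        value = env.get(name)
        if value:  # empty strings are treated as unset
            return value
    return default


# Assumed chains, listed from highest to lowest priority.
PROVIDER_CHAIN = ("SCALABLE_AI_BACKEND", "AI_PROVIDER")
API_KEY_CHAIN = ("AI_API_KEY", "OPENAI_API_KEY")

if __name__ == "__main__":
    import os

    print("provider:", first_set(os.environ, PROVIDER_CHAIN, default="none"))
    print("api key set:", first_set(os.environ, API_KEY_CHAIN) is not None)
```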
Security
Warning
Never commit .env to version control. The repository .gitignore
already excludes it. The bundled .env.example is safe to commit and
serves as a configuration template.
Key Environment Variables
AI provider configuration (generic — recommended):
| Variable | Default | Description |
|---|---|---|
| AI_PROVIDER | none | Provider name |
| AI_API_KEY | (unset) | Universal API key (works for any provider) |
| LLM_MODEL_NAME | (unset) | Model name |
| | (unset) | Custom API endpoint (for proxies; xAI auto-configures) |
Core settings:
| Variable | Default | Description |
|---|---|---|
| SCALABLE_CACHE_DIR | ./cache | Disk cache directory |
| | | xxhash seed for cache keys |
| | (unset) | Library log level |
| | | Default manifest path |
| | (unset) | Default target override |
| | | Telemetry run directory |
| | | Enable/disable telemetry |
See .env.example in the repository root for the complete template with
inline documentation.
CLI Commands
Scalable v2.0.0 provides a full CLI for manifest-driven workflows:
scalable validate ./scalable.yaml
scalable plan ./scalable.yaml --target local --dry-run
scalable run ./scalable.yaml --target local --workflow workflow.py
scalable report --latest
scalable advise --task run_demeter_scenario --target local
scalable init-component ./path/to/model --name gcam
scalable diagnose --latest
scalable explain plan.json
scalable compose "Run GCAM then Stitches"
scalable migrate scalable.yaml --to-provider kubernetes
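These commands can also be driven from a Python script, for example to validate a manifest before planning. The sketch below builds the argument list for the plan command shown above and chains it after validate; plan_command and validate_then_plan are hypothetical helpers, and the sketch assumes the CLI exits with a non-zero status on failure.

```python
import subprocess
from typing import List


def plan_command(manifest: str, target: str = "local", dry_run: bool = True) -> List[str]:
    """Build the argument list for `scalable plan` using the flags shown above."""
    cmd = ["scalable", "plan", manifest, "--target", target]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


def validate_then_plan(manifest: str, target: str = "local") -> None:
    """Validate the manifest, then run a dry-run plan.

    check=True raises subprocess.CalledProcessError if either command
    returns a non-zero exit status (assumed to signal failure).
    """
    subprocess.run(["scalable", "validate", manifest], check=True)
    subprocess.run(plan_command(manifest, target), check=True)


if __name__ == "__main__":
    print(" ".join(plan_command("./scalable.yaml")))
```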
Next Steps
After setup:
New to distributed computing? Start with the Beginner Tutorials for a guided introduction that explains all concepts from first principles.
For declarative workflows, start with Manifest-Driven Workflows and Provider Abstraction.
Use manifest overlays for environment-specific overrides: Manifest Overlays.
Review run telemetry in Telemetry and Run Reports.
Use deterministic history-based recommendations from Deterministic Resource Advising.
For ML-driven optimization, see ML Optimization.
For AI-assisted onboarding and diagnosis, see AI Assistants.
For cloud and Kubernetes targets, see Cloud Providers and Kubernetes Provider.
For artifact storage, see Artifact Store.
Review the API for worker, caching, and function interfaces.
Run the examples in the demos section.
Use the how-to section for targeted implementation guidance.
Report issues at https://github.com/JGCRI/scalable/issues.