Cloud Providers

Scalable v2.0.0 supports cloud-based execution through the scalable[cloud] extra, which provides an AWS deployment provider, a validation-only GCP scaffold, and integrated cost estimation.

Installation

pip install "scalable[cloud]"

This installs dask-cloudprovider, s3fs, gcsfs, and fsspec.
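
A quick way to confirm that the extra installed correctly is to check that the four packages are importable. The sketch below uses only the standard library; the helper name missing_cloud_modules is hypothetical and not part of Scalable.

```python
from importlib.util import find_spec

# Import names of the packages that the scalable[cloud] extra installs.
CLOUD_MODULES = ("dask_cloudprovider", "s3fs", "gcsfs", "fsspec")


def missing_cloud_modules(modules=CLOUD_MODULES):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_cloud_modules()
    if missing:
        print("Missing cloud dependencies:", ", ".join(missing))
    else:
        print("All cloud dependencies are importable.")
```

An empty result means every dependency can be imported; any names returned point to a package that still needs to be installed.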

AWS Provider

The AWSBatchProvider wraps dask-cloudprovider’s FargateCluster or EC2Cluster, selected by the cluster_type option.

Target options:

  • region: AWS region (default: us-east-1)

  • cluster_type: "fargate" (default) or "ec2"

  • instance_type: EC2 instance type (for cost estimation)

  • image: Docker image for workers

  • n_workers: Initial worker count

  • worker_cpu: CPU units per worker, where 1024 units equal 1 vCPU (Fargate: 256-4096)

  • worker_mem: Memory in MiB per worker

  • vpc: VPC identifier

  • subnets: List of subnet IDs

  • security_groups: List of security group IDs

  • execution_role_arn: ECS execution role ARN

  • task_role_arn: ECS task role ARN

  • adaptive: Mapping with minimum and maximum worker counts for adaptive scaling
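
On Fargate, worker_cpu and worker_mem cannot be chosen independently: each CPU size accepts only certain memory values. The sketch below encodes the combinations AWS documents for Linux tasks within the 256-4096 range listed above. Whether Scalable performs this check itself is not stated here, and check_fargate_size is a hypothetical helper name.

```python
# Fargate CPU units mapped to the memory values (MiB) AWS accepts for that size.
FARGATE_MEMORY_MIB = {
    256: (512, 1024, 2048),
    512: tuple(range(1024, 4096 + 1, 1024)),
    1024: tuple(range(2048, 8192 + 1, 1024)),
    2048: tuple(range(4096, 16384 + 1, 1024)),
    4096: tuple(range(8192, 30720 + 1, 1024)),
}


def check_fargate_size(worker_cpu, worker_mem):
    """Return a list of problems; an empty list means the pair is valid."""
    allowed = FARGATE_MEMORY_MIB.get(worker_cpu)
    if allowed is None:
        return [f"worker_cpu={worker_cpu} is not one of {sorted(FARGATE_MEMORY_MIB)}"]
    if worker_mem not in allowed:
        return [
            f"worker_mem={worker_mem} is not valid for worker_cpu={worker_cpu}; "
            f"expected {allowed[0]}-{allowed[-1]} MiB"
        ]
    return []


if __name__ == "__main__":
    print(check_fargate_size(4096, 16384))  # 4 vCPU with 16 GiB
    print(check_fargate_size(256, 16384))   # memory too large for 0.25 vCPU
```

Running a check like this before submitting a manifest catches size mismatches locally instead of at task registration time.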

Example manifest:

# Scalable manifest targeting AWS Fargate
# Requires: pip install scalable[cloud]
#
# Demeter LULCC running example — see docs/tutorials/demeter_setup.rst
# for the full setup instructions and docs/examples/scalable.demeter.yaml
# for the canonical multi-target version.

version: 1
project:
  name: demeter-lulcc-aws
  default_storage: s3://my-bucket/scalable-runs/

targets:
  aws:
    provider: aws
    region: us-east-1
    cluster_type: fargate
    instance_type: m5.xlarge
    worker_cpu: 4096
    worker_mem: 16384
    image: 123456789.dkr.ecr.us-east-1.amazonaws.com/demeter:2.0.1
    execution_role_arn: arn:aws:iam::123456789:role/ecsTaskExecutionRole
    task_role_arn: arn:aws:iam::123456789:role/ecsTaskRole
    subnets:
      - subnet-abc123
      - subnet-def456
    security_groups:
      - sg-xyz789
    adaptive:
      minimum: 1
      maximum: 10

components:
  demeter:
    image: 123456789.dkr.ecr.us-east-1.amazonaws.com/demeter:2.0.1
    cpus: 4
    memory: 16G
    tags: [lulcc, downscaling, gcam]

  postprocess:
    cpus: 2
    memory: 8G
    tags: [lulcc, aggregation]

tasks:
  run_demeter_scenario:
    component: demeter
    cache: true
    outputs:
      output_dir: dir

  aggregate_demeter_outputs:
    component: postprocess
    cache: true
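
To illustrate how the aws target in a manifest like this one could be handed to dask-cloudprovider, the sketch below translates a target mapping into keyword arguments for FargateCluster. The mapping is an assumption, not Scalable's actual implementation: fargate_kwargs is a hypothetical helper, and the only rename it applies is region to region_name, which is the parameter name dask-cloudprovider uses.

```python
def fargate_kwargs(target):
    """Translate a manifest target mapping into FargateCluster keyword arguments."""
    # Options assumed to pass through under the same name.
    passthrough = (
        "image", "n_workers", "worker_cpu", "worker_mem", "vpc",
        "subnets", "security_groups", "execution_role_arn", "task_role_arn",
    )
    kwargs = {key: target[key] for key in passthrough if key in target}
    # dask-cloudprovider calls the region parameter region_name.
    kwargs["region_name"] = target.get("region", "us-east-1")
    return kwargs


# The aws target from the manifest, written as a Python mapping.
target = {
    "provider": "aws",
    "region": "us-east-1",
    "cluster_type": "fargate",
    "instance_type": "m5.xlarge",  # used for cost estimation only
    "worker_cpu": 4096,
    "worker_mem": 16384,
    "image": "123456789.dkr.ecr.us-east-1.amazonaws.com/demeter:2.0.1",
    "subnets": ["subnet-abc123", "subnet-def456"],
    "security_groups": ["sg-xyz789"],
    "adaptive": {"minimum": 1, "maximum": 10},
}

kwargs = fargate_kwargs(target)
adaptive = target.get("adaptive")

# With AWS credentials and dask-cloudprovider available, the cluster could
# then be started roughly like this (left commented so the sketch runs offline):
#
#   from dask_cloudprovider.aws import FargateCluster
#   cluster = FargateCluster(**kwargs)
#   if adaptive:
#       cluster.adapt(minimum=adaptive["minimum"], maximum=adaptive["maximum"])
```

Keys such as provider, cluster_type, instance_type, and adaptive are deliberately left out of the keyword arguments because they steer provider selection, cost estimation, and scaling rather than cluster construction.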

GCP Provider (Scaffold)

The GCPProvider is a validation-only scaffold: it validates manifest options, but calling build_cluster() raises NotImplementedError.

Target options:

  • region: GCP region

  • project_id: GCP project identifier

  • instance_type: GCE machine type (for cost estimation)

  • image: Container image

  • n_workers: Worker count
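
The validation-only pattern can be sketched as follows. GCPProviderSketch is a hypothetical stand-in, not Scalable's GCPProvider, and the specific checks (rejecting unknown keys, requiring a positive worker count) are assumptions chosen for illustration.

```python
class GCPProviderSketch:
    """Hypothetical stand-in for a provider that validates but cannot deploy."""

    # Options listed above, plus the provider key that selects the backend.
    KNOWN_OPTIONS = {"provider", "region", "project_id", "instance_type", "image", "n_workers"}

    def __init__(self, options):
        self.options = dict(options)

    def validate(self):
        """Return a list of problems found in the target options."""
        unknown = sorted(set(self.options) - self.KNOWN_OPTIONS)
        problems = [f"unknown option: {key}" for key in unknown]
        n_workers = self.options.get("n_workers")
        if n_workers is not None and (not isinstance(n_workers, int) or n_workers < 1):
            problems.append("n_workers must be a positive integer")
        return problems

    def build_cluster(self):
        # Validation works, but cluster creation is intentionally unimplemented.
        raise NotImplementedError("GCP cluster creation is not implemented")


if __name__ == "__main__":
    provider = GCPProviderSketch({"provider": "gcp", "region": "us-central1", "n_workers": 2})
    print(provider.validate())
    try:
        provider.build_cluster()
    except NotImplementedError as exc:
        print("build_cluster failed as expected:", exc)
```

A scaffold of this kind lets a manifest with a GCP target be checked for mistakes even though it cannot yet be run.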

Cost Estimation

Cloud providers include static cost tables for common instance types. To see the estimated cost, pass --dry-run to scalable run:

scalable run scalable.yaml --target aws --dry-run

The cost estimate is also recorded in telemetry (cost.jsonl). See Cost Estimation for details.
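
A static cost table amounts to a lookup of an hourly rate multiplied by worker count and runtime. The sketch below shows the idea under stated assumptions: the prices are illustrative placeholders rather than Scalable's actual table, and estimate_cost is a hypothetical function name.

```python
# Illustrative hourly prices in USD; placeholders, not Scalable's cost table.
HOURLY_USD = {
    "m5.xlarge": 0.192,
    "m5.2xlarge": 0.384,
}


def estimate_cost(instance_type, n_workers, hours, prices=HOURLY_USD):
    """Return the estimated cost in USD, or None if the instance type is unknown."""
    rate = prices.get(instance_type)
    if rate is None:
        return None
    return round(rate * n_workers * hours, 2)


if __name__ == "__main__":
    # Ten m5.xlarge workers running for two hours.
    print(estimate_cost("m5.xlarge", n_workers=10, hours=2))
```

Returning None for an unlisted instance type keeps a missing table entry distinct from a genuine zero-cost estimate.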

See Also