Beginner Tutorial 3: How Distributed Computing Works¶

The Big Picture¶

You’ve written a workflow that runs on your laptop with 2 workers. But what happens when your data grows 100× and you need 64 workers on an HPC cluster? Or when you need to burst into the cloud during peak demand?

This tutorial explains the fundamentals of distributed computing — how work gets split up, how multiple machines coordinate, and how Scalable’s provider architecture lets you switch between execution backends without changing your code.

💡 Key Concept: Why Distribute at All?

The fundamental problem: Some computations take too long on one machine.

Consider running 1000 energy scenarios where each takes 5 minutes:

Sequential (1 CPU): 1000 × 5 min = 83 hours (3.5 days)
Parallel (10 CPUs): 1000 ÷ 10 × 5 min = 8.3 hours
Parallel (100 CPUs): 1000 ÷ 100 × 5 min = 50 minutes

Distributed computing trades hardware for time. But it introduces complexity: coordination, communication, failure handling. Scalable manages that complexity for you.

What You Will Learn¶

By the end of this tutorial you will:

Understand the client-scheduler-worker architecture.
Know the difference between vertical and horizontal scaling.
Grasp concurrency vs. parallelism.
Use the Local, Slurm, and Cloud providers.
Configure manual, adaptive, and objective-driven scaling.
Understand Amdahl’s Law and when NOT to distribute.
Monitor scaling decisions through telemetry.

Prerequisites¶

Completed Beginner Tutorial 1: Your First Workflow and Beginner Tutorial 2: Understanding the Manifest System.
Scalable installed (pip install scalable).
For HPC concepts: no cluster needed (follow along conceptually).

Key Concepts Explained¶

💡 Key Concept: Vertical vs. Horizontal Scaling

There are two ways to get more computing power:

Vertical scaling (scale UP):

Get a bigger machine — more CPUs, more RAM. Like buying a faster car.

Pros: Simple (no coordination needed), works for any workload
Cons: Expensive, has physical limits (you can’t buy a 10,000-core laptop)

Horizontal scaling (scale OUT):

Get more machines working together. Like having a fleet of cars.

Pros: Nearly unlimited capacity, cost-effective
Cons: Requires coordination, not all problems can be split

Scalable focuses on horizontal scaling — distributing work across multiple workers. But the workers themselves can be vertically scaled (bigger instances with more RAM per worker).

💡 Key Concept: The Scheduler-Worker Architecture

Distributed systems typically have three roles:

Client (you):: Submits work and collects results. This is your Python script.
Scheduler (traffic controller):: Receives tasks from clients and assigns them to workers. It tracks which workers are available, which tasks are queued, and which are complete. It makes the decisions about where each task runs.
Workers (the labor force):: Actually execute the functions. Each worker is a separate process (or thread) that can run independently.

┌──────────┐         ┌────────────┐         ┌──────────┐
│  Client  │────────▶│  Scheduler │────────▶│ Worker 1 │
│ (you)    │         │ (Dask)     │────────▶│ Worker 2 │
│          │◀────────│            │────────▶│ Worker 3 │
└──────────┘         └────────────┘         └──────────┘
   submit()            assigns tasks          executes &
   gather()            tracks state           returns results

In Scalable:

The client is your Python script using ScalableSession
The scheduler is Dask’s scheduler (managed automatically)
The workers are spawned by the provider (local processes, Slurm jobs, cloud containers, K8s pods)

💡 Key Concept: Concurrency vs. Parallelism

These terms are related but different:

Concurrency: Multiple tasks in progress at the same time (but maybe not literally simultaneous). Like a chef working on 3 dishes — chopping for one, checking the oven for another.

Parallelism: Multiple tasks executing at the exact same instant on different CPUs. Like 3 chefs each cooking their own dish simultaneously.

Threads give you concurrency (and parallelism for I/O, but not for CPU-bound Python code due to the GIL).
Processes give you true parallelism (each has its own Python interpreter and memory space).

Scalable supports both modes via the processes setting in your target.

💡 Key Concept: What is an HPC Cluster?

An HPC (High-Performance Computing) cluster is a collection of powerful computers (called “nodes”) connected by a fast network, managed by a job scheduler.

Key components:

Login nodes — where you SSH in and submit jobs
Compute nodes — where actual work runs
Job scheduler (e.g., Slurm) — queues and allocates jobs to nodes
Shared filesystem — storage accessible from all nodes

How it works: You don’t directly pick which computer runs your code. Instead, you submit a job request (“I need 4 nodes for 2 hours”) and the scheduler finds available resources.

💡 Key Concept: What is Slurm?

Slurm (Simple Linux Utility for Resource Management) is the most popular job scheduler for HPC clusters. It’s the “traffic controller” that decides when and where your computation runs.

Key Slurm concepts:

Queue/Partition — groups of nodes with similar properties
Account — billing/allocation identifier for your group
Walltime — maximum allowed runtime for your job
Job — a unit of work submitted to the scheduler

Scalable’s Slurm provider translates your manifest’s target configuration into Slurm job submissions automatically.

💡 Key Concept: Amdahl’s Law

Amdahl’s Law says that the speedup from parallelism is limited by the sequential portion of your program.

If 90% of your work can be parallelized and 10% must be sequential:

10 workers → ~5.3× speedup (not 10×)
100 workers → ~9.2× speedup (not 100×)
1000 workers → ~9.9× speedup (not 1000×)

Lesson: Don’t throw more workers at a problem than necessary. There’s always a point of diminishing returns. Scalable’s telemetry helps you find the sweet spot.

Step 1: The Provider Architecture¶

Scalable separates what runs from where it runs:

┌──────────────┐     ┌──────────────────┐     ┌─────────────┐
│  Manifest    │────▶│ DeploymentSpec    │────▶│  Provider   │
│(scalable.yaml)│    │(provider-neutral) │     │ (backend)   │
└──────────────┘     └──────────────────┘     └──────┬──────┘
                                                      │
                     ┌────────────────────────────────┼────────┐
                     │                                │        │
               ┌─────▼──────┐  ┌──────▼──────┐  ┌───▼────────┐
               │   Local    │  │    Slurm    │  │    Cloud    │
               │ (threads/  │  │  (HPC jobs) │  │  (Fargate/  │
               │  processes)│  │             │  │   EC2/GKE)  │
               └────────────┘  └─────────────┘  └─────────────┘

💡 Key Concept: Abstraction Layer

An abstraction layer hides complexity behind a simple interface. You interact with the abstraction (the provider API) without knowing the details underneath.

Real-world analogy: When you flip a light switch, you don’t need to know whether your electricity comes from solar panels, a nuclear plant, or a gas turbine. The switch is the abstraction layer.

In Scalable, the provider abstraction means your workflow code (client.submit()) works identically regardless of whether tasks run locally, on Slurm, or in AWS.

Step 2: The Local Provider (Development)¶

The simplest provider runs everything on your machine:

targets:
  local:
    provider: local
    max_workers: 4
    threads_per_worker: 2
    processes: false
    containers: none

What each setting controls:

max_workers: 4: The maximum number of parallel executors. With 4 workers, up to 4 tasks can run simultaneously.
threads_per_worker: 2: Each worker can handle 2 threads. This matters for I/O-bound tasks (network calls, file reads) that spend time waiting.
processes: false: Workers run as threads in a single process (fast startup, shared memory). Set to true for CPU-bound work that needs to bypass the GIL.
containers: none: No containerization — functions run in your current Python environment.

from scalable import ScalableSession

session = ScalableSession.from_yaml("./scalable.yaml", target="local")
plan = session.plan()
client = session.start(plan)

# Submit work — it runs on local workers
futures = [client.submit(my_func, i, tag="analysis") for i in range(20)]
results = client.gather(futures)
session.close()

Under the Hood

When you create a ScalableSession with the local provider:

Scalable reads the manifest and parses the local target
It creates a Dask LocalCluster with the specified workers
A Dask Client connects to the cluster’s scheduler
Your submit() calls become Dask client.submit() calls
The scheduler distributes tasks across the local workers
Results flow back through the client to your script

Step 3: The Slurm Provider (HPC)¶

For HPC clusters, the Slurm provider translates your manifest into job submissions:

targets:
  hpc:
    provider: slurm
    queue: batch
    account: GCIMS
    walltime: "04:00:00"
    interface: ib0

What these settings mean in HPC terms

queue: batch: Which partition (group of nodes) to submit to. Clusters often have batch (general), gpu (GPU nodes), debug (quick, limited).
account: GCIMS: Your team’s allocation identifier. HPC centers track usage by account for billing and fairness.
walltime: "04:00:00": Maximum runtime (4 hours). The job is killed if it exceeds this. Quoted because 04:00:00 looks like a time to YAML.
interface: ib0: Network interface for worker communication. ib0 = InfiniBand (high-speed interconnect common in HPC).

When you run with --target hpc, Scalable:

Generates Slurm job scripts automatically
Submits them to the Slurm scheduler
Workers start on allocated nodes
Your tasks distribute across the HPC workers
Results flow back to your client

You don’t write Slurm scripts manually — the manifest declares what you need and the provider handles the “how.”

Step 4: Scaling Strategies¶

💡 Key Concept: Scaling Strategy

A scaling strategy determines how many workers are active at any time. Options range from fixed (always N workers) to fully dynamic (workers spin up/down based on demand).

Manual (Fixed) Scaling:

targets:
  local:
    provider: local
    max_workers: 4     # Always exactly 4 workers

You decide the worker count upfront. Simple and predictable.

Adaptive Scaling:

targets:
  cloud:
    provider: aws
    adaptive:
      minimum: 1       # At least 1 worker always running
      maximum: 20      # Scale up to 20 when busy

💡 Key Concept: Adaptive Scaling

Adaptive scaling automatically adjusts worker count based on workload:

Queue growing → add workers (scale up)
Workers idle → remove workers (scale down)

Benefits:

Cost efficiency — don’t pay for idle workers
Responsiveness — handle bursts without pre-provisioning
Simplicity — no need to predict workload size

Trade-offs:

Latency — spinning up new workers takes time
Thrashing — rapid up/down cycles waste resources
Minimum guarantee — you need at least some workers ready

Scalable implements adaptive scaling with configurable thresholds and cooldown periods to prevent thrashing.

Objective-Driven Scaling:

from scalable import ScalableSession

session = ScalableSession.from_yaml(
    "./scalable.yaml",
    target="cloud",
    objectives={"budget_usd": 50.0, "deadline_hours": 2.0},
)

💡 Key Concept: Objective-Driven Planning

Objective-driven planning lets you specify goals (budget, deadline) and Scalable figures out the optimal resource allocation:

“I have $50 and need results in 2 hours” → Scalable calculates how many workers fit within budget and meet the deadline
Based on telemetry history, it predicts task duration and scales accordingly

This is the most sophisticated scaling mode — it requires telemetry history to make predictions.

Step 5: Monitoring Scaling Decisions¶

Every scaling decision is recorded in telemetry:

from scalable import ScalableSession

session = ScalableSession.from_yaml("./scalable.yaml", target="local")
plan = session.plan()
client = session.start(plan)

# After your run, check what happened
# Telemetry records scaling events:
# - worker_added (when a new worker started)
# - worker_removed (when a worker was stopped)
# - scale_decision (why the system scaled up/down)

The scalable report command summarizes scaling behavior:

scalable report --last

Run: run-20260520T...-demeter-lulcc-abc123
Target: local (provider: local)
Workers: peak=4, avg=3.2
Tasks: 20 completed, 0 failed
Duration: 12.4s
Efficiency: 87% (worker utilization)

Step 6: Choosing the Right Strategy¶

Scenario	Strategy	Why	Config
Development	Fixed (2–4 workers)	Fast startup, predictable	`max_workers: 4`
Batch production	Fixed (many workers)	Known workload size	`max_workers: 64`
Variable workload	Adaptive	Cost-efficient	`adaptive: {min: 2, max: 50}`
Budget-constrained	Objective-driven	Optimize cost/time	`objectives: {budget_usd: 100}`

🤔 Think About It

If you have 100 independent tasks that each take 1 minute:

1 worker → 100 minutes
10 workers → 10 minutes
100 workers → ~1 minute (plus ~30s startup overhead)
200 workers → ~1 minute (half the workers sit idle!)

The sweet spot depends on task count, task duration, and worker startup cost. Telemetry from past runs helps you find it.

Common Questions¶

Q: What if I only have one computer?

The local provider still gives you parallelism through multiple processes or threads. A modern laptop with 8 cores can run 8 workers doing genuine parallel work (with processes: true).

Q: Do workers communicate with each other?

Not directly in most cases. Workers communicate through the scheduler (via futures and results). If Task B depends on Task A’s output, the scheduler ensures A completes before B starts, and transfers the result.

Q: What happens if a worker crashes?

Scalable (via Dask) detects the failure and can reassign the task to another worker. Tutorial 7 covers this in detail.

Q: Is there overhead to distributing work?

Yes! Each task has overhead: serialization, network transfer, scheduling decisions. For very small tasks (< 1ms), the overhead exceeds the computation. Rule of thumb: tasks should take at least 100ms to benefit from distribution.

Q: Can I mix providers in one run?

No — a single run uses one target (one provider). But you can run the same manifest with different targets for different purposes (dev locally, run in production on HPC).

What You Learned¶

Term	Definition
Horizontal Scaling	Adding more machines/workers to handle more work
Vertical Scaling	Getting a bigger/faster single machine
Scheduler	Component that assigns tasks to workers
Worker	Process/thread that executes tasks
Client	Your script that submits work and collects results
Concurrency	Multiple tasks in progress (maybe not simultaneous)
Parallelism	Multiple tasks executing at the same instant
HPC Cluster	Collection of computers managed by a job scheduler
Slurm	Popular HPC job scheduler
Provider	Abstraction over an execution backend
Adaptive Scaling	Automatically adjusting worker count based on demand
Amdahl’s Law	Parallelism speedup limited by sequential portion
Abstraction Layer	Simple interface hiding complex implementation details

Next Steps¶

You now understand how distributed computing works and how Scalable’s provider architecture makes it portable across environments.

Next beginner tutorial: Beginner Tutorial 4: Caching — Avoiding Redundant Work — avoid repeating expensive computation
Standard tutorial: Tutorial 3: Scaling Strategies with Providers — advanced provider configuration and production scaling patterns
Experiment: Change max_workers in your manifest from 2 to 8. Submit 100 tasks and time the difference. At what point do more workers stop helping?