Beginner Tutorial 3: How Distributed Computing Works¶
The Big Picture¶
You’ve written a workflow that runs on your laptop with 2 workers. But what happens when your data grows 100× and you need 64 workers on an HPC cluster? Or when you need to burst into the cloud during peak demand?
This tutorial explains the fundamentals of distributed computing — how work gets split up, how multiple machines coordinate, and how Scalable’s provider architecture lets you switch between execution backends without changing your code.
💡 Key Concept: Why Distribute at All?
The fundamental problem: Some computations take too long on one machine.
Consider running 1000 energy scenarios where each takes 5 minutes:
Sequential (1 CPU): 1000 × 5 min = 83 hours (3.5 days)
Parallel (10 CPUs): 1000 ÷ 10 × 5 min = 8.3 hours
Parallel (100 CPUs): 1000 ÷ 100 × 5 min = 50 minutes
Distributed computing trades hardware for time. But it introduces complexity: coordination, communication, failure handling. Scalable manages that complexity for you.
What You Will Learn¶
By the end of this tutorial you will:
Understand the client-scheduler-worker architecture.
Know the difference between vertical and horizontal scaling.
Grasp concurrency vs. parallelism.
Use the Local, Slurm, and Cloud providers.
Configure manual, adaptive, and objective-driven scaling.
Understand Amdahl’s Law and when NOT to distribute.
Monitor scaling decisions through telemetry.
Prerequisites¶
Completed Beginner Tutorial 1: Your First Workflow and Beginner Tutorial 2: Understanding the Manifest System.
Scalable installed (
pip install scalable).For HPC concepts: no cluster needed (follow along conceptually).
Key Concepts Explained¶
💡 Key Concept: Vertical vs. Horizontal Scaling
There are two ways to get more computing power:
- Vertical scaling (scale UP):
Get a bigger machine — more CPUs, more RAM. Like buying a faster car.
Pros: Simple (no coordination needed), works for any workload
Cons: Expensive, has physical limits (you can’t buy a 10,000-core laptop)
- Horizontal scaling (scale OUT):
Get more machines working together. Like having a fleet of cars.
Pros: Nearly unlimited capacity, cost-effective
Cons: Requires coordination, not all problems can be split
Scalable focuses on horizontal scaling — distributing work across multiple workers. But the workers themselves can be vertically scaled (bigger instances with more RAM per worker).
💡 Key Concept: The Scheduler-Worker Architecture
Distributed systems typically have three roles:
- Client (you):
Submits work and collects results. This is your Python script.
- Scheduler (traffic controller):
Receives tasks from clients and assigns them to workers. It tracks which workers are available, which tasks are queued, and which are complete. It makes the decisions about where each task runs.
- Workers (the labor force):
Actually execute the functions. Each worker is a separate process (or thread) that can run independently.
┌──────────┐ ┌────────────┐ ┌──────────┐
│ Client │────────▶│ Scheduler │────────▶│ Worker 1 │
│ (you) │ │ (Dask) │────────▶│ Worker 2 │
│ │◀────────│ │────────▶│ Worker 3 │
└──────────┘ └────────────┘ └──────────┘
submit() assigns tasks executes &
gather() tracks state returns results
In Scalable:
The client is your Python script using
ScalableSessionThe scheduler is Dask’s scheduler (managed automatically)
The workers are spawned by the provider (local processes, Slurm jobs, cloud containers, K8s pods)
💡 Key Concept: Concurrency vs. Parallelism
These terms are related but different:
Concurrency: Multiple tasks in progress at the same time (but maybe not literally simultaneous). Like a chef working on 3 dishes — chopping for one, checking the oven for another.
Parallelism: Multiple tasks executing at the exact same instant on different CPUs. Like 3 chefs each cooking their own dish simultaneously.
Threads give you concurrency (and parallelism for I/O, but not for CPU-bound Python code due to the GIL).
Processes give you true parallelism (each has its own Python interpreter and memory space).
Scalable supports both modes via the processes setting in your target.
💡 Key Concept: What is an HPC Cluster?
An HPC (High-Performance Computing) cluster is a collection of powerful computers (called “nodes”) connected by a fast network, managed by a job scheduler.
Key components:
Login nodes — where you SSH in and submit jobs
Compute nodes — where actual work runs
Job scheduler (e.g., Slurm) — queues and allocates jobs to nodes
Shared filesystem — storage accessible from all nodes
How it works: You don’t directly pick which computer runs your code. Instead, you submit a job request (“I need 4 nodes for 2 hours”) and the scheduler finds available resources.
💡 Key Concept: What is Slurm?
Slurm (Simple Linux Utility for Resource Management) is the most popular job scheduler for HPC clusters. It’s the “traffic controller” that decides when and where your computation runs.
Key Slurm concepts:
Queue/Partition — groups of nodes with similar properties
Account — billing/allocation identifier for your group
Walltime — maximum allowed runtime for your job
Job — a unit of work submitted to the scheduler
Scalable’s Slurm provider translates your manifest’s target configuration into Slurm job submissions automatically.
💡 Key Concept: Amdahl’s Law
Amdahl’s Law says that the speedup from parallelism is limited by the sequential portion of your program.
If 90% of your work can be parallelized and 10% must be sequential:
10 workers → ~5.3× speedup (not 10×)
100 workers → ~9.2× speedup (not 100×)
1000 workers → ~9.9× speedup (not 1000×)
Lesson: Don’t throw more workers at a problem than necessary. There’s always a point of diminishing returns. Scalable’s telemetry helps you find the sweet spot.
Step 1: The Provider Architecture¶
Scalable separates what runs from where it runs:
┌──────────────┐ ┌──────────────────┐ ┌─────────────┐
│ Manifest │────▶│ DeploymentSpec │────▶│ Provider │
│(scalable.yaml)│ │(provider-neutral) │ │ (backend) │
└──────────────┘ └──────────────────┘ └──────┬──────┘
│
┌────────────────────────────────┼────────┐
│ │ │
┌─────▼──────┐ ┌──────▼──────┐ ┌───▼────────┐
│ Local │ │ Slurm │ │ Cloud │
│ (threads/ │ │ (HPC jobs) │ │ (Fargate/ │
│ processes)│ │ │ │ EC2/GKE) │
└────────────┘ └─────────────┘ └─────────────┘
💡 Key Concept: Abstraction Layer
An abstraction layer hides complexity behind a simple interface. You interact with the abstraction (the provider API) without knowing the details underneath.
Real-world analogy: When you flip a light switch, you don’t need to know whether your electricity comes from solar panels, a nuclear plant, or a gas turbine. The switch is the abstraction layer.
In Scalable, the provider abstraction means your workflow code
(client.submit()) works identically regardless of whether tasks run
locally, on Slurm, or in AWS.
Step 2: The Local Provider (Development)¶
The simplest provider runs everything on your machine:
targets:
local:
provider: local
max_workers: 4
threads_per_worker: 2
processes: false
containers: none
What each setting controls:
max_workers: 4The maximum number of parallel executors. With 4 workers, up to 4 tasks can run simultaneously.
threads_per_worker: 2Each worker can handle 2 threads. This matters for I/O-bound tasks (network calls, file reads) that spend time waiting.
processes: falseWorkers run as threads in a single process (fast startup, shared memory). Set to
truefor CPU-bound work that needs to bypass the GIL.containers: noneNo containerization — functions run in your current Python environment.
from scalable import ScalableSession
session = ScalableSession.from_yaml("./scalable.yaml", target="local")
plan = session.plan()
client = session.start(plan)
# Submit work — it runs on local workers
futures = [client.submit(my_func, i, tag="analysis") for i in range(20)]
results = client.gather(futures)
session.close()
Under the Hood
When you create a ScalableSession with the local provider:
Scalable reads the manifest and parses the
localtargetIt creates a Dask
LocalClusterwith the specified workersA Dask
Clientconnects to the cluster’s schedulerYour
submit()calls become Daskclient.submit()callsThe scheduler distributes tasks across the local workers
Results flow back through the client to your script
Step 3: The Slurm Provider (HPC)¶
For HPC clusters, the Slurm provider translates your manifest into job submissions:
targets:
hpc:
provider: slurm
queue: batch
account: GCIMS
walltime: "04:00:00"
interface: ib0
What these settings mean in HPC terms
queue: batchWhich partition (group of nodes) to submit to. Clusters often have
batch(general),gpu(GPU nodes),debug(quick, limited).account: GCIMSYour team’s allocation identifier. HPC centers track usage by account for billing and fairness.
walltime: "04:00:00"Maximum runtime (4 hours). The job is killed if it exceeds this. Quoted because
04:00:00looks like a time to YAML.interface: ib0Network interface for worker communication.
ib0= InfiniBand (high-speed interconnect common in HPC).
When you run with --target hpc, Scalable:
Generates Slurm job scripts automatically
Submits them to the Slurm scheduler
Workers start on allocated nodes
Your tasks distribute across the HPC workers
Results flow back to your client
You don’t write Slurm scripts manually — the manifest declares what you need and the provider handles the “how.”
Step 4: Scaling Strategies¶
💡 Key Concept: Scaling Strategy
A scaling strategy determines how many workers are active at any time. Options range from fixed (always N workers) to fully dynamic (workers spin up/down based on demand).
Manual (Fixed) Scaling:
targets:
local:
provider: local
max_workers: 4 # Always exactly 4 workers
You decide the worker count upfront. Simple and predictable.
Adaptive Scaling:
targets:
cloud:
provider: aws
adaptive:
minimum: 1 # At least 1 worker always running
maximum: 20 # Scale up to 20 when busy
💡 Key Concept: Adaptive Scaling
Adaptive scaling automatically adjusts worker count based on workload:
Queue growing → add workers (scale up)
Workers idle → remove workers (scale down)
Benefits:
Cost efficiency — don’t pay for idle workers
Responsiveness — handle bursts without pre-provisioning
Simplicity — no need to predict workload size
Trade-offs:
Latency — spinning up new workers takes time
Thrashing — rapid up/down cycles waste resources
Minimum guarantee — you need at least some workers ready
Scalable implements adaptive scaling with configurable thresholds and cooldown periods to prevent thrashing.
Objective-Driven Scaling:
from scalable import ScalableSession
session = ScalableSession.from_yaml(
"./scalable.yaml",
target="cloud",
objectives={"budget_usd": 50.0, "deadline_hours": 2.0},
)
💡 Key Concept: Objective-Driven Planning
Objective-driven planning lets you specify goals (budget, deadline) and Scalable figures out the optimal resource allocation:
“I have $50 and need results in 2 hours” → Scalable calculates how many workers fit within budget and meet the deadline
Based on telemetry history, it predicts task duration and scales accordingly
This is the most sophisticated scaling mode — it requires telemetry history to make predictions.
Step 5: Monitoring Scaling Decisions¶
Every scaling decision is recorded in telemetry:
from scalable import ScalableSession
session = ScalableSession.from_yaml("./scalable.yaml", target="local")
plan = session.plan()
client = session.start(plan)
# After your run, check what happened
# Telemetry records scaling events:
# - worker_added (when a new worker started)
# - worker_removed (when a worker was stopped)
# - scale_decision (why the system scaled up/down)
The scalable report command summarizes scaling behavior:
scalable report --last
Run: run-20260520T...-demeter-lulcc-abc123
Target: local (provider: local)
Workers: peak=4, avg=3.2
Tasks: 20 completed, 0 failed
Duration: 12.4s
Efficiency: 87% (worker utilization)
Step 6: Choosing the Right Strategy¶
Scenario |
Strategy |
Why |
Config |
|---|---|---|---|
Development |
Fixed (2–4 workers) |
Fast startup, predictable |
|
Batch production |
Fixed (many workers) |
Known workload size |
|
Variable workload |
Adaptive |
Cost-efficient |
|
Budget-constrained |
Objective-driven |
Optimize cost/time |
|
🤔 Think About It
If you have 100 independent tasks that each take 1 minute:
1 worker → 100 minutes
10 workers → 10 minutes
100 workers → ~1 minute (plus ~30s startup overhead)
200 workers → ~1 minute (half the workers sit idle!)
The sweet spot depends on task count, task duration, and worker startup cost. Telemetry from past runs helps you find it.
Common Questions¶
Q: What if I only have one computer?
The local provider still gives you parallelism through multiple processes or
threads. A modern laptop with 8 cores can run 8 workers doing genuine
parallel work (with processes: true).
Q: Do workers communicate with each other?
Not directly in most cases. Workers communicate through the scheduler (via futures and results). If Task B depends on Task A’s output, the scheduler ensures A completes before B starts, and transfers the result.
Q: What happens if a worker crashes?
Scalable (via Dask) detects the failure and can reassign the task to another worker. Tutorial 7 covers this in detail.
Q: Is there overhead to distributing work?
Yes! Each task has overhead: serialization, network transfer, scheduling decisions. For very small tasks (< 1ms), the overhead exceeds the computation. Rule of thumb: tasks should take at least 100ms to benefit from distribution.
Q: Can I mix providers in one run?
No — a single run uses one target (one provider). But you can run the same manifest with different targets for different purposes (dev locally, run in production on HPC).
What You Learned¶
Term |
Definition |
|---|---|
Horizontal Scaling |
Adding more machines/workers to handle more work |
Vertical Scaling |
Getting a bigger/faster single machine |
Scheduler |
Component that assigns tasks to workers |
Worker |
Process/thread that executes tasks |
Client |
Your script that submits work and collects results |
Concurrency |
Multiple tasks in progress (maybe not simultaneous) |
Parallelism |
Multiple tasks executing at the same instant |
HPC Cluster |
Collection of computers managed by a job scheduler |
Slurm |
Popular HPC job scheduler |
Provider |
Abstraction over an execution backend |
Adaptive Scaling |
Automatically adjusting worker count based on demand |
Amdahl’s Law |
Parallelism speedup limited by sequential portion |
Abstraction Layer |
Simple interface hiding complex implementation details |
Next Steps¶
You now understand how distributed computing works and how Scalable’s provider architecture makes it portable across environments.
Next beginner tutorial: Beginner Tutorial 4: Caching — Avoiding Redundant Work — avoid repeating expensive computation
Standard tutorial: Tutorial 3: Scaling Strategies with Providers — advanced provider configuration and production scaling patterns
Experiment: Change
max_workersin your manifest from 2 to 8. Submit 100 tasks and time the difference. At what point do more workers stop helping?