Beginner Tutorial 3: How Distributed Computing Works

The Big Picture

You’ve written a workflow that runs on your laptop with 2 workers. But what happens when your data grows 100× and you need 64 workers on an HPC cluster? Or when you need to burst into the cloud during peak demand?

This tutorial explains the fundamentals of distributed computing — how work gets split up, how multiple machines coordinate, and how Scalable’s provider architecture lets you switch between execution backends without changing your code.

💡 Key Concept: Why Distribute at All?

The fundamental problem: Some computations take too long on one machine.

Consider running 1000 energy scenarios where each takes 5 minutes:

  • Sequential (1 CPU): 1000 × 5 min = 83 hours (3.5 days)

  • Parallel (10 CPUs): 1000 ÷ 10 × 5 min = 8.3 hours

  • Parallel (100 CPUs): 1000 ÷ 100 × 5 min = 50 minutes

Distributed computing trades hardware for time. But it introduces complexity: coordination, communication, failure handling. Scalable manages that complexity for you.

What You Will Learn

By the end of this tutorial you will:

  • Understand the client-scheduler-worker architecture.

  • Know the difference between vertical and horizontal scaling.

  • Grasp concurrency vs. parallelism.

  • Use the Local, Slurm, and Cloud providers.

  • Configure manual, adaptive, and objective-driven scaling.

  • Understand Amdahl’s Law and when NOT to distribute.

  • Monitor scaling decisions through telemetry.

Prerequisites

Key Concepts Explained

💡 Key Concept: Vertical vs. Horizontal Scaling

There are two ways to get more computing power:

Vertical scaling (scale UP):

Get a bigger machine — more CPUs, more RAM. Like buying a faster car.

  • Pros: Simple (no coordination needed), works for any workload

  • Cons: Expensive, has physical limits (you can’t buy a 10,000-core laptop)

Horizontal scaling (scale OUT):

Get more machines working together. Like having a fleet of cars.

  • Pros: Nearly unlimited capacity, cost-effective

  • Cons: Requires coordination, not all problems can be split

Scalable focuses on horizontal scaling — distributing work across multiple workers. But the workers themselves can be vertically scaled (bigger instances with more RAM per worker).

💡 Key Concept: The Scheduler-Worker Architecture

Distributed systems typically have three roles:

Client (you):

Submits work and collects results. This is your Python script.

Scheduler (traffic controller):

Receives tasks from clients and assigns them to workers. It tracks which workers are available, which tasks are queued, and which are complete. It makes the decisions about where each task runs.

Workers (the labor force):

Actually execute the functions. Each worker is a separate process (or thread) that can run independently.

┌──────────┐         ┌────────────┐         ┌──────────┐
│  Client  │────────▶│  Scheduler │────────▶│ Worker 1 │
│ (you)    │         │ (Dask)     │────────▶│ Worker 2 │
│          │◀────────│            │────────▶│ Worker 3 │
└──────────┘         └────────────┘         └──────────┘
   submit()            assigns tasks          executes &
   gather()            tracks state           returns results

In Scalable:

  • The client is your Python script using ScalableSession

  • The scheduler is Dask’s scheduler (managed automatically)

  • The workers are spawned by the provider (local processes, Slurm jobs, cloud containers, K8s pods)

💡 Key Concept: Concurrency vs. Parallelism

These terms are related but different:

Concurrency: Multiple tasks in progress at the same time (but maybe not literally simultaneous). Like a chef working on 3 dishes — chopping for one, checking the oven for another.

Parallelism: Multiple tasks executing at the exact same instant on different CPUs. Like 3 chefs each cooking their own dish simultaneously.

  • Threads give you concurrency (and parallelism for I/O, but not for CPU-bound Python code due to the GIL).

  • Processes give you true parallelism (each has its own Python interpreter and memory space).

Scalable supports both modes via the processes setting in your target.

💡 Key Concept: What is an HPC Cluster?

An HPC (High-Performance Computing) cluster is a collection of powerful computers (called “nodes”) connected by a fast network, managed by a job scheduler.

Key components:

  • Login nodes — where you SSH in and submit jobs

  • Compute nodes — where actual work runs

  • Job scheduler (e.g., Slurm) — queues and allocates jobs to nodes

  • Shared filesystem — storage accessible from all nodes

How it works: You don’t directly pick which computer runs your code. Instead, you submit a job request (“I need 4 nodes for 2 hours”) and the scheduler finds available resources.

💡 Key Concept: What is Slurm?

Slurm (Simple Linux Utility for Resource Management) is the most popular job scheduler for HPC clusters. It’s the “traffic controller” that decides when and where your computation runs.

Key Slurm concepts:

  • Queue/Partition — groups of nodes with similar properties

  • Account — billing/allocation identifier for your group

  • Walltime — maximum allowed runtime for your job

  • Job — a unit of work submitted to the scheduler

Scalable’s Slurm provider translates your manifest’s target configuration into Slurm job submissions automatically.

💡 Key Concept: Amdahl’s Law

Amdahl’s Law says that the speedup from parallelism is limited by the sequential portion of your program.

If 90% of your work can be parallelized and 10% must be sequential:

  • 10 workers → ~5.3× speedup (not 10×)

  • 100 workers → ~9.2× speedup (not 100×)

  • 1000 workers → ~9.9× speedup (not 1000×)

Lesson: Don’t throw more workers at a problem than necessary. There’s always a point of diminishing returns. Scalable’s telemetry helps you find the sweet spot.

Step 1: The Provider Architecture

Scalable separates what runs from where it runs:

┌──────────────┐     ┌──────────────────┐     ┌─────────────┐
│  Manifest    │────▶│ DeploymentSpec    │────▶│  Provider   │
│(scalable.yaml)│    │(provider-neutral) │     │ (backend)   │
└──────────────┘     └──────────────────┘     └──────┬──────┘
                                                      │
                     ┌────────────────────────────────┼────────┐
                     │                                │        │
               ┌─────▼──────┐  ┌──────▼──────┐  ┌───▼────────┐
               │   Local    │  │    Slurm    │  │    Cloud    │
               │ (threads/  │  │  (HPC jobs) │  │  (Fargate/  │
               │  processes)│  │             │  │   EC2/GKE)  │
               └────────────┘  └─────────────┘  └─────────────┘

💡 Key Concept: Abstraction Layer

An abstraction layer hides complexity behind a simple interface. You interact with the abstraction (the provider API) without knowing the details underneath.

Real-world analogy: When you flip a light switch, you don’t need to know whether your electricity comes from solar panels, a nuclear plant, or a gas turbine. The switch is the abstraction layer.

In Scalable, the provider abstraction means your workflow code (client.submit()) works identically regardless of whether tasks run locally, on Slurm, or in AWS.

Step 2: The Local Provider (Development)

The simplest provider runs everything on your machine:

targets:
  local:
    provider: local
    max_workers: 4
    threads_per_worker: 2
    processes: false
    containers: none

What each setting controls:

max_workers: 4

The maximum number of parallel executors. With 4 workers, up to 4 tasks can run simultaneously.

threads_per_worker: 2

Each worker can handle 2 threads. This matters for I/O-bound tasks (network calls, file reads) that spend time waiting.

processes: false

Workers run as threads in a single process (fast startup, shared memory). Set to true for CPU-bound work that needs to bypass the GIL.

containers: none

No containerization — functions run in your current Python environment.

from scalable import ScalableSession

session = ScalableSession.from_yaml("./scalable.yaml", target="local")
plan = session.plan()
client = session.start(plan)

# Submit work — it runs on local workers
futures = [client.submit(my_func, i, tag="analysis") for i in range(20)]
results = client.gather(futures)
session.close()

Under the Hood

When you create a ScalableSession with the local provider:

  1. Scalable reads the manifest and parses the local target

  2. It creates a Dask LocalCluster with the specified workers

  3. A Dask Client connects to the cluster’s scheduler

  4. Your submit() calls become Dask client.submit() calls

  5. The scheduler distributes tasks across the local workers

  6. Results flow back through the client to your script

Step 3: The Slurm Provider (HPC)

For HPC clusters, the Slurm provider translates your manifest into job submissions:

targets:
  hpc:
    provider: slurm
    queue: batch
    account: GCIMS
    walltime: "04:00:00"
    interface: ib0

What these settings mean in HPC terms

queue: batch

Which partition (group of nodes) to submit to. Clusters often have batch (general), gpu (GPU nodes), debug (quick, limited).

account: GCIMS

Your team’s allocation identifier. HPC centers track usage by account for billing and fairness.

walltime: "04:00:00"

Maximum runtime (4 hours). The job is killed if it exceeds this. Quoted because 04:00:00 looks like a time to YAML.

interface: ib0

Network interface for worker communication. ib0 = InfiniBand (high-speed interconnect common in HPC).

When you run with --target hpc, Scalable:

  1. Generates Slurm job scripts automatically

  2. Submits them to the Slurm scheduler

  3. Workers start on allocated nodes

  4. Your tasks distribute across the HPC workers

  5. Results flow back to your client

You don’t write Slurm scripts manually — the manifest declares what you need and the provider handles the “how.”

Step 4: Scaling Strategies

💡 Key Concept: Scaling Strategy

A scaling strategy determines how many workers are active at any time. Options range from fixed (always N workers) to fully dynamic (workers spin up/down based on demand).

Manual (Fixed) Scaling:

targets:
  local:
    provider: local
    max_workers: 4     # Always exactly 4 workers

You decide the worker count upfront. Simple and predictable.

Adaptive Scaling:

targets:
  cloud:
    provider: aws
    adaptive:
      minimum: 1       # At least 1 worker always running
      maximum: 20      # Scale up to 20 when busy

💡 Key Concept: Adaptive Scaling

Adaptive scaling automatically adjusts worker count based on workload:

  • Queue growing → add workers (scale up)

  • Workers idle → remove workers (scale down)

Benefits:

  • Cost efficiency — don’t pay for idle workers

  • Responsiveness — handle bursts without pre-provisioning

  • Simplicity — no need to predict workload size

Trade-offs:

  • Latency — spinning up new workers takes time

  • Thrashing — rapid up/down cycles waste resources

  • Minimum guarantee — you need at least some workers ready

Scalable implements adaptive scaling with configurable thresholds and cooldown periods to prevent thrashing.

Objective-Driven Scaling:

from scalable import ScalableSession

session = ScalableSession.from_yaml(
    "./scalable.yaml",
    target="cloud",
    objectives={"budget_usd": 50.0, "deadline_hours": 2.0},
)

💡 Key Concept: Objective-Driven Planning

Objective-driven planning lets you specify goals (budget, deadline) and Scalable figures out the optimal resource allocation:

  • “I have $50 and need results in 2 hours” → Scalable calculates how many workers fit within budget and meet the deadline

  • Based on telemetry history, it predicts task duration and scales accordingly

This is the most sophisticated scaling mode — it requires telemetry history to make predictions.

Step 5: Monitoring Scaling Decisions

Every scaling decision is recorded in telemetry:

from scalable import ScalableSession

session = ScalableSession.from_yaml("./scalable.yaml", target="local")
plan = session.plan()
client = session.start(plan)

# After your run, check what happened
# Telemetry records scaling events:
# - worker_added (when a new worker started)
# - worker_removed (when a worker was stopped)
# - scale_decision (why the system scaled up/down)

The scalable report command summarizes scaling behavior:

scalable report --last
Run: run-20260520T...-demeter-lulcc-abc123
Target: local (provider: local)
Workers: peak=4, avg=3.2
Tasks: 20 completed, 0 failed
Duration: 12.4s
Efficiency: 87% (worker utilization)

Step 6: Choosing the Right Strategy

Scenario

Strategy

Why

Config

Development

Fixed (2–4 workers)

Fast startup, predictable

max_workers: 4

Batch production

Fixed (many workers)

Known workload size

max_workers: 64

Variable workload

Adaptive

Cost-efficient

adaptive: {min: 2, max: 50}

Budget-constrained

Objective-driven

Optimize cost/time

objectives: {budget_usd: 100}

🤔 Think About It

If you have 100 independent tasks that each take 1 minute:

  • 1 worker → 100 minutes

  • 10 workers → 10 minutes

  • 100 workers → ~1 minute (plus ~30s startup overhead)

  • 200 workers → ~1 minute (half the workers sit idle!)

The sweet spot depends on task count, task duration, and worker startup cost. Telemetry from past runs helps you find it.

Common Questions

Q: What if I only have one computer?

The local provider still gives you parallelism through multiple processes or threads. A modern laptop with 8 cores can run 8 workers doing genuine parallel work (with processes: true).

Q: Do workers communicate with each other?

Not directly in most cases. Workers communicate through the scheduler (via futures and results). If Task B depends on Task A’s output, the scheduler ensures A completes before B starts, and transfers the result.

Q: What happens if a worker crashes?

Scalable (via Dask) detects the failure and can reassign the task to another worker. Tutorial 7 covers this in detail.

Q: Is there overhead to distributing work?

Yes! Each task has overhead: serialization, network transfer, scheduling decisions. For very small tasks (< 1ms), the overhead exceeds the computation. Rule of thumb: tasks should take at least 100ms to benefit from distribution.

Q: Can I mix providers in one run?

No — a single run uses one target (one provider). But you can run the same manifest with different targets for different purposes (dev locally, run in production on HPC).

What You Learned

Term

Definition

Horizontal Scaling

Adding more machines/workers to handle more work

Vertical Scaling

Getting a bigger/faster single machine

Scheduler

Component that assigns tasks to workers

Worker

Process/thread that executes tasks

Client

Your script that submits work and collects results

Concurrency

Multiple tasks in progress (maybe not simultaneous)

Parallelism

Multiple tasks executing at the same instant

HPC Cluster

Collection of computers managed by a job scheduler

Slurm

Popular HPC job scheduler

Provider

Abstraction over an execution backend

Adaptive Scaling

Automatically adjusting worker count based on demand

Amdahl’s Law

Parallelism speedup limited by sequential portion

Abstraction Layer

Simple interface hiding complex implementation details

Next Steps

You now understand how distributed computing works and how Scalable’s provider architecture makes it portable across environments.