.. _beginner_scaling_strategies: ====================================================== Beginner Tutorial 3: How Distributed Computing Works ====================================================== The Big Picture ---------------- You've written a workflow that runs on your laptop with 2 workers. But what happens when your data grows 100Γ— and you need 64 workers on an HPC cluster? Or when you need to burst into the cloud during peak demand? This tutorial explains the **fundamentals of distributed computing** β€” how work gets split up, how multiple machines coordinate, and how Scalable's provider architecture lets you switch between execution backends without changing your code. .. admonition:: πŸ’‘ Key Concept: Why Distribute at All? :class: tip **The fundamental problem:** Some computations take too long on one machine. Consider running 1000 energy scenarios where each takes 5 minutes: * **Sequential (1 CPU):** 1000 Γ— 5 min = 83 hours (3.5 days) * **Parallel (10 CPUs):** 1000 Γ· 10 Γ— 5 min = 8.3 hours * **Parallel (100 CPUs):** 1000 Γ· 100 Γ— 5 min = 50 minutes Distributed computing trades **hardware** for **time**. But it introduces complexity: coordination, communication, failure handling. Scalable manages that complexity for you. What You Will Learn -------------------- By the end of this tutorial you will: * Understand the client-scheduler-worker architecture. * Know the difference between vertical and horizontal scaling. * Grasp concurrency vs. parallelism. * Use the Local, Slurm, and Cloud providers. * Configure manual, adaptive, and objective-driven scaling. * Understand Amdahl's Law and when NOT to distribute. * Monitor scaling decisions through telemetry. Prerequisites -------------- * Completed :ref:`beginner_getting_started` and :ref:`beginner_manifest_system`. * Scalable installed (``pip install scalable``). * For HPC concepts: no cluster needed (follow along conceptually). Key Concepts Explained ----------------------- .. admonition:: πŸ’‘ Key Concept: Vertical vs. Horizontal Scaling :class: tip There are two ways to get more computing power: **Vertical scaling (scale UP):** Get a bigger machine β€” more CPUs, more RAM. Like buying a faster car. * Pros: Simple (no coordination needed), works for any workload * Cons: Expensive, has physical limits (you can't buy a 10,000-core laptop) **Horizontal scaling (scale OUT):** Get more machines working together. Like having a fleet of cars. * Pros: Nearly unlimited capacity, cost-effective * Cons: Requires coordination, not all problems can be split Scalable focuses on **horizontal scaling** β€” distributing work across multiple workers. But the workers themselves can be vertically scaled (bigger instances with more RAM per worker). .. admonition:: πŸ’‘ Key Concept: The Scheduler-Worker Architecture :class: tip Distributed systems typically have three roles: **Client** (you): Submits work and collects results. This is your Python script. **Scheduler** (traffic controller): Receives tasks from clients and assigns them to workers. It tracks which workers are available, which tasks are queued, and which are complete. It makes the decisions about *where* each task runs. **Workers** (the labor force): Actually execute the functions. Each worker is a separate process (or thread) that can run independently. .. code-block:: text β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ Client │────────▢│ Scheduler │────────▢│ Worker 1 β”‚ β”‚ (you) β”‚ β”‚ (Dask) │────────▢│ Worker 2 β”‚ β”‚ │◀────────│ │────────▢│ Worker 3 β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ submit() assigns tasks executes & gather() tracks state returns results In Scalable: * The **client** is your Python script using ``ScalableSession`` * The **scheduler** is Dask's scheduler (managed automatically) * The **workers** are spawned by the provider (local processes, Slurm jobs, cloud containers, K8s pods) .. admonition:: πŸ’‘ Key Concept: Concurrency vs. Parallelism :class: tip These terms are related but different: **Concurrency:** Multiple tasks *in progress* at the same time (but maybe not literally simultaneous). Like a chef working on 3 dishes β€” chopping for one, checking the oven for another. **Parallelism:** Multiple tasks *executing* at the exact same instant on different CPUs. Like 3 chefs each cooking their own dish simultaneously. * **Threads** give you concurrency (and parallelism for I/O, but not for CPU-bound Python code due to the GIL). * **Processes** give you true parallelism (each has its own Python interpreter and memory space). Scalable supports both modes via the ``processes`` setting in your target. .. admonition:: πŸ’‘ Key Concept: What is an HPC Cluster? :class: tip An **HPC (High-Performance Computing) cluster** is a collection of powerful computers (called "nodes") connected by a fast network, managed by a job scheduler. Key components: * **Login nodes** β€” where you SSH in and submit jobs * **Compute nodes** β€” where actual work runs * **Job scheduler** (e.g., Slurm) β€” queues and allocates jobs to nodes * **Shared filesystem** β€” storage accessible from all nodes **How it works:** You don't directly pick which computer runs your code. Instead, you submit a job request ("I need 4 nodes for 2 hours") and the scheduler finds available resources. .. admonition:: πŸ’‘ Key Concept: What is Slurm? :class: tip **Slurm** (Simple Linux Utility for Resource Management) is the most popular job scheduler for HPC clusters. It's the "traffic controller" that decides when and where your computation runs. Key Slurm concepts: * **Queue/Partition** β€” groups of nodes with similar properties * **Account** β€” billing/allocation identifier for your group * **Walltime** β€” maximum allowed runtime for your job * **Job** β€” a unit of work submitted to the scheduler Scalable's Slurm provider translates your manifest's target configuration into Slurm job submissions automatically. .. admonition:: πŸ’‘ Key Concept: Amdahl's Law :class: tip **Amdahl's Law** says that the speedup from parallelism is limited by the sequential portion of your program. If 90% of your work can be parallelized and 10% must be sequential: * 10 workers β†’ ~5.3Γ— speedup (not 10Γ—) * 100 workers β†’ ~9.2Γ— speedup (not 100Γ—) * 1000 workers β†’ ~9.9Γ— speedup (not 1000Γ—) **Lesson:** Don't throw more workers at a problem than necessary. There's always a point of diminishing returns. Scalable's telemetry helps you find the sweet spot. Step 1: The Provider Architecture ----------------------------------- Scalable separates **what** runs from **where** it runs: .. code-block:: text β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ Manifest │────▢│ DeploymentSpec │────▢│ Provider β”‚ β”‚(scalable.yaml)β”‚ β”‚(provider-neutral) β”‚ β”‚ (backend) β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β”‚ β”‚ β”Œβ”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ Local β”‚ β”‚ Slurm β”‚ β”‚ Cloud β”‚ β”‚ (threads/ β”‚ β”‚ (HPC jobs) β”‚ β”‚ (Fargate/ β”‚ β”‚ processes)β”‚ β”‚ β”‚ β”‚ EC2/GKE) β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ .. admonition:: πŸ’‘ Key Concept: Abstraction Layer :class: tip An **abstraction layer** hides complexity behind a simple interface. You interact with the abstraction (the provider API) without knowing the details underneath. **Real-world analogy:** When you flip a light switch, you don't need to know whether your electricity comes from solar panels, a nuclear plant, or a gas turbine. The switch is the abstraction layer. In Scalable, the provider abstraction means your workflow code (``client.submit()``) works identically regardless of whether tasks run locally, on Slurm, or in AWS. Step 2: The Local Provider (Development) ------------------------------------------ The simplest provider runs everything on your machine: .. code-block:: yaml targets: local: provider: local max_workers: 4 threads_per_worker: 2 processes: false containers: none **What each setting controls:** ``max_workers: 4`` The maximum number of parallel executors. With 4 workers, up to 4 tasks can run simultaneously. ``threads_per_worker: 2`` Each worker can handle 2 threads. This matters for I/O-bound tasks (network calls, file reads) that spend time waiting. ``processes: false`` Workers run as threads in a single process (fast startup, shared memory). Set to ``true`` for CPU-bound work that needs to bypass the GIL. ``containers: none`` No containerization β€” functions run in your current Python environment. .. code-block:: python from scalable import ScalableSession session = ScalableSession.from_yaml("./scalable.yaml", target="local") plan = session.plan() client = session.start(plan) # Submit work β€” it runs on local workers futures = [client.submit(my_func, i, tag="analysis") for i in range(20)] results = client.gather(futures) session.close() .. admonition:: Under the Hood :class: hint When you create a ``ScalableSession`` with the local provider: 1. Scalable reads the manifest and parses the ``local`` target 2. It creates a Dask ``LocalCluster`` with the specified workers 3. A Dask ``Client`` connects to the cluster's scheduler 4. Your ``submit()`` calls become Dask ``client.submit()`` calls 5. The scheduler distributes tasks across the local workers 6. Results flow back through the client to your script Step 3: The Slurm Provider (HPC) ---------------------------------- For HPC clusters, the Slurm provider translates your manifest into job submissions: .. code-block:: yaml targets: hpc: provider: slurm queue: batch account: GCIMS walltime: "04:00:00" interface: ib0 .. admonition:: What these settings mean in HPC terms :class: note ``queue: batch`` Which partition (group of nodes) to submit to. Clusters often have ``batch`` (general), ``gpu`` (GPU nodes), ``debug`` (quick, limited). ``account: GCIMS`` Your team's allocation identifier. HPC centers track usage by account for billing and fairness. ``walltime: "04:00:00"`` Maximum runtime (4 hours). The job is killed if it exceeds this. Quoted because ``04:00:00`` looks like a time to YAML. ``interface: ib0`` Network interface for worker communication. ``ib0`` = InfiniBand (high-speed interconnect common in HPC). When you run with ``--target hpc``, Scalable: 1. Generates Slurm job scripts automatically 2. Submits them to the Slurm scheduler 3. Workers start on allocated nodes 4. Your tasks distribute across the HPC workers 5. Results flow back to your client **You don't write Slurm scripts manually** β€” the manifest declares what you need and the provider handles the "how." Step 4: Scaling Strategies ---------------------------- .. admonition:: πŸ’‘ Key Concept: Scaling Strategy :class: tip A **scaling strategy** determines how many workers are active at any time. Options range from fixed (always N workers) to fully dynamic (workers spin up/down based on demand). **Manual (Fixed) Scaling:** .. code-block:: yaml targets: local: provider: local max_workers: 4 # Always exactly 4 workers You decide the worker count upfront. Simple and predictable. **Adaptive Scaling:** .. code-block:: yaml targets: cloud: provider: aws adaptive: minimum: 1 # At least 1 worker always running maximum: 20 # Scale up to 20 when busy .. admonition:: πŸ’‘ Key Concept: Adaptive Scaling :class: tip **Adaptive scaling** automatically adjusts worker count based on workload: * Queue growing β†’ add workers (scale up) * Workers idle β†’ remove workers (scale down) **Benefits:** * Cost efficiency β€” don't pay for idle workers * Responsiveness β€” handle bursts without pre-provisioning * Simplicity β€” no need to predict workload size **Trade-offs:** * Latency β€” spinning up new workers takes time * Thrashing β€” rapid up/down cycles waste resources * Minimum guarantee β€” you need at least some workers ready Scalable implements adaptive scaling with configurable thresholds and cooldown periods to prevent thrashing. **Objective-Driven Scaling:** .. code-block:: python from scalable import ScalableSession session = ScalableSession.from_yaml( "./scalable.yaml", target="cloud", objectives={"budget_usd": 50.0, "deadline_hours": 2.0}, ) .. admonition:: πŸ’‘ Key Concept: Objective-Driven Planning :class: tip **Objective-driven planning** lets you specify goals (budget, deadline) and Scalable figures out the optimal resource allocation: * "I have $50 and need results in 2 hours" β†’ Scalable calculates how many workers fit within budget and meet the deadline * Based on telemetry history, it predicts task duration and scales accordingly This is the most sophisticated scaling mode β€” it requires telemetry history to make predictions. Step 5: Monitoring Scaling Decisions -------------------------------------- Every scaling decision is recorded in telemetry: .. code-block:: python from scalable import ScalableSession session = ScalableSession.from_yaml("./scalable.yaml", target="local") plan = session.plan() client = session.start(plan) # After your run, check what happened # Telemetry records scaling events: # - worker_added (when a new worker started) # - worker_removed (when a worker was stopped) # - scale_decision (why the system scaled up/down) The ``scalable report`` command summarizes scaling behavior: .. code-block:: bash scalable report --last .. code-block:: text Run: run-20260520T...-demeter-lulcc-abc123 Target: local (provider: local) Workers: peak=4, avg=3.2 Tasks: 20 completed, 0 failed Duration: 12.4s Efficiency: 87% (worker utilization) Step 6: Choosing the Right Strategy -------------------------------------- .. list-table:: :header-rows: 1 :widths: 20 20 30 30 * - Scenario - Strategy - Why - Config * - Development - Fixed (2–4 workers) - Fast startup, predictable - ``max_workers: 4`` * - Batch production - Fixed (many workers) - Known workload size - ``max_workers: 64`` * - Variable workload - Adaptive - Cost-efficient - ``adaptive: {min: 2, max: 50}`` * - Budget-constrained - Objective-driven - Optimize cost/time - ``objectives: {budget_usd: 100}`` .. admonition:: πŸ€” Think About It :class: note If you have 100 independent tasks that each take 1 minute: * 1 worker β†’ 100 minutes * 10 workers β†’ 10 minutes * 100 workers β†’ ~1 minute (plus ~30s startup overhead) * 200 workers β†’ ~1 minute (half the workers sit idle!) The sweet spot depends on task count, task duration, and worker startup cost. Telemetry from past runs helps you find it. Common Questions ----------------- **Q: What if I only have one computer?** The local provider still gives you parallelism through multiple processes or threads. A modern laptop with 8 cores can run 8 workers doing genuine parallel work (with ``processes: true``). **Q: Do workers communicate with each other?** Not directly in most cases. Workers communicate through the scheduler (via futures and results). If Task B depends on Task A's output, the scheduler ensures A completes before B starts, and transfers the result. **Q: What happens if a worker crashes?** Scalable (via Dask) detects the failure and can reassign the task to another worker. Tutorial 7 covers this in detail. **Q: Is there overhead to distributing work?** Yes! Each task has overhead: serialization, network transfer, scheduling decisions. For very small tasks (< 1ms), the overhead exceeds the computation. Rule of thumb: tasks should take at least 100ms to benefit from distribution. **Q: Can I mix providers in one run?** No β€” a single run uses one target (one provider). But you can run the same manifest with different targets for different purposes (dev locally, run in production on HPC). What You Learned ----------------- .. list-table:: :header-rows: 1 :widths: 30 70 * - Term - Definition * - Horizontal Scaling - Adding more machines/workers to handle more work * - Vertical Scaling - Getting a bigger/faster single machine * - Scheduler - Component that assigns tasks to workers * - Worker - Process/thread that executes tasks * - Client - Your script that submits work and collects results * - Concurrency - Multiple tasks in progress (maybe not simultaneous) * - Parallelism - Multiple tasks executing at the same instant * - HPC Cluster - Collection of computers managed by a job scheduler * - Slurm - Popular HPC job scheduler * - Provider - Abstraction over an execution backend * - Adaptive Scaling - Automatically adjusting worker count based on demand * - Amdahl's Law - Parallelism speedup limited by sequential portion * - Abstraction Layer - Simple interface hiding complex implementation details Next Steps ----------- You now understand how distributed computing works and how Scalable's provider architecture makes it portable across environments. * **Next beginner tutorial:** :ref:`beginner_caching` β€” avoid repeating expensive computation * **Standard tutorial:** :ref:`tutorial_scaling_strategies` β€” advanced provider configuration and production scaling patterns * **Experiment:** Change ``max_workers`` in your manifest from 2 to 8. Submit 100 tasks and time the difference. At what point do more workers stop helping?