Beginner Tutorial 10: AI-Assisted Workflow Development¶
The Big Picture¶
Writing configuration files, diagnosing errors, and composing workflows requires expertise — you need to know Scalable’s manifest schema, provider options, component settings, and best practices. What if an AI assistant could help with these tasks?
Scalable includes AI-powered assistants that can onboard new model components, diagnose run failures, explain execution plans, compose workflows from descriptions, and migrate between providers. These assistants work in two modes: a fast deterministic mode (heuristics) and an intelligent LLM-powered mode.
This tutorial explains what LLMs are, how Scalable uses them, and how to leverage AI assistance in your workflow development.
What You Will Learn¶
By the end of this tutorial you will:
Understand what Large Language Models (LLMs) are at a high level.
Know the difference between heuristic and LLM-powered modes.
Use
scalable init-componentto onboard new models.Use
scalable diagnoseto analyze failures.Use
scalable explainto understand execution plans.Use
scalable composeto generate workflows from descriptions.Use
scalable migrateto convert between providers.Understand when to trust (and verify) AI-generated output.
Prerequisites¶
Completed Beginner Tutorial 1: Your First Workflow and Beginner Tutorial 2: Understanding the Manifest System.
pip install scalable[ai](installsjinja2,rich).Tutorial Setup: Run the Demeter Example End-to-End — the running example throughout this tutorial onboards the real Demeter model that lives in
capabilities/demeter.For LLM mode (optional): an API key for OpenAI, or a running Ollama instance.
Heuristic mode works without any AI setup.
Key Concepts Explained¶
💡 Key Concept: What is a Large Language Model (LLM)?
A Large Language Model is an AI system trained on massive amounts of text data that can generate human-like text, answer questions, and perform reasoning tasks.
How LLMs work (simplified):
Trained on billions of words from the internet (books, code, documentation)
Learns patterns: “given this input text, what text is likely to come next?”
At inference time: given your prompt (question), generates a response word by word, each word chosen based on what’s most likely to follow
Examples: ChatGPT (OpenAI), Claude (Anthropic), Llama (Meta), Gemini (Google)
Key properties:
Can generate configuration files, code, explanations
Not deterministic — same input may give slightly different outputs
Can be wrong (hallucination) — always verify output
Requires API access (cloud) or local hardware (Ollama)
💡 Key Concept: Heuristic vs. AI-Powered
Scalable’s assistants work in two modes:
- Heuristic mode (rules-based):
Uses predefined rules, templates, and pattern matching
Deterministic: same input → always same output
Works offline (no API calls)
Fast and free
Best for: CI/CD pipelines, reproducible outputs, no AI budget
- LLM-enhanced mode (AI-powered):
Uses an LLM for intelligent generation and reasoning
Non-deterministic: may give slightly different outputs
Requires API access (and costs money per call)
Slower but more flexible
Best for: creative composition, complex diagnosis, migration
Why both? Heuristic mode ensures Scalable works without external dependencies. LLM mode adds intelligence for complex tasks. The system gracefully degrades: if the LLM is unavailable, it falls back to heuristics.
💡 Key Concept: Templates
A template is a pre-structured document with placeholders that get filled in with specific values. Think of it like a form letter:
Dear {{ name }},
Your order of {{ item }} will arrive on {{ date }}.
In Scalable’s AI assistants:
Heuristic mode uses templates extensively (predictable, fast)
LLM mode uses templates as “prompts” — instructions to the AI about what to generate
Templates use Jinja2 syntax ({{ variable }}, {% if %})
which is the most popular Python templating language.
💡 Key Concept: Prompt Engineering
Prompt engineering is the art of crafting inputs to LLMs to get desired outputs. LLMs are sensitive to how you ask:
- Bad prompt:
“Make me a manifest”
- Good prompt:
“Generate a Scalable manifest for an energy modeling workflow with: - 2 targets: local (4 workers) and AWS Fargate - 1 component: demeter (4 CPUs, 16GB RAM, Apptainer container) - 1 task: run_demeter_scenario bound to demeter”
Scalable’s AI assistants handle prompt engineering internally — they construct detailed prompts from your high-level commands.
💡 Key Concept: Code Generation
Code generation is using AI to automatically write code or configuration. In Scalable’s context:
Generate manifest YAML from descriptions
Generate component definitions from model documentation
Generate migration plans between providers
Trust but verify: AI-generated code should always be reviewed by a human. It might be syntactically correct but semantically wrong (e.g., reasonable-looking but incorrect resource allocations).
💡 Key Concept: Deterministic vs. Non-Deterministic
- Deterministic: Same input always produces the same output.
2 + 2 = 4(always). Heuristic mode is deterministic.- Non-deterministic: Same input may produce different outputs.
LLMs generate different text each time (due to random sampling in the generation process). LLM mode is non-deterministic.
Why this matters:
For CI/CD and testing → use heuristic mode (reproducible)
For creative tasks → LLM mode is fine (you review the output anyway)
💡 Key Concept: API (Application Programming Interface)
An API is a standardized way for programs to communicate. When Scalable uses OpenAI’s LLM, it sends a request to OpenAI’s API (over the internet) and receives the LLM’s response.
Your computer OpenAI servers
┌──────────┐ HTTP request ┌──────────────┐
│ Scalable │───────────────────▶│ GPT-4 model │
│ │◀───────────────────│ │
└──────────┘ JSON response └──────────────┘
API keys authenticate you (prove you’re allowed to use the service). Each API call costs money (typically fractions of a cent).
Step 1: Choosing Your Mode¶
Configure the AI backend via environment variable or .env file:
# Heuristic mode (default, no AI required)
export SCALABLE_AI_BACKEND=none
# OpenAI mode (requires API key)
export SCALABLE_AI_BACKEND=openai
export AI_API_KEY=sk-your-key-here
# Ollama mode (local LLM, no cloud dependency)
export SCALABLE_AI_BACKEND=ollama
# (requires Ollama running locally with a model loaded)
For this tutorial, all examples work in heuristic mode (no API key needed). LLM mode enhances the output quality but isn’t required.
Step 2: Onboarding the Demeter Component¶
You’re adding a real model — the Demeter land-use / land-cover disaggregation model — to your pipeline. Instead of writing the component definition manually, let the assistant analyze the cloned repository for you:
scalable init-component ./capabilities/demeter --name demeter --no-ai
Output (heuristic mode):
# Generated component definition
components:
demeter:
image: ghcr.io/jgcri/demeter:2.0.1
cpus: 4
memory: 16G
tags: [lulcc, downscaling, gcam]
mounts:
./demeter_data: /data
env:
DEMETER_DATA: /data
tasks:
run_demeter_scenario:
component: demeter
cache: true
What happened here
The assistant:
Read
setup.pyandrequirements.txtto determine that this is a Python 3.9+ packageDetected
Dockerfile.scalableand proposed a matching image tagInferred tags from the README (“downscaling”, “GCAM”, “land-use”)
Generated matching task bindings with caching enabled (Demeter runs are deterministic per-config, so caching is safe by default)
Suggested a mount for the example data directory created by
demeter.get_package_data(...)
In LLM mode, it could also read the module docstrings to suggest optimal resource allocations per spatial resolution, and generate a preload script that warms the constraint files into memory before the first task executes.
Step 3: Diagnosing Failures¶
When a run fails, the diagnostic assistant helps identify root causes:
scalable diagnose --run run-20260520T...-demeter-lulcc-abc123
Output:
═══════════════════════════════════════
Diagnosis Report
═══════════════════════════════════════
Failures: 3 of 50 tasks
Root Cause Analysis:
────────────────────
1. MEMORY_EXHAUSTION (2 tasks)
Tasks: run_demeter_scenario(ssp1_0p05),
run_demeter_scenario(ssp5_0p05)
Evidence: MemoryError raised inside ProcessStep, peak memory
15.8GB exceeds the 16GB limit. Both scenarios use
``spatial_resolution = 0.05``.
Recommendation: Apply the ``k8s-fine-resolution`` overlay (which
bumps ``demeter.memory`` to 64G) for fine-resolution scenarios.
2. INVALID_INPUT (1 task)
Task: run_demeter_scenario(reference_v3)
Evidence: IOError raised in 0.1s (fast fail pattern):
``constraints/soil_quality.csv not found``.
Recommendation: Add ``constraints/`` to the demeter component's
``mounts:`` block, or copy the file into ``demeter_data/`` before
fan-out.
Suggested Fixes:
────────────────
• Apply overlay to increase memory:
overlays:
fix-oom:
components:
demeter:
memory: 24G
💡 Key Concept: Root Cause Analysis
Root cause analysis means identifying the underlying reason for a failure, not just the symptom.
Symptom: “Task failed with MemoryError”
Root cause: “Component memory (16G) is insufficient for Demeter scenarios at 0.05° resolution, which expand the projected-LU CSV to 500k+ grid cells and need ~20GB during the kernel-density step”
The diagnostic assistant uses patterns in telemetry (failure timing, error types, resource usage) to infer root causes.
Step 4: Explaining Execution Plans¶
Get a human-readable explanation of what a plan will do:
scalable explain ./docs/examples/scalable.demeter.yaml --target aws
Output:
Plan Explanation
═══════════════
This execution plan will:
1. Deploy the demeter-lulcc project to AWS Fargate in us-east-1 region
2. Start with 1 demeter worker, scaling up to 10 based on the scenario
backlog
3. Each demeter worker has 4 vCPUs and 16GB RAM
4. Workers run the ghcr.io/jgcri/demeter:2.0.1 container
5. Per-scenario outputs stored to s3://${ARTIFACT_STORAGE}/demeter-lulcc/
Estimated cost: $4.82 for a 50-scenario run (≈ 2.5 hours of Fargate
compute + S3 storage)
Key decisions:
• Adaptive scaling chosen (min=1, max=10) — cost-efficient because
scenario count is variable
• Fargate selected — no server management overhead
• S3 storage — durable, accessible from any future Demeter run for
comparison
This is especially useful for:
Reviewing a plan before running in production
Explaining to stakeholders what a workflow does
Documenting deployment decisions for team members
Step 5: Composing Workflows from Descriptions¶
The most powerful assistant — generate manifests from natural language:
scalable compose \
--description "Demeter LULCC pipeline that downscales GCAM \
scenarios in parallel (4 CPUs, 16GB RAM, containerized) followed by \
NetCDF aggregation (2 CPUs, 8GB). Needs local and AWS targets with \
adaptive scaling."
Output:
# Generated by scalable compose
version: 1
project:
name: demeter-lulcc
targets:
local:
provider: local
max_workers: 4
threads_per_worker: 1
processes: true
containers: none
aws:
provider: aws
region: us-east-1
cluster_type: fargate
worker_cpu: 4096
worker_mem: 16384
image: ${ECR_DEMETER_IMAGE}
adaptive:
minimum: 1
maximum: 10
components:
demeter:
cpus: 4
memory: 16G
image: ghcr.io/jgcri/demeter:2.0.1
tags: [lulcc, downscaling, gcam]
postprocess:
cpus: 2
memory: 8G
tags: [lulcc, aggregation]
tasks:
run_demeter_scenario:
component: demeter
aggregate_demeter_outputs:
component: postprocess
Heuristic vs. LLM composition
Heuristic mode: Parses your description for keywords (CPUs, memory, provider names) and fills templates. Works well for straightforward requests.
LLM mode: Understands context and nuance. Can handle complex descriptions like “similar to our reference Demeter pipeline but for the SSP1-5 ensemble, with the k8s-fine-resolution overlay applied for spatial_resolution <= 0.1° scenarios.” Generates more tailored output.
Step 6: Migrating Between Providers¶
Moving a workflow from one provider to another:
scalable migrate ./docs/examples/scalable.demeter.yaml \
--from slurm --to kubernetes
Output:
# Migration: slurm → kubernetes
# Changes applied:
targets:
k8s: # Replaces 'hpc' target
provider: kubernetes
namespace: demeter-prod
image: ghcr.io/jgcri/demeter:2.0.1 # already on the demeter component
adaptive:
minimum: 2
maximum: 20 # Mapped from Slurm max_workers
# Migration notes:
# - Slurm 'queue: short' → K8s namespace 'demeter-prod'
# - Slurm 'walltime' → K8s pod activeDeadlineSeconds (no direct equivalent)
# - Slurm 'interface: ib0' → removed (K8s uses pod networking)
# - Apptainer mount './demeter_data:/data' → PVC 'demeter-data-pvc'
# - The hpc-large overlay (demeter.memory: 64G) is preserved as
# k8s-fine-resolution so it can be re-applied per-target.
Why migration is complex
Providers have different capabilities and concepts:
Slurm has queues, walltimes, accounts → no direct K8s equivalent
K8s has namespaces, pod specs, operators → no Slurm equivalent
Cloud has regions, instance types, VPCs → not applicable to HPC
The migration assistant maps concepts where possible and flags differences that require human decision.
Step 7: Human-in-the-Loop Verification¶
💡 Key Concept: Human-in-the-Loop
Human-in-the-loop means AI generates suggestions but a human makes the final decision. This is important because:
AI can generate plausible-looking but incorrect configuration
Resource allocations affect cost and correctness
Provider-specific nuances may be missed
Security implications (IAM roles, network access) need human review
Scalable’s approach: AI generates → human reviews → human applies. All generated output requires explicit confirmation before being used.
Best practices for verifying AI-generated output:
Always validate: Run
scalable validateon generated manifestsDry-run first: Use
--dry-runto see effects without committingCheck resource allocations: Are they sensible for your workload?
Review security: Are IAM roles, images, and network settings correct?
Test locally first: Use
--target localbefore deploying to cloud
Common Questions¶
Q: Do I need to pay for an LLM API to use the AI features?
No! Heuristic mode works without any API key and handles most common cases. LLM mode is an enhancement for complex or creative tasks.
Q: Is the AI generating code that could be insecure?
The AI generates configuration (YAML), not executable code. Always review generated manifests before running, especially for:
Container image sources (trust the registry?)
IAM/permission settings
Network exposure (public vs. private subnets)
Resource allocations (could generate expensive configurations)
Q: How much does LLM mode cost?
Typically $0.01–$0.10 per AI assistant call (depending on the model and
prompt length). The explain command is cheapest (short output). The
compose command is most expensive (longer generation).
Q: Can I use a local LLM instead of OpenAI?
Yes! Set SCALABLE_AI_BACKEND=ollama and run an Ollama instance locally.
This is free (no API costs) but requires a machine with enough RAM for
the model (8–32GB depending on model size).
Q: What if the AI gives a wrong answer?
That’s why validation exists. Generated manifests go through the same
validation as hand-written ones. scalable validate catches structural
errors. Semantic errors (wrong but valid resource allocations) require
human judgment.
Q: Are heuristic outputs always correct?
Heuristic mode is deterministic and template-based, so it’s predictable. But it may not handle edge cases as well as LLM mode. For standard workflows, heuristics work great. For unusual configurations, LLM mode provides better results.
What You Learned¶
Term |
Definition |
|---|---|
Large Language Model (LLM) |
AI trained on text that can generate human-like responses |
Heuristic Mode |
Rule-based, deterministic processing (no AI required) |
LLM-Enhanced Mode |
AI-powered processing with richer understanding |
Template |
Pre-structured document with fill-in-the-blank placeholders |
Prompt Engineering |
Crafting inputs to LLMs to get desired outputs |
Code Generation |
Using AI to automatically write code or configuration |
Deterministic |
Same input always produces the same output |
Non-Deterministic |
Same input may produce different outputs (LLM behavior) |
API |
Standardized interface for programs to communicate |
Human-in-the-Loop |
AI suggests, human decides and validates |
Root Cause Analysis |
Identifying the underlying reason for a failure |
Graceful Degradation |
Falling back to simpler mode when advanced features unavailable |
Next Steps¶
Tutorial Setup: Run the Demeter Example End-to-End — One-time setup (clone, install,
demeter.get_package_data, optional Docker image build) for the examples in this tutorial.
You’ve completed all 10 beginner tutorials! You now have a solid foundation in:
Distributed computing and workflow orchestration
Declarative configuration with manifests
Scaling strategies and provider architecture
Caching and performance optimization
Cloud computing and container technology
Telemetry and observability
Error handling and fault tolerance
Kubernetes and container orchestration
Machine learning for workflow optimization
AI-assisted development
Where to go from here:
Standard tutorials: Work through Tutorials for deeper technical content and production patterns
API documentation: Explore the API for detailed reference
Real project: Apply what you’ve learned to your own workflow!
Community: Contribute improvements via How to Contribute
🎉 Congratulations!
You’ve gone from “what is distributed computing?” to understanding ML optimization and AI-assisted development. The beginner tutorials gave you the conceptual foundation — the standard tutorials and real-world practice will build expertise on top of it.