.. _beginner_ai_composition:

======================================================
Beginner Tutorial 10: AI-Assisted Workflow Development
======================================================

The Big Picture
----------------

Writing configuration files, diagnosing errors, and composing workflows
requires expertise: you need to know Scalable's manifest schema, provider
options, component settings, and best practices. What if an AI assistant could
help with these tasks?

Scalable includes AI-powered assistants that can onboard new model components,
diagnose run failures, explain execution plans, compose workflows from
descriptions, and migrate between providers. These assistants work in two
modes: a fast deterministic mode (heuristics) and an intelligent LLM-powered
mode.

This tutorial explains what LLMs are, how Scalable uses them, and how to
leverage AI assistance in your workflow development.

What You Will Learn
--------------------

By the end of this tutorial you will:

* Understand what Large Language Models (LLMs) are at a high level.
* Know the difference between heuristic and LLM-powered modes.
* Use ``scalable init-component`` to onboard new models.
* Use ``scalable diagnose`` to analyze failures.
* Use ``scalable explain`` to understand execution plans.
* Use ``scalable compose`` to generate workflows from descriptions.
* Use ``scalable migrate`` to convert between providers.
* Understand when to trust (and verify) AI-generated output.

Prerequisites
--------------

* Completed :ref:`beginner_getting_started` and :ref:`beginner_manifest_system`.
* ``pip install scalable[ai]`` (installs ``jinja2``, ``rich``).
* :ref:`tutorial_demeter_setup`: the running example throughout this tutorial
  onboards the real Demeter model that lives in ``capabilities/demeter``.
* For LLM mode (optional): an API key for OpenAI, or a running Ollama instance.
* Heuristic mode works without any AI setup.

Key Concepts Explained
-----------------------

.. admonition:: 💡 Key Concept: What is a Large Language Model (LLM)?
   :class: tip

   A **Large Language Model** is an AI system trained on massive amounts of
   text data that can generate human-like text, answer questions, and perform
   reasoning tasks.

   **How LLMs work (simplified):**

   1. Trained on billions of words from the internet (books, code,
      documentation)
   2. Learns patterns: "given this input text, what text is likely to come
      next?"
   3. At inference time: given your prompt (question), generates a response
      word by word, each word chosen based on what's most likely to follow

   **Examples:** ChatGPT (OpenAI), Claude (Anthropic), Llama (Meta), Gemini
   (Google)

   **Key properties:**

   * Can generate configuration files, code, explanations
   * Not deterministic: same input may give slightly different outputs
   * Can be wrong (hallucination), so always verify output
   * Requires API access (cloud) or local hardware (Ollama)

.. admonition:: 💡 Key Concept: Heuristic vs. AI-Powered
   :class: tip

   Scalable's assistants work in two modes:

   **Heuristic mode** (rules-based):

   * Uses predefined rules, templates, and pattern matching
   * Deterministic: same input → always same output
   * Works offline (no API calls)
   * Fast and free
   * Best for: CI/CD pipelines, reproducible outputs, no AI budget

   **LLM-enhanced mode** (AI-powered):

   * Uses an LLM for intelligent generation and reasoning
   * Non-deterministic: may give slightly different outputs
   * Requires API access (and costs money per call)
   * Slower but more flexible
   * Best for: creative composition, complex diagnosis, migration

   **Why both?** Heuristic mode ensures Scalable works without external
   dependencies. LLM mode adds intelligence for complex tasks. The system
   gracefully degrades: if the LLM is unavailable, it falls back to
   heuristics.

.. admonition:: 💡 Key Concept: Templates
   :class: tip

   A **template** is a pre-structured document with placeholders that get
   filled in with specific values. Think of it like a form letter:

   .. code-block:: text

      Dear {{ name }},
      Your order of {{ item }} will arrive on {{ date }}.

   In Scalable's AI assistants:

   * Heuristic mode uses templates extensively (predictable, fast)
   * LLM mode uses templates as "prompts" (instructions to the AI about what
     to generate)

   Templates use the syntax of **Jinja2** (``{{ variable }}``, ``{% if %}``),
   one of the most popular Python templating languages.

.. admonition:: 💡 Key Concept: Prompt Engineering
   :class: tip

   **Prompt engineering** is the art of crafting inputs to LLMs to get desired
   outputs. LLMs are sensitive to how you ask:

   **Bad prompt:** "Make me a manifest"

   **Good prompt:** "Generate a Scalable manifest for an energy modeling
   workflow with:

   * 2 targets: local (4 workers) and AWS Fargate
   * 1 component: demeter (4 CPUs, 16GB RAM, Apptainer container)
   * 1 task: run_demeter_scenario bound to demeter"

   Scalable's AI assistants handle prompt engineering internally: they
   construct detailed prompts from your high-level commands.

.. admonition:: 💡 Key Concept: Code Generation
   :class: tip

   **Code generation** is using AI to automatically write code or
   configuration. In Scalable's context:

   * Generate manifest YAML from descriptions
   * Generate component definitions from model documentation
   * Generate migration plans between providers

   **Trust but verify:** AI-generated code should always be reviewed by a
   human. It might be syntactically correct but semantically wrong (e.g.,
   reasonable-looking but incorrect resource allocations).

.. admonition:: 💡 Key Concept: Deterministic vs. Non-Deterministic
   :class: tip

   **Deterministic:** Same input always produces the same output.
   ``2 + 2 = 4`` (always). Heuristic mode is deterministic.

   **Non-deterministic:** Same input may produce different outputs. LLMs can
   generate different text each time (due to random sampling in the
   generation process). LLM mode is non-deterministic.
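
   The contrast can be illustrated with a toy next-word chooser. This is a
   minimal sketch under stated assumptions, not how Scalable or any real LLM
   is implemented: the probability table and the function names
   ``pick_greedy`` and ``pick_sampled`` are invented for illustration.
   Always taking the top-scoring option is repeatable; drawing at random,
   weighted by probability, is not.

   .. code-block:: python

      import random

      # Hypothetical next-word probabilities; a real LLM would compute
      # these from the prompt.
      NEXT_WORD_PROBS = {"workers": 0.6, "targets": 0.3, "tasks": 0.1}


      def pick_greedy(probs):
          """Deterministic: always return the most likely word."""
          return max(probs, key=probs.get)


      def pick_sampled(probs, rng):
          """Non-deterministic: draw a word at random, weighted by probability."""
          words = list(probs)
          weights = [probs[word] for word in words]
          return rng.choices(words, weights=weights, k=1)[0]


      rng = random.Random()  # unseeded, so results can vary between runs
      greedy = [pick_greedy(NEXT_WORD_PROBS) for _ in range(5)]
      sampled = [pick_sampled(NEXT_WORD_PROBS, rng) for _ in range(5)]
      print("greedy: ", greedy)   # the same word five times, on every run
      print("sampled:", sampled)  # may differ from one run to the next

   Seeding the generator, for example with ``random.Random(0)``, makes the
   sampled list repeatable as well, which shows that the run-to-run variation
   in this toy comes from the random draw rather than from the probability
   table.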

   **Why this matters:**

   * For CI/CD and testing → use heuristic mode (reproducible)
   * For creative tasks → LLM mode is fine (you review the output anyway)

.. admonition:: 💡 Key Concept: API (Application Programming Interface)
   :class: tip

   An **API** is a standardized way for programs to communicate. When Scalable
   uses OpenAI's LLM, it sends a request to OpenAI's API (over the internet)
   and receives the LLM's response.

   .. code-block:: text

      Your computer                   OpenAI servers
      ┌──────────┐    HTTP request    ┌──────────────┐
      │ Scalable │───────────────────▶│ GPT-4 model  │
      │          │◀───────────────────│              │
      └──────────┘    JSON response   └──────────────┘

   API keys authenticate you (prove you're allowed to use the service).
   Each API call costs money (typically fractions of a cent).

Step 1: Choosing Your Mode
----------------------------

Configure the AI backend via environment variable or ``.env`` file:

.. code-block:: bash

   # Heuristic mode (default, no AI required)
   export SCALABLE_AI_BACKEND=none

   # OpenAI mode (requires API key)
   export SCALABLE_AI_BACKEND=openai
   export AI_API_KEY=sk-your-key-here

   # Ollama mode (local LLM, no cloud dependency)
   export SCALABLE_AI_BACKEND=ollama
   # (requires Ollama running locally with a model loaded)

For this tutorial, all examples work in **heuristic mode** (no API key needed).
LLM mode enhances the output quality but isn't required.

Step 2: Onboarding the Demeter Component
-------------------------------------------

You're adding a real model, the Demeter land-use / land-cover disaggregation
model, to your pipeline. Instead of writing the component definition manually,
let the assistant analyze the cloned repository for you:

.. code-block:: bash

   scalable init-component ./capabilities/demeter --name demeter --no-ai

Output (heuristic mode):

.. code-block:: yaml

   # Generated component definition
   components:
     demeter:
       image: ghcr.io/jgcri/demeter:2.0.1
       cpus: 4
       memory: 16G
       tags: [lulcc, downscaling, gcam]
       mounts:
         ./demeter_data: /data
       env:
         DEMETER_DATA: /data

   tasks:
     run_demeter_scenario:
       component: demeter
       cache: true

.. admonition:: What happened here
   :class: note

   The assistant:

   1. Read ``setup.py`` and ``requirements.txt`` to determine that this is a
      Python 3.9+ package
   2. Detected ``Dockerfile.scalable`` and proposed a matching image tag
   3. Inferred tags from the README ("downscaling", "GCAM", "land-use")
   4. Generated matching task bindings with caching enabled (Demeter runs are
      deterministic per-config, so caching is safe by default)
   5. Suggested a mount for the example data directory created by
      ``demeter.get_package_data(...)``

   In LLM mode, it could also read the module docstrings to suggest optimal
   resource allocations per spatial resolution, and generate a preload script
   that warms the constraint files into memory before the first task executes.

Step 3: Diagnosing Failures
------------------------------

When a run fails, the diagnostic assistant helps identify root causes:

.. code-block:: bash

   scalable diagnose --run run-20260520T...-demeter-lulcc-abc123

Output:

.. code-block:: text

   ═══════════════════════════════════════
   Diagnosis Report
   ═══════════════════════════════════════

   Failures: 3 of 50 tasks

   Root Cause Analysis:
   ────────────────────

   1. MEMORY_EXHAUSTION (2 tasks)
      Tasks: run_demeter_scenario(ssp1_0p05), run_demeter_scenario(ssp5_0p05)
      Evidence: MemoryError raised inside ProcessStep, peak memory 15.8GB
      against the 16GB limit. Both scenarios use
      ``spatial_resolution = 0.05``.
      Recommendation: Apply the ``k8s-fine-resolution`` overlay (which bumps
      ``demeter.memory`` to 64G) for fine-resolution scenarios.

   2. INVALID_INPUT (1 task)
      Task: run_demeter_scenario(reference_v3)
      Evidence: IOError raised in 0.1s (fast fail pattern):
      ``constraints/soil_quality.csv not found``.
      Recommendation: Add ``constraints/`` to the demeter component's
      ``mounts:`` block, or copy the file into ``demeter_data/`` before
      fan-out.

   Suggested Fixes:
   ────────────────
   • Apply overlay to increase memory:

     overlays:
       fix-oom:
         components:
           demeter:
             memory: 24G

.. admonition:: 💡 Key Concept: Root Cause Analysis
   :class: tip

   **Root cause analysis** means identifying the underlying reason for a
   failure, not just the symptom.

   * Symptom: "Task failed with MemoryError"
   * Root cause: "Component memory (16G) is insufficient for Demeter scenarios
     at 0.05° resolution, which expand the projected-LU CSV to 500k+ grid
     cells and need ~20GB during the kernel-density step"

   The diagnostic assistant uses patterns in telemetry (failure timing, error
   types, resource usage) to infer root causes.

Step 4: Explaining Execution Plans
-------------------------------------

Get a human-readable explanation of what a plan will do:

.. code-block:: bash

   scalable explain ./docs/examples/scalable.demeter.yaml --target aws

Output:

.. code-block:: text

   Plan Explanation
   ═══════════════

   This execution plan will:

   1. Deploy the demeter-lulcc project to AWS Fargate in us-east-1 region
   2. Start with 1 demeter worker, scaling up to 10 based on the scenario
      backlog
   3. Each demeter worker has 4 vCPUs and 16GB RAM
   4. Workers run the ghcr.io/jgcri/demeter:2.0.1 container
   5. Per-scenario outputs stored to s3://${ARTIFACT_STORAGE}/demeter-lulcc/

   Estimated cost: $4.82 for a 50-scenario run
   (≈ 2.5 hours of Fargate compute + S3 storage)

   Key decisions:
   • Adaptive scaling chosen (min=1, max=10): cost-efficient because
     scenario count is variable
   • Fargate selected: no server management overhead
   • S3 storage: durable, accessible from any future Demeter run for
     comparison

This is especially useful for:

* Reviewing a plan before running in production
* Explaining to stakeholders what a workflow does
* Documenting deployment decisions for team members

Step 5: Composing Workflows from Descriptions
------------------------------------------------

The most powerful assistant generates manifests from natural language:

.. code-block:: bash

   scalable compose \
     --description "Demeter LULCC pipeline that downscales GCAM \
       scenarios in parallel (4 CPUs, 16GB RAM, containerized) followed by \
       NetCDF aggregation (2 CPUs, 8GB). Needs local and AWS targets with \
       adaptive scaling."

Output:

.. code-block:: yaml

   # Generated by scalable compose
   version: 1
   project:
     name: demeter-lulcc

   targets:
     local:
       provider: local
       max_workers: 4
       threads_per_worker: 1
       processes: true
       containers: none
     aws:
       provider: aws
       region: us-east-1
       cluster_type: fargate
       worker_cpu: 4096
       worker_mem: 16384
       image: ${ECR_DEMETER_IMAGE}
       adaptive:
         minimum: 1
         maximum: 10

   components:
     demeter:
       cpus: 4
       memory: 16G
       image: ghcr.io/jgcri/demeter:2.0.1
       tags: [lulcc, downscaling, gcam]
     postprocess:
       cpus: 2
       memory: 8G
       tags: [lulcc, aggregation]

   tasks:
     run_demeter_scenario:
       component: demeter
     aggregate_demeter_outputs:
       component: postprocess

.. admonition:: Heuristic vs. LLM composition
   :class: note

   **Heuristic mode:** Parses your description for keywords (CPUs, memory,
   provider names) and fills templates. Works well for straightforward
   requests.

   **LLM mode:** Understands context and nuance.
   Can handle complex descriptions like "similar to our reference Demeter
   pipeline but for the SSP1-5 ensemble, with the k8s-fine-resolution overlay
   applied for spatial_resolution <= 0.1° scenarios." Generates more tailored
   output.

Step 6: Migrating Between Providers
--------------------------------------

Moving a workflow from one provider to another:

.. code-block:: bash

   scalable migrate ./docs/examples/scalable.demeter.yaml \
     --from slurm --to kubernetes

Output:

.. code-block:: yaml

   # Migration: slurm → kubernetes
   # Changes applied:

   targets:
     k8s:  # Replaces 'hpc' target
       provider: kubernetes
       namespace: demeter-prod
       image: ghcr.io/jgcri/demeter:2.0.1  # already on the demeter component
       adaptive:
         minimum: 2
         maximum: 20  # Mapped from Slurm max_workers

   # Migration notes:
   # - Slurm 'queue: short' → K8s namespace 'demeter-prod'
   # - Slurm 'walltime' → K8s pod activeDeadlineSeconds (no direct equivalent)
   # - Slurm 'interface: ib0' → removed (K8s uses pod networking)
   # - Apptainer mount './demeter_data:/data' → PVC 'demeter-data-pvc'
   # - The hpc-large overlay (demeter.memory: 64G) is preserved as
   #   k8s-fine-resolution so it can be re-applied per-target.

.. admonition:: Why migration is complex
   :class: hint

   Providers have different capabilities and concepts:

   * Slurm has queues, walltimes, accounts → no direct K8s equivalent
   * K8s has namespaces, pod specs, operators → no Slurm equivalent
   * Cloud has regions, instance types, VPCs → not applicable to HPC

   The migration assistant maps concepts where possible and flags differences
   that require human decision.

Step 7: Human-in-the-Loop Verification
-----------------------------------------

.. admonition:: 💡 Key Concept: Human-in-the-Loop
   :class: tip

   **Human-in-the-loop** means AI generates suggestions but a human makes the
   final decision.
   This is important because:

   * AI can generate plausible-looking but incorrect configuration
   * Resource allocations affect cost and correctness
   * Provider-specific nuances may be missed
   * Security implications (IAM roles, network access) need human review

   **Scalable's approach:** AI generates → human reviews → human applies.
   All generated output requires explicit confirmation before being used.

Best practices for verifying AI-generated output:

1. **Always validate:** Run ``scalable validate`` on generated manifests
2. **Dry-run first:** Use ``--dry-run`` to see effects without committing
3. **Check resource allocations:** Are they sensible for your workload?
4. **Review security:** Are IAM roles, images, and network settings correct?
5. **Test locally first:** Use ``--target local`` before deploying to cloud

Common Questions
-----------------

**Q: Do I need to pay for an LLM API to use the AI features?**

No! Heuristic mode works without any API key and handles most common cases.
LLM mode is an enhancement for complex or creative tasks.

**Q: Is the AI generating code that could be insecure?**

The AI generates configuration (YAML), not executable code. Always review
generated manifests before running, especially for:

* Container image sources (trust the registry?)
* IAM/permission settings
* Network exposure (public vs. private subnets)
* Resource allocations (could generate expensive configurations)

**Q: How much does LLM mode cost?**

Typically $0.01–$0.10 per AI assistant call (depending on the model and prompt
length). The ``explain`` command is cheapest (short output). The ``compose``
command is most expensive (longer generation).

**Q: Can I use a local LLM instead of OpenAI?**

Yes! Set ``SCALABLE_AI_BACKEND=ollama`` and run an Ollama instance locally.
This is free (no API costs) but requires a machine with enough RAM for the
model (8–32GB depending on model size).

**Q: What if the AI gives a wrong answer?**

That's why validation exists.
Generated manifests go through the same validation as hand-written ones.
``scalable validate`` catches structural errors. Semantic errors (wrong but
valid resource allocations) require human judgment.

**Q: Are heuristic outputs always correct?**

Not necessarily. Heuristic mode is deterministic and template-based, so it's
predictable, but it may not handle edge cases as well as LLM mode. For standard
workflows, heuristics work great. For unusual configurations, LLM mode provides
better results.

What You Learned
-----------------

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Term
     - Definition
   * - Large Language Model (LLM)
     - AI trained on text that can generate human-like responses
   * - Heuristic Mode
     - Rule-based, deterministic processing (no AI required)
   * - LLM-Enhanced Mode
     - AI-powered processing with richer understanding
   * - Template
     - Pre-structured document with fill-in-the-blank placeholders
   * - Prompt Engineering
     - Crafting inputs to LLMs to get desired outputs
   * - Code Generation
     - Using AI to automatically write code or configuration
   * - Deterministic
     - Same input always produces the same output
   * - Non-Deterministic
     - Same input may produce different outputs (LLM behavior)
   * - API
     - Standardized interface for programs to communicate
   * - Human-in-the-Loop
     - AI suggests, human decides and validates
   * - Root Cause Analysis
     - Identifying the underlying reason for a failure
   * - Graceful Degradation
     - Falling back to a simpler mode when advanced features are unavailable

Next Steps
-----------

* :ref:`tutorial_demeter_setup`: One-time setup (clone, install,
  ``demeter.get_package_data``, optional Docker image build) for the examples
  in this tutorial.

You've completed all 10 beginner tutorials!
You now have a solid foundation in:

* Distributed computing and workflow orchestration
* Declarative configuration with manifests
* Scaling strategies and provider architecture
* Caching and performance optimization
* Cloud computing and container technology
* Telemetry and observability
* Error handling and fault tolerance
* Kubernetes and container orchestration
* Machine learning for workflow optimization
* AI-assisted development

**Where to go from here:**

* **Standard tutorials:** Work through :ref:`tutorials` for deeper technical
  content and production patterns
* **API documentation:** Explore the :ref:`api_section` for detailed reference
* **Real project:** Apply what you've learned to your own workflow!
* **Community:** Contribute improvements via :doc:`/how_to_contribute`

.. admonition:: 🎉 Congratulations!
   :class: note

   You've gone from "what is distributed computing?" to understanding ML
   optimization and AI-assisted development. The beginner tutorials gave you
   the conceptual foundation; the standard tutorials and real-world practice
   will build expertise on top of it.