.. _beginner_ml_emulation:

===========================================================
Beginner Tutorial 9: Machine Learning for Smarter Workflows
===========================================================

The Big Picture
----------------

After running your workflow many times, you've accumulated telemetry data
showing how tasks perform: which scenarios are fast, which are slow, how much
memory different inputs require. What if a computer could learn these patterns
and predict optimal resource allocations automatically?

This tutorial introduces **machine learning** concepts in the context of
workflow optimization: using past experience to make smarter decisions about
how many workers to start, how much memory to request, and when to scale up
or down.

What You Will Learn
--------------------

By the end of this tutorial you will:

* Understand what machine learning is at a high level.
* Know the difference between training and inference.
* Understand how Scalable's LearnedAdvisor predicts resource needs.
* Understand cross-validation and why it matters.
* Use the AdaptiveScaler for real-time scaling decisions backed by ML.

Prerequisites
--------------

* Completed :ref:`beginner_getting_started`, :ref:`beginner_telemetry`, and
  :ref:`beginner_scaling_strategies`.
* ``pip install "scalable[ml]"`` (installs scikit-learn and dask-ml; the
  quotes keep shells such as zsh from interpreting the brackets).
* At least 5 completed telemetry runs (more history → better predictions).


Key Concepts Explained
-----------------------

.. admonition:: 💡 Key Concept: What is Machine Learning?
   :class: tip

   **Machine learning (ML)** is teaching computers to find patterns in data
   and make predictions without being explicitly programmed with rules.

   **Traditional programming:**
     Human writes rules → computer follows rules

     .. code-block:: text

        IF memory_usage > 8GB THEN allocate 16GB
        IF memory_usage > 16GB THEN allocate 32GB

   **Machine learning:**
     Computer finds rules from data → uses them to predict

     .. code-block:: text

        Training data: [past runs with memory usage patterns]
        ML model learns: "scenarios with >1000 nodes need ~12GB"
        Prediction: "scenario 47 (1200 nodes) → needs ~12GB, recommend 16GB"

   **Analogy:** A traditional program is like a recipe (follow these steps).
   ML is like learning to cook from experience (after cooking 100 dishes,
   you develop intuition about seasoning, timing, etc.).
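
To make the contrast concrete, here is a minimal sketch in plain Python. It is
not Scalable code: the ``past_runs`` numbers are made up and deliberately tidy,
and a straight-line fit stands in for whatever model a real system would train.

.. code-block:: python

   # Hypothetical past runs: (number of nodes, peak memory in GB).
   past_runs = [(200, 3.0), (500, 6.0), (800, 9.0), (1000, 11.0), (1500, 16.0)]


   def rule_based_memory_gb(num_nodes):
       """Traditional programming: a human chose these thresholds."""
       if num_nodes > 1000:
           return 16
       return 8


   def fit_line(points):
       """Machine learning in miniature: learn slope and intercept from data
       by ordinary least squares."""
       n = len(points)
       mean_x = sum(x for x, _ in points) / n
       mean_y = sum(y for _, y in points) / n
       covariance = sum((x - mean_x) * (y - mean_y) for x, y in points)
       variance = sum((x - mean_x) ** 2 for x, _ in points)
       slope = covariance / variance
       intercept = mean_y - slope * mean_x
       return slope, intercept


   slope, intercept = fit_line(past_runs)
   predicted = slope * 1200 + intercept
   print(f"rule: {rule_based_memory_gb(1200)} GB, learned: {predicted:.1f} GB")

The hand-written rule only knows the thresholds someone typed in, while the
fitted line changes automatically when you give it different history.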

.. admonition:: 💡 Key Concept: Training vs. Inference
   :class: tip

   ML has two phases:

   **Training** (learning):
     Feed historical data to an algorithm. The algorithm adjusts its internal
     parameters to fit the patterns in the data.

     * Slower than inference (seconds to hours, depending on data size)
     * Done once (or periodically when new data is available)
     * For supervised learning, as used here, requires labeled data
       (inputs + known correct outputs)

   **Inference** (predicting):
     Use the trained model to make predictions on new inputs.

     * Fast (milliseconds)
     * Done many times
     * Uses the patterns learned during training

   **In Scalable:**

   * **Training** = learning from telemetry history (past run metrics)
   * **Inference** = predicting resource needs for new runs

.. admonition:: 💡 Key Concept: Features
   :class: tip

   **Features** are the input variables that a model uses to make
   predictions. They're the characteristics of your data that the model
   "looks at."

   For Scalable's resource prediction:

   * Task name
   * Number of input data points
   * Historical average duration for this task type
   * Time of day
   * Target provider type

   **Feature engineering** is the process of choosing and transforming raw
   data into useful features. Good features → good predictions.
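
As an illustration of feature engineering, the sketch below turns one raw
telemetry record into a list of numbers. Everything here is an assumption made
for the example: the record layout, the ``TASK_NAMES`` and ``PROVIDERS``
vocabularies, and the ``to_features`` helper are hypothetical and are not part
of Scalable's API.

.. code-block:: python

   import math

   # Hypothetical vocabularies of task names and provider types.
   TASK_NAMES = ["run_simulation", "run_demeter_scenario"]
   PROVIDERS = ["local", "cloud"]


   def one_hot(value, vocabulary):
       """Encode a categorical value as a list of 0/1 flags."""
       return [1.0 if value == item else 0.0 for item in vocabulary]


   def to_features(record):
       """Turn one raw telemetry record (a dict) into a numeric feature vector."""
       # Encode the hour on a circle so that 23:00 and 00:00 end up close together.
       angle = 2 * math.pi * record["hour"] / 24
       return (
           one_hot(record["task"], TASK_NAMES)
           + one_hot(record["provider"], PROVIDERS)
           + [float(record["num_inputs"]), record["avg_duration_s"]]
           + [math.sin(angle), math.cos(angle)]
       )


   record = {
       "task": "run_simulation",
       "provider": "local",
       "num_inputs": 1200,
       "avg_duration_s": 45.0,
       "hour": 6,
   }
   features = to_features(record)
   print(features)

Text values such as the task name become 0/1 flags because most models only
work with numbers, and the time of day is mapped onto a circle so that late
night and early morning count as neighbors.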

.. admonition:: 💡 Key Concept: What is a Model?
   :class: tip

   In ML, a **model** is a mathematical function learned from data. It maps
   inputs (features) to outputs (predictions):

   .. code-block:: text

      Model: features → prediction
      Example: [task="demeter", scenarios=50, history_avg=45s] → memory=12GB

   Think of a model as a function that was "written" by the training process
   rather than by a human programmer. The model doesn't understand what it's
   doing — it just captures statistical patterns in the training data.

   Common model types:

   * **Linear regression** — simple, interpretable, assumes linear relationships
   * **Decision tree** — series of if/then rules learned from data
   * **Random forest** — many decision trees that vote on the answer
   * **Gradient boosting** — trees that correct each other's mistakes

   Scalable uses gradient boosting and random forests — they work well for
   tabular data (like telemetry metrics) without much tuning.
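
The idea of "trees that correct each other's mistakes" can be shown with a toy
boosting loop. This is a teaching sketch in plain Python with made-up data, not
the implementation Scalable uses: each "tree" is a single if/then split (a
stump), and the ``n_rounds`` and ``learning_rate`` values are assumptions
chosen for readability.

.. code-block:: python

   def fit_stump(xs, residuals):
       """Find the single threshold split that best fits the residuals."""
       best = None
       # Skip the largest value so the right-hand side is never empty.
       for threshold in sorted(set(xs))[:-1]:
           left = [r for x, r in zip(xs, residuals) if x <= threshold]
           right = [r for x, r in zip(xs, residuals) if x > threshold]
           left_mean = sum(left) / len(left)
           right_mean = sum(right) / len(right)
           error = sum((r - left_mean) ** 2 for r in left) + sum(
               (r - right_mean) ** 2 for r in right
           )
           if best is None or error < best[0]:
               best = (error, threshold, left_mean, right_mean)
       return best[1:]


   def fit_boosted(xs, ys, n_rounds=20, learning_rate=0.5):
       """Start from the mean, then add stumps fitted to the remaining errors."""
       base = sum(ys) / len(ys)
       stumps = []
       predictions = [base] * len(ys)
       for _ in range(n_rounds):
           # Residuals are the mistakes the model so far still makes.
           residuals = [y - p for y, p in zip(ys, predictions)]
           threshold, left_mean, right_mean = fit_stump(xs, residuals)
           stumps.append((threshold, left_mean, right_mean))
           predictions = [
               p + learning_rate * (left_mean if x <= threshold else right_mean)
               for x, p in zip(xs, predictions)
           ]
       return base, stumps, learning_rate


   def predict(model, x):
       """Sum the base value and every stump's scaled correction."""
       base, stumps, learning_rate = model
       return base + sum(
           learning_rate * (left if x <= threshold else right)
           for threshold, left, right in stumps
       )


   xs = [200, 500, 800, 1000, 1500]   # hypothetical node counts
   ys = [3.0, 6.0, 9.0, 11.0, 16.0]   # hypothetical peak memory in GB
   model = fit_boosted(xs, ys)
   print(f"predicted memory for 1200 nodes: {predict(model, 1200):.1f} GB")

Each round looks only at what the previous rounds got wrong, which is why the
combined model ends up more accurate than any single stump.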

.. admonition:: 💡 Key Concept: Confidence
   :class: tip

   **Confidence** quantifies how sure a model is about a prediction.

   .. code-block:: text

      Model prediction: memory = 12GB (confidence: 0.87)

   * High confidence → trust the prediction, use it directly
   * Low confidence → fall back to a safer default (the rule-based advisor)

   Scalable uses confidence to decide whether to apply the ML recommendation
   or fall back to deterministic statistics. You never have to choose
   manually — the system picks whichever is more reliable for the current
   situation.
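
The fallback logic described above can be pictured as a simple comparison
against a cutoff. The sketch below is an illustration only: the
``choose_recommendation`` helper and the ``0.7`` threshold are hypothetical,
and the cutoff Scalable applies internally is not stated in this tutorial.

.. code-block:: python

   # Hypothetical cutoff, chosen only for this example.
   CONFIDENCE_THRESHOLD = 0.7


   def choose_recommendation(ml_rec, baseline_rec, threshold=CONFIDENCE_THRESHOLD):
       """Use the ML recommendation only when its confidence clears the threshold."""
       if ml_rec is not None and ml_rec.get("confidence", 0.0) >= threshold:
           return ml_rec
       return baseline_rec


   baseline = {"cpus": 4, "memory": "16G", "basis": "p95 of 50 historical runs"}
   confident = {"cpus": 2, "memory": "8G", "confidence": 0.87}
   unsure = {"cpus": 1, "memory": "2G", "confidence": 0.35}

   print(choose_recommendation(confident, baseline))  # the ML recommendation
   print(choose_recommendation(unsure, baseline))     # the rule-based baseline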

.. admonition:: 💡 Key Concept: Cross-Validation
   :class: tip

   **Cross-validation** tests model quality by repeatedly splitting data into
   training and testing sets:

   1. Split data into 5 parts (folds)
   2. Train on 4 parts, test on 1
   3. Repeat 5 times (each part is the test set once)
   4. Average the test scores

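   This helps detect **overfitting**, where a model memorizes the training
   data but fails on new data. Cross-validation does not stop a model from
   overfitting; it estimates how well the model will perform on data it
   hasn't seen, so you can spot the problem before relying on the model.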
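
The four steps can be written out in plain Python. This is a minimal sketch
under stated assumptions: the ``memory_gb`` values are made up, and the
"model" is deliberately trivial (it always predicts the mean of its training
data) so that the splitting logic stays in focus. Real tools usually shuffle
the data before splitting, which is omitted here.

.. code-block:: python

   def k_fold_indices(n_samples, k):
       """Yield (train_indices, test_indices) pairs for k contiguous folds."""
       # Spread any remainder over the first folds so sizes differ by at most 1.
       fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                     for i in range(k)]
       start = 0
       for size in fold_sizes:
           test = list(range(start, start + size))
           train = [i for i in range(n_samples)
                    if i < start or i >= start + size]
           yield train, test
           start += size


   def cross_validate(targets, k=5):
       """Return the mean absolute error averaged over k folds."""
       fold_errors = []
       for train, test in k_fold_indices(len(targets), k):
           # "Training": the trivial model just remembers the training mean.
           train_mean = sum(targets[i] for i in train) / len(train)
           # "Testing": measure the error on data the model did not see.
           errors = [abs(targets[i] - train_mean) for i in test]
           fold_errors.append(sum(errors) / len(errors))
       return sum(fold_errors) / len(fold_errors)


   # Hypothetical peak memory (GB) from ten past runs.
   memory_gb = [10.0, 12.0, 11.0, 13.0, 12.0, 9.0, 14.0, 11.0, 12.0, 10.0]
   print(f"cross-validated MAE: {cross_validate(memory_gb):.2f} GB")

Because every run is used for testing exactly once, the averaged score reflects
performance on unseen data rather than on data the model already memorized.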


Step 1: The ResourceAdvisor (Baseline — No ML)
-------------------------------------------------

Before ML, Scalable provides a deterministic, rule-based advisor:

.. code-block:: python

   from scalable import ResourceAdvisor

   advisor = ResourceAdvisor.from_history("./.scalable/runs")
   recommendation = advisor.recommend(task="run_simulation")
   print(recommendation)
   # {'cpus': 4, 'memory': '16G', 'basis': 'p95 of 50 historical runs'}

This uses simple statistics (percentiles) — it works but doesn't learn
complex patterns.
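
The ``p95`` in the ``basis`` field refers to the 95th percentile: a value that
95% of past runs stayed at or below. The sketch below shows one common way to
compute it (the nearest-rank method) on made-up numbers. Which percentile
method ``ResourceAdvisor`` uses internally is not stated here, so treat this
as an illustration of the statistic rather than of the library.

.. code-block:: python

   import math


   def percentile(values, pct):
       """Nearest-rank percentile: the smallest value that has at least
       pct percent of the data at or below it."""
       ordered = sorted(values)
       rank = max(1, math.ceil(pct * len(ordered) / 100))
       return ordered[rank - 1]


   # Hypothetical peak memory (GB) observed in 20 past runs of one task.
   peak_memory_gb = [
       9.1, 9.4, 9.8, 10.0, 10.2, 10.3, 10.5, 10.7, 10.9, 11.0,
       11.2, 11.3, 11.5, 11.8, 12.0, 12.1, 12.4, 12.9, 13.5, 15.2,
   ]

   p95 = percentile(peak_memory_gb, 95)
   print(f"p95 peak memory: {p95} GB")

Sizing to the 95th percentile instead of the maximum ignores the rare outlier
(15.2 GB in this made-up data) while still covering almost every run.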


Step 2: The LearnedAdvisor (ML-Powered)
------------------------------------------

The LearnedAdvisor uses machine learning on your telemetry history:

.. code-block:: python

   from scalable import LearnedAdvisor

   # Train on historical telemetry
   advisor = LearnedAdvisor.from_history(
       "./.scalable/runs",
       model_type="gradient_boosting",   # Algorithm choice
   )

   # Predict resources for a new run
   recommendation = advisor.recommend(
       task="run_simulation",
       input_features={"num_nodes": 1200, "scenario_type": "peak_demand"},
   )
   print(recommendation)
   # {'cpus': 2, 'memory': '8G', 'confidence': 0.87}

.. admonition:: What's happening here
   :class: note

   1. ``from_history()`` loads telemetry data from past runs
   2. It extracts features (task names, durations, resource usage)
   3. It trains a gradient boosting model to predict resource needs
   4. ``recommend()`` uses the trained model to predict for new inputs

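   The ``confidence: 0.87`` is the model's own estimate, on a scale from 0 to
   1, of how reliable this prediction is. Scalable applies high-confidence
   predictions directly and falls back to the rule-based advisor when
   confidence is low.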


Step 3: The AdaptiveScaler
----------------------------

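The AdaptiveScaler adjusts the number of workers in real time. The basic
configuration below scales on utilization thresholds; Step 4 passes in an
advisor so that ML predictions inform the decisions: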

.. code-block:: python

   from scalable import AdaptiveScaler

   scaler = AdaptiveScaler(
       min_workers=2,
       max_workers=20,
       scale_up_threshold=0.8,    # Scale up when workers are over 80% busy
       scale_down_threshold=0.3,  # Scale down when workers are under 30% busy
       cooldown_seconds=60,       # Wait 60s between scaling decisions
   )

.. admonition:: How adaptive scaling works with ML
   :class: note

   Without ML: scale on simple thresholds (for example, queue depth > N → add
   workers).

   With ML: predict future load from past patterns. If the model predicts a
   burst of heavy tasks, the scaler adds workers *before* the queue fills.
   This reduces latency because workers are already running when the tasks
   arrive.
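
A single threshold-based scaling decision, including the cooldown, might look
like the sketch below. The parameter names mirror the ``AdaptiveScaler``
arguments shown earlier, but the ``decide_workers`` function itself and the
one-worker step size are assumptions made for illustration; they do not
describe how Scalable implements scaling.

.. code-block:: python

   def decide_workers(current, utilization, seconds_since_last_change,
                      min_workers=2, max_workers=20,
                      scale_up_threshold=0.8, scale_down_threshold=0.3,
                      cooldown_seconds=60):
       """Return the new worker count for one scaling decision."""
       if seconds_since_last_change < cooldown_seconds:
           return current  # Still cooling down: hold steady.
       if utilization > scale_up_threshold:
           return min(max_workers, current + 1)  # Never exceed the ceiling.
       if utilization < scale_down_threshold:
           return max(min_workers, current - 1)  # Never drop below the floor.
       return current


   print(decide_workers(current=4, utilization=0.92, seconds_since_last_change=90))
   print(decide_workers(current=4, utilization=0.92, seconds_since_last_change=10))

The cooldown keeps the scaler from adding and removing workers in rapid
succession. In a predictive setup, the same comparison could be fed a
forecast of utilization instead of the measured value.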


Step 4: Putting It All Together
---------------------------------

A workflow using ML-informed advising and adaptive scaling:

.. code-block:: python

   from scalable import (
       ScalableSession, LearnedAdvisor, AdaptiveScaler
   )

   # 1. ML-informed resource allocation
   advisor = LearnedAdvisor.from_history("./.scalable/runs")
   recommendation = advisor.recommend(task="run_demeter_scenario")
   print(f"Recommended: {recommendation.resources} (confidence={recommendation.confidence})")

   # 2. Adaptive scaler driven by the ML advisor
   scaler = AdaptiveScaler(
       advisor=advisor,
       min_workers={"demeter": 2},
       max_workers={"demeter": 20},
   )

   # 3. Run with ML-optimized resources
   session = ScalableSession.from_yaml("./scalable.yaml", target="local")
   client = session.start()

   # run_demeter_scenario is your own task function, defined elsewhere
   futures = [client.submit(run_demeter_scenario, i, tag="demeter")
              for i in range(100)]
   results = client.gather(futures)

   # The advisor informs initial sizing; the scaler adjusts in real time
   # Telemetry records every scaling decision and its reasoning


Common Questions
-----------------

**Q: Do I need ML expertise to use these features?**

No. Scalable provides sensible defaults. You just need:

* Enough telemetry history (5+ runs for the advisor)
* The ``[ml]`` extra installed

The system handles model selection, training, and evaluation.

**Q: How much data do I need for the LearnedAdvisor?**

Rule of thumb:

* 5 runs → basic predictions (limited accuracy)
* 20+ runs → reliable predictions
* 100+ runs → high accuracy with confidence intervals

More data generally means better predictions. The system falls back to the
rule-based advisor when there is too little data.

**Q: When should I retrain the advisor?**

Retrain after significant changes — new task types, new hardware, or large
shifts in input characteristics. ``LearnedAdvisor.from_history()`` simply
re-reads the telemetry directory, so retraining is just calling that function
again.

**Q: What if the ML model gives a bad recommendation?**

The recommendation includes a ``confidence`` score. When confidence is low,
Scalable automatically falls back to the deterministic ``ResourceAdvisor``.
You can also set a hard ceiling via ``max_workers`` on the ``AdaptiveScaler``
so that no prediction can push the worker count beyond it.


What You Learned
-----------------

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Term
     - Definition
   * - Machine Learning
     - Teaching computers to find patterns and make predictions from data
   * - Training
     - Learning phase where model adjusts to fit historical data
   * - Inference
     - Prediction phase using a trained model on new inputs
   * - Features
     - Input variables the model uses for predictions
   * - Model
     - Mathematical function learned from data (inputs → predictions)
   * - Confidence
     - How sure the model is in a particular prediction
   * - Cross-Validation
     - Testing model quality by repeatedly splitting data into train/test sets
   * - Gradient Boosting
     - ML algorithm using sequential corrective decision trees
   * - Adaptive Scaling
     - Adjusting worker counts in real time based on workload signals


Next Steps
-----------

You now understand how ML enhances workflow optimization through learned
resource advising and adaptive scaling.

* **Next beginner tutorial:** :ref:`beginner_ai_composition` — using AI
  assistants for workflow development
* **Standard tutorial:** :ref:`tutorial_ml_advanced` — advanced ML patterns
  and hyperparameter tuning
* **Try it:** Run your workflow 5+ times with different inputs. Then use
  ``LearnedAdvisor.from_history()`` to see what it recommends. Compare the
  ML recommendation to your current resource allocation.