ML Optimization
===============

The :mod:`scalable.ml` package provides machine-learning-backed resource
prediction, adaptive worker scaling, and distributed hyperparameter tuning.
All features degrade seamlessly to heuristic advising when ``scalable[ml]``
is not installed.

Installation
------------

.. code-block:: bash

   pip install "scalable[ml]"

This installs ``scikit-learn >= 1.3``, ``dask-ml >= 2023.3.24``, and
``joblib >= 1.3``.

LearnedAdvisor
--------------

:class:`~scalable.ml.LearnedAdvisor` provides ML-backed resource
recommendations using gradient boosting, random forest, or quantile
regression trained on run telemetry history.

.. code-block:: python

   from scalable import LearnedAdvisor

   advisor = LearnedAdvisor.from_history(
       "./.scalable/runs",
       model_type="gradient_boosting",
   )
   recommendation = advisor.recommend(task="run_demeter_scenario", target="local")
   print(recommendation.resources)
   print(recommendation.confidence)

Supported model types:

- ``gradient_boosting`` (default): gradient boosting regressor
- ``random_forest``: random forest regressor
- ``quantile_regression``: quantile regression for interval estimates

When insufficient training data is available, ``LearnedAdvisor``
transparently falls back to the :class:`~scalable.advising.ResourceAdvisor`
heuristic.

AdaptiveScaler
--------------

:class:`~scalable.ml.AdaptiveScaler` provides real-time adaptive worker
scaling with configurable thresholds, min/max bounds, and cooldown periods.

.. code-block:: python

   from scalable import AdaptiveScaler

   scaler = AdaptiveScaler(
       min_workers=1,
       max_workers=16,
       scale_up_threshold=0.8,
       scale_down_threshold=0.3,
       cooldown_seconds=60,
   )
   decision = scaler.evaluate(current_metrics)
   print(decision.action)  # "scale_up", "scale_down", or "hold"
   print(decision.target_workers)

FeatureExtractor
----------------

:class:`~scalable.ml.FeatureExtractor` provides telemetry feature
engineering with rolling aggregates, task identity hashing, and
user-provided input features for ML model training.

.. code-block:: python

   from scalable.ml import FeatureExtractor

   extractor = FeatureExtractor()
   features = extractor.extract(telemetry_records)

HyperparameterSearch
--------------------

:class:`~scalable.ml.HyperparameterSearch` integrates Dask-ML distributed
hyperparameter tuning with support for hyperband, successive halving, and
random search strategies. It falls back to scikit-learn's ``GridSearchCV``
when ``dask-ml`` is unavailable.

.. code-block:: python

   from scalable import HyperparameterSearch

   search = HyperparameterSearch(
       strategy="hyperband",
       param_distributions={
           "n_estimators": [50, 100, 200],
           "max_depth": [3, 5, 10],
       },
   )
   result = search.fit(X_train, y_train)
   print(result.best_params)
   print(result.best_score)

Model Validation
----------------

Use ``cross_validate_advisor`` to assess model quality before deployment:

.. code-block:: python

   from scalable.ml import cross_validate_advisor

   quality = cross_validate_advisor(advisor, X_test, y_test)
   print(quality.mae)
   print(quality.coverage)

CLI Command
-----------

The ``scalable advise`` command provides ML-backed recommendations from the
command line:

.. code-block:: bash

   scalable advise --task run_demeter_scenario --target local --confidence 0.95
   scalable advise --task run_demeter_scenario --model-type random_forest --format json

Options:

- ``--task``: Task name to get recommendations for (required)
- ``--target``: Deployment target to scope recommendations
- ``--runs-dir``: Path to runs directory (default: ``.scalable/runs``)
- ``--model-type``: ML model type (``gradient_boosting``, ``random_forest``,
  ``quantile_regression``)
- ``--confidence``: Confidence level (default: 0.95)
- ``--format``: Output format (``text`` or ``json``)
- ``--output``: Output file path (default: stdout)

Configuration
-------------

ML features are controlled via environment variables:

- ``SCALABLE_ML``: Enable/disable ML features (default: ``1``)
- ``SCALABLE_ML_CACHE_DIR``: Model cache directory (default:
  ``.scalable/models``)
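
The sketch below shows one way a script could read these two variables and
apply the documented defaults. It is an illustration, not the package's own
logic: the helper names ``ml_enabled`` and ``ml_cache_dir`` are hypothetical,
and the set of values treated as "disabled" is an assumption, since only the
default ``1`` is documented here.

.. code-block:: python

   import os
   from pathlib import Path

   # Assumption: these spellings switch ML off. Only the default "1" is documented.
   _DISABLED_VALUES = {"0", "false", "no", "off"}


   def ml_enabled(env=None):
       """Return True unless SCALABLE_ML is set to a disabling value."""
       env = os.environ if env is None else env
       value = env.get("SCALABLE_ML", "1")  # documented default: "1"
       return value.strip().lower() not in _DISABLED_VALUES


   def ml_cache_dir(env=None):
       """Return the model cache directory, defaulting to .scalable/models."""
       env = os.environ if env is None else env
       return Path(env.get("SCALABLE_ML_CACHE_DIR", ".scalable/models"))


   if __name__ == "__main__":
       print("ML enabled:", ml_enabled())
       print("Model cache:", ml_cache_dir())

Passing a plain dictionary as ``env`` makes the helpers easy to exercise in
tests without modifying the real process environment.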