.. _tutorial_demeter_setup:

==========================================================
Tutorial Setup: Run the Demeter Example End-to-End
==========================================================

What You Will Do
-----------------

Tutorials 2–10 use the
`Demeter <https://github.com/JGCRI/demeter>`_ land-use / land-cover
disaggregation model as their running example. This page is the **one-time
setup** that makes those tutorials' code blocks actually executable on your
machine. After completing it you will have:

* The Scalable repo and the Demeter source tree on disk.
* The Demeter Zenodo example dataset under ``./demeter_data/``.
* (Optional) A locally built ``demeter:local`` container image suitable for
  use as a Dask worker image on Kubernetes / Fargate.
* A successful smoke-run of one Demeter scenario through Scalable.

The Demeter source already lives in the Scalable monorepo at
``capabilities/demeter`` — you cloned it when you cloned this repository.

Prerequisites
-------------

* Python 3.11 or later.
* Git (to verify the Demeter clone).
* (Optional) Docker for the container build / smoke-test.
* About 1 GB of free disk space for ``demeter_data/``.

Step 1: Install Scalable and Demeter
--------------------------------------

From the Scalable repository root:

.. code-block:: bash

   python -m venv .venv
   source .venv/bin/activate

   # Scalable itself
   pip install -e .

   # Demeter as an editable install so you can see the source it ships
   pip install -e capabilities/demeter

Both installs are quick: Demeter's heavy native dependencies
(``netcdf4``, ``scipy``, ``matplotlib``) are installed as prebuilt wheels
from PyPI rather than compiled locally.
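
To confirm that both editable installs resolved, you can ask the import
system where each package lives. The following is a minimal sketch, not part
of Scalable or Demeter: ``find_packages`` is a hypothetical helper name, and
it assumes the import names are ``scalable`` and ``demeter``.

.. code-block:: python

   from importlib.util import find_spec


   def find_packages(names):
       """Map each top-level package name to its location, or None if missing.

       Hypothetical helper for illustration; not a Scalable or Demeter API.
       """
       locations = {}
       for name in names:
           spec = find_spec(name)
           if spec is None:
               locations[name] = None
           else:
               # origin is None for namespace packages, so fall back to a label.
               locations[name] = spec.origin or "<namespace package>"
       return locations


   if __name__ == "__main__":
       for name, origin in find_packages(["scalable", "demeter"]).items():
           print(f"{name}: {origin or 'NOT INSTALLED'}")

With an editable install, the reported path should point inside your
checkout rather than into ``site-packages``.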

Step 2: Download the Demeter example data
-------------------------------------------

Demeter ships a Zenodo bundle of GCAM reference scenarios and constraint
files. Download it once:

.. code-block:: python

   import demeter

   demeter.get_package_data("./demeter_data")

The directory layout looks like this when complete (Zenodo bundle for
Demeter 2.0.x):

.. code-block:: text

   demeter_data/
   ├── config_gcam_reference.ini
   └── inputs/
       ├── allocation/
       ├── constraints/
       ├── mapping/
       ├── observed/
       └── projected/
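
The layout above can be checked programmatically before you move on. This is
a small sketch using only the standard library; ``missing_entries`` and
``EXPECTED_ENTRIES`` are hypothetical names, and the expected entries are
copied from the listing above, so adjust them if your bundle differs.

.. code-block:: python

   from pathlib import Path

   # Entries copied from the layout listing above (assumption: your bundle
   # matches it). Paths are relative to the data directory.
   EXPECTED_ENTRIES = (
       "config_gcam_reference.ini",
       "inputs/allocation",
       "inputs/constraints",
       "inputs/mapping",
       "inputs/observed",
       "inputs/projected",
   )


   def missing_entries(data_dir, expected=EXPECTED_ENTRIES):
       """Return the expected relative paths that do not exist under data_dir."""
       root = Path(data_dir)
       return [rel for rel in expected if not (root / rel).exists()]


   if __name__ == "__main__":
       missing = missing_entries("./demeter_data")
       print("layout OK" if not missing else f"missing: {missing}")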

The :download:`canonical workflow </examples/workflow_demeter.py>` resolves
the base ``.ini`` automatically; if your bundle expands to a different
path, set the ``DEMETER_DATA`` environment variable or pass ``--data-dir``
to ``scalable_example.run``.
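
One plausible way to combine those two overrides is sketched below. The
precedence shown (explicit ``--data-dir`` value first, then
``DEMETER_DATA``, then ``./demeter_data``) is an assumption made for
illustration, and ``resolve_data_dir`` is a hypothetical helper; check the
driver's source for the actual behaviour.

.. code-block:: python

   import os
   from pathlib import Path


   def resolve_data_dir(cli_value=None, environ=None, default="./demeter_data"):
       """Pick a data directory: CLI value, then DEMETER_DATA, then default.

       The precedence order is an assumption, not documented driver behaviour.
       """
       environ = os.environ if environ is None else environ
       chosen = cli_value or environ.get("DEMETER_DATA") or default
       return Path(chosen).expanduser().resolve()


   if __name__ == "__main__":
       print(resolve_data_dir())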

Step 3 (optional): Build the Demeter container image
------------------------------------------------------

Tutorials 5 (cloud), 8 (Kubernetes), and 10 (AI composition) reference a
Demeter container image. The Scalable-friendly Dockerfile lives at
``capabilities/demeter/Dockerfile.scalable`` and uses Python 3.11 +
``slim-bookworm``:

.. code-block:: bash

   docker build \
     -t demeter:local \
     -f capabilities/demeter/Dockerfile.scalable \
     capabilities/demeter

Skip this step if you only plan to run the local target: the
``containers: none`` mode in
:download:`scalable.demeter.yaml </examples/scalable.demeter.yaml>` does not
need an image.

Step 4: Smoke-test one scenario
---------------------------------

The ``capabilities/demeter/scalable_example/`` subpackage ships a runnable
driver that exercises the full pipeline. Run a single scenario locally:

.. code-block:: bash

   python -m scalable_example.run --scenarios reference

Expected output (abbreviated):

.. code-block:: text

   2026-05-20 14:03:11 INFO scalable_example.demeter Using shared manifest:
       /…/scalable/docs/examples/scalable.demeter.yaml
   2026-05-20 14:03:13 INFO scalable_example.demeter Prepared 1 scenario config(s)
   2026-05-20 14:04:42 INFO scalable_example.demeter Completed Demeter runs for: reference
   {
     "summary_path": "/…/outputs/demeter/_summary/scenarios.json",
     "scenario_count": 1
   }

What just happened?

1. Stage 1 (``prepare_demeter_config``) copied the base ``.ini`` from the
   Zenodo bundle into ``./outputs/demeter/reference/`` and rewrote the
   ``[PARAMS] scenario`` and ``[STRUCTURE] output_dir`` fields.
2. Stage 2 (``run_demeter_scenario``) invoked
   :func:`demeter.run_model` against that ``.ini`` on a Scalable worker
   tagged ``demeter``.
3. Stage 3 (``aggregate_demeter_outputs``) collected the per-scenario
   summary into ``_summary/scenarios.json``.
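
The Stage 1 rewrite can be pictured with the standard-library
``configparser``. This is a sketch of the idea only, not the code that
``prepare_demeter_config`` runs: ``prepare_config_sketch`` is a hypothetical
name, and Demeter's own configuration reader may treat quoting and comments
differently.

.. code-block:: python

   import configparser
   from pathlib import Path


   def prepare_config_sketch(base_ini, scenario, out_root):
       """Copy base_ini into out_root/<scenario>/ with two fields rewritten.

       Illustrative stand-in for Stage 1; not the actual implementation.
       """
       out_dir = Path(out_root) / scenario
       out_dir.mkdir(parents=True, exist_ok=True)

       # interpolation=None keeps '%' in values literal; optionxform=str
       # preserves the case of option names.
       parser = configparser.ConfigParser(interpolation=None)
       parser.optionxform = str
       with open(base_ini, encoding="utf-8") as fh:
           parser.read_file(fh)

       for section in ("PARAMS", "STRUCTURE"):
           if not parser.has_section(section):
               parser.add_section(section)
       parser["PARAMS"]["scenario"] = scenario
       parser["STRUCTURE"]["output_dir"] = str(out_dir)

       target = out_dir / Path(base_ini).name
       with open(target, "w", encoding="utf-8") as fh:
           parser.write(fh)
       return target

Note that ``configparser`` drops comments when it writes the file back, which
is one reason a real implementation might use a different parser.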

Add scenarios to fan out:

.. code-block:: bash

   python -m scalable_example.run --scenarios reference ssp1 ssp2

Each scenario runs in its own Dask process; on a 4-core laptop the
local target processes them in parallel up to ``max_workers: 4``.
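
The bounded fan-out pattern can be illustrated without Dask. In the sketch
below a standard-library thread pool stands in for the worker processes, and
``fan_out`` and ``fake_run`` are hypothetical names; Scalable's real
scheduling goes through Dask and is not shown here.

.. code-block:: python

   from concurrent.futures import ThreadPoolExecutor


   def fan_out(run_one, scenarios, max_workers=4):
       """Call run_one(scenario) for each scenario, max_workers at a time.

       Threads are a stand-in for Dask worker processes (assumption for
       illustration only).
       """
       with ThreadPoolExecutor(max_workers=max_workers) as pool:
           # pool.map yields results in the same order as the inputs.
           results = list(pool.map(run_one, scenarios))
       return dict(zip(scenarios, results))


   def fake_run(scenario):
       """Placeholder for a real per-scenario model run."""
       return f"{scenario}: done"


   if __name__ == "__main__":
       print(fan_out(fake_run, ["reference", "ssp1", "ssp2"]))

With three scenarios and ``max_workers=4`` all three run concurrently; a
fifth scenario would wait for a free slot.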

Step 5: Run the same workflow via the ``scalable`` CLI
--------------------------------------------------------

The :download:`reference workflow </examples/workflow_demeter.py>` lives at
``docs/examples/workflow_demeter.py``. It reads the same manifest and
ships the same three task functions but is loaded as a script by the
Scalable CLI:

.. code-block:: bash

   scalable run docs/examples/scalable.demeter.yaml \
       --target local \
       --workflow docs/examples/workflow_demeter.py

This is the form the rest of the tutorials use when they say "run the
Demeter pipeline". The ``scalable_example`` driver above is a convenience
wrapper that bypasses the CLI in favor of direct Python; both ultimately
call :class:`scalable.ScalableSession` and produce identical output.
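
If you want to launch the CLI form from a Python script, for example in a
test harness, building the argument list explicitly avoids shell-quoting
problems. This sketch uses only the flags shown above;
``scalable_run_argv`` is a hypothetical helper, not part of Scalable.

.. code-block:: python

   import shlex


   def scalable_run_argv(manifest, target, workflow):
       """Build the argv list for the `scalable run` command shown above."""
       return ["scalable", "run", manifest, "--target", target, "--workflow", workflow]


   argv = scalable_run_argv(
       "docs/examples/scalable.demeter.yaml",
       "local",
       "docs/examples/workflow_demeter.py",
   )
   # Print the equivalent shell command for inspection.
   print(shlex.join(argv))
   # To execute it: import subprocess; subprocess.run(argv, check=True)

Passing a list to ``subprocess.run`` bypasses the shell, so paths containing
spaces need no extra quoting.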

Troubleshooting
---------------

**``ModuleNotFoundError: No module named 'demeter'``**
  The Demeter editable install in Step 1 did not take effect. Re-run
  ``pip install -e capabilities/demeter`` and make sure the ``.venv`` is
  activated.

**``FileNotFoundError`` for the base ``.ini`` file**
  Step 2 (data download) was skipped or wrote to a different directory.
  Compare ``ls demeter_data/`` against the layout shown in Step 2 and re-run
  ``demeter.get_package_data("./demeter_data")`` if files are missing. If
  your bundle unpacks to a different path, point ``DEMETER_DATA`` or
  ``--data-dir`` at it.

**``MemoryError`` during ``run_demeter_scenario``**
  The default local target allocates 4 GB per worker; the reference scenario
  fits, but fine-resolution variants can exceed it. Edit the manifest
  target ``local`` to increase per-process memory, or apply the
  ``k8s-fine-resolution`` overlay (which sets ``demeter.memory: 64G``)
  when running on a larger target.

**Docker build fails on ``apt-get install libhdf5-dev``**
  The Scalable-friendly Dockerfile assumes the host can reach the Debian
  package mirrors. Behind a proxy you may need to pass
  ``--build-arg HTTP_PROXY=...``.

**Tests fail with ``ImportError: cannot import name 'configobj'``**
  The Demeter editable install did not pull its dependencies. Run
  ``pip install -r capabilities/demeter/requirements.txt``.

**``ModuleNotFoundError: No module named 'pkg_resources'`` on Python 3.13+**
  Setuptools 81+ dropped ``pkg_resources``; Demeter still imports it.
  Pin to ``pip install 'setuptools<81'`` until the upstream Demeter
  release migrates to ``importlib.metadata``.

**``ValueError: assignment destination is read-only`` inside ``demeter.demeter_io.reader.read_base``**
  Triggered by a Demeter-internal compatibility issue with numpy 2.x where
  a returned array carries ``writeable=False``. Pin to
  ``pip install 'numpy<2'`` or apply the upstream fix from the Demeter
  issue tracker. The Scalable side of the pipeline is unaffected — the
  per-scenario ``.ini`` was generated correctly and the task was scheduled
  on a Scalable worker; the failure is wholly inside ``demeter.run_model``.
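
The last two entries can be detected before a run with a short preflight
check. This is a sketch under the assumptions stated in those entries (a
missing ``pkg_resources`` module and a numpy major version of 2 or higher
are the triggers); ``environment_warnings`` is a hypothetical helper, not
part of Scalable or Demeter.

.. code-block:: python

   from importlib import metadata
   from importlib.util import find_spec


   def environment_warnings(get_version=metadata.version, has_module=None):
       """Return warnings for the two environment issues described above.

       get_version and has_module are injectable so the check can be tested
       without changing the real environment.
       """
       if has_module is None:
           has_module = lambda name: find_spec(name) is not None

       warnings = []
       if not has_module("pkg_resources"):
           warnings.append("pkg_resources missing: try pip install 'setuptools<81'")

       try:
           numpy_major = int(get_version("numpy").split(".")[0])
       except metadata.PackageNotFoundError:
           numpy_major = None  # numpy not installed; nothing to report here
       if numpy_major is not None and numpy_major >= 2:
           warnings.append("numpy 2.x detected: try pip install 'numpy<2'")
       return warnings


   if __name__ == "__main__":
       for line in environment_warnings() or ["environment looks OK"]:
           print(line)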

What's Next
-----------

You now have a runnable Demeter pipeline. Continue with:

* :ref:`tutorial_getting_started` — Tutorial 1 verifies the Scalable
  install with the trivial ``hello-scalable`` project.
* :ref:`tutorial_manifest_system` — Tutorial 2 dissects
  ``scalable.demeter.yaml``.
* :ref:`tutorial_scaling_strategies` — Tutorial 3 fans out to N Demeter
  scenarios.
* :ref:`tutorial_ai_composition` — Tutorial 10 onboards the same Demeter
  repo via ``scalable init-component``.