Workflow

gaia is designed with a climate-driven empirical model at its core, integrated into an efficient modular structure. This architecture streamlines the entire workflow, from initial climate and crop data processing through empirical model fitting, yield shock projections under future climate scenarios, to the calculation of agricultural productivity changes for GCAM. The modular design also facilitates comprehensive diagnostic outputs, enhancing the tool’s utility for researchers and policymakers.

The primary functionality of gaia is encapsulated in the yield_impact wrapper function, which executes the entire workflow from climate data processing to yield shock estimation. Users can also execute individual functions to work through the main steps of the process (Figure 1).

  1. weighted_climate: Processes CMIP-ISIMIP climate NetCDF data and calculates cropland-weighted precipitation and temperature at the country level, differentiated by crop type and irrigation type. The function accepts either daily or monthly climate data, provided the data are consistent with the CMIP-ISIMIP NetCDF format.

  2. crop_calendars: Generates crop planting months for each country and crop based on crop calendar data (Sacks et al., 2020).

  3. data_aggregation: Calculates crop growing seasons using climate variables processed by weighted_climate and crop calendars for both historical and projected periods. This function prepares climate and yield data for subsequent model fitting.

  4. yield_regression: Fits a regression model to historical annual crop yields using monthly growing season temperature and precipitation, CO2 concentrations, GDP per capita, and year. The default econometric model applied in gaia is from Waldhoff et al. (2020). Users can specify alternative formulas, provided they are consistent with the data processed in data_aggregation.

  5. yield_shock_projection: Projects yield shocks for future climate scenarios using the fitted model and temperature, precipitation, and CO2 projections from the climate scenario.

  6. gcam_agprodchange: Remaps country-level yield shocks to GCAM-required spatial scales (i.e., region, basin, technology intersections), based on harvested areas, and aggregates crops to GCAM commodities. This function applies the projected shocks to GCAM scenario agricultural productivity growth rates (the unit used to project future yields in GCAM) and creates ready-to-use XML outputs for GCAM.
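
The ideas behind steps 1, 4, and 5 can be illustrated with a small, self-contained toy example in base R. This is only a sketch under stated assumptions: the data are synthetic, the variable names (`cells`, `hist`, `baseline`, `warmer`) are hypothetical, and the regression formula is a simplified stand-in rather than the Waldhoff et al. (2020) specification or gaia's internal code.

```r
# Toy illustration only: none of this is gaia code or gaia's model specification.
set.seed(42)

# --- Idea behind step 1: cropland-weighted climate per country ---------------
# Hypothetical grid cells, each with a cropland weight and a temperature value.
cells <- data.frame(
  country  = c("A", "A", "A", "B", "B"),
  cropland = c(10, 30, 60, 50, 50),   # weight, e.g. cropland area
  tas      = c(12, 15, 18, 20, 24)    # temperature in degrees C
)

# Weighted mean per country: sum(w * x) / sum(w)
weighted_tas <- sapply(
  split(cells, cells$country),
  function(d) weighted.mean(d$tas, d$cropland)
)

# --- Idea behind steps 4 and 5: fit on history, then project a shock ---------
years <- 1981:2010
hist <- data.frame(
  year = years,
  temp = 16 + rnorm(length(years), sd = 1),
  prcp = 80 + rnorm(length(years), sd = 10)
)

# Synthetic log yield with a time trend and a concave temperature response
hist$log_yield <- 0.01 * (hist$year - 1981) +
  0.30 * hist$temp - 0.01 * hist$temp^2 +
  0.002 * hist$prcp + rnorm(nrow(hist), sd = 0.02)

# Simplified stand-in formula (an assumption, not the gaia default)
fit <- lm(log_yield ~ year + temp + I(temp^2) + prcp, data = hist)

# Yield shock: predicted yield under a warmer climate relative to a baseline
# climate, holding the year fixed so the trend cancels out
baseline <- data.frame(year = 2010, temp = 16, prcp = 80)
warmer   <- data.frame(year = 2010, temp = 18, prcp = 80)
shock <- exp(predict(fit, warmer) - predict(fit, baseline))

print(weighted_tas)
print(shock)
```

A shock below 1 would indicate a yield loss relative to the baseline climate, and a value above 1 a gain. In gaia itself, these calculations are carried out per country, crop, and irrigation type by the functions listed above.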


Figure 1: The gaia workflow, showing the functions and corresponding outputs for modeling crop yield shocks to climate variations using an empirical econometric model.


Example

Example Climate Data

gaia requires global climate data from the Inter-Sectoral Impact Model Intercomparison Project (ISIMIP) or data formatted similarly to ISIMIP data. Additionally, gaia supports climate data aggregated to a monthly time step. Due to the large size of daily time step climate data, we have provided an example of monthly aggregated climate data covering the period from 2015 to 2100. Please download the example data using the instructions below.

NOTE: If no historical climate data are available, gaia can use its default model, which has already been fitted with WATCH historical climate forcing data (Weedon et al., 2011). To use it, set use_default_coeff = TRUE when running yield_impact.

Run gaia!

Example 1

Download the example climate NetCDF data and configure the data paths accordingly.

# load gaia
library(gaia)

# NOTE: please change `data_dir` to your desired location for downloaded data
data_dir <- gaia::get_example_data(
  download_url = 'https://zenodo.org/records/13179630/files/gaea_example_climate.zip?download=1',
  data_dir = 'path/to/desired/location'
)

# Path to the climate NetCDF files
# NOTE: Each variable can have more than one file
# historical climate data
pr_historical_file <- file.path(data_dir, 'pr_monthly_canesm5_w5e5_rcp7_1950_2014.nc')
tas_historical_file <- file.path(data_dir, 'tas_monthly_canesm5_w5e5_rcp7_1950_2014.nc')

# projected climate data
pr_projection_file <- file.path(data_dir, 'pr_monthly_canesm5_w5e5_rcp7_2015_2100.nc')
tas_projection_file <- file.path(data_dir, 'tas_monthly_canesm5_w5e5_rcp7_2015_2100.nc')
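
Because the full run is long, it can be worth confirming that every input path resolves before starting it. The helper below is a minimal sketch; `check_files_exist` is a hypothetical name and is not part of gaia.

```r
# Hypothetical helper (not part of gaia): stop early if any input file is missing
check_files_exist <- function(paths) {
  not_found <- paths[!file.exists(paths)]
  if (length(not_found) > 0) {
    stop("Missing input file(s):\n", paste(" -", not_found, collapse = "\n"))
  }
  invisible(TRUE)
}

# Self-contained demonstration using a temporary file
demo_file <- tempfile(fileext = ".nc")
file.create(demo_file)
check_files_exist(demo_file)   # passes silently

# With the paths defined above, the call would look like:
# check_files_exist(c(pr_historical_file, tas_historical_file,
#                     pr_projection_file, tas_projection_file))
```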


Once the example climate NetCDF data are in place, we can run gaia with a single function, yield_impact, which streamlines the entire workflow. For an explanation of each argument of yield_impact, refer to this reference page.

NOTE: This workflow may take up to one hour due to the amount of climate data gaia needs to process. For a faster example, please see Example 2.

# load gaia
library(gaia)

# Run gaia
# The full run with raw climate data can take up to an hour
gaia::yield_impact(
  pr_hist_ncdf = pr_historical_file, 
  tas_hist_ncdf = tas_historical_file, 
  pr_proj_ncdf = pr_projection_file, 
  tas_proj_ncdf = tas_projection_file, 
  timestep = 'monthly',                   # specify the time step of the NetCDF data (monthly or daily)
  historical_periods = c(1950:2014),      # vector of historical years selected for fitting
  climate_model = 'canesm5',              # label of climate model name
  climate_scenario = 'gcam-ref',          # label of climate scenario name
  member = 'r1i1p1f1',                    # label of ensemble member name
  bias_adj = 'w5e5',                      # label of climate data for bias adjustment
  cfe = 'no-cfe',                         # label of CO2 fertilization effect in the formula (default is no CFE)
  gcam_version = 'gcam7',                 # output is different depending on the GCAM version (gcam6 or gcam7)
  use_default_coeff = FALSE,              # set to TRUE when there is no historical climate data available
  base_year = 2015,                       # GCAM base year
  start_year = 2015,                      # start year of the projected climate data
  end_year = 2100,                        # end year of the projected climate data
  smooth_window = 20,                     # number of years as smoothing window
  co2_hist = NULL,                        # historical annual CO2 concentration. If NULL, will use default value
  co2_proj = NULL,                        # projected annual CO2 concentration. If NULL, will use default value
  diagnostics = TRUE,                     # set to TRUE to output diagnostic plots
  output_dir = 'path/to/output/folder'    # path to the output folder
)
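
To give some intuition for the smooth_window argument, the sketch below applies a centered moving average to a synthetic annual shock series. This is an assumption about what a smoothing window typically means, not a description of gaia's internal smoothing method, and `moving_average` is a hypothetical helper.

```r
# Hypothetical helper (not part of gaia): centered moving average.
# Positions without a full window at either end are returned as NA.
moving_average <- function(x, window) {
  as.numeric(stats::filter(x, rep(1 / window, window), sides = 2))
}

# Synthetic annual yield shocks: a slow decline plus year-to-year noise
set.seed(1)
years <- 2015:2100
shock <- 1 - 0.002 * (years - 2015) + rnorm(length(years), sd = 0.05)

# Smooth with a 20-year window, matching the value used in the example call
smoothed <- moving_average(shock, window = 20)

# Compare raw and smoothed series where the full window is available
head(na.omit(data.frame(year = years, raw = shock, smoothed = smoothed)))
```

A wider window suppresses more interannual variability, but leaves fewer years with a complete window at the start and end of the projection period.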


NOTE: The arguments climate_model, climate_scenario, member, bias_adj, and cfe take strings that label the climate data in the output files. They serve only as metadata and do not otherwise affect the gaia simulation.


Example 2

We also provide an example of weighted climate data, processed using cropland weights at the country level. This weighted climate data is generated by gaia::weighted_climate. We have provided this example to help users format their data to match the weighted climate data format if their raw climate data differs from the ISIMIP format. Download the example of weighted climate data and run gaia by following the instructions below.

# load gaia
library(gaia)

# NOTE: please change `data_dir` to your desired location for downloaded data
data_dir <- gaia::get_example_data(
  download_url = 'https://zenodo.org/records/13179630/files/weighted_climate.zip?download=1',
  data_dir = 'path/to/desired/location'
)

# Paths to the folders containing the weighted climate data
# historical climate data
climate_hist_dir <- file.path(data_dir, 'canesm5_hist')

# projected climate data
climate_impact_dir <- file.path(data_dir, 'canesm5')


Running gaia directly with weighted climate data only takes a few minutes!

# load gaia
library(gaia)

# Run gaia
gaia::yield_impact(
  climate_hist_dir = climate_hist_dir,
  climate_impact_dir = climate_impact_dir,
  timestep = 'monthly',                   # specify the time step of the climate data (monthly or daily)
  climate_model = 'canesm5',              # label of climate model name
  climate_scenario = 'gcam-ref',          # label of climate scenario name
  member = 'r1i1p1f1',                    # label of ensemble member name
  bias_adj = 'w5e5',                      # label of climate data for bias adjustment
  cfe = 'no-cfe',                         # label of CO2 fertilization effect in the formula (default is no CFE)
  gcam_version = 'gcam7',                 # output is different depending on the GCAM version (gcam6 or gcam7)
  use_default_coeff = FALSE,              # set to TRUE when there is no historical climate data available
  base_year = 2015,                       # GCAM base year
  start_year = 2015,                      # start year of the projected climate data
  end_year = 2100,                        # end year of the projected climate data
  smooth_window = 20,                     # number of years as smoothing window
  co2_hist = NULL,                        # historical annual CO2 concentration. If NULL, will use default value
  co2_proj = NULL,                        # projected annual CO2 concentration. If NULL, will use default value
  diagnostics = TRUE,                     # set to TRUE to output diagnostic plots
  output_dir = 'path/to/output/folder'    # path to the output folder
)