gaia is designed with a climate-driven empirical model
at its core, integrated into an efficient modular structure. This
architecture streamlines the entire workflow, from initial climate and
crop data processing through empirical model fitting, yield shock
projections under future climate scenarios, to the calculation of
agricultural productivity changes for GCAM. The modular design also
facilitates comprehensive diagnostic outputs, enhancing the tool’s
utility for researchers and policymakers.
The primary functionality of gaia is encapsulated in the yield_impact
wrapper function, which executes the entire workflow from climate data
processing to yield shock estimation. Users can also execute individual
functions to work through the main steps of the process (Figure 1).
weighted_climate: Processes CMIP-ISIMIP climate NetCDF data and
calculates cropland-weighted precipitation and temperature at the
country level, differentiated by crop type and irrigation type. The
function accepts either daily or monthly climate data in the
CMIP-ISIMIP NetCDF format.
crop_calendars: Generates crop planting months for each country and
crop based on crop calendar data from Sacks et al. (2020).
data_aggregation: Calculates crop growing seasons using climate
variables processed by weighted_climate and crop calendars for both
historical and projected periods. This function prepares climate and
yield data for subsequent model fitting.
yield_regression: Fits a regression model to historical annual crop
yields, monthly growing season temperature and precipitation, CO2
concentrations, GDP per capita, and year. The default econometric model
applied in gaia is from Waldhoff et al. (2020). Users can specify
alternative formulas that are consistent with the data processed in
data_aggregation.
yield_shock_projection: Projects yield shocks for
future climate scenarios using the fitted model and temperature,
precipitation, and CO2 projections from the climate scenario.
gcam_agprodchange: Remaps country-level yield shocks
to GCAM-required spatial scales (i.e., region, basin, technology
intersections), based on harvested areas, and aggregates crops to GCAM
commodities. This function applies the projected shocks to GCAM scenario
agricultural productivity growth rates (the unit used to project future
yields in GCAM) and creates ready-to-use XML outputs for GCAM.
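To make the cropland weighting performed in the first step more concrete, here is a minimal base R sketch of a cropland-weighted country mean. The `cells` data frame, its column names, and its values are hypothetical and chosen only for illustration; this is not the internal implementation of weighted_climate.

```r
# Hypothetical grid cells: one row per cell with its country, monthly mean
# temperature (degrees C), and cropland area (ha). All values are made up.
cells <- data.frame(
  country   = c("A", "A", "A", "B", "B"),
  tas       = c(10, 20, 30, 15, 25),
  crop_area = c(0, 100, 300, 50, 50)
)

# Cropland-weighted mean per country: cells with more cropland count more,
# and cells without cropland do not contribute at all.
weighted_tas <- sapply(
  split(cells, cells$country),
  function(d) weighted.mean(d$tas, w = d$crop_area)
)

print(weighted_tas)
```

In this toy example the cell with no cropland in country A is ignored, so the weighted mean sits closer to the temperature of the most heavily cropped cell than a simple average would.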
Figure 1: The gaia workflow, showing the functions and corresponding outputs for modeling crop yield shocks from climate variations using an empirical econometric model.
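The fitting and projection steps listed above can be illustrated with a small base R sketch on synthetic data. The formula below (a quadratic temperature response plus precipitation and a year trend) is an assumption made for illustration only; it is not the Waldhoff et al. (2020) specification, and the variable names are hypothetical.

```r
set.seed(42)

# Synthetic "historical" data for one country and crop (all values made up)
n <- 60
hist_data <- data.frame(
  year   = 1955:2014,
  temp   = runif(n, 15, 30),   # growing season temperature (degrees C)
  precip = runif(n, 20, 200)   # growing season precipitation (mm/month)
)

# Synthetic log yield with a known quadratic temperature response
hist_data$log_yield <- 0.5 + 0.20 * hist_data$temp - 0.005 * hist_data$temp^2 +
  0.002 * hist_data$precip + 0.01 * (hist_data$year - 1955) +
  rnorm(n, sd = 0.02)

# Fit an illustrative empirical model (NOT the gaia default formula)
fit <- lm(log_yield ~ temp + I(temp^2) + precip + year, data = hist_data)

# Project a yield shock: ratio of predicted yield under a warmer climate
# to predicted yield under baseline climate, holding other inputs fixed
baseline <- data.frame(temp = 22, precip = 100, year = 2014)
warmer   <- data.frame(temp = 26, precip = 100, year = 2014)
shock <- exp(predict(fit, warmer) - predict(fit, baseline))

print(shock)  # values below 1 indicate a yield loss relative to baseline
```

Expressing the shock as a ratio of predicted yields keeps it independent of the yield level, which is why the same fitted coefficients can be applied to any future climate series.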
gaia requires global climate data from the
Inter-Sectoral Impact Model Intercomparison Project (ISIMIP) or data formatted similarly
to ISIMIP data. Additionally, gaia supports climate data
aggregated to a monthly time step. Due to the large size of daily time
step climate data, we have provided an example of monthly aggregated
climate data covering the period from 2015 to 2100. Please download the
example data using the instructions below.
NOTE: If no historical climate data is available, gaia will use the
default model that has already been fitted with WATCH historical climate
forcing data (Weedon et al., 2011).
Download the example climate NetCDF data and configure the data paths accordingly.
# load gaia
library(gaia)
# NOTE: please change `data_dir` to your desired location for downloaded data
data_dir <- gaia::get_example_data(
download_url = 'https://zenodo.org/records/13179630/files/gaea_example_climate.zip?download=1',
data_dir = 'path/to/desired/location'
)
# Path to the climate NetCDF files
# NOTE: Each variable can have more than one file
# historical climate data
pr_historical_file <- file.path(data_dir, 'pr_monthly_canesm5_w5e5_rcp7_1950_2014.nc')
tas_historical_file <- file.path(data_dir, 'tas_monthly_canesm5_w5e5_rcp7_1950_2014.nc')
# projected climate data
pr_projection_file <- file.path(data_dir, 'pr_monthly_canesm5_w5e5_rcp7_2015_2100.nc')
tas_projection_file <- file.path(data_dir, 'tas_monthly_canesm5_w5e5_rcp7_2015_2100.nc')
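Before starting a long run, it can help to confirm that all four NetCDF files are where the paths say they are. The following is a minimal sketch using only base R; the `data_dir` value here is a hypothetical stand-in (a temporary folder), so replace it with the folder returned by the download step.

```r
# Hypothetical stand-in for the download folder; use your own `data_dir`
data_dir <- tempdir()

# Same file names as in the example above
climate_files <- c(
  pr_historical  = file.path(data_dir, 'pr_monthly_canesm5_w5e5_rcp7_1950_2014.nc'),
  tas_historical = file.path(data_dir, 'tas_monthly_canesm5_w5e5_rcp7_1950_2014.nc'),
  pr_projection  = file.path(data_dir, 'pr_monthly_canesm5_w5e5_rcp7_2015_2100.nc'),
  tas_projection = file.path(data_dir, 'tas_monthly_canesm5_w5e5_rcp7_2015_2100.nc')
)

# Report any files that are not found at the expected location
missing_files <- climate_files[!file.exists(climate_files)]
if (length(missing_files) > 0) {
  message('Missing climate files:\n', paste(' -', missing_files, collapse = '\n'))
} else {
  message('All climate files found.')
}
```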
Once the example climate NetCDF data is in place, we can run gaia
with a single function, yield_impact, that streamlines the entire
workflow. For an explanation of each argument of yield_impact, refer to
this reference page.
NOTE: This workflow may take up to one hour due to
the amount of climate data gaia needs to process. For a
faster example, please see Example 2.
# load gaia
library(gaia)
# Run gaia
# The full run with raw climate data can take up to an hour
gaia::yield_impact(
pr_hist_ncdf = pr_historical_file,
tas_hist_ncdf = tas_historical_file,
pr_proj_ncdf = pr_projection_file,
tas_proj_ncdf = tas_projection_file,
timestep = 'monthly', # specify the time step of the NetCDF data (monthly or daily)
historical_periods = c(1950:2014), # vector of historical years selected for fitting
climate_model = 'canesm5', # label of climate model name
climate_scenario = 'gcam-ref', # label of climate scenario name
member = 'r1i1p1f1', # label of ensemble member name
bias_adj = 'w5e5', # label of climate data for bias adjustment
cfe = 'no-cfe', # label of CO2 fertilization effect in the formula (default is no CFE)
gcam_version = 'gcam7', # output is different depending on the GCAM version (gcam6 or gcam7)
use_default_coeff = FALSE, # set to TRUE when there is no historical climate data available
base_year = 2015, # GCAM base year
start_year = 2015, # start year of the projected climate data
end_year = 2100, # end year of the projected climate data
smooth_window = 20, # number of years as smoothing window
co2_hist = NULL, # historical annual CO2 concentration. If NULL, will use default value
co2_proj = NULL, # projected annual CO2 concentration. If NULL, will use default value
diagnostics = TRUE, # set to TRUE to output diagnostic plots
output_dir = 'path/to/output/folder' # path to the output folder
)
NOTE: The arguments climate_model, climate_scenario, member, bias_adj,
and cfe take strings that label the climate data in the output files.
These arguments do not affect the gaia model simulation; they only set
the metadata about the climate data recorded in the output files.
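The smooth_window argument sets the number of years used as a smoothing window. How gaia applies the window internally is not described here, so the sketch below only illustrates the general idea with a centered moving average over a synthetic shock series; the series, the variable names, and the choice of a centered average are all assumptions.

```r
set.seed(1)

smooth_window <- 20
years <- 2015:2100

# Synthetic annual yield shocks: a slow decline plus year-to-year noise
raw_shock <- 1 - 0.002 * (years - 2015) + rnorm(length(years), sd = 0.05)

# Moving average with equal weights over `smooth_window` years.
# stats::filter() returns NA where the window does not fit inside the series.
smoothed <- as.numeric(
  stats::filter(raw_shock, rep(1 / smooth_window, smooth_window), sides = 2)
)

# Compare the variability before and after smoothing
print(sd(raw_shock))
print(sd(smoothed, na.rm = TRUE))
```

A wider window removes more interannual noise, at the cost of leaving more years at the start and end of the series without a smoothed value.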
We also provide an example of weighted climate data,
processed using cropland weights at the country level. This weighted
climate data is generated by gaia::weighted_climate. We
have provided this example to help users format their data to match the
weighted climate data format if their raw climate data differs from the
ISIMIP format. Download the example of weighted climate data and run
gaia by following the instructions below.
# load gaia
library(gaia)
# NOTE: please change `data_dir` to your desired location for downloaded data
data_dir <- gaia::get_example_data(
download_url = 'https://zenodo.org/records/13179630/files/weighted_climate.zip?download=1',
data_dir = 'path/to/desired/location'
)
# Paths to the folders containing the weighted climate data
# NOTE: Each variable can have more than one file
# historical climate data
climate_hist_dir <- file.path(data_dir, 'canesm5_hist')
# projected climate data
climate_impact_dir <- file.path(data_dir, 'canesm5')
Running gaia directly with weighted climate data takes only
a few minutes!
# load gaia
library(gaia)
# Run gaia
gaia::yield_impact(
climate_hist_dir = climate_hist_dir,
climate_impact_dir = climate_impact_dir,
timestep = 'monthly', # specify the time step of the climate data (monthly or daily)
climate_model = 'canesm5', # label of climate model name
climate_scenario = 'gcam-ref', # label of climate scenario name
member = 'r1i1p1f1', # label of ensemble member name
bias_adj = 'w5e5', # label of climate data for bias adjustment
cfe = 'no-cfe', # label of CO2 fertilization effect in the formula (default is no CFE)
gcam_version = 'gcam7', # output is different depending on the GCAM version (gcam6 or gcam7)
use_default_coeff = FALSE, # set to TRUE when there is no historical climate data available
base_year = 2015, # GCAM base year
start_year = 2015, # start year of the projected climate data
end_year = 2100, # end year of the projected climate data
smooth_window = 20, # number of years as smoothing window
co2_hist = NULL, # historical annual CO2 concentration. If NULL, will use default value
co2_proj = NULL, # projected annual CO2 concentration. If NULL, will use default value
diagnostics = TRUE, # set to TRUE to output diagnostic plots
output_dir = 'path/to/output/folder' # path to the output folder
)