Introduction

Global economic and multisector dynamic models have become pivotal tools for investigating complex interactions between human activities and the environment, as evident in recent research (Doelman et al. 2022; Fujimori et al. 2022; IPCC 2022; Ven et al. 2023). Agriculture and land use (AgLU) plays a critical role in these models, particularly when used to address key agroeconomic questions (Graham et al. 2023; Yarlagadda et al. 2023; Zhang et al. 2023; Zhao, Calvin, Wise, Patel, et al. 2021; Zhao, Calvin, and Wise 2020). Sound economic modeling hinges significantly upon the accessibility and quality of data (Bruckner et al. 2019; Calvin et al. 2022; Chepeliev 2022). The Food and Agriculture Organization Statistics (FAOSTAT) (FAO 2023) serves as one of the key global data sources, offering open-access data on country-level agricultural production, land use, trade, food consumption, nutrient content, prices, and more. However, the raw data from FAOSTAT requires cleaning, balancing, and synthesis, involving assumptions such as interpolation and mapping, which can introduce uncertainties. In addition, some of the core datasets reported by FAOSTAT, such as FAO’s Food Balance Sheets (FBS), are compiled at a specific level of aggregation, combining primary and processed commodities (e.g., wheat and flour), which creates additional data processing challenges for the agroeconomic modeling community (Chepeliev 2022). It is noteworthy that each agroeconomic modeling team typically develops its own assumptions and methods to prepare and process FAOSTAT data (Bond-Lamberty et al. 2019). While largely overlooked, the uncertainty in the base data calibration approach likely contributes to the disparities in model outcomes (Lampe et al. 2014; Zhao, Calvin, Wise, and Iyer 2021). Hence, our motivation is to create an open-source tool (gcamfaostat) for the preparation, processing, and synthesis of FAOSTAT data for global agroeconomic modeling.
The tool can also be valuable to a broader range of users interested in understanding global agriculture trends and dynamics, as it provides accessible and processed data and visualization functions.


The gaps gcamfaostat fills

Figure 1 shows a standard framework for using FAOSTAT data in GCAM. GCAM is a widely recognized global economic and multisector dynamic model complemented by the gcamdata R package, which serves as its data processing system. In particular, gcamdata includes modules (data processing chunks) and functions to convert raw data inputs into hundreds of XML input files used by GCAM (Bond-Lamberty et al. 2019). As an illustration, in the latest GCAM version, GCAM v7 (Bond-Lamberty et al. 2023), about 280 XML files, with a combined size of 4.1 GB, are generated. Although AgLU-related XMLs represent only about 10% of the total number of files, they contribute over 50% in size (~2.1 GB). The majority of AgLU-related data relies, directly or indirectly, on raw data sourced from FAOSTAT.

Nonetheless, the FAOSTAT data employed within gcamdata has traditionally involved manual downloads and may have undergone preprocessing. In light of the increasing data needs, maintaining the FAOSTAT data processing tasks in gcamdata has become increasingly challenging. In addition, the processing of FAOSTAT data in the AgLU modules of gcamdata is tailored specifically for GCAM. Consequently, the integration of FAOSTAT data updates has proven to be a non-trivial task, and the data processed by the AgLU module has limited applicability in other modeling contexts (Zhao and Wise 2023). The gcamfaostat package aims to address these limitations (Figure 2). The targeted approach incorporates data preparation, processing, and synthesis capabilities within a dedicated package, gcamfaostat, while regional and sectoral aggregation functions in the model data system are implemented using standalone routines within the gcamdata package. This strategy not only ensures the streamlined operation of gcamfaostat but also contributes to keeping the model data system lightweight and more straightforward to maintain.


Figure 1. The original framework of utilizing FAOSTAT data in GCAM and similar large-scale models. Note that FAOSTAT data is mainly processed in the AgLU modules in gcamdata while there could be interdependency across data processing modules.


Figure 2. The new framework of utilizing FAOSTAT data in GCAM and similar large-scale models through gcamfaostat. Modules with identifier xfaostat only exist in gcamfaostat. The AgLU-related modules (aglu) that rely on outputs from gcamfaostat can run in both packages. Other gcamdata modules that process data in such areas as energy, emissions, water, and socioeconomics only exist in gcamdata.


Installing gcamfaostat

R

  • R version 4.0 or higher and RStudio are recommended.

Clone this repository (size < 1 GB)

  • On the command line: navigate to your desired folder location and then enter
git clone https://github.com/JGCRI/gcamfaostat.git
  • Or use devtools to download and install in an R environment (into .libPaths())
install.packages("devtools") # install the devtools package if not already installed
devtools::install_github("jgcri/gcamfaostat")
  • Note that gcamfaostat does not depend on GCAM, so it can be installed in any folder the user chooses.

Loading the gcamfaostat package

  • Open the gcamfaostat folder you just cloned and double-click the gcamfaostat.Rproj file. RStudio should open the project.
  • Then to load the gcamfaostat package:
devtools::load_all()

Package dependencies

Users can consider using renv.

  • renv has been initialized and a renv.lock file was included.
  • Using renv will save a private R library with the correct versions of any package dependencies.
  • More details are available in the gcamdata manual.
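
The renv workflow described above can be sketched as follows. This is a minimal sketch, assuming the working directory is the cloned gcamfaostat folder that contains renv.lock; the helper name restore_dependencies is hypothetical and is not part of gcamfaostat.

```r
# Hypothetical helper (not part of gcamfaostat): restore the package versions
# recorded in renv.lock into a private, project-specific library.
restore_dependencies <- function(project_dir = ".") {
  lockfile <- file.path(project_dir, "renv.lock")

  # Do nothing if the project does not ship a lockfile.
  if (!file.exists(lockfile)) {
    message("No renv.lock found in ", normalizePath(project_dir, mustWork = FALSE))
    return(invisible(FALSE))
  }

  # Install renv itself if it is missing.
  if (!requireNamespace("renv", quietly = TRUE)) {
    install.packages("renv")
  }

  # Install the recorded package versions without an interactive prompt.
  renv::restore(project = project_dir, prompt = FALSE)
  invisible(TRUE)
}
```

Calling restore_dependencies() from the project root would then install the recorded dependencies; the function returns FALSE without changing anything when no lockfile is present.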

Run the driver

driver_drake and driver

Users should now be ready to run the driver, which is the main processing function that generates intermediate data outputs and final outputs (CSV or other files) for GCAM (gcamdata) or other models. Driver functions run data processing modules/functions sequentially (see Processing Flow). There are two ways to run the driver, both inherited from gcamdata.

  1. driver_drake()

driver_drake() runs the driver and stores the outputs in a hidden cache. When you run driver_drake() again, it will skip steps that are up to date. This is useful if you will be adjusting the data inputs and code and running the data system multiple times. For this reason, we almost always recommend using driver_drake(). More details can be found in the gcamdata documentation.

  2. driver()

See the gcamdata documentation for more options when running the driver, such as what outputs to generate or when to stop.

For gcamfaostat, we recommend driver_drake() because it makes data tracing and exploration possible after the run, even though it may take longer and require additional disk space (for the drake cache) compared to driver(). The driver() approach has a default option (write_outputs = TRUE) that writes the intermediate CSV outputs to ./outputs.
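
The two ways of running the driver can be sketched as a small wrapper. This is a minimal sketch under stated assumptions: it assumes driver_drake() and driver() are accessible through the gcamfaostat namespace (installed, or loaded with devtools::load_all()), and the wrapper name run_gcamfaostat_driver is hypothetical. Only the write_outputs argument mentioned above is used.

```r
# Hypothetical wrapper (not part of gcamfaostat) that picks one of the two
# driver functions described in the text.
run_gcamfaostat_driver <- function(use_drake = TRUE) {
  # Skip gracefully when the package is not available in this R session.
  if (!requireNamespace("gcamfaostat", quietly = TRUE)) {
    message("gcamfaostat is not available; nothing was run.")
    return(invisible(NULL))
  }

  if (use_drake) {
    # Cached run: steps that are already up to date are skipped on re-runs.
    gcamfaostat::driver_drake()
  } else {
    # Plain run: intermediate CSV outputs are written to ./outputs.
    gcamfaostat::driver(write_outputs = TRUE)
  }
}
```

Calling run_gcamfaostat_driver() uses the recommended cached route, while run_gcamfaostat_driver(use_drake = FALSE) falls back to driver().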

Output files

  • In constants.R, users can set OUTPUT_Export_CSV to TRUE and specify the output directory (DIR_OUTPUT_CSV) to export and store the output CSV files.
  • The default directory is outputs/gcamfaostat_csv_output.
  • Users can also make use of the functions to trace the processing by step when driver_drake() is employed.
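
Once the driver has finished with CSV export enabled, the exported files can be inspected from R. The following is a minimal sketch, assuming the default output directory named above and a working directory at the project root; the helper name list_output_csv is hypothetical and is not part of gcamfaostat.

```r
# Hypothetical helper (not part of gcamfaostat): list the CSV files found in
# the export directory, or return an empty vector if it does not exist yet.
list_output_csv <- function(csv_dir = file.path("outputs", "gcamfaostat_csv_output")) {
  if (!dir.exists(csv_dir)) {
    return(character(0))
  }
  list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)
}

csv_files <- list_output_csv()
message(length(csv_files), " CSV file(s) found in the default output directory")
```

If DIR_OUTPUT_CSV was changed in constants.R, the same path can be passed as csv_dir.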

References

Bond-Lamberty, Ben, Kalyn Dorheim, Ryna Cui, Russell Horowitz, Abigail Snyder, Katherine Calvin, Leyang Feng, et al. 2019. “gcamdata: An R Package for Preparation, Synthesis, and Tracking of Input Data for the GCAM Integrated Human-Earth Systems Model.” Journal of Open Research Software 7 (1).
Bond-Lamberty, Ben, Pralit Patel, Joshua Lurz, Page Kyle, Kate Calvin, Steve Smith, Abigail Snyder, et al. 2023. “JGCRI/Gcam-Core: GCAM 7.0.” Zenodo. https://doi.org/10.5281/zenodo.8010145.
Bruckner, Martin, Richard Wood, Daniel Moran, Nikolas Kuschnig, Hanspeter Wieland, Victor Maus, and Jan Börner. 2019. “FABIO—the Construction of the Food and Agriculture Biomass Input–Output Model.” Environmental Science & Technology 53 (19): 11302–12. https://doi.org/10.1021/acs.est.9b03554.
Calvin, Katherine V., Abigail Snyder, Xin Zhao, and Marshall Wise. 2022. “Modeling Land Use and Land Cover Change: Using a Hindcast to Estimate Economic Parameters in Gcamland V2.0.” Geoscientific Model Development 15 (2): 429–47. https://doi.org/10.5194/gmd-15-429-2022.
Chepeliev, Maksym. 2022. “Incorporating Nutritional Accounts to the GTAP Data Base.” Journal of Global Economic Analysis 7 (1): 1–43. https://doi.org/10.21642/JGEA.070101AF.
Doelman, Jonathan C., Felicitas D. Beier, Elke Stehfest, Benjamin L. Bodirsky, Arthur H. W. Beusen, Florian Humpenöder, Abhijeet Mishra, et al. 2022. “Quantifying Synergies and Trade-Offs in the Global Water-Land-Food-Climate Nexus Using a Multi-Model Scenario Approach.” Environmental Research Letters 17 (4): 045004. https://doi.org/10.1088/1748-9326/ac5766.
FAO. 2023. “FAOSTAT Database.” https://www.fao.org/faostat/en/#data.
Fujimori, Shinichiro, Wenchao Wu, Jonathan Doelman, Stefan Frank, Jordan Hristov, Page Kyle, Ronald Sands, et al. 2022. “Land-Based Climate Change Mitigation Measures Can Affect Agricultural Markets and Food Security.” Nature Food 3 (2): 110–21. https://doi.org/10.1038/s43016-022-00464-4.
Graham, Neal T., Gokul Iyer, Thomas B. Wild, Flannery Dolan, Jonathan Lamontagne, and Katherine Calvin. 2023. “Agricultural Market Integration Preserves Future Global Water Resources.” One Earth 6 (9): 1235–45. https://doi.org/10.1016/j.oneear.2023.08.003.
IPCC. 2022. “Annex III: Scenarios and Modelling Methods.” In Climate Change 2022: Mitigation of Climate Change. Cambridge, UK; New York, NY, USA: Cambridge University Press. https://doi.org/10.1017/9781009157926.022.
Lampe, Martin von, Dirk Willenbockel, Helal Ahammad, Elodie Blanc, Yongxia Cai, Katherine Calvin, Shinichiro Fujimori, et al. 2014. “Why Do Global Long-Term Scenarios for Agriculture Differ? An Overview of the AgMIP Global Economic Model Intercomparison.” Agricultural Economics 45 (1): 3–20. https://doi.org/10.1111/agec.12086.
Ven, Dirk-Jan van de, Shivika Mittal, Ajay Gambhir, Robin D. Lamboll, Haris Doukas, Sara Giarola, Adam Hawkes, et al. 2023. “A Multimodel Analysis of Post-Glasgow Climate Targets and Feasibility Challenges.” Nature Climate Change 13 (6): 570–78. https://doi.org/10.1038/s41558-023-01661-0.
Yarlagadda, Brinda, Thomas Wild, Xin Zhao, Leon Clarke, Ryna Cui, Zarrar Khan, Abigail Birnbaum, and Jonathan Lamontagne. 2023. “Trade and Climate Mitigation Interactions Create Agro-Economic Opportunities with Social and Environmental Trade-Offs in Latin America and the Caribbean.” Earth’s Future 11 (4): e2022EF003063. https://doi.org/10.1029/2022EF003063.
Zhang, Ying, Stephanie Waldhoff, Marshall Wise, Jae Edmonds, and Pralit Patel. 2023. “Agriculture, Bioenergy, and Water Implications of Constrained Cereal Trade and Climate Change Impacts.” PLOS ONE 18 (9): e0291577. https://doi.org/10.1371/journal.pone.0291577.
Zhao, Xin, Katherine V. Calvin, Marshall A. Wise, Pralit L. Patel, Abigail C. Snyder, Stephanie T. Waldhoff, Mohamad I. Hejazi, and James A. Edmonds. 2021. “Global Agricultural Responses to Interannual Climate and Biophysical Variability.” Environmental Research Letters 16 (10): 104037. https://doi.org/10.1088/1748-9326/ac2965.
Zhao, Xin, Katherine V Calvin, and Marshall A Wise. 2020. “The Critical Role of Conversion Cost and Comparative Advantage in Modeling Agricultural Land Use Change.” Climate Change Economics 11 (01): 2050004. https://doi.org/10.1142/s2010007820500049.
Zhao, Xin, Katherine V Calvin, Marshall A Wise, and Gokul Iyer. 2021. “The Role of Global Agricultural Market Integration in Multiregional Economic Modeling: Using Hindcast Experiments to Validate an Armington Model.” Economic Analysis and Policy 72: 1–17. https://doi.org/10.1016/j.eap.2021.07.007.
Zhao, Xin, and Marshall Wise. 2023. “Core Model Proposal# 360: GCAM Agriculture and Land Use (AgLU) Data and Method Updates: Connecting Land Hectares to Food Calories,” no. PNNL-34313. https://jgcri.github.io/gcam-doc/cmp/360_AgLU_data_and_methods.pdf.