Introduction

The gcamfaostat tool is designed to streamline the processing and synthesis of raw data sourced from FAOSTAT. The initial phase of this process involves data procurement, with a critical awareness of FAOSTAT’s ongoing data updates. These updates encompass a spectrum of improvements, including, e.g., the addition of historical data for new countries and items, data completion for previously missing entries, and refinements such as changes in country nomenclature. This dynamic data landscape underscores the need for a robust and adaptable approach in the gcamfaostat workflow. Here we describe a few key functions in gcamdatafaostat created to procure the raw data and facilitate the processing.

When gcamfaostat is downloaded, preprocessed FAOSTAT data, i.e., output of the xfaostat_L101_RawDataPreProc* modules, are stored in the Prebuilt Data of the package.

  • The package can be run with using those data, but make sure Process_Raw_FAO_Data <- FALSE in constants.R.
  • The prebuilt data were generated using FAOSTAT data archived in Zenodo address specified in the FF_download_RemoteArchive function (Zhao 2023).

1. Get the latest FAOSTAT Metadata

FAOSTAT_metadata()

  • Accessing API and returning a data frame of metadata

2. Generate the metadata for the gcamfaostat input data

gcamfaostat_metadata()

  • The function will save the latest FAOSTAT metadata to the metadata_log
  • The function will access both the latest FAOSTAT metadata and local data information and returns a summary table including the dataset information needed for gcamfaostat (see Table 1 below).
  • The dataset code needed were specified in the function to get a subset of the FAOSTAT metadata. The function will return only dataset code required when setting OnlyReturnDatasetCodeRequired = FALSE.
  • The function will check whether FAOSTAT raw data exists locally (Exist_Local) and in Prebuilt Data (Exist_Prebuilt). If Exist_Prebuilt is TRUE for all dataset, the package is ready to be built based on the Prebuilt package data.
  • FAO update data and FAO size indicate the information based on the latest FAOSTAT metadata.
Table 1. FAOSTAT dataset processed in gcamfaostat
Dataset Code Dataset Name Exist_Local Exist_Prebuilt FAO update date FAO size
CBH Food Balances: Commodity Balances (non-food) (-2013, old methodology) TRUE TRUE 2021-12-03 8MB
PD Prices: Deflators TRUE TRUE 2024-06-12 1MB
FBS Food Balances: Food Balances (2010-) TRUE TRUE 2024-07-19 50MB
FBSH Food Balances: Food Balances (-2013, old methodology and population) TRUE TRUE 2023-03-10 69MB
FO Forestry: Forestry Production and Trade TRUE TRUE 2023-12-21 15MB
RFN Land, Inputs and Sustainability: Fertilizers by Nutrient TRUE TRUE 2024-07-17 2MB
RL Land, Inputs and Sustainability: Land Use TRUE TRUE 2024-08-19 3MB
CS Macro-Economic Indicators: Capital Stock TRUE TRUE 2023-12-22 1MB
OA Population and Employment: Annual population TRUE TRUE 2022-11-10 2MB
PP Prices: Producer Prices TRUE TRUE 2024-01-09 10MB
QCL Production: Crops and livestock products TRUE TRUE 2024-10-07 32MB
SCL Food Balances: Supply Utilization Accounts (2010-) TRUE TRUE 2024-07-19 80MB
TCL Trade: Crops and livestock products TRUE TRUE 2023-12-21 244MB
TM Trade: Detailed trade matrix TRUE TRUE 2023-12-21 446MB

Note that if Exist_Prebuilt is TRUE for a dataset, it suggests the raw data was saved in the Prebuilt data. And if Exist_Prebuilt is TRUE for all dataset, the package is ready to be built based on the Prebuilt data.

3. Download FAOSTAT raw data

FF_download_RemoteArchive()

  • The function downloads the FAOSTAT raw data needed for the package from a remote archive.
  • The default Zenodo archive currently included in the function includes a snapshot of FAOSTAT data to ensure replicability.
  • The archived data is consistent with the Prebuilt package data.

FF_download_FAOSTAT() * The function downloads the latest raw data from FAOSTAT.

Two functions above are created for downloading the raw data from a remote archive or the FAOSTAT API (latest data). The dataset code variable in the two functions, if including all, can be generated using gcamfaostat_metadata(OnlyReturnDatasetCodeRequired = T).

Example:

# Dataset PP, producer prices, is downloaded from RemoteArchiveURL to DATA_FOLDER
# RemoteArchiveURL = "https://zenodo.org/record/13941470/files/"
# DATA_FOLDER = "inst/extdata/FAOSTAT"
FF_download_RemoteArchive( 
  DATASETCODE = "PP",   
  OverWrite = TRUE     # overwrite existing PP dataset 
  )

# Dataset OA, population, is downloaded from FAOSTAT to DATA_FOLDER
FF_download_FAOSTAT(DATASETCODE = "OA", OverWrite = TRUE)

# Note that single DATASETCODE is allowed in both functions. 

4. Check local raw data info

FF_rawdata_info()

  • Providing more detailed metadata information, similar to gcamfaostat_metadata (which calls FF_rawdata_info).
  • Indicate whether an update from FAOSTAT is potentially needed.
  • Can be used to download data if not exist.

Example:

# Provide detailed metadata of "PP" and "OA" in DATA_FOLDER ("inst/extdata/FAOSTAT")
FF_rawdata_info(DATASETCODE = c("PP", "OA"), DOWNLOAD_NONEXIST = FALSE)

# If "PP" or "OA" does not exist, download from remote archive
FF_rawdata_info(DATASETCODE = c("PP", "OA"), DOWNLOAD_NONEXIST = TRUE, FAOSTAT_or_Archive = "Archive")

5. Load raw data into package

FAOSTAT_load_raw_data

  • Loading FAOSTAT raw data, e.g., used in xfaostat_L191_RawDataPreProc* modules.
  • Note that . in column name is substituted with _.

Example:

# Read raw data of "PP" and "OA" from DATA_FOLDER ("inst/extdata/FAOSTAT") to .GlobalEnv
FAOSTAT_load_raw_data(DATASETCODE = c("PP", "OA"), .Envir = .GlobalEnv)

6. Update the Prebulit data

If users updated the FAOSTAT raw data, and run driver_drake is run with Process_Raw_FAO_Data <- TRUE in constants.R, the updated preprecessed data are stored in drake cache. And they can be used to update Prebulit data by sourcing data-raw/generate_package_data.R.

References

Zhao, Xin. 2023. “FAOSTAT AgLU Data Archive GCAMv7.” Zenodo. https://doi.org/10.5281/zenodo.8260225.