vignettes/vignette_preparing_data.Rmd
The `gcamfaostat` tool is designed to streamline the processing and synthesis of raw data sourced from FAOSTAT. The first phase of this process is data procurement, which must account for FAOSTAT’s ongoing data updates. These updates include, for example, the addition of historical data for new countries and items, the completion of previously missing entries, and refinements such as changes in country nomenclature. This dynamic data landscape calls for a robust and adaptable approach in the `gcamfaostat` workflow. Here we describe a few key functions in `gcamfaostat` created to procure the raw data and facilitate the processing.
When `gcamfaostat` is downloaded, preprocessed FAOSTAT data, i.e., the output of the `xfaostat_L101_RawDataPreProc*` modules, are stored in the Prebuilt Data of the package.
Setting `Process_Raw_FAO_Data <- FALSE` in `constants.R` builds the package from the Prebuilt data. Raw data can be downloaded from a remote archive with the `FF_download_RemoteArchive` function (Zhao 2023).

gcamfaostat input data

The table below corresponds to `gcamfaostat_metadata(OnlyReturnDatasetCodeRequired = FALSE)`. It shows whether each dataset exists locally (`Exist_Local`) and in Prebuilt Data (`Exist_Prebuilt`). If `Exist_Prebuilt` is TRUE for all datasets, the package is ready to be built based on the Prebuilt package data. `FAO update date` and `FAO size` give the information based on the latest FAOSTAT metadata.

Dataset Code | Dataset Name | Exist_Local | Exist_Prebuilt | FAO update date | FAO size |
---|---|---|---|---|---|
CBH | Food Balances: Commodity Balances (non-food) (-2013, old methodology) | TRUE | TRUE | 2021-12-03 | 8MB |
PD | Prices: Deflators | TRUE | TRUE | 2024-06-12 | 1MB |
FBS | Food Balances: Food Balances (2010-) | TRUE | TRUE | 2024-07-19 | 50MB |
FBSH | Food Balances: Food Balances (-2013, old methodology and population) | TRUE | TRUE | 2023-03-10 | 69MB |
FO | Forestry: Forestry Production and Trade | TRUE | TRUE | 2023-12-21 | 15MB |
RFN | Land, Inputs and Sustainability: Fertilizers by Nutrient | TRUE | TRUE | 2024-07-17 | 2MB |
RL | Land, Inputs and Sustainability: Land Use | TRUE | TRUE | 2024-08-19 | 3MB |
CS | Macro-Economic Indicators: Capital Stock | TRUE | TRUE | 2023-12-22 | 1MB |
OA | Population and Employment: Annual population | TRUE | TRUE | 2022-11-10 | 2MB |
PP | Prices: Producer Prices | TRUE | TRUE | 2024-01-09 | 10MB |
QCL | Production: Crops and livestock products | TRUE | TRUE | 2024-10-07 | 32MB |
SCL | Food Balances: Supply Utilization Accounts (2010-) | TRUE | TRUE | 2024-07-19 | 80MB |
TCL | Trade: Crops and livestock products | TRUE | TRUE | 2023-12-21 | 244MB |
TM | Trade: Detailed trade matrix | TRUE | TRUE | 2023-12-21 | 446MB |
Note that if `Exist_Prebuilt` is TRUE for a dataset, the raw data for that dataset were saved in the Prebuilt data. If `Exist_Prebuilt` is TRUE for all datasets, the package is ready to be built based on the Prebuilt data.
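The readiness check described above can be sketched in base R. This is a minimal illustration, not the package's own code: the data frame is a hand-copied subset of the metadata table, and the column name `dataset_code` is a hypothetical stand-in for whatever `gcamfaostat_metadata()` actually returns.

```r
# Hand-copied subset of the metadata table above (illustrative only;
# "dataset_code" is an assumed column name, not confirmed by the package)
meta <- data.frame(
  dataset_code   = c("PP", "OA", "QCL"),
  Exist_Local    = c(TRUE, TRUE, TRUE),
  Exist_Prebuilt = c(TRUE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Codes missing from the Prebuilt data (empty when everything is present)
missing_prebuilt <- meta$dataset_code[!meta$Exist_Prebuilt]

# TRUE only when every dataset is available in the Prebuilt data
ready_to_build <- all(meta$Exist_Prebuilt)
```

If `missing_prebuilt` is not empty, those codes are the datasets that would first need to be downloaded and processed.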
FF_download_FAOSTAT()
* The function downloads the latest raw data from FAOSTAT.
`FF_download_RemoteArchive()` and `FF_download_FAOSTAT()` download the raw data from a remote archive and from the FAOSTAT API (latest data), respectively. The complete set of dataset codes for the two functions can be generated using `gcamfaostat_metadata(OnlyReturnDatasetCodeRequired = TRUE)`.
Example:
# Dataset PP, producer prices, is downloaded from RemoteArchiveURL to DATA_FOLDER
# RemoteArchiveURL = "https://zenodo.org/record/13941470/files/"
# DATA_FOLDER = "inst/extdata/FAOSTAT"
FF_download_RemoteArchive(
  DATASETCODE = "PP",
  OverWrite = TRUE  # overwrite existing PP dataset
)
# Dataset OA, population, is downloaded from FAOSTAT to DATA_FOLDER
FF_download_FAOSTAT(DATASETCODE = "OA", OverWrite = TRUE)
# Note that a single DATASETCODE is allowed in both functions.
Dataset metadata are available through `gcamfaostat_metadata` (which calls `FF_rawdata_info`).

Example:
# Provide detailed metadata of "PP" and "OA" in DATA_FOLDER ("inst/extdata/FAOSTAT")
FF_rawdata_info(DATASETCODE = c("PP", "OA"), DOWNLOAD_NONEXIST = FALSE)
# If "PP" or "OA" does not exist, download from remote archive
FF_rawdata_info(DATASETCODE = c("PP", "OA"), DOWNLOAD_NONEXIST = TRUE, FAOSTAT_or_Archive = "Archive")
`FAOSTAT_load_raw_data` is used in the `xfaostat_L101_RawDataPreProc*` modules. Any `.` in a column name is substituted with `_`.

Example:
# Read raw data of "PP" and "OA" from DATA_FOLDER ("inst/extdata/FAOSTAT") to .GlobalEnv
FAOSTAT_load_raw_data(DATASETCODE = c("PP", "OA"), .Envir = .GlobalEnv)
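The column-name substitution mentioned above can be illustrated in base R. This is only a sketch of the renaming rule under assumed inputs: the toy data frame and its column names are hypothetical, and the package's actual implementation may differ.

```r
# Toy data frame with dots in the column names (hypothetical example data)
raw <- data.frame(
  Area.Code = c(4, 8),
  Item.Code = c(15, 15),
  Value     = c(1.2, 3.4)
)

# Replace every literal "." with "_"; fixed = TRUE stops "." from being
# treated as a regular-expression wildcard
names(raw) <- gsub(".", "_", names(raw), fixed = TRUE)

names(raw)
# "Area_Code" "Item_Code" "Value"
```

Without `fixed = TRUE`, the pattern `"."` would match every character and turn each name into a run of underscores.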