Citation

Khan, Z., Thompson, I., Vernon, C.R. et al. Global monthly sectoral water use for 2010–2100 at 0.5° resolution across alternative futures. Sci Data 10, 201 (2023). https://doi.org/10.1038/s41597-023-02086-2

Models Used

Model	Version	Description	Language	DOI
Tethys	v1.3.1	Spatiotemporal downscaling model for global water use	Python	https://doi.org/10.5281/zenodo.6399488

Data Inputs

Data	Version	Location
GCAM Outputs	v4.3.chen http://doi.org/10.5281/zenodo.3713432	https://doi.org/10.7910/DVN/DYV29J
Demeter Outputs	v1.chen http://doi.org/10.5281/zenodo.3713378	https://data.pnnl.gov/dataset/13192

Data Outputs

Dataset	Version	Location
Tethys Outputs	v1.3.0	https://doi.org/10.7910/DVN/VIQEAB

Workflow Overview

Workflow Summary

Workflow 1 - Scenarios

This data set contains water withdrawal projections under a range of socioeconomic and climate scenarios. Combinations of five Shared Socioeconomic Pathways (SSPs 1-5) and four Representative Concentration Pathways (RCPs 2.6, 4.5, 6, and 8.5) are used. However, RCP 2.6 is not considered possible for SSP 3, and RCP 8.5 is only considered possible for SSP 5. This leaves 15 valid SSP-RCP combinations, shown in the diagram below.

Workflow 1

For each valid SSP-RCP combination, five General Circulation Models (GCMs) are used. This gives a grand total of 75 SSP-RCP-GCM combinations, which are fed into GCAM.
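The scenario count can be checked with a short enumeration. This is only an illustrative sketch: the exclusion rules are the two stated above, the GCM labels are the ones used in the results section, and the variable and function names are hypothetical.

```python
from itertools import product

SSPS = [1, 2, 3, 4, 5]
RCPS = [2.6, 4.5, 6.0, 8.5]
GCMS = ["gfdl", "hadgem", "ipsl", "miroc", "noresm"]

def is_valid(ssp, rcp):
    """Apply the two exclusions described in the text."""
    if rcp == 2.6 and ssp == 3:
        return False  # RCP 2.6 is not considered possible for SSP 3
    if rcp == 8.5 and ssp != 5:
        return False  # RCP 8.5 is only considered possible for SSP 5
    return True

# Valid SSP-RCP pairs, then every pair combined with every GCM
ssp_rcp = [(s, r) for s, r in product(SSPS, RCPS) if is_valid(s, r)]
scenarios = [(s, r, g) for (s, r), g in product(ssp_rcp, GCMS)]

print(len(ssp_rcp), len(scenarios))  # 15 75
```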

Workflow 2 - Model Runs

GCAM is run for each SSP-RCP-GCM combination. The GCAM outputs are resolved to 32 geopolitical regions and 235 water basins, with 5-year timesteps. The sectoral water withdrawal outputs from GCAM are the primary inputs to Tethys, which downscales them to the target resolution. Crop outputs from GCAM are sent to Demeter for downscaling, and the downscaled crop areas are then used by Tethys for the irrigation sector.

Workflow 2

Workflow 3 - Spatial Downscaling

Spatial Workflow

Water withdrawals and consumption data for each sector are downscaled spatially first, from GCAM’s 32 regions (or in the case of irrigation, 434 region-basin intersections) to a 0.5 degree gridded format. Of the 259,200 possible grid cells at this resolution (360 x 720), only the 67,420 cells categorized as land are considered. Different sectors use different algorithms to allocate a region’s water withdrawals and consumption among its constituent grid cells, but in principle they are very similar: use gridded values of some proxy quantity to approximate the spatial distribution (i.e., divide each cell’s value by the total of all cells belonging to that region), then apply that distribution to GCAM’s value. These algorithms were derived from research by Edmonds and Reilly, 1985.

For spatial downscaling, the workflow is divided into three parts:

A. Nonagricultural Sectors
B. Livestock
C. Irrigation

These are explained in detail in their respective subsections. Keep in mind that downscaling occurs not just for all regions in a single slice of time, but for every 5 years from 2010 to 2100, across 75 alternative futures (SSP-RCP-GCM combinations).

Workflow 3

Workflow 3A - Nonagricultural Sectors

For nonagricultural sectors (domestic, electricity, manufacturing, and mining), water withdrawals and consumption in each grid cell are assumed to be proportional to that cell’s population. The population data set used for this paper is from Gridded Population of the World (CIESIN, 2016). Tethys uses the population data from the nearest available year, which for this paper means 2010 data for 2010 and 2015 data for all other years. Each region’s population is the sum of population over all cells belonging to that region. For each of these sectors, Tethys calculates the withdrawal for a given cell by

\[\text{withdrawal}_\text{cell} = \text{withdrawal}_\text{region} \times \frac{\text{population}_\text{cell}}{\text{population}_\text{region}}.\]

An analogous formula is used for consumption:

\[\text{consumption}_\text{cell} = \text{consumption}_\text{region} \times \frac{\text{population}_\text{cell}}{\text{population}_\text{region}}.\]
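The proportional allocation can be sketched with NumPy as follows. This is a minimal illustration, not the Tethys source code: the function name, the array layout, and the handling of regions whose proxy total is zero are all assumptions made for the example.

```python
import numpy as np

def downscale_by_proxy(region_values, cell_region, cell_proxy):
    """Split each region's value among its cells in proportion to a proxy.

    region_values: shape (n_regions,), e.g. regional withdrawal
    cell_region:   shape (n_cells,), integer region index of each cell
    cell_proxy:    shape (n_cells,), nonnegative proxy (e.g. population)
    """
    region_values = np.asarray(region_values, dtype=float)
    cell_region = np.asarray(cell_region, dtype=int)
    cell_proxy = np.asarray(cell_proxy, dtype=float)

    # Regional proxy total = sum of the proxy over the region's cells
    region_proxy = np.bincount(cell_region, weights=cell_proxy,
                               minlength=len(region_values))
    denom = region_proxy[cell_region]

    # Assumption for this sketch: cells in a region with zero proxy get zero
    share = np.divide(cell_proxy, denom,
                      out=np.zeros_like(cell_proxy), where=denom > 0)
    return region_values[cell_region] * share

# Made-up example: two regions, five cells
withdrawal_cell = downscale_by_proxy(
    region_values=[100.0, 50.0],
    cell_region=[0, 0, 0, 1, 1],
    cell_proxy=[10.0, 30.0, 60.0, 5.0, 15.0],
)
```

Because each region's shares sum to one, summing the cell values by region recovers the regional input, which is the property checked in the validation section.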

Name	File	Reference	Source
Gridded Population of the World, Version 4	GPW_population.csv	CIESIN, 2016	http://dx.doi.org/10.7927/H4X63JVC

Workflow 3B - Livestock

Gridded global maps (Wint and Robinson, 2007) for six types of livestock (cattle, buffalo, sheep, goats, pigs, and poultry) are used as a proxy to downscale livestock water withdrawal and consumption. GCAM outputs are organized into five types (beef, dairy, pork, poultry, and “sheepgoat”), so these are first reorganized to match the six maps using ratios for each region, estimated from FAO gridded livestock of the world. These are stored in the files bfracFAO2005.csv (“buffalo fraction”) and gfracFAO2005.csv (“goat fraction”). The following formulas are used for each region:

\[\begin{align} \text{buffalo} &= (\text{beef} + \text{dairy}) \times \text{bfrac}\\ \text{cattle} &= (\text{beef} + \text{dairy}) \times (1-\text{bfrac})\\ \text{goat} &= \text{sheepgoat} \times \text{gfrac}\\ \text{sheep} &= \text{sheepgoat} \times (1-\text{gfrac}) \end{align}\]

No adjustment is required for pork (pigs) or poultry. After this, downscaling for each livestock type is very similar to downscaling the nonagricultural sectors, except the respective livestock population (heads) is used as the proxy instead of human population.

\[\text{withdrawal}_\text{animal, cell} = \text{withdrawal}_\text{animal, region} \times \frac{\text{heads}_\text{animal, cell}}{\text{heads}_\text{animal, region}}.\]

The results for each of the six types are then added together to get the total livestock withdrawal for each cell:

\[\begin{align} \text{withdrawal}_\text{livestock, cell} = &\phantom{{}+{}}\text{withdrawal}_\text{cattle, cell}\\ &+ \text{withdrawal}_\text{buffalo, cell}\\ &+ \text{withdrawal}_\text{sheep, cell}\\ &+ \text{withdrawal}_\text{goat, cell}\\ &+ \text{withdrawal}_\text{pig, cell}\\ &+ \text{withdrawal}_\text{poultry, cell} \end{align}\]

Again, analogous formulas follow for consumption.
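As a rough sketch of the livestock steps for a single region, the reorganization into six animal types and the per-type downscaling could look like the following. The function names, the dictionary layout, and all numbers are hypothetical, and skipping animal types with no heads in the region is an assumption of this example.

```python
import numpy as np

def split_livestock(gcam, bfrac, gfrac):
    """Reorganize the five GCAM livestock types into the six mapped types."""
    bovine = gcam["beef"] + gcam["dairy"]
    return {
        "buffalo": bovine * bfrac,
        "cattle": bovine * (1 - bfrac),
        "goat": gcam["sheepgoat"] * gfrac,
        "sheep": gcam["sheepgoat"] * (1 - gfrac),
        "pig": gcam["pork"],        # no adjustment needed
        "poultry": gcam["poultry"], # no adjustment needed
    }

def downscale_livestock(by_animal, heads):
    """Downscale each animal type by head count, then sum over types.

    by_animal: regional withdrawal per animal type
    heads:     per-cell head counts per animal type (equal-length arrays)
    """
    n_cells = len(next(iter(heads.values())))
    total = np.zeros(n_cells)
    for animal, value in by_animal.items():
        h = np.asarray(heads[animal], dtype=float)
        if h.sum() > 0:  # assumption: skip types absent from the region
            total += value * h / h.sum()
    return total

# Made-up numbers for one region with three cells
gcam = {"beef": 6.0, "dairy": 4.0, "pork": 3.0, "poultry": 2.0, "sheepgoat": 5.0}
by_animal = split_livestock(gcam, bfrac=0.2, gfrac=0.4)
heads = {animal: [1.0, 2.0, 1.0] for animal in by_animal}
livestock_cell = downscale_livestock(by_animal, heads)
```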

Name	File	Reference	Source
Gridded Livestock of the World	livestock_<animal>.csv	Wint and Robinson, 2007	http://www.fao.org/docrep/010/a1259e/a1259e00.htm
Buffalo Fraction	bfracFAO2005.csv	“estimated from FAO gridded livestock of the world”
Goat Fraction	gfracFAO2005.csv	“estimated from FAO gridded livestock of the world”

Workflow 3C - Irrigation

GCAM irrigation withdrawal and consumption outputs are organized by 13 crop types: Biomass, Corn, Fiber Crop, Misc Crop, Oil Crop, Other Grain, Palm Fruit, Rice, Root Tuber, Sugar Crop, Wheat, Fodder Herb, and Fodder Grass. Demeter provides a spatial landcover breakdown by all crop types except biomass, which is downscaled uniformly within a region-basin intersection (with respect to land area).

\[\text{withdrawal}_\text{biomass, cell} = \text{withdrawal}_\text{biomass, region} \times \frac{\text{area}_\text{cell}}{\text{area}_\text{region}}.\]

When possible, the other 12 crops are downscaled in proportion to the crop land area maps from Demeter, which have been reaggregated to the target resolution of 0.5 degrees. There are certain exceptions. If the GCAM withdrawal or consumption value for a crop in some region-basin is nonzero, but Demeter does not show any cells with that crop type in that region-basin, it will be downscaled uniformly, the same as biomass.

Additionally, it is possible for GCAM and Demeter to have different total crop irrigation areas for a region-basin intersection, so applying the raw Demeter ratios to irrigation withdrawals or consumption (which are directly related to irrigation areas) could result in cell withdrawal values that imply a larger irrigation area than the total cell area. To avoid this nonphysical situation, excess irrigation area in cells that are above capacity is assigned evenly among irrigated cells with capacity remaining, if there are any; otherwise it is assigned evenly among the remaining cells in the region-basin. Should there still be excess after those cells have been filled, it is dropped.

Using these adjusted irrigation area values for each crop, cell withdrawal values are given by

\[\text{withdrawal}_\text{crop, cell} = \text{withdrawal}_\text{crop, region} \times \frac{\text{area}_\text{crop, cell}}{\text{area}_\text{crop, region}},\]

and the analogous formula is used for consumption. The total irrigation sector value for a cell is the sum of that cell’s values for all 13 crops.
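The crop allocation and its uniform fallback can be sketched for one crop in one region-basin intersection as below. This is a simplified illustration with hypothetical names: it takes the crop areas as already adjusted and does not attempt the excess-area redistribution described above.

```python
import numpy as np

def downscale_crop(region_value, crop_area, cell_area):
    """Allocate one crop's regional value to the cells of a region-basin.

    crop_area: per-cell (adjusted) irrigated area of this crop
    cell_area: per-cell land area, used when no crop area is mapped
    """
    crop_area = np.asarray(crop_area, dtype=float)
    cell_area = np.asarray(cell_area, dtype=float)

    # Fall back to land area (uniform downscaling, as for biomass)
    # when the crop map shows no cells with this crop
    weights = crop_area if crop_area.sum() > 0 else cell_area
    return region_value * weights / weights.sum()

# Made-up numbers: three cells in one region-basin intersection
mapped = downscale_crop(12.0, crop_area=[0.0, 3.0, 1.0], cell_area=[2.0, 2.0, 4.0])
unmapped = downscale_crop(12.0, crop_area=[0.0, 0.0, 0.0], cell_area=[2.0, 2.0, 4.0])
```

In this sketch, biomass would simply be passed an all-zero crop area so that it always takes the land-area branch.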

Note that 6 of the 434 region-basin intersections are so small that they are not represented in the 0.5 degree mapping, causing any of their irrigation withdrawals or consumption to be dropped, though these values are correspondingly small.

Workflow 4 - Temporal Downscaling

At this stage GCAM values have been downscaled spatially to 0.5 degree grids, but are still in 5 year timesteps. First, linear interpolation is applied to produce annual data. Then, downscaling algorithms are applied to each sector to produce monthly data with seasonal variation.
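The interpolation step can be illustrated with `numpy.interp`. This is a sketch under the assumption that values are stored as one row per grid cell and one column per 5-year timestep; the names are hypothetical.

```python
import numpy as np

years_5 = np.arange(2010, 2101, 5)    # 19 five-year timesteps
years_annual = np.arange(2010, 2101)  # 91 annual timesteps

def interpolate_annual(values_5yr):
    """Linearly interpolate (n_cells, 19) values to (n_cells, 91)."""
    values_5yr = np.atleast_2d(np.asarray(values_5yr, dtype=float))
    return np.vstack([np.interp(years_annual, years_5, row)
                      for row in values_5yr])

# Made-up example: one cell whose value grows by 5 units per timestep
annual = interpolate_annual(years_5 - 2010.0)
```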

For temporal downscaling, the workflow is divided into four parts:

A. Domestic
B. Electricity Generation
C. Livestock, Manufacturing, and Mining
D. Irrigation

Workflow 4

Temporal Workflow

Workflow 4A - Domestic

Temporally downscaling domestic withdrawal uses a formula from Wada et al., 2011 (http://dx.doi.org/10.1029/2010WR009792), with data from Huang et al., 2018 (https://doi.org/10.5194/hess-22-2117-2018). Withdrawals for each month of a year for some cell are given by the formula

\[\text{withdrawal}_\text{month} = \frac{\text{withdrawal}_\text{year}}{12} \times \left(\frac{\text{temperature}_\text{month} - \text{temperature}_\text{mean}}{\text{temperature}_\text{max} - \text{temperature}_\text{min}}R + 1\right).\]

Here \(\text{temperature}_\text{month}\) is the average temperature that month, and \(\text{temperature}_\text{mean}\), \(\text{temperature}_\text{max}\), and \(\text{temperature}_\text{min}\) are the mean, maximum, and minimum among monthly average temperatures for that year. \(R\) is a parameter representing the relative difference of water withdrawals between the warmest and coolest months of the year. Note that the monthly values given by this formula add back up to the total for the year, because the deviations from the mean temperature sum to zero (at least conceptually, since floating point arithmetic doesn’t work out perfectly).
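A minimal sketch of the domestic formula for one cell and one year is given below. The function name and the example temperatures are hypothetical, and the uniform fallback for a year with no temperature variation is an assumption added only to avoid division by zero.

```python
import numpy as np

def domestic_monthly(withdrawal_year, temps, R):
    """Monthly domestic withdrawal from 12 monthly mean temperatures."""
    temps = np.asarray(temps, dtype=float)
    t_range = temps.max() - temps.min()
    if t_range == 0:
        # Assumption for this sketch: no seasonal signal, split evenly
        return np.full(12, withdrawal_year / 12)
    factor = (temps - temps.mean()) / t_range * R + 1
    return withdrawal_year / 12 * factor

# Made-up temperatures (degrees Celsius) for a northern-hemisphere cell
temps = [2, 4, 8, 13, 18, 22, 25, 24, 20, 14, 8, 3]
monthly = domestic_monthly(120.0, temps, R=0.5)

# Monthly values add back up to the annual total (up to rounding)
assert np.isclose(monthly.sum(), 120.0)
```

The gap between the warmest and coolest months equals \(R\) times the average monthly withdrawal, which matches the stated meaning of \(R\).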

Name	File	Reference	Source
Domestic R	DomesticR.csv	Huang et al., 2018	https://doi.org/10.5194/hess-22-2117-2018
WFDEI	watch_wfdei_monthly_1971_2010.npz	Weedon et al., 2014	https://doi.org/10.1002/2014WR015638

Workflow 4B - Electricity Generation

Water withdrawal and consumption for electricity generation each month are assumed to be proportional to the amount of electricity generated, using the formula developed in Voisin et al., 2013 (https://doi.org/10.5194/hess-17-4555-2013).

\[\text{withdrawal}_\text{month} = \text{withdrawal}_\text{year} \times \left[p_\text{b}\times\left(p_\text{h}\frac{\text{HDD}_\text{month}}{\text{HDD}_\text{year}} + p_\text{c}\frac{\text{CDD}_\text{month}}{\text{CDD}_\text{year}}+p_\text{u}\frac{1}{12}\right) + p_\text{it}\frac{1}{12}\right].\]

Here \(p_\text{b}\) and \(p_\text{it}\) are the proportions of electricity used for buildings and industry/transportation respectively, with \(p_\text{b} + p_\text{it} = 1\). The electricity use for industry and transportation is assumed to be uniform throughout the year, while building electricity is further broken down by heating (\(p_\text{h}\)), cooling (\(p_\text{c}\)), and other (\(p_\text{u}\)), with \(p_\text{h} + p_\text{c} + p_\text{u} = 1\).

Heating degree days (HDD) and cooling degree days (CDD) are indicators for the amount of electricity used to heat and cool buildings, and are calculated from mean daily outdoor air temperature. HDD for a month is the sum of \((18-\text{temperature}_\text{day})\) across all days where temperature is less than 18 degrees Celsius. CDD is the sum of \((\text{temperature}_\text{day} - 18)\) across all days where temperature is greater than 18. Annual HDD and CDD are the sum of their respective monthly values.
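The degree-day definitions translate directly into a few lines of NumPy. The function name and the sample temperatures are illustrative only.

```python
import numpy as np

BASE_TEMP_C = 18.0  # threshold from the definitions above

def degree_days(daily_temps_c):
    """Return (HDD, CDD) for a sequence of mean daily temperatures."""
    t = np.asarray(daily_temps_c, dtype=float)
    hdd = np.clip(BASE_TEMP_C - t, 0, None).sum()  # only days below 18
    cdd = np.clip(t - BASE_TEMP_C, 0, None).sum()  # only days above 18
    return hdd, cdd

# Three made-up days: one cold, one at the threshold, one warm
hdd, cdd = degree_days([10.0, 18.0, 25.0])
print(hdd, cdd)  # 8.0 7.0
```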

Tethys uses HDD, CDD, and \(p\) values for each cell from the nearest available year in the input files listed at the end of this subsection, which is 2010 for this data set.

The formula is modified for cells with low annual HDD or CDD as described in Huang et al., 2018, since these may not have heating or cooling services despite nonzero values of \(p_\text{h}\) or \(p_\text{c}\). When \(\text{HDD}_\text{year} < 650\), the HDD term is removed and \(p_\text{h}\) is reallocated to the cooling proportion, giving

\[\text{withdrawal}_\text{month} = \text{withdrawal}_\text{year} \times \left[p_\text{b}\times\left((p_\text{h} + p_\text{c})\frac{\text{CDD}_\text{month}}{\text{CDD}_\text{year}}+p_\text{u}\frac{1}{12}\right) + p_\text{it}\frac{1}{12}\right].\]

Likewise, when \(\text{CDD}_\text{year} < 450\), the formula becomes

\[\text{withdrawal}_\text{month} = \text{withdrawal}_\text{year} \times \left[p_\text{b}\times\left((p_\text{h} + p_\text{c})\frac{\text{HDD}_\text{month}}{\text{HDD}_\text{year}}+p_\text{u}\frac{1}{12}\right) + p_\text{it}\frac{1}{12}\right].\]

When annual HDD and CDD are both below their respective thresholds, all sources of monthly variation vanish and the formula reduces to

\[\text{withdrawal}_\text{month} = \frac{\text{withdrawal}_\text{year}}{12}.\]

Analogous formulas follow for consumption.
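The four cases of the electricity formula can be gathered into one function, sketched here for a single cell and year. The function and argument names are hypothetical; the thresholds and the branch logic follow the formulas above.

```python
import numpy as np

HDD_THRESHOLD = 650
CDD_THRESHOLD = 450

def electricity_monthly(value_year, hdd, cdd, p_b, p_it, p_h, p_c, p_u):
    """Monthly values from annual value, 12 monthly HDD/CDD, and proportions."""
    hdd = np.asarray(hdd, dtype=float)
    cdd = np.asarray(cdd, dtype=float)
    hdd_year, cdd_year = hdd.sum(), cdd.sum()
    low_hdd = hdd_year < HDD_THRESHOLD
    low_cdd = cdd_year < CDD_THRESHOLD
    uniform = np.full(12, 1 / 12)

    if low_hdd and low_cdd:
        return value_year * uniform  # no monthly variation left
    if low_hdd:
        # heating share reallocated to cooling
        building = (p_h + p_c) * cdd / cdd_year + p_u * uniform
    elif low_cdd:
        # cooling share reallocated to heating
        building = (p_h + p_c) * hdd / hdd_year + p_u * uniform
    else:
        building = (p_h * hdd / hdd_year + p_c * cdd / cdd_year
                    + p_u * uniform)
    return value_year * (p_b * building + p_it * uniform)

# Made-up inputs: annual HDD = 1530 and CDD = 690, both above threshold
hdd = [300, 250, 200, 100, 20, 0, 0, 0, 30, 120, 220, 290]
cdd = [0, 0, 0, 10, 60, 150, 200, 180, 80, 10, 0, 0]
p = dict(p_b=0.6, p_it=0.4, p_h=0.3, p_c=0.2, p_u=0.5)
elec = electricity_monthly(240.0, hdd, cdd, **p)
```

Since \(p_\text{b} + p_\text{it} = 1\) and \(p_\text{h} + p_\text{c} + p_\text{u} = 1\), every branch returns monthly values that sum to the annual input.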

Name	File	Reference	Source
Building Proportion	ElecBuilding_1971_2010.csv	IEA historical data, Huang et al., 2018	https://doi.org/10.5194/hess-22-2117-2018
Industry/Transportation Proportion	ElecIndustry_1971_2010.csv	IEA historical data, Huang et al., 2018	https://doi.org/10.5194/hess-22-2117-2018
Heating Proportion	ElecBuildingHeat_1971_2010.csv	IEA historical data, Huang et al., 2018	https://doi.org/10.5194/hess-22-2117-2018
Cooling Proportion	ElecBuildingCool_1971_2010.csv	IEA historical data, Huang et al., 2018	https://doi.org/10.5194/hess-22-2117-2018
Other Proportion	ElecBuildingOthers_1971_2010.csv	IEA historical data, Huang et al., 2018	https://doi.org/10.5194/hess-22-2117-2018
WFDEI	watch_wfdei_monthly_1971_2010.npz	Weedon et al., 2014	https://doi.org/10.1002/2014WR015638

Workflow 4C - Livestock, Manufacturing, and Mining

For livestock, manufacturing, and mining, a uniform distribution is applied. The withdrawal or consumption for the year is divided between months according to the number of days.

\[\text{withdrawal}_\text{month} = \text{withdrawal}_\text{year} \times \frac{\text{days}_\text{month}}{\text{days}_\text{year}}.\] \[\text{consumption}_\text{month} = \text{consumption}_\text{year} \times \frac{\text{days}_\text{month}}{\text{days}_\text{year}}.\]
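A short sketch of the day-weighted split using the standard library follows. The function name is hypothetical, and whether Tethys accounts for leap years in this way is an assumption of the example.

```python
import calendar

def uniform_monthly(value_year, year):
    """Split an annual value among months by number of days."""
    days = [calendar.monthrange(year, m)[1] for m in range(1, 13)]
    days_year = sum(days)  # 365 or 366
    return [value_year * d / days_year for d in days]

monthly_2010 = uniform_monthly(365.0, 2010)  # one unit per day
monthly_2012 = uniform_monthly(366.0, 2012)  # leap year
```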

Workflow 4D - Irrigation

Temporal downscaling for irrigation water withdrawal and consumption is based on weighted irrigation profiles for each of the 235 basins. Monthly irrigation withdrawal values from the PCR-GLOBWB global hydrological model are averaged across the years 1971-2010, then aggregated to the basin scale. The monthly irrigation withdrawal percentages for a basin are applied to all crops in each of its cells:

\[\text{withdrawal}_\text{month} = \text{withdrawal}_\text{year} \times \text{percent}_\text{basin, month}.\]

In the event that the model has no monthly data for a basin with nonzero irrigation, the profile of the nearest available basin is used.
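The basin profile lookup with its nearest-basin fallback might be sketched as follows. The array layouts are assumptions for illustration (the actual structure of dist.csv is not described here), profiles are expressed as fractions summing to one, and all names and numbers are hypothetical.

```python
import numpy as np

def fill_missing_profiles(profiles, distances):
    """Give basins without data the profile of the nearest basin with data.

    profiles:  (n_basins, 12) monthly fractions; all-zero row = no data
    distances: (n_basins, n_basins) basin-to-basin distances
    """
    profiles = np.asarray(profiles, dtype=float)
    distances = np.asarray(distances, dtype=float)
    has_data = profiles.sum(axis=1) > 0
    filled = profiles.copy()
    for b in np.flatnonzero(~has_data):
        # Ignore basins that have no data themselves
        d = np.where(has_data, distances[b], np.inf)
        filled[b] = profiles[np.argmin(d)]
    return filled

def irrigation_monthly(value_year, basin, profiles):
    """Apply a basin's monthly fractions to a cell's annual value."""
    return value_year * profiles[basin]

# Made-up example: basin 1 has no data and is closest to basin 2
seasonal = [0, 0, 0, 0, 0.1, 0.2, 0.3, 0.2, 0.1, 0.1, 0, 0]
profiles = [[1 / 12] * 12, [0.0] * 12, seasonal]
distances = [[0, 5, 9], [5, 0, 2], [9, 2, 0]]
filled = fill_missing_profiles(profiles, distances)
irr = irrigation_monthly(50.0, basin=1, profiles=filled)
```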

Name	File	Reference	Source
PCR‐GLOBWB	pcrglobwb_wfdei_varsoc_pirrww_global_monthly_1971_2010.nc	Van Beek et al., 2011; Wada et al., 2011	original data files were obtained from ISI‐MIP (Warszawski et al., 2014), then processed into gridded monthly percentage values
Basin Distances	dist.csv		calculated from basin locations

Workflow 5 - Results

Annual

Total

Withdrawal

Total Annual Water Withdrawal by SSP-RCP-GCM

Consumption

Total Annual Water Consumption by SSP-RCP-GCM

By Sector

Withdrawal

Global Annual Water Withdrawal by SSP-RCP-GCM and Sector

Consumption

Global Annual Water Consumption by SSP-RCP-GCM and Sector

Crops

Withdrawal

Global Annual Water Withdrawal by SSP-RCP-GCM and Crop

Consumption

Global Annual Water Consumption by SSP-RCP-GCM and Crop

GCMs

GFDL

Global Annual Water Withdrawal by SSP-RCP-Sector, GCM: gfdl

HADGEM

Global Annual Water Withdrawal by SSP-RCP-Sector, GCM: hadgem

IPSL

Global Annual Water Withdrawal by SSP-RCP-Sector, GCM: ipsl

MIROC

Global Annual Water Withdrawal by SSP-RCP-Sector, GCM: miroc

NORESM

Global Annual Water Withdrawal by SSP-RCP-Sector, GCM: noresm

Monthly

All Sectors

Monthly Example

No Irrigation

Without Irrigation

Crops

Without Irrigation

Validation

It is important to confirm that the water withdrawal and consumption totals are preserved at all stages of the downscaling process. Floating point calculations lead to some loss of precision, but the downscaling algorithms themselves should not meaningfully change the totals.

While this is a necessary check, it does not necessarily say anything about the “correctness” of our downscaled results. Validation for the downscaling algorithms used by Tethys can be found in their respective sources. Additionally, we compare the spatial and temporal distributions of our results with those of similar data sets.

Spatial

For each region (or region-basin intersection), year, sector (or crop), and scenario, we plot the withdrawal and consumption value from GCAM compared to the value reaggregated from Tethys annual outputs.

Six of the 434 region-basin intersections are smaller than the output resolution and are dropped, causing a correspondingly small loss of water in the irrigation sector.
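The reaggregation check can be sketched as below: gridded values are summed back to regions and compared with the regional inputs. The function names, array layout, and example numbers are hypothetical, and the relative-difference measure is just one reasonable choice.

```python
import numpy as np

def reaggregate(cell_values, cell_region, n_regions):
    """Sum gridded values back to the regional level."""
    return np.bincount(np.asarray(cell_region, dtype=int),
                       weights=np.asarray(cell_values, dtype=float),
                       minlength=n_regions)

def max_relative_difference(gcam, reaggregated):
    """Largest relative gap between regional inputs and reaggregated grids."""
    gcam = np.asarray(gcam, dtype=float)
    reaggregated = np.asarray(reaggregated, dtype=float)
    # Use absolute difference where the input is zero
    scale = np.where(gcam != 0, np.abs(gcam), 1.0)
    return np.max(np.abs(reaggregated - gcam) / scale)

# Made-up example: two regions, five cells
gcam_values = [100.0, 50.0]
cell_region = [0, 0, 0, 1, 1]
cell_values = [10.0, 30.0, 60.0, 12.5, 37.5]
reagg = reaggregate(cell_values, cell_region, n_regions=2)
gap = max_relative_difference(gcam_values, reagg)
```

A gap near machine precision indicates that totals were preserved; a larger gap in the irrigation sector would be expected only for the dropped region-basin intersections.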

Withdrawals

Sectors

GCAM inputs vs grids Reaggregated

Crops

GCAM inputs vs grids Reaggregated

Consumption

Sectors

GCAM inputs vs grids Reaggregated

Crops

GCAM inputs vs grids Reaggregated

Temporal

For each of the same categories as above, we compare the GCAM value with the value reaggregated from Tethys monthly outputs.

Withdrawals

Sectors

Annual vs Reaggregated Monthly

Crops

Annual vs Reaggregated Monthly

Consumption

Sectors

Annual vs Reaggregated Monthly

Crops

Annual vs Reaggregated Monthly

Similar Data Sets

There are inherent challenges with validating water demand projections across alternative futures. Since this work is primarily concerned with the downscaling of existing projections to a gridded monthly scale, we look at how spatial and temporal patterns in our year 2010 outputs (for which all scenarios are identical) compare to those of similar data sets.

The first data set we look at is from Huang et al. (2018), which used an earlier version of Tethys on historical data from 1971-2010. The underlying data have more regions and different totals, but many of the downscaling methods are identical, leading to similar results.

The second data set is from Mekonnen and Hoekstra (2011). It contains monthly total blue water consumption values representing an average of years 1996-2005, which is close enough to 2010 to compare with our data set, though the difference in time period probably accounts for some of the discrepancies. The sectoral breakdown is also different, but the data are at the same spatial resolution, so we can compare monthly totals for each grid cell.

Directly comparing cell values between data sets shows the differences, but gives a somewhat limited picture of the overall distributions. To get a sense for the spatial patterns we map the 2010 (or 1996-2005) withdrawal and consumption totals from each set.