Quickstarter#
In this example, we’ll use the test precipitation data supplied in the basd repo to see how to generate bias-adjusted and statistically downscaled output.
Loading Packages#
Start by importing basd, xarray, dask, and some other utility packages.
import os
import pkg_resources
import basd
from dask.distributed import Client, LocalCluster
import xarray as xr
Reading Data#
Then define the paths to our input data. For input into basd you need three datasets:
A reference dataset. This is an observational dataset over a past period (we’ll call it the reference period), used as a reference for the dataset we want to adjust/downscale.
A simulated dataset over the reference period. This could be a CMIP6 dataset, for example.
The same or an associated simulated dataset over some other period (the application period).
Locate the test precipitation data and define the paths:
input_dir = pkg_resources.resource_filename('basd', 'data')
output_dir = pkg_resources.resource_filename('basd', 'data/output')
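If the output directory does not already exist on your machine, you can create it first. This is just a standard-library call, not part of basd itself:
os.makedirs(output_dir, exist_ok=True)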
and read in the data using xarray:
pr_obs_hist = xr.open_mfdataset(os.path.join(input_dir, 'pr_obs-hist_fine_1979-2014.nc'), chunks={'time': 100})
pr_sim_hist = xr.open_mfdataset(os.path.join(input_dir, 'pr_sim-hist_coarse_1979-2014.nc'), chunks={'time': 100})
pr_sim_fut = xr.open_mfdataset(os.path.join(input_dir, 'pr_sim-fut_coarse_2065-2100.nc'), chunks={'time': 100})
If you’re familiar with xarray you may notice that this use of xr.open_mfdataset is slightly different from the default. Here we lazily load the data using dask, with dask.array as the backend data structure. This lets us defer reading the data and performing computations until we’re ready, which is essential when working with large datasets. We’ve also “chunked” the data along the time dimension into blocks of 100 time steps. This breaks the data into smaller pieces, which again keeps us from loading a large amount of data at once, and also lets the processes that we have coming up run in parallel.
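You can confirm the lazy, chunked representation before moving on. A minimal check (assuming the data variable is named pr, as in the test files) is:
# The data are dask arrays; nothing has been read into memory yet
print(pr_obs_hist['pr'].data)
# Chunk sizes along each dimension (time chunks of length 100)
print(pr_obs_hist['pr'].chunks)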
Using Dask Distributed#
As mentioned, basd makes use of the dask package, including its distributed computing capabilities. Users are free to adapt the workflow to whatever works best for them and their machine. However, because basd has some GIL-locked dependencies, make sure to use multiprocessing rather than multithreading. A basic set-up looks like this:
cluster = LocalCluster(processes=True, threads_per_worker=1)
client = Client(cluster)
This creates a number of worker processes, based on the core count of your machine, that dask can distribute tasks to.
We also suggest using the with statement so that the cluster and client are closed cleanly at the end of your task, or if any errors occur; a sketch of that pattern is shown below.
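A minimal sketch of that context-manager pattern, where the body of the with block stands in for the bias adjustment steps that follow, looks like:
with LocalCluster(processes=True, threads_per_worker=1) as cluster, Client(cluster) as client:
    # run the bias adjustment / downscaling steps here;
    # the cluster and client are shut down automatically on exit
    ...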
Initializing Bias Adjustment#
We initialize our bias adjustment process by feeding basd our input data and other parameters. It then does some pre-processing to make sure the data and parameters we supplied are valid inputs.
First, let’s set our parameters by creating a basd.Parameters object:
params = basd.Parameters(
lower_bound=0, lower_threshold=0.0000011574,
trend_preservation='mixed',
distribution='gamma',
if_all_invalid_use=0, n_iterations=20
)
The exact settings of your parameters object will differ for different climate variables. The choices made here will be discussed elsewhere, and we will provide default options for different variables based on the literature, though you’re welcome to change values to suit your needs.
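As a purely hypothetical illustration of how the settings change with the variable, a parameters object for a variable that is not bounded below (near-surface temperature, say) might drop the bound and threshold arguments and use a different trend preservation method and distribution. Treat the values below as placeholders rather than recommended defaults:
# Hypothetical example only -- consult the basd documentation and literature
# for the recommended settings for each variable
tas_params = basd.Parameters(
    trend_preservation='additive',
    distribution='normal',
    n_iterations=20
)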
Then we can pass our parameters and input data to our initialization function:
ba = basd.init_bias_adjustment(
pr_obs_hist, pr_sim_hist, pr_sim_fut,
'pr', params, 1, 1
)
Running Bias Adjustment#
With the initialization object in hand, we run the bias adjustment, which writes daily and monthly output files to our output directory. The file names and encoding below are illustrative; choose whatever suits your workflow:
# Example output file names and netCDF encoding (adjust as needed)
ba_day_output_file = 'pr_ba_day.nc'
ba_mon_output_file = 'pr_ba_mon.nc'
coarse_encoding = {'dtype': 'float32'}

basd.adjust_bias(
    init_output=ba, output_dir=output_dir,
    day_file=ba_day_output_file, month_file=ba_mon_output_file,
    clear_temp=True, encoding={'pr': coarse_encoding}
)