Python API#

stitches offers a programmatic API in Python.

Note

For questions or requests for support, please reach out to the development team. Your feedback is much appreciated and will help this API evolve!

Core functionality#

stitches.match_neighborhood#

stitches.match_neighborhood(target_data, archive_data, tol=0, drop_hist_duplicates=True)[source]#

Calculate the Euclidean distance between target and archive data.

This function takes data frames of target and archive data and calculates the Euclidean distance between the target values (fx and dx) and the archive values.

Parameters:
  • target_data – Data frame of the target fx and dx values.

  • archive_data – Data frame of the archive fx and dx values.

  • tol (float) – Tolerance for the neighborhood of matching. Defaults to 0 degC, meaning only the nearest neighbor is returned.

  • drop_hist_duplicates (bool) – Determines whether to consider historical values across SSP scenarios as duplicates (True) and drop all but one from matching, or to consider them as distinct points for matching (False). Defaults to True.

Returns:

Data frame with the target data and the corresponding matched archive data.
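
A minimal usage sketch follows. The CSV file name and the experiment column used to build the target subset are illustrative assumptions, not fixed package details; both frames only need to carry the fx and dx columns described above.

>>> import pandas as pd
>>> import stitches
>>> # Illustrative inputs: hypothetical archive file and an assumed "experiment" column.
>>> archive_data = pd.read_csv("matching_archive.csv")
>>> target_data = archive_data[archive_data["experiment"] == "ssp245"]
>>> # tol=0 keeps only the nearest neighbor; a larger tol also keeps near matches.
>>> matched = stitches.match_neighborhood(target_data, archive_data, tol=0.1)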

stitches.permute_stitching_recipes#

stitches.permute_stitching_recipes(N_matches, matched_data, archive, optional=None, testing=False)[source]#

Sample from matched_data to produce permutations of stitching recipes.

This function samples from matched_data (the results of match_neighborhood(target, archive, tol)) to produce permutations of possible stitching recipes that will match the target data.

Parameters:
  • N_matches (int) – The maximum number of matches per target data.

  • matched_data – Data output from match_neighborhood.

  • archive – The archive data to use for re-matching duplicate points.

  • optional – A previous output of this function containing a list of already created recipes, so that they are not re-created (not yet implemented).

  • testing (bool) – When True, the behavior can be reliably replicated without setting global seeds. Defaults to False.

Returns:

A data frame with the same structure as the raw matched data, with duplicate matches replaced.
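
A short sketch, assuming matched and archive_data come from the match_neighborhood example above:

>>> import stitches
>>> # `matched` is the output of match_neighborhood; `archive_data` is the same
>>> # archive frame used for that matching step.
>>> recipes = stitches.permute_stitching_recipes(
...     N_matches=4, matched_data=matched, archive=archive_data, testing=True
... )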

stitches.generate_gridded_recipe#

stitches.generate_gridded_recipe(messy_recipe, res='mon')[source]#

Create a recipe for the stitching process using a messy recipe.

Parameters:
  • messy_recipe – A data frame generated by the permute_stitching_recipes function.

  • res (str) – The resolution of the recipe, either ‘mon’ for monthly or ‘day’ for daily.

Returns:

A data frame formatted as a recipe for stitching.
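
For example, to format the permuted recipes from the previous step at monthly resolution:

>>> import stitches
>>> # `recipes` is the data frame returned by permute_stitching_recipes above.
>>> gridded_recipe = stitches.generate_gridded_recipe(recipes, res="mon")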

stitches.make_recipe#

stitches.make_recipe(target_data, archive_data, N_matches, res='mon', tol=0.1, non_tas_variables=None, reproducible=False)[source]#

Generate a stitching recipe from target and archive data.

Parameters:
  • target_data – A pandas DataFrame of climate information to emulate.

  • archive_data – A pandas DataFrame of temperature data to use as the archive to match on.

  • N_matches (int) – The maximum number of matches per target data.

  • res (str) – Resolution of the stitched data, either ‘mon’ or ‘day’.

  • tol (float) – Tolerance used in the matching process, default is 0.1.

  • non_tas_variables (list[str]) – List of variables other than tas to stitch together; defaults to None, which stitches tas only.

  • reproducible (bool) – If True, ensures reproducible behavior by using the testing=True argument in permute_stitching_recipes(); defaults to False.

Returns:

A pandas DataFrame of a formatted recipe.
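
A sketch of a typical call, assuming target_data and archive_data are pandas DataFrames prepared as described above; the non_tas_variables list is illustrative.

>>> import stitches
>>> recipe = stitches.make_recipe(
...     target_data=target_data,
...     archive_data=archive_data,
...     N_matches=2,
...     res="mon",
...     tol=0.1,
...     non_tas_variables=["pr"],  # illustrative; leave as None to stitch tas only
...     reproducible=True,
... )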

stitches.gridded_stitching#

stitches.gridded_stitching(out_dir, rp)[source]#

Stitch the gridded NetCDF files for the variables contained in the recipe and save them.

Parameters:
  • out_dir (str) – Directory location where to write the NetCDF files.

  • rp – DataFrame of the recipe including variables to stitch.

Returns:

List of the NetCDF file paths.
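
For example, assuming recipe is a gridded recipe data frame such as the one produced by make_recipe:

>>> import stitches
>>> # Writes one NetCDF per stitched variable/realization to out_dir and
>>> # returns the list of file paths.
>>> nc_paths = stitches.gridded_stitching(out_dir=".", rp=recipe)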

stitches.gmat_stitching#

stitches.gmat_stitching(rp)[source]#

Stitch together a time series of global tas data based on a recipe data frame.

Parameters:

rp – A fully formatted recipe data frame as a pandas DataFrame.

Returns:

A pandas DataFrame of stitched together tas data.
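
For example, reusing the recipe from above:

>>> import stitches
>>> # Returns a pandas DataFrame of stitched global tas; no files are written.
>>> gsat = stitches.gmat_stitching(recipe)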

stitches.fetch_pangeo_table#

stitches.fetch_pangeo_table()[source]#

Fetch the Pangeo CMIP6 archive table of contents as a pandas DataFrame.

Retrieve a copy of the Pangeo CMIP6 archive contents, which includes information about the available models, sources, experiments, ensembles, and more.

Returns:

A pandas DataFrame with details on the datasets available for download from Pangeo.
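
A sketch of a typical lookup; the column names used in the filter follow the Pangeo CMIP6 catalog and should be checked against the returned table.

>>> import stitches
>>> pangeo_table = stitches.fetch_pangeo_table()
>>> # Illustrative filter for monthly near-surface air temperature entries.
>>> tas_monthly = pangeo_table[
...     (pangeo_table["variable_id"] == "tas") & (pangeo_table["table_id"] == "Amon")
... ]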

stitches.fetch_nc#

stitches.fetch_nc(zstore)[source]#

Extract data for a single file from Pangeo.

Parameters:

zstore (str) – The location of the CMIP6 data file on Pangeo.

Returns:

An xarray Dataset containing CMIP6 data downloaded from Pangeo.
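
For example, with a placeholder zstore address; a real address normally comes from the zstore column of the table returned by fetch_pangeo_table().

>>> import stitches
>>> # The address below is a placeholder, not a real store path.
>>> ds = stitches.fetch_nc("gs://cmip6/CMIP6/ScenarioMIP/.../tas/...")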

stitches.make_tas_archive#

stitches.make_tas_archive(anomaly_startYr=1995, anomaly_endYr=2014)[source]#

Create the archive from Pangeo-hosted CMIP6 data.

This function processes CMIP6 data hosted on Pangeo to create an archive of temperature anomaly files. It calculates anomalies based on a specified reference period.

Parameters:
  • anomaly_startYr (int) – Start year of the reference period for anomaly calculation.

  • anomaly_endYr (int) – End year of the reference period for anomaly calculation.

Returns:

List of paths to the created tas files.

Return type:

list
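
For example, with the default reference period; note that this is a long-running operation that downloads and processes a large volume of CMIP6 data.

>>> import stitches
>>> tas_paths = stitches.make_tas_archive(anomaly_startYr=1995, anomaly_endYr=2014)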

stitches.make_matching_archive#

stitches.make_matching_archive(smoothing_window=9, chunk_window=9, add_staggered=False)[source]#

Create an archive of rate of change (dx) and mean (fx) values.

This function processes the CMIP6 archive to produce values used in the matching portion of the stitching pipeline.

Parameters:
  • smoothing_window (int) – The size of the smoothing window to be applied to the time series. Defaults to 9.

  • chunk_window (int) – The size of the chunks of data to summarize with dx & fx. Defaults to 9.

  • add_staggered (bool) – If True, staggered windows will be added to the archive. Defaults to False.

Returns:

The file location of the matching archive.
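
For example, using the default window sizes:

>>> import stitches
>>> # Summarize the tas archive into fx/dx chunks with 9-year smoothing and
>>> # chunk windows; returns the file location of the matching archive.
>>> archive_path = stitches.make_matching_archive(smoothing_window=9, chunk_window=9)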

stitches.make_pangeo_table#

stitches.make_pangeo_table()[source]#

Create a copy of the Pangeo files that have corresponding entries in the matching archive.

This function is used in the stitching process to ensure that only relevant Pangeo files are considered. It writes out a file to the package data directory.

Returns:

None

stitches.make_pangeo_comparison#

stitches.make_pangeo_comparison()[source]#

Create a copy of the entire Pangeo archive for testing.

This function is used to check for updates in the Pangeo archive. If an update is detected, it may suggest updating the internal package data.

Returns:

None. Writes a file to package data.

stitches.install_package_data#

stitches.install_package_data(data_dir=None)[source]#

Download and unpack Zenodo-minted stitches package data.

This function downloads the data release matching the currently installed stitches distribution and unpacks it into the specified directory or the package's default data directory.

Parameters:

data_dir (str) – Optional. Full path to the directory to store the data. Default is the data directory of the package.

Returns:

None
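
For example:

>>> import stitches
>>> # Install into the package's default data directory...
>>> stitches.install_package_data()
>>> # ...or into a custom location (illustrative path).
>>> stitches.install_package_data(data_dir="/path/to/stitches_data")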