xsdba package¶

Statistical correction and bias adjustment tools for xarray.

Submodules¶

xsdba._adjustment module¶

Adjustment Algorithms¶

This file defines the different steps, to be wrapped into the Adjustment objects.

xsdba._adjustment.mbcn_adjust(ref: DataArray, hist: DataArray, sim: DataArray, ds: Dataset, g_idxs: DataArray, gw_idxs: DataArray, pts_dims: tuple[str, str], interp: str, extrapolation: str, base: Callable, base_kws_vars: dict, adj_kws: dict, period_dim: str | None) → Dataset[source]¶

Perform the adjustment portion MBCn multivariate bias correction technique.

The function mbcn_train() pre-computes the adjustment factors for each rotation in the npdf portion of the MBCn algorithm. The rest of adjustment is performed here in mbcn_adjust`.

Parameters:

ref (xr.DataArray) – training target.
hist (xr.DataArray) – training data.
sim (xr.DataArray) – data to adjust (stacked with multivariate dimension).
g_idxs (xr.DataArray) – Indices of the times in each time group.
gw_idxs (xr.DataArray) – Indices of the times in each windowed time group.
ds (xr.Dataset) –

Dataset variables:
rot_matrices : Rotation matrices used in the training step. af_q : Adjustment factors obtained in the training step for the npdf transform
pts_dims ([str, str]) – The name of the “multivariate” dimension and its primed counterpart. Defaults to “multivar”, which is the normal case when using xsdba.stack_variables(), and “multivar_prime”.
interp (str) – Interpolation method for the npdf transform (same as in the training step).
extrapolation (str) – Extrapolation method for the npdf transform (same as in the training step).
base (BaseAdjustment) – Bias-adjustment class used for the univariate bias correction.
base_kws_vars (Dict) – Options for univariate training for the scenario that is reordered with the output of npdf transform. The arguments are those expected by TrainAdjust classes along with - kinds : Dict of correction kinds for each variable (e.g. {“pr”:”*”, “tasmax”:”+”}).
adj_kws (Dict) – Options for univariate adjust for the scenario that is reordered with the output of npdf transform.
period_dim (str, optional) – Name of the period dimension used when stacking time periods of sim using xsdba.stack_periods(). If specified, the interpolation of the npdf transform is performed only once and applied on all periods simultaneously. This should be more performant, but also more memory intensive. Defaults to None: No optimization will be attempted.

Returns:

The adjusted data.

Return type:

xr.Dataset

xsdba._adjustment.mbcn_train(ds: Dataset, rot_matrices: DataArray, pts_dims: Sequence[str], quantiles: ndarray, gw_idxs: DataArray, interp: str, extrapolation: str, n_escore: int) → Dataset[source]¶

Npdf transform training.

Adjusting factors obtained for each rotation in the npdf transform and conserved to be applied in the adjusting step in mcbn_adjust().

Parameters:

ds (xr.Dataset) –

Dataset variables:
ref : training target hist : training data
rot_matrices (xr.DataArray) – The rotation matrices as a 3D array (‘iterations’, <pts_dims[0]>, <pts_dims[1]>), with shape (n_iter, <N>, <N>).
pts_dims (sequence of str) – The name of the “multivariate” dimension and its primed counterpart. Defaults to “multivar”, which is the normal case when using xsdba.stack_variables(), and “multivar_prime”.
quantiles (array-like) – The quantiles to compute.
gw_idxs (xr.DataArray) – Indices of the times in each windowed time group.
interp (str) – The interpolation method to use.
extrapolation (str) – The extrapolation method to use.
n_escore (int) – Number of elements to include in the e_score test (0 for all, < 0 to skip).

Returns:

The dataset containing the adjustment factors and the quantiles over the training data (only the npdf transform of mbcn).

Return type:

xr.Dataset

xsdba._adjustment.npdf_transform(ds: Dataset, **kwargs) → Dataset[source]¶

N-pdf transform : Iterative univariate adjustment in random rotated spaces.

Parameters:

ds (xr.Dataset) –

Dataset variables:
ref : Reference multivariate timeseries. hist : simulated timeseries on the reference period. sim : Simulated timeseries on the projected period. rot_matrices : Random rotation matrices.
**kwargs – pts_dim : multivariate dimension name. base : Adjustment class. base_kws : Kwargs for initialising the adjustment object. adj_kws : Kwargs of the adjust call. n_escore : Number of elements to include in the e_score test (0 for all, < 0 to skip).

Returns:

Dataset variables:: scenh : Scenario in the reference period (source hist transferred to target ref inside training). scens : Scenario in the projected period (source sim transferred to target ref outside training). escores : Index estimating the dissimilarity between scenh and hist.

Return type:

xr.Dataset

Notes

If n_escore is negative, escores will be filled with NaNs.

xsdba._processing module¶

Compute Functions Submodule¶

Here are defined the functions wrapped by map_blocks or map_groups. The user-facing, metadata-handling functions should be defined in processing.py.

xsdba.adjustment module¶

Adjustment Methods¶

class xsdba.adjustment.BaseAdjustment(*args, _trained=False, **kwargs)[source]¶

Bases: ParametrizableWithDataset

Base class for adjustment objects.

Children classes should implement the train and / or the adjust method.

This base class defined the basic input and output checks. It should only be used for a real adjustment if neither TrainAdjust nor Adjust fit the algorithm.

class xsdba.adjustment.DetrendedQuantileMapping(*args, _trained=False, **kwargs)[source]¶

Bases: TrainAdjust

Detrended Quantile Mapping bias-adjustment.

Train step

nquantiles¶

The number of quantiles to use. See equally_spaced_nodes(). An array of quantiles [0, 1] can also be passed. Defaults to 20 quantiles.

Type:: int or 1d array of floats

kind¶

The adjustment kind, either additive or multiplicative. Defaults to “+”.

Type:: {‘+’, ‘*’}

group¶

The grouping information. See xsdba.base.Grouper for details. Default is “time”, meaning a single adjustment group along dimension “time”.

Type:: str or Grouper

adapt_freq_thresh¶

Threshold for frequency adaptation. See xsdba.processing.adapt_freq for details. Default is None, meaning that frequency adaptation is not performed.

Type:: str, optional

max_tail_factor¶

If not None, values to adjust (after preprossing steps) that are above max_tail_factor * the value of the last quantile of hist (before the preprocessing steps, stored in hist_q_raw) are not adjusted. We keep the input simulation with only the preprocessing steps instead. If None, hist_q_raw output will just be a dummy variable.

Type:: float, optional

Adjust step

interp¶

The interpolation method to use when interpolating the adjustment factors. Defaults to “nearest”.

Type:: {‘nearest’, ‘linear’, ‘cubic’}

detrend¶

The method to use when detrending. If an int is passed, it is understood as a PolyDetrend (polynomial detrending) degree. Defaults to 1 (linear detrending).

Type:: int or BaseDetrend instance

extrapolation¶

The type of extrapolation to use. Defaults to “constant”.

Type:: {‘constant’, ‘nan’}

Notes

The algorithm follows these steps, 1-3 being the ‘train’ and 4-6, the ‘adjust’ steps.

A scaling factor that would make the mean of hist match the mean of ref is computed.
ref and hist are normalized by removing the “dayofyear” mean.
Adjustment factors are computed between the quantiles of the normalized ref and hist.
sim is corrected by the scaling factor, and either normalized by “dayofyear” and detrended group-wise or directly detrended per “dayofyear”, using a linear fit (modifiable).
Values of detrended sim are matched to the corresponding quantiles of normalized hist and corrected accordingly.
The trend is put back on the result.

\[F^{-1}_{ref}\left\{F_{hist}\left[\frac{\overline{hist}\cdot sim}{\overline{sim}}\right]\right\}\frac{\overline{sim}}{\overline{hist}}\]

where \(F\) is the cumulative distribution function (CDF) and \(\overline{xyz}\) is the linear trend of the data. This equation is valid for multiplicative adjustment. Based on the DQM method of [Cannon et al., 2015].

References

Cannon, Sobie, and Murdock [2015]

class xsdba.adjustment.EmpiricalQuantileMapping(*args, _trained=False, **kwargs)[source]¶

Bases: TrainAdjust

Empirical Quantile Mapping bias-adjustment.

Train step

nquantiles¶

The number of quantiles to use. Two endpoints at 1e-6 and 1 - 1e-6 will be added. An array of quantiles [0, 1] can also be passed. Defaults to 20 quantiles.

Type:: int or 1d array of floats

kind¶

The adjustment kind, either additive or multiplicative. Defaults to “+”.

Type:: {‘+’, ‘*’}

group¶

The grouping information. See xsdba.base.Grouper for details. Default is “time”, meaning an single adjustment group along dimension “time”.

Type:: Union[str, Grouper]

adapt_freq_thresh¶

Threshold for frequency adaptation. See xsdba.processing.adapt_freq for details. Default is None, meaning that frequency adaptation is not performed.

Type:: str, optional

max_tail_factor¶

Type:: float, optional

Adjust step

interp¶

The interpolation method to use when interpolating the adjustment factors. Defaults to “nearest”.

Type:: {‘nearest’, ‘linear’, ‘cubic’}

extrapolation¶

The type of extrapolation to use. Defaults to “constant”.

Type:: {‘constant’, ‘nan’}

Notes

Adjustment factors are computed between the quantiles of ref and sim. Values of sim are matched to the corresponding quantiles of hist and corrected accordingly.

\[F^{-1}_{ref} (F_{hist}(sim))\]

where \(F\) is the cumulative distribution function (CDF) and mod stands for model data.

References

Déqué [2007]

class xsdba.adjustment.ExtremeValues(*args, _trained=False, **kwargs)[source]¶

Bases: TrainAdjust

Adjustment correction for extreme values.

This adjustment method should be considered experimental and used with care. See the Warnings and Notes sections for more details.

Parameters:

step (Adjust)
cluster_thresh (Quantified (str with units or DataArray with units)) – The threshold value for defining clusters. For precipitation data, a common choice is “1 mm/day” (in the units of the data).
q_thresh (float) – The quantile threshold for the peak-over-threshold selection of values in clusters x > cluster_thresh. See Notes. Defaults to 0.95.
ref_params (xr.DataArray, optional) – Distribution parameters to use instead of fitting a GenPareto distribution on ref.
step
scen (DataArray) – This is a second-order adjustment, so the adjust method needs the first-order adjusted timeseries in addition to the raw “sim”.
interp ({'nearest', 'linear', 'cubic'}) – The interpolation method to use when interpolating the adjustment factors. Defaults to “linear”.
extrapolation ({'constant', 'nan'}) – The type of extrapolation to use. Defaults to “constant”.
frac (float) – Fraction of the correction space where a transition between scen and the newly corrected extremes happens. The correction space is defined as the range of values between the q_thresh quantile of hist and the maximum value of sim. See Notes, ]0, 1]. Defaults to 0.70.
power (float) – Shape of the transition function. See Notes. Defaults to 3 (cubic transition function).

Warning

This method has been primarily designed for precipitation data and may not be suitable for other variables without careful consideration.
The actual value of thresh is the average of q_thresh in hist and ref. This might produce unexpected results if hist and ref have very different distributions of large values.
Results can be very sensitive to the choice of q_thresh, frac and power parameters. In limited testing made in Southern Quebec using an ensemble of 12 CMIP6 climate models and 2 reference datasets (both reanalyses), the best results were obtained with a relatively low q_thresh (~0.95), combined with a smooth transition (frac ~ 0.6 to 0.7, power ~ 3). However, these values may not be optimal for other regions and datasets.
While this is not currently implemented within the method itself, assumptions that underlie the theoretical framework of extreme value theory should be taken in consideration. In particular, using coherent seasons (e.g. separately correcting winter and summer extremes) has been shown to improve results.
Non-stationarity is not explicitly accounted for in this method, but can be partially addressed by wrapping this method in a moving window approach, using the xsdba.stack_periods function.

Notes

This method is designed to correct the tail of the distribution of sim according to the bias between the tails of ref and hist. This is a second-order adjustment, meaning that it should be applied after a first-order adjustment (e.g. quantile mapping) has already been performed on sim. Based on the “extremes” method of [Roy et al., 2023].

Extreme values are extracted from ref, hist and sim by finding all “clusters”, i.e. runs of consecutive values above cluster_thresh. The q_thresh`th percentile of these values is taken on `ref and hist and its average becomes thresh, the extreme value threshold. The maximal values of each cluster, if they exceed that new threshold, are used to fit Generalized Pareto distributions for both ref and hist. The probabilities associated with each of these extremes in hist is used to find the corresponding value according to ref’s distribution. Adjustment factors are computed as the bias between those new extremes and the original ones, in the probability space.

In the adjust step, a Generalized Pareto distribution is fitted on the cluster-maximums of sim and is used to associate a probability to each extreme. The adjustment factors are applied based on these probabilities. This correction is applied to all values of sim above thresh, so multiple values can be corrected in the same cluster.

Finally, the adjusted values (\(C_i\)) are mixed with the pre-adjusted ones (scen, \(D_i\)) using the following transition function:

\[V_i = C_i * \tau + D_i * (1 - \tau)\]

Where \(\tau\) is a function of sim’s extreme values (unadjusted, \(S_i\)) and of arguments frac (\(f\)) and power (\(p\)):

\[\tau = \left(\frac{1}{f}\frac{S - min(S)}{max(S) - min(S)}\right)^p\]

Code based on an internal Matlab source and partly ib the biascorrect_extremes function of the julia package “ClimateTools.jl” [Roy et al., 2021].

Because of limitations imposed by the lazy computing nature of the dask backend, it is not possible to know the number of cluster extremes in ref and hist at the moment the output data structure is created. This is why the code tries to estimate that number and usually overestimates it. In the training dataset, this translated into a quantile dimension that is too large and variables af and px_hist are assigned NaNs on extra elements. This has no incidence on the calculations themselves but requires more memory than is useful.

References

Roy, Smith, Kelman, Nolet-Gravel, Saba, Thomet, TagBot, and Forget [2021] Roy, Rondeau-Genesse, Jalbert, and Fournier [2023]

class xsdba.adjustment.LOCI(*args, _trained=False, **kwargs)[source]¶

Bases: TrainAdjust

Local Intensity Scaling (LOCI) bias-adjustment.

This bias adjustment method is designed to correct daily precipitation time series by considering wet and dry days separately [Schmidli et al., 2006].

Multiplicative adjustment factors are computed such that the mean of hist matches the mean of ref for values above a threshold.

The threshold on the training target ref is first mapped to hist by finding the quantile in hist having the same exceedance probability as thresh in ref. The adjustment factor is then given by

\[s = \frac{\left \langle ref: ref \geq t_{ref} \right\rangle - t_{ref}}{\left \langle hist : hist \geq t_{hist} \right\rangle - t_{hist}}\]

In the case of precipitations, the adjustment factor is the ratio of wet-days intensity.

For an adjustment factor s, the bias-adjustment of sim is:

\[sim(t) = \max\left(t_{ref} + s \cdot (hist(t) - t_{hist}), 0\right)\]

Train step

group¶

The grouping information. See xsdba.base.Grouper for details. Default is “time”, meaning a single adjustment group along dimension “time”.

Type:: str or Grouper

thresh¶

The threshold in ref above which the values are scaled.

Type:: str

Adjust step

interp¶

The interpolation method to use then interpolating the adjustment factors. Defaults to “linear”.

Type:: {‘nearest’, ‘linear’, ‘cubic’}

References

Schmidli, Frei, and Vidale [2006]

class xsdba.adjustment.MBCn(*args, _trained=False, **kwargs)[source]¶

Bases: TrainAdjust

Multivariate bias correction function using the N-dimensional probability density function transform.

A multivariate bias-adjustment algorithm described by Cannon [2018] based on a color-correction algorithm described by Pitie et al. [2005].

This algorithm in itself, when used with QuantileDeltaMapping, is NOT trend-preserving. The full MBCn algorithm includes a reordering step provided here by xsdba.processing.reordering().

See notes for an explanation of the algorithm.

Train step

base_kws¶

Arguments passed to the training in the npdf transform.

Type:: dict, optional

adj_kws¶

Arguments passed to the adjusting in the npdf transform.

Type:: dict, optional

n_escore¶

The number of elements to send to the escore function. The default, 0, means all elements are included. Pass -1 to skip computing the escore completely. Small numbers result in less significant scores, but the execution time goes up quickly with large values.

Type:: int

n_iter¶

The number of iterations to perform. Defaults to 20.

Type:: int

pts_dim¶

The name of the “multivariate” dimension. Defaults to “multivar”, which is the normal case when using xsdba.stack_variables().

Type:: str

rot_matrices¶

The rotation matrices as a 3D array (‘iterations’, <pts_dim>, <anything>), with shape (n_iter, <N>, <N>). If left empty, random rotation matrices will be automatically generated.

Type:: xr.DataArray, optional

Adjust step

base¶

Bias-adjustment class used for the univariate bias correction.

Type:: BaseAdjustment

period_dim¶

Name of the period dimension used when stacking time periods of sim using xsdba.stack_periods(). If specified, the interpolation of the npdf transform is performed only once and applied on all periods simultaneously. This should be more performant, but also more memory intensive.

Type:: str, optional

Training(only npdf transform training)¶

1. Standardize `ref` and `hist` (see ``xsdba.processing.standardize``.)

2. Rotate the datasets in the N-dimensional variable space with :math:`\mathbf{R}`, a random rotation NxN matrix.

.. math::: tilde{mathbf{T}} = mathbf{T}mathbf{R} tilde{mathbf{H}} = mathbf{H}mathbf{R}

3. QuantileDeltaMapping is used to perform bias adjustment :math:`\mathcal{F}` on the rotated datasets.

The adjustment factor is conserved for later use in the adjusting step. The adjustments are made in additive mode,

for each variable :math:`i`.

.. math::: hat{mathbf{H}}_i, hat{mathbf{S}}_i = mathcal{F}left(tilde{mathbf{T}}_i, tilde{mathbf{H}}_i, tilde{mathbf{S}}_iright)

4. The bias-adjusted datasets are rotated back.

.. math::: mathbf{H}’ = hat{mathbf{H}}mathbf{R} \ mathbf{S}’ = hat{mathbf{S}}mathbf{R}

5. Repeat steps 2,3,4 three steps ``n_iter`` times, i.e. the number of randomly generated rotation matrices.

Adjusting¶

1. Perform the same steps as in training, with `ref, hist` replaced with `sim`. Step 3. of the training is modified, here we

simply reuse the adjustment factors previously found in the training step to bias correct the standardized `sim` directly.

2. Using the original (unstandardized) `ref,hist, sim`, perform a univariate bias adjustment using the ``base_scen`` class

on `sim`.

3. Reorder the dataset found in step 2. according to the ranks of the dataset found in step 1.

The original algorithm :cite:p:`pitie_n-dimensional_2005`, stops the iteration when some distance score converges.

Following cite:t:`cannon_multivariate_2018` and the MBCn implementation in :cite:t:`cannon_mbc_2020`, we

instead fix the number of iterations.

As done by cite:t:`cannon_multivariate_2018`, the distance score chosen is the "Energy distance" from

:cite:t:`szekely_testing_2004`. (see

Type:: xsdba.processing.escore()).

The random matrices are generated following a method laid out by :cite:t:`mezzadri_how_2007`.

Notes

The grouping of time dimensions is passed through base_kws. Three types of grouping are allowed: “time” or xsdba.Grouper(“time”) “time.dayofyear”; `xsdba.Grouper(“time.dayofyear”, window); and `xsdba.Grouper(“5D”, window), where window must be an odd integer that counts the number of 5-day subgroups. The window moves in 5-day strides too. This last option is a specific option to MBCn.
The historical reference (\(T\), for “target”), simulated historical (\(H\)) and simulated projected (\(S\)) datasets are constructed by stacking the timeseries of N variables together using xsdba.stack_variables.

References

Cannon [2018], Cannon [2020], Mezzadri [2007], Pitie, Kokaram, and Dahyot [2005], Szekely and Rizzo [2004]

class xsdba.adjustment.NpdfTransform(*args, _trained=False, **kwargs)[source]¶

Bases: Adjust

N-dimensional probability density function transform.

This adjustment object combines both training and adjust steps in the adjust class method.

A multivariate bias-adjustment algorithm described by Cannon [2018], as part of the MBCn algorithm, based on a color-correction algorithm described by Pitie et al. [2005].

This algorithm in itself, when used with QuantileDeltaMapping, is NOT trend-preserving. The full MBCn algorithm includes a reordering step provided here by xsdba.processing.reordering().

See notes for an explanation of the algorithm.

Parameters:

base (BaseAdjustment) – An univariate bias-adjustment class. This is untested for anything else than QuantileDeltaMapping.
base_kws (dict, optional) – Arguments passed to the training of the univariate adjustment.
n_escore (int) – The number of elements to send to the escore function. The default, 0, means all elements are included. Pass -1 to skip computing the escore completely. Small numbers result in less significant scores, but the execution time goes up quickly with large values.
n_iter (int) – The number of iterations to perform. Defaults to 20.
pts_dim (str) – The name of the “multivariate” dimension. Defaults to “multivar”, which is the normal case when using xsdba.stack_variables().
adj_kws (dict, optional) – Dictionary of arguments to pass to the adjust method of the univariate adjustment.
rot_matrices (xr.DataArray, optional) – The rotation matrices as a 3D array (‘iterations’, <pts_dim>, <anything>), with shape (n_iter, <N>, <N>). If left empty, random rotation matrices will be automatically generated.

Notes

The historical reference (\(T\), for “target”), simulated historical (\(H\)) and simulated projected (\(S\)) datasets are constructed by stacking the timeseries of N variables together. The algorithm is broken into the following steps:

Rotate the datasets in the N-dimensional variable space with \(\mathbf{R}\), a random rotation NxN matrix.

\[\tilde{\mathbf{T}} = \mathbf{T}\mathbf{R} \ \tilde{\mathbf{H}} = \mathbf{H}\mathbf{R} \ \tilde{\mathbf{S}} = \mathbf{S}\mathbf{R}\]

2. A univariate bias-adjustment \(\mathcal{F}\) is used on the rotated datasets. The adjustments are made in additive mode, for each variable \(i\).

\[\hat{\mathbf{H}}_i, \hat{\mathbf{S}}_i = \mathcal{F}\left(\tilde{\mathbf{T}}_i, \tilde{\mathbf{H}}_i, \tilde{\mathbf{S}}_i\right)\]

The bias-adjusted datasets are rotated back.

\[\begin{split}\mathbf{H}' = \hat{\mathbf{H}}\mathbf{R} \\ \mathbf{S}' = \hat{\mathbf{S}}\mathbf{R}\end{split}\]

These three steps are repeated a certain number of times, prescribed by argument n_iter. At each iteration, a new random rotation matrix is generated.

The original algorithm [Pitie et al., 2005], stops the iteration when some distance score converges. Following cite:t:cannon_multivariate_2018 and the MBCn implementation in Cannon [2020], we instead fix the number of iterations.

As done by cite:t:cannon_multivariate_2018, the distance score chosen is the “Energy distance” from Szekely and Rizzo [2004]. (see: xsdba.processing.escore()).

The random matrices are generated following a method laid out by Mezzadri [2007].

This is only part of the full MBCn algorithm, see :<notebooks/example.ipynb> for an example on how to replicate the full method with xsdba. This includes a standardization of the simulated data beforehand, an initial univariate adjustment and the reordering of those adjusted series according to the rank structure of the output of this algorithm.

References

Cannon [2018], Cannon [2020], Mezzadri [2007], Pitie, Kokaram, and Dahyot [2005], Szekely and Rizzo [2004]

class xsdba.adjustment.OTC(*args, _trained=False, **kwargs)[source]¶

Bases: Adjust

Optimal Transport Correction.

Following Robin et al. [2019], this multivariate bias correction method finds the optimal transport mapping between simulated and observed data. The correction of every simulated data point is the observed point it is mapped to.

See notes for an explanation of the algorithm.

Parameters:

bin_width (dict or float, optional) – Bin widths for specified dimensions if is dict. For all dimensions if float. Will be estimated with Freedman-Diaconis rule by default.
bin_origin (dict or float, optional) – Bin origins for specified dimensions if is dict. For all dimensions if float. Default is 0.
num_iter_max (int, optional) – Maximum number of iterations used in the earth mover distance algorithm. Default is 100_000_000.
jitter_inside_bins (bool) – If False, output points are located at the center of their bin. If True, a random location is picked uniformly inside their bin. Default is True.
adapt_freq_thresh (dict or str, optional) – Threshold for frequency adaptation per variable. See xsdba.processing.adapt_freq for details. Frequency adaptation is not applied to missing variables if is dict. Applied to all variables if is string.
normalization ({'standardize', 'max_distance', 'max_value'}, optional) – Per-variable transformation applied before the distances are calculated. Default is “max_distance”. See notes for details.
group (str or Grouper) – The grouping information. See xsdba.base.Grouper for details. Default is “time”, meaning a single adjustment group along dimension “time”.
pts_dim (str) – The name of the “multivariate” dimension. Defaults to “multivar”, which is the normal case when using xsdba.base.stack_variables().

Notes

The simulated and observed data sets \(X\) and \(Y\) are discretized and standardized using histograms whose bin length along dimension v is given by bin_width[v]. An optimal transport plan \(P^*\) is found by solving the linear program

\[\begin{split}\mathop{\arg\!\min}_{P} \langle P,C\rangle \\ s.t. P\mathbf{1} = X \\ P^T\mathbf{1} = Y \\ P \geq 0\end{split}\]

where \(C_{ij}\) is the squared euclidean distance between the bin at position \(i\) of \(X\)’s histogram and the bin at position \(j\) of \(Y\)’s.

All data points belonging to input bin at position \(i\) are then separately assigned to output bin at position \(j\) with probability \(P_{ij}\). A transformation of bin positions can be applied before computing the distances \(C_{ij}\) to make variables on different scales more evenly taken into consideration by the optimization step. Available transformations are

normalization = ‘standardize’ :

\[i_v' = \frac{i_v - mean(i_v)}{std(i_v)} \quad\quad\quad j_v' = \frac{j_v - mean(j_v)}{std(j_v)}\]
normalization = ‘max_distance’ :

\[i_v' = \frac{i_v}{max \{|i_v - j_v|\}} \quad\quad\quad j_v' = \frac{j_v}{max \{|i_v - j_v|\}}\]

such that

\[max \{|i_v' - j_v'|\} = max \{|i_w' - j_w'|\} = 1\]
normalization = ‘max_value’ :

\[i_v' = \frac{i_v}{max\{i_v\}} \quad\quad\quad j_v' = \frac{j_v}{max\{j_v\}}\]

for variables \(v, w\). Default is ‘max_distance’.

Note that POT must be installed to use this method.

This implementation is strongly inspired by Robin [2021]. The differences from this implementation are :

bin_width and bin_origin are dictionaries or float
Freedman-Diaconis rule is used to find the bin width when unspecified, and fallbacks to Scott’s rule when 0 is obtained
jitter_inside_bins argument
adapt_freq_thresh argument
transform argument
group argument
pts_dim argument

References

Robin, Vrac, Naveau, and Yiou [2019], Robin [2021]

class xsdba.adjustment.PrincipalComponents(*args, _trained=False, **kwargs)[source]¶

Bases: TrainAdjust

Principal component adjustment.

This bias-correction method maps model simulation values to the observation space through principal components [Hnilica et al., 2017]. Values in the simulation space (multiple variables, or multiple sites) can be thought of as coordinate along axes, such as variable, temperature, etc. Principal components (PC) are a linear combinations of the original variables where the coefficients are the eigenvectors of the covariance matrix. Values can then be expressed as coordinates along the PC axes. The method makes the assumption that bias-corrected values have the same coordinates along the PC axes of the observations. By converting from the observation PC space to the original space, we get bias corrected values. See Notes for a mathematical explanation.

group¶

The main dimension and grouping information. See Notes. See xsdba.base.Grouper for details. The adjustment will be performed on each group independently. Default is “time”, meaning a single adjustment group along dimension “time”.

Type:: str or Grouper

best_orientation¶

Which method to use when searching for the best principal component orientation. See best_pc_orientation_simple() and best_pc_orientation_full(). “full” is more precise, but it is much slower.

Type:: {‘simple’, ‘full’}

crd_dim¶

The data dimension along which the multiple simulation space dimensions are taken. For a multivariate adjustment, this usually is “multivar”, as returned by sdba.stack_variables. For a multisite adjustment, this should be the spatial dimension. The training algorithm currently doesn’t fully support chunking. Chunking is maintained along other dimensions than crd_dim. crd_dim has to be a single chunk and will be converted if necessary. Chunking along group.dim and group.add_dims is not supported.

Type:: str

Warning

Be aware that principal components is meant here as the algebraic operation defining a coordinate system based on the eigenvectors, not statistical principal component analysis.

Notes

The input data is understood as a set of N points in a \(M\)-dimensional space.

\(M\) is taken along crd_dim.
\(N\) is taken along the dimensions given through group : (the main dim but also, if requested, the add_dims and window).

The principal components (PC) of hist and ref are used to defined new coordinate systems, centered on their respective means. The training step creates a matrix defining the transformation from hist to ref:

\[scen = e_{R} + \mathrm{\mathbf{T}}(sim - e_{H})\]

Where:

\[\mathrm{\mathbf{T}} = \mathrm{\mathbf{R}}\mathrm{\mathbf{H}}^{-1}\]

\(\mathrm{\mathbf{R}}\) is the matrix transforming from the PC coordinates computed on ref to the data coordinates. Similarly, \(\mathrm{\mathbf{H}}\) is transform from the hist PC to the data coordinates (\(\mathrm{\mathbf{H}}\) is the inverse transformation). \(e_R\) and \(e_H\) are the centroids of the ref and hist distributions respectively. Upon running the adjust step, one may decide to use \(e_S\), the centroid of the sim distribution, instead of \(e_H\).

References

Alavoine and Grenier [2022], Hnilica, Hanel, and Puš [2017]

class xsdba.adjustment.QuantileDeltaMapping(*args, _trained=False, **kwargs)[source]¶

Bases: EmpiricalQuantileMapping

Quantile Delta Mapping bias-adjustment.

Train step

nquantiles¶

The number of quantiles to use. See equally_spaced_nodes(). An array of quantiles [0, 1] can also be passed. Defaults to 20 quantiles.

Type:: int or 1d array of floats

kind¶

The adjustment kind, either additive or multiplicative. Defaults to “+”.

Type:: {‘+’, ‘*’}

group¶

The grouping information. See xsdba.base.Grouper for details. Default is “time”, meaning a single adjustment group along dimension “time”.

Type:: str or Grouper

max_tail_factor¶

Type:: float, optional

Adjust step

interp¶

The interpolation method to use when interpolating the adjustment factors. Defaults to “nearest”.

Type:: {‘nearest’, ‘linear’, ‘cubic’}

extrapolation¶

The type of extrapolation to use. Defaults to “constant”.

Type:: {‘constant’, ‘nan’}

rank_window¶

Whether to rank simulated values over the full grouping window. Effectively, the default is False, meaning values are ranked within exact groups only, e.g., a specific day of year. In xsdba>=0.8, this option will be deprecated in favour of honoring the grouping window (equivalent to True).

Type:: bool

quantiles¶

The quantile of each value of sim. The adjustment factor is interpolated using this as the “quantile” axis on ds.af. This is an extra output that requires activation with xsdba.set_options(extra_output=True).

Type:: xr.DataArray

Notes

Adjustment factors are computed between the quantiles of ref and hist. Quantiles of sim are matched to the corresponding quantiles of hist and corrected accordingly.

\[sim\frac{F^{-1}_{ref}\left[F_{sim}(sim)\right]}{F^{-1}_{hist}\left[F_{sim}(sim)\right]}\]

where \(F\) is the cumulative distribution function (CDF). This equation is valid for multiplicative adjustment. The algorithm is based on the “QDM” method of [Cannon et al., 2015].

References

Cannon, Sobie, and Murdock [2015]

class xsdba.adjustment.Scaling(*args, _trained=False, **kwargs)[source]¶

Bases: TrainAdjust

Scaling bias-adjustment.

Simple bias-adjustment method scaling variables by an additive or multiplicative factor so that the mean of hist matches the mean of ref.

Parameters:

step (Adjust)
group (str or Grouper) – The grouping information. See xsdba.base.Grouper for details. Default is “time”, meaning a single adjustment group along dimension “time”.
kind ({'+', '*'}) – The adjustment kind, either additive or multiplicative. Defaults to “+”.
step
interp ({'nearest', 'linear', 'cubic'}) – The interpolation method to use then interpolating the adjustment factors. Defaults to “nearest”.

class xsdba.adjustment.dOTC(*args, _trained=False, **kwargs)[source]¶

Bases: Adjust

Dynamical Optimal Transport Correction.

This method is the dynamical version of OTC, as presented by Robin et al. [2019]. The temporal evolution of the model is found for every point by mapping the historical to the future dataset with optimal transport. A mapping between historical and reference data is found in the same way, and the temporal evolution of model data is applied to their assigned reference.

See notes for an explanation of the algorithm.

This implementation is strongly inspired by Robin [2021].

Parameters:

bin_width (dict or float, optional) – Bin widths for specified dimensions if is dict. For all dimensions if float. Will be estimated with Freedman-Diaconis rule by default.
bin_origin (dict or float, optional) – Bin origins for specified dimensions if is dict. For all dimensions if float. Default is 0.
num_iter_max (int, optional) – Maximum number of iterations used in the network simplex algorithm.
cov_factor ({None, 'std', 'cholesky'}) – A rescaling of the temporal evolution before it is applied to the reference. Note that “cholesky” cannot be used if some variables are multiplicative. See notes for details.
jitter_inside_bins (bool) – If False, output points are located at the center of their bin. If True, a random location is picked uniformly inside their bin. Default is True.
kind (dict or str, optional) – Keys are variable names and values are adjustment kinds, either additive or multiplicative. Unspecified dimensions are treated as “+”. Applied to all variables if is string.
adapt_freq_thresh (dict or str, optional) – Threshold for frequency adaptation per variable. See xsdba.processing.adapt_freq for details. Frequency adaptation is not applied to missing variables if is dict. Applied to all variables if is string.
normalization ({'standardize', 'max_distance', 'max_value'}, optional) – Per-variable transformation applied before the distances are calculated in the optimal transport. Default is “max_distance”. See OTC for details.
group (Union[str, Grouper]) – The grouping information. See xsdba.base.Grouper for details. Default is “time”, meaning a single adjustment group along dimension “time”.
pts_dim (str) – The name of the “multivariate” dimension. Defaults to “multivar”, which is the normal case when using xsdba.base.stack_variables().

Notes

The simulated historical, simulated future and observed data sets \(X0\), \(X1\) and \(Y0\) are discretized and standardized using histograms whose bin length along dimension k is given by bin_width[k]. Mappings between \(Y0\) and \(X0\) on the one hand and between \(X0\) and \(X1\) on the other are found by optimal transport (see OTC). The latter mapping is used to compute the temporal evolution of model data. This evolution is computed additively or multiplicatively for each variable depending on its kind, and is applied to observed data with

\[\begin{split}Y1_i & := Y0_i + D \cdot v_i \;\; or \\ Y1_i & := Y0_i * D \cdot v_i\end{split}\]

where

\(v_i\) is the temporal evolution of historical simulated point \(i \in X0\) to \(j \in X1\)
\(Y0_i\) is the observed data mapped to \(i\)
\(D\) is a correction factor given by
- \(I\) if cov_factor is None
- \(diag(\frac{\sigma_{Y0}}{\sigma_{X0}})\) if cov_factor = “std”
- \(\frac{Chol(Y0)}{Chol(X0)}\) where \(Chol\) is the Cholesky decomposition if cov_factor = “cholesky”
\(Y1_i\) is the correction of the future simulated data mapped to \(i\).

Note that POT must be installed to use this method.

This implementation is strongly inspired by Robin [2021]. The differences from this reference are :

bin_width and bin_origin are dictionaries or float.
Freedman-Diaconis rule is used to find the bin width when unspecified, and fallbacks to Scott’s rule when 0 is obtained.
jitter_inside_bins argument
adapt_freq_thresh argument
transform argument
group argument
pts_dim argument
kind argument

References

Robin, Vrac, Naveau, and Yiou [2019], Robin [2021]

xsdba.base module¶

Base Classes and Developer Tools¶

class xsdba.base.Grouper(group: str, window: int = 1, add_dims: Sequence[str] | set[str] | None = None)[source]¶

Bases: Parametrizable

Grouper inherited class for parameterizable classes.

ADD_DIMS = '<ADD_DIMS>'¶

DIM = '<DIM>'¶

PROP = '<PROP>'¶

apply(func: Callable | str, da: DataArray | dict[str, DataArray] | Dataset, main_only: bool = False, **kwargs) → DataArray | Dataset[source]¶

Apply a function group-wise on DataArrays.

Parameters:

func (Callable or str) – The function to apply to the groups, either a callable or a xr.core.groupby.GroupBy method name as a string. The function will be called as func(group, dim=dims, **kwargs). See main_only for the behaviour of dims.
da (xr.DataArray or dict[str, xr.DataArray] or xr.Dataset) – The DataArray on which to apply the function. Multiple arrays can be passed through a dictionary. A dataset will be created before grouping.
main_only (bool) – Whether to call the function with the main dimension only (if True) or with all grouping dims (if False, default) (including the window and dimensions given through add_dims). The dimensions used are also written in the “group_compute_dims” attribute. If all the input arrays are missing one of the ‘add_dims’, it is silently omitted.
**kwargs – Other keyword arguments to pass to the function.

Returns:

Attributes “group”, “group_window” and “group_compute_dims” are added.

If the function did not reduce the array:

The output is sorted along the main dimension.
The output is rechunked to match the chunks on the input If multiple inputs with differing chunking were given as inputs, the chunking with the smallest number of chunks is used.

If the function reduces the array:

If there is only one group, the singleton dimension is squeezed out of the output
The output is rechunked as to have only 1 chunk along the new dimension.

Return type:

xr.DataArray or xr.Dataset

Notes

For the special case where a Dataset is returned, but only some of its variable where reduced by the grouping, xarray’s GroupBy.map will broadcast everything back to the ungrouped dimensions. To overcome this issue, function may add a “_group_apply_reshape” attribute set to True on the variables that should be reduced and these will be re-grouped by calling da.groupby(self.name).first().

static filter_add_dims(dim: list[str])[source]¶

Filter the dimensions to be reduced by removing those in add_dims.

The first dimension is never removed as it is considered the “main” dimension and not having it is an error. “window” is also kept if present.

Parameters:: dim (str or sequence of str) – Dimension(s) to reduce. Dimensions that do not conform with Grouper.DIM (add_dims) are removed.
Return type:: list of str, the filtered dimensions list

static filter_dim(da: DataArray, dim: str | list[str])[source]¶

Filter the dimensions to be reduced by removing those not on the variable.

The first dimension is never removed as it is considered the “main” dimension and not having it is an error. This is meant to be used within a function sent to Grouper.apply(), like those decorated with map_groups().

Parameters:

da (DataArray) – A DataArray from which we get the list of valid dimensions.
dim (str or sequence of str) – Dimension(s) to reduce. The first one is not removed, the others are kept only if they appear on da.

Return type:

list of str, the filtered dimensions list

property freq¶

Format a frequency string corresponding to the group.

For use with xarray’s resampling functions.

classmethod from_kwargs(**kwargs) → dict[str, Grouper][source]¶: Parameterize groups using kwargs.

get_coordinate(ds: Dataset | None = None) → DataArray[source]¶

Return the coordinate as in the output of group.apply.

Currently, only implemented for groupings with prop == month or dayofyear. For prop == dayfofyear, a ds (Dataset or DataArray) can be passed to infer the max day of year from the available years and calendar.

get_index(da: DataArray | Dataset, interp: bool | None = None) → DataArray[source]¶

Return the group index of each element along the main dimension.

Parameters:

da (xr.DataArray or xr.Dataset) – The input array/dataset for which the group index is returned. It must have Grouper.dim as a coordinate.
interp (bool, optional) – If True, the returned index can be used for interpolation. Only value for month grouping, where integer values represent the middle of the month, all other days are linearly interpolated in between.

Returns:

The index of each element along Grouper.dim. If Grouper.dim is time and Grouper.prop is None, a uniform array of True is returned. If Grouper.prop is a time accessor (month, dayofyear, etc.), a numerical array is returned, with a special case of month and interp=True. If Grouper.dim is not time, the dim is simply returned.

Return type:

xr.DataArray

group(da: DataArray | Dataset | None = None, main_only: bool = False, **das: DataArray) → GroupBy[source]¶

Return a xr.core.groupby.GroupBy object.

More than one array can be combined to a dataset before grouping using the das kwargs. A new window dimension is added if self.window is larger than 1. If Grouper.dim is ‘time’, but ‘prop’ is None, the whole array is grouped together.

When multiple arrays are passed, some of them can be grouped along the same group as self. They are broadcast, merged to the grouping dataset and regrouped in the output.

property prop_name¶: Create a significant name for the grouping.

class xsdba.base.Parametrizable(dict=None, /, **kwargs)[source]¶

Bases: UserDict

Helper base class resembling a dictionary.

This object is _completely_ defined by the content of its internal dictionary, accessible through item access (self[‘attr’]) or in self.parameters. When serializing and restoring this object, only members of that internal dict are preserved. All other attributes set directly with self.attr = value will not be preserved upon serialization and restoration of the object with [json]pickle dictionary. Other variables set with self.var = data will be lost in the serialization process. This class is best serialized and restored with jsonpickle.

property parameters: dict¶: All parameters as a dictionary. Read-only.

class xsdba.base.ParametrizableWithDataset(dict=None, /, **kwargs)[source]¶

Bases: Parametrizable

Parametrizable class that also has a ds attribute storing a dataset.

classmethod from_dataset(ds: Dataset)[source]¶

Create an instance from a dataset.

The dataset must have a global attribute with a name corresponding to cls._attribute, and that attribute must be the result of jsonpickle.encode(object) where object is of the same type as this object.

set_dataset(ds: Dataset) → None[source]¶

Store an xarray dataset in the ds attribute.

Useful with custom object initialization or if some external processing was performed.

xsdba.base.compare_offsets(freqA: str, op: str, freqB: str) → bool[source]¶

Compare offsets string based on their approximate length, according to a given operator.

Offset are compared based on their length approximated for a period starting after 1970-01-01 00:00:00. If the offsets are from the same category (same first letter), only the multiplier prefix is compared (QS-DEC == QS-JAN, MS < 2MS). “Business” offsets are not implemented.

Parameters:

freqA (str) – RHS Date offset string (‘YS’, ‘1D’, ‘QS-DEC’, …).
op ({'<', '<=', '==', '>', '>=', '!='}) – Operator to use.
freqB (str) – LHS Date offset string (‘YS’, ‘1D’, ‘QS-DEC’, …).

Returns:

Return freqA op freqB.

Return type:

bool

xsdba.base.construct_offset(mult: int, base: str, start_anchored: bool, anchor: str | None)[source]¶

Reconstruct an offset string from its parts.

Parameters:

mult (int) – The period multiplier (>= 1).
base (str) – The base period string (one char).
start_anchored (bool) – If True and base in [Y, Q, M], adds the “S” flag, False add “E”.
anchor (str, optional) – The month anchor of the offset. Defaults to JAN for bases YS and QS and to DEC for bases YE and QE.

Returns:

An offset string, conformant to pandas-like naming conventions.

Return type:

str

Notes

This provides the mirror opposite functionality of parse_offset().

xsdba.base.duck_empty(dims: <property object at 0x7bf0105f6b60>, sizes, dtype='float64', chunks=None) → DataArray[source]¶: Return an empty DataArray based on a numpy or dask backend, depending on the “chunks” argument.

xsdba.base.ensure_chunk_size(da: DataArray, **minchunks: int) → DataArray[source]¶

Ensure that the input DataArray has chunks of at least the given size.

If only one chunk is too small, it is merged with an adjacent chunk. If many chunks are too small, they are grouped together by merging adjacent chunks.

Parameters:

da (xr.DataArray) – The input DataArray, with or without the dask backend. Does nothing when passed a non-dask array.
**minchunks (dict[str, int]) – A kwarg mapping from dimension name to minimum chunk size. Pass -1 to force a single chunk along that dimension.

Return type:

xr.DataArray

xsdba.base.get_op(op: str, constrain: Sequence[str] | None = None) → Callable[source]¶

Get python’s comparing function according to its name of representation and validate allowed usage.

Parameters:

op (str) – Operator.
constrain (sequence of str, optional) – A tuple of allowed operators.

xsdba.base.map_blocks(reduces: Sequence[str] | None = None, **out_vars) → Callable[source]¶

Decorator for declaring functions and wrapping them into a map_blocks.

Takes care of constructing the template dataset. Dimension order is not preserved. The decorated function must always have the signature: func(ds, **kwargs), where ds is a DataArray or a Dataset. It must always output a dataset matching the mapping passed to the decorator.

Parameters:

reduces (sequence of strings) – Name of the dimensions that are removed by the function.
**out_vars – Mapping from variable names in the output to their new dimensions. The placeholders Grouper.PROP, Grouper.DIM and Grouper.ADD_DIMS can be used to signify group.prop,``group.dim`` and group.add_dims respectively. If an output keeps a dimension that another loses, that dimension name must be given in reduces and in the list of new dimensions of the first output.

xsdba.base.map_groups(reduces: Sequence[str] | None = None, main_only: bool = False, **out_vars) → Callable[source]¶

Decorator for declaring functions acting only on groups and wrapping them into a map_blocks.

This is the same as map_blocks but adds a call to group.apply() in the mapped func and the default value of reduces is changed.

The decorated function must have the signature: func(ds, dim, **kwargs). Where ds is a DataAray or Dataset, dim is the group.dim (and add_dims). The group argument is stripped from the kwargs, but must evidently be provided in the call.

Parameters:

reduces (sequence of str, optional) – Dimensions that are removed from the inputs by the function. Defaults to [Grouper.DIM, Grouper.ADD_DIMS] if main_only is False, and [Grouper.DIM] if main_only is True. See map_blocks().
main_only (bool) – Same as for Grouper.apply().
**out_vars – Mapping from variable names in the output to their new dimensions. The placeholders Grouper.PROP, Grouper.DIM and Grouper.ADD_DIMS can be used to signify group.prop,``group.dim`` and group.add_dims, respectively. If an output keeps a dimension that another loses, that dimension name must be given in reduces and in the list of new dimensions of the first output.

See also

map_blocks

xsdba.base.parse_group(func: Callable, kwargs=None, allow_only=None) → Callable[source]¶

Parse the kwargs given to a function to set the group arg with a Grouper object.

This function can be used as a decorator, in which case the parsing and updating of the kwargs is done at call time. It can also be called with a function from which extract the default group and kwargs to update, in which case it returns the updated kwargs.

If allow_only is given, an exception is raised when the parsed group is not within that list.

xsdba.base.parse_offset(freq: str) → tuple[int, str, bool, str | None][source]¶

Parse an offset string.

Parse a frequency offset and, if needed, convert to cftime-compatible components.

Parameters:

freq (str) – Frequency offset.

Returns:

multiplier (int) – Multiplier of the base frequency. “[n]W” is always replaced with “[7n]D”, as xarray doesn’t support “W” for cftime indexes.
offset_base (str) – Base frequency.
is_start_anchored (bool) – Whether coordinates of this frequency should correspond to the beginning of the period (True) or its end (False). Can only be False when base is Y, Q or M; in other words, xsdba assumes frequencies finer than monthly are all start-anchored.
anchor (str, optional) – Anchor date for bases Y or Q. As xarray doesn’t support “W”, neither does xsdba (anchor information is lost when given).

xsdba.base.stack_periods(da: Dataset | DataArray, window: int = 30, stride: int | None = None, min_length: int | None = None, freq: str = 'YS', dim: str = 'period', start: str = '1970-01-01', align_days: bool = True, pad_value=<NA>)[source]¶

Construct a multi-period array.

Stack different equal-length periods of da into a new ‘period’ dimension.

This is similar to da.rolling(time=window).construct(dim, stride=stride), but adapted for arguments in terms of a base temporal frequency that might be non-uniform (years, months, etc.). It is reversible for some cases (see stride). A rolling-construct method will be much more performant for uniform periods (days, weeks).

Parameters:

da (xr.Dataset or xr.DataArray) – An xarray object with a time dimension. Must have a uniform timestep length. Output might be strange if this does not use a uniform calendar (noleap, 360_day, all_leap).
window (int) – The length of the moving window as a multiple of freq.
stride (int, optional) – At which interval to take the windows, as a multiple of freq. For the operation to be reversible with unstack_periods(), it must divide window into an odd number of parts. Default is window (no overlap between periods).
min_length (int, optional) – Windows shorter than this are not included in the output. Given as a multiple of freq. Default is window (every window must be complete). Similar to the min_periods argument of da.rolling. If freq is annual or quarterly and min_length == ``window, the first period is considered complete if the first timestep is in the first month of the period.
freq (str) – Units of window, stride and min_length, as a frequency string. Must be larger or equal to the data’s sampling frequency. Note that this function offers an easier interface for non-uniform period (like years or months) but is much slower than a rolling-construct method.
dim (str) – The new dimension name.
start (str) – The start argument passed to xarray.date_range() to generate the new placeholder time coordinate.
align_days (bool) – When True (default), an error is raised if the output would have unaligned days across periods. If freq = ‘YS’, day-of-year alignment is checked and if freq is “MS” or “QS”, we check day-in-month. Only uniform-calendar will pass the test for freq=’YS’. For other frequencies, only the 360_day calendar will work. This check is ignored if the sampling rate of the data is coarser than “D”.
pad_value (Any) – When some periods are shorter than others, this value is used to pad them at the end. Passed directly as argument fill_value to xarray.concat(), the default is the same as on that function.

Returns:

A DataArray with a new period dimension and a time dimension with the length of the longest window. The new time coordinate has the same frequency as the input data but is generated using xarray.date_range() with the given start value. That coordinate is the same for all periods, depending on the choice of window and freq, it might make sense. But for unequal periods or non-uniform calendars, it will certainly not. If stride is a divisor of window, the correct timeseries can be reconstructed with unstack_periods(). The coordinate of period is the first timestep of each window.

Return type:

xr.DataArray

xsdba.base.unstack_periods(da: DataArray | Dataset, dim: str = 'period')[source]¶

Unstack an array constructed with stack_periods().

Can only work with periods stacked with a stride that divides window in an odd number of sections. When stride is smaller than window, only the center-most stride of each window is kept, except for the beginning and end which are taken from the first and last windows.

Parameters:

da (xr.DataArray) – As constructed by stack_periods(), attributes of the period coordinates must have been preserved.
dim (str) – The period dimension name.

Notes

The following table shows which strides are included (o) in the unstacked output.

In this example, stride was a fifth of window and min_length was four (4) times stride. The row index i the period index in the stacked dataset, columns are the stride-long section of the original timeseries.

Unstacking example with `stride < window`.¶
i	0	1	2	3	4	5	6
3				x	x	o	o
2			x	x	o	x	x
1		x	x	o	x	x
0	o	o	o	x	x

xsdba.base.uses_dask(*das: DataArray | Dataset) → bool[source]¶

Evaluate whether dask is installed and array is loaded as a dask array.

Parameters:: *das (xr.DataArray or xr.Dataset) – DataArrays or Datasets to check.
Returns:: True if any of the passed objects is using dask.
Return type:: bool

xsdba.cli module¶

Console script for xsdba.

xsdba.cli.main() → None[source]¶: Console script for xsdba.

xsdba.detrending module¶

Detrending Objects Utilities¶

class xsdba.detrending.BaseDetrend(*, group: Grouper | str = 'time', kind: str = '+', mult_skip_zeros: bool = False, **kwargs)[source]¶

Bases: ParametrizableWithDataset

Base class for detrending objects.

Defines three methods:

fit(da) : Compute trend from da and return a new _fitted_ Detrend object. detrend(da) : Return detrended array. retrend(da) : Puts trend back on da.

A fitted Detrend object is unique to the trend coordinate of the object used in fit, (usually ‘time’). The computed trend is stored in Detrend.ds.trend.

Subclasses should implement _get_trend_group() or _get_trend(). The first will be called in a group.apply(..., main_only=True), and should return a single DataArray. The second allows the use of functions wrapped in map_groups() and should also return a single DataArray.

The subclasses may reimplement _detrend and _retrend.

detrend(da: DataArray)[source]¶: Remove the previously fitted trend from a DataArray.

fit(da: DataArray)[source]¶

Extract the trend of a DataArray along a specific dimension.

Returns a new object that can be used for detrending and retrending. Fitted objects are unique to the fitted coordinate used.

property fitted¶: Return whether instance is fitted.

retrend(da: DataArray)[source]¶: Put the previously fitted trend back on a DataArray.

class xsdba.detrending.LoessDetrend(group='time', kind='+', f=0.2, niter=1, d=0, weights='tricube', equal_spacing=None, skipna=True, mult_skip_zeros: bool = False)[source]¶

Bases: BaseDetrend

Detrend time series using a LOESS regression.

The fit is a piecewise linear regression. For each point, the contribution of all neighbors is weighted by a bell-shaped curve (gaussian) with parameters sigma (std). The x-coordinate of the DataArray is scaled to [0,1] before the regression is computed.

group¶

The grouping information. See xsdba.base.Grouper for details. The fit is performed along the group’s main dim.

Type:: str or Grouper

kind¶

The way the trend is removed or added, either additive or multiplicative.

Type:: {‘*’, ‘+’}

d¶

Order of the local regression. Only 0 and 1 currently implemented.

Type:: {0, 1}

f¶

Parameter controlling the span of the weights, between 0 and 1.

Type:: float

niter¶

Number of robustness iterations to execute.

Type:: int

weights¶

Shape of the weighting function: “tricube” : a smooth top-hat like curve, f gives the span of non-zero values. “gaussian” : a gaussian curve, f gives the span for 95% of the values.

Type:: [“tricube”, “gaussian”]

skipna¶

If True (default), missing values are not included in the loess trend computation and thus are not propagated. The output will have the same missing values as the input.

Type:: bool

mult_skip_zeros¶

If True and kind is “*”, zero values in the trend are not used for detrending, and the original value is kept instead.

Type:: bool

Notes

LOESS smoothing is computationally expensive. As it relies on a loop on gridpoints, it can be useful to use smaller than usual chunks. Moreover, it suffers from heavy boundary effects. As a rule of thumb, the outermost N * f/2 points should be considered dubious. (N is the number of points along each group)

class xsdba.detrending.MeanDetrend(*, group: Grouper | str = 'time', kind: str = '+', mult_skip_zeros: bool = False, **kwargs)[source]¶

Bases: BaseDetrend

Simple detrending removing only the mean from the data, quite similar to normalizing.

class xsdba.detrending.NoDetrend(*, group: Grouper | str = 'time', kind: str = '+', mult_skip_zeros: bool = False, **kwargs)[source]¶

Bases: BaseDetrend

Convenience class for polymorphism. Does nothing.

class xsdba.detrending.PolyDetrend(group='time', kind='+', degree=4, preserve_mean=False, mult_skip_zeros=False)[source]¶

Bases: BaseDetrend

Detrend time series using a polynomial regression.

group¶

The grouping information. See xsdba.base.Grouper for details. The fit is performed along the group’s main dim.

Type:: Union[str, Grouper]

kind¶

The way the trend is removed or added, either additive or multiplicative.

Type:: {‘*’, ‘+’}

degree¶

The order of the polynomial to fit.

Type:: int

preserve_mean¶

Whether to preserve the mean when de/re-trending. If True, the trend has its mean removed before it is used.

Type:: bool

mult_skip_zeros¶

If True and kind is “*”, zero values in the trend are not used for detrending, and the original value is kept instead.

Type:: bool

class xsdba.detrending.RollingMeanDetrend(group='time', kind='+', win=30, weights=None, min_periods=None, mult_skip_zeros=False)[source]¶

Bases: BaseDetrend

Detrend time series using a rolling mean.

group¶

The grouping information. See xsdba.base.Grouper for details. The fit is performed along the group’s main dim.

Type:: str or Grouper

kind¶

The way the trend is removed or added, either additive or multiplicative.

Type:: {‘*’, ‘+’}

win¶

The size of the rolling window. Units are the steps of the grouped data, which means this detrending is best use with either group=’time’ or group=’time.dayofyear’. Other grouping will have large jumps included within the windows and :py`:class:LoessDetrend might offer a better solution.

Type:: int

weights¶

Sequence of length win. Defaults to None, which means a flat window.

Type:: sequence of floats, optional

min_periods¶

Minimum number of observations in window required to have a value, otherwise the result is NaN. See xarray.DataArray.rolling(). Defaults to None, which sets it equal to win. Setting both weights and this is not implemented yet.

Type:: int, optional

mult_skip_zeros¶

If True and kind is “*”, zero values in the trend are not used for detrending, and the original value is kept instead.

Type:: bool

Notes

As for the LoessDetrend detrending, important boundary effects are to be expected.

xsdba.formatting module¶

Formatting Utilities¶

xsdba.formatting.gen_call_string(funcname: str, *args: Any, **kwargs: Any) → str[source]¶

Generate a signature string for use in the history attribute.

DataArrays and Dataset are replaced with their name, while Nones, floats, ints and strings are printed directly. All other objects have their type printed between < >.

Arguments given through positional arguments are printed positionnally and those given through keywords are printed prefixed by their name.

Parameters:: funcname (str) – Name of the function.

Examples

>>> A = xr.DataArray([1], dims=("x",), name="A")
>>> gen_call_string("func", A, b=2.0, c="3", d=[10] * 100)
"func(A, b=2.0, c='3', d=<list>)"

xsdba.formatting.merge_attributes(attribute: str, *inputs_list: Sequence[DataArray | Dataset], new_line: str = '\n', missing_str: str | None = None, **inputs_kws: dict[str, DataArray | Dataset]) → str[source]¶

Merge attributes from several DataArrays or Datasets.

If more than one input is given, its name (if available) is prepended as: “<input name> : <input attribute>”.

Parameters:

attribute (str) – The attribute to merge.
*inputs_list (xr.DataArray or xr.Dataset) – The datasets or variables that were used to produce the new object. Inputs given that way will be prefixed by their “name” attribute if available.
new_line (str) – The character to put between each instance of the attributes. Usually, in CF-conventions, the history attributes uses ‘\n’ while cell_methods uses ‘ ‘.
missing_str (str) – A string that is printed if an input doesn’t have the attribute. Defaults to None, in which case the input is simply skipped.
**inputs_kws (xr.DataArray or xr.Dataset) – Mapping from names to the datasets or variables that were used to produce the new object. Inputs given that way will be prefixes by the passed name.

Returns:

The new attribute made from the combination of the ones from all the inputs.

Return type:

str

xsdba.formatting.update_history(hist_str: str, *inputs_list: Sequence[DataArray | Dataset], new_name: str | None = None, **inputs_kws: dict[str, DataArray | Dataset]) → str[source]¶

Return a history string with the timestamped message and the combination of the history of all inputs.

The new history entry is formatted as “[<timestamp>] <new_name>: <hist_str> - xsdba version: <xsdba.__version__>.”

Parameters:

hist_str (str) – The string describing what has been done on the data.
*inputs_list (xr.DataArray or xr.Dataset) – The datasets or variables that were used to produce the new object. Inputs given that way will be prefixed by their “name” attribute if available.
new_name (str, optional) – The name of the newly created variable or dataset to prefix hist_msg.
**inputs_kws (dict of xr.DataArray or xr.Dataset) – Mapping from names to the datasets or variables that were used to produce the new object. Inputs given that way will be prefixes by the passed name.

Returns:

The combine history of all inputs starting with hist_str.

Return type:

str

See also

merge_attributes

xsdba.formatting.update_xsdba_history(func: Callable)[source]¶

Decorator that auto-generates and fills the history attribute.

The history is generated from the signature of the function and added to the first output. Because of a limitation of the boltons wrapper, all arguments passed to the wrapped function will be printed as keyword arguments.

xsdba.loess module¶

LOESS Smoothing Submodule¶

xsdba.loess.loess_smoothing(da: DataArray, dim: str = 'time', d: Literal[0, 1] = 1, f: float = 0.5, niter: int = 2, weights: Literal['tricube', 'gaussian'] | Callable = 'tricube', equal_spacing: bool | None = None, skipna: bool = True)[source]¶

Locally weighted regression in 1D: fits a nonparametric regression curve to a scatter plot.

Returns a smoothed curve along given dimension. The regression is computed for each point using a subset of neighbouring points as given from evaluating the weighting function locally. Follows the procedure of Cleveland [1979].

Parameters:

da (xr.DataArray) – The data to smooth using the loess approach.
dim (str) – Name of the dimension along which to perform the loess.
d ([0, 1]) – Degree of the local regression.
f (float) – Parameter controlling the shape of the weight curve. Behavior depends on the weighting function, but it usually represents the span of the weighting function in reference to x-coordinates normalized from 0 to 1.
niter (int) – Number of robustness iterations to execute.
weights (["tricube", "gaussian"] or Callable) – Shape of the weighting function, see notes. The user can provide a function or a string: “tricube” : a smooth top-hat like curve. “gaussian” : a gaussian curve, f gives the span for 95% of the values.
equal_spacing (bool, optional) – Whether to use the equal spacing optimization. If None (the default), it is activated only if the x-axis is equally-spaced. When activated, dx = x[1] - x[0].
skipna (bool) – If True (default), skip missing values (as marked by NaN). The output will have the same missing values as the input.

Notes

As stated in Cleveland [1979], the weighting function \(W(x)\) should respect the following conditions:

\(W(x) > 0\) for \(|x| < 1\)
\(W(-x) = W(x)\)
\(W(x)\) is non-increasing for \(x \ge 0\)
\(W(x) = 0\) for \(|x| \ge 0\)

If a Callable is provided, it should only accept the 1D np.ndarray \(x\) which is an absolute value function going from 1 to 0 to 1 around \(x_i\), for all values where \(x - x_i < h_i\) with \(h_i\) the distance of the rth nearest neighbor of \(x_i\), \(r = f * size(x)\).

References

Cleveland [1979]

Code adapted from: Gramfort [2015]

xsdba.measures module¶

Measures Submodule¶

Measures compare adjusted simulations to a reference

SDBA diagnostic tests are made up of properties and measures. Measures compare adjusted simulations to a reference, through statistical properties or directly. This framework for the diagnostic tests was inspired by the VALUE project.

This module depends on xclim. Run pip install xsdba[‘extras’] to install it.

class xsdba.measures.StatisticalMeasure(**kwds)[source]¶

Bases: Indicator

Base indicator class for statistical measures used when validating bias-adjusted outputs.

Statistical measures use input data where the time dimension was reduced, usually by the computation of a xsdba.properties.StatisticalProperty instance. They usually take two arrays as input: “sim” and “ref”, “sim” being measured against “ref”. The two arrays must have identical coordinates on their common dimensions.

Statistical measures are generally unit-generic. If the inputs have different units, “sim” is converted to match “ref”.

realm = 'generic'¶

class xsdba.measures.StatisticalPropertyMeasure(**kwds)[source]¶

Bases: Indicator

Base indicator class for statistical properties that include the comparison measure, used when validating bias-adjusted outputs.

StatisticalPropertyMeasure objects combine the functionalities of xsdba.properties.StatisticalProperty and xsdba.properties.StatisticalMeasure.

Statistical properties usually reduce the time dimension and sometimes more dimensions (for example in spatial properties), sometimes adding a grouping dimension according to the passed value of group (e.g.: group=’time.month’ means the loss of the time dimension and the addition of a month one).

Statistical measures usually take two arrays as input: “sim” and “ref”, “sim” being measured against “ref”.

Statistical property-measures are generally unit-generic. If the inputs have different units, “sim” is converted to match “ref”.

allowed_groups = None¶: A list of allowed groupings. A subset of dayofyear, week, month, season or group. The latter stands for no temporal grouping.

aspect = None¶

marginal, temporal, multivariate or spatial.

Type:: The aspect the statistical property studies

realm = 'generic'¶

xsdba.nbutils module¶

Numba-accelerated Utilities¶

xsdba.nbutils.quantile(da: DataArray, q: ndarray, dim: str | Sequence[Hashable]) → DataArray[source]¶

Compute the quantiles from a fixed list q.

Parameters:

da (xarray.DataArray) – The data to compute the quantiles on.
q (array-like) – The quantiles to compute.
dim (str or sequence of str) – The dimension along which to compute the quantiles.

Returns:

The quantiles computed along the dim dimension.

Return type:

xarray.DataArray

xsdba.nbutils.remove_NaNs(x)[source]¶: Remove NaN values from series.

xsdba.nbutils.vecquantiles(da: DataArray, rnk: DataArray, dim: str | Sequence[Hashable]) → DataArray[source]¶

For when the quantile (rnk) is different for each point.

da and rnk must share all dimensions but dim.

Parameters:

da (xarray.DataArray) – The data to compute the quantiles on.
rnk (xarray.DataArray) – The quantiles to compute.
dim (str or sequence of str) – The dimension along which to compute the quantiles.

Returns:

The quantiles computed along the dim dimension.

Return type:

xarray.DataArray

xsdba.options module¶

Options Management for xsdba¶

Global or contextual options for xsdba, similar to xarray.set_options.

class xsdba.options.set_options(**kwargs)[source]¶

Bases: object

Set options for xsdba in a controlled context.

extra_output¶

Whether to add diagnostic variables to outputs of sdba’s train, adjust and processing operations. Details about these additional variables are given in the object’s docstring. When activated, adjust will return a Dataset with scen and those extra diagnostics For processing functions, see the doc, the output type might change, or not depending on the algorithm. Default: False.

Type:: bool

Examples

You can use set_options either as a context manager:

>>> import xclim
>>> ds = xr.open_dataset(path_to_tas_file).tas
>>> with xsdba.set_options(extra_output=True):
...     out = xsdba.MBCn.train(ref, hist)

Or to set global options:

import xsdba

xsdba.set_options(extra_output=True)

xsdba.processing module¶

Pre- and Post-Processing Submodule¶

xsdba.processing.adapt_freq(ref: DataArray, sim: DataArray, *, group: Grouper | str, thresh: str = '0 mm d-1') → tuple[DataArray, DataArray, DataArray][source]¶

Adapt frequency of values under thresh of sim, in order to match ref.

This is useful when the dry-day frequency in the simulations is higher than in the references. This function will create new non-null values for sim/hist, so that adjustment factors are less wet-biased. Based on Themeßl et al. [2012].

Parameters:

ref (xr.Dataset) – Target/reference data, usually observed data, with a “time” dimension.
sim (xr.Dataset) – Simulated data, with a “time” dimension.
group (str or Grouper) – Grouping information, see base.Grouper.
thresh (str) – Threshold below which values are considered zero, a quantity with units.

Returns:

sim_adj (xr.DataArray) – Simulated data with the same frequency of values under threshold than ref. Adjustment is made group-wise.
pth (xr.DataArray) – For each group, the smallest value of sim that was not frequency-adjusted. All values smaller were either left as zero values or given a random value between thresh and pth. NaN where frequency adaptation wasn’t needed.
dP0 (xr.DataArray) – For each group, the percentage of values that were corrected in sim.

Notes

With \(P_0^r\) the frequency of values under threshold \(T_0\) in the reference (ref) and \(P_0^s\) the same for the simulated values, \(\\Delta P_0 = \\frac{P_0^s - P_0^r}{P_0^s}\), when positive, represents the proportion of values under \(T_0\) that need to be corrected.

The correction replaces a proportion \(\\Delta P_0\) of the values under \(T_0\) in sim by a uniform random number between \(T_0\) and \(P_{th}\), where \(P_{th} = F_{ref}^{-1}( F_{sim}( T_0 ) )\) and F(x) is the empirical cumulative distribution function (CDF).

References

Themeßl, Gobiet, and Heinrich [2012]

xsdba.processing.escore(tgt: DataArray, sim: DataArray, dims: Sequence[str] = ('variables', 'time'), N: int = 0, scale: bool = False) → DataArray[source]¶

Energy score, or energy dissimilarity metric, based on Szekely and Rizzo [2004] and Cannon [2018].

Parameters:

tgt (xr.DataArray) – Target observations.
sim (xr.DataArray) – Candidate observations. Must have the same dimensions as tgt.
dims (sequence of 2 strings) – The name of the dimensions along which the variables and observation points are listed. tgt and sim can have different length along the second one, but must be equal along the first one. The result will keep all other dimensions.
N (int) – If larger than 0, the number of observations to use in the score computation. The points are taken evenly distributed along obs_dim.
scale (bool) – Whether to scale the data before computing the score. If True, both arrays as scaled according to the mean and standard deviation of tgt along obs_dim. (std computed with ddof=1 and both statistics excluding NaN values).

Returns:

Return e-score with dimensions not in dims.

Return type:

xr.DataArray

Notes

Explanation adapted from the “energy” R package documentation. The e-distance between two clusters \(C_i\), \(C_j\) (tgt and sim) of size \(n_i,n_j\) proposed by Szekely and Rizzo [2004] is defined by:

\[e(C_i,C_j) = \frac{1}{2}\frac{n_i n_j}{n_i + n_j} \left[2 M_{ij} − M_{ii} − M_{jj}\right]\]

where

\[M_{ij} = \frac{1}{n_i n_j} \sum_{p = 1}^{n_i} \sum_{q = 1}^{n_j} \left\Vert X_{ip} − X{jq} \right\Vert.\]

\(\Vert\cdot\Vert\) denotes Euclidean norm, \(X_{ip}\) denotes the p-th observation in the i-th cluster.

The input scaling and the factor \(\frac{1}{2}\) in the first equation are additions of Cannon [2018] to the metric. With that factor, the test becomes identical to the one defined by Baringhaus and Franz [2004]. This version is tested against values taken from Alex Cannon’s MBC R package [Cannon, 2020].

References

Baringhaus and Franz [2004], Cannon [2018], Cannon [2020], Szekely and Rizzo [2004].

xsdba.processing.from_additive_space(data: DataArray, lower_bound: str | None = None, upper_bound: str | None = None, trans: str | None = None, units: str | None = None)[source]¶

Transform back to the physical space a variable that was transformed with to_additive_space.

Based on Alavoine and Grenier [2022]. If parameters are not present on the attributes of the data, they must be all given are arguments.

Parameters:

data (xr.DataArray) – A variable that was transformed by to_additive_space().
lower_bound (str, optional) – The smallest physical value of the variable, as a Quantity string. The final data will have no value smaller or equal to this bound. If None (default), the xsdba_transform_lower attribute is looked up on data.
upper_bound (str, optional) – The largest physical value of the variable, as a Quantity string. Only relevant for the logit transformation. The final data will have no value larger or equal to this bound. If None (default), the xsdba_transform_upper attribute is looked up on data.
trans ({'log', 'logit'}, optional) – The transformation to use. See notes. If None (the default), the xsdba_transform attribute is looked up on data.
units (str, optional) – The units of the data before transformation to the additive space. If None (the default), the xsdba_transform_units attribute is looked up on data.

Returns:

The physical variable. Attributes are conserved, even if some might be incorrect. Except units which are taken from xsdba_transform_units if available. All xsdba_transform* attributes are deleted.

Return type:

xr.DataArray

See also

to_additive_space: For the original transformation.

Notes

Given a variable that is not usable in an additive adjustment, to_additive_space() applied a transformation to a space where additive methods are sensible. Given \(Y\) the transformed variable, \(b_-\) the lower physical bound of that variable and \(b_+\) the upper physical bound, two back-transformations are currently implemented to get \(X\), the physical variable.

log

\[X = e^{Y} + b_-\]
logit

\[X' = \frac{1}{1 + e^{-Y}} X = X * (b_+ - b_-) + b_-\]

References

Alavoine and Grenier [2022].

xsdba.processing.grouped_time_indexes(times, group)[source]¶

Time indexes for every group blocks.

Time indexes can be used to implement a pseudo-“numpy.groupies” approach to grouping.

Parameters:

times (xr.DataArray) – Time dimension in the dataset of interest.
group (str or Grouper) – Grouping information, see base.Grouper.

Returns:

g_idxs (xr.DataArray) – Time indexes of the blocks (only using group.name and not group.window).
gw_idxs (xr.DataArray) – Time indexes of the blocks (with rolling window if any).

xsdba.processing.jitter(x: xr.DataArray, lower: str | None = None, upper: str | None = None, minimum: str | None = None, maximum: str | None = None) → DataArray[source]¶

Replace values under a threshold and values above another by a uniform random noise.

Parameters:

x (xr.DataArray) – Values.
lower (str, optional) – Threshold under which to add uniform random noise to values, a quantity with units. If None, no jittering is performed on the lower end.
upper (str, optional) – Threshold over which to add uniform random noise to values, a quantity with units. If None, no jittering is performed on the upper end.
minimum (str, optional) – Lower limit (excluded) for the lower end random noise, a quantity with units. If None but lower is not None, 0 is used.
maximum (str, optional) – Upper limit (excluded) for the upper end random noise, a quantity with units. If upper is not None, it must be given.

Returns:

Same as x but values < lower are replaced by a uniform noise in range (minimum, lower) and values >= upper are replaced by a uniform noise in range [upper, maximum). The two noise distributions are independent.

Return type:

xr.DataArray

Warning

Not to be confused with R’s jitter, which adds uniform noise instead of replacing values.

xsdba.processing.jitter_over_thresh(x: DataArray, thresh: str, upper_bnd: str) → DataArray[source]¶

Replace values greater than threshold by a uniform random noise.

Parameters:

x (xr.DataArray) – Values.
thresh (str) – Threshold over which to add uniform random noise to values, a quantity with units.
upper_bnd (str) – Maximum possible value for the random noise, a quantity with units.

Return type:

xr.DataArray.

Warning

Not to be confused with R’s jitter, which adds uniform noise instead of replacing values.

Notes

If thresh is low, this will change the mean value of x.

xsdba.processing.jitter_under_thresh(x: DataArray, thresh: str) → DataArray[source]¶

Replace values smaller than threshold by a uniform random noise.

Parameters:

x (xr.DataArray) – Values.
thresh (str) – Threshold under which to add uniform random noise to values, a quantity with units.

Return type:

xr.DataArray.

Warning

Not to be confused with R’s jitter, which adds uniform noise instead of replacing values.

Notes

If thresh is high, this will change the mean value of x.

xsdba.processing.normalize(data: xr.DataArray, norm: xr.DataArray | None = None, *, group: Grouper | str, kind: str = '+') → tuple[DataArray, DataArray][source]¶

Normalize an array by removing its mean.

Normalization if performed group-wise and according to kind.

Parameters:

data (xr.DataArray) – The variable to normalize.
norm (xr.DataArray, optional) – If present, it is used instead of computing the norm again.
group (str or Grouper) – Grouping information. See xsdba.base.Grouper for details..
kind ({'+', '*'}) – If kind is “+”, the mean is subtracted from the mean and if it is ‘*’, it is divided from the data.

Returns:

xr.DataArray – Groupwise anomaly.
norm (xr.DataArray) – Mean over each group.

xsdba.processing.reordering(ref: DataArray, sim: DataArray, group: str = 'time') → Dataset[source]¶

Reorder data in sim following the order of ref.

The rank structure of ref is used to reorder the elements of sim along dimension “time”, optionally doing the operation group-wise.

Parameters:

ref (xr.DataArray) – Array whose rank order sim should replicate.
sim (xr.DataArray) – Array to reorder.
group (str) – Grouping information. See xsdba.base.Grouper for details.

Returns:

Sim reordered according to ref’s rank order.

Return type:

xr.Dataset

References

Cannon [2018].

xsdba.processing.spectral_filter(da: ~xarray.DataArray, dims: list[str], lam_long: str | None, lam_short: str | None, delta: str | None, alpha_low_high: tuple[float, float] | None = None, mask_func: ~collections.abc.Callable = <function cos2_mask_func>)[source]¶

Filter coefficients of a Discrete Cosine Fourier transform between given thresholds and invert back to real space.

Parameters:

da (xr.DataArray) – Input physical field.
dims (list of str) – Dimensions on which to perform the spectral filter.
lam_long (str, optional) – Long wavelength threshold.
lam_short (str, optional) – Short wavelength threshold.
delta (str, Optional) – Nominal resolution of the grid. A string with units, e.g. delta==”55.5 km”. This converts the normalized wavenumber alpha to the unitful quantity wavelength through 2 delta / alpha. This is necessary if cutoff bounds are given as wavelengths (lam_long and lam_short).
alpha_low_high (tuple[float,float] | optional) – Low and high frequencies threshold (Long and short wavelength) for the radial normalized wavenumber (alpha). It should be numbers between 0 and 1.
mask_func (Callable) – Function used to create the mask. Default is cos2_mask_func, which applies a cosine squared filter to Fourier coefficients in momentum space.

Returns:

Filtered physical field.

Return type:

xr.DataArray

Notes

Either lam_long, lam_short and delta must be given, or just alpha_low_high.
If delta is specified, the normalized wavenumber alpha will be converted to a wavelength.
If the input field contains any nan, the output will be all nan values.

References

Denis, Côté, and Laprise [2002]

xsdba.processing.stack_variables(ds: Dataset, rechunk: bool = True, dim: str = 'multivar')[source]¶

Stack different variables of a dataset into a single DataArray with a new “variables” dimension.

Variable attributes are all added as lists of attributes to the new coordinate, prefixed with “_”. Variables are concatenated in the new dimension in alphabetical order, to ensure coherent behaviour with different datasets.

Parameters:

ds (xr.Dataset) – Input dataset.
rechunk (bool) – If True (default), dask arrays are rechunked with variables : -1.
dim (str) – Name of dimension along which variables are indexed.

Returns:

The transformed variable. Attributes are conserved, even if some might be incorrect, except for units, which are replaced with “”. Old units are stored in xsdba_transformation_units. A xsdba_transform attribute is added, set to the transformation method. xsdba_transform_lower and xsdba_transform_upper are also set if the requested bounds are different from the defaults.

Array with variables stacked along dim dimension. Units are set to “”.

Return type:

xr.DataArray

xsdba.processing.standardize(da: DataArray, mean: DataArray | None = None, std: DataArray | None = None, dim: str = 'time') → tuple[DataArray | Dataset, DataArray, DataArray][source]¶

Standardize a DataArray by centering its mean and scaling it by its standard deviation.

Either of both of mean and std can be provided if need be.

Returns:

out (xr.DataArray or xr.Dataset) – Standardized data.
mean (xr.DataArray) – Mean.
std (xr.DataArray) – Standard Deviation.

xsdba.processing.to_additive_space(data: xr.DataArray, lower_bound: str, upper_bound: str | None = None, trans: str = 'log', clip_next_to_bounds: str | None = None)[source]¶

Transform a non-additive variable into an additive space by the means of a log or logit transformation.

Based on Alavoine and Grenier [2022].

Parameters:

data (xr.DataArray) – A variable that can’t usually be bias-adjusted by additive methods.
lower_bound (str) – The smallest physical value of the variable, excluded, as a Quantity string. The data should only have values strictly larger than this bound.
upper_bound (str, optional) – The largest physical value of the variable, excluded, as a Quantity string. Only relevant for the logit transformation. The data should only have values strictly smaller than this bound.
trans ({'log', 'logit'}) – The transformation to use. See notes.
clip_next_to_bounds ({None, 'strict', 'permissive'}) – If not None, values are clipped to ensure data > lower_bound and data < upper_bound (if specified). Further, if trans is logit or log, also a clip the normalized data to ]0, 1[ or ]0,None, respectively. If ‘strict`, data must be in the range [lower_bound, upper_bound], else an error is thrown. If ‘permissive’, data will be clipped no matter the maximum and minimum.

See also

from_additive_space: For the inverse transformation.
jitter_under_thresh: Remove values exactly equal to the lower bound.
jitter_over_thresh: Remove values exactly equal to the upper bound.

Notes

Given a variable that is not usable in an additive adjustment, this applies a transformation to a space where additive methods are sensible. Given \(X\) the variable, \(b_-\) the lower physical bound of that variable and \(b_+\) the upper physical bound, two transformations are currently implemented to get \(Y\), the additive-ready variable. \(\ln\) is the natural logarithm.

log

\[Y = \ln\left( X - b_- \right)\]

Usually used for variables with only a lower bound, like precipitation (pr, prsn, etc) and daily temperature range (dtr). Both have a lower bound of 0.
logit

\[X' = (X - b_-) / (b_+ - b_-) Y = \ln\left(\frac{X'}{1 - X'} \right)\]

Usually used for variables with both a lower and a upper bound, like relative and specific humidity, cloud cover fraction, etc.

This will thus produce Infinity and NaN values where \(X == b_-\) or \(X == b_+\). We recommend using jitter_under_thresh() and jitter_over_thresh() to remove those issues.

If \(X \in [b_-, b_+]\), clip_next_to_bounds can be set to True, and boundary values will be slightly changed (with the smallest float32 increment) to ensure that \(X \in ]b_-, b_+[\).

References

Alavoine and Grenier [2022].

xsdba.processing.uniform_noise_like(da: DataArray, low: float = 1e-06, high: float = 0.001) → DataArray[source]¶

Return a uniform noise array of the same shape as da.

Noise is uniformly distributed between low and high. Alternative method to jitter_under_thresh for avoiding zeroes.

xsdba.processing.unstack_variables(da: DataArray, dim: str | None = None) → Dataset[source]¶

Unstack a DataArray created by stack_variables to a dataset.

Parameters:

da (xr.DataArray) – Array holding different variables along dim dimension.
dim (str, optional) – Name of dimension along which the variables are stacked. If not specified (default), dim is inferred from attributes of the coordinate.

Returns:

Dataset holding each variable in an individual DataArray.

Return type:

xr.Dataset

xsdba.processing.unstandardize(da: DataArray, mean: DataArray, std: DataArray)[source]¶: Rescale a standardized array by performing the inverse operation of standardize.

xsdba.properties module¶

Properties Submodule¶

SDBA diagnostic tests are made up of statistical properties and measures. Properties are calculated on both simulation and reference datasets. They collapse the time dimension to one value.

This framework for the diagnostic tests was inspired by the VALUE project. Statistical Properties is the xclim term for ‘indices’ in the VALUE project.

This module depends on xclim. Run pip install xsdba[‘extras’] to install it.

class xsdba.properties.StatisticalProperty(**kwds)[source]¶

Bases: Indicator

Base indicator class for statistical properties used for validating bias-adjusted outputs.

Statistical properties reduce the time dimension, sometimes adding a grouping dimension according to the passed value of group (e.g.: group=’time.month’ means the loss of the time dimension and the addition of a month one).

Statistical properties are generally unit-generic. To use those indicator in a workflow, it is recommended to wrap them with a virtual submodule, creating one specific indicator for each variable input (or at least for each possible dimensionality).

Statistical properties may restrict the sampling frequency of the input, they usually take in a single variable (named “da” in unit-generic instances).

allowed_groups = None¶: A list of allowed groupings. A subset of dayofyear, week, month, season or group. The latter stands for no temporal grouping.

aspect = None¶

marginal, temporal, multivariate or spatial.

Type:: The aspect the statistical property studies

get_measure()[source]¶: Get the statistical measure indicator that is best used with this statistical property.

measure = 'xsdba.measures.BIAS'¶: The default measure to use when comparing the properties of two datasets. This gives the registry id. See get_measure().

realm = 'generic'¶

xsdba.properties.first_eof()[source]¶: EOF Statistical Property (function removed).

Warning

Due to a licensing issue, eofs-based functionality has been permanently removed. Please excuse the inconvenience. For more information, see: https://github.com/Ouranosinc/xclim/issues/1620

xsdba.typing module¶

Typing Utilities¶

class xsdba.typing.DateStr¶

Type annotation for strings representing full dates (YYYY-MM-DD), may include time.

alias of str

class xsdba.typing.DayOfYearStr¶

Type annotation for strings representing dates without a year (MM-DD).

alias of str

class xsdba.typing.InputKind(*values)[source]¶

Bases: IntEnum

Constants for input parameter kinds.

For use by external parses to determine what kind of data the indicator expects. On the creation of an indicator, the appropriate constant is stored in xsdba.indicator.Indicator.parameters. The integer value is what gets stored in the output of xsdba.indicator.Indicator.json().

For developers : for each constant, the docstring specifies the annotation a parameter of an indice function should use in order to be picked up by the indicator constructor. Notice that we are using the annotation format as described in PEP 604, i.e. with ‘|’ indicating a union and without import objects from typing.

BOOL = 9¶

A boolean flag.

Annotation : bool, may be optional.

DATASET = 70¶

An xarray dataset.

Developers : as indices only accept DataArrays, this should only be added on the indicator’s constructor.

DATE = 7¶

A date in the YYYY-MM-DD format, may include a time.

Annotation : xsdba.typing.DateStr (may be optional).

DAY_OF_YEAR = 6¶

A date, but without a year, in the MM-DD format.

Annotation : xsdba.typing.DayOfYearStr (may be optional).

DICT = 10¶

A dictionary.

Annotation : dict or dict | None, may be optional.

FREQ_STR = 3¶

A string representing an “offset alias”, as defined by pandas.

See the Pandas documentation on Offset aliases for a list of valid aliases.

Annotation : str + freq as the parameter name.

KWARGS = 50¶

A mapping from argument name to value.

Developers : maps the **kwargs. Please use as little as possible.

NUMBER = 4¶

A number.

Annotation : int, float and unions thereof, potentially optional.

NUMBER_SEQUENCE = 8¶

A sequence of numbers

Annotation : Sequence[int], Sequence[float] and unions thereof, may include single int and float, may be optional.

OPTIONAL_VARIABLE = 1¶

An optional data variable (DataArray or variable name).

Annotation : xr.DataArray | None. The default should be None.

OTHER_PARAMETER = 99¶

An object that fits None of the previous kinds.

Developers : This is the fallback kind, it will raise an error in xsdba’s unit tests if used.

QUANTIFIED = 2¶

A quantity with units, either as a string (scalar), a pint.Quantity (scalar) or a DataArray (with units set).

Annotation : xsdba.typing.Quantified and an entry in the xsdba.units.declare_units() decorator. “Quantified” translates to str | xr.DataArray | pint.util.Quantity.

STRING = 5¶

A simple string.

Annotation : str or str | None. In most cases, this kind of parameter makes sense with choices indicated in the docstring’s version of the annotation with curly braces. # TODO : what about this notebook? removed reference to extendxclim

VARIABLE = 0¶

A data variable (DataArray or variable name).

Annotation : xr.DataArray.

class xsdba.typing.Quantified¶

Type annotation for thresholds and other not-exactly-a-variable quantities

alias of TypeVar(‘Quantified’, ~xarray.DataArray, str, ~pint.registry.Quantity)

xsdba.units module¶

Units Handling Submodule¶

xsdba.units.convert_units_to(source: Quantified, target: Quantified | Unit) → DataArray | float[source]¶

Convert a mathematical expression into a value with the same units as a DataArray.

If the dimensionalities of source and target units differ, automatic CF conversions will be applied when possible.

Parameters:

source (str or xr.DataArray or units.Quantity) – The value to be converted, e.g. ‘4C’ or ‘1 mm/d’.
target (str or xr.DataArray or units.Quantity or units.Unit) – Target array of values to which units must conform.

Returns:

The source value converted to target’s units. The outputted type is always similar to source initial type. Attributes are preserved unless an automatic CF conversion is performed, in which case only the new standard_name appears in the result.

Return type:

xr.DataArray or float

xsdba.units.harmonize_units(params_to_check)[source]¶: Compare units and perform a conversion if possible, otherwise raise a ValidationError.

xsdba.units.infer_sampling_units(da: DataArray, deffreq: str | None = 'D', dim: str = 'time') → tuple[int, str][source]¶

Infer a multiplier and the units corresponding to one sampling period.

Parameters:

da (xr.DataArray) – A DataArray from which to take coordinate dim.
deffreq (str, optional) – If no frequency is inferred from da[dim], take this one.
dim (str) – Dimension from which to infer the frequency.

Returns:

int – The magnitude (number of base periods per period).
str – Units as a string, understandable by pint.

Raises:

ValueError – If the frequency has no exact corresponding units.

xsdba.units.pint2cfattrs(value: Quantity | Unit, is_difference=None) → dict[source]¶

Return CF-compliant units attributes from a pint unit.

Parameters:

value (pint.Unit) – Input unit.
is_difference (bool) – Whether the value represent a difference in temperature, which is ambiguous in the case of absolute temperature scales like Kelvin or Rankine. It will automatically be set to True if units are “delta_*” units.

Returns:

Units following CF-Convention, using symbols.

Return type:

dict

xsdba.units.pint_multiply(da: DataArray, q: Quantity | str, out_units: str | None = None) → DataArray[source]¶

Multiply xarray.DataArray by pint.Quantity.

Parameters:

da (xr.DataArray) – Input array.
q (pint.Quantity or str) – Multiplicative factor.
out_units (str, optional) – Units the output array should be converted into.

Return type:

xr.DataArray

xsdba.units.str2pint(val: str) → Quantity[source]¶

Convert a string to a pint.Quantity, splitting the magnitude and the units.

Parameters:: val (str) – A quantity in the form “[{magnitude} ]{units}”, where magnitude can be cast to a float and units is understood by units2pint.
Returns:: Magnitude is 1 if no magnitude was present in the string.
Return type:: pint.Quantity

xsdba.units.units2pint(value: DataArray | Unit | Quantity | dict | str) → Unit[source]¶

Return the pint Unit for the DataArray units.

Parameters:: value (xr.DataArray or pint.Unit or pint.Quantity or dict or str) – Input data array or string representing a unit (may contain a magnitude).
Returns:: Units of the data array.
Return type:: pint.Unit

Notes

To avoid ambiguity related to differences in temperature vs absolute temperatures, set the units_metadata attribute to “temperature: difference” or “temperature: on_scale” on the DataArray.

xsdba.units.units2str(value: DataArray | str | Quantity | Unit) → str[source]¶

Return a str unit from various inputs.

Parameters:: value (xr.DataArray or str or pint.Quantity or pint.Unit) – Input data array or string representing a unit (with no magnitude).
Returns:: Units of the data array.
Return type:: pint.Unit

xsdba.utils module¶

Testing Utilities for xsdba¶

xsdba.utils.add_cyclic_bounds(da: DataArray, att: str, cyclic_coords: bool = True) → DataArray | Dataset[source]¶

Reindex an array to include the last slice at the beginning and the first at the end.

This is done to allow interpolation near the end-points.

Parameters:

da (xr.DataArray or xr.Dataset) – An array.
att (str) – The name of the coordinate to make cyclic.
cyclic_coords (bool) – If True, the coordinates are made cyclic as well. If False, the new values are guessed using the same step as their neighbour.

Returns:

A DataArray or Dataset but with the last element along att prepended and the last one appended.

Return type:

xr.DataArray or xr.Dataset

xsdba.utils.apply_correction(x: DataArray, factor: DataArray, kind: str | None = None) → DataArray[source]¶

Apply the additive or multiplicative correction/adjustment factors.

If kind is not given, default to the one stored in the “kind” attribute of factor.

xsdba.utils.best_pc_orientation_full(R: ndarray, Hinv: ndarray, Rmean: ndarray, Hmean: ndarray, hist: ndarray) → ndarray[source]¶

Return best orientation vector for A according to the method of Alavoine and Grenier [2022].

Eigenvectors returned by pc_matrix do not have a defined orientation. Given an inverse transform Hinv, a transform R, the actual and target origins Hmean and Rmean and the matrix of training observations hist, this computes a scenario for all possible orientations and return the orientation that maximizes the Spearman correlation coefficient of all variables. The correlation is computed for each variable individually, then averaged.

This trick is explained in Alavoine and Grenier [2022]. See docstring of sdba.adjustment.PrincipalComponentAdjustment().

Parameters:

R (np.ndarray) – MxM Matrix defining the final transformation.
Hinv (np.ndarray) – MxM Matrix defining the (inverse) first transformation.
Rmean (np.ndarray) – M vector defining the target distribution center point.
Hmean (np.ndarray) – M vector defining the original distribution center point.
hist (np.ndarray) – MxN matrix of all training observations of the M variables/sites.

Returns:

M vector of orientation correction (1 or -1).

Return type:

np.ndarray

See also

sdba.adjustment.PrincipalComponentAdjustment

References

Alavoine and Grenier [2022]

xsdba.utils.best_pc_orientation_simple(R: ndarray, Hinv: ndarray, val: float = 1000) → ndarray[source]¶

Return best orientation vector according to a simple test.

Eigenvectors returned by pc_matrix do not have a defined orientation. Given an inverse transform Hinv and a transform R, this returns the orientation minimizing the projected distance for a test point far from the origin.

This trick is inspired by the one exposed in Hnilica et al. [2017]. For each possible orientation vector, the test point is reprojected and the distance from the original point is computed. The orientation minimizing that distance is chosen.

Parameters:

R (np.ndarray) – MxM Matrix defining the final transformation.
Hinv (np.ndarray) – MxM Matrix defining the (inverse) first transformation.
val (float) – The coordinate of the test point (same for all axes). It should be much greater than the largest furthest point in the array used to define B.

Returns:

Mx1 vector of orientation correction (1 or -1).

Return type:

np.ndarray

See also

sdba.adjustment.PrincipalComponentAdjustment

References

Hnilica, Hanel, and Puš [2017]

xsdba.utils.bin_width_estimator(X)[source]¶

Estimate the bin width of an histogram.

References

Robin [2021]

xsdba.utils.broadcast(grouped: DataArray, x: DataArray, *, group: str | Grouper = 'time', interp: str = 'nearest', sel: dict[str, DataArray] | None = None) → DataArray[source]¶

Broadcast a grouped array back to the same shape as a given array.

Parameters:

grouped (xr.DataArray) – The grouped array to broadcast like x.
x (xr.DataArray) – The array to broadcast grouped to.
group (str or Grouper) – Grouping information. See xsdba.base.Grouper for details.
interp ({'nearest', 'linear', 'cubic'}) – The interpolation method to use.
sel (dict[str, xr.DataArray]) – Mapping of grouped coordinates to x coordinates (other than the grouping one).

Return type:

xr.DataArray

xsdba.utils.copy_all_attrs(ds: Dataset | DataArray, ref: Dataset | DataArray)[source]¶: Copy all attributes of ds to ref, including attributes of shared coordinates, and variables in the case of Datasets.

xsdba.utils.ecdf(x: DataArray, value: float, dim: str = 'time') → DataArray[source]¶

Return the empirical CDF of a sample at a given value.

Parameters:

x (array) – Sample.
value (float) – The value within the support of x for which to compute the CDF value.
dim (str) – Dimension name.

Returns:

Empirical CDF.

Return type:

xr.DataArray

xsdba.utils.ensure_longest_doy(func: Callable) → Callable[source]¶: Ensure that selected day is the longest day of year for x and y dims.

xsdba.utils.eps_cholesky(M, nit=26)[source]¶

Cholesky decomposition.

References

Higham [1988], Knol and ten Berge [1989], Robin [2021]

xsdba.utils.equally_spaced_nodes(n: int, eps: float | None = None) → ndarray[source]¶

Return nodes with n equally spaced points within [0, 1], optionally adding two end-points.

Parameters:

n (int) – Number of equally spaced nodes.
eps (float, optional) – Distance from 0 and 1 of added end nodes. If None (default), do not add endpoints.

Returns:

Nodes between 0 and 1. Nodes can be seen as the middle points of n equal bins.

Return type:

np.array

Warning

Passing a small eps will effectively clip the scenario to the bounds of the reference on the historical period in most cases. With normal quantile mapping algorithms, this can give strange result when the reference does not show as many extremes as the simulation does.

Notes

For n=4, eps=0 : 0—x——x——x——x—1

xsdba.utils.get_clusters(data: DataArray, u1, u2, dim: str = 'time') → Dataset[source]¶

Get cluster count, maximum and position along a given dim.

See get_clusters_1d. Used by adjustment.ExtremeValues.

Parameters:

data (1D ndarray) – Values to get clusters from.
u1 (float) – Extreme value threshold, at least one value in the cluster must exceed this.
u2 (float) – Cluster threshold, values above this can be part of a cluster.
dim (str) – Dimension name.

Returns:

With variables,

nclusters : Number of clusters for each point (with dim reduced), int
start : First index in the cluster (dim reduced, new cluster), int
end : Last index in the cluster, inclusive (dim reduced, new cluster), int
maxpos : Index of the maximal value within the cluster (dim reduced, new cluster), int
maximum : Maximal value within the cluster (dim reduced, new cluster), same dtype as data.

For start, end and maxpos, -1 means nan and should always correspond to a nan in maximum. The length along cluster is half the size of “dim”, the maximal theoretical number of clusters.

Return type:

xr.Dataset

xsdba.utils.get_clusters_1d(data: ndarray, u1: float, u2: float) → tuple[ndarray, ndarray, ndarray, ndarray][source]¶

Get clusters of a 1D array.

A cluster is defined as a sequence of values larger than u2 with at least one value larger than u1.

Parameters:

data (1D ndarray) – Values to get clusters from.
u1 (float) – Extreme value threshold, at least one value in the cluster must exceed this.
u2 (float) – Cluster threshold, values above this can be part of a cluster.

Return type:

(np.array, np.array, np.array, np.array)

References

getcluster of Extremes.jl (Jalbert [2022]).

xsdba.utils.get_correction(x: DataArray, y: DataArray, kind: str) → DataArray[source]¶: Return the additive or multiplicative correction/adjustment factors.

xsdba.utils.histogram(data, bin_width, bin_origin)[source]¶

Construct an histogram of the data.

Returns the position of the center of bins, their corresponding frequency and the bin of every data point.

xsdba.utils.interp_on_quantiles(newx: DataArray, xq: DataArray, yq: DataArray, *, group: str | Grouper = 'time', method: str = 'linear', extrapolation: str = 'constant')[source]¶

Interpolate values of yq on new values of x.

Interpolate in 2D with scipy.interpolate.griddata() if grouping is used, in 1D otherwise, with scipy.interpolate.interp1d. Any nans in xq or yq are removed from the input map. Similarly, nans in newx are left nans.

Parameters:

newx (xr.DataArray) – The values at which to evaluate yq. If group has group information, new should have a coordinate with the same name as the group name In that case, 2D interpolation is used.
xq (xr.DataArray) – Coordinates and values on which to interpolate. The interpolation is done along the “quantiles” dimension if group has no group information. If it does, interpolation is done in 2D on “quantiles” and on the group dimension.
yq (xr.DataArray) – Coordinates and values on which to interpolate. The interpolation is done along the “quantiles” dimension if group has no group information. If it does, interpolation is done in 2D on “quantiles” and on the group dimension.
group (str or Grouper) – The dimension and grouping information. (ex: “time” or “time.month”). Defaults to “time”.
method ({'nearest', 'linear', 'cubic'}) – The interpolation method.
extrapolation ({'constant', 'nan'}) – The extrapolation method used for values of newx outside the range of xq. See notes.

Notes

Extrapolation methods:

‘nan’ : Any value of newx outside the range of xq is set to ‘nan’.
‘constant’ : Values of newx smaller than the minimum of xq are set to the first value of yq and those larger than the maximum, set to the last one (first and last non-nan values along the “quantiles” dimension). When the grouping is “time.month”, these limits are linearly interpolated along the month dimension.

xsdba.utils.invert(x: DataArray, kind: str | None = None) → DataArray[source]¶

Invert a DataArray either by addition (-x) or by multiplication (1/x).

If kind is not given, default to the one stored in the “kind” attribute of x.

xsdba.utils.map_cdf(ds: Dataset, *, y_value: DataArray, dim: str)[source]¶

Return the value in x with the same CDF as y_value in y.

This function is meant to be wrapped in a Grouper.apply.

Parameters:

ds (xr.Dataset) –

Variables:
x, Values from which to pick, y, Reference values giving the ranking.
y_value (float, array) – Value within the support of y.
dim (str) – Dimension along which to compute quantile.

Returns:

Quantile of x with the same CDF as y_value in y.

Return type:

array

xsdba.utils.map_cdf_1d(x, y, y_value)[source]¶: Return the value in x with the same CDF as y_value in y.

xsdba.utils.optimal_transport(gridX, gridY, muX, muY, num_iter_max, normalization)[source]¶

Compute the optimal transportation plan on (transformations of) X and Y.

References

Robin [2021]

xsdba.utils.pc_matrix(arr: ndarray | Array) → ndarray | Array[source]¶

Construct a Principal Component matrix.

This matrix can be used to transform points in arr to principal components coordinates. Note that this function does not manage nans; if a single observation is null, all elements of the transformation matrix involving that variable will be nan.

Parameters:: arr (numpy.ndarray or dask.array.Array) – 2D array (M, N) of the M coordinates of N points.
Returns:: MxM Array of the same type as arr.
Return type:: numpy.ndarray or dask.array.Array

xsdba.utils.rand_rot_matrix(crd: DataArray, num: int = 1, new_dim: str | None = None) → DataArray[source]¶

Generate random rotation matrices.

Rotation matrices are members of the SO(n) group, where n is the matrix size (crd.size). They can be characterized as orthogonal matrices with determinant 1. A square matrix \(R\) is a rotation matrix if and only if \(R^t = R^{−1}\) and \(\mathrm{det} R = 1\).

Parameters:

crd (xr.DataArray) – 1D coordinate DataArray along which the rotation occurs. The output will be square with the same coordinate replicated, the second renamed to new_dim.
num (int) – If larger than 1 (default), the number of matrices to generate, stacked along a “matrices” dimension.
new_dim (str) – Name of the new “prime” dimension, defaults to the same name as crd + “_prime”.

Returns:

Float, NxN if num = 1, numxNxN otherwise, where N is the length of crd.

Return type:

xr.DataArray

References

Mezzadri [2007]

xsdba.utils.random_tiebreak(da, dim)[source]¶

Add random noise to a DataArray to break ties prior to ranking.

The noise is scaled to the smallest nonzero absolute difference between any two values along dim, ensuring it is large enough to break ties but small enough to never alter the order of non-tied values.

Parameters:

da (xr.DataArray) – Source array.
dim (str | list[str]) – Dimension(s) along which ties will later be ranked.

Returns:

DataArray with the same shape and coordinates as da, with small random noise added.

Return type:

xr.DataArray

xsdba.utils.rank(da: DataArray, dim: str | list[str] = 'time', pct: bool = False, use_random_tiebreak: bool = False) → DataArray[source]¶

Rank data along a dimension.

Replicates xr.DataArray.rank but as a function usable in a Grouper.apply(). Xarray’s docstring is below:

Equal values are assigned a rank that is the average of the ranks that would have been otherwise assigned to all the values within that set. Ranks begin at 1, not 0. If pct, computes percentage ranks, ranging from 0 to 1.

A list of dimensions can be provided and the ranks are then computed separately for each dimension.

Parameters:

da (xr.DataArray) – Source array.
dim (str | list[str], hashable) – Dimension(s) over which to compute rank.
pct (bool) – If True, compute percentage ranks, otherwise compute integer ranks. Percentage ranks range from 0 to 1, in opposition to xarray’s implementation, where they range from 1/N to 1.
use_random_tiebreak (bool) – If True, small random values are added to the input dataset to avoid equal values. The scale of the noise added is chosen according to the input data, such that the ranks are only modified to equal values and not between unequal values. Default value is False.

Returns:

DataArray with the same coordinates and dtype ‘float64’.

Return type:

DataArray

See also

xarray.DataArray.rank

Notes

The bottleneck library is required. nans in the input array are returned as nans.

xsdba.utils.sort_along_dim(da: DataArray, dim: str | list[str]) → DataArray[source]¶

Sort a DataArray along a dimension.

Parameters:

da (xr.DataArray) – Source array.
dim (str | list[str]) – Dimension(s) along which to sort.

Returns:

Sorted DataArray with the same shape and coordinates as da.

Return type:

xr.DataArray

xsdba.xsdba module¶

Main module.

xsdba package¶

Subpackages¶

Submodules¶

xsdba._adjustment module¶

Adjustment Algorithms¶

xsdba._processing module¶

Compute Functions Submodule¶

xsdba.adjustment module¶

Adjustment Methods¶

xsdba.base module¶

Base Classes and Developer Tools¶

xsdba.cli module¶

xsdba.detrending module¶

Detrending Objects Utilities¶

xsdba.formatting module¶

Formatting Utilities¶

xsdba.loess module¶

LOESS Smoothing Submodule¶

xsdba.measures module¶

Measures Submodule¶

xsdba.nbutils module¶

Numba-accelerated Utilities¶

xsdba.options module¶

Options Management for xsdba¶

xsdba.processing module¶

Pre- and Post-Processing Submodule¶

xsdba.properties module¶

Properties Submodule¶

xsdba.typing module¶

Typing Utilities¶

xsdba.units module¶

Units Handling Submodule¶

xsdba.utils module¶

Testing Utilities for xsdba¶

xsdba.xsdba module¶