resample_data

Context: data

Synopsis

Resample data with asymmetric error bars.

Syntax

resample_data(id: int | str | None = None, niter: int = 1000, seed: int | None = None)

Returns: dict[str, np.ndarray]

Description

The function performs a parametric bootstrap, assuming a skewed normal distribution centered on each observed data point, with the variance set by the low and high measurement errors. It simulates niter realizations of the data and fits each realization with the assumed model to obtain the best-fit parameters. It returns the best-fit parameters for each realization and displays the average and standard deviation of each parameter.
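
The procedure described above can be sketched in plain NumPy. This is a minimal illustration, not Sherpa's implementation: it assumes a two-piece (split) normal as the skewed distribution and a straight-line model fitted with np.polyfit, and the helper names draw_asymmetric and resample_fit are hypothetical.

```python
import numpy as np

def draw_asymmetric(y, elo, ehi, rng):
    """Draw one realization of y from a two-piece normal.

    Values below y are scaled by elo and values above y by ehi. Choosing
    the side with probability proportional to its width keeps the density
    continuous at y. This form of the skewed distribution is an assumption.
    """
    y, elo, ehi = map(np.asarray, (y, elo, ehi))
    upper = rng.random(y.shape) < ehi / (elo + ehi)
    z = np.abs(rng.standard_normal(y.shape))
    return np.where(upper, y + ehi * z, y - elo * z)

def resample_fit(x, y, elo, ehi, niter=1000, rng=None):
    """Refit a straight line to niter simulated data sets."""
    rng = np.random.default_rng() if rng is None else rng
    samples = np.empty((niter, len(y)))
    c0 = np.empty(niter)
    c1 = np.empty(niter)
    for i in range(niter):
        samples[i] = draw_asymmetric(y, elo, ehi, rng)
        # polyfit returns the highest-order coefficient first
        c1[i], c0[i] = np.polyfit(x, samples[i], 1)
    return {"samples": samples, "c0": c0, "c1": c1}

# Made-up data: a straight line with larger upper than lower errors.
x = np.linspace(0.0, 1.0, 20)
y = 3.0 + 2.0 * x
elo = np.full_like(x, 0.1)
ehi = np.full_like(x, 0.3)
out = resample_fit(x, y, elo, ehi, niter=200, rng=np.random.default_rng(1))
for name in ("c0", "c1"):
    print(f"{name} : avg = {out[name].mean()} , std = {out[name].std()}")
```

Because the upper errors are larger than the lower ones in this made-up data, the simulated values are skewed upward and the average intercept ends up above the input value of 3.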

Examples

Example 1

Account for asymmetric errors when calculating parameter uncertainties:

>>> set_stat("leastsq")
>>> set_method("levmar")
>>> load_ascii_with_errors(1, 'test.dat')
>>> set_model(polynom1d.p0)
>>> thaw(p0.c1)
>>> fit()
Dataset               = 1
Method                = levmar
Statistic             = leastsq
Initial fit statistic = 4322.56
Final fit statistic   = 247.768 at function evaluation 6
Data points           = 61
Degrees of freedom    = 59
Change in statistic   = 4074.79
p0.c0          3.2661       +/- 0.193009
p0.c1          2162.19      +/- 65.8445
>>> result = resample_data(1, niter=10)
p0.c0 : avg = 4.159973865314249 , std = 1.0575403309799554
p0.c1 : avg = 1943.5489865678633 , std = 268.64478808013547
>>> print(result['p0.c0'])
[5.856479033432613, 3.8252624107243465, ... 2.8704270612985345]
>>> print(result['p0.c1'])
[1510.049972062868, 1995.4742750432902, ... 2235.9753113309894]

Example 2

Display the PDF of the parameter values of the p0.c0 component from a run with 5000 iterations:

>>> sample = resample_data(1, 5000)
p0.c0 : avg = 3.966543284267264 , std = 0.9104639711036427
p0.c1 : avg = 1988.8417667057342 , std = 220.21903089622705
>>> plot_pdf(sample['p0.c0'], bins=40)

Example 3

The simulated data sets used for the analysis are returned under the samples key, as a 2D NumPy array with one row per iteration and one column per data point, and can be used for further analysis. Here the distribution of the first bin is shown as a CDF:

>>> sample = resample_data(1, 5000)
>>> samples = sample['samples']
>>> plot_cdf(samples[:, 0])
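
The same column can also be inspected without plotting. The sketch below uses a randomly generated stand-in for sample['samples'] (the values are made up, not output from resample_data) and computes the empirical CDF and a central 68% interval for the first bin with NumPy:

```python
import numpy as np

# Stand-in for sample['samples']: niter rows, one column per data point.
rng = np.random.default_rng(0)
samples = rng.normal(loc=10.0, scale=2.0, size=(5000, 61))

first_bin = np.sort(samples[:, 0])
# Empirical CDF: fraction of draws less than or equal to each sorted value.
cdf = np.arange(1, first_bin.size + 1) / first_bin.size

# Median and central 68% interval of the first bin.
lo, med, hi = np.quantile(first_bin, [0.16, 0.5, 0.84])
print(f"median = {med:.3f}, 68% interval = [{lo:.3f}, {hi:.3f}]")
```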

PARAMETERS

The parameters for this function are:

Parameter	Type information	Definition
id	int, str, or None, optional	The identifier of the data set to use.
niter	int, optional	The number of iterations to use. The default is 1000.
seed	int, optional	The seed for the random number generator. The default is `None`. This argument is deprecated: use the `set_rng` routine instead.

Return value

The return value from this function is:

sampled -- A dictionary. The statistic key holds the best-fit statistic value for each iteration, the samples key holds the resampled data used in the fits as an niter by ndata array, and each free parameter in the fit has its own key holding a NumPy array (of size niter) with the fitted value from each iteration.
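
As a sketch of how this structure might be consumed, the example below builds a stand-in dictionary with the keys described above (all values are randomly generated placeholders, not output from resample_data) and reproduces the per-parameter average and standard deviation:

```python
import numpy as np

# Stand-in for the dictionary returned by resample_data (values are made up).
rng = np.random.default_rng(0)
niter, ndata = 100, 61
result = {
    "statistic": rng.chisquare(ndata - 2, size=niter),
    "samples": rng.normal(size=(niter, ndata)),
    "p0.c0": rng.normal(4.0, 1.0, size=niter),
    "p0.c1": rng.normal(2000.0, 250.0, size=niter),
}

# Every key other than these two is a free parameter of the fit.
par_names = [k for k in result if k not in ("statistic", "samples")]
summary = {k: (result[k].mean(), result[k].std()) for k in par_names}
for name, (avg, std) in summary.items():
    print(f"{name} : avg = {avg} , std = {std}")

# Index of the realization with the lowest fit statistic.
best = int(np.argmin(result["statistic"]))
```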

Changes in CIAO

Changed in CIAO 4.17

The resampling now uses the chosen statistic and optimizer (set with set_stat and set_method). Previously the least-squares statistic and Levenberg-Marquardt method were always used.

Changed in CIAO 4.16

The random number generation is now controlled by the `set_rng` routine. The seed argument is now deprecated.
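
A reproducible run might be set up as sketched below. This assumes that set_rng accepts a NumPy random generator, which should be checked against the set_rng help page. The Sherpa calls are left as comments because they need a Sherpa session, and the seed value is arbitrary.

```python
import numpy as np

seed = 8723
rng = np.random.default_rng(seed)

# In a Sherpa session (assumed usage, not executed here):
#   set_rng(rng)
#   result = resample_data(1, niter=10)

# Two generators built from the same seed give identical draws, which is
# what makes a rerun reproducible.
a = np.random.default_rng(seed).standard_normal(5)
b = np.random.default_rng(seed).standard_normal(5)
print(np.array_equal(a, b))
```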

Added in CIAO 4.13

The samples and statistic keys were added to the return value and the parameter values are returned as NumPy arrays rather than as lists.

Bugs

See the bugs pages on the Sherpa website for an up-to-date listing of known bugs.