Sherpa Statistics

The following fit statistics are available in Sherpa:

sherpa> list_stats()
['cash',
 'chi2',
 'chi2constvar',
 'chi2datavar',
 'chi2gehrels',
 'chi2modvar',
 'chi2xspecvar',
 'cstat',
 'cstatnegativepenalty',
 'leastsq',
 'userstat',
 'wstat']

For a detailed explanation of the fitting concepts behind X-ray spectral analysis in Sherpa, see the Sherpa documents Spectral Fitting and Statistics: a chapter for "X-ray Astronomy Handbook" on the Sherpa References page.

cash

A Poisson log-likelihood function.

Counts are sampled from the Poisson distribution, and so the best way to assess the quality of model fits is to use the product of individual Poisson probabilities computed in each bin \(i\), or the likelihood \({\cal L}\):

\[ {\cal L}~=~\prod_i \frac{M_i^{D_i}}{D_i!} \exp(-M_i) \]

where \(M_{i} = S_{i} + B_{i}\) is the sum of source and background model amplitudes, and \(D_{i}\) is the number of observed counts, in bin \(i\).

The Cash statistic (Cash 1979, ApJ 228, 939) is derived by (1) taking the logarithm of the likelihood function, (2) changing its sign, (3) dropping the factorial term (which remains constant during fits to the same dataset), and (4) multiplying by two:

\[ C~=~2 \sum_i \left[ M_i - D_i {\log}M_i \right] \]

The factor of two exists so that the change in Cash statistic from one model fit to the next, \(\Delta C\), is distributed approximately as \(\Delta \chi^{2}\) when the number of counts in each bin is high (> 5). One can then in principle use \(\Delta C\) instead of \(\Delta \chi^{2}\) in certain model comparison tests. However, unlike \(\chi^{2}\), the Cash statistic may be used regardless of the number of counts in each bin.
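The formula above can be evaluated directly. The following is a minimal sketch in plain Python, not Sherpa's own implementation; the function name `cash_stat` and the data values are hypothetical, and every model amplitude is assumed to be strictly positive.

```python
import math

def cash_stat(counts, model):
    """Cash statistic: C = 2 * sum(M_i - D_i * log(M_i)).

    Assumes every model amplitude M_i is strictly positive.
    """
    return 2.0 * sum(m - d * math.log(m) for d, m in zip(counts, model))

# Hypothetical data: observed counts and two candidate constant models.
counts = [3, 0, 2, 5, 1]
model_a = [2.2] * 5   # constant equal to the mean of the counts
model_b = [4.0] * 5   # a poorer constant model

# The change in the statistic between the two models is Delta C.
delta_c = cash_stat(counts, model_b) - cash_stat(counts, model_a)
print(f"Delta C = {delta_c:.3f}")
```

Only the difference between two values is meaningful here; the absolute value of the statistic depends on the data and the number of bins.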

The magnitude of the Cash statistic depends upon the number of bins included in the fit and the values of the data themselves. Hence one cannot analytically assign a goodness-of-fit measure to a given value of the Cash statistic. Such a measure can, in principle, be computed by performing Monte Carlo simulations. One would repeatedly sample new datasets from the best-fit model, and fit them, and note where the observed Cash statistic lies within the derived distribution of Cash statistics.
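The Monte Carlo procedure described above might be sketched as follows. This is an illustration under simplifying assumptions, not a Sherpa routine: the model is a single constant (whose maximum-likelihood value is the mean of the counts), the Poisson sampler uses Knuth's method, and all function names are hypothetical.

```python
import math
import random

def cash_stat(counts, model):
    """Cash statistic for strictly positive model amplitudes."""
    return 2.0 * sum(m - d * math.log(m) for d, m in zip(counts, model))

def poisson_sample(mean, rng):
    """Draw one Poisson variate (Knuth's method; adequate for small means)."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def fit_constant(counts):
    """Maximum-likelihood constant model: the mean of the counts."""
    # The small floor keeps log() finite if every simulated bin is empty.
    return max(sum(counts) / len(counts), 1e-10)

def cash_percentile(counts, n_sim=1000, seed=0):
    """Fraction of simulated Cash values that fall below the observed one."""
    rng = random.Random(seed)
    best = fit_constant(counts)
    observed = cash_stat(counts, [best] * len(counts))
    n_below = 0
    for _ in range(n_sim):
        # Sample a new dataset from the best-fit model, then refit it.
        fake = [poisson_sample(best, rng) for _ in counts]
        refit = fit_constant(fake)
        if cash_stat(fake, [refit] * len(fake)) < observed:
            n_below += 1
    return n_below / n_sim

print(cash_percentile([3, 0, 2, 5, 1]))
```

A fraction close to 1 would indicate that the observed statistic is larger than almost all simulated values, which suggests a poor fit.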

Note on Background Subtraction

The background should not be subtracted from the data when this statistic is used. It should be modeled simultaneously with the source.

Examples

Specify the fitting statistic and then confirm it has been set.
```
sherpa> set_stat('cash')
sherpa> print(get_stat_name())
cash
```

cstat

A Poisson log-likelihood function.

The CSTAT statistic is equivalent to the XSPEC implementation of the Cash statistic when the data set has no associated background or a model is to be fit to the background. The wstat statistic, added in CIAO 4.8, includes the background data. Note, however, that it is biased when the background contains many channels with 0 counts: see the notebook by Giacomo Vianello.

The following is based on the Poisson data (cstat) discussion from XSPEC.

\[ {\cal L}~=~\prod_i \frac{M_i^{D_i}}{D_i!} \exp(-M_i) \]

where \(M_{i} = S_{i} + B_{i}\) is the sum of source and background model amplitudes, and \(D_{i}\) is the number of observed counts, in bin \(i\).

The CSTAT statistic is derived by (1) taking the natural logarithm of the likelihood function, (2) changing its sign, (3) approximating the factorial term by \(\log{D_{i}!} = D_{i} \log{D_{i}}\), (4) adding an extra data-dependent term, and (5) multiplying by two:

\[ C~=~2 \sum_i \left[ M_i - D_i + D_i ({\log}D_i - {\log}M_i )\right] \]

The factor of two exists so that the change in CSTAT statistic from one model fit to the next, \(\Delta C\), is distributed approximately as \(\Delta \chi^{2}\) when the number of counts in each bin is high (> 5). One can then in principle use \(\Delta C\) instead of \(\Delta \chi^{2}\) in certain model comparison tests. However, unlike \(\chi^{2}\), the CSTAT statistic may be used regardless of the number of counts in each bin.

The advantage of CSTAT over Sherpa's implementation of CASH is that one can assign an approximate goodness-of-fit measure to a given value of the CSTAT statistic, i.e., the observed statistic, divided by the number of degrees of freedom, should be of order 1 for good fits.
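As an illustration of the CSTAT formula and the approximate goodness-of-fit measure, here is a minimal plain-Python sketch. It is not Sherpa's implementation: the function names are hypothetical, model amplitudes are assumed to be strictly positive, and bins with zero counts are assumed to contribute only \(M_i\) (the limit of \(D_i \log D_i\) as \(D_i\) goes to zero).

```python
import math

def cstat(counts, model):
    """CSTAT: C = 2 * sum(M_i - D_i + D_i * (log D_i - log M_i))."""
    total = 0.0
    for d, m in zip(counts, model):
        term = m - d
        if d > 0:
            # Assumption: for d == 0 the logarithmic term is taken as zero.
            term += d * (math.log(d) - math.log(m))
        total += term
    return 2.0 * total

def reduced_cstat(counts, model, n_free_params):
    """Statistic divided by the number of degrees of freedom."""
    dof = len(counts) - n_free_params
    return cstat(counts, model) / dof

# Hypothetical data and a constant model with one free parameter.
counts = [3, 0, 2, 5, 1]
model = [2.2] * 5
print(f"reduced CSTAT = {reduced_cstat(counts, model, 1):.3f}")
```

The statistic is zero when the model reproduces the data exactly, which is what makes the reduced value usable as a rough quality indicator.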

Note on Background Subtraction

The background should not be subtracted from the data when this statistic is used. It should be modeled simultaneously with the source. An alternative is to use the wstat statistic, added in CIAO 4.8.

Examples

Specify the fitting statistic and then confirm it has been set.

sherpa> set_stat("cstat")
sherpa> print(get_stat_name())
cstat

sherpa> set_stat("cstatnegativepenalty")
sherpa> print(get_stat_name())
cstatnegativepenalty

Penalized CSTAT for Negative Model Values

Added in CIAO 4.18, the cstatnegativepenalty statistic is the same as the CSTAT statistic except when the model predicts one or more negative values. In this case the statistic is penalized by the absolute value of the negative value or values. The aim is to help the optimizer move the fit to a more physical location in the search space.

wstat

A Poisson log-likelihood function with Poisson background.

The WSTAT statistic is equivalent to the XSPEC implementation of the Cash statistic when the data set has an associated background component. The cstat statistic should be used instead if no background is present, or a model is to be fit to the background.

The statistic is described in the Poisson data with Poisson background (cstat) section of the XSPEC documentation.

Note on Background Subtraction

The background should not be subtracted from the data when this statistic is used. It will be automatically included in the fit, but note that care should be taken when interpreting the results, since the behavior of the statistic with low counts is not guaranteed.

A good discussion of the bias that arises for low-count background data with many empty channels is presented by Giacomo Vianello in the Notebook on GitHub.

An alternative is to use the cash or cstat statistic and fit a model to the background. This avoids the bias when the background is low.

Warning

The background is included when fitting the data and calculating statistics, but commands like plot_model and plot_fit do not include it. This means that the displayed fit will be lower than the data (the difference depends on the contribution of the background to the source region).

Examples

Specify the fitting statistic and then confirm it has been set.

sherpa> set_stat("wstat")
sherpa> print(get_stat_name())
wstat

\(\chi^{2}\) statistics:

A Gaussian log-likelihood function.

Counts are sampled from the Gaussian (Normal) distribution, and so the best way to assess the quality of model fits is to use the product of individual Gaussian probabilities computed in each bin \(i\), or the likelihood \({\cal L}\):

\[ {\cal L}~=~\prod_i \frac{1}{\sigma_i\sqrt{2\pi}} \exp \left[ -\frac{(N_i-M_i)^2}{2\sigma_i^2} \right] \]

where \(M_{i} = S_{i} + B_{i}\) is the sum of source and background model amplitudes, and \(N_{i}\) is the number of observed counts, in bin \(i\).

The \(\chi^{2}\) statistic can be derived by (1) taking the natural logarithm of the likelihood function, (2) multiplying by minus two, and (3) ignoring the terms which depend only on the data.

\[ \chi^2 = \sum_i \frac{[N_{i,S}-B_i(x_i,p_B)-S_i(x_i,p_S)]^2}{\sigma_i^2} \]

where \(N_{i,S}\) is the total number of observed counts in bin \(i\) of the on-source region; \(B_{i}\left(x_{i},p_{B} \right)\) is the number of predicted background model counts in bin \(i\) of the on-source region (zero for background-subtracted data), rescaled from bin \(i\) of the off-source region, and computed as a function of the model argument \(x_{i}\) (e.g., energy or time) and set of background model parameter values \(p_{B}\); \(S_{i}\left(x_{i},p_{S} \right)\) is the number of predicted source model counts in bin \(i\), as a function of the model argument \(x_{i}\) and set of source model parameter values \(p_{S}\); and \(\sigma_{i}\) is the error in bin \(i\).
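The \(\chi^{2}\) sum can be written out in a few lines. This is a minimal plain-Python sketch rather than Sherpa's implementation; the function name `chi2_stat` and its argument names are hypothetical, and the caller is assumed to supply the per-bin errors \(\sigma_{i}\).

```python
def chi2_stat(counts, source_model, sigma, background_model=None):
    """chi^2 = sum(((N_i - B_i - S_i) / sigma_i) ** 2).

    background_model defaults to zero in every bin, which matches the
    background-subtracted (or no-background) case.
    """
    if background_model is None:
        background_model = [0.0] * len(counts)
    return sum(
        ((n - b - s) / e) ** 2
        for n, s, e, b in zip(counts, source_model, sigma, background_model)
    )

# Hypothetical two-bin example with known errors.
print(chi2_stat([10, 12], [9.0, 12.0], [1.0, 2.0]))
```

Passing an error of 1 in every bin gives a plain sum of squared residuals; the individual \(\chi^{2}\) variants differ only in how the errors are assigned.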

The options for assigning \(\sigma_{i}\) are described in the documentation for chi2datavar, chi2gehrels, and chi2modvar. In each of these descriptions, \(N_{i,B}\) is the total number of observed counts in bin \(i\) of the off-source region; \(A_{B}\) is the off-source "area," which could be the size of the region from which the background is extracted, or the length of a background time segment, or a product of the two, etc.; and \(A_{S}\) is the on-source "area."

In the analysis of PHA data, \(A_{B}\) is the product of the BACKSCAL and EXPTIME FITS header keyword values, provided in the file containing the background data. \(A_{S}\) is computed similarly, from keyword values in the source data file.

Note that in the current version of Sherpa, it is assumed that there is a one-to-one mapping between a given background region bin and a given source region bin. For instance, in the analysis of PHA data, it is assumed that the input background counts spectrum is binned in exactly the same way as the input source counts spectrum, and any filter applied to the source spectrum is automatically applied to the background spectrum. This means that currently, the user cannot, e.g., specify arbitrary background and source regions in two dimensions and get correct results. This will be changed in a future version of Sherpa.

(However, this limitation only applies when analyzing background data that have been entered with the load_bkg command. One can always enter the background as a separate dataset and jointly fit the source and background regions.)

leastsq

The leastsq statistic is equivalent to chi2constvar, except the variance is always set to 1.

Examples

Set the fitting statistic and then confirm the new value:

sherpa> set_stat("leastsq")
sherpa> print(get_stat_name())
leastsq

chi2constvar

\(\chi^{2}\) statistic with constant variance computed from the counts data. (CHI PARENT | CHI CVAR in CIAO 3.4)

In some applications, the variance can be assumed to be constant, and for this statistic the variance is calculated as the mean number of counts, or

\[ \sigma_i^2 = \frac{1}{N} \sum_{j=1}^N \left[ N_{j,S} + \left(\frac{A_S}{A_B}\right)^2 N_{j,B} \right] \, , \]

where \(N\) is the number of on-source (and off-source) bins included in the fit. The background term appears only if a background region is specified and background subtraction is done.

See the section on \(\chi^{2}\) statistics for more information, including definitions of the additional quantities shown in the equation.
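The constant variance defined above can be computed as in the following plain-Python sketch. It is an illustration only; the function name is hypothetical, and `area_ratio` stands for \(A_S/A_B\).

```python
def constvar_variance(src_counts, bkg_counts=None, area_ratio=1.0):
    """Mean counts per bin, used as the variance for every bin.

    The background term is included only when bkg_counts is given,
    i.e. when background subtraction is being done.
    """
    total = sum(src_counts)
    if bkg_counts is not None:
        total += area_ratio ** 2 * sum(bkg_counts)
    return total / len(src_counts)

# Hypothetical counts, without and with a background region.
print(constvar_variance([4, 6]))
print(constvar_variance([4, 6], [10, 10], area_ratio=0.5))
```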

Examples

Specify the fitting statistic and then confirm it has been set.

sherpa> set_stat('chi2constvar')
sherpa> print(get_stat_name())
chi2constvar

chi2datavar

\(\chi^{2}\) statistic with variance computed from the data. (CHI DVAR in CIAO 3.4)

If the number of counts in each bin is large (> 5), then the shape of the Poisson distribution from which the counts are sampled tends asymptotically towards that of a Gaussian distribution, with variance

\[ \sigma_i^2 = N_{i,S} + \left(\frac{A_S}{A_B}\right)^2 N_{i,B} \, . \]

The background term appears only if a background region is specified and background subtraction is done. See the section on \(\chi^{2}\) statistics for more information, including definitions of the additional quantities shown in the equation.
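A minimal plain-Python sketch of this per-bin variance follows. It is not Sherpa's implementation; the function name is hypothetical, and `area_ratio` stands for \(A_S/A_B\).

```python
def datavar_variance(src_counts, bkg_counts=None, area_ratio=1.0):
    """Per-bin variance: N_S + (A_S / A_B)**2 * N_B."""
    if bkg_counts is None:
        # No background subtraction: the variance is just the counts.
        return [float(n) for n in src_counts]
    return [
        n + area_ratio ** 2 * b
        for n, b in zip(src_counts, bkg_counts)
    ]

# Hypothetical counts, without and with a background region.
print(datavar_variance([9, 16]))
print(datavar_variance([9, 16], [4, 8], area_ratio=0.5))
```

Under this formula a bin with zero counts and no background term has zero variance, which is one reason the statistic is suited to the high-count regime.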

Examples

Specify the fitting statistic and then confirm it has been set.

sherpa> set_stat('chi2datavar')
sherpa> print(get_stat_name())
chi2datavar

chi2gehrels

\(\chi^{2}\) statistic with the Gehrels variance function.

This is the Sherpa default statistic.

If the number of counts in each bin is small (< 5), then we cannot assume that the Poisson distribution from which the counts are sampled has a nearly Gaussian shape. The standard deviation (i.e., the square-root of the variance) for this low-count case has been derived by Gehrels (1986):

\[ \sigma_{i,S} = 1 + \sqrt{N_{i,S}+0.75} \]
\[ \sigma_{i,B} = 1 + \sqrt{N_{i,B}+0.75} \]

Higher-order terms have been dropped from the expression; it is accurate to approximately one percent. If one does not perform background subtraction, then \(\sigma_{i} = \sigma_{i,S}\); otherwise, one may use standard error propagation to estimate that

\[ \sigma_i^2 = \sigma_{i,S}^2 + \left(\frac{A_S}{A_B}\right)^2 \sigma_{i,B}^2\, . \]

Note on Background Subtraction

We have not determined the accuracy of the latter expression, thus the user should proceed with caution when subtracting background from the raw data when using this statistic. An approach preferable to background subtraction is to model the background and data simultaneously.
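The Gehrels error and its propagated form for background-subtracted data can be sketched in plain Python as follows. The function names are hypothetical and this is not Sherpa's implementation; `area_ratio` stands for \(A_S/A_B\).

```python
import math

def gehrels_sigma(counts):
    """Gehrels approximation: sigma = 1 + sqrt(N + 0.75)."""
    return [1.0 + math.sqrt(n + 0.75) for n in counts]

def gehrels_subtracted_sigma(src_counts, bkg_counts, area_ratio=1.0):
    """Combine source and background errors by standard error propagation."""
    sig_s = gehrels_sigma(src_counts)
    sig_b = gehrels_sigma(bkg_counts)
    return [
        math.sqrt(s ** 2 + (area_ratio * b) ** 2)
        for s, b in zip(sig_s, sig_b)
    ]

# Hypothetical low-count bins: the error stays above 1 even for empty bins.
print(gehrels_sigma([0, 1, 4]))
```

Unlike the square root of the counts, this error never reaches zero, so empty bins do not cause a division by zero in the \(\chi^{2}\) sum.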

Examples

Specify the fitting statistic and then confirm it has been set.

sherpa> set_stat("chi2gehrels")
sherpa> print(get_stat_name())
chi2gehrels

chi2modvar

\(\chi^{2}\) statistic with variance computed from model amplitudes. (CHI MVAR in CIAO 3.4)

This statistic is equivalent to chi2datavar, except that the variance is estimated using the background and source model amplitudes rather than the observed counts data:

\[ \sigma_i^2 = S_i + \left(\frac{A_S}{A_B}\right)^2 B_{i,\rm off} \, , \]

where \(B_{i,\rm off}\) is the background model amplitude in bin \(i\) of the off-source region. See the section on \(\chi^{2}\) statistics for more information, including definitions of the additional quantities shown in the equation.
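For comparison with the data-based variance, the model-based variance might be sketched as follows. This is a plain-Python illustration with hypothetical names, not Sherpa's implementation; `area_ratio` stands for \(A_S/A_B\).

```python
def modvar_variance(source_model, bkg_off_model=None, area_ratio=1.0):
    """Per-bin variance: S_i + (A_S / A_B)**2 * B_off_i."""
    if bkg_off_model is None:
        # No background model: the variance is the source model amplitude.
        return [float(s) for s in source_model]
    return [
        s + area_ratio ** 2 * b
        for s, b in zip(source_model, bkg_off_model)
    ]

# Hypothetical model amplitudes for two bins.
print(modvar_variance([2.0, 3.0], [4.0, 8.0], area_ratio=0.5))
```

Because the variance depends on the model, it changes every time the parameters change during a fit.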

Note on Background Subtraction

The background should not be subtracted from the data when this statistic is used. chi2modvar underestimates the variance when fitting background-subtracted data.

Examples

Specify the fitting statistic and then confirm it has been set.

sherpa> set_stat("chi2modvar")
sherpa> print(get_stat_name())
chi2modvar

chi2xspecvar

\(\chi^{2}\) statistic with variance computed from data amplitudes.

This statistic is equivalent to chi2datavar, except that when the number of counts in a channel bin is less than 1 the variance is set to 1.
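The rule can be expressed in one line. The sketch below is plain Python with a hypothetical function name, and it covers only the case without background subtraction.

```python
def xspecvar_variance(counts):
    """Variance equals the counts, replaced by 1 where the counts are below 1."""
    return [float(n) if n >= 1 else 1.0 for n in counts]

# Hypothetical counts including an empty bin.
print(xspecvar_variance([0, 1, 5]))
```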

Primini [removed]

Warning

The primini iterated-fit statistic was removed in CIAO 4.14. This documentation is provided for historical purposes.

This is a \(\chi^{2}\) statistic with variance computed from model amplitudes derived in the iterative process.

Iterative fitting with the Primini method takes effect when the fit function is called, after changing the current iterative fitting method from the default value of 'none' to 'primini' with set_iter_method("primini"). This is identical to running fit() in Sherpa in the usual way, except that it modifies the chosen \(\chi^{2}\) fit statistic to use the Primini variance function. This 'Iterative Weighting' (IW; see Wheaton et al. 1995, ApJ 438, 322) attempts to remove the bias in model parameter estimates that is inherent in \(\chi^{2}\) statistics (see Kearns, Primini, & Alexander, 1995, ADASS IV, 331).

The variance in bin \(i\) is estimated to be:

\[ {\sigma^2}_{i}^{j} = S(i,\hat{\theta}_{S}^{j-1}) + \left(\frac{A_S}{A_B}\right)^2 B_{\rm off}(i,\hat{\theta}_{B}^{j-1}) \, , \]

where \(j\) is the number of iterations that have been carried out in the fitting process, \(B_{\rm off}\) is the background model amplitude in bin \(i\) of the off-source region, and \(\hat{\theta}_{S}^{j-1}\) and \(\hat{\theta}_{B}^{j-1}\) are the set of source and background model parameter values derived during the iteration previous to the current one. The variances are set to an array of ones on the first iteration.

In addition to reducing parameter estimate bias, this statistic can be used even when the number of counts in each bin is small (< 5), although the user should proceed with caution.

Note on Background Subtraction

The background should not be subtracted from the data when this statistic is used. The Primini fit statistic underestimates the variance when fitting background-subtracted data.

Example

To use the Primini iterative fitting method (with \(\chi^{2}\) statistics only), set the current iterative fitting method to "primini" with the set_iter_method function:
```
sherpa> print(get_iter_method_name())
none
sherpa> set_iter_method("primini")
sherpa> set_stat("chi2gehrels")
sherpa> fit()
```

User statistic

It is possible for the user to create and implement his or her own fit statistic function within Sherpa. A fit statistic is the measure one uses to compare predicted model data to real measured data. It is by minimizing the fit statistic that one finds a good fit for a model.

The load_user_stat() Sherpa function accommodates user-defined functions for a statistic and statistical errors, in addition to defining a list of model parameters and hyperparameters for prior distributions (if priors are desired).

Changes in CIAO 4.8

The function interface for user statistics has changed, since the bkg parameter has been added (with a default value of None).

Examples

A simple user-defined statistic in Python Sherpa would be defined as follows:

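sherpa> import numpy

sherpa> def my_stat_func(data, model, staterror, syserror=None, weight=None, bkg=None):
    "A simple function to replicate the chi-square statistic"
    # Per-bin residuals in units of the statistical error
    fvec = ((data - model) / staterror)
    # Scalar statistic: the sum of the squared residuals
    stat = (fvec**2).sum()
    return (stat, fvec)

sherpa> def my_staterr_func(data):
    "A simple staterror function"
    # Poisson-like errors: the square root of the counts
    return numpy.sqrt(data)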

sherpa> load_user_stat("mystat", my_stat_func, my_staterr_func)

sherpa> set_stat(mystat)

A more complex user-defined statistic, with prior distributions, would look like this:
```
sherpa> def my_stat_func(data, model, staterror=None, syserror=None, weight=None, bkg=None):
    ...
    return (stat, fvec)

sherpa> load_user_stat("mystat", my_stat_func, priors=dict(mugamma=0.017, ..., gamma=abs1.nh, ... ))

sherpa> set_stat(mystat)
```
In these examples, 'stat' is a scalar statistic value and 'fvec' is an array of the statistic contributions per data bin. This method caters to both types of optimization methods in Sherpa: levmar expects the statistic contribution per bin, whereas simplex and moncar expect a single scalar statistic value.
