Synopsis
Compare two models using the F test.
Syntax
calc_ftest(dof1, stat1, dof2, stat2)

dof1 - int or sequence of int
stat1 - number or sequence of number
dof2 - int or sequence of int
stat2 - number or sequence of number
Description
The F-test is a model comparison test: it is used to select which of two competing models best describes a particular data set. A model comparison test statistic, T, is created from the best-fit statistics of each fit; as with all statistics, it is sampled from a probability distribution p(T). The test significance is defined as the integral of p(T) from the observed value of T to infinity. The significance quantifies the probability that one would select the more complex model when in fact the null hypothesis (that the simpler model is correct) holds. See also `calc_mlr`.
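Written as a formula, the significance defined above is the upper-tail integral of p(T). The symbol T_obs for the observed value of the statistic is notation introduced here for illustration only.

```latex
\mathrm{sig} = \int_{T_{\mathrm{obs}}}^{\infty} p(T)\,\mathrm{d}T
```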
Examples
Example 1
>>> calc_ftest(11, 16.3, 10, 10.2)
0.03452352914891555
Example 2
>>> calc_ftest([11, 11], [16.3, 16.3], [10, 9], [10.2, 10.5])
array([0.03452353, 0.13819987])
Parameters
The parameters for this function are:
Parameter | Definition |
---|---|
dof1 | degrees of freedom of the simple model |
stat1 | best-fit chi-square statistic value of the simple model |
dof2 | degrees of freedom of the complex model |
stat2 | best-fit chi-square statistic value of the complex model |
Return value
The return value from this function is:
sig -- The significance, or p-value. A standard threshold for selecting the more complex model is significance < 0.05 (the '95% criterion' of statistics).
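As a small illustration of the threshold described above, the sketch below applies the 0.05 criterion to the two significance values shown in Example 2. The helper name `prefers_complex_model` is hypothetical and is not part of Sherpa; the numbers are copied from the example output on this page.

```python
def prefers_complex_model(sig, threshold=0.05):
    """Return True when the significance falls below the chosen threshold.

    Hypothetical helper for illustration; it is not a Sherpa function.
    """
    return sig < threshold


# Significance values copied from the output of Example 2.
significances = [0.03452353, 0.13819987]

decisions = [prefers_complex_model(s) for s in significances]
print(decisions)  # [True, False]
```

With this criterion the first comparison favors the more complex model, while the second does not.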
Notes
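The F test statistic is the improvement in the fit statistic per extra degree of freedom, divided by the reduced chi-square of the complex model: F = ((stat1 - stat2) / (dof1 - dof2)) / (stat2 / dof2). Under the null hypothesis it follows the F-distribution with (dof1 - dof2, dof2) degrees of freedom; this form reproduces the values shown in the Examples section. The incomplete Beta function is used to calculate the integral of the tail of the F-distribution.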
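The tail integral can be sketched in pure Python. This is an illustrative sketch, not Sherpa's implementation. It assumes the statistic is F = ((stat1 - stat2) / (dof1 - dof2)) / (stat2 / dof2), an inference from the values in the Examples section, which it reproduces. All function names below are hypothetical, and the incomplete Beta function is evaluated with a standard continued-fraction expansion chosen here for self-containment.

```python
import math


def _beta_continued_fraction(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        # Even step of the recurrence.
        num = m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m))
        d = 1.0 + num * d
        c = 1.0 + num / c
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence.
        num = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0))
        d = 1.0 + num * d
        c = 1.0 + num / c
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the symmetry relation where the continued fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def ftest_significance(dof1, stat1, dof2, stat2):
    """Upper-tail probability of the F statistic (assumed form, see lead-in)."""
    delta_dof = dof1 - dof2
    delta_stat = stat1 - stat2
    f = (delta_stat / delta_dof) / (stat2 / dof2)
    # Survival function of F(delta_dof, dof2) via the incomplete beta function.
    x = dof2 / (dof2 + delta_dof * f)
    return regularized_incomplete_beta(0.5 * dof2, 0.5 * delta_dof, x)


sig1 = ftest_significance(11, 16.3, 10, 10.2)  # inputs of Example 1
sig2 = ftest_significance(11, 16.3, 9, 10.5)   # second entry of Example 2
print(sig1, sig2)
```

The sketch handles scalar inputs only; the sequence inputs accepted by `calc_ftest` would need an explicit loop or an array library.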
The F test should only be used when:
- the simpler of the two models is nested within the other; that is, one can obtain the simpler model by fixing the extra parameters of the more complex model at particular values (often zero or one);
- the extra parameters have values sampled from normal distributions under the null hypothesis (i.e., if one samples many datasets given the null hypothesis and fits these data with the more complex model, the distributions of values for the extra parameters must be Gaussian);
- those normal distributions are not truncated by parameter space boundaries;
- the best-fit statistics are sampled from the chi-square distribution.
See Protassov et al. 2002 [1] for more discussion.
References
- [1] Protassov et al., Statistics, Handle with Care: Detecting Multiple Model Components with the Likelihood Ratio Test, Astrophysical Journal, vol 571, pages 545-559, 2002, http://adsabs.harvard.edu/abs/2002ApJ...571..545P
Bugs
See the bugs pages on the Sherpa website for an up-to-date listing of known bugs.
See Also
- data
- copy_data, dataspace1d, dataspace2d, datastack, delete_data, fake, get_axes, get_bkg_chisqr_plot, get_bkg_delchi_plot, get_bkg_fit_plot, get_bkg_model_plot, get_bkg_plot, get_bkg_ratio_plot, get_bkg_resid_plot, get_bkg_source_plot, get_counts, get_data, get_data_contour, get_data_contour_prefs, get_data_image, get_data_plot, get_data_plot_prefs, get_dep, get_dims, get_error, get_quality, get_specresp, get_staterror, get_syserror, group, group_adapt, group_adapt_snr, group_bins, group_counts, group_snr, group_width, load_ascii, load_data, load_grouping, load_quality, set_data, set_quality, ungroup, unpack_ascii, unpack_data
- filtering
- get_filter, load_filter, set_filter
- info
- get_default_id, list_data_ids, list_response_ids
- modeling
- clean
- plotting
- plot_data, set_xlinear, set_xlog, set_ylinear, set_ylog
- saving
- save_error, save_filter, save_grouping, save_quality, save_staterror, save_syserror
- utilities
- calc_data_sum, calc_data_sum2d, calc_kcorr, calc_mlr, calc_model_sum2d, calc_source_sum2d, get_rate, incbet
- visualization
- contour, contour_data, contour_ratio, histogram1d, histogram2d, image_data, rebin