Changing the grouping scheme of a data set within Sherpa
Sherpa Threads (CIAO 4.12 Sherpa v1)
Overview
Synopsis:
In order to use Gaussian statistics to fit a model to a data set, it is often necessary to "group" the data—i.e., combine channels until you have enough counts—before use. It is possible to set and change the grouping of a file after it has been read into Sherpa by using the group commands: set_grouping, group, group_counts, group_snr, group_adapt, group_adapt_snr, group_bins, and group_width.
This thread shows how you can use these functions when fitting PHA data.
Last Update: 12 Dec 2019 - reviewed for CIAO 4.12: replaced ChIPS with matplotlib syntax.
Contents
Getting Started
Loading the data
This thread uses the same data set as used in the Introduction to Fitting PHA Spectra thread. We load the data set twice—using load_pha—so that we can easily see the effect of changing the grouping scheme (the screen output has been omitted for clarity):
sherpa> load_pha(1, "3c273.pi") ... sherpa> load_pha(2, "3c273.pi") ...
We note that the data in 3c273.pi was created so that each group contained at least 15 counts—by using the NUM_CTS grouptype option of dmgroup—as can be seen by using the dmhistory tool.
sherpa> !dmhistory 3c273.pi dmgroup | tr ' ' "\012" dmgroup infile="3c273.pi" outfile="./3c273.tmp" grouptype="NUM_CTS" grouptypeval="15" binspec="" xcolumn="channel" ycolumn="counts" tabspec="" tabcolumn="" stopspec="" stopcolumn="" errcolumn="" clobber="no" verbose="0" maxlength="0"
The command was preceded by "!" to tell Sherpa to execute it as a shell command, and "| tr ' ' "\012"" was used to add a new-line character between parameter values (to make the screen output easier to read).
Grouping the data
We use the group_counts function to group the second data set by 30 counts per group, which is double that of the first data set. The two data sets are then plotted using logarithmic scaling on both axes:
sherpa> group_counts(2, 30)
sherpa> set_xlog()
sherpa> set_ylog()
sherpa> plot("data", 1, "data", 2)
sherpa> plt.gcf().subplots_adjust(hspace=0.75)
The resulting plot (Figure 1) shows how the data looks before and after re-grouping.
Figure 1: Data grouped by 15 and 30 counts per group
In this thread, we use the group_counts command as an example. The other available grouping commands are:
- group_bins() — this function divides the channels into the specified number ("num") of bins.
- group_width() — this function divides the channels such that there are "num" bins in each group.
- group_snr() — this function allows the user to adaptively group PHA spectral data by signal-to-noise ratio, i.e., group bins of data until each group exceeds at least the minimum specified signal-to-noise ratio.
- group_adapt() — this function allows the user to adaptively group PHA spectral data by counts, i.e., group bins of data until the number of counts in each group exceeds the minimum number of counts specified in the 'min' argument, keeping bright features ungrouped while grouping low signal-to-noise regions.
- group_adapt_snr() — this function allows the user to adaptively group PHA spectral data by signal-to-noise ratio, i.e., group bins of data until each group exceeds at least the minimum specified signal-to-noise ratio.
The group_counts and related functions can also be used on data that was not grouped before being read into Sherpa; this can be useful for two reasons:
- You can use the functions to find the best grouping scheme for your data without having to re-run the dmgroup tool and re-load the data into Sherpa.
- You can fit the un-grouped data with the Cash statistic and then use the functions to make it easier to compare the fit to the data in plots.
Fitting the data
Once the data has been re-grouped you can use it just like any other data set. Here we repeat the fit made in the Introduction to fitting PHA spectra to see what difference the alternate grouping scheme makes.
sherpa> notice_id(2, 0.1, 6.0) sherpa> subtract(2) sherpa> set_source(2, xsphabs.abs1 * powlaw1d.p1) sherpa> abs1.nh = 0.07 sherpa> freeze(abs1) sherpa> guess(2, p1) sherpa> fit(2) Dataset = 2 Method = levmar Statistic = chi2gehrels Initial fit statistic = 997.017 Final fit statistic = 27.5501 at function evaluation 19 Data points = 22 Degrees of freedom = 20 Probability [Q-value] = 0.120485 Reduced statistic = 1.3775 Change in statistic = 969.467 p1.gamma 2.15315 +/- 0.0724515 p1.ampl 0.000230433 +/- 1.3657e-05 sherpa> show_fit() Optimization Method: LevMar name = levmar ftol = 1.19209289551e-07 xtol = 1.19209289551e-07 gtol = 1.19209289551e-07 maxfev = None epsfcn = 1.19209289551e-07 factor = 100.0 verbose = 0 Statistic: Chi2Gehrels Chi Squared with Gehrels variance. The variance is estimated from the number of counts in each bin, but unlike `Chi2DataVar`, the Gaussian approximation is not used. This makes it more-suitable for use with low-count data. The standard deviation for each bin is calculated using the approximation from [1]_: sigma(i,S) = 1 + sqrt(N(i,s) + 0.75) where the higher-order terms have been dropped. This is accurate to approximately one percent. For data where the background has not been subtracted then the error term is: sigma(i) = sigma(i,S) whereas with background subtraction, sigma(i)^2 = sigma(i,S)^2 + [A(S)/A(B)]^2 sigma(i,B)^2 Notes ----- The accuracy of the error term when the background has been subtracted has not been determined. A preferable approach to background subtraction is to model the background as well as the source signal. References ---------- .. [1] "Confidence limits for small numbers of events in astrophysical data", Gehrels, N. 1986, ApJ, vol 303, p. 336-346. http://adsabs.harvard.edu/abs/1986ApJ...303..336G Method = levmar Statistic = chi2gehrels Initial fit statistic = 997.017 Final fit statistic = 27.5501 at function evaluation 19 Data points = 22 Degrees of freedom = 20 Probability [Q-value] = 0.120485 Reduced statistic = 1.3775 Change in statistic = 969.467 p1.gamma 2.15315 +/- 0.0724515 p1.ampl 0.000230433 +/- 1.3657e-05 sherpa> covar(2) Dataset = 2 Confidence Method = covariance Iterative Fit Method = None Fitting Method = levmar Statistic = chi2gehrels covariance 1-sigma (68.2689%) bounds: Param Best-Fit Lower Bound Upper Bound ----- -------- ----------- ----------- p1.gamma 2.15315 -0.0758287 0.0758287 p1.ampl 0.000230433 -1.3896e-05 1.3896e-05
which can be compared to the original results:
sherpa> notice_id(1, 0.1, 6.0) sherpa> subtract(1) sherpa> set_source(1, abs1 * p1) sherpa> guess(p1) sherpa> fit(1) Dataset = 1 Method = levmar Statistic = chi2gehrels Initial fit statistic = 79.4714 Final fit statistic = 37.4897 at function evaluation 10 Data points = 44 Degrees of freedom = 42 Probability [Q-value] = 0.669101 Reduced statistic = 0.892612 Change in statistic = 41.9817 p1.gamma 2.15012 +/- 0.0786166 p1.ampl 0.000222779 +/- 1.43832e-05 sherpa> covar(1) Dataset = 1 Confidence Method = covariance Iterative Fit Method = None Fitting Method = levmar Statistic = chi2gehrels covariance 1-sigma (68.2689%) bounds: Param Best-Fit Lower Bound Upper Bound ----- -------- ----------- ----------- p1.gamma 2.15012 -0.0821769 0.0821769 p1.ampl 0.000222779 -1.46276e-05 1.46276e-05
The following shows both fits (the extra commands are used to make the two plots have the same X-axis, and to add plot labels):
sherpa> plot("fit", 1, "fit", 2) sherpa> import matplotlib.ticker as mtick sherpa> fig = plt.gcf() sherpa> plt.sca(fig.axes[0]) sherpa> plt.xlabel("") sherpa> plt.tick_params(which="both",direction="in",bottom=True,top=True,right=True,left=True,labelbottom=False) sherpa> gca().yaxis.set_major_formatter(mtick.FormatStrFormatter("%g")) sherpa> plt.sca(fig.axes[1]) sherpa> plt.title("") sherpa> gca().xaxis.set_major_formatter(mtick.FormatStrFormatter("%g")) sherpa> gca().yaxis.set_major_formatter(mtick.FormatStrFormatter("%g")) sherpa> plt.tick_params(which="both",direction="in",bottom=True,top=True,right=True,left=True,labelbottom=False) sherpa> plt.gcf().subplots_adjust(hspace=0) sherpa> fig.axes[0].get_shared_x_axes().join(fig.axes[0],fig.axes[1]) sherpa> fig.axes[1].autoscale() sherpa> plt.sca(fig.axes[0]) sherpa> plt.text(0.5,0.0006,"15 counts per group",color="lime",fontsize=8) sherpa> plt.sca(fig.axes[1]) sherpa> plt.text(0.5,0.0006,"30 counts per group",color="lime",fontsize=8)
which creates this plot (Figure 2).
Figure 2: Fit to both data sets
Scripting It
The file fit.py is a Python script which performs the primary commands used above; it can be executed by typing %run -i fit.py on the Sherpa command line.
The Sherpa script command may be used to save everything typed on the command line in a Sherpa session:
sherpa> script(filename="sherpa.log", clobber=False)
(Note that restoring a Sherpa session from such a file could be problematic since it may include syntax errors, unwanted fitting trials, et cetera.)
Summary
This thread shows how you can use the grouping functionality in Sherpa to change the grouping scheme of a PHA file once it has been read into Sherpa. This allows you to see how sensitive the fit results are to the grouping scheme by changing the number of counts per group or using a different method for grouping the data.
History
14 Dec 2004 | updated for CIAO 3.2: script version and path |
17 Jun 2005 | updated information in Get Started on loading the script |
21 Dec 2005 | reviewed for CIAO 3.3: no changes |
01 Dec 2006 | reviewed for CIAO 3.4: no changes |
14 Dec 2008 | updated for CIAO 4.1: sherpa_utils.sl script replaced by new Sherpa 4.1 grouping functionality |
29 Apr 2009 | new script command is available with CIAO 4.1.2 |
17 Dec 2009 | updated for CIAO 4.2: new group_bins and group_width commands |
13 Jul 2010 | updated for CIAO 4.2 Sherpa v2: removal of S-Lang version of thread. |
15 Dec 2010 | updated for CIAO 4.3: new set_xlog and set_ylog commands are available for setting the axis scale of plots to logarithmic |
15 Dec 2011 | reviewed for CIAO 4.4: added a warning about filtering/grouping source and background data sets |
13 Dec 2012 | updated for CIAO 4.5: noted that grouping a data set in Sherpa no longer clears the existing data filter; removed an outdated warning about filtering/grouping source and background data sets, as the associated bug has been fixed. |
03 Dec 2013 | reviewed for CIAO 4.6: no changes |
02 Dec 2015 | reviewed for CIAO 4.8: no content change. |
09 Nov 2016 | reviewed for CIAO 4.9: updated screen outputs and moved closing notes to admonition block in "Grouping the data" section; no new content. |
12 Apr 2018 | reviewed for CIAO 4.10: no content change. |
06 Dec 2018 | reviewed for CIAO 4.11: screen output revised. |
12 Dec 2019 | reviewed for CIAO 4.12: replaced ChIPS with matplotlib syntax. |