Datastack is a CIAO Sherpa extension package for manipulating and fitting a stack of related datasets. It provides stack-enabled (i.e. vectorized) versions of the key Sherpa commands used to load data, set source models, get and set parameters, fit, and plot.
For X-ray spectral analysis this means that a group of related datasets, for instance 10 observations of the same source taken at different times, can be analyzed as if they were a single merged dataset. This has the important advantage of using the appropriate individual response functions while hiding the complexity (or tedium) of defining models and parameters for all the datasets.
The example below shows what is required to simultaneously fit three spectra of a source taken from different epochs. Bear in mind that these commands would also work in the case of 20 or even 100 spectra, at which point the utility of datastack becomes more obvious.
from datastack import *
load_pha([], 'data/acis_1_pha3.fits')
load_pha([], 'data/acis_2_pha3.fits')
load_pha([], 'data/acis_3_pha3.fits')
subtract([])
group_counts([], 10)
notice([], 0.5, 7)
set_source([], 'xsphabs.gal * powlaw1d.powID')
link([], 'pow.gamma')
set_par([], 'gal.nh', 0.04)
fit([])
plot_fit_resid([])
That’s all it takes. To try this out, go into the datastack source directory (see the Installation section below):
cd examples
sherpa
Then cut and paste the example commands and watch it go.
The datastack package is available for download at http://cxc.harvard.edu/contrib/datastack/downloads. That directory also contains the example data files examples_data.tar.gz.
The datastack package includes a single module that must be made available to the CIAO python so that Sherpa can import it. The first step is to untar the package tarball, change into the source directory, and initialize the CIAO environment:
tar zxvf datastack-<version>.tar.gz
tar zxvf examples_data.tar.gz -C datastack-<version>/
cd datastack-<version>
source /PATH/TO/ciao/bin/ciao.csh
There are three methods for installing. Choose ONE of the three.
Best:
The following command installs the datastack module to $HOME/.local/lib/python2.7:
python setup.py install --user
Using the --user switch you can make a local repository of packages that will run within Sherpa and/or the system Python 2.7.
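To check that the module is now visible to the CIAO python, you can do something like the following (the printed path will vary):

python -c "import datastack; print datastack.__file__"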
Also good:
If you have write access to the CIAO installation you can just use the CIAO python to install the modules into the CIAO python library. Assuming you are in the CIAO environment do:
python setup.py install
This puts the new modules straight into the CIAO python library so that any time you (or others, in the case of a system-wide installation) enter the CIAO environment, datastack will be available. You do NOT need to set PYTHONPATH. The only downside is that the module will likely need to be re-installed after a new CIAO release.
Quick and dirty:
An alternate and simple installation strategy is to just leave the module file in the source directory and set the PYTHONPATH environment variable to point to the source directory:
setenv PYTHONPATH $PWD
This method is fine in the short term but you always have to make sure PYTHONPATH is set appropriately (perhaps in your ~/.cshrc file). And if you start doing much with Python you will have PYTHONPATH conflicts and things will get messy.
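For example, you might add a line like the following to your ~/.cshrc (the path is illustrative; point it at your actual source directory):

setenv PYTHONPATH /PATH/TO/datastack-<version>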
To test the installation change to the source distribution directory and do the following:
cd examples
sherpa
execfile("fit_spectra.py")
This should run through in a reasonable time and produce output indicating the simultaneous fit of four spectra.
The script examples/fit_spectra.py shows an example of simultaneously fitting four X-ray spectra with a power law and Galactic absorption:
from datastack import *
clear_stack([])
load_pha([], 'data/acis_1_pha3.fits')
load_pha([], 'data/acis_2_pha3.fits')
load_pha([], 'data/acis_3_pha3.fits')
load_pha([], 'data/acis_4_pha3.fits')
group_counts([], 15)
ignore([], None, 0.5)
ignore([], 7, None)
subtract([])
set_source([], "xsphabs.gal * powlaw1d.powID")
link([], 'pow.gamma')
set_par([], 'gal.nh', 0.04)
freeze([], "gal.nh")
fit([])
print 'pow.ampl:', get_par([], 'pow.ampl.val')
plot_fit_resid([])
Let’s break this down.
from datastack import *
This line overrides a subset of the native Sherpa commands with the stack-enabled versions. The section Overloading the Sherpa commands describes this a little more carefully and shows why it isn’t so scary. The section Object-oriented datastacks shows how to avoid the import *, which will bother Pythonistas.
Next the script clears any existing datasets in the data stack. This is a good idea if you are running a script over and over while debugging. Without clearing the stack, the subsequent data-loading commands will put another copy of all the datasets onto the stack.
clear_stack([])
Next we load a few datasets into the stack. This introduces the appearance of [] (a Python empty list) as the first argument of stack-enabled commands. It tells the command to operate on the stack of data rather than on a single dataset as in native Sherpa. This will load the datasets and automatically assign id values of 1, 2, 3, and 4 respectively:
load_pha([], "data/acis_1_pha3.fits")
load_pha([], "data/acis_2_pha3.fits")
load_pha([], "data/acis_3_pha3.fits")
load_pha([], "data/acis_4_pha3.fits")
Now we tell Sherpa to group by counts, ignore data outside 0.5 to 7 keV, and subtract the background for each dataset in the stack. Underneath, this just runs the corresponding native Sherpa command on each dataset:
group_counts([], 15)
ignore([], None, 0.5)
ignore([], 7, None)
subtract([])
In this case we are using [] to apply the command to all datasets. But it is possible to specify subsets of the stack by providing a list of the dataset IDs, for instance subtract([1,2]) or ignore([3], None, 0.5).
Next is setting the source model for fitting. This is where the datastack module is doing more than simply iterating over the native Sherpa commands:
set_source([], "xsphabs.gal * powlaw1d.powID")
In this example there is a common galactic absorption that applies for all datasets but we want an independent power law component for each. We use the usual Sherpa model syntax but insert the ID string in the model component name to tell set_source() to create an independent powerlaw component for each dataset. The above command is essentially equivalent to the following:
set_source(1, xsphabs.gal * powlaw1d.pow1)
set_source(2, xsphabs.gal * powlaw1d.pow2)
set_source(3, xsphabs.gal * powlaw1d.pow3)
set_source(4, xsphabs.gal * powlaw1d.pow4)
Now let’s say that we want to fit just a single powerlaw photon index gamma for the data stack but leave the normalizations independent. This is done by linking the gamma parameters:
link([], "pow.gamma")
Notice that the ID is only needed when setting the source model. In all the other commands that refer to model components there is no need for the ID (and in fact you cannot supply it).
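For example (a sketch; the set_source call creates pow1, pow2, and so on, while every other command refers to the components by the bare name pow):

set_source([], "xsphabs.gal * powlaw1d.powID")  # ID marker required here
set_par([], "pow.gamma", 1.7)                   # bare component name, no ID
thaw([], "pow.ampl")                            # likewise no ID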
Next we set the Galactic absorption column density and freeze it:
set_par([], "gal.nh", 0.04)
freeze([], "gal.nh")
Finally we fit, print some fit values, and plot the fit with residuals:
fit([])
print "pow.ampl:", get_par([], "pow.ampl.val")
plot_fit_resid([])
This brings up four plot windows with the fit residuals for each dataset in a separate window.
By default datastack is chatty about what it’s doing but you can disable this with:
set_stack_verbose(False)
When the datastack module is imported it automatically creates a default (or “session”) data stack that is used whenever a stack-enabled command is called with a stack specifier as the first argument. The data stack itself is just a list of dataset ids plus a little extra information mostly pertaining to the source model expression.
Some non-Sherpa commands are available for data stack manipulation:
Command | Description |
---|---|
clear_stack | Clear the stack |
show_stack | Show synopsis for stack elements |
get_stack_ids | Get a list of the ids in the stack |
set_stack_verbose | Set verbose mode (False => quiet) |
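A short sketch of these commands in use (this assumes show_stack accepts the same empty-list specifier as the other commands):

show_stack([])            # print a synopsis of each dataset in the stack
ids = get_stack_ids()     # e.g. [1, 2, 3, 4]
set_stack_verbose(False)  # suppress the per-dataset messages
clear_stack([])           # empty the stack, e.g. before re-running a script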
Non-default or multiple data stacks are supported using object-oriented datastacks.
The datastack module overrides certain Sherpa commands so that they can accept a stack specifier as the first argument. A stack specifier is a Python list that gives the dataset ids within the stack on which to operate. An empty list [] implies operating on all elements of the stack. Examples of valid stack specifiers are:
[] # All elements
[1,3,4] # Dataset ids 1, 3, and 4 within stack
["pha1", "pha2", 3] # Dataset ids can be strings
If an id is given that is not a member of the stack then an error is reported.
The stack specifier is what distinguishes between a stack command and a native Sherpa command. If the first argument to a function is a list then the stack-enabled version is called, otherwise the native Sherpa command is called. Within a Sherpa session these can be mixed freely:
subtract([2,3,4]) # Subtract bkg for ids=2,3,4 in stack
subtract(5) # Subtract bkg for id=5 (native Sherpa)
This distinction is generally sufficient. Nevertheless some users may prefer to use object-oriented datastacks to keep everything tidy from the Python programmer perspective.
A subset of the Sherpa data loading commands is supported. These commands execute the native Sherpa command and add the dataset to the stack. The stack specifier defines the id of the new dataset. If an empty list [] is given then the first available integer id is used (starting from 1 and incrementing until an unused dataset id is found). The available commands are:
Command | Description |
---|---|
load_arrays | Load NumPy arrays into a dataset |
load_ascii | Load ASCII data by id |
load_data | Load spectrum, table, or ASCII data by id |
load_image | Load image data by id |
load_pha | Load PHA data by id |
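For instance, a one-element stack specifier chooses the id of the new dataset (a sketch; the string-id form assumes the rule above also applies to non-integer ids):

load_pha([], "data/acis_1_pha3.fits")          # auto-assigned id 1
load_pha(["epoch2"], "data/acis_2_pha3.fits")  # explicit string id "epoch2"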
The datastack module supports five model definition commands:
Command | Description |
---|---|
set_model | Set a Sherpa model by model id |
set_source | Set a Sherpa model by model id (alias) |
set_bkg_model | Set a bkg model by data id and bkg id |
set_full_model | Same as Sherpa set_full_model |
set_bkg_full_model | Same as Sherpa set_bkg_full_model |
For each of these commands datastack extends the Sherpa model definition language by substituting the dataset id for any instance of the ID string within a model component name. To support this extended syntax the model definition must be a quoted Python string, unlike the native Sherpa commands, which also accept a Python expression, e.g. set_source(1, xsphabs.gal * powlaw1d.pow1).
Consider this example:
set_source([], "xsphabs.gal * powlaw1d.pow_ID")
Here there is a common model (gal) that applies for all datasets and an independent model pow_ID for each dataset. The above command is essentially equivalent to the following:
set_source(1, "xsphabs.gal * powlaw1d.pow_1")
set_source(2, "xsphabs.gal * powlaw1d.pow_2")
set_source(3, "xsphabs.gal * powlaw1d.pow_3")
Datastack supports the following commands for manipulating model parameters:
Command | Description |
---|---|
get_par | Return a list of model parameters |
set_par | Set initial values for a model parameter |
thaw | Thaw a list of parameters |
freeze | Freeze a list of parameters |
link | Link a parameter with an associated value |
unlink | Unlink a parameter |
The get_par and set_par commands are extended in datastack to accept any attribute of a model component and not just a parameter name. For instance:
set_source([], "gauss1d.gaussID") # create model components "gaussID"
set_par([], "gauss.integrate", False) # disable model integration for stack
set_par([], "gauss.fwhm.min", 2.0) # set min for gauss.fwhm parameter
set_par([], "gauss.pos", 5.0) # set gauss.pos parameter value
pars = get_par([], "gauss.pos") # get a list of parameter objects
parvals = get_par([], "gauss.pos.val") # get the position values
The get_par command returns a list of the corresponding attribute values. The set_par command only accepts a single value and applies it to all elements in the stack. To give each dataset its own value for a parameter, use code like the following:
parname = "gauss.pos"
parvals = [1.2, 3.0, 5.6]
for i in get_stack_ids():
set_par([i], parname, parvals[i])
The link command has a different behavior in the datastack context. It only accepts a single parameter name and links the first stack dataset with all subsequent ones. Assume that the current data stack has 3 datasets with ids 1, 2, and 3. Then both of the commands:
link([1,2,3], "gauss.fwhm")
link([], "gauss.fwhm")
are equivalent to:
link("gauss2.fwhm", "gauss1.fwhm")
link("gauss3.fwhm", "gauss1.fwhm")
The ability within the native Sherpa link command to link a parameter to an arbitrary model expression is not currently supported within the datastack module.
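As a workaround you can fall back on the native Sherpa link command for each dataset. The sketch below uses get_model_component and assumes components gauss1, gauss2, ... were created by an earlier set_source call; it ties each fwhm to twice that of the first dataset:

g1 = get_model_component("gauss1")
for i in get_stack_ids()[1:]:
    # First argument is not a list, so the stack-enabled wrapper
    # passes this straight through to the native Sherpa link.
    link("gauss%d.fwhm" % i, 2 * g1.fwhm)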
Plotting of data stacks is supported by opening a new plot window for each dataset in the stack and then running the native Sherpa command accordingly. The plot window will be identified by the dataset id.
The following plotting commands are supported. There is no datastack support for most ChIPS commands.
Command | Description |
---|---|
plot_arf | Plot ancillary response data |
plot_bkg | Plot background counts |
plot_bkg_chisqr | Plot background chi squared contributions |
plot_bkg_delchi | Plot background delta chi |
plot_bkg_fit | Plot background counts with fitted background model |
plot_bkg_fit_delchi | Send background fit and background delta chi plots to the visualizer |
plot_bkg_fit_resid | Send background fit and background residuals plots to the visualizer |
plot_bkg_model | Plot background convolved model |
plot_bkg_ratio | Plot background ratio |
plot_bkg_resid | Plot background residuals |
plot_bkg_source | Plot the unconvolved background model |
plot_chisqr | Send a chi^2 plot to the visualizer |
plot_data | Send a data plot to the visualizer |
plot_delchi | Send a delta chi plot to the visualizer |
plot_energy_flux | Send an energy flux distribution to the visualizer |
plot_fit | Send a fit plot to the visualizer |
plot_fit_delchi | Send fit and delta chi plots to the visualizer |
plot_fit_resid | Send fit and residuals plots to the visualizer |
plot_model | Send a model plot to the visualizer |
plot_order | Plot convolved source model by multiple response order |
plot_psf | Send a PSF plot to the visualizer |
plot_ratio | Send a ratio plot to the visualizer |
plot_resid | Send a residuals plot to the visualizer |
plot_source | Plot unconvolved source model |
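For example, to plot every dataset in the stack, or only a subset:

plot_data([])        # one plot window per dataset in the stack
plot_fit_resid([1])  # fit plus residuals for dataset 1 only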
The following commands are also available within datastack. These simply apply the native command to each of the stack datasets, passing along any arguments and returning a list corresponding to the return value for each dataset.
Command | Description |
---|---|
get_bkg | Return a background PHA dataset by data id and bkg_id |
get_bkg_model | Return the background convolved model by data id and bkg id |
get_source | Return a Sherpa model by model id |
group_adapt | Create and set grouping flags adaptively so that each group contains at least a minimum number of counts |
group_adapt_snr | Create and set grouping flags adaptively so that each group exceeds a minimum signal-to-noise ratio |
group_bins | Create and set grouping flags by number of bins with equal-widths |
group_counts | Create and set grouping flags using minimum number of counts per bin |
group_snr | Create and set grouping flags so each group has a minimum signal-to-noise ratio |
group_width | Create and set grouping flags by a bin width. |
ignore | Exclusive 1D ignore on interval(s) for all available datasets |
notice | Exclusive 1D notice on interval(s) for all available datasets |
subtract | Subtract background counts |
This list is a small subset of the Sherpa commands where stacking could be used. Most commands that take a dataset id as the first argument are likely to function properly within the datastack context. A create_stack_func utility, expected in a near-term release, will create a stack-enabled version of any appropriate Sherpa command.
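Because these commands return one value per dataset, you can, for example, collect the background datasets or the source expressions for the whole stack (a sketch):

bkgs = get_bkg([])     # list of background PHA datasets, one per stack element
srcs = get_source([])  # list of source model expressions, one per stack element
for src in srcs:
    print src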
The Datastack functionality is largely implemented with a single DataStack class. Users may work with a DataStack object instead of the session-based syntax shown in most of the examples.
The first level up from the pure session-based syntax (where a stack specifier is a Python list) is to explicitly create the datastack as an object and consistently use that object as the stack specifier:
from datastack import *
ds = DataStack()
load_pha(ds, 'data/acis_1_pha3.fits')
load_pha(ds, 'data/acis_2_pha3.fits')
load_pha(ds, 'data/acis_3_pha3.fits')
load_pha(ds, 'data/acis_4_pha3.fits')
group_counts(ds, 20)
ignore(ds[3,4], 6.0, None)   # Ignore above 6 keV for datasets 3 and 4 only
This is illustrated in more detail in examples/partial_object.py.
The next level is to use a fully object-oriented approach:
import datastack
ds = datastack.DataStack()
ds.load_pha('data/acis_1_pha3.fits')
ds.load_pha('data/acis_2_pha3.fits')
ds.load_pha('data/acis_3_pha3.fits')
ds.load_pha('data/acis_4_pha3.fits')
ds.group_counts(20)
ds[3,4].ignore(6.0, None)    # Ignore above 6 keV for datasets 3 and 4 only
This gets rid of the undesirable from datastack import * line that is needed in the session-based view. This object-oriented approach is what is really happening under the hood. The default session-based version uses these methods with a special internal DATASTACK object that gets created when the module is imported. See examples/object_oriented.py for a working example.
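Since each DataStack object is independent, you can keep several stacks in one session. A sketch (this assumes dataset ids are assigned from the shared Sherpa session, so they remain unique across stacks):

import datastack

epoch1 = datastack.DataStack()
epoch2 = datastack.DataStack()
epoch1.load_pha('data/acis_1_pha3.fits')   # Sherpa id 1
epoch1.load_pha('data/acis_2_pha3.fits')   # Sherpa id 2
epoch2.load_pha('data/acis_3_pha3.fits')   # Sherpa id 3
epoch1.fit()                               # fit only the datasets in epoch1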
Most people won’t care, but here is the code that is used to wrap every Sherpa user function (i.e. every function in sherpa.astro.ui) for the session-based interface.
In a nutshell, the code first checks whether there are any args. If not, it runs the native Sherpa function. Otherwise it checks whether the first argument is a stack specifier (a list or a DataStack). If not, then again it just runs the native Sherpa function. At this point the user is trying to call a stack function, so it calls the like-named method of the appropriate DataStack object, raising an exception if no such method exists.
def _sherpa_ui_wrap(func):
    def wrap(*args, **kwargs):
        wrapfunc = func
        if args:
            if isinstance(args[0], DataStack):
                datastack, args = args[0], args[1:]
            elif isinstance(args[0], list):
                datastack, args = (DATASTACK[args[0]] if args[0] else DATASTACK), args[1:]
            else:
                return func(*args, **kwargs)  # No stack specifier so use native sherpa func
            try:
                wrapfunc = getattr(datastack, func.__name__)
            except AttributeError:
                raise AttributeError(
                    '{0} is not a stack-enabled function.'.format(func.__name__))
        return wrapfunc(*args, **kwargs)
    wrap.__name__ = func.__name__
    wrap.__doc__ = func.__doc__
    return wrap
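For completeness, here is a sketch of how such a wrapper might be applied at import time. The loop below is illustrative rather than the package's actual code:

import sherpa.astro.ui as ui

# Re-export a wrapped version of every public sherpa.astro.ui function so
# that "from datastack import *" shadows the native commands.
for _name in dir(ui):
    _func = getattr(ui, _name)
    if not _name.startswith('_') and callable(_func):
        globals()[_name] = _sherpa_ui_wrap(_func)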