Synopsis
Sherpa extension package for modeling stacks of data.
Syntax
from sherpa.astro import datastack
Description
Datastack is a Sherpa extension package for manipulating a stack of related datasets and simultaneously fitting a model to them. It provides stack-enabled (i.e. vectorized) versions of the key Sherpa commands used to load data, set source models, get and set parameters, fit, and plot. The current implementation is based on Tom Aldcroft's contrib package (https://cxc.harvard.edu/contrib/datastack/) wrapping Sherpa functions. The addition of the CIAO DM stack syntax allows reading and filtering the stacks of data using standard DM filtering syntax.
For X-ray spectral analysis, a datastack means that a group of related datasets, for instance 10 observations of the same source taken at different times, can be analyzed as if they were a single merged dataset. This has the important advantage of using the appropriate individual response functions while hiding the complexity (or tedium) of defining models and parameters for all the datasets.
Loading the Routines
This package needs to be imported to a Sherpa session, for example:
from sherpa.astro import datastack
Datastack File Input
The file stack syntax (e.g. '@file.lis') is available for Sherpa load_* commands. The details of the DM stack concept can be found in the "stack" ahelp file.
The following load_* functions support file stack syntax:
- load_pha
- load_ascii
- load_data
- load_bkg
The datastack-specific load_* calls are slightly different than the default Sherpa calls, as they do not take a data ID. The "@" character entered with the filename signals a stack file and all the files listed in the "@file.lis" are added to the datastack. Individual files can be added to the datastack later. You can also use multiple datastack identifiers, providing a way to treat several datastacks in one Sherpa session.
sherpa> from sherpa.astro.datastack import * sherpa> load_data("@file.lis") read ARF file obs1/q9407.warf read RMF file obs1/q9407.wrmf read ARF (background) file obs1/q9407_bkg.warf read RMF (background) file obs1/q9407_bkg.wrmf read background file obs1/q9407_bkg.pi read ARF file obs2/q9708.warf read RMF file obs2/q9708.wrmf
After importing the datastack package, the Sherpa load_data command loads the PHA files listed in the file.lis stack, as well as the associated response files pointed to by the headers of those PHA files.
sherpa> show_stack() 1: obs1/q9407.pi OBS_ID: 9407 MJD_OBS: 54437.6143274 2: obs2/q9708.pi OBS_ID: 9708 MJD_OBS: 54445.5404309 unix% more file.lis obs1/q9407.pi obs2/q9708.pi
Note that after importing datastack with "from sherpa.astro.datastack import *", datastacks are used by default to manage Sherpa data IDs. New files can either be added to the existing datastack, or input as a separate file. The default datastack is referred to as [].
The following syntax is allowed:
- load_pha("@pha.lis") # the files listed in "pha.lis" are added to the default datastack
- load_pha("myid","file.pi") # the file is not added to the datastack, it is assigned an id="myid"
- load_pha("file.pi") # in this case the file is added to the datastack
- load_pha([],"file.pi") # the file is added to the default datastack
- ds = DataStack() # create a new datastack
- load_pha(ds,"@pha.lis") # adds files in pha.lis to the datastack called "ds"
- load_pha(ds,"file.pi") # adds file.pi to the ds datastack
An example of an unsupported syntax is:
- load_pha("myid","@pha.lis") # this will result in an error, as the file stack syntax is not accepted when setting the ID
load_arrays
The datastack version of load_arrays has a slightly different interface than the one in standard Sherpa. In standard Sherpa, load_arrays must always get an ID as the first argument.
load_arrays accepts an array of arrays as the first argument when no datastack identifier is provided, as in the example below:
x1 = ... an array ... y1 = ... an array ... x2 = ... an array ... y2 = ... an array ... x3 = ... an array ... y3 = ... an array ... sherpa> load_arrays([[x1,y1], [x2,y2], [x3,y3]])
Datastack Identifier
By default, the datastack is identified by '[]'. In general, it can also be:
- a datastack instance reference
- an array (any iterable) of dataset IDs
- a datastack instance subset
For example:
sherpa> ds = DataStack()
'ds' is both a datastack instance and identifier, and another valid identifier is:
sherpa> ds[1,2]
meaning ds[1,2] is an identifier for datasets 1 and 2 from the datastack 'ds'. If picking an identifier from the default datastack, the syntax:
sherpa> [3,4]
may be used, where [3,4] is the datastack identifier for datasets 3 and 4 from the default datastack.
Datastack Functions
The following functions are compatible with datastacks:
- clear_models
- clear_stack
- conf
- fit
- fit_bkg
- freeze
- get_arf
- get_bkg
- get_bkg_arf
- get_bkg_model
- get_bkg_rmf
- get_bkg_scale
- get_model
- get_par
- get_response
- get_rmf
- get_source
- get_stack_ids
- group_adapt
- group_adapt_snr
- group_bins
- group_counts
- group_snr
- group_width
- ids
- ignore
- link
- load_arf
- load_arrays
- load_ascii
- load_bkg
- load_bkg_arf
- load_bkg_rmf
- load_data
- load_filter
- load_grouping
- load_pha
- load_rmf
- notice
- plot_arf
- plot_bkg
- plot_bkg_chisqr
- plot_bkg_delchi
- plot_bkg_fit
- plot_bkg_fit_delchi
- plot_bkg_fit_resid
- plot_bkg_model
- plot_bkg_ratio
- plot_bkg_resid
- plot_bkg_source
- plot_chisqr
- plot_data
- plot_delchi
- plot_fit
- plot_fit_delchi
- plot_fit_resid
- plot_model
- plot_order
- plot_psf
- plot_ratio
- plot_resid
- plot_savefig
- plot_set_xscale
- plot_set_yscale
- plot_source
- plot_title
- plot_xlabel
- plot_xlim
- plot_ylabel
- plot_ylim
- query
- query_by_header_keyword
- query_by_obsid
- savefig
- set_bkg_full_model
- set_bkg_model
- set_full_model
- set_model
- set_par
- set_source
- show_stack
- subtract
- thaw
- ungroup
- unlink
- unsubtract
In order for the function to operate on a datastack, the datastack identifier has to be included as the first argument of that function.
Setting Models for a Datastack
The following model function can be used for assigning a model to a datastack:
- set_source
- set_model
- set_bkg_model
- set_full_model
- set_bkg_full_model
They can be used to assign a model to all the datasets in a datastack or to a dataset subset with the data indicator '__ID'.
Examples
Example 1
sherpa> from sherpa.astro.datastack import * sherpa> load_data("@file.lis")
Import the datastack package into Sherpa and use the load_data command to load the PHA files listed in file.lis into the default datastack instance.
Example 2
sherpa> set_source([],'xsphabs.abs1*xsapec.kt1')
Here the same model is assigned to both datasets in the default datastack, identified by '[]'.
sherpa> show_source() Model: 1 (xsphabs.abs1 * xsapec.kt1) Param Type Value Min Max Units ----- ---- ----- --- --- ----- abs1.nH thawed 1 0 100000 10^22 atoms / cm^2 kt1.kT thawed 1 0.008 64 keV kt1.Abundanc frozen 1 0 5 kt1.redshift frozen 0 -0.999 10 kt1.norm thawed 1 0 1e+24 Model: 2 (xsphabs.abs1 * xsapec.kt1) Param Type Value Min Max Units ----- ---- ----- --- --- ----- abs1.nH thawed 1 0 100000 10^22 atoms / cm^2 kt1.kT thawed 1 0.008 64 keV kt1.Abundanc frozen 1 0 5 kt1.redshift frozen 0 -0.999 10 kt1.norm thawed 1 0 1e+24
These data will be simultaneously fit, assuming this model.
Example 3
sherpa> set_source([1,2], "powlaw1d.p__ID") sherpa> set_source([3,4], "brokenpowerlaw.bp__ID")
These commands set different models for different subsets of the default datastack.
sherpa> show_source() Model: 1 powlaw1d.p1 Param Type Value Min Max Units ----- ---- ----- --- --- ----- p1.gamma thawed 1 -10 10 p1.ref frozen 1 -3.40282e+38 3.40282e+38 p1.ampl thawed 1 0 3.40282e+38 Model: 2 powlaw1d.p2 Param Type Value Min Max Units ----- ---- ----- --- --- ----- p2.gamma thawed 1 -10 10 p2.ref frozen 1 -3.40282e+38 3.40282e+38 p2.ampl thawed 1 0 3.40282e+38 Model: 3 brokenpowerlaw.bp3 Param Type Value Min Max Units ----- ---- ----- --- --- ----- bp3.refer frozen 5000 1.17549e-38 3.40282e+38 angstroms bp3.ampl thawed 1 1.17549e-38 3.40282e+38 angstroms bp3.index1 thawed 0.1 -10 10 bp3.index2 thawed -0.1 -10 10 Model: 4 brokenpowerlaw.bp4 Param Type Value Min Max Units ----- ---- ----- --- --- ----- bp4.refer frozen 5000 1.17549e-38 3.40282e+38 angstroms bp4.ampl thawed 1 1.17549e-38 3.40282e+38 angstroms bp4.index1 thawed 0.1 -10 10 bp4.index2 thawed -0.1 -10 10
The use of the '__ID' data indicator assigns a model to all the datasets in a datastack of subset beginning with the model component name preceding the data indicator. The model parameters are assigned names with the string '__ID' replaced by 1,2,3,... as appropriate, as seen in the above examples.
Example 4
sherpa> freeze([], 'abs1') sherpa> thaw(ds, 'poly.c0') sherpa> thaw([1,3], 'poly.c1')
These are examples of different ways to freeze and thaw parameters using the various datastack instances.
Utility Functions for Datastacks
These functions operate on datstack instances.
- clear_models - remove all models from the datastack
- clear_stack - remove all datasets belonging to a datastack
- show_stack - print all datasets in the stack, with basic metadata
The show_stack function lists the datasets currently in the stack and it displays the contents of the OBS_ID and MJD_OBS cards in each file header, if available.
Querying Datastacks
Datastack instances can be queried to find which datasets in the stack match some given requirements. The two main methods to do so are:
- query_by_header_keyword(keyword, value)
- query(func)
query_by_header_keyword is useful for any datasets that have a 'header' attribute whose elements can be accessed, for example as a Python dictionary (e.g. header['XTENSION']), like a DataPHA.
query(func) is a general approach, and accepts any function that is defined according to the following signature:
func(dataset) returns Boolean
The query function will return a list of all dataset IDs in the stack for which 'func' returns True.
From these two methods, one can derive any number of more specific methods. The output of the query functions can be used as a stack identifier to perform some actions, e.g.:
sherpa> f = query_by_header_keyword('INSTRUME', 'ACIS') sherpa> set_source(f, "powlaw1d.p__ID") or sherpa> set_source(ds[f], "powlaw1d.p__ID")
A third call is implemented by default:
- query_by_obsid(obsid)
This query returns the datasets(s) matching an OBS_ID, by using the OBS_ID keyword in the file header, if available.
Cleaning a Session
The datastack module offers the clean() function, that clears all models, datasets, and invokes Sherpa's clean() function.
Object Oriented Approach
In this approach, the user does not need to "import *", and can just use the objects defined in the datastack module, as shown below:
sherpa> from sherpa.astro import datastack as dsmod sherpa> ds = dsmod.DataStack() sherpa> dsmod.set_template_id("ID") sherpa> ds.load_pha("@pha.lis") sherpa> ds.show_stack() sherpa> f = ds.query_by_header_keyword('INSTRUME', 'ACIS') sherpa> ds[f].set_source("powlaw1d.pID")
Changes in CIAO 4.13
The ungroup and unsubtract commands have been added to the datastack module.
Bugs
See the bugs pages on the Sherpa website for an up-to-date listing of known bugs.
See Also
- data
- copy_data, dataspace1d, dataspace2d, delete_data, fake, get_axes, get_bkg_chisqr_plot, get_bkg_delchi_plot, get_bkg_fit_plot, get_bkg_model_plot, get_bkg_plot, get_bkg_ratio_plot, get_bkg_resid_plot, get_bkg_source_plot, get_counts, get_data, get_data_contour, get_data_contour_prefs, get_data_image, get_data_plot, get_data_plot_prefs, get_dep, get_dims, get_error, get_quality, get_specresp, get_staterror, get_syserror, group, group_adapt, group_adapt_snr, group_bins, group_counts, group_snr, group_width, load_arf, load_arrays, load_ascii, load_bkg, load_bkg_arf, load_bkg_rmf, load_data, load_grouping, load_image, load_multi_arfs, load_multi_rmfs, load_pha, load_quality, load_rmf, load_staterror, load_syserror, load_table, pack_image, pack_pha, pack_table, set_data, set_quality, ungroup, unpack_arf, unpack_arrays, unpack_ascii, unpack_bkg, unpack_data, unpack_image, unpack_pha, unpack_rmf, unpack_table
- filtering
- get_filter, load_filter, set_filter
- info
- get_default_id, list_bkg_ids, list_data_ids, list_response_ids
- modeling
- add_model, add_user_pars, clean, load_table_model, load_template_interpolator, load_template_model, load_user_model, save_model, save_source
- plotting
- plot_data, set_xlinear, set_xlog, set_ylinear, set_ylog
- saving
- save_arrays, save_data, save_delchi, save_error, save_filter, save_grouping, save_image, save_pha, save_quality, save_resid, save_staterror, save_syserror, save_table
- statistics
- load_user_stat
- utilities
- calc_data_sum, calc_data_sum2d, calc_ftest, calc_kcorr, calc_mlr, calc_model_sum2d, calc_source_sum2d, get_rate
- visualization
- contour, contour_data, contour_ratio, histogram1d, histogram2d, image_data, rebin