Last modified: 11 Dec 2023

URL: https://cxc.cfa.harvard.edu/sherpa/threads/fake_pha/

Simulating X-ray Spectral Data (PHA): the fake_pha command

Sherpa Threads (CIAO 4.17 Sherpa)


Overview

Synopsis:

The Sherpa fake_pha command can be used to simulate 1-D PHA data, e.g., for Chandra proposal planning. This thread provides both "quick start" and detailed "step-by-step" examples of the fake_pha functionlity.

If you have some experience with simulating X-ray spectral data in Sherpa, you may wish to skip the introductory material in this thread and jump to one of the more succinct simulation threads tailored to a specific type of analysis, such as Simulating Chandra ACIS-S Spectra with Sherpa.

Last Update: 11 Dec 2023 - updated for CIAO 4.16 and Cycle 26; noted changes to group_counts and the addition of the method parameter to fake_pha that lets users change how the simulated values are created.


Contents


Quick Start to Simulating Data

All that is required to simulate a 1-D PHA data set in Sherpa is the following:

  • a source model expression defined with set_source
  • a .rsp or ARF & RMF instrument response files
  • exposure time for the simulation in seconds

fake_pha will do the rest:

unix% sherpa

# Search available Sherpa models, define a source model expression and assign it an ID such as "1".

sherpa> list_models()
['absorptionedge',
 'absorptiongaussian',
 'absorptionlorentz',
 'absorptionvoigt',
  ...
 'xszvphabs',
 'xszwabs',
 'xszwndabs',
 'xszxipcf']

sherpa> set_source(1, xsphabs.abs1 * xspowerlaw.p1)

# Set initial model parameter values.

sherpa> set_par(abs1.nH, val=0.07, frozen=True)
sherpa> p1.PhoIndex = 0.8

# Fake a PHA data set over the grid defined by the input responses.

sherpa> arfname = 'aciss_aimpt_cy26.arf'
sherpa> rmfname = 'aciss_aimpt_cy26.rmf'
sherpa> fake_pha(1, arfname, rmfname, 50000)

# or

sherpa> fake_pha(1, None, "source.rsp", 50000)


# Filter the simulated data on energy or wavelength and plot.

sherpa> notice(0.3, 7.)
dataset 1: 0.0073:14.9504 -> 0.292:7.008 Energy (keV)

sherpa> plot_data(xlog=True, ylog=True)

# Return information on the faked data set and associated responses.

sherpa> show_data(1)
Data Set: 1
Filter: 0.2920-7.0080 Energy (keV)
Noticed Channels: 21-480
name           = faked
channel        = Int64[1024]
counts         = Float64[1024]
staterror      = None
syserror       = None
bin_lo         = None
bin_hi         = None
grouping       = None
quality        = None
exposure       = 50000
backscal       = None
areascal       = None
grouped        = False
subtracted     = False
units          = energy
rate           = True
plot_fac       = 0
response_ids   = [1]
background_ids = []

RMF Data Set: 1:1
name     = aciss_aimpt_cy26.rmf
energ_lo = Float64[1024]
energ_hi = Float64[1024]
n_grp    = UInt64[1024]
f_chan   = UInt32[1504]
n_chan   = UInt32[1504]
matrix   = Float64[387227]
e_min    = Float64[1024]
e_max    = Float64[1024]
detchans = 1024
offset   = 1
ethresh  = 1e-10

ARF Data Set: 1:1
name     = aciss_aimpt_cy26.arf
energ_lo = Float64[1024]
energ_hi = Float64[1024]
specresp = Float64[1024]
bin_lo   = None
bin_hi   = None
exposure = 8037.3177424371
ethresh  = 1e-10


# Save simulated data.
sherpa> save_pha(1, "simulation1.pha")

This represents the quickest and simplest way to simulate a PHA data set in Sherpa: start CIAO and Sherpa; define a source model expression in Sherpa and assign it a data set ID such as "1" or "sim"; set some initial model parameter values; and then run fake_pha with your chosen exposure time and instrument response file(s).

Figure 1: 1-D PHA Spectrum Simulated with Sherpa

[The data peaks in the 1-2 keV range with a steep fall off at lower energies (a small dip up at the lowest energies), and falls off more-slowly at higher energies.]
[Print media version: The data peaks in the 1-2 keV range with a steep fall off at lower energies (a small dip up at the lowest energies), and falls off more-slowly at higher energies.]

Figure 1: 1-D PHA Spectrum Simulated with Sherpa

1-D PHA data set simulated with the Sherpa fake_pha command, with an absorbed power-law model, 50 ks exposure time, and Chandra calibration response files as input.

Read on to learn the details of all options available for faking PHA data in Sherpa, with step-by-step instructions; or, skip to one of the more-succinct simulation threads tailored to a specific type of analysis:


Step-by-Step: Getting started

Please follow the "Obtaining data used in Sherpa threads" thread.

Introduction to fake_pha

The fake_pha command creates a simulated 1-D PHA data set using a previously defined source model expression and instrument response; Poisson deviates are then added to the modeled data. If a background file is supplied via the 'bkg' argument of fake_pha, then the backscale correction is used, and background is added to the simulated data before Poisson noise. Note that while it is not necessary to supply a PHA data set to the 'id' argument of fake_pha—e.g., a real data set loaded into the session or an empty one created with the dataspace1d command—the 'bkg' argument does require such a PHA data set. All that is necessary for the id argument is a source model ID, i.e. the ID defined with the set_source command in defining the source model expression for the simuation (demonstrated in the section Defining a Source Model Expression without a Data Set).

All Sherpa models can be used in the simulations, as well as models created and implemented by the user. Individual models can be combined into composite models, and parameters can be linked across the model components.

Both source models and instrument responses must be pre-defined. It is possible to use both a redistribution matrix file (RMF) and an ancillary response file (ARF) with fake_pha, or just an RMF. The 'rmf' argument accepts both .rmf and .rsp files.

There are several steps involved in creating simulations:

  1. Loading an appropriate ancillary response file and/or redistribution matrix file.
  2. Defining a source model expression.
  3. (optional) Defining a background by reading a background data file or specifying a background model.
  4. Running the simulation using fake_pha.
  5. Defining the model normalization for the simulated data, if desired.
  6. Writing the simulated data to output files.

This thread illustrates each of these steps with an example. It also illustrates use of the fake_pha command when a source PHA data file has been previously read into the session, as well as how to plot and fit simulated data. The thread concludes with an example script for simulating multiple data sets within a single Sherpa session, as well as one for simulating hot plasma emission.

[NOTE]
If you already have a source PHA data set loaded into the session:

If you already have a source PHA data set loaded into the session, Sherpa automatically loads instrument response and background data files recorded in the header of a PHA data set when the PHA file is loaded using load_pha or load_data. If this is the case, then you may proceed to Using fake_pha with a PHA File. If there is no source PHA data set available, then proceed to the next section of this thread.


Defining a Source Model Expression for the Simulation

The source model expression is defined in the standard way with the set_source command. Before defining the model, one may use the dataspace1d command to create an empty PHA data set with which to associate the model—which will eventually store the faked data values—but this step is no longer necessary on account of enhancements to the fake_pha functionality. One may simply use the set_source command to assign a model to an ID, which will later be interpeted by fake_pha as the ID of the simulated data set (i.e, the ID defined with set_source should be supplied as the fake_pha 'id' value, at which point a data set will be created and associated with that ID).

sherpa> set_source(1, powlaw1d.modelA)
sherpa> modelA.gamma = 2

In this example, we assume a 1-D power-law model for the source.

sherpa> print(get_source())
powlaw1d.modelA
   Param        Type          Value          Min          Max      Units
   -----        ----          -----          ---          ---      -----
   modelA.gamma thawed            2          -10           10           
   modelA.ref   frozen            1 -3.40282e+38  3.40282e+38           
   modelA.ampl  thawed            1            0  3.40282e+38           

Note that the commands show_source and show_model will not return the model assigned to data id=1 until it is associated with a data set.


Defining an Instrument Response

The default instrument response files for 1-D simulations of Chandra data include the redistribution matrix (RMF) and the ancillary response (ARF) files in standard FITS format. These files can be created with the specextract script, as described in the "Extract Spectrum and Response Files for" point-like, extended, or multiple source threads. Furthermore, calibration files for simulations can be downloaded from the Chandra Proposal Planning page of the CalDB (Calibration Database) website.

The ARF and RMF files may be loaded into the Sherpa session via the unpack_arf/unpack_rmf and load_arf/load_rmf commands. Here, we choose to use the unpack commands so that the response data can be stored in the variables arf1 and rmf1:

sherpa> arf1 = unpack_arf("dataA_arf.fits")
sherpa> print(arf1) 
name     = dataA_arf.fits
energ_lo = Float64[1180]
energ_hi = Float64[1180]
specresp = Float64[1180]
bin_lo   = None
bin_hi   = None
exposure = None
ethresh  = 1e-10

sherpa> rmf1 = unpack_rmf("dataA_rmf.fits")
sherpa> print(rmf1)
name     = dataA_rmf.fits
energ_lo = Float64[1180]
energ_hi = Float64[1180]
n_grp    = UInt64[1180]
f_chan   = UInt32[1189]
n_chan   = UInt32[1189]
matrix   = Float64[260775]
e_min    = Float64[512]
e_max    = Float64[512]
detchans = 512
offset   = 0
ethresh  = 1e-10

Defining a Background (optional)

Background counts may be included in the simulation by setting the fake_pha 'bkg' parameter with background data previously read into the Sherpa session from a PHA file (TIME and BACKSCALE parameters will have default settings corresponding to the values of the header keywords EXPTIME and BACKSCAL, which may be changed if necessary). The background counts are appropriately scaled; a Poisson draw is taken of the scaled background counts; and then that draw is added to the simulated source counts. (If there are multiple backgrounds, then the average of the backgrounds is added to the simulated source counts). Leaving the 'bkg' parameter empty will generate a spectrum containing only source counts.

Background data and/or models are treated as follows with fake_pha:

  1. If a background model is defined, it is evaluated on the source data grid, and the resulting background amplitudes are added to the source amplitudes (taking into account differences in exposure time and backscale). Faked data are then sampled given the sum. If background data exist, they are not altered. If the source data set was background-subtracted prior to the command fake_pha being issued, it will not be background-subtracted afterwards.

  2. If no background model is defined, and the data are background-subtracted, then the source model is evaluated directly, and the new, faked data are background-subtracted. Note that subsequently issuing an unsubtract command is unwise, because the unsubtracted data will not be integer counts data.

  3. If no background model is defined, and the data are not background-subtracted, then the source model is evaluated directly, and the (properly scaled) background data are added to the faked data.

Reading a background data file

A type I PHA background file is read so that the exposure time and backscale information may be taken into account in the simulations. The exposure time and backscale from the background file is only used for appropriate scaling of the input background data.

sherpa> bkg1 = unpack_pha("dataA_bkg.pha")
sherpa> print(bkg1)
name           = dataA_bkg.pha
channel        = Float64[512]
counts         = Float64[512]
staterror      = None
syserror       = None
bin_lo         = None
bin_hi         = None
grouping       = None
quality        = Int16[512]
exposure       = 108675.6673202161
backscal       = 0.044189453125
areascal       = 1.0
grouped        = False
subtracted     = False
units          = channel
rate           = True
plot_fac       = 0
response_ids   = []
background_ids = []

We have used the unpack_pha command to load the background PHA data set into the Sherpa session, and print to list the details of the data set. Note that we will not use the set_bkg command (or set_bkg_model in the next section) to associate the background with source data ID=1, as is usually done, since technically, data set 1 does not yet exist, only an ID defined with the set_source command. When fake_pha is run with 'id=1', a source data set will be created at that point, associated with ID=1. The set_bkg* commands cannot be used unless the source data set already exists. To work around this fact, we will set up the background component of source data ID=1 as a separate PHA data set, already named bkg1. It will be assigned as the background component of source data set 1 when the fake_pha command is run with the 'bkg' argument set to 'get_data(bkg1)'.

An instrument response does not need to be explicitly set for the background, as long as an ARF and RMF are associated with the source data set ID; this is because the scaled expected counts will be included before the source instrument model is convolved.


Defining a Background Model Expression

If a background model expression is defined, then the background model amplitude will be added to the source amplitudes (taking into account differences in exposure time and backscale). The simulated data is then sampled given the sum. In this instance, the background model is defined with the set_source command, as opposed to the set_bkg_model command, since source data ID=1 is not yet associated with a data set. We set as the background model a combination of a 1-D polynomial and Gaussian model:

sherpa> set_source("bkg1", polynom1d.bkgA + gauss1d.bkgB)
sherpa> bkgA.c0 = 6.4e-5
sherpa> bkgA.c1 = 2.3e-5
sherpa> bkgA.c3 = 1.4e-05
sherpa> bkgB.fwhm = 5.6254
sherpa> bkgB.pos = 0.057
sherpa> bkgB.ampl = 0.003
sherpa> print(get_source("bkg1"))
(polynom1d.bkgA + gauss1d.bkgB)
   Param        Type          Value          Min          Max      Units
   -----        ----          -----          ---          ---      -----
   bkgA.c0      thawed      6.4e-05 -3.40282e+38  3.40282e+38           
   bkgA.c1      frozen      2.3e-05 -3.40282e+38  3.40282e+38           
   bkgA.c2      frozen            0 -3.40282e+38  3.40282e+38           
   bkgA.c3      frozen      1.4e-05 -3.40282e+38  3.40282e+38           
   bkgA.c4      frozen            0 -3.40282e+38  3.40282e+38           
   bkgA.c5      frozen            0 -3.40282e+38  3.40282e+38           
   bkgA.c6      frozen            0 -3.40282e+38  3.40282e+38           
   bkgA.c7      frozen            0 -3.40282e+38  3.40282e+38           
   bkgA.c8      frozen            0 -3.40282e+38  3.40282e+38           
   bkgA.offset  frozen            0 -3.40282e+38  3.40282e+38           
   bkgB.fwhm    thawed       5.6254  1.17549e-38  3.40282e+38           
   bkgB.pos     thawed        0.057 -3.40282e+38  3.40282e+38           
   bkgB.ampl    thawed        0.003 -3.40282e+38  3.40282e+38    

Running the Simulation Using fake_pha

Simulating a spectrum means taking the defined model expression, folding it through the instrument response, and applying Poisson noise to the counts predicted by the model. The simulation is run with fake_pha, which has four required arguments: data set ID, ARF, RMF, and exposure time. To include the background component in the simulation and set the backscale and area scale values to those associated with the background PHA data set, we also use the optional 'bkg', 'backscal', and 'areascal' arguments. We choose to simulate a spectrum resulting from a 33 ks exposure.

sherpa> fake_pha(1, arf=arf1, rmf=rmf1, exposure=33483.2, bkg=bkg1, backscal=bkg1.backscal, areascal=bkg1.areascal)

This command associates data set ID "1" with a simulated, ungrouped source data set based on the assumed exposure time, instrument response, source model expression, and background data; Poisson noise is added to the modeled data. If an ARF is not to be included in the instrument response for the simulation, the 'arf' argument may be set to None.

The 'arf' and 'rmf' arguments of the fake_pha command can accept filenames directly; e.g., we could have done the following instead:

sherpa> fake_pha(1, arf="dataA_arf.fits", rmf="dataA_rmf.fits", exposure=33483.2, bkg=bkg1, backscal=bkg1.backscal, areascal=bkg1.areascal)

Furthermore, had we decided to load the ARF and RMF into the Sherpa session in association with a data set using the load_arf and load_rmf commands (or load_pha, as shown in the section Using fake_pha with a PHA File), then we could have set the 'arf' and 'rmf' arguments using a third option, shown below.

sherpa> fake_pha(1, arf=get_arf(), rmf=get_rmf(), exposure=33483.2, bkg=bkg1, backscal=bkg1.backscal, areascal=bkg1.areascal)

For detailed information on the available options for setting the 'arf' and 'rmf' arguments of fake_pha, refer to the fake_pha ahelp file.

We may inspect some basic properties of the new data set with the show_data, get_counts, and calc_data_sum commands:

Data Set: 1
Filter: 0.0133-14.6783 Energy (keV)
Bkg Scale: 0.308102
Noticed Channels: 1-512
name           = faked
channel        = Int64[512]
counts         = Float64[512]
staterror      = None
syserror       = None
bin_lo         = None
bin_hi         = None
grouping       = None
quality        = None
exposure       = 33483.2
backscal       = 0.044189453125
areascal       = 1.0
grouped        = False
subtracted     = False
units          = energy
rate           = True
plot_fac       = 0
response_ids   = [1]
background_ids = [1]

RMF Data Set: 1:1
name     = dataA_rmf.fits
energ_lo = Float64[1180]
energ_hi = Float64[1180]
n_grp    = UInt64[1180]
f_chan   = UInt32[1189]
n_chan   = UInt32[1189]
matrix   = Float64[260775]
e_min    = Float64[512]
e_max    = Float64[512]
detchans = 512
offset   = 0
ethresh  = 1e-10

ARF Data Set: 1:1
name     = dataA_arf.fits
energ_lo = Float64[1180]
energ_hi = Float64[1180]
specresp = Float64[1180]
bin_lo   = None
bin_hi   = None
exposure = None
ethresh  = 1e-10

Background Data Set: 1:1
Filter: 0.0133-14.6783 Energy (keV)
Noticed Channels: 1-512
name           = dataA_bkg.pha
channel        = Float64[512]
counts         = Float64[512]
staterror      = None
syserror       = None
bin_lo         = None
bin_hi         = None
grouping       = None
quality        = Int16[512]
exposure       = 108675.6673202161
backscal       = 0.044189453125
areascal       = 1.0
grouped        = False
subtracted     = False
units          = energy
rate           = True
plot_fac       = 0
response_ids   = [1]
background_ids = []

Background RMF Data Set: 1:1
name     = dataA_rmf.fits
energ_lo = Float64[1180]
energ_hi = Float64[1180]
n_grp    = UInt64[1180]
f_chan   = UInt32[1189]
n_chan   = UInt32[1189]
matrix   = Float64[260775]
e_min    = Float64[512]
e_max    = Float64[512]
detchans = 512
offset   = 0
ethresh  = 1e-10

Background ARF Data Set: 1:1
name     = dataA_arf.fits
energ_lo = Float64[1180]
energ_hi = Float64[1180]
specresp = Float64[1180]
bin_lo   = None
bin_hi   = None
exposure = None
ethresh  = 1e-10


sherpa> sum(get_counts(1))
4919603.0
sherpa> calc_data_sum(id=1)
4919603.0

The latter two commands are equivalent, giving the total number of counts simulated in the data set. get_counts provides an array of the unfiltered counts per channel, and the sum function adds together each element of the array. The calc_data_sum function sums up the counts per array element for data set 1, but includes an option for filtering, which in this case is energy in units of keV. By default, calc_data_sum introduces an energy filter that excludes everything <1.0 keV (i.e. lo=1.0,hi=None), but to include all counts without filtering, we choose (lo=None,hi=None).

Similarly, we may determine the count rate of this data set in units of counts/second by doing the following:

sherpa> calc_data_sum(id=1) / get_exposure(1)
146.92750394227554

The simulated data set is not saved to disk until the command write_pha is given (see the section Writing The Simulated Data To Output Files).


Defining the Model Normalization for the Simulated Data

In order to use simulated data for scientific analysis, the data should be re-normalized to match the flux (or total counts) of the observed source. If the flux of the simulated data is known, the parameter values of the model used to create the simulated data can easily be converted to the appropriate values. This requires first simulating the data with the default model parameters, calculating the resulting simulated flux, and then rescaling the model parameters appropriately. Then, the simulation is run again with the updated model.

Assuming that we have just simulated data using default model parameters in the source model expression, with modelA.ampl=1, we calculate the flux accordingly:

sherpa> reset(modelA)

sherpa> fake_pha(1, arf=arf1, rmf=rmf1, exposure=33483.2, bkg=bkg1, backscal=bkg1.backscal, areascal=bkg1.areascal)

sherpa> calc_energy_flux(lo=2, hi=10)
2.5706384381817513e-09

This provides the unconvolved energy flux of the model, between 2.0-10.0 keV in units of ergs cm-2 s-1.

To find the normalization, we divide a given true source flux of 1.0×10-13 ergs cm-2 s-1, by the calculated flux for the above default source model (2.5706384381817513e-09); the resulting value will replace the current normalization (modelA.ampl) value of 1 photon keV-1 cm-2 s-1:

sherpa> scale = 1.0e-13 / calc_energy_flux(lo=2, hi=10)
sherpa> modelA.ampl = scale

and fake_pha is run again with the updated source model:

sherpa> fake_pha(1, arf=arf1, rmf=rmf1, exposure=33483.2, bkg=bkg1, backscal=bkg1.backscal, areascal=bkg1.areascal)

sherpa> calc_photon_flux(lo=2, hi=10)
1.5512252328055873e-05
sherpa> calc_energy_flux(lo=2, hi=10)
1e-13

In addition to the energy flux, the photon flux between 2.0-10.0 keV (in units of photons cm-2 s-1) is also available, via the calc_photon_flux command.

If the count rate of the source in a given instrument is known in advance (e.g. it can be obtained from PIMMS, or from previous observations), then a simple scaling can be used to adjust the normalization of the source model. For example, say the known count rate is 0.4 cts/sec. From the first fake_pha run (or a repeat run, with modelA.ampl returned to 1.0), we note the non-normalized count rate:

sherpa> modelA.ampl = 1.0
sherpa> fake_pha(1, arf=arf1, rmf=rmf1, exposure=33483.2, bkg=bkg1, backscal=bkg1.backscal, areascal=bkg1.areascal)
sherpa> rate = sum(get_counts(1)) / get_exposure(1)
sherpa> print(rate)
146.88392985138816

sherpa> modelA.ampl=0.4 / rate
sherpa> fake_pha(1, arf=arf1, rmf=rmf1, exposure=get_exposure(1), grouped=False, bkg=get_bkg(1))
sherpa> calc_data_sum(1)
268.0
sherpa> calc_data_sum(id=1) / get_exposure(1)
0.4258852200506523

The data set could then be normalized by scaling the model amplitude with this non-normalized count rate (146.9 cts/sec); this would be achieved by setting the model amplitude value equal to the known count rate divided by the non-normalized count rate.

[TIP]
Simulating Data Without Poisson Noise

For some use cases, the user may wish to exclude the noise that is added to the faked data set created by fake_pha. CIAO 4.16 added the method parameter which makes this easy. First define a routine like norandom:

def norandom(x, rng):
    "Return the input values with no randomization but with rounding"
    return np.rint(x)

and then set the method argument to it:

set_source("faked", powlaw1d.modelA)
fake_pha("faked", arf="arf1.fits", rmf="rmf1.fits", exposure=33483.2, method=norandom)

The "exact" value for each channel is going to be a floating-point number, hence norandom was written to pick the nearest integer value for each bin using the NumPy rint routine. Other approaches are possible, including no rounding (e.g. return x).


Writing the Simulated Data to Output Files

The simulated data are not automatically saved to disk. The user may write the data as a PHA file where the PHA header will contain exposure time and backscale values, as well as paths to the RMF, ARF, and background files.

This is done with the save_pha command:

sherpa> save_pha(1, "dataA_sim1.pha")

This creates a PHA file with columns of channel and counts. This file may then be grouped using dmgroup if necessary.

There is also the option to save the data to a FITS or ASCII table file with the save_arrays command:

sherpa> save_arrays("dataA_sim1.fits", [get_model_plot(1).xlo, get_model_plot(1).y], ascii=False)
sherpa> save_arrays("dataA_sim1.txt", [get_model_plot(1).xlo, get_model_plot(1).y], ascii=True)

Using fake_pha with a PHA File

If the user has previously read data from a PHA file:

  • An instrument response may have been automatically loaded if the response files are properly referenced in the header of the data file.
  • If the data file's BACKFILE keyword supplies a background file, then it will be used in the simulations.
  • Since input PHA files also usually contain the exposure time and backscale keywords pertinent to an observation, this information can be easily called on, especially useful for scripting.
  • The input data set will be overwritten by the simulated data set created by fake_pha.

For example, erase all current Sherpa settings with the clean command (or by quitting the session and starting a new one), and then input a PHA data file:

sherpa> clean()
sherpa> show_all()

sherpa> load_pha("dataA.pha")
systematic errors were found in file 'dataA.pha'
but not used; to use them, re-read with use_errors=True
read ARF file dataA_arf.fits
read RMF file dataA_rmf.fits
read background file dataA_bkg.pha

sherpa> get_exposure(1)
33483.25
sherpa> get_backscal(1)
0.044189453125

Loading this PHA data file resulted in an instrument response being automatically defined (using RMF and ARF files), and a background data file being automatically input. Furthermore, from the get_exposure and get_backscal commands, we see that the exposure time and backscale are readily accessed.

That is, input of this PHA data file has automatically completed two steps involved in creating simulations:

  • Defining an instrument response using a redistribution matrix file and an ancillary response file.
  • Reading a background data file.

The remaining steps are:

The following commands complete the remaining steps listed above:

sherpa> set_source(powlaw1d.modelA)
sherpa> modelA.gamma = 2

sherpa> arf = get_arf()
sherpa> rmf = get_rmf()
sherpa> etime = get_exposure()
sherpa> fake_pha(1, arf=arf, rmf=rmf, exposure=etime, bkg=get_bkg())

sherpa> calc_data_sum(id=1)
4914957.0

sherpa> modelA.ampl = 0.4 / (calc_data_sum(id=1) / get_exposure(1))
sherpa> fake_pha(1, arf=arf, rmf=rmf, exposure=etime, bkg=get_bkg())
sherpa> calc_data_sum(id=1)
14041.0

Note that the fake_pha command overwrites the input data set with the simulated data, and we have further normalized the simulated data set.

The results may be written to a PHA file:

sherpa> save_pha(1, "dataA_sim2.pha")

Plotting and Fitting Simulated Data

The simulated data set may be plotted as any other data set in Sherpa:

sherpa> plot_data()

Figure 2 shows the resulting plot.

Figure 2: The simulated data set

[Plot of simulated data set]
[Print media version: Plot of simulated data set]

Figure 2: The simulated data set

Plot showing the data set simulated with the fake_pha command.

The simulated data set may also be filtered and fit as any other data set in Sherpa. For example, we can filter the the simulated data set to include only data between 0.5 and 8.0 keV and to group to 20 counts per group within these limits:

sherpa> notice(0.5, 8)
dataset 1: 0.013293:14.6783 -> 0.471575:8.00459 Energy (keV)

sherpa> group_counts(1, 20)
dataset 1: 0.471575:8.00459 Energy (keV) (unchanged)

sherpa> plot_data(ylog=True)

Figure 3 shows the resulting plot.

Figure 3: Energy range to be fit

[Plot of filtered simulated data set]
[Print media version: Plot of filtered simulated data set]

Figure 3: Energy range to be fit

The simulated data filtered on the 0.5 - 8.0 keV energy range.

Then, we fit the simulated data set using the same source model expression that was used to create it:

sherpa> subtract()
sherpa> set_method("neldermead") 
sherpa> set_stat("chi2xspecvar") 
sherpa> fit()
Dataset               = 1
Method                = neldermead
Statistic             = chi2xspecvar
Initial fit statistic = 170.717
Final fit statistic   = 167.538 at function evaluation 284
Data points           = 157
Degrees of freedom    = 155
Probability [Q-value] = 0.23227
Reduced statistic     = 1.08089
Change in statistic   = 3.17921
   modelA.gamma   2.01613     
   modelA.ampl    0.0027036   

sherpa> plot_fit(xlog=True, ylog=True)

Figure 4 shows the resulting plot.

Figure 4: Fit to simulated data set

[Plot of fit to filtered simulated data set]
[Print media version: Plot of fit to filtered simulated data set]

Figure 4: Fit to simulated data set

Plot showing the fit to the energy-filtered, simulated data set.

Here, we examine the fit using the Sherpa covariance command:

sherpa> covariance()
Dataset               = 1
Confidence Method     = covariance
Iterative Fit Method  = None
Fitting Method        = neldermead
Statistic             = chi2xspecvar
covariance 1-sigma (68.2689%) bounds:
   Param            Best-Fit  Lower Bound  Upper Bound
   -----            --------  -----------  -----------
   modelA.gamma      2.01613   -0.0163437    0.0163437
   modelA.ampl     0.0027036 -2.91941e-05  2.91941e-05

This quick method of calculating 1σ parameter ranges using covar() may underestimate the errors for a complex parameter space. The slower, but more appropriate, confidence algorithm should be used in such cases.

sherpa> conf()
modelA.ampl lower bound:      -2.91941e-05
modelA.gamma lower bound:       -0.0163437
modelA.ampl upper bound:        2.91941e-05
modelA.gamma upper bound:       0.0163437
Dataset               = 1
Confidence Method     = confidence
Iterative Fit Method  = None
Fitting Method        = neldermead
Statistic             = chi2xspecvar
confidence 1-sigma (68.2689%) bounds:
   Param            Best-Fit  Lower Bound  Upper Bound
   -----            --------  -----------  -----------
   modelA.gamma      2.01613   -0.0163437    0.0163437
   modelA.ampl     0.0027036 -2.91941e-05  2.91941e-05

Simulating Multiple Data Sets

One may simulate multiple data sets within the same Sherpa session; each data set is assigned an individual ID.

For example, we can simulate a second data set as we did the first, above, by defining the instrument response with the ARF and RMF, defining a source model expression, and matching the exposure, backscale, and area scale to the background data set:

sherpa> arf2 = unpack_arf("dataB_arf.fits")
sherpa> rmf2 = unpack_rmf("dataB_rmf.fits")

sherpa> set_source(2, xsphabs.modelB1 * powlaw1d.modelB2)
sherpa> modelB1.nH = 0.025
sherpa> modelB2.gamma = 1.7
sherpa> modelB2.ampl = 1.0

sherpa> bkg2 = unpack_pha("dataB_bkg.pha")
WARNING: systematic errors were not found in file 'dataB_bkg.pha'
statistical errors were found in file 'dataB_bkg.pha' 
but not used; to use them, re-read with use_errors=True

sherpa> fake_pha(2, arf=arf2, rmf=rmf2, exposure=bkg2.exposure, bkg=bkg2, backscal=bkg2.backscal, areascal=bkg2.areascal)

sherpa> calc_data_sum(id=2)
31119015.0

We will now normalize the second simulated data set in the same manner as we did the first: dividing the known count rate, e.g. 0.4, by the non-normalized count rate to obtain a scaling factor, and then re-simulating the data set.

sherpa> modelB2.ampl = 0.4 / (calc_data_sum(id=2) / get_exposure(2))

sherpa> fake_pha(2, arf=arf2,rmf=rmf2,exposure=bkg2.exposure, bkg=bkg2, backscal=bkg2.backscal, areascal=bkg2.areascal)

sherpa> calc_data_sum(id=2)
11616.0

Finally, the second simulated data set can be plotted as follows:

sherpa> plot_data(2)

and written to file:

sherpa> save_pha(2, "dataB_sim1.pha")

sherpa> save_arrays("dataA_sim2.fits", [get_model_plot(2).xlo, get_model_plot(2).y], ascii=False)

sherpa> save_arrays("dataA_sim2.txt", [get_model_plot(2).xlo, get_model_plot(2).y], ascii=True)

To plot both simulated data sets at once:

sherpa> notice(0.5, 8)
dataset 1: 0.471575:8.00459 Energy (keV) (unchanged)
dataset 2: 0:14.9504 -> 0.4964:8.0008 Energy (keV)

sherpa> plot("data", 1, "data", 2, xlog=True, ylog=True)

Figure 5 shows the resulting plot.

Figure 5: Using more than one simulation at a time

[]
[Print media version: ]

Figure 5: Using more than one simulation at a time

Two simulated data sets plotted together in one window with the Sherpa plot command.

Note that only the first dataset is grouped and that there is the ability to apply plot options to all the datasets in the plot call.

To exit Sherpa:

sherpa> quit()

unix%

Using a Sherpa Script to Run Simulations

Multiple simulations with count-rate normalization

The following script does several fake_pha simulations using the count rate for normalization; this is different than the previous examples which all use the flux instead. This script defines the variables fake_time, energy, and cnt_rate, which means it can be included in a more complex script where these variables are used as parameters.

The response files used in the script—aciss_aimpt_cy26.rmf and aciss_aimpt_cy26.arf—can be downloaded from the ACIS Cycle 26 RMFs/ARFs CALDB page.

#!/usr/bin/env python
from sherpa.astro.ui import *

# Create a fake spectrum with a count rate of <cnt_rate> in the energy
# range <energy>, with a total exposure time of <fake_time>.

fake_time = 100000
elo = 0.5
ehi = 2.0
cnt_rate = 0.02

# Define instrument response and source model
#
rmf = unpack_rmf("aciss_aimpt_cy26.rmf")
arf = unpack_arf("aciss_aimpt_cy26.arf")
set_source(1, powlaw1d.pow1)
pow1.gamma = 1.9

# Set up the initial normalization.
#
pow1.ampl.val = 1e-4

# Fake the data so that we can call calc_model_sum.
#
fake_pha(1, arf=arf, rmf=rmf, exposure=fake_time)

# Calculate the correction factor for the normalization to get the
# correct normalization.
#
scale = cnt_rate / (calc_model_sum(elo, ehi) / fake_time)
pow1.ampl.val *= scale

# Fake the data with the correct normalization.
#
fake_pha(1, arf=arf, rmf=rmf, exposure=fake_time)

# Verify correct counts in fake data set and plot data
#
print("Count rate in specified energy range is")
print(calc_data_sum(lo=elo, hi=ehi) / fake_time)
plot_data()

If the script is called multi_sim.py then it can be run within Sherpa:

sherpa> %run -i multi_sim.py
Count rate in specified energy range is
0.02066

Scripts may also be launched from the Unix command line as follows

unix% sherpa multi_sim.py

If a script ends with quit(), then running the script from the command line as follows will write the screen output generated by the script to a file, such as multi_sim.out:

unix% sherpa multi_sim.py >&! multi_sim.out

Simulating Hot Plasma Emission

In this script, emission from a hot plasma is simulated using two different models. Note that the appropriate normalization is calculated within the script (plasma.py). The results of both simulations are then compared:

#!/usr/bin/env python
from sherpa.astro.ui import *

# This script simulates emission from a hot plasma using two different
# models.  The absorption component uses the xswabs model which is
# only used here for simplicity: please use a more-recent model such
# as xsphabs for actual simulations.
#
arf = unpack_arf("dataB_arf.fits")
rmf = unpack_rmf("dataB_rmf.fits")

fake_time = 5000
set_source(1, xswabs.abs1 * xsmekal.m1)

abs1.nH = 0.03
m1.kT = 4.0
m1.norm = 0.025

fake_pha(1, arf=arf, rmf=rmf, exposure=fake_time)

# renormalize to 2-10 keV flux of 1e-13 erg/s/cm**2
modflux1 = calc_energy_flux(id=1, lo=2, hi=10)
obsflux1 = 1.e-13

m1.norm.val *= obsflux1 / modflux1

fake_pha(1, arf=arf, rmf=rmf, exposure=fake_time)
save_pha(1, "sim_meka.pha")

set_source(2, abs1 * xsapec.ap1)

ap1.kT = 4.0
ap1.norm = 0.025

fake_pha(2, arf=arf, rmf=rmf, exposure=fake_time)

# renormalize to the 2-10 keV flux of 1.e-13 erg/s/cm**2
modflux2 = calc_energy_flux(id=2, lo=2, hi=10)
obsflux2 = 1.e-13

ap1.norm.val *= obsflux2 / modflux2

fake_pha(2, arf=arf, rmf=rmf, exposure=fake_time)
save_pha(2, "sim_apec.pha")

exit()

This script can be run from the Unix command line as follows:

unix% sherpa plasma.py >&! plasma.out

The plasma.out file will contain the screen output generated by the commands in the script.


Scripting It

The file fit.py is a Python script which performs the primary commands used above; it can be executed by typing %run fit.py on the Sherpa command line.

The Sherpa script command may be used to save everything typed on the command line in a Sherpa session:

sherpa> script(filename="sherpa.log", clobber=False)

(Note that restoring a Sherpa session from such a file could be problematic since it may include syntax errors, unwanted fitting trials, et cetera.)


History

08 Dec 2008 Created for CIAO/Sherpa 4.1, replacing CIAO/Sherpa 3.x FAKEIT command
29 Apr 2009 new script command is available with CIAO 4.1.2
18 Jan 2010 Updated for CIAO 4.2: simulated data is now ungrouped by default, and the save_arrays and confidence commands are available; examples modified to show that it is not necessary to run dataspace1d to fake a source data set.
13 Jul 2010 updated for CIAO 4.2 Sherpa v2: removal of S-Lang version of thread.
08 Sep 2010 figures moved inline with text
15 Dec 2010 updated with "quick start" section
15 Dec 2011 reviewed for CIAO 4.4: a work-around for a save_pha bug was added; response files used in examples were updated for Chandra proposal cycle 13
14 Mar 2012 updated with links to the CALDB Chandra Proposal Planning page, and the other simulation threads
13 Dec 2012 reviewed for CIAO 4.5: updated introductory information in the 'Define a Background' section
12 Dec 2013 reviewed for CIAO 4.6: updated screen output
15 Apr 2014 simulating without Poisson noise.
06 Feb 2015 updated for CIAO 4.7 and Cycle 17, no content change.
14 Dec 2015 updated for CIAO 4.8 and Cycle 18, no content change.
14 Dec 2016 updated for CIAO 4.9 and Cycle 19, no content change.
23 Apr 2018 updated for CIAO 4.10, revised screen output.
13 Dec 2018 updated for CIAO 4.11 and Cycle 21, no content change.
10 Dec 2019 updated for Matplotlib and minor changes in CIAO 4.12; no significant changes to the content.
22 Dec 2020 Updated for CIAO 4.13: updated to the cycle 23 response; re-created the plots using the new PHA style and took advantage of plotting improvements in CIAO 4.12 and 4.13; added a note about grouping the simulation data before fitting.
15 Mar 2022 updated for CIAO 4.14 and Cycle 24, no content change.
12 Dec 2022 updated for CIAO 4.15 and Cycle 25, no content change.
11 Dec 2023 updated for CIAO 4.16 and Cycle 26; noted changes to group_counts and the addition of the method parameter to fake_pha that lets users change how the simulated values are created.