Modeling and Fitting SED Data in Iris
Iris Threads
Overview
Synopsis:
Astronomers analyze Spectral Energy Distributions (SEDs) by fitting them with models over a wide range in wavelength, from gamma rays to radio waves. The resulting best-fit values for the SED model parameters, and associated confidence limits, are physically meaningful quantities. The Iris SED analysis tool allows for robust modeling and fitting of SED data through association with Sherpa, an extensible, Python-based, multi-waveband modeling and fitting application for astronomers.
Learn how to fit models to SED data in Iris by defining model expressions, setting initial model parameter values and ranges, and choosing an appropriate fit statistic and method.
Last Update: 25 Sep 2011 - updated for Iris 1.0
Contents
- Displaying Fit Options
- Defining Model Expressions
- Setting Model Parameters
- Choosing the Fit Statistic and Optimization Method
- Fitting and Displaying the Model
- Saving the Model
- History
Displaying Fit Options
After one or more SED data segments have been read into Iris, and data display preferences set, the data may be fit with a customized model expression using the Iris Fit feature, accessible in the upper-right corner of the Iris main display. Clicking the Fit button opens a window in which a model expression may be defined and initial model parameter values set, with model amplitudes in units of photon flux density, and model spectral coordinates in Angstroms.
When the Fit window opens, the default model that is provided, powerlaw.c1, is automatically plotted as a red line, and is calculated using the default power-law parameters. At this point, the model has not been fit to the data, therefore it appears offset from the data curve.
The Component section of the Fit window is the place to list the model components which will be used to construct the full model expression for fitting; these components can then be arbitrarily combined to form the full model expression, in the Model Expression field. The Component field is populated with a power-law model component by default, with model identifier "c1". The model component naming convention uses an ordered component list from top to bottom, starting with "c1". Removing various model components from the list will cause remaining components to potentially be renamed according to their position in the list, e.g., "c2", "c3", and so on.
Defining Model Expressions
A list of optical and X-ray Sherpa models is opened in a new window when the "Add" option in the Fit window is selected, from which you can choose a model to be added to the list of model components. For example, to fit an absorbed broken power-law model to an SED of object 3C 273, you could choose "atten" and then "brokenpowerlaw" in the list of models in order to add both model components to the Component section of the Fit window (where you may have deleted the default power-law component which is automatically added to the list upon startup, as was done in this example).
To model 3C 273 emission as a broken power-law using the 'brokenpowerlaw' component, and the ISM absorption using 'atten', we need to define our model expression as the product of these two components. This is done in the Model Expression field, where model components can be added or multiplied together arbitrarily to create a composite model expression. This allows for modeling emission and/or absorption features, as well as the ability to apply a more complex model to the continuum itself. For an expression containing one model component, it is trivial: simply add the name of the model component, e.g., "c1", to the Model Expression field. For a composite model such as an absorbed broken power-law, considered in this example, you would add "c1*c2" to represent the product of the 'atten' and 'brokenpowerlaw' components.
Setting Model Parameters
Initial model parameter values and ranges may be set by selecting a model in the Component list, and then "Edit". This will open a new window containing a list of the parameters for the selected model component, with associated fields for editing.
Entering a value into a model parameter field, and then pressing the "Return" key, will update that model parameter in the Component section of the Fit window. The "Fit" checkbox next to each parameter in the editing window is for specifying whether or not to freeze that parameter during the fit; if checked, the parameter will be allowed to vary. Selecting the button labeled with a blue arrow beneath the 'Fit' checkbox opens a separate window in which parameter units, minimum and maximum values, and links to other parameters may be specified.
The "Fix all" and "Default Fit" options in the main Fit window may be used (while the model editor window remains open) to freeze all model parameters at current values, or thaw all model parameters (even those initially fixed), respectively.
The initial model parameter values are where Sherpa starts the fit in parameter space; the parameter ranges are the bounds of parameter space. It is not possible to fit using parameter values outside the specified parameter ranges.
Choosing the Fit Statistic and Optimization Method
The next step in the preparation for fitting involves choosing a fit statistic and optimization method appropriate for your analysis. Selecting "Fit" in the main Fit window (where model components are listed) launches a new model fitting window (labeled "Fitted") containing the statistic and method options. These are listed below with brief descriptions (refer to the Statistics and Optimization pages of the Sherpa website for a detailed explanation of each of the Sherpa fit statistics and methods available in Iris).
Fit Optimization Methods
- Levenberg-Marquardt - a method to find the best-fit model parameter values, by finding the local minimum in parameter space; the functions to be minimized are nonlinear least-squares functions of the model parameters.
- Nelder-Mead - a method to find the best-fit model parameter values, by finding the local minimum in parameter space; a direct search method is used to continually move "downhill" in parameter space, until settling in a local minimum.
- Monte Carlo - a method using the differential evolution algorithm, to find a global minimum in parameter space; the search is seeded with randomly selected starting points, and can continue the search for the true minimum where the other methods could get stuck in local minima that do not represent the actual best fit. (Most useful for complex models and fits that could explore complicated regions of parameter space.)
Fit Statistics
- least-squares - Sum of the squares of the differences between data and model values.
- Cash - A maximum likelihood function based on Poisson statistics. More suitable for counts data than for fluxes.
- C-stat - A maximum likelihood function similar to Cash, but with a chi-squared-like probability distribution. More suitable for counts data than for fluxes.
- chi2datavar - Chi-squared statistic with variance computed from the data. If measured errors are provided, the variance is taken from these errors; else, the variance is computed from the y-values of the data points. This is the preferred chi-squared option to be used in Iris.
- chi2gehrels - Chi-squared statistic, where the variance is computed with a function from Gehrels et al. Suitable for low counts data (e.g., X-ray data) to correct for bias in using chi-squared.
- chi2modvar - Chi-squared statistic, where the variance is computed from the model values instead of data.
- chi2constvar - Chi-squared statistic, where the variance is set to be a constant value. That constant is the average of the y-values of the data points.
- chi2xspecvar - Chi-squared statistic, where the variance is computed as the X-ray spectral fitting program XSPEC would compute the variance (i.e., where the variance would be less than one, reset it to one). More suitable for low counts data (e.g., X-ray data).
The Iris default fitting method and statistic are "neldermead" and "leastsq", respectively, which represent good choices for a robust, quick, initial fit of a relatively simple model to a data set covering potentially many orders of magnitude in flux and/or wavelength. The fit can also be done with a chi-squared statistic, with various methods for estimating variance, or with either of two maximum likelihood statistics that are useful when the data have low numbers of counts. The fitting method can be changed to "levmar" or "moncar", but note that switching the method is less important than switching the statistic, e.g., from least-squares to chi-squared.
Nelder-Mead is the optimal fitting method to start with because it does not depend on taking derivatives of the model function. As for the fit statistic, least-squares is preferred because it does not use measured errors and thus essentially weights all data points equally (it seeks to minimize the sum of the squares of the differences between data and model values, without taking errors into account). Chi-squared fitting, in contrast, can be decidedly biased towards the data points with the lowest fluxes and smallest measured error bars (when there are data points with measured errors that are orders of magnitude smaller than the error bars on most of the data points, then these points become the biggest contributors to the chi-squared value).
In subsequent fits - e.g., after selecting a different data range to fit, adding in new model components as needed, and so on - the statistic can be switched to a chi-squared option to use measured errors (preferably chi2datavar, which is chi-squared with data variance). Since measured errors are provided, this means the variance is taken directly from the errors provided with the data, when the data file is read in.
Note: When one or multiple SED segments is fit in Iris, any data points with associated zero-value errors are ignored in the fit. This design choice is intended as a safeguard against yielding potentially misleading fit results in the analysis of your SED data. Iris interprets a zero-value error not to mean that the uncertainty on the associated data value is actually zero, but that a measure of the uncertainty is not available for that particular photometric point.
Fitting and Displaying the Model
Before initiating a fit of the defined model to a SED in Iris, the specific subset of the SED data to be included in the fit (if not the entire SED) should be specified using the "Define Range" option in the main Fit window. Selecting this option will prompt you to click on the data display twice, once to define the lower endpoint of the data range to be fit, and once for the upper endpoint.
Having built a model expression and set initial parameter values, chosen an appropriate fit statistic and optimization method for your analysis, and defining the range of data to be fit, you are reaady to initiate the fitting process by selecting "Start" in the Fitted window. When the fit is complete, a red model curve - or a curve displayed in a color of your choosing using Preferences -> Edit Preferences - appears overplotted on the SED data in the Iris main display (beyond the specified data range to be fit; this extraneous portion of the fitted model may be ignored).
When the fit has finished, the model parameter values will appear updated in the Component field of the Fit window, and fit statistics will be displayed in the Fit tab of the Fitted window, including the number of data points used in the fit, the degrees of freedom, and the probability that the fit is consistent with the data (q-value), where applicable (when fitting with the least-squares or Cash statistic, the "reduced statistic" value and q-value are not available, as they cannot be computed).
When fitting with one of the chi-squared statisics, or the chi-squared-like cstat statistic, the Confidence tab returns the requested sigma/percent confidence intervals on the best-fit model parameters, calculated using the statistic and method chosen for the fit. For example, entering "1.6" in the "sigma" field will return th 90% confidence limits on the model parameter values..
If the statistic is left at the default "leastsq", Iris will issue an error stating that least-squares cannot be used with the confidence limit function (this is because the only way calculate to confidence limits is to know the probability distribution for the fit statistic, which is known for the chi-squared and chi-squared-lke cstat statistic, but not least squares).
Also of note is that blank or "NaN" values are returned in the confidence results when a parameter bound is found to lie outside the hard limit boundary for a model parameter (which ought not to be changed by the user, so there is not an option to do so in Iris). This could result from an issue with the signal-to-noise of the data, the applicability of the model to the data, systematic errors in the data, among others things. A parameter hard limit represents either a hard physical limit (e.g., temperature is not allowed to go below zero), a mathematical limit (e.g., prevent a number from going to zero or below, when the log of that number will be taken), or the limit of what a float or double can hold (the fit should not be driven above or below the maximum or minimum values a variable can hold).
You can iterate through the fitting process in Iris as many times as necessary - adding or deleting models from the model expression; changing parameter values or ranges; including or excluding new points from the SED - until you find a satisfactory model that best describes your SED data.
Saving the Model
Once a spectral model is satisfactorily fitted, the custom fitted model parameters can be saved to a local file before exiting the fitting session, by making an appropriate selection in the File menu of the Fit window.
Selecting the File -> Write to text file option saves the spectral model to file in a human-readable text format. The "Write active components" option in this submenu writes only the model components which were used in the fit; example output is shown below:
% more atten_bpl_all_comps.txt File: 3c273.xml Fri Sep 23 15:07:05 EDT 2011 Iris 1.0 TARGNAME: 3C 273 Fit parameters: Final fit statistic: 583481.32067514351 Reduced statistic: 1762.7834461484699 Probability [Q-value]: 0.0 Degrees of freedom: 331.0 Data points: 337 Last function evaluation: 370 Component 1: atten hcol = 1.0E20 heiRatio = 0.57285625 heiiRatio = 0.9311036 Component 2: brokenpowerlaw F refer = 5000.0 angstroms ampl = 0.0082577905 index1 = -0.50560653 index2 = -0.42197618
Selecting File -> Write to file, instead, saves the model to a file in a format (CDB) that can be read back by the tool in a future data analysis session. (CDB is an XML format for reading model files in and out of Iris.)
This customized Iris fitting session may be restored by selecting the File->Read from file menu option within the Iris Fit window.
History
09 Aug 2011 | updated for Beta 2.5 |
25 Sep 2011 | updated for Iris 1.0 |