Last modified: 18 January 2024

URL: https://cxc.cfa.harvard.edu/csc/why/gregory_loredo.html

Gregory-Loredo Variability Probability


The information in this why topic is taken from the "Effectiveness of the Gregory-Loredo Algorithm for Detecting Temporal Variability in Chandra Data" memo.


Summary

The Gregory-Loredo variability algorithm is one of the tests used to detect time variability in sources. Specifically, it provides the probability that the flux calculated from the source region is not constant throughout the observation. The Gregory-Loredo algorithm determines variability based on an odds ratio that the arrival times of the events within the source region for each science energy band are not uniformly distributed in time. The results of the Gregory-Loredo variability test are recorded in the columns var_prob and var_intra_prob in the Source Observations Table and Master Sources Table, respectively.

The algorithm is insensitive to the shape of the light curve, something that is a known problem with the current implementation of the Kolmogorov-Smirnov (K-S) Test. It also does not overinterpret the data in low count rate sources, requiring a statistically significant deviation from a flat distribution before yielding an odds ratio greater than one.

The addition of the secondary criterion—light curve fractions—results in a reliable variability test, though careful users may want to inspect the light curves of all sources with a nonzero variability index.

Background

For a detailed description of the Gregory-Loredo algorithm, refer to A New Method for the Detection of a Periodic Signal of Unknown Shape and Period (1992, ApJ 398, 146). Although the algorithm was developed for detecting periodic signals, it is a perfectly suitable method for detecting plain variability by forcing the period to the length of the observation.

The implementation of the Gregory-Loredo algorithm for the CSC consists of three steps, described in the sections below: calculating the odds ratio, analyzing light curves and light curve fractions, and assigning the variability index.


Calculating the Odds Ratio

\(N\) events are binned in histograms of \(m\) equally spaced bins, where \(m\) runs from 2 to \(m_{\mathrm{max}}\). The algorithm is based on the likelihood of obtaining, for each value of \(m\), the observed distribution of binned counts:

\[ n_{1}, n_{2}, n_{3}, \ldots, n_{m} \]

For a given \(m\), there are \(m^{N}\) possible ways of binning the \(N\) events. Out of those, the number of possible ways of getting the observed distribution of events is given by the multiplicity of \(N\) events distributed in \(m\) bins:

\[ \frac{N!}{n_1! n_2! \cdots n_m!} \]
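As a toy illustration (numbers ours, not from the memo): \(N = 4\) events split as \((n_1, n_2) = (3, 1)\) across \(m = 2\) bins can arise in \(4!/(3!\,1!) = 4\) of the \(2^{4} = 16\) equally likely binnings, so the chance probability of that particular split is \(4/16 = 0.25\).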

Therefore, the probability of obtaining the observed distribution by pure chance is given by the ratio of this multiplicity to \(m^{N}\). This corresponds to the odds ratio of obtaining the observed distribution vs. obtaining a flat distribution, i.e., non-variability; the inverse of this ratio represents the significance of the observed distribution. The odds are summed over all values of \(m\), weighted in each case by the estimated probability, to estimate the total odds that the source is variable. The counts in all bins are corrected by their GTI exposure, and dither frequencies are excluded.
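As a concrete sketch of this computation (in Python; not the CSC tool itself), the per-\(m\) odds can be evaluated with log-factorials to avoid overflow. The prior factor \(N!\,(m-1)!/(N+m-1)!\) comes from the Gregory & Loredo (1992) marginalization referenced above; GTI-exposure corrections and dither-frequency exclusion are omitted, and the function name gl_odds is ours:

    import numpy as np
    from math import lgamma

    def gl_odds(times, m_max):
        """Return (log10 total odds ratio, variability probability) for a
        set of event arrival times.  A sketch only, not the CSC pipeline."""
        t = np.asarray(times, dtype=float)
        N = t.size
        log_odds = []
        for m in range(2, m_max + 1):
            counts, _ = np.histogram(t, bins=m, range=(t.min(), t.max()))
            # log of the multiplicity N!/(n1! n2! ... nm!)
            log_mult = lgamma(N + 1) - sum(lgamma(n + 1) for n in counts)
            # log odds of the m-bin model vs. a flat distribution:
            # O_m = m^N * N! * (m-1)! / (multiplicity * (N+m-1)!)
            log_odds.append(N * np.log(m) + lgamma(N + 1) + lgamma(m)
                            - log_mult - lgamma(N + m))
        # average the per-m odds over the m_max - 1 alternative models
        log_total = np.logaddexp.reduce(log_odds) - np.log(m_max - 1)
        odds = np.exp(log_total)
        return log_total / np.log(10.0), odds / (1.0 + odds)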

The information provided by the tool includes the \(\log_{10}\) of the total odds ratio (\(\mathcal{O}\)), the corresponding probability (\(P\)) of a variable signal, the \(m\) value with the maximum odds ratio, the odds-weighted first moment of \(m\), and the characteristic time scales represented by these two values.
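For an observation of duration \(T\), the characteristic time scale corresponding to a given \(m\) is presumably \(T/m\), i.e., the width of a single bin at that binning.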

A sample output file with odds ratios as a function of \(m\): gl_out.txt.

CSC2 sources are assigned a variability index according to the following thresholds: if \(P \geq 0.9\), the source is considered variable; if \(P \leq 0.5\), it is considered not variable. The ambiguous range, \(0.5 < P < 0.9\), is handled by analyzing light curve fractions, as described below.


Light Curves and Analyzing Light Curve Fractions

In addition to the odds ratio, the program produces a file with the light curve evaluated at the optimal binning. This light curve is not simply the binned counts; rather, it is a weighted sum (with the weights being the odds ratios) of all light curve binnings, ranging from a single bin (i.e., no variability, a constant light curve) to \(m_{\mathrm{max}}\) bins, with corrections for the fractional area (i.e., from dithering over chip edges and/or bad pixels and columns) in each bin. Thus, each light curve point contains a count rate that takes into account weighted contributions from all events in the entire light curve. Additionally, the standard deviation (\(\pm 3 \sigma\)) is provided for each point of the light curve.
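A minimal sketch of such an odds-weighted light curve, assuming the per-\(m\) log odds are already available (with \(m = 1\), the constant model, assigned log odds of zero); fractional-area corrections and the per-point \(\sigma\) are omitted, and all names are ours:

    import numpy as np

    def weighted_light_curve(times, log_odds, t_grid):
        """log_odds[m-1] holds the log odds of the m-bin model, m = 1..m_max;
        t_grid is an array of times at which to evaluate the light curve."""
        t = np.asarray(times, dtype=float)
        t0, t1 = t.min(), t.max()
        w = np.exp(np.asarray(log_odds) - np.max(log_odds))
        w /= w.sum()                          # normalized odds-ratio weights
        rate = np.zeros_like(t_grid, dtype=float)
        for m, wm in enumerate(w, start=1):
            edges = np.linspace(t0, t1, m + 1)
            counts, _ = np.histogram(t, bins=edges)
            r = counts / np.diff(edges)       # count rate in each of the m bins
            idx = np.clip(np.searchsorted(edges, t_grid, side="right") - 1,
                          0, m - 1)
            rate += wm * r[idx]               # odds-weighted sum over binnings
        return rate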

As mentioned, there is an ambiguous range of probabilities: \(0.5 < P < 0.9\). For this range, a secondary criterion was developed; it is based on the light curve, its average \(\sigma\), and the average count rate.

The program calculates the fractions \(f_{3}\) and \(f_{5}\) of the light curve that are within \(3\sigma\) and \(5\sigma\), respectively, of the average count rate. For cases in the ambiguous range, if \(f_{3} > 0.997\) and \(f_{5} = 1.0\), the source is deemed to be nonvariable.
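A sketch of this check (names ours), given the light curve values and their average \(\sigma\):

    import numpy as np

    def light_curve_fractions(rate, sigma):
        """Fractions of light-curve points within 3 and 5 sigma of the mean."""
        dev = np.abs(rate - np.mean(rate))
        return np.mean(dev <= 3.0 * sigma), np.mean(dev <= 5.0 * sigma)

    # In the ambiguous range 0.5 < P < 0.9, the source is deemed
    # nonvariable when f3 > 0.997 and f5 == 1.0.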

Dither and the Light Curve:

When searching for variability, the algorithm flags the special case of sources with characteristic times that are harmonics of one of the two dither periods (707 s and 1000 s for most ACIS observations). The program normalizes the dither out of the light curve, so that the dither does not affect the search for variability. As with the Kolmogorov-Smirnov and Kuiper's tests, the dither correction is a geometrical area correction and does not take into account any spatial dependence of the chip responses. For example, if a soft X-ray source dithers from a frontside-illuminated chip to a backside-illuminated chip, the different soft X-ray responses of the two chips could introduce a dither period-dependent modulation of the detected counts. The current catalog procedures do not correct for such a possibility; however, warning flags are set if sources dither across chip edges, and a dither warning flag is set if the variability occurs at a harmonic of a dither frequency.

Additionally, in contrast to the Kolmogorov-Smirnov and Kuiper's tests, the Gregory-Loredo test implicitly assumes that any fractional area corrections are completely uncorrelated with any intrinsic time scales of variability in the light curve. In cases where this assumption is violated, the Gregory-Loredo test might lose sensitivity.

For detailed information on how and why the dither correction is done, refer to the following memos:


Assigning the Variability Index

The program assigns a variability index based on the values of \(\mathcal{O}\), \(P\), \(f_{3}\), and \(f_{5}\) (for definitions, see above):

Variability Index   Condition                                                        Comment
0                   \(P \leq 1/2\)                                                   Definitely not variable
1                   \(1/2 < P < 2/3\) and \(f_{3} > 0.997\) and \(f_{5} = 1.0\)      Not considered variable
2                   \(2/3 \leq P < 0.9\) and \(f_{3} > 0.997\) and \(f_{5} = 1.0\)   Probably not variable
3                   \(0.5 \leq P < 0.6\)                                             May be variable
4                   \(0.6 \leq P < 2/3\)                                             Likely to be variable
5                   \(2/3 \leq P < 0.9\)                                             Considered variable
6                   \(0.9 \leq P\) and \(\mathcal{O} < 2.0\)                         Definitely variable
7                   \(2.0 \leq \mathcal{O} < 4.0\)                                   Definitely variable
8                   \(4.0 \leq \mathcal{O} < 10.0\)                                  Definitely variable
9                   \(10.0 \leq \mathcal{O} < 30.0\)                                 Definitely variable
10                  \(30.0 \leq \mathcal{O}\)                                        Definitely variable
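Read as a decision rule (with the \(f\)-based rows 1-2 taking precedence over rows 3-5 when their conditions hold), the table can be transcribed as the following sketch; the names are ours, and this is not the CSC pipeline code:

    def variability_index(p, odds, f3, f5):
        """p: variability probability P; odds: log10 total odds ratio O."""
        if p <= 0.5:
            return 0                              # definitely not variable
        if p < 0.9:
            if f3 > 0.997 and f5 == 1.0:          # secondary criterion holds
                return 1 if p < 2.0 / 3.0 else 2
            if p < 0.6:
                return 3                          # may be variable
            if p < 2.0 / 3.0:
                return 4                          # likely to be variable
            return 5                              # considered variable
        for index, upper in ((6, 2.0), (7, 4.0), (8, 10.0), (9, 30.0)):
            if odds < upper:
                return index                      # definitely variable
        return 10                                 # definitely variable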