AGASC-Gaia Cross-match

All files are in the /data/aca/analysis/agasc1p8 directory.

The main data products of the AGASC-Gaia cross-match are the following (files are big, so it might not be a good idea to just click on them in the browser):

Other interesting data products are (see below for more details):

Datasets

This section describes what is done in agasc_gaia.datasets. All of this is done before any cross-match with Gaia, even though some of the issues were only found after a cross-match.

We have used a few datasets other than Gaia and AGASC. All these are available through agasc_gaia.datasets:

There is a function to produce an expanded AGASC catalog, which includes all the AGASC columns plus some extra information that could be of use later on:

The initial intention was to use off-the-shelf cross-matches with Gaia, because Gaia DR3 already includes cross-matches with GSC2.3 and with Tycho2-TDSC. The AGASC catalog includes Tycho2 and GSC1.1 IDs, and in principle it should be possible to match these to Tycho2-TDSC and GSC2.3 respectively. This proved to be problematic.

AGASC issues

Repeated entry. The AGASC catalog has one repeated ID (AGASC ID 154534513). We kept the more recent of the two entries.

Potential duplicates (get_potential_duplicates). Looking at cross-matches with Gaia, it became apparent that there might be duplicate stars in the AGASC catalog. This can happen when the union of two catalogs is taken without accounting for stars present in both catalogs with slightly different positions and magnitudes. On the other hand, a number of known binaries/multiples have the same exact position in AGASC, and they could have similar magnitudes. In general it is difficult to know for sure given the complex update history of the catalog.

We added a check for duplicates in the AGASC summary. The criteria for marking duplicates are:

Tycho2 Issues

Tycho2-TDSC IDs. The IDs in Tycho2 and in TDSC are NOT the same, and the TDSC paper did not include a mapping from one ID set to the other. In order to do the cross-match AGASC-Tycho2-Tycho2TDSC-Gaia, we needed to match Tycho2 stars with Tycho2-TDSC stars. This is done in get_tycho2_tdsc, and is based on a radial cross-match using the probabilities in tycho2_tdsc_separation_prob and tycho2_tdsc_mag_diff_prob. This cross-match is not relevant for the final result, because we ended up doing a cross-match between AGASC and Gaia directly.
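A radial cross-match of this kind can be sketched as follows. This is only an illustration, not the code in get_tycho2_tdsc: the function names, the dictionary keys and the search radius are assumptions. It collects, for one star, every catalog entry within a given radius, together with the separation and magnitude difference that would then be fed to the separation and magnitude-difference probabilities.

```python
import math


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcsec between two positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula, numerically stable for small separations.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h))) * 3600


def radial_candidates(star, catalog, radius_arcsec):
    """Return (separation, magnitude difference, entry) for entries within the radius.

    The result is sorted by increasing separation.
    """
    out = []
    for entry in catalog:
        sep = angular_separation_arcsec(star["ra"], star["dec"], entry["ra"], entry["dec"])
        if sep <= radius_arcsec:
            out.append((sep, entry["mag"] - star["mag"], entry))
    return sorted(out, key=lambda c: c[0])


# Hypothetical data: one star and two catalog entries.
star = {"ra": 10.0, "dec": 60.0, "mag": 9.0}
catalog = [
    {"id": "a", "ra": 10.0 + 2.0 / 3600, "dec": 60.0, "mag": 9.1},   # about 1 arcsec away
    {"id": "b", "ra": 10.0, "dec": 60.0 + 10.0 / 3600, "mag": 9.0},  # about 10 arcsec away
]
candidates = radial_candidates(star, catalog, radius_arcsec=3.0)
```

With these made-up values only entry "a" falls within the 3 arcsec radius.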

Missing Tycho2-TDSC IDs. A few stars were removed from the Tycho2-TDSC cross-match:

Wrong proper motion. When we did the direct AGASC-Gaia cross-match, we noticed that some AGASC stars had no Gaia counterpart. In a number of these cases, this could be attributed to wrong proper motions in Tycho2. For this reason, we decided not to use the RAmdeg and DEmdeg fields and to use RA(ICRS) and DE(ICRS) instead. RAmdeg and DEmdeg are the ICRS position at epoch 2000, which depends on the proper motions, and can therefore be affected by wrong proper motions.

The positions of Tycho2 stars in the AGASC are given by RAmdeg and DEmdeg, so these positions are changed when producing the AGASC summary.

GSC Issues

Issues with the GSC catalogs are irrelevant in the end, because we ended up doing a cross-match between AGASC and Gaia DR3 directly, without using GSC.

The first issue was with GSC catalogs. We noticed that, after matching AGASC-GSC1.1-GSC2.3, there were some stars with large differences in AGASC and GSC2.3 positions. These matches were removed from the downstream dataset.

We still do not know the origin of the discrepancy.

Cross-match

Algorithms related to the cross-matches are kept in agasc_gaia.gaia_queries (functions to query Gaia) and agasc_gaia.cross_match (functions that take the results from Gaia queries and produce a list of matches).

We did two separate cross-matches between AGASC and Gaia:

The final result uses the direct cross-match, but we derived insights from comparing the two.

Direct cross-match

The cross-match happens in two steps:

NOTE: pm_ra in the condition for high-PM above should have been pm_ra*cos(dec), but using pm_ra is conservative and faster.
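The sense in which dropping the cos(dec) factor is conservative can be checked numerically. The sketch below assumes that pm_ra is the rate of change of the RA coordinate and that the high-PM condition is a threshold on the total proper motion; the function name and the numbers are made up for illustration. Since |cos(dec)| is at most 1, the value computed without the factor is never smaller, so a threshold on it can only flag more stars as high-PM, never fewer.

```python
import math


def total_pm(pm_ra, pm_dec, dec_deg, apply_cos_dec):
    """Total proper motion, in the same units as the inputs."""
    ra_term = pm_ra * math.cos(math.radians(dec_deg)) if apply_cos_dec else pm_ra
    return math.hypot(ra_term, pm_dec)


# Hypothetical star at dec = 60 deg.
pm_with_cos = total_pm(100.0, 20.0, dec_deg=60.0, apply_cos_dec=True)
pm_without_cos = total_pm(100.0, 20.0, dec_deg=60.0, apply_cos_dec=False)

# Omitting cos(dec) can only overestimate the total proper motion.
overestimates = pm_without_cos >= pm_with_cos
```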

How to determine pmatch

This can be done in an iterative process.

First Iteration

The first iteration was the indirect AGASC-GSC-Tycho-Gaia cross-match. This produced a set of close matches, the vast majority of which are clear.

Second Iteration

The second iteration was the result of looking at magnitude outliers (which prompted us to do the direct cross-match). We used a reasonable guess for the probability distribution (cross_match.agasc_gaia_match_probability_prelim). This used a Gaussian distribution in both magnitude and position.
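A score that is Gaussian in both magnitude and position could look like the sketch below. It is not the code in cross_match.agasc_gaia_match_probability_prelim: the function names and the widths sigma_pos and sigma_mag are assumptions chosen only for illustration.

```python
import math


def gaussian_match_score(sep_arcsec, d_mag, sigma_pos=0.5, sigma_mag=0.3):
    """Unnormalized match score, Gaussian in separation and in magnitude difference.

    sigma_pos (arcsec) and sigma_mag (mag) are hypothetical widths.
    """
    pos_term = math.exp(-0.5 * (sep_arcsec / sigma_pos) ** 2)
    mag_term = math.exp(-0.5 * (d_mag / sigma_mag) ** 2)
    return pos_term * mag_term


def best_candidate(candidates):
    """Pick the (sep_arcsec, d_mag, label) candidate with the highest score."""
    return max(candidates, key=lambda c: gaussian_match_score(c[0], c[1]))


# Hypothetical candidates for one AGASC star: (separation, magnitude difference, label).
candidates = [(0.2, 0.1, "near, similar mag"), (0.1, 1.5, "nearer, very different mag")]
chosen = best_candidate(candidates)
```

In this made-up case the slightly more distant candidate wins because its magnitude is much closer, which is the reason for including a magnitude term in the score.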

Third Iteration

The third iteration was the result of comparing the two cross-matching algorithms. The difficult cases were two stars in Gaia with a single AGASC star in between, which actually corresponds to a blend of the two Gaia stars. What to do in these cases was another question: Should we not update it or should we match it to one of the Gaia stars? The decision was to match it to one of the stars, and in this case we need a better estimate of probability. The matching probability in this case has fatter tails to include these.

NOTE: An alternative would have been to use a Gaussian distribution with a width given by a combination of the uncertainties in AGASC and Gaia. I think what we have is good enough.
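The two options can be compared with a small sketch. The combined width adds the AGASC and Gaia uncertainties in quadrature. The fat-tailed density below is a mixture of a narrow and a wide Gaussian, which is just one simple way of getting fatter tails and is not necessarily the form used in the package; all names and parameter values are assumptions.

```python
import math


def combined_sigma(sigma_agasc, sigma_gaia):
    """Width obtained by adding the two uncertainties in quadrature."""
    return math.hypot(sigma_agasc, sigma_gaia)


def gaussian_pdf(x, sigma):
    """Zero-mean Gaussian probability density."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def fat_tailed_pdf(x, sigma, wide_factor=5.0, wide_weight=0.1):
    """Mixture of a narrow and a wide Gaussian (hypothetical parameters)."""
    narrow = (1 - wide_weight) * gaussian_pdf(x, sigma)
    wide = wide_weight * gaussian_pdf(x, wide_factor * sigma)
    return narrow + wide


sigma = combined_sigma(0.3, 0.4)  # hypothetical uncertainties, in arcsec

# Far from the center, the mixture assigns much more probability than the Gaussian.
tail_ratio = fat_tailed_pdf(3.0, sigma) / gaussian_pdf(3.0, sigma)
```

An AGASC star that is a blend of two Gaia stars sits farther from either of them than the Gaussian would allow, so the extra weight in the tails keeps such a match from being assigned a negligible probability.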

How to determine the cut on pmatch

To determine the cut on pmatch we calculated the p-value of each match pair (see notes/04-Understanding-p-value.ipynb). Roughly speaking, the p-value is the probability that we find a star with a more extreme pmatch, assuming the match probability distribution is correct.

If the match probability used to define p-value describes reality correctly, the distribution of p-value should be a uniform distribution in (0,1).
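This uniformity can be checked with a small simulation. The sketch below is not taken from the notebook. It assumes a match model that depends only on the separation, with a 2D isotropic Gaussian offset of hypothetical width sigma_model, so that a "more extreme" match is simply one with a larger separation. Pairs drawn from the model give uniform p-values, while pairs drawn from a wider distribution pile up near 0.

```python
import math
import random


def p_value(sep, sigma):
    """P(separation > sep) for a 2D isotropic Gaussian offset of width sigma."""
    return math.exp(-0.5 * (sep / sigma) ** 2)


def simulate_separations(n, sigma, rng):
    """Separations of offsets drawn from a 2D isotropic Gaussian."""
    return [math.hypot(rng.gauss(0, sigma), rng.gauss(0, sigma)) for _ in range(n)]


def fraction_below(values, cut):
    """Fraction of values strictly below the cut."""
    return sum(v < cut for v in values) / len(values)


rng = random.Random(0)
sigma_model = 0.5  # hypothetical model width, arcsec

# Pairs that follow the model, and pairs that are five times more scattered.
good = [p_value(s, sigma_model) for s in simulate_separations(20000, sigma_model, rng)]
bad = [p_value(s, sigma_model) for s in simulate_separations(20000, 5 * sigma_model, rng)]

frac_good = fraction_below(good, 0.1)  # close to 0.1 for a uniform distribution
frac_bad = fraction_below(bad, 0.1)    # much larger: these pairs cluster near 0
```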

We chose the cut value to exclude a spike that occurs around 0. This spike can be caused by:

Difficult Stars

"Difficult" stars are AGASC stars that can be matched to the same Gaia star as other AGASC star(s). Each of these stars is grouped with all the stars it could be confused with. The matches in each group are recomputed to guarantee that no AGASC or Gaia ID is repeated. The process is to:

  1. Select the match with the highest probability.
  2. Remove the corresponding AGASC and Gaia IDs from the candidate matches.
  3. Repeat until there are no candidate matches left.
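The three steps above amount to a greedy assignment, which can be sketched as follows. The function name, the tuple layout and the IDs are made up for illustration and are not taken from agasc_gaia.cross_match.

```python
def resolve_group(candidates):
    """Greedy assignment within one group of difficult stars.

    candidates: iterable of (agasc_id, gaia_id, p_match) tuples.
    Returns the selected matches, with no AGASC or Gaia ID repeated.
    """
    remaining = sorted(candidates, key=lambda c: c[2], reverse=True)
    matches = []
    while remaining:
        # 1. Select the match with the highest probability.
        agasc_id, gaia_id, p_match = remaining[0]
        matches.append((agasc_id, gaia_id, p_match))
        # 2. Remove the corresponding AGASC and Gaia IDs from the candidates.
        remaining = [c for c in remaining if c[0] != agasc_id and c[1] != gaia_id]
        # 3. The loop repeats until no candidate matches are left.
    return matches


# Hypothetical group: three AGASC stars competing for two Gaia stars.
group = [(1, "A", 0.9), (2, "A", 0.8), (2, "B", 0.3), (3, "B", 0.2), (1, "B", 0.1)]
matches = resolve_group(group)
```

In this example AGASC 1 takes Gaia "A", AGASC 2 falls back to Gaia "B", and AGASC 3 is left without a match inside the group.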

For each of these groups, we defined "latest_pos_cat" as the POS_CATID value with the highest precedence from the AGASC entries in the group. The precedence is, in decreasing order: 5, 6, 4, 3, 2, 1. The catalog precedence IS NOT considered when recomputing the matches.

The entries with POS_CATID different than latest_pos_cat could be considered duplicates, although this is not guaranteed to be the case.
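The precedence rule can be written down in a few lines. This is a sketch with hypothetical function names and made-up AGASC IDs; only the precedence order is taken from the text.

```python
# POS_CATID values, highest precedence first.
POS_CATID_PRECEDENCE = [5, 6, 4, 3, 2, 1]


def latest_pos_cat(pos_catids):
    """POS_CATID with the highest precedence among the entries of a group."""
    return min(pos_catids, key=POS_CATID_PRECEDENCE.index)


def possible_duplicates(group):
    """AGASC IDs whose POS_CATID differs from latest_pos_cat.

    group: dict mapping AGASC ID to POS_CATID.
    """
    latest = latest_pos_cat(group.values())
    return sorted(agasc_id for agasc_id, cat in group.items() if cat != latest)


# Hypothetical group of three AGASC entries.
group = {10: 5, 11: 6, 12: 5}
flagged = possible_duplicates(group)
```

Here entry 11 is flagged because its POS_CATID (6) ranks below the group's latest_pos_cat (5).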

For example, AGASC 102499594 and 102499593 are two stars in Tycho2 that form a binary system (?). AGASC 102499594 is a star in GSC2.3 that lies within 0.5 arcsec from them, and has a magnitude consistent with it being a blend of the other two. Based on the catalog precedence mentioned above, 102499594 appears to be a duplicate. Gaia IDs 3151414218873077760 and 3151414218874713600 are two resolved stars in Gaia that are matched to AGASC 102499594 and 102499593 respectively. AGASC 102499594 is matched to the nearest star other than these, which happens to be 13 arcsec away, and this match is marked as background based on p-value.

ACA Magnitude Model

The ACA magnitude model is fit in agasc_gaia.gaia_model.get_gaia_model. The model is implemented as two classes:

The reason for separating the missing-value-filling and the magnitude model is that they can be used on different datasets: To estimate the missing Gaia values, one does not need the star to be observed by ACA. We first fit the missing value filler, using the training sample from the minimum bias Gaia dataset. Then we fit the magnitude model using the observed stars training set.

The ad-hoc model

A preliminary model selection was done in notes/05-agasc-gaia-model-select-1.ipynb, where we compared several models, including simple linear models, a random forest and an ad-hoc model. The ad-hoc model was chosen.

The main features of the ad-hoc model are:

The instrument response factor is 1 for magnitudes below a given threshold, and is monotonically increasing above the threshold. It is implemented as a cubic spline interpolation between linear functions below/above the magnitude threshold in gaia_model.Broken.
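A function with this shape can be sketched as below. It is not the implementation in gaia_model.Broken: the threshold, slope and width of the transition are hypothetical, and the smooth segment is a cubic Hermite polynomial that matches the value and slope of the two lines at the ends of the transition region.

```python
def response_factor(mag, threshold=9.0, slope=0.1, half_width=0.5):
    """Factor equal to 1 for bright stars and increasing linearly for faint ones.

    The two lines (constant 1, and 1 + slope * (mag - threshold)) are joined
    smoothly over [threshold - half_width, threshold + half_width].
    All default values are hypothetical.
    """
    lo, hi = threshold - half_width, threshold + half_width
    if mag <= lo:
        return 1.0
    if mag >= hi:
        return 1.0 + slope * (mag - threshold)
    # Cubic Hermite segment: value and slope match the lines at lo and hi.
    h = hi - lo
    u = (mag - lo) / h
    p0, m0 = 1.0, 0.0                          # value and slope at lo
    p1, m1 = 1.0 + slope * half_width, slope   # value and slope at hi
    h00 = 2 * u**3 - 3 * u**2 + 1
    h10 = u**3 - 2 * u**2 + u
    h01 = -2 * u**3 + 3 * u**2
    h11 = u**3 - u**2
    return h00 * p0 + h10 * h * m0 + h01 * p1 + h11 * h * m1


# Evaluate on a grid of magnitudes spanning the transition.
grid = [7.0 + 0.01 * i for i in range(400)]
factors = [response_factor(m) for m in grid]
```

With these end conditions the segment stays between the two lines and never decreases, so the factor is monotonic across the whole range.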

The Simple Color Model

After some discussion, the magnitude model was simplified to guarantee scaling. That is, considering magACA as a function of Gaia magnitudes, the model should satisfy: \[\text{mag}_{ACA}(\text{mag}_{G}+\delta, \text{mag}_{Rp}+\delta, \text{mag}_{Bp}+\delta) = \text{mag}_{ACA}(\text{mag}_{G}, \text{mag}_{Rp}, \text{mag}_{Bp}) + \delta\].
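One family of models that satisfies this condition by construction is a Gaia magnitude plus a function of colors only, since colors do not change when all magnitudes are shifted by the same amount. The sketch below checks this numerically. The functional form and the coefficients are assumptions made for illustration and are not the fitted model.

```python
def simple_color_model(mag_g, mag_rp, mag_bp, coeffs=(0.1, -0.2, 0.05)):
    """Toy ACA magnitude: G magnitude plus a linear function of two colors.

    The coefficients are hypothetical.
    """
    c0, c1, c2 = coeffs
    g_rp = mag_g - mag_rp    # color, unchanged by a common shift
    bp_rp = mag_bp - mag_rp  # color, unchanged by a common shift
    return mag_g + c0 + c1 * g_rp + c2 * bp_rp


delta = 1.7
base = simple_color_model(10.0, 9.6, 10.5)
shifted = simple_color_model(10.0 + delta, 9.6 + delta, 10.5 + delta)

# The scaling condition: shifting all inputs by delta shifts the output by delta.
scaling_residual = shifted - base - delta
```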

This model:

A comparison between the ad-hoc model and the simple color model is in notes/05-agasc-gaia-model-select-2.ipynb.

Even though the ad-hoc model performs better for bright stars, the simple color model was chosen because it guarantees the scaling expected from physical principles. The largest deviation in ACA magnitude is \(\approx 0.05\) mag, which is sufficient for operational purposes.

Variance Model

To estimate the uncertainty in the magnitude model, we implemented a variance model with several additive contributions to the variance:

The final uncertainty is then given by \[ \sigma = \sqrt{\text{var}_{base} + \text{var}_{instrument} + \text{var}_{variable} + \text{var}_{missing}} \]
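In code, the combination is a sum of variances followed by a square root. The sketch below uses a hypothetical function name and made-up variance values; it does not show how each contribution is estimated.

```python
import math


def total_sigma(var_base, var_instrument=0.0, var_variable=0.0, var_missing=0.0):
    """Combine additive variance contributions into a single uncertainty."""
    return math.sqrt(var_base + var_instrument + var_variable + var_missing)


# Hypothetical star with only a base and an instrument contribution.
sigma = total_sigma(var_base=0.09, var_instrument=0.16)
```

Because the contributions add in variance, the largest term dominates: a contribution of 0.16 next to one of 0.09 gives an uncertainty of about 0.5, not 0.7.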

Outlier Analysis

In order to do outlier analysis, we produced a few reports. Unless otherwise stated, the reports were produced in the 06-cross_match_comparison_direct-indirect.ipynb notebook:

NOTE: A legend details the meaning of the various fields in the reports.

ASPQ1 Update

ASPQ1 is a short integer spoiler code used in star selection. It is an estimate, in units of 50 milliarcsec, of the worst centroid offset caused by any star within 80 arcsec. The values over a grid of brightness difference dm and radial separation dr are calculated once and stored in data/offset_lookup_1p8rc11.h5. The values for each star are found by interpolating the stored grid values. A comparison of offsets in 1p7 and 1p8 is in 07.1-agasc-update-aspq.ipynb.
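The lookup can be sketched as a bilinear interpolation followed by a maximum over neighbors. Everything specific here is an assumption: the grid values are made up rather than read from the HDF5 file, the grid is assumed to store offsets in arcsec, the interpolation is assumed to be bilinear, and the conversion to 50 milliarcsec units is assumed to round to the nearest integer.

```python
from bisect import bisect_right


def interp_grid(dm_grid, dr_grid, values, dm, dr):
    """Bilinear interpolation of values[i][j] defined at (dm_grid[i], dr_grid[j])."""
    def bracket(grid, x):
        x = min(max(x, grid[0]), grid[-1])  # clamp to the grid range
        i = min(max(bisect_right(grid, x) - 1, 0), len(grid) - 2)
        return i, (x - grid[i]) / (grid[i + 1] - grid[i])

    i, t = bracket(dm_grid, dm)
    j, s = bracket(dr_grid, dr)
    return ((1 - t) * (1 - s) * values[i][j] + t * (1 - s) * values[i + 1][j]
            + (1 - t) * s * values[i][j + 1] + t * s * values[i + 1][j + 1])


def aspq1_from_offset(offset_arcsec):
    """Offset expressed as an integer number of 50 milliarcsec units (rounding assumed)."""
    return int(round(offset_arcsec / 0.05))


def worst_aspq1(neighbors, dm_grid, dr_grid, values, max_radius=80.0):
    """Worst-case code over neighbors given as (dm, dr) pairs, dr in arcsec."""
    offsets = [interp_grid(dm_grid, dr_grid, values, dm, dr)
               for dm, dr in neighbors if dr <= max_radius]
    return aspq1_from_offset(max(offsets, default=0.0))


# Hypothetical grid: offsets (arcsec) versus magnitude difference and separation.
dm_grid = [0.0, 2.0, 4.0]
dr_grid = [0.0, 40.0, 80.0]
values = [[0.4, 0.2, 0.0],
          [0.2, 0.1, 0.0],
          [0.1, 0.05, 0.0]]

# Three neighbors; the last one is beyond 80 arcsec and is ignored.
code = worst_aspq1([(2.0, 40.0), (0.0, 0.0), (0.0, 200.0)], dm_grid, dr_grid, values)
```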

AGASC Update

Only stars with CLASS 0, 2 or 6 are updated. These correspond to a star, a blend or member of an incorrectly resolved blend, and a known multiple system, respectively.

Based on this cross-match, the AGASC catalog will be updated in the following fields. The fields are updated only for the stars that have updated magnitude, unless otherwise stated: