Last modified: 15 March 2024

URL: https://cxc.cfa.harvard.edu/csc/cli/index.html

Accessing the CSC 2.1 data from the command-line


Introduction

The primary means of accessing CSC 2.1 data - for both columns of data and the data products, such as event files; source images; light curves; or spectra - is the CSCview application. There are several alternatives, available to command-line users or scripts, which are described on the Virtual Observatory Guide page.

The Astronomical Data Query Language (ADQL) is a database-query language similar to SQL that is designed for Astronomical Database access. The ADQL and CSC 2.1 page provides more information on the queries that can be written, using ADQL, to retrieve information from the CSC.

The examples below often show how the queries can be made using both the cURL and Wget tools, although occasionally only one will be used.


ADQL Queries

The following URL

http://cda.cfa.harvard.edu/csccli/getProperties

is used, along with a number of parameters that are described below, to query the CSC 2.1 archive. A query is written in ADQL which can return data from all the source properties, or perform simple calculations such as the number of rows that match a condition, and the data can be returned as an ASCII file in either tab-separated (TSV) or XML format (using the IVOA VOTable Standard). The Examples section provides several basic queries using this service.

The following table lists the parameters available to query the CSC, and is followed by a more-detailed description for each parameter. A second table lists the additional parameters which are used when making a query involving a cross match against another catalog.

Note that parameter names are case sensitive, and are all given as name=value (that is, they all require a value). A successful call will return the results as formatted text, with the format defined by the value of the outputFormat parameter.

Query parameters

Query parameters
Parameter Required Summary
query Yes The query, in ADQL, as a single-lines string.
outputFormat No The format for the output (the default is TSV).
nullAppearance No How are missing values reported (the default is an empty string).
floatFormat No The format for numeric values (the default is to use truncate the output).
coordFormat No Are coordinates reported as sexagesimal (the default) or decimal values.
version No The default is the latest released version of the catalog, which is 2.1, and so does not need to be set unless you are querying an old version of the catalog or want to ensure you always use the 2.1 release.
query=...

The contents of the query which is written using ADQL syntax, which is described in the ADQL and CSC 2.1 page. Certain applications, such as curl, let the query be defined in a file.

outputFormat=...

The default format for the returned data is Tab Separated Values (TSV), using a CSC variant which also encodes metadata about the columns, such as units, description, and display type. The format can be changed to use VOTable TABLEDATA, a XML-based format defined by the IVOA. The format can be changed with the outputFormat parameter, which accepts one of: 'tsv', 'vot', or 'votable'.

nullAppearance=...

The database contains columns which do not have a value for every row (e.g. the flux_aper_b value is not defined for a HRC-I only observation, and is not defined for all sources). The default return for such values is the empty string, but this can be changed with the nullAppearance option.

floatFormat=...

The options for floating-point values are 'default' or 'native'. The native setting uses contains more significant digits, whereas the default setting may truncate the values.

coordFormat=...

The default format used for celestial coordinates (the RA and Declination colunms in a table) is sexagesimal. The options are 'sexagesimal' and 'decimal'.

version=...

The query will use the latest released version of the catalog, so 2.1, unless the version is explicitly given. The available options are the releases 2.1 (rel2.1), 2.0 (rel2.0), 1.1 (rel1.1), and 1.0 (rel1.0.1), or cur to get the "current database" (which is not recommended at this time).

Query parameters for a cross match

The remaining parameters are only for queries that include a cross match against another catalog.

Additional query parameters for a cross match
Parameter Required Summary
UPLOAD Yes The location of the catalog to cross match against the CSC.
RA Yes The source positions (Right Ascension).
DEC Yes The source positions (Declination).
ID No The column name used to identify the cross-match targets.
SR No The search radius (defaults to 3.0).
SR_UNITS No The units for SR (defaults to arcmin).
SIGMA No The uncertainty in the position.
SIGMA_UNITS No The units for SIGMA (defaults to arcsec).
UPLOAD=...

Name,URI (Uniform Resource Identifier) pair pointing to user table. Name is required to be 'user_table'. URI maybe an external URL accessible by the CSC web server or an special URI scheme, 'param:', indicating the table is inline content within the HTTP POST request. Note that TSV and VOTable formats are supported for crossmatch.

The special URI scheme 'param:' is used to upload a user table inline. This scheme indicates that the value after the colon will be the name of the inline content. The content type used is multipart/form-data, using a 'file' type input element. The file 'name' attribute must match that used in the UPLOAD parameter.

RA=...

The column of Right Ascension source position coordinates in decimal degree units.

DEC=...

The column of Declination source position coordinates in decimal degree units.

ID=...

The cross match identification column name in the uploaded user table. If not specified, the row number is used as the user ID. The column can be either an integer value or a string.

SR=...

The search radius in units set by the SR_UNITS parameter, either a single value or name of a column containing multiple values, to apply to each source in the input list of source positions. Defaults to 3.0.

SR_UNITS=...

The units for the SR parameter, which defaults to 'arcmin'. The allowed values are: 'deg', 'arcmin', or 'arcsec'.

SIGMA=...

The 1-sigma radial Gaussian position in SIGMA_UNITS. If specified, a two-sided probability of position match can be calculated. If not specified, a one sided probability is used. Accepts a single value or name of a column containing multiple values, to apply to each source in the input list of source positions.

SIGMA_UNITS=...

The units for the SIGMA parameter, which defaults to 'arcsec'. The allowed values are: 'arcsec' or 'mas' (milli-arcseconds).

When a user supplies a table for crossmatch via the UPLOAD parameter, an association table - called cross_match is created for use in the ADQL query, and relates the user table to the CSC master_source table.

The user table can contain arbitrary columns, and the RA, DEC, and ID parameters are used to define the relevant column names in this table.

The cross_match table contains columns named:

Column name Column description
usrid A key copied from the user_table column identified by the ID parameter
msid A key to join on the master_source.msid column.
separation A floating point column representing the separation of the two sources in arcseconds.
probability A floating point column representing the probability that the two sources are the same.

Data product queries

The CSC data products - such as spectra, light curves, and images - may be accessed at the command line, but only after interactively querying the catalog within CSCview. After submitting a query in the GUI and selecting the desired data products, users are brought to the Products tab where they are given the option of downloading the selected data products via a download script or tar file. The download script contains a list of Wget commands, one for each file, which can be executed on the Unix command line for a batch download. See the CSC thread Retrieving Data Products to learn how to access catalog data products.


ADQL Examples

The ADQL and CSC 2.1 page should be referred to for more information on writing ADQL queries.

[NOTE]
Differences with the IVOA TAP service

Although the IVOA TAP service for CSC also accepts ADQL queries there is subtle difference: the getProperties queries (shown below) do not need to say where the columns are from, whereas users of the TAP service do. That is, getProperties would say

FROM master_source m

whereas the TAP query has to say

FROM csc21.master_source m

Perform a basic source-property query

This example shows how source-property columns can be selected and filtered. The query used in this example returns the the master source name, significance, broad-band aperture energy flux, and power-law model best-fit photon index for the first 50 master sources found with a significance greater than 10.0, a hard-to-soft hardness ratio greater than 0.7, and a pileup fraction less than 10% (by checking that m.pileup_flag is 0):

ADQL
SELECT TOP 50 m.name, m.significance, m.flux_aper_b, m.powlaw_gamma
FROM master_source m
WHERE (m.significance > 10.0 AND m.pileup_flag = 0 AND m.hard_hs > 0.7)

The following calls will return the data in VOTable format (which can be read by the TOPCAT application) and store it in the file out.file:

curl
unix% curl -o out.file \
  --form outputFormat=votable \
  --form query='SELECT TOP 50 m.name, m.significance, m.flux_aper_b, m.powlaw_gamma FROM master_source m WHERE (m.significance > 10.0 AND m.pileup_flag = 0 AND m.hard_hs > 0.7)' \
  http://cda.cfa.harvard.edu/csccli/getProperties
wget
unix% wget -O out.file \
  'http://cda.cfa.harvard.edu/csccli/getProperties?query=SELECT TOP 50 m.name, m.significance, m.flux_aper_b, m.powlaw_gamma FROM master_source m WHERE (m.significance > 10.0 AND m.pileup_flag = 0 AND m.hard_hs > 0.7)&outputFormat=votable' 

Perform a basic cone-search query

Queries can also perform a cone search around one or more locations. This query accesses from the catalog the master source name, R.A and Dec., and broad-band aperture energy flux for all sources located within 10 arcminutes of the location with Right Ascension 83.77333 degrees and Declination -5.68464 degrees (which is just South of the center of the Orion Nebula cluster).

ADQL
SELECT m.name, m.ra, m.dec, m.flux_aper_b
FROM master_source m
WHERE dbo.cone_distance(m.ra, m.dec, 83.77333, -5.68464) <= 10
curl
unix% curl -o out.file \
  --form query='SELECT m.name, m.ra, m.dec, m.flux_aper_b FROM master_source m WHERE dbo.cone_distance(m.ra,m.dec,83.77333,-5.68464)<=10' \
  http://cda.cfa.harvard.edu/csccli/getProperties
wget
unix% wget -O out.file \
  'http://cda.cfa.harvard.edu/csccli/getProperties?query=SELECT m.name, m.ra, m.dec, m.flux_aper_b FROM master_source m WHERE dbo.cone_distance(m.ra,m.dec,83.77333,-5.68464)<=10' 

Add extra columns and a restriction to a cone-search query

This is similar to the basic cone-search query but adds extra columns, uses a slightly different location, and also requires that the sources have a minimum significance. This example is also used in the Change the representatuion of missing values example below.

ADQL
SELECT m.name, m.flux_aper_b, m.flux_aper_w, m.acis_num, m.hrc_num
FROM master_source m
WHERE (dbo.cone_distance(m.ra, m.dec, 83.76562, -5.48538) <= 20 AND m.significance > 3)
curl
unix% curl -o out.file \
  --form query='SELECT m.name,m.flux_aper_b,m.flux_aper_w,m.acis_num,m.hrc_num FROM master_source m WHERE (dbo.cone_distance(m.ra,m.dec,83.76562,-5.48538)<20 AND m.significance > 3)' \
  http://cda.cfa.harvard.edu/csccli/getProperties
wget
unix% wget -O out.file \
  'http://cda.cfa.harvard.edu/csccli/getProperties?query=SELECT m.name,m.flux_aper_b,m.flux_aper_w,m.acis_num,m.hrc_num FROM master_source m WHERE (dbo.cone_distance(m.ra,m.dec,83.76562,-5.48538)<20 AND m.significance > 3)' 

Upload a query from a file

The following query retrieves the first 10 sources, where the sources are ordered by name, from the master source table, returning the following fields: name, location, major axis of the error ellipse, flags determining whether the source is affected by confusion or saturation, the significance, and the broad- and wide-band aperture fluxes (along with lower and upper limits). Note that a master source has either a broad- or wide-band aperture flux, not both, even if it was detected in both ACIS and HRC observations.

ADQL
SELECT DISTINCT top 10
m.name, m.ra, m.dec, m.err_ellipse_r0,
m.conf_flag, m.sat_src_flag, m.significance,
m.flux_aper_b, m.flux_aper_lolim_b, m.flux_aper_hilim_b,
m.flux_aper_w, m.flux_aper_lolim_w, m.flux_aper_hilim_w
FROM master_source m
ORDER BY name ASC

The file should contain a query written in ADQL 2.0, e.g. a query save file output by CSCview. The query above is available as cscquery.adql.

[NOTE]
Note

The query must be written as a single line - so no new-line or carriage-return characters - for the call to succeed.

curl
unix% curl -o out.file \
  --form query=@cscquery.adql \
  http://cda.cfa.harvard.edu/csccli/getProperties

Change the representatuion of missing values

This repeats the query from the Add extra columns and a restriction to a cone-search query but add the nullAppearance=NULL setting, which causes missing values to be written out as NULL instead of an empty field:

ADQL
SELECT m.name, m.flux_aper_b, m.flux_aper_w, m.acis_num, m.hrc_num
FROM master_source m
WHERE (dbo.cone_distance(m.ra, m.dec, 83.76562, -5.48538) <= 20 AND m.significance > 3)
curl
unix% curl -o out.file \
  --form nullAppearance=NULL \
  --form query='SELECT m.name,m.flux_aper_b,m.flux_aper_w,m.acis_num,m.hrc_num FROM master_source m WHERE (dbo.cone_distance(m.ra,m.dec,83.76562,-5.48538)<20 AND m.significance > 3)' \
  http://cda.cfa.harvard.edu/csccli/getProperties
wget
unix% wget -O out.file \
  'http://cda.cfa.harvard.edu/csccli/getProperties?nullAppearance=NULL&query=SELECT m.name,m.flux_aper_b,m.flux_aper_w,m.acis_num,m.hrc_num FROM master_source m WHERE (dbo.cone_distance(m.ra,m.dec,83.76562,-5.48538)<20 AND m.significance > 3)' 

Change the representation of coordinates

The default format for positions is sexagesimal, but this can be changed. The following query return the master source name, Right Ascension, and Declination for the first ten sources with a master source significance greater than 10.0, a hard-to-soft hardness ratio greater than 0.7, and a pileup fraction less than ~10% (by checking for pileup_flag being set to 0).

ADQL
SELECT TOP 10 m.name, m.ra, m.dec
FROM master_source m
WHERE (m.significance > 10.0 AND m.pileup_flag = 0 AND m.hard_hs > 0.7)

By changing the coordFormat parameter, the location values are returned as decimal degrees.

curl
unix% curl -o out.file \
  --form coordFormat=decimal \
  --form 'query=SELECT TOP 10 m.name, m.ra, m.dec FROM master_source m WHERE (m.significance > 10.0 AND m.pileup_flag = 0 AND m.hard_hs > 0.7)' \
  http://cda.cfa.harvard.edu/csccli/getProperties
wget
unix% wget -O out.file \
  'http://cda.cfa.harvard.edu/csccli/getProperties?query=SELECT TOP 10 m.name, m.ra, m.dec FROM master_source m WHERE (m.significance > 10.0 AND m.pileup_flag = 0 AND m.hard_hs > 0.7)&coordFormat=decimal'

Change the precision of numeric values

This query accesses from the catalog many observation-specific properties for the first 10 sources found in the Source Observations Table, with source property values output in native floating point format (the "native" catalog format includes numbers with many significant digits, exactly as they are output by the catalog processing pipeline). The SELECT DISTINCT option is used here because otherwise there would be repeated rows.

ADQL
SELECT DISTINCT top 10
o.obsid, o.obi, o.targname,
o.ra_targ, o.dec_targ, o.ra_pnt, o.dec_pnt, o.roll_pnt,
o.ra_nom, o.dec_nom, o.roll_nom,
o.gti_start, o.gti_stop, o.gti_elapse, o.gti_end,
o.gti_obs, o.gti_mjd_obs, o.mjd_ref,
o.instrument, o.grating, o.datamode, o.readmode, o.cycle,
o.exptime, o.timing_mode, o.ascdsver, o.caldbver, o.crdate, o.ao 
FROM observation_source o
curl
unix% curl -o out.file \
  --form floatFormat=native \
  --form 'query=SELECT DISTINCT top 10 o.obsid,o.obi,o.targname,o.ra_targ,o.dec_targ,o.ra_pnt,o.dec_pnt,o.roll_pnt,o.ra_nom,o.dec_nom,o.roll_nom,o.gti_start,o.gti_stop,o.gti_elapse,o.gti_end,o.gti_obs,o.gti_mjd_obs,o.mjd_ref,o.instrument,o.grating,o.datamode,o.readmode,o.cycle,o.exptime,o.timing_mode,o.ascdsver,o.caldbver,o.crdate,o.ao FROM observation_source o' \
  http://cda.cfa.harvard.edu/csccli/getProperties
wget
unix% wget -O out.file \
  'http://cda.cfa.harvard.edu/csccli/getProperties?query=SELECT DISTINCT top 10 o.obsid,o.obi,o.targname,o.ra_targ,o.dec_targ,o.ra_pnt,o.dec_pnt,o.roll_pnt,o.ra_nom,o.dec_nom,o.roll_nom,o.gti_start,o.gti_stop,o.gti_elapse,o.gti_end,o.gti_obs,o.gti_mjd_obs,o.mjd_ref,o.instrument,o.grating,o.datamode,o.readmode,o.cycle,o.exptime,o.timing_mode,o.ascdsver,o.caldbver,o.crdate,o.ao FROM observation_source o&floatFormat=native'

Cross-match the CSC with another catalog

This query accesses from the catalog the separation in arcseconds between the matched catalog master sources and the corresponding sources in a user-input table, along with the names and RA & Dec. positions associated with the catalog sources in the CSC Master Sources Table. The probability value of each source match is also returned, where a value of 1.0 indicates that the returned match is very likely a true match (down to many siginificant digits in source position), and a value of 0.0 represents a very unlikely true match. The my_catalog_id column values in the input table is returned in the c.usrid output column for the matching sources.

ADQL
SELECT
m.name, m.ra, m.dec,
c.usrid, c.separation, c.probability
from cross_match c, master_source m
where c.msid = m.msid
curl
unix% curl \
  --request POST \
  --form 'query=select m.name,m.ra,m.dec,c.usrid,c.separation,c.probability from cross_match c, master_source m where c.msid = m.msid' \
  --form RA=ra \
  --form DEC=dec \
  --form ID=my_catalog_id \
  --form UPLOAD=user_table,param:my_catalog \
  --form my_catalog="@/data/somewhere/table.xml" \
  http://cda.cfa.harvard.edu/csccli/getProperties