Accessing the CSC 2.1 data from the command-line
Introduction
The primary means of accessing CSC 2.1 data - for both columns of data and the data products, such as event files; source images; light curves; or spectra - is the CSCview application. There are several alternatives, available to command-line users or scripts, which are described on the Virtual Observatory Guide page.
The Astronomical Data Query Language (ADQL) is a database-query language similar to SQL that is designed for Astronomical Database access. The ADQL and CSC 2.1 page provides more information on the queries that can be written, using ADQL, to retrieve information from the CSC.
The examples below often show how the queries can be made using both the cURL and Wget tools, although occasionally only one will be used.
ADQL Queries
The following URL
http://cda.cfa.harvard.edu/csccli/getProperties
is used, along with a number of parameters that are described below, to query the CSC 2.1 archive. A query is written in ADQL which can return data from all the source properties, or perform simple calculations such as the number of rows that match a condition, and the data can be returned as an ASCII file in either tab-separated (TSV) or XML format (using the IVOA VOTable Standard). The Examples section provides several basic queries using this service.
The following table lists the parameters available to query the CSC, and is followed by a more-detailed description for each parameter. A second table lists the additional parameters which are used when making a query involving a cross match against another catalog.
Note that parameter names are case sensitive, and are all given as name=value (that is, they all require a value). A successful call will return the results as formatted text, with the format defined by the value of the outputFormat parameter.
Query parameters
Parameter | Required | Summary |
---|---|---|
query | Yes | The query, in ADQL, as a single-lines string. |
outputFormat | No | The format for the output (the default is TSV). |
nullAppearance | No | How are missing values reported (the default is an empty string). |
floatFormat | No | The format for numeric values (the default is to use truncate the output). |
coordFormat | No | Are coordinates reported as sexagesimal (the default) or decimal values. |
version | No | The default is the latest released version of the catalog, which is 2.1, and so does not need to be set unless you are querying an old version of the catalog or want to ensure you always use the 2.1 release. |
- query=...
-
The contents of the query which is written using ADQL syntax, which is described in the ADQL and CSC 2.1 page. Certain applications, such as curl, let the query be defined in a file.
- outputFormat=...
-
The default format for the returned data is Tab Separated Values (TSV), using a CSC variant which also encodes metadata about the columns, such as units, description, and display type. The format can be changed to use VOTable TABLEDATA, a XML-based format defined by the IVOA. The format can be changed with the outputFormat parameter, which accepts one of: 'tsv', 'vot', or 'votable'.
- nullAppearance=...
-
The database contains columns which do not have a value for every row (e.g. the flux_aper_b value is not defined for a HRC-I only observation, and is not defined for all sources). The default return for such values is the empty string, but this can be changed with the nullAppearance option.
- floatFormat=...
-
The options for floating-point values are 'default' or 'native'. The native setting uses contains more significant digits, whereas the default setting may truncate the values.
- coordFormat=...
-
The default format used for celestial coordinates (the RA and Declination colunms in a table) is sexagesimal. The options are 'sexagesimal' and 'decimal'.
- version=...
-
The query will use the latest released version of the catalog, so 2.1, unless the version is explicitly given. The available options are the releases 2.1 (rel2.1), 2.0 (rel2.0), 1.1 (rel1.1), and 1.0 (rel1.0.1), or cur to get the "current database" (which is not recommended at this time).
Query parameters for a cross match
The remaining parameters are only for queries that include a cross match against another catalog.
Parameter | Required | Summary |
---|---|---|
UPLOAD | Yes | The location of the catalog to cross match against the CSC. |
RA | Yes | The source positions (Right Ascension). |
DEC | Yes | The source positions (Declination). |
ID | No | The column name used to identify the cross-match targets. |
SR | No | The search radius (defaults to 3.0). |
SR_UNITS | No | The units for SR (defaults to arcmin). |
SIGMA | No | The uncertainty in the position. |
SIGMA_UNITS | No | The units for SIGMA (defaults to arcsec). |
- UPLOAD=...
-
Name,URI (Uniform Resource Identifier) pair pointing to user table. Name is required to be 'user_table'. URI maybe an external URL accessible by the CSC web server or an special URI scheme, 'param:', indicating the table is inline content within the HTTP POST request. Note that TSV and VOTable formats are supported for crossmatch.
The special URI scheme 'param:' is used to upload a user table inline. This scheme indicates that the value after the colon will be the name of the inline content. The content type used is multipart/form-data, using a 'file' type input element. The file 'name' attribute must match that used in the UPLOAD parameter.
- RA=...
-
The column of Right Ascension source position coordinates in decimal degree units.
- DEC=...
-
The column of Declination source position coordinates in decimal degree units.
- ID=...
-
The cross match identification column name in the uploaded user table. If not specified, the row number is used as the user ID. The column can be either an integer value or a string.
- SR=...
-
The search radius in units set by the SR_UNITS parameter, either a single value or name of a column containing multiple values, to apply to each source in the input list of source positions. Defaults to 3.0.
- SR_UNITS=...
-
The units for the SR parameter, which defaults to 'arcmin'. The allowed values are: 'deg', 'arcmin', or 'arcsec'.
- SIGMA=...
-
The 1-sigma radial Gaussian position in SIGMA_UNITS. If specified, a two-sided probability of position match can be calculated. If not specified, a one sided probability is used. Accepts a single value or name of a column containing multiple values, to apply to each source in the input list of source positions.
- SIGMA_UNITS=...
-
The units for the SIGMA parameter, which defaults to 'arcsec'. The allowed values are: 'arcsec' or 'mas' (milli-arcseconds).
When a user supplies a table for crossmatch via the UPLOAD parameter, an association table - called cross_match is created for use in the ADQL query, and relates the user table to the CSC master_source table.
The user table can contain arbitrary columns, and the RA, DEC, and ID parameters are used to define the relevant column names in this table.
The cross_match table contains columns named:
Column name | Column description |
---|---|
usrid | A key copied from the user_table column identified by the ID parameter |
msid | A key to join on the master_source.msid column. |
separation | A floating point column representing the separation of the two sources in arcseconds. |
probability | A floating point column representing the probability that the two sources are the same. |
Data product queries
The CSC data products - such as spectra, light curves, and images - may be accessed at the command line, but only after interactively querying the catalog within CSCview. After submitting a query in the GUI and selecting the desired data products, users are brought to the Products tab where they are given the option of downloading the selected data products via a download script or tar file. The download script contains a list of Wget commands, one for each file, which can be executed on the Unix command line for a batch download. See the CSC thread Retrieving Data Products to learn how to access catalog data products.
ADQL Examples
The ADQL and CSC 2.1 page should be referred to for more information on writing ADQL queries.
Although the IVOA TAP service for CSC also accepts ADQL queries there is subtle difference: the getProperties queries (shown below) do not need to say where the columns are from, whereas users of the TAP service do. That is, getProperties would say
FROM master_source m
whereas the TAP query has to say
FROM csc21.master_source m
Perform a basic source-property query
This example shows how source-property columns can be selected and filtered. The query used in this example returns the the master source name, significance, broad-band aperture energy flux, and power-law model best-fit photon index for the first 50 master sources found with a significance greater than 10.0, a hard-to-soft hardness ratio greater than 0.7, and a pileup fraction less than 10% (by checking that m.pileup_flag is 0):
SELECT TOP 50 m.name, m.significance, m.flux_aper_b, m.powlaw_gamma FROM master_source m WHERE (m.significance > 10.0 AND m.pileup_flag = 0 AND m.hard_hs > 0.7)
The following calls will return the data in VOTable format (which can be read by the TOPCAT application) and store it in the file out.file:
unix% curl -o out.file \ --form outputFormat=votable \ --form query='SELECT TOP 50 m.name, m.significance, m.flux_aper_b, m.powlaw_gamma FROM master_source m WHERE (m.significance > 10.0 AND m.pileup_flag = 0 AND m.hard_hs > 0.7)' \ http://cda.cfa.harvard.edu/csccli/getProperties
unix% wget -O out.file \ 'http://cda.cfa.harvard.edu/csccli/getProperties?query=SELECT TOP 50 m.name, m.significance, m.flux_aper_b, m.powlaw_gamma FROM master_source m WHERE (m.significance > 10.0 AND m.pileup_flag = 0 AND m.hard_hs > 0.7)&outputFormat=votable'
Perform a basic cone-search query
Queries can also perform a cone search around one or more locations. This query accesses from the catalog the master source name, R.A and Dec., and broad-band aperture energy flux for all sources located within 10 arcminutes of the location with Right Ascension 83.77333 degrees and Declination -5.68464 degrees (which is just South of the center of the Orion Nebula cluster).
SELECT m.name, m.ra, m.dec, m.flux_aper_b FROM master_source m WHERE dbo.cone_distance(m.ra, m.dec, 83.77333, -5.68464) <= 10
unix% curl -o out.file \ --form query='SELECT m.name, m.ra, m.dec, m.flux_aper_b FROM master_source m WHERE dbo.cone_distance(m.ra,m.dec,83.77333,-5.68464)<=10' \ http://cda.cfa.harvard.edu/csccli/getProperties
unix% wget -O out.file \ 'http://cda.cfa.harvard.edu/csccli/getProperties?query=SELECT m.name, m.ra, m.dec, m.flux_aper_b FROM master_source m WHERE dbo.cone_distance(m.ra,m.dec,83.77333,-5.68464)<=10'
Add extra columns and a restriction to a cone-search query
This is similar to the basic cone-search query but adds extra columns, uses a slightly different location, and also requires that the sources have a minimum significance. This example is also used in the Change the representatuion of missing values example below.
SELECT m.name, m.flux_aper_b, m.flux_aper_w, m.acis_num, m.hrc_num FROM master_source m WHERE (dbo.cone_distance(m.ra, m.dec, 83.76562, -5.48538) <= 20 AND m.significance > 3)
unix% curl -o out.file \ --form query='SELECT m.name,m.flux_aper_b,m.flux_aper_w,m.acis_num,m.hrc_num FROM master_source m WHERE (dbo.cone_distance(m.ra,m.dec,83.76562,-5.48538)<20 AND m.significance > 3)' \ http://cda.cfa.harvard.edu/csccli/getProperties
unix% wget -O out.file \ 'http://cda.cfa.harvard.edu/csccli/getProperties?query=SELECT m.name,m.flux_aper_b,m.flux_aper_w,m.acis_num,m.hrc_num FROM master_source m WHERE (dbo.cone_distance(m.ra,m.dec,83.76562,-5.48538)<20 AND m.significance > 3)'
Upload a query from a file
The following query retrieves the first 10 sources, where the sources are ordered by name, from the master source table, returning the following fields: name, location, major axis of the error ellipse, flags determining whether the source is affected by confusion or saturation, the significance, and the broad- and wide-band aperture fluxes (along with lower and upper limits). Note that a master source has either a broad- or wide-band aperture flux, not both, even if it was detected in both ACIS and HRC observations.
SELECT DISTINCT top 10 m.name, m.ra, m.dec, m.err_ellipse_r0, m.conf_flag, m.sat_src_flag, m.significance, m.flux_aper_b, m.flux_aper_lolim_b, m.flux_aper_hilim_b, m.flux_aper_w, m.flux_aper_lolim_w, m.flux_aper_hilim_w FROM master_source m ORDER BY name ASC
The file should contain a query written in ADQL 2.0, e.g. a query save file output by CSCview. The query above is available as cscquery.adql.
The query must be written as a single line - so no new-line or carriage-return characters - for the call to succeed.
unix% curl -o out.file \ --form query=@cscquery.adql \ http://cda.cfa.harvard.edu/csccli/getProperties
Change the representatuion of missing values
This repeats the query from the Add extra columns and a restriction to a cone-search query but add the nullAppearance=NULL setting, which causes missing values to be written out as NULL instead of an empty field:
SELECT m.name, m.flux_aper_b, m.flux_aper_w, m.acis_num, m.hrc_num FROM master_source m WHERE (dbo.cone_distance(m.ra, m.dec, 83.76562, -5.48538) <= 20 AND m.significance > 3)
unix% curl -o out.file \ --form nullAppearance=NULL \ --form query='SELECT m.name,m.flux_aper_b,m.flux_aper_w,m.acis_num,m.hrc_num FROM master_source m WHERE (dbo.cone_distance(m.ra,m.dec,83.76562,-5.48538)<20 AND m.significance > 3)' \ http://cda.cfa.harvard.edu/csccli/getProperties
unix% wget -O out.file \ 'http://cda.cfa.harvard.edu/csccli/getProperties?nullAppearance=NULL&query=SELECT m.name,m.flux_aper_b,m.flux_aper_w,m.acis_num,m.hrc_num FROM master_source m WHERE (dbo.cone_distance(m.ra,m.dec,83.76562,-5.48538)<20 AND m.significance > 3)'
Change the representation of coordinates
The default format for positions is sexagesimal, but this can be changed. The following query return the master source name, Right Ascension, and Declination for the first ten sources with a master source significance greater than 10.0, a hard-to-soft hardness ratio greater than 0.7, and a pileup fraction less than ~10% (by checking for pileup_flag being set to 0).
SELECT TOP 10 m.name, m.ra, m.dec FROM master_source m WHERE (m.significance > 10.0 AND m.pileup_flag = 0 AND m.hard_hs > 0.7)
By changing the coordFormat parameter, the location values are returned as decimal degrees.
unix% curl -o out.file \ --form coordFormat=decimal \ --form 'query=SELECT TOP 10 m.name, m.ra, m.dec FROM master_source m WHERE (m.significance > 10.0 AND m.pileup_flag = 0 AND m.hard_hs > 0.7)' \ http://cda.cfa.harvard.edu/csccli/getProperties
unix% wget -O out.file \ 'http://cda.cfa.harvard.edu/csccli/getProperties?query=SELECT TOP 10 m.name, m.ra, m.dec FROM master_source m WHERE (m.significance > 10.0 AND m.pileup_flag = 0 AND m.hard_hs > 0.7)&coordFormat=decimal'
Change the precision of numeric values
This query accesses from the catalog many observation-specific properties for the first 10 sources found in the Source Observations Table, with source property values output in native floating point format (the "native" catalog format includes numbers with many significant digits, exactly as they are output by the catalog processing pipeline). The SELECT DISTINCT option is used here because otherwise there would be repeated rows.
SELECT DISTINCT top 10 o.obsid, o.obi, o.targname, o.ra_targ, o.dec_targ, o.ra_pnt, o.dec_pnt, o.roll_pnt, o.ra_nom, o.dec_nom, o.roll_nom, o.gti_start, o.gti_stop, o.gti_elapse, o.gti_end, o.gti_obs, o.gti_mjd_obs, o.mjd_ref, o.instrument, o.grating, o.datamode, o.readmode, o.cycle, o.exptime, o.timing_mode, o.ascdsver, o.caldbver, o.crdate, o.ao FROM observation_source o
unix% curl -o out.file \ --form floatFormat=native \ --form 'query=SELECT DISTINCT top 10 o.obsid,o.obi,o.targname,o.ra_targ,o.dec_targ,o.ra_pnt,o.dec_pnt,o.roll_pnt,o.ra_nom,o.dec_nom,o.roll_nom,o.gti_start,o.gti_stop,o.gti_elapse,o.gti_end,o.gti_obs,o.gti_mjd_obs,o.mjd_ref,o.instrument,o.grating,o.datamode,o.readmode,o.cycle,o.exptime,o.timing_mode,o.ascdsver,o.caldbver,o.crdate,o.ao FROM observation_source o' \ http://cda.cfa.harvard.edu/csccli/getProperties
unix% wget -O out.file \ 'http://cda.cfa.harvard.edu/csccli/getProperties?query=SELECT DISTINCT top 10 o.obsid,o.obi,o.targname,o.ra_targ,o.dec_targ,o.ra_pnt,o.dec_pnt,o.roll_pnt,o.ra_nom,o.dec_nom,o.roll_nom,o.gti_start,o.gti_stop,o.gti_elapse,o.gti_end,o.gti_obs,o.gti_mjd_obs,o.mjd_ref,o.instrument,o.grating,o.datamode,o.readmode,o.cycle,o.exptime,o.timing_mode,o.ascdsver,o.caldbver,o.crdate,o.ao FROM observation_source o&floatFormat=native'
Cross-match the CSC with another catalog
This query accesses from the catalog the separation in arcseconds between the matched catalog master sources and the corresponding sources in a user-input table, along with the names and RA & Dec. positions associated with the catalog sources in the CSC Master Sources Table. The probability value of each source match is also returned, where a value of 1.0 indicates that the returned match is very likely a true match (down to many siginificant digits in source position), and a value of 0.0 represents a very unlikely true match. The my_catalog_id column values in the input table is returned in the c.usrid output column for the matching sources.
SELECT m.name, m.ra, m.dec, c.usrid, c.separation, c.probability from cross_match c, master_source m where c.msid = m.msid
unix% curl \ --request POST \ --form 'query=select m.name,m.ra,m.dec,c.usrid,c.separation,c.probability from cross_match c, master_source m where c.msid = m.msid' \ --form RA=ra \ --form DEC=dec \ --form ID=my_catalog_id \ --form UPLOAD=user_table,param:my_catalog \ --form my_catalog="@/data/somewhere/table.xml" \ http://cda.cfa.harvard.edu/csccli/getProperties