/ ciao4.15 / ahelp / dmtcalc.html

Jump to: Description · Examples · Parameters · Creating Vector Columns · Whitespace in Expressions · Bugs · See Also

AHELP for CIAO 4.15

dmtcalc

Context: Tools::Table

Synopsis

Modify and create columns in a table

Syntax

dmtcalc  infile outfile expression [clobber] [verbose]

Description

`dmtcalc' is a flexible general purpose table calculation utility. It can change existing columns add new ones using arbitrary expressions which can involve other columns - possibly smoothed - mathematical functions such as sqrt(), random numbers, values taken from the file header, and much more. It uses the same syntax as the dmgti and dmimgcalc tools: this syntax is described in detail in "ahelp dmmath".

The input expression can either be given directly on the command-line or stored in an external file and accessed as a stack - i.e. "@filename". The use of a stack allows more than one column to be created or modified at one time.

When dmtcalc starts, it will make an effort to assure that any data dependencies in the expressions (referenced columns or keywords, or new columns defined in previous expressions) are satisfied, that there are not any data type conflicts within the expressions, and that there are no conflicts of array size or dimensionality. If any of these tests fail, an error message will print, and the tool will exit.

Changing the value of an existing column

dmtcalc cannot be used to change the value of an existing column. Rather, users can create a new column with the same name as an existing column though they need to explicitly copy the values (if desired). For example

expression="DY=DY"

Will create a new column "DY" in the output file and the values of the new column will be set equal to the values of the input column. Similarly

expression="DZ=(int)DZ"

Will create a new DZ column that matches the datatype of the expression, in this case an integer datatype. The original values will be truncated and saved in the new output column.

Any/all other column properties are not copied including: column description, column units, WCS, NULL values, etc. As with all new columns, they are added after all the existing columns in the file so the order of the columns is not preserved.

Overwriting times in the GTI blocks

dmtcalc correctly changes the start and stop columns in the file (and thus the order can change). However, it then proceeds to copy the data subspace from the input to the output file and so the values are overwritten with the original values. To override tstart/tstop, they need to be put into a new extension:

	infile="file.fits[gti][new_gti]"

Then use the reverse notation when using as a filter:

	infile="file_2.fits[@file.fits[new_gti][gti]]"

Examples

Example 1

dmtcalc evt2.fits evt2_edit.fits expression="dtime=(time-TSTART)*1.0e-3"

This time we have used dmtcalc to create a new column called "DTIME" in the output file evt2_edit.fits (new or changed columns always have their names written in upper case in the output file, however they are written in the expression).

The contents of the DTIME column are set equal to

(time-TSTART)*1.0e-3

where TSTART is replaced with the value of the TSTART keyword from the header. The case of header keywords is unimportant, so

(time-tstart)*1.0e-3

would give the same result.

For this particular example DTIME can be considered to be the time since the start of the observation in kilo-seconds. In general the expressions used in these examples are just used to illustrate dmtcalc's capabilities and so should not be taken to have any real meaning for Chandra (or any other) data. More details of the allowed expressions are given in "ahelp dmmath".

Example 2

dmtcalc LV.newer"[cols length,diameter,launch_mass]" calc.out.txt"[opt
kernel=text/simple]"
expr="result=diameter*length*(diameter/launch_mass)"

Run dmtcalc on a fixed-format text file, creating a new column named "result"; see "ahelp dmascii" for more information on working with ASCII files.

Example 3

dmtcalc evt2.fits evt2_edit.fits expression="energy=((float)pha)*0.1"

Here we use dmtcalc to over-write the values in the energy column to be ten percent of the PHA values. Since the PHA values are stored as integers we convert them to floating-point numbers - by use of the term "(float)" - before the division.

If we had omitted the "(float)" term, the output would still have been automatically converted to a real since we are multiplying by a real number. It would, however, have changed the ENERGY column to Real8 (ie double) format instead of keeping the Real4 (float) format.

The name of the energy column in the output file (here "evt2_edit.fits") will be in upper case - whatever its case in the input file - since its values have been changed by dmtcalc.

Since "expression" is a required (automatic) parameter, you may be tempted to enter something like:

  unix% dmtcalc evt2.fits evt2_edit.fits "energy=((float)pha)*0.1"

This will fail because the parameter parser interprets this as setting a (nonexistent) dmtcalc parameter named "energy" to "((float)pha)*0.1".

Example 4

dmtcalc evt2.fits evt2_edit.fits
expression="newpha=(pha+(long)(1.25e-3*(time-TSTART)))"

The contents of the NEWPHA column are set equal to the PHA column added to the integer part of

1.25e-3*(time-TSTART)

The data type of the output column is integer (Int4), since the PHA column is an integer and the time expression has been converted (also called cast) to an integer value (here of type "long").

Example 5

dmtcalc evt2.obs1,evt2.obs2 evt2.combined
expression="newpha=(pha+(long)(1.25e-3*(time-TSTART)))"

This does the same as the previous example except that the input is now two files - evt2.obs1 and evt2.obs2. There is only one output file - evt2.combined - which contains the NEWPHA column. The row order of the output image is determined by the order of the files in the input stack: the first files contents, then the second file, etc until all the input files have been processed. The dmsort tool can be used to change the row order of the output file if required.

Example 6

dmtcalc lc.orig lc.smoothed expression="rate=(rate:9-0.123)"

Here the file lc.orig (which is assume to be a lightcurve so contains a rate column) is converted to lc.smoothed in which the RATE column is equal to the smoothed - by 9 rows - value of the rate and then a dc offset (0.123) is subtracted from it. The value for the dc offset would have to be calculated prior to running the program - e.g with dmstat as shown in the following csh/tcsh example

  unix% dmstat "lc.orig[cols rate]" sigma-
  unix% set dc=`pget dmstat out_mean`
  unix% dmtcalc lc.orig lc.smoothed expression="rate=(rate:9-$dc)"

Example 7

dmtcalc evt2_bary.fits evt2_phase.fits @calcphase.lis

The expression can also be stored in an ASCII file and accessed as a stack. Here we use the contents of calcphase.lis to create an output file in which two new columns have created - PHASE and GPHASE - which contain phase information for the events. The contents of the stack file are:

  unix% cat calcphase.lis
  .dtime=(time-TSTART)
  GPHASE=.dtime*19.794885
  PHASE=GPHASE-(long)GPHASE

We have also included the use of a temporary variable (here .dtime) even though it is not really necessary for this particular example.

The results are only valid if the frequency of the source is 19.794885 Hz (and it's period is not changing, and you have applied the barycenter correction to the events file). For more details on using dmtcalc to add phase information to an event file see the Phase Resolved Spectroscopy thread on the CIAO website.

Example 8

dmtcalc evt2_bary.fits evt2_phase.fits
expression=".dtime=(time-TSTART),GPHASE=.dtime*19.794885,PHASE=GPHASE-(l
ong)GPHASE"

Here we repeat the previous example but give the expression directly - i.e. without the use of a stack file.

Example 9

dmtcalc evt2.fits evt2_edit.fits @randpha.lis

  unix% cat randpha.lis
  newpha= pha + (long) \
    ( 1.25e-3*(time-TSTART) + 10*(#rand(1034)-0.5) )

Here we define the expression in an external file and use the "\" character to indicate a continuation line. The contents of the NEWPHA column are set equal to the PHA column added to the integer part of

1.25e-3*(time-TSTART) + 10*(#rand(1034)-0.5)

where #rand(1034) means a random number between 0 and 1 where 1034 is used to seed the random number generator. This seeding only happens once - ie it is not re-set for each row of the table.

The result is similar to the earlier example, except that a random element of +/-5 has been added onto the calculation.

Example 10

dmtcalc srclist.wav srclist_filt.wav
expression="if(psfratio==#nan)then(psfratio=0.0)"

Here we use the support for conditional expressions to filter out the NaN values from the PSRATIO column of srclist.wav, replacing any such values by 0.0.

In order for the expression "if(a)then(b)" to work correctly, the same column must be used in both sections of the comparison.

Example 11

dmtcalc srclist.wav srclist_filt.wav expression="scol=shape~='cir'"

This expression creates the column SCOL which contains boolean values: TRUE (1) or FALSE (0). The row entry is true if the entry shape column contains the string "cir" and false otherwise. The "~=" operator returns true if the string on the left-hand side contains the string given on the right-hand side.

Parameters

name	type	ftype	def	min	max	reqd	stacks
infile	file	input				yes	yes
outfile	file	output				yes
expression	string	input				yes	yes
clobber	boolean		no
verbose	integer		0	0	5	no

Detailed Parameter Descriptions

Parameter=infile (file required filetype=input stacks=yes)

Input file to be operated upon. A copy will be made of this file, with any additions/modifications indicated by the expression parameter.

Unless an extension is specified when the file is input, the first table extension will be used. A stack of files may be input, but they must all have the same structure - ie have the same extension name and contain the same columns - since the columns in the first file are assumed to exist - in the same order - in all subsequent files. The behaviour of the tool is not guaranteed if this condition does not hold.

When a stack is supplied, the input files are processed in the order they are supplied to create one output file. The dmsort tool can be used to change the order of rows in the output file.

Parameter=outfile (file required filetype=output)

Name of output file to create.

The output file will have the same extension name as that of the first input file. All the keywords and subspaces in the first file will be copied to the output, in addition to any new columns defined in the expression parameter.

Parameter=expression (string required filetype=input stacks=yes)

A stack of expressions to be evaluated for each row in the input file.

If many expressions are desired, then they should be put into an ascii file, and input as @filename. There is no limit on the number of expressions that can be evaluated. When providing expressions on the command line, note that spaces may be interpreted as stack separators, which can produce unexpected error messages. See "ahelp dmmath" for a description of the syntax allowed here.

Parameter=clobber (boolean default=no)

Clobber the output file?

Parameter=verbose (integer not required default=0 min=0 max=5)

Level of verbose information printed during processing.

Creating Vector Columns

In order to make a new vector column with dmtcalc, it must be used in conjunction with other CIAO tools. Here is a method for creating a vector column "skymod(xmod,ymod)":

First, create the skymod column and generate a new output file (temp1.fits), then filter the original sky column out of that file (temp2.fits):

dmtcalc old.fits"[cols x,y]" temp1.fits expr="xmod=...,ymod=..."
dmcopy temp1.fits"[cols -sky]" temp2.fits

The Data Model filter "[cols x,y]" indicates that only those two columns should be included in the new output file. This means that there will be only one vector and one MTYPE/MFORM pair. Update these keywords with the new vector and component names:

dmhedit temp2.fits filel="" op=add key=MTYPE1 value='skymod'
dmhedit temp2.fits filel="" op=add key=MFORM1 value='xmod,ymod'

Now paste the rest of the columns from the old file to the new one:

dmpaste old.fits temp2.fits new.fits

Whitespace in Expressions

When an expression is given on the command line - so not as a stack file - then spaces are important. This is because spaces are treated as stack separators - as discussed in "ahelp stack" - which is unlikely to be the desired behaviour here.

So, setting

expression="energy= (float) pha / 10.0"

will not work. You have to say either

expression="energy=(float)pha/10.0"

- i.e. remove all the spaces - or surround the expression in ()'s, since this stops the checks for stack separators. This means that

expression="energy=( (float) pha / 10.0 )"

is a valid expression.

An alternative solution is to write the expression in a text file ("expr.lis" say) and then set

expression="@expr.lis"

since the white space in stack files is not important.

Bugs

log() of integer values

Using the log or log10 operator on integer columns returns an integer value. The value is truncated.

Workaround:

If unsure of the datatype, the value can be explicitly cast to a real-valued number

newcol=log((float)oldcol)
newcol=log(oldcol*1.0)

When using status bits in an expression, only the last condition is checked.

When using status bits in an expression such "status==X1F,X3T,X18F", only the last condition (here, "X18F") is checked.

Workaround:

Set the logic explicitly:

logical "AND": status=((bits==X1T)&&(bits==X2T))
logical "OR": status=((bits==X1T)||(bits==X2T))

The tool does not follow the usual order of operations of mathematics when evaluating an expression.

For example, the expression

x+814*24*3600-260086780.04

is treated as

x+(814*24*(3600-260086780.04))

instead of the expected

x+(814*24*3600)-260086780.04

Workaround:

Include the parentheses, as shown in the last code snippet, to force the correct order of operations.

When creating a new double vector column, the results are incorrect if one (but not both) of the values in the expression are integers, even if it is cast as a double.

For example, this file will work:

unix% cat ok.expr
(r=(double){3.5,4.}) 

unix% dmlist output_ok.fits'[cols r]' data
---------------------------------------------
Data for Table Block HISTOGRAM
---------------------------------------------

ROW R[2]

     1 [ 3.50 4.0]
     2 [ 3.50 4.0]
     3 [ 3.50 4.0]
(etc.)

while this does not:

unix% cat not_ok.expr
(r=(double){3.5,4}) 

unix% dmlist output_not_ok.fits'[cols r]' data
-------------------------------------------------
Data for Table Block HISTOGRAM
-------------------------------------------------

ROW R[2]

     1 [ 3.50 0]
     2 [ 3.50 0]
     3 [ 3.50 0] 
(etc.)

Virtual columns cannot be used in an expression

For example, when using an event list, one cannot say

unix%  dmtcalc evt1.fits foo.fits exp='dra=(ra-ra_nom)'

because the 'ra' column is not a real column in the file; see the virtual columns dictionary entry for an explanation.

Workaround:

Use Data Model column renaming:

unix%  dmtcalc evt1.fits"[cols foo=ra,*]" foo.fits expr='dra=(foo-ra_nom)'

Units are not preserved, even for simple calculations.

If you calculate npha as npha=pha; which should have units of PHA [adu]; but it will be unitless. Even more confusing is pha=pha will be unitless since pha in the output file is treated a new column.

The speed basically scales as O(N^2): small files are not so bad, but big files slow down exponentially.

Problems accessing individual elements in a vector and/or array column

There are various problems doing things like:

pha[9]=1;
foo=pha[9];

where you try to access the individual elements in a vector and/or array column.

Changing the value of an existing column

There are issues when changing values of existing columns: what info is kept, what is thrown away (e.g. data types, null, units, descriptions, order in file). In particular there are various problems with arrays vs. vector vs. vector-array columns.

There is no way to access elements of an array column such as PHAS, which is a 3x3 array (or 5x5 in VFAINT mode).

Booleans do not work on array columns

For example, you cannot do above=(phas>20) since the PHAS array contains 9 (or 25) values. Currently only the first value in the array is checked.

Using vector components in a complicated expression may cause dmtcalc to hang. (01 Jun 2006)

For example:

expression="dist=((4125-x)^2)-((4025-y)^2)"

Workaround:

Refer to the component by vector notation instead:

expression="dist=((4125-sky[0])^2)-((4025-sky[1])^2)"

dmtcalc tries to do this subsitution internally in the first example, but fails.

dmtcalc cannot be used to modify values of vector columns.

Vector columns such as SKY(X,Y), POS(X,Y), CHIP(CHIPX,CHIPY), RD(TG_R,TG_D), cannot be modified using dmtcalc.

Internally, dmtcalc treats vector columns as arrays. So SKY(X,Y) becomes SKY[0] and SKY[1] which is what is written on output if the values are set. For example

unix% dmtcalc wav_srclist.fits modified_srclist expression="pos=pos"
unix% dmlist wav_srclist.fits | grep POS
   5   POS(X,Y)             pixel        Real8          2288.50:     4900.50 Physical coordinates
5:    EQSRC(RA_SRC ) = (+230.1725)[deg] +TAN[(-0.000136667)* (POS(X)-(+4096.50))]
ciao:/data/da/Docs/ciaoweb/ciao47/bugs% 

unix% dmlist modified_srclist | grep -i POS
 26   POS[2]                            Real8(2)       -Inf:+Inf            User defined column

shows that just copying the values in the POS vector column creates a POS array.

Workaround:

Users can modify vector columns using the Python Crates package

from pycrates import read_file

tab = read_file("wav_srclist.fits")
pos = tab.get_column("pos").values
pos[0][0] = new_X_value
pos[0][1] = new_Y_value
tab.write("modified_srclist")

The valid range of vector columns is not preserved in the output file, even when not used or modified.

When the input file contains a vector column, the legal range of values stored in the raw FITS TLMIN/TLMAX keywords associated with each raw column are not preserved. The MIN/MAX for each raw column in the vector are set equal to the 1st column.

So in this example, a wavdetect source list with the vector column POS(X,Y) (which are columns number 5 and 6), have different ranges:

unix% dmlist wav_src.fits  header,clean,raw | grep 'TLM..[56]'
TLMIN5       =    2028.50000000     /                     
TLMAX5       =    5970.50000000     /                     
TLMIN6       =    1800.50000000     /                     
TLMAX6       =    5630.50000000     /

After running dmtcalc, the values are different. The values for the Y column have been changed to be equal to the range of the X column.

unix% dmtcalc wav_src.fits new_col expression="newcol=#trand"

unix% dmlist new_col header,clean,raw | grep 'TLM..[56]'
TLMIN5       =    2028.50000000     /                     
TLMAX5       =    5970.50000000     /                     
TLMIN6       =    2028.50000000     /                     
TLMAX6       =    5970.50000000     /

These column ranges are used when the file is filtered, which can produce incorrect results.

Caveats

Output columns with same name as input require special consideration.

For example, The expression "if(a)then(b)" does not work as expected in dmtcalc if two different columns are used in the comparison.

unix% dmtcalc in.fits out.fits "expr=if(pi>500)then(pi=-1)"

works as expected ("pi" used for both pieces of the conditional), but

unix% dmtcalc in.fits out.fits "expr=if(energy>5000)then(pi=-1)"

does not work. This is because the pi value in the expression is not always initialized to the input. Instead you must explicitly initialize it

unix% dmtcalc in.fits out.fits "expr=pi=pi;if(energy>5000)then(pi=-1)"

The #nan can be use for float-point NaN checks, but integer NULL values cannot be checked the same way.