Compare values in two files.
dmdiff infile1 infile2 [outfile] [tolfile] [keys] [data] [subspaces] [units] [comments] [wcs] [missing] [error_on_value] [error_on_comment] [error_on_unit] [error_on_range] [error_on_datatype] [error_on_wcs] [error_on_subspace] [error_on_missing] [verbose] [clobber]
The dmdiff tool compares two files (FITS or ASCII) and determines whether they contain the same data. The default behavior is to compare the data values - e.g. columns in a table and pixel values in an image - as well as the metadata in the file - e.g. keyword values, units, and comments - but options exist to restrict the items being compared. There are multiple ways of comparing values - such as equality or by using absolute or relative differences - and the tolfile parameter lets you choose a different method for each value.
Like the Unix commands `diff' and `cmp', dmdiff assigns special meaning to its exit status. An exit status of 0 means that no differences were found in the two input files. An exit status of 1 means that either differences were found or an error occurred. An exit status greater than one always indicates that an error occurred. Note that if the verbose parameter is set to 0, the tool will produce no output, but the exit status will still reflect whether differences were found in the input files. This feature can be useful in scripts that automatically compare a large number of files.
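As a minimal sketch of how a script might act on the exit status described above, the shell function below maps a status code to a short message. The function name describe_dmdiff_status and the message strings are hypothetical and not part of dmdiff. Because a status of 1 can mean either "differences" or "error", the sketch reports it as ambiguous.

```shell
# Hypothetical helper (not part of CIAO): turn a dmdiff exit status into text.
describe_dmdiff_status() {
    case "$1" in
        0) echo "no differences" ;;
        1) echo "differences found or an error occurred" ;;
        *) echo "error" ;;
    esac
}

# Example use in a script (requires dmdiff, so it is left commented out):
#   dmdiff file1.fits file2.fits verbose=0
#   describe_dmdiff_status $?
```

With verbose=0 the tool itself prints nothing, so a helper of this kind is one way to leave a trace in a log when many file pairs are compared in a loop.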
There are a few limitations in the tool:
- The number of columns in the two infiles must be the same. If the number of rows is different, dmdiff will only compare the lesser number of rows in the inputs (e.g. infile1=1000 rows, infile2=500 rows; the first 500 rows of each file are compared).
- Values with the same name in the input files must also have the same datatype. This means that dmdiff cannot compare columns or images of different types and will exit with an error if there is a mismatch.
unix% dmdiff file1.fits file2.fits
Compare all header and data values in the default block of file1.fits and file2.fits.
unix% dmdiff "file1.dat[t=100:][cols x,y]" "file2.dat[cols x,y]"
Here the comparison is limited to the X and Y columns in the two files (in this case ASCII files, using the ASCII kernel support), and the data from the first file has an additional filter (only those rows with t values of 100 or more).
unix% dmdiff "file1.fits[EVENTS]" "file2.fits[EVENTS]"
Compare all the header and table values of the EVENTS block in file1.fits and file2.fits.
unix% dmdiff "file1.fits[EVENTS]" "file2.fits[EVENTS]" keys=yes data=no
Compare only the header values of the EVENTS block in file1.fits and file2.fits.
unix% dmdiff "file1.fits" "file2.fits" tolfile=tolerances.txt outfile=diffs.txt
Compare the header and data values listed in block 2 of file1.fits and file2.fits using the limits given in the file tolerances.txt. Output will be written to diffs.txt.
unix% dmdiff "image1.fits[PRIMARY]" "image2.fits[PRIMARY]" tolfile=tols.txt
Compare the header and image values listed in the PRIMARY block of image1.fits and image2.fits using the limits listed in tols.txt. In this example, we have
unix% cat tols.txt
!DATE
!CHECKSUM
PRIMARY=range(1.0e-6)
which means that pixel values that differ by 1.0e-6 or less will be considered equal and the DATE and CHECKSUM keywords will be ignored.
Detailed Parameter Descriptions
Parameter=infile1 (file required filetype=input stacks=no)
1st input file name
The first file to use. It can contain Data Model syntax. The file does not have to have the same format as the infile2 parameter.
Parameter=infile2 (file required filetype=input stacks=no)
2nd input file name
The second file to use. It can contain Data Model syntax. The file does not have to have the same format as the infile1 parameter.
Parameter=outfile (file not required stacks=no)
Output file name
Output file listing summary of differences found. If the value is omitted or set to 'none', 'NONE', or 'stdout', output will go to the standard output device (generally the terminal). If outfile is set to 'stderr', output will go to the standard error device (also generally displayed on the terminal). Finally, if a filename is given, output will be written to that file. The clobber parameter controls whether an existing file will be overwritten.
Parameter=tolfile (file not required stacks=no)
Tolerance file name
This is an ASCII text file that governs how values are compared. The file is case insensitive, with one command per line; empty lines and lines beginning with the '#' character are ignored. The order of the commands does not matter, and commands that do not match anything in the input files are ignored.
There are multiple ways to compare numeric values, as discussed below. To refer to an image, use the block name of the image (use 'dmlist filename blocks' to find this out). The same syntax is used to refer to keyword values, rows of a column, or image pixel values, so whether the command refers to a keyword, column, or image depends on the input files.
A single value
Using "name=value" means that it is an error if the value in either file does not equal the given value. The following example requires all ccd_id values to be equal to 3 and state values to match the string "finished":
ccd_id=3
state=finished
A range of values
The Data Model range syntax - namely a=b:c, with b or c optional - can be used to specify that a must be within the range b to c (omitting c gives a lower limit only, and omitting b an upper limit only). That is,
ccd_id=6:8
ccd_id=6:
ccd_id=:8
require that the ccd_id values be in the range 6 to 8 (inclusive), greater than or equal to 6, and less than or equal to 8 respectively.
Note that there is no check that the values in the two files equal each other, just that they match the range filter.
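The range rule can be illustrated with a small shell sketch. This is not how dmdiff is implemented; the function name in_range is hypothetical, and an empty lower or upper argument stands for the omitted side of the b:c, b:, and :c forms.

```shell
# Illustration only: succeed when value v lies in [lo, hi] (inclusive).
# An empty lo or hi means that side is unbounded.
in_range() {
    awk -v v="$1" -v lo="$2" -v hi="$3" 'BEGIN {
        ok = 1
        if (lo != "" && v + 0 < lo + 0) ok = 0
        if (hi != "" && v + 0 > hi + 0) ok = 0
        exit !ok
    }'
}
```

Under this reading, a pair of values passes when each one passes on its own, e.g. in_range "$v1" 6 8 && in_range "$v2" 6 8, whether or not the two values are equal.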
An absolute difference
The range option is used to check that the absolute difference between the values in the two files is within the given limit. So
chipx=range(1)
events_image=range(1.0e-6)
mean that the chipx values can differ by no more than 1, and the events_image values no more than 1.0e-6.
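As a rough sketch of the absolute-difference test (an illustration of the rule, not dmdiff's own code), the hypothetical function within_abs below succeeds when two numbers differ by no more than the given limit:

```shell
# Illustration only: succeed when |a - b| <= tol.
within_abs() {
    awk -v a="$1" -v b="$2" -v tol="$3" 'BEGIN {
        d = a - b
        if (d < 0) d = -d
        exit !(d <= tol + 0)
    }'
}
```

For example, within_abs 100 100.5 1 succeeds, while within_abs 100 102 1 fails.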
A percentage difference
To express a relative difference, use % and then the difference as a percentage (calculated relative to the first file). Note that the % character is written before the limit; otherwise the command will be taken as a string comparison. The commands
chipx=%1
events_image=%0.01
mean that the chipx values can differ by 1% or less and the events_image values by 0.01% or less.
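The percentage test can be sketched in the same way. The function name within_pct is hypothetical, and the handling of a zero value in the first file is an assumption made only so that the sketch avoids dividing by zero; the source does not say what dmdiff does in that case.

```shell
# Illustration only: succeed when b differs from a by at most pct percent,
# measured relative to a (the value from the first file).
within_pct() {
    awk -v a="$1" -v b="$2" -v pct="$3" 'BEGIN {
        # Assumption: a zero reference value only matches zero.
        if (a + 0 == 0) exit !(b + 0 == 0)
        d = a - b
        if (d < 0) d = -d
        ref = a + 0
        if (ref < 0) ref = -ref
        exit !(100 * d / ref <= pct + 0)
    }'
}
```

For example, within_pct 200 201 1 succeeds (a 0.5% difference), while within_pct 200 204 1 fails (a 2% difference).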
When comparing string values (either column values or a keyword) that contain file names, the "ignorepath" directive can be used to compare only the file name, ignoring any preceding path components. That is, with the command
infiles=ignorepath
the values /path1/to/file1.dat and /data/file1.dat would be considered equal when stored in either the INFILES keyword or column.
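The effect of ignoring the path can be sketched with shell parameter expansion, which strips everything up to the last '/' before comparing. The function name same_file_name is hypothetical, and the sketch only illustrates the idea rather than dmdiff's actual string handling.

```shell
# Illustration only: compare two strings after dropping any leading path.
same_file_name() {
    [ "${1##*/}" = "${2##*/}" ]
}
```

With this definition, same_file_name /path1/to/file1.dat /data/file1.dat succeeds, while same_file_name /data/file1.dat /data/file2.dat fails.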
Ignoring a value
To ignore a keyword, column, or image, use the ! character followed by the name of the item. To ignore multiple items, write each one on a separate line of the file (preceded by the ! character). You can also use the Data Model virtual file syntax for the infile1 and infile2 parameters to select (or hide) certain columns. The following commands will ignore the keywords DATE, CHECKSUM, and CREATOR:
!DATE
!CHECKSUM
!CREATOR
Parameter=keys (boolean not required default=yes)
Check header keywords?
Determines whether or not the header keys will be compared. See also the units and comments parameters. The tolerance file - set by the tolfile parameter - can be used to filter out certain keywords and to control whether, when comparing file names, the path component should be ignored.
Parameter=data (boolean not required default=yes)
Check table or image data?
Determines whether or not the data values - i.e. the image pixels or the rows of each column - will be compared.
Parameter=subspaces (boolean not required default=yes)
Controls whether or not the subspace record, stored in the file by CIAO tools to record the filters applied, will be compared.
Parameter=units (boolean not required default=yes)
Controls whether or not the units of keywords and columns will be compared.
Parameter=comments (boolean not required default=yes)
Controls whether or not the comments of columns and keywords will be compared. This does not refer to the COMMENT or HISTORY keywords, which are not used when comparing files.
Parameter=wcs (boolean not required default=yes)
Controls whether the WCS keywords are included in the comparison.
Parameter=missing (boolean not required default=yes)
Check for missing header keys?
Determines if missing header keys will be checked.
Parameter=error_on_value (boolean not required default=yes)
Return error when values are different?
Parameter=error_on_comment (boolean not required default=yes)
Return error when comments are different?
Parameter=error_on_unit (boolean not required default=yes)
Return error when units are different?
Parameter=error_on_range (boolean not required default=yes)
Return error when ranges are different?
Parameter=error_on_datatype (boolean not required default=yes)
Return error when datatypes are different?
Parameter=error_on_wcs (boolean not required default=yes)
Return error when WCS values are different?
Parameter=error_on_subspace (boolean not required default=yes)
Return error when subspaces are different?
Parameter=error_on_missing (boolean not required default=yes)
Return error when header key is missing?
Parameter=verbose (integer not required default=1 min=0 max=5)
Verbosity level of terminal display information to user (DataModel output included). If verbose is set to 0, the tool will produce no output, but its exit status will indicate whether differences were found in the input files. See the section "Exit Status" above.
Parameter=clobber (boolean not required default=no)
Clobber existing file
Controls whether a file is overwritten, or the tool errors out, if the outfile parameter is set to a file name and it exists.
An example tolerance file
The purpose of the tolerance file is to set parameters when comparing the values of the input files. The tolerance file is an ASCII file with one keyword rule per line; see the description of the tolfile parameter for more information on the syntax and semantics of the commands.
An example tolerance file is:
unix% cat tols.dat
TSTART=83201992:83201992.7
chipx=range(50)
#time=83202418:
ccd_id=8
!checksum
!datasum
!DATE
telescop=CHANDRA
backfile=ignorepath
which indicates that any TSTART value must lie between the given minimum and maximum limits (the checks are inclusive, but note that there is no requirement that the TSTART values are the same in the two files, just that they both lie within this range); the CHIPX values must not differ by more than 50; the CCD_ID values must be equal to 8; the CHECKSUM, DATASUM, and DATE values are ignored (whether a column or keyword); the TELESCOP value must equal 'CHANDRA'; and the BACKFILE values are compared ignoring any path component. The TIME line is ignored because it begins with a '#' character. Note that the names of the values to be compared are case insensitive.
Changes in CIAO 4.14
- dmdiff has undergone a major overhaul. The output format between the different sections has been made more consistent. There have been improvements when comparing WCS and arrays along with additional improvements handling NaN, NULL, and Infs.
- Problem with percent sign (%) in strings
dmdiff may produce incorrect results if any of the strings (comment, units, value) contain a "%" character, e.g. "90%ecf": the "%e" is parsed as a string-format specifier.
- The tool does not recognize differences in vector component ranges.
- The tool does not report any difference in keywords if either input file has no keywords.