How to run CIAO tools from within Python
Aim: Show how CIAO tools can be run from Python.
Summary: We highlight several ways of running CIAO tools from Python, including the routines from the Python subprocess module (which can be used to run any command-line application) and the specialized routines from the ciao_contrib.runtool module which lets you run CIAO tools with a "parameter-based" interface.
Both the subprocess module, provided as part of the standard Python distribution, and the ciao_contrib.runtool module, which is part of the CIAO scripts package, can be used to run CIAO tools from Python as described below. The choice depends on your exact needs, but we suggest trying the ciao_contrib.runtool module first. If you need greater access to the output (such as separating out the stdout and stderr channels), or want to run a non-CIAO tool, then use the subprocess module.
The examples below can be used from both an interactive Python environment, such as provided by Sherpa or IPython, or from a Python "script".
Using the ciao_contrib.runtool module
Check for availability
The module is provided as part of the CIAO script package. If the following command exits with a "No such file or directory" error, then please install the script package.
unix% cat $ASCDS_CONTRIB/VERSION.CIAO_scripts
Loading the module
The module can be loaded using the standard Python import command, such as any of the following:
from ciao_contrib.runtool import * from ciao_contrib.runtool import acis_process_events, dmstat, dmmerge import ciao_contrib.runtool import ciao_contrib.runtool as rt
If the command returns
ImportError: No module named runtool
then please check the version of the CIAO scripts package, as described above.
In the following we assume the module has been loaded using
from ciao_contrib.runtool import *
Running a CIAO tool
The module provides routines with the same name as the CIAO tools which, when called, will run the respective tool. Parameter values can be given when the tool is run (using Python named arguments for optional parameters) or beforehand using a "pset" style interface. Any screen output from the tool is returned as a string, as shown in the example below of calling dmstat:
>>> dmstat("img.fits") PRIMARY(X, Y)[MJy/sr] min: 20.022872925 @: ( 588 665 ) max: 24.589748383 @: ( 299 487 ) cntrd[log] : ( 344.99795211 345.0132221 ) cntrd[phys]: ( 344.99795211 345.0132221 ) sigma_cntrd: ( 1708.7638456 1708.8441011 ) good: 474721 null: 0
The dmstat call generates screen output which is returned by the call to the dmstat() routine and automatically displayed by python.
In the following example we capture the screen output in a variable and show how optional parameters - in this case centroid and median - can be set:
>>> a = dmstat("img.fits", centroid=False, median=True) >>> print(a) PRIMARY[MJy/sr] min: 20.022872925 @: ( 588 665 ) max: 24.589748383 @: ( 299 487 ) mean: 20.123614861 median: 20.115715027 sigma: 0.072704951096 sum: 9553102.5703 good: 474721 null: 0
Parameters can also be set using a pset-style interface before the function is called:
>>> dmstat.punlearn() >>> dmstat.centroid = False >>> dmstat.median = True >>> dmstat.infile = "img.fits" >>> a = dmstat()
and the two styles can be combined, which allows you to say
>>> dmstat.punlearn() >>> dmstat.centroid = False >>> dmstat.median = True >>> a = dmstat("img.fits")
The current parameter settings for a tool can be found by calling print on the routine:
>>> print(dmstat) Parameters for dmstat: Required parameters: infile = Input file specification Optional parameters: centroid = False Calculate centroid if image? median = True Calculate median value? sigma = True Calculate the population standard deviation? clip = False Calculate stats using sigma clipping? nsigma = 3 Number of sigma to clip maxiter = 20 Maximum number of iterations verbose = 1 Verbosity level out_columns = Output Column Label out_min = Output Minimum Value out_min_loc = Output Minimum Location Value out_max = Output Maximum Value out_max_loc = Output Maxiumum Location Value out_mean = Output Mean Value out_median = Output Median Value out_sigma = Output Sigma Value out_sum = Output Sum of Values out_good = Output Number Good Values out_null = Output Number Null Values out_cnvrgd = Converged? out_cntrd_log = Output Centroid Log Value out_cntrd_phys = Output Centriod Phys Value out_sigma_cntrd = Output Sigma Centriod Value
The current value for a parameter can be accessed using the toolname.parname syntax, as shown below
>>> print(dmstat.centroid) False
Parameter types
Parameter values are matched to the appropriate Python type; boolean parameters such as centroid and median are set using True and False, strings and filenames are given as Python strings and Python numbers are used for numeric types (integer and float parameters).
Parameter names
As shown in the example above, required parameters can be used without names, using the order given in the parameter file, and optional parameters are given using named arguments. Required parameters can also be given by name, so the previous dmstat call could have been written
a = dmstat(infile="img.fits", median=True, centroid=False)
The routines also support the use of unique prefixes for the parameter names, in the same way the command-line interface does, so the above could also have been written as
a = dmstat(inf="img.fits", med=True, cen=False)
Note that the prefix can not match a Python reserved word, otherwise Python will get horribly confused. An example of this is trying to use in as the prefix for the infile parameter, which will result in a syntax error that looks something like:
>>> a = dmstat(in="img.fits", cen=False, med=True) ------------------------------------------------------------ File "<ipython console>", line 1 a = dmstat(in="img.fits", cen=False, med=True) ^ SyntaxError: invalid syntax
Output parameters
Some tools will set parameter values; these values are available after the tool has run using the toolname.paramname syntax. In the example below we use this to get the median value calculated by dmstat.
>>> dmstat("img.fits", centroid=False, median=True, v=0) >>> print("Median = {0}".format(dmstat.out_median)) Median = 20.115715027
The output parameter values are converted to Python types (e.g. string, int, float, bool) using the same rules as for the input parameters. In this case the output value is actually a string, since there could be multiple values stored depending on the input to dmstat.
By setting the verbose parameter to 0 we stopped any screen output, which meant that the dmstat() call returned None instead of a string.
Parameter limits
Parameter limits - such as minimum and maximum values or a list of valid answers - are enforced by this module. An error will occur if a limit test is failed, as in the examples below:
>>> dmstat("img.fits", maxiter=0) ValueError: dmstat.maxiter must be >= 1 but set to 0 >>> dmjoin.verbose = 10 ValueError: dmjoin.verbose must be <= 5 but set to 10 >>> dmjoin.interpolate = "foo" ValueError: The parameter interpolate was set to foo when it must be one of: linear first last closest furthest minimum maximum
Note that, unlike at the command-line, there is no attempt to query the user for a new parameter value.
As with the command-line interface, parameters which are restricted to a set of options - such as the interpolate parameter of dmjoin - can be set using a unique substring, as shown below.
>>> dmjoin.interpolate = "min" >>> print(dmjoin.interpolate) minimum
Errors
If the tool fails to run successfully, then an IOError will be created, with the screen output from the tool, such as in the following example where dmstat is given a file name that does not exist.
>>> dmstat("does-not-exist") IOError: An error occurred while running 'dmstat': # DMSTAT (CIAO 4.17): could not open infile does-not-exist.
An AttributeError will be raised if you try to use a parameter that does not exist, as in the following example:
>>> dmstat.nclip = 20 AttributeError: There is no parameter for dmstat that matches 'nclip'
Similarly, using a non-unique prefix for a parameter name will also raise an error, as shown below:
>>> dmstat.c = True AttributeError: Multiple matches for dmstat parameter 'c', choose from: centroid clip
Parameter files
Running a tool using the runtool module, or setting a parameter value using the toolname.parname syntax, does not change the on-disk parameter file for the tool. This means that it is safe to run multiple copies of a tool at once.
Each tool has a write_params() method to write the current settings to disk and a read_params() method to read in values from a parameter file.
>>> !pset dmjoin interp=clo >>> dmjoin.read_params() >>> print(dmjoin.interpolate) closest
The module provides a set_pfiles() routine which will adjust the PFILES environment used by the tools; this is mostly important for scripts that run multiple tools - such as specextract - or for those tools that need to refer to multiple parameter files - for instance mkinstmap uses the ardlib parameter file.
An example
In the following we use the dmhistory tool to store the settings used to create an instrument map into the parameter file of mkinstmap, read these settings into the mkinstmap tool, adjust a few of them, run the tool and print out some simple run-time details:
>>> dmhistory("imap7.fits", "mkinstmap", action="pset") >>> mkinstmap.read_params() >>> print(mkinstmap) Parameters for mkinstmap: Required parameters: outfile = imap7.fits Output File Name spectrumfile = Energy Spectrum File (see docs) monoenergy = 1.5 Energy for mono-chromatic map [keV] pixelgrid = 1:1024:#1024,1:1024:#1024 Pixel grid specification x0:x1:#nx,y0:y1:#ny obsfile = asp7.fits[asphist] Name of fits file + extension with obs info detsubsys = ACIS-7 Detector Name grating = NONE Grating for zeroth order ARF maskfile = msk1.fits NONE, or name of ACIS window mask file pbkfile = pbk0.fits NONE, or the name of the parameter block file Optional parameters: mirror = HRMA Mirror Name dafile = CALDB NONE, CALDB, or name of ACIS dead-area calibration file ardlibparfile = ardlib.par name of ardlib parameter file geompar = geom Parameter file for Pixlib Geometry files verbose = 0 Verbosity clobber = False Overwrite existing files? >>> mkinstmap.monoen = 4 >>> mkinstmap("imap7.e4.fits") >>> d = mkinstmap.get_runtime_details() >>> print(d["delta"]) 17 seconds >>> print(d["args"][:3]) [('outfile', 'imap7.e4.fits'), ('spectrumfile', ''), ('monoenergy', '4.0')]
The get_runtime_details() method returns a dictionary which contains information on the last run of the tool. Here we show the contents of the delta and args keywords which give the run time of the tool as a string, and a list of all the arguments used to run the tool (in this case we restrict the output to the first three elements). Other fields are also present; use help <toolname>.get_runtime_details to see the full list.
To finish we run mkexpmap and then dmstat on its output, inspecting the parameter settings of the tool.
>>> mkexpmap.punlearn() >>> mkexpmap.xy = "3530.5:4973.5:1,3094.5:4537.5:1" >>> mkexpmap("asp7.fits", "emap7.e4.fits", "imap7.e4.fits") >>> dmstat("emap7.e4.fits[sky=region(fov1.fits[ccd_id=7])]", centroid=False) EXPMAP[cm**2] min: 0 @: ( 4052 3096 ) max: 417.29486084 @: ( 4095 4056 ) mean: 349.53833266 sigma: 109.11054852 sum: 407873484.07 good: 1166892 null: 915357 >>> print(dmstat) Parameters for dmstat: Required parameters: infile = emap7.e4.fits[sky=region(fov1.fits[ccd_id=7])] Input file specification Optional parameters: centroid = False Calculate centroid if image? median = False Calculate median value? sigma = True Calculate the population standard deviation? clip = False Calculate stats using sigma clipping? nsigma = 3.0 Number of sigma to clip maxiter = 20 Maximum number of iterations verbose = 1 Verbosity level out_columns = EXPMAP Output Column Label out_min = 0 Output Minimum Value out_min_loc = 4052,3096 Output Minimum Location Value out_max = 417.29486084 Output Maximum Value out_max_loc = 4095,4056 Output Maxiumum Location Value out_mean = 349.53833266 Output Mean Value out_median = Output Median Value out_sigma = 109.11054852 Output Sigma Value out_sum = 407873484.07 Output Sum of Values out_good = 1166892 Output Number Good Values out_null = 915357 Output Number Null Values out_cnvrgd = Converged? out_cntrd_log = Output Centroid Log Value out_cntrd_phys = Output Centriod Phys Value out_sigma_cntrd = Output Sigma Centriod Value >>> print("Mean = {0} +/- {1}".format(dmstat.out_mean, dmstat.out_sigma)) Mean = 349.53833266 +/- 109.11054852
Running multiple copies of a tool
The Python multiprocessing module makes it possible to run tools in parallel, and is used to provide the multiprocessing feature in scripts like fluximage and merge_obs. Alternatively, you can run several copies of your script at the same time. In either case, you may see errors or problems because the same parameter file is being used for each run of the tool (this can lead to invalid results, as inconsistent parameter settings are used, or the tool can fail). To avoid this, use the new_pfiles_environment context manager to create a new copy of your PFILES environment. An example of its use is shown below (this uses Sherpa's parallel_map routine, which provides a simple way to use the multiprocessing module):
import ciao_contrib.runtool as rt import sherpa.utils # The asphist/mkinstmap/mkexpmap call below will use their own # PFILES directory and so will not collide with other runs of # these tools (or changes to the default ardlib parameter file # since this has been copied over to the new directory) # def getemap((ccd, gridstr)): """Create an ACIS exposure map for the ccd and grid that can be run in parallel. The routine can only accept a single argument as it is to be called via Sherpa's parallel_map routine. As we need multiple values, we therefore use a tuple. This routine is only for didactic purposes since it assumes a number of things, such as file names, that will not be true in general. """ # We copy over the ardlib so that if the bad-pixel file # has been set, its value is used in the new environment. # with rt.new_pfiles_environment(ardlib=True): afile = "asphist{0}.fits".format(ccd) ah = rt.make_tool("asphist") ah(infile="asol1.fits", outfile=afile, evtfile="evt2.fits[ccd_id={0}]".format(ccd), clobber=True) ifile = "imap{0}.fits".format(ccd) mki = rt.make_tool("mkinstmap") mki(outfile=ifile, monoenergy=1.7, pixelgrid="1:1024:1,1:1024:1", obsfile=afile+"[asphist]", detsubsys="ACIS-{0}".format(ccd), maskfile="msk1.fits", pbkfile="pbk0.fits", clobber=True) efile = "emap{0}.fits".format(ccd) mke = rt.make_tool("mkexpmap") mke(asphistfile=afile, outfile=efile, instmapfile=ifile, xygrid=gridstr, clobber=True) # Create the arguments for CCDs 0-3 and 6 grid = "2824.5:6552.5:#932,2868.5:5328.5:#615" ccds = [0, 1, 2, 3, 6] args = [(ccd,grid) for ccd in ccds] # Run getemap on all available CPUs sherpa.utils.parallel_map(getemap, args)
Note that new_pfiles_environment automatically creates the temporary parameter directory, changes the PFILES environment variable, runs your code, and then restores the PFILES variable and deletes the temporary directory for you.
Using the subprocess module
For simple cases, the call() and check_call() routines can be used, but for more complicated cases Popen() should be used.
The following examples assume that the subprocess module has been loaded using
from subprocess import *
Using call() or check_call()
These routines take a list of arguments, with the first one being the tool name and the remaining the command-line arguments, and run the tool. Any screen output will be displayed (not returned as a string), and the routines either return the exit status (call()) or raise an error if a non-zero exit status is returned (check_call).
In the following we show dmstat can be run using these routines:
>>> call(["dmstat", "img.fits", "cen-", "med+"]) PRIMARY[MJy/sr] min: 20.022872925 @: ( 588 665 ) max: 24.589748383 @: ( 299 487 ) mean: 20.123614861 median: 20.115715027 sigma: 0.072704951096 sum: 9553102.5703 good: 474721 null: 0 0 >>> rval = call(["dmstat", "img.fits", "cen+"]) PRIMARY(X, Y)[MJy/sr] min: 20.022872925 @: ( 588 665 ) max: 24.589748383 @: ( 299 487 ) cntrd[log] : ( 344.99795211 345.0132221 ) cntrd[phys]: ( 344.99795211 345.0132221 ) sigma_cntrd: ( 1708.7638456 1708.8441011 ) good: 474721 null: 0
As the tool ran successfully the call returned a 0 as shown on the last line of output from the first call. In the second we store the return value in the variable rval.
The check_call() works as above, but if the return value is not 0 then it raises an error:
>>> rval = check_call(["dmstat", "infile=img.fits", "verbose=0"]) >>> print(rval) 0 >>> rval = check_call(["dmstat", "infile=src.fits", "verbose=0"]) # DMSTAT (CIAO 4.17): could not open infile src.fits. CalledProcessError: Command '['dmstat', 'infile=src.fits', 'verbose=0']' returned non-zero exit status 1
Using Popen()
Using Popen() provides a great deal of flexibility in how you call a command-line tool: the contents of the stdout or stderr channels can be accessed rather than displaying on screen, long-running processes can be handled, and more. The module documentation for the class provides all the details; below we highlight some common use cases.
Running a tool
>>> p = Popen(["dmstat", "img.fits", "centroid=no"]) >>> PRIMARY[MJy/sr] min: 20.022872925 @: ( 588 665 ) max: 24.589748383 @: ( 299 487 ) mean: 20.123614861 sigma: 0.072704951096 sum: 9553102.5703 good: 474721 null: 0 >>> p.wait() 0
Here we start the dmstat tool and the interactive prompt appears. Then the tool finishes and creates screen output, which appears as if it is a command (the exact location of the output depends on how fast the tool is run in the background), and then we use the wait() method to ensure that the tool has finished and get its exit status.
Accessing the screen output
Setting the stdout and stderr arguments to PIPE makes it possible to access these channels, as shown below.
>>> p = Popen(["dmstat", "img.fits", "centroid=no"], stdout=PIPE, stderr=PIPE) >>> (sout, serr) = p.communicate() >>> print(sout) PRIMARY[MJy/sr] min: 20.022872925 @: ( 588 665 ) max: 24.589748383 @: ( 299 487 ) mean: 20.123614861 sigma: 0.072704951096 sum: 9553102.5703 good: 474721 null: 0 >>> print(serr == '') True >>> print(p.returncode) 0
The communicate() returns the data from the stdout and stderr channels and waits for the program to end. In this case there was no output to stderr so the empty string was returned.
If the stderr argument was set to STDOUT instead of PIPE, the first value returned bycommunicate() would contain all the screen output from the tool, and the second argument would be None.