task_schedule.pl - Run a set of tasks at predetermined intervals a la crontab.
task_schedule.pl -config <config_file> [options]
The -config option is mandatory and gives the name of a file containing the task scheduler configuration. This file specifies the jobs to be run, email addresses for alerts, and all other program options. The test config file (t/data/test.config) has further documentation.
Show exactly what task_schedule is doing.
Print error alerts but do not actually send emails.
For development and testing, ignore task cron specifications and instead run each job every <time> seconds.
Run tasks for a maximum of <number> iterations. The default is no limit, but another typical value is 1, in which case task_schedule will run the task once and quit. This makes sense for tasks that run infrequently.
Print this usage and exit.
Task_schedule is a tool to run various jobs at regular intervals in the manner of crontab. This tool runs jobs, captures job output in log files, and makes sure jobs finish in a timely manner. Alerts are sent to an email list if any severe errors occur.
The tool can be run interactively for development purposes, but in a production environment it is intended to be run as a regular cron job scheduled every minute. If task_schedule detects that an instance is already running successfully then it simply quits. Because cron keeps relaunching it, the scheduler restarts automatically after temporary hardware issues or reboots.
To detect another instance of running jobs, task_schedule looks at a "heartbeat" file that is touched every minute. If the file is older than a specified age (typically 200 seconds) then the jobs are not running and task_schedule begins. Note that the heartbeat file is specific to the particular configuration, so there can be multiple sets of jobs running as long as none of the files collide.
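The heartbeat test described above can be sketched in a few lines of Perl. This is only an illustration under stated assumptions, not the code used by task_schedule.pl: the helper names heartbeat_is_fresh and touch_heartbeat are hypothetical, and the age is taken from the file modification time.

```perl
use strict;
use warnings;

# Hypothetical helper (not taken from task_schedule.pl): returns 1 when
# another instance appears to be alive, i.e. the heartbeat file exists and
# was modified no more than $max_age seconds ago. Returns 0 otherwise.
sub heartbeat_is_fresh {
    my ($heartbeat_file, $max_age) = @_;
    my @st = stat $heartbeat_file;
    return 0 unless @st;              # No file: nothing is running
    my $age = time - $st[9];          # Element 9 of stat() is the mtime
    return $age <= $max_age ? 1 : 0;
}

# Hypothetical helper: create the file if needed and set its mtime to now.
sub touch_heartbeat {
    my ($heartbeat_file) = @_;
    open my $fh, '>>', $heartbeat_file
        or die "Cannot touch $heartbeat_file: $!";
    close $fh;
    my $now = time;
    utime $now, $now, $heartbeat_file
        or die "Cannot update time of $heartbeat_file: $!";
    return;
}
```

A new instance would call heartbeat_is_fresh with the configured maximum age and quit when it returns 1. Otherwise it would start running jobs and call touch_heartbeat about once a minute.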
Task_schedule can be shut down gracefully by deleting its heartbeat file. (If you think suddenly finding yourself without a heartbeat could be graceful.)
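One way a scheduler loop could honor both the iteration limit and the delete-to-stop convention is sketched below. It is a minimal illustration, not the actual task_schedule.pl loop: the function run_loop, its named arguments, and the 60 second default pause are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical sketch of a scheduler loop (not the actual task_schedule.pl
# code). run_tasks is a code reference standing in for one pass over the
# configured tasks; sleep is the pause between passes in seconds.
sub run_loop {
    my (%arg) = @_;
    my $heartbeat  = $arg{heartbeat};
    my $iterations = $arg{iterations} // 0;    # Zero => no limit
    my $sleep      = $arg{sleep} // 60;        # Assumed default pause
    my $count      = 0;

    # Create the heartbeat file to mark this instance as running
    open my $fh, '>', $heartbeat or die "Cannot create $heartbeat: $!";
    close $fh;

    while (-e $heartbeat) {                    # Deleted file => stop cleanly
        my $now = time;
        utime $now, $now, $heartbeat;          # Refresh the heartbeat
        $arg{run_tasks}->($count);             # One pass over the tasks
        $count++;
        last if $iterations and $count >= $iterations;
        sleep $sleep if $sleep;
    }
    return $count;                             # Number of completed passes
}
```

In this sketch the heartbeat file is checked only between passes, so a pass that is already in progress finishes before the loop exits. That is what makes the shutdown graceful.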
/proj/sot/ska/bin/task_schedule.pl -config my_task.config
The example config file below illustrates all the available configuration options.
loud              0                        # Run loudly
subject           task_schedule: task      # Subject line of emails
email             1                        # Set to 0 to disable emails (alert, notify)
timeout           1000                     # Default tool timeout
heartbeat_timeout 120                      # Maximum age of heartbeat file (seconds)
iterations        0                        # Maximum task iterations. Zero => no limit.
master_log        watch_cron.log           # Master log (from all tasks) if checking is enabled

# Data files and directories. The *_dir vars can have $ENV{} vars which
# get interpolated. The '/task' would be replaced by the actual task name.

data_dir       $ENV{SKA_DATA}/task         # Data file directory
log_dir        $ENV{SKA_DATA}/task/logs    # Log file directory
bin_dir        $ENV{SKA_SHARE}/task        # Bin dir (optional, see task def'n)
heartbeat      task_sched_heartbeat        # File to ensure sched. running (in data_dir)
heart_attack   task_sched_heart_attack     # File to kill task_schedule nicely
disable_alerts task_sched_disable_alerts   # File to stop alerts from being sent
disable_alerts 0                           # If set to a false value then never disable alerts

## Master File that will kill all $ENV{SKA} task_schedules nicely
## (don't change this in the local config unless you really know what you are doing)
# master_heart_attack $ENV{SKA_DATA}/task_schedule/master_heart_attack

## File that will prevent master_heart_attack from having an effect. This is for
## some jobs (e.g. Replan Central) that should generally just keep on trying
## because they don't update corruptible data products.
# no_master_heart_attack task_sched_no_master_heart_attack
# Email addresses that receive an alert if there was a severe error in
# running jobs (i.e. couldn't start jobs or couldn't open log file).
# Processing errors *within* the jobs are caught with watch_cron_logs

alert first_person@head.cfa.harvard.edu
alert another_person@head.cfa.harvard.edu

# Email addresses that receive notification that task ran. This
# will be sent once per task cron interval, so this list should
# probably be left empty for tasks running every minute!

notify first_person@head.cfa.harvard.edu
notify another_person@head.cfa.harvard.edu

# Optional message to include in the notification email
notify_msg <<NOTIFY
Please see the web page to check on the weather:
http://weather.yahoo.com/forecast/USNH0169_f.html
NOTIFY
# Define task parameters
#  cron: Job repetition specification a la crontab. Defaults to '* * * * *'
#  check_cron: Crontab specification of log (processing) checks via watch_cron_logs.
#        Defaults to '0 0 * * *'.
#  exec: Name of executable. Can have $ENV{} vars which get interpolated.
#        If bin_dir is defined then bin_dir is prepended to non-absolute exec names.
#  log: Name of log. Can have $ENV{} vars which get interpolated.
#        If log is set to '' then no log file will be created (not recommended)
#        If log is not defined it is set to <task_name>.log.
#        If log_dir is defined then log_dir is prepended to non-absolute log names.
#  timeout: Maximum time (seconds) for job before timing out
#  check: Specify reg-ex's to watch for in output from task1. This is done
#        with a call to watch_cron_logs based on this definition. Flagged
#        errors are sent to the alert list. If no <check ..>
#        parameter is given then no checking is done. See
#        watch_cron_logs doc for more info. If an alert is sent then further
#        alerts are disabled until the file specified by disable_alerts
#        in the data directory is removed.

# Typical task setup to run something called task1.pl with argument 20
# and a few checks for error/warning messages. The '*' glob for the
# check file means to look in any file in the log directory.
# This example runs every minute and checks the output logs once a day
# at 1am. At that time the log output is archived in daily.? directories.

<task task1>
  cron       * * * * *
  check_cron 0 1 * * *
  exec       task1.pl 20
  timeout    15
  <check>
    <error>
      #    File         Reg. Expression (case insensitive)
      #  ----------     ---------------------------
         *              use of uninitialized value
         *              warning
         *              (?<!Program caused arithmetic )error
         *              fatal
    </error>
  </check>
</task>
# This has multiple jobs which get run in specified order
# Note the syntax 'exec <number> : cmd', which means that the given command is
# executed only once for each <number> of times the task is executed. In the
# example below, the commands are done once each 1, 2, and 4 minutes, respectively.
# The 'context 1' enables printing of context information in the log file,
# which includes the name and a timestamp for each output.

<task task2>
  cron    * * * * *
  log     task2_with_nonstandard.log
  exec    task1.pl 1
  exec    2 : $ENV{SKA_BIN}/task1.pl 2
  exec    4 : task1.pl 3
  timeout 100
  context 1
</task>

=head1 AUTHOR
Tom Aldcroft (taldcroft@cfa.harvard.edu)

Copyright 2004-2006 Smithsonian Astrophysical Observatory