NAME

task_schedule.pl - Run a set of tasks at predetermined intervals a la crontab.


SYNOPSIS

task_schedule.pl -config <config_file> [options]


OPTIONS

-config <config_file>

This option is mandatory and gives the name of a file containing the task scheduler configuration. This file specifies the jobs to be run, email addresses for alerts, and all other program options. The test config file (t/data/test.config) has further documentation.

-loud

Show exactly what task_schedule is doing.

-no-email

Print error alerts but do not actually send emails.

-fast <time>

For development and testing, ignore task cron specifications and instead run each job every <time> seconds.

-iterations <number>

Run tasks a maximum of <number> iterations. The default is no limit, but another typical value is 1, in which case task_schedule will run the task once and quit. This makes sense for tasks that run infrequently.

--help

Print this usage message and exit.


DESCRIPTION

Task_schedule is a tool to run various jobs at regular intervals in the manner of crontab. This tool runs jobs, captures job output in log files, and makes sure jobs finish in a timely manner. Alerts are sent to an email list if any severe errors occur.

The tool can be run interactively for development purposes, but in a production environment it is intended to be run as a regular cron job which is scheduled every minute. If task_schedule detects that an instance is already successfully running then the new invocation simply quits. Because cron starts a new invocation every minute, the jobs are restarted automatically after temporary hardware issues or reboots.

To detect another instance of running jobs, task_schedule looks at a "heartbeat" file that is touched every minute. If the file is older than a specified age (the heartbeat_timeout configuration option, typically 200 seconds) then the jobs are not running and task_schedule begins. Note that the heartbeat file is specific to the particular configuration, so there can be multiple sets of jobs running as long as none of the files collide.
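
The heartbeat test described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code task_schedule.pl actually uses: the function names heartbeat_is_stale and touch are hypothetical, and treating a missing heartbeat file as "not running" is an assumption.

```python
import os
import tempfile
import time


def heartbeat_is_stale(path, max_age_seconds, now=None):
    """Return True if the heartbeat file is missing or older than max_age_seconds.

    Assumption: a missing file is treated the same as a stale one.
    """
    now = time.time() if now is None else now
    try:
        mtime = os.path.getmtime(path)
    except FileNotFoundError:
        return True
    return (now - mtime) > max_age_seconds


def touch(path):
    """Create the file if needed and set its modification time to now."""
    with open(path, "a"):
        pass
    os.utime(path, None)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as data_dir:
        heartbeat = os.path.join(data_dir, "task_sched_heartbeat")
        print(heartbeat_is_stale(heartbeat, 200))  # no heartbeat file yet
        touch(heartbeat)
        print(heartbeat_is_stale(heartbeat, 200))  # freshly touched
        # Pretend 300 seconds have passed without another touch.
        print(heartbeat_is_stale(heartbeat, 200, now=time.time() + 300))
```

In this sketch a new invocation would start running jobs only when heartbeat_is_stale returns True, and a running instance would call touch once a minute.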

Task_schedule can be shut down gracefully by deleting its heartbeat file. (If you think suddenly finding yourself without a heartbeat could be graceful.)


EXAMPLE

 /proj/sot/ska/bin/task_schedule.pl -config my_task.config


EXAMPLE CONFIG FILE

The example config file below illustrates all the available configuration options.

 loud              0                  # Run loudly
 subject           task_schedule: task  # Subject line of emails
 email             1                  # Set to 0 to disable emails (alert, notify)
 timeout           1000               # Default tool timeout
 heartbeat_timeout 120                # Maximum age of heartbeat file (seconds)
 iterations        0                  # Maximum task iterations.  Zero => no limit.
 master_log        watch_cron.log     # Master log (from all tasks) if checking is enabled
 
 # Data files and directories.  The *_dir vars can have $ENV{} vars which
 # get interpolated.  The '/task' would be replaced by the actual task name.
 data_dir       $ENV{SKA_DATA}/task          # Data file directory
 log_dir        $ENV{SKA_DATA}/task/logs     # Log file directory
 bin_dir        $ENV{SKA_SHARE}/task         # Bin dir (optional, see task def'n)
 heartbeat      task_sched_heartbeat         # File to ensure sched. running (in data_dir)
 heart_attack   task_sched_heart_attack      # File to kill task_schedule nicely
 disable_alerts task_sched_disable_alerts    # File to stop alerts from being sent
 disable_alerts 0                            # If set to a false value then never disable alerts
 ## Master File that will kill all $ENV{SKA} task_schedules nicely 
 ## (don't change this in the local config unless you really know what you are doing)
 # master_heart_attack $ENV{SKA_DATA}/task_schedule/master_heart_attack
 ## File that will prevent master_heart_attack from having an effect.  This is for
 ## some jobs (e.g. Replan Central) that should generally just keep on trying
 ## because they don't update corruptible data products.
 # no_master_heart_attack task_sched_no_master_heart_attack
 # Email addresses that receive an alert if there was a severe error in
 # running jobs (i.e. couldn't start jobs or couldn't open log file).
 # Processing errors *within* the jobs are caught with watch_cron_logs
 alert       first_person@head.cfa.harvard.edu
 alert       another_person@head.cfa.harvard.edu
 # Email addresses that receive notification that task ran.  This
 # will be sent once per task cron interval, so this list should
 # probably be left empty for tasks running every minute!
 notify      first_person@head.cfa.harvard.edu
 notify      another_person@head.cfa.harvard.edu
 # Optional message to include in the notification email
 notify_msg <<NOTIFY
  Please see the web page to check on the weather:
  http://weather.yahoo.com/forecast/USNH0169_f.html
 NOTIFY
 # Define task parameters
 #  cron: Job repetition specification ala crontab.  Defaults to '* * * * *'
 #  check_cron: Crontab specification of log (processing) checks via watch_cron_logs.
 #        Defaults to '0 0 * * *'.  
 #  exec: Name of executable.  Can have $ENV{} vars which get interpolated.  
 #        If bin_dir is defined then bin_dir is prepended to non-absolute exec names.
 #  log: Name of log.  Can have $ENV{} vars which get interpolated.
 #        If log is set to '' then no log file will be created (not recommended)
 #        If log is not defined it is set to <task_name>.log.
 #        If log_dir is defined then log_dir is prepended to non-absolute log names.
 #  timeout: Maximum time (seconds) for job before timing out
 #  check: Specify reg-ex's to watch for in output from task1.  This is done
 #         with a call to watch_cron_logs based on this definition.  Flagged
 #         errors are sent to the alert list.  If no <check ..> 
 #         parameter is given then no checking is done.  See
 #         watch_cron_logs doc for more info.  If an alert is sent then further
 #         alerts are disabled until the file specified by disable_alerts
 #         in the data directory is removed.
 # Typical task setup to run something called task1.pl with argument 20
 # and a few checks for error/warning messages.  The '*' glob for the
 # check file means to look in any file in the log directory.
 # This example runs every minute and checks the output logs once a day
 # at 1am.  At that time the log output is archived in daily.? directories.
  <task task1>
        cron       * * * * *
        check_cron 0 1 * * *
        exec task1.pl 20
        timeout 15
        <check>
           <error>
             #    File          Reg. Expression (case insensitive)
             #  ----------      ---------------------------
                *               use of uninitialized value
                *               warning
                *               (?<!Program caused arithmetic )error
                *               fatal
           </error>
       </check>
  </task>
 # This has multiple jobs which get run in specified order
 # Note the syntax 'exec <number> : cmd', which means that the given command is
 # executed only once for each <number> of times the task is executed.  In the
 # example below, the commands are done once each 1, 2, and 4 minutes, respectively.
 # The 'context 1' setting enables printing of context information in the log file
 # which includes the name and a timestamp for each output.
 <task task2>
       cron * * * * *
       log  task2_with_nonstandard.log
       exec task1.pl 1
       exec 2 : $ENV{SKA_BIN}/task1.pl 2
       exec 4 : task1.pl 3
       timeout 100
       context 1
 </task>
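
The 'exec <number> : cmd' syntax used in task2 can be illustrated with a short Python sketch. It is not taken from task_schedule.pl: the names parse_exec and commands_for_iteration are hypothetical, and the choice of which iteration within each cycle runs a command (here, iterations counted from zero and divisible by <number>) is an assumption, since the text above only says the command runs once per <number> executions of the task.

```python
import re

# Optional "<number> :" prefix followed by the command text.
EXEC_RE = re.compile(r"^\s*(?:(\d+)\s*:\s*)?(.+?)\s*$")


def parse_exec(value):
    """Split an exec value into (interval, command); interval defaults to 1."""
    match = EXEC_RE.match(value)
    if match is None:
        raise ValueError("empty exec value")
    interval = int(match.group(1)) if match.group(1) else 1
    return interval, match.group(2)


def commands_for_iteration(exec_values, iteration):
    """Return the commands due on a given task iteration (counted from 0).

    Assumption: a command with interval N runs when iteration % N == 0.
    """
    due = []
    for value in exec_values:
        interval, command = parse_exec(value)
        if iteration % interval == 0:
            due.append(command)
    return due


if __name__ == "__main__":
    # The three exec lines from the task2 example above.
    task2 = [
        "task1.pl 1",
        "2 : $ENV{SKA_BIN}/task1.pl 2",
        "4 : task1.pl 3",
    ]
    for i in range(5):
        print(i, commands_for_iteration(task2, i))
```

With the every-minute cron line of task2, this gives the first command every minute, the second every 2 minutes, and the third every 4 minutes, in the order listed.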
  

AUTHOR

Tom Aldcroft (taldcroft@cfa.harvard.edu) Copyright 2004-2006 Smithsonian Astrophysical Observatory