ACIS Real Time Web Page Operation




Machines are located in different buildings to minimize the risk of multiple failures because one building loses its power or network connection.

Each machine is hosted on a different web server, so that the failure of one server does not affect other instantiations of the R/T web pages. acisway and ishmael have their own web server on the machines themselves.

The web server assignments are:

  1. acis60-v
    URL: hea-www.cfa.harvard.edu/~acisweb/htdocs/acis/RT-ACIS60-V
    SERVER DIRECTORY: /data/wdocs/acisweb/htdocs/acis/RT-ACIS60-V/


  2. aciscdp-v
    URL: cxc.cfa.harvard.edu/acis/RT-ACISCDP-V
    SERVER DIRECTORY: /proj/web-cxc-dmz/htdocs/acis/RT-ACISCDP-V/


  3. acisway
    URL: https://acisway.cfa.harvard.edu/acis/RT-ACISWAY/acis-mean.html
    SERVER DIRECTORY: /var/www/html/acis/RT-ACISWAY


  4. ishmael
    URL: https://ishmael.cfa.harvard.edu/acis/RT-ISHMAEL/acis-mean.html
    SERVER DIRECTORY: /var/www/html/acis/RT-ISHMAEL


The MSID data plots available on the web pages come from EHS Data Dumps. Keep in mind that only specific machines have access to the data dumps; r2d2-v and colossus-v are two of them. So when the web page software, on any machine, needs to generate plots from dump data, the data must be extracted on one of the few machines that have access to it. This dependency cannot be avoided.

After logging on to r2d2-v, you can see the most recent dump data in the directory:

/dsops/GOT/input

It is for this reason that dump data is extracted by a process on r2d2-v. If you SSH to r2d2-v as acisdude and execute crontab -l, you will find this:

r2d2-v.acisdude:~[101]> crontab -l
1 * * * * /export/acis-flight/acis/bin/PLOTTERS/ENGPLOTTER/CODE/CreatePlotterTracelog.csh 2>&1
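
In standard cron syntax, the schedule fields "1 * * * *" in the entry above mean "minute 1 of every hour". If you want to know when the next extraction is due, the following minimal Python sketch works it out. The function name is made up for illustration and is not part of the R/T software.

```python
from datetime import datetime, timedelta

def next_hourly_run(now, minute=1):
    """Return the next time a 'MINUTE * * * *' cron entry fires after `now`."""
    # Candidate firing time within the current hour.
    candidate = now.replace(minute=minute, second=0, microsecond=0)
    # If that time has already passed (or is right now), move to the next hour.
    if candidate <= now:
        candidate += timedelta(hours=1)
    return candidate

print(next_hourly_run(datetime(2022, 9, 16, 12, 30)))  # 2022-09-16 13:01:00
```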



For the rest of this document, whenever operations are carried out on a particular machine (acis60-v, aciscdp-v, acisway or ishmael), it is a given that you have:



The heartbeat emails are periodically sent from the web pages just to give you a warm fuzzy feeling that they are still operating, and that the email system is operative. The emails start at 9pm and the last one arrives at 8am the next morning. This is to let the on-call person know that there was not a problem with the network or emails during the usual bedtime hours. Should you not get those emails, it's time for the on-call person to investigate the situation.

There are at least 2 reasons why you have not gotten the emails:


With regard to a problem with the emails, all R/T Web page email action occurs through the HEAD server. In the past, the HEAD server has failed and ACISDUDE did not receive any emails. Since then, Systems have taken steps to make the HEAD server system more robust. Still, be aware that the email system can fail. You can test this by sending an email to acisdude or acisweb.

Should you find that the HEAD email server is broken, you can always send emails directly to the acis dudes by using their individual email addresses.

Also keep in mind that emails sent to an individual's email address will get through even if the HEAD server is down.





Starting Real-Time Monitoring

The Real Time Web pages all run off of CRON jobs. The location of the latest and greatest operational cron file is:

/export/acis-flight/LINUX_crontab_static.txt

on each of the 4 virtual machines. Each CRON file is slightly different.


Here are the basic steps necessary to start up a Real Time Web page. This example uses acis60-v as the target machine but the steps are the same for any of the four:


Within 5 minutes, you will receive a CRON email informing you that ACORN is up and running.

NOTE: If a computer is rebooted, the cron jobs will start up automatically. You needn't start them.

Added step when the Operating System is upgraded

When the computer's operating system has been updated, acorn - the real time telemetry decomm tool - will not run. This is because it attempts to write a pid into a file located at: /var/tmp/acisweb.pid, and that file does not exist after the system has been upgraded.

The steps you must take once the computer has been rebooted are:
  1. Log onto the computer as acisweb
  2. cd to: /var/tmp
  3. At the prompt, type: touch acisweb.pid
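
If you would rather script the steps above, here is a minimal Python sketch that does the same thing as the touch command. The function name is made up for illustration; the only thing taken from this document is the path /var/tmp/acisweb.pid. Run it as acisweb.

```python
from pathlib import Path

def ensure_pid_file(path="/var/tmp/acisweb.pid"):
    """Create an empty pid file if it is missing.

    Returns True if the file had to be created, False if it already existed.
    """
    pid_file = Path(path)
    existed = pid_file.exists()
    # Like the shell 'touch': creates the file if absent; if it is present,
    # only the modification time changes and the contents are left alone.
    pid_file.touch(exist_ok=True)
    return not existed
```

Calling ensure_pid_file() with no argument acts on the path given above.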

NOTE: Once acorn is running, it's very likely that RED ALERTS will be sent out to the ACIS team because some MSIDs will be in the red. This is a FALSE alert. The values displayed on the web pages are averages of multiple readings. When the system is rebooted, the readings are wiped out and you can get bogus red values for one or more MSIDs.

So be ready to send out an instant FALSE ALARM email.

Copying Up-To-Date Data Files Prior to Startup

It is not absolutely necessary to update some or all of the data files - the Web pages will operate with stale data. However, information will not be up to date until the orbit ends (e.g. Flux, Fluence) and until the next Comm is complete (e.g. voltages, temperatures).

There is a deeper discussion of the directory structure in a section below. However for the purposes of startup, you need to work with 2 directories:

/export/acis-flight/FLU-MON
/export/acis-flight/acis/bin

What you should do is cd into each of these directories, and then sftp up-to-date files (if they exist) from one of the other operating machines.

/export/acis-flight/FLU-MON

Here are the files you need to update in this directory:
  • ACE-fluence.arc
  • ACIS-fluence.arc
  • ACE-flux.dat
  • ACIS-FLUENCE.dat
  • all.dat
  • current.dat
  • cxodirect.dat
  • falert.dat
  • fluace.dat
  • protons.dat
  • DITHHIST.dat
  • FPHIST.dat
  • FPHIST-2001.dat
  • GRATHIST-2001.dat
  • OBSHIST.dat
  • TLMHIST.dat
  • TSCHIST.dat
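
To see at a glance which of the files listed above are absent from a directory (on the machine you are reviving, or in a copy fetched from another machine), a minimal Python sketch follows. The file names are the ones in the list above; the function name is made up for illustration.

```python
from pathlib import Path

# File names taken from the list above.
FLU_MON_FILES = [
    "ACE-fluence.arc", "ACIS-fluence.arc", "ACE-flux.dat", "ACIS-FLUENCE.dat",
    "all.dat", "current.dat", "cxodirect.dat", "falert.dat", "fluace.dat",
    "protons.dat", "DITHHIST.dat", "FPHIST.dat", "FPHIST-2001.dat",
    "GRATHIST-2001.dat", "OBSHIST.dat", "TLMHIST.dat", "TSCHIST.dat",
]

def missing_files(directory, names=FLU_MON_FILES):
    """Return the names from `names` that are not regular files in `directory`."""
    base = Path(directory)
    return [name for name in names if not (base / name).is_file()]
```

For example, missing_files("/export/acis-flight/FLU-MON") returns an empty list when every file is in place.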


"gephem.dat" is vital but it is created by a CRON task.


/export/acis-flight/acis/bin

The files you need to update in this directory are all the .tl files which are usually generated by ACORN. Chances are, when you are trying to revive a web page, you will be between Comms. Therefore to get the latest values for temperatures and voltages, you want to sftp all the latest .tl files from another instantiation of the R/T Web pages.

The .tl files you need are:
  • 1CRBT-anomaly-27July2002.tl
  • 1crat-anomaly-27July2002.tl
  • acisEPHIN_00511039900.17.tl
  • acisFORMAT_00511207640.44.tl
  • acisHRCmcpshield_00511156813.38.tl
  • acisHWLEDs_00511207377.52.tl
  • acisISIM_00511141896.98.tl
  • acisLED1_00511039967.82.tl
  • acisLED2_00511039935.28.tl
  • acisRADMON_00511207259.14.tl
  • acisSIMFOCUS_00511207414.68.tl
  • acisSIMMOTOR_00511039876.09.tl
  • acisSWLEDs_00511207377.52.tl
  • acisda_00511207597.39.tl
  • acisdea_00511039848.41.tl
  • acisdpa_00511040025.99.tl
  • acismech_00511039935.54.tl
  • acisother_00511039784.09.tl
  • acistempa_00511141579.23.tl
  • acistempb_00511141546.68.tl

I strongly recommend that you delete any .tl files that exist on the machine you are trying to revive. Then sftp all the .tl files from one of the other operating machines.

However, it's possible (though not likely) that no machine has the latest and greatest .tl files. In this case, the only thing you can really do is wait until the next Comm for new .tl files to be created.

If you are concerned about some engineering values (voltages, currents, temperatures etc.), and don't want to sweat it out until the next Comm, you can always run a Ska fetch on the latest data and plot the MSID(s) you are interested in.

Another alternative is to extract the latest dump data, run ACORN on the dump files, and copy the resultant .tl files into the web page(s).


Stopping real-time monitoring

Clearing out the Web Page is a two-step process:

  1. End the CRON jobs with "crontab -r".

  2. Kill all the processes that are active.

Some of the CRON jobs will restart a process if you kill the process first; acorn is an example of that. So be sure you remove the cron jobs first. Then you need to stop several running processes.
These include:

acorn catnrt getnrt pmon

You need only stop those processes associated with the R/T web pages.



  • Doing a 'ps -fu acisdude' will give you a full listing of the current acisdude/acisweb processes running on the machine in question.
  • This should look like:
         UID   PID  PPID   C    STIME TTY         TIME CMD
    acisdude 16334     1   0   May 27 ?           0:00 /bin/sh /export/acisops/pmon_luke/catnrt
    acisdude 16343 16334   0   May 27 ?           0:08 /usr/bin/ssh luke setenv TERM xterm ; /export/acisops/PMON/pmon -B -b -p -P /ex
    acisdude 14510     1   0   May 27 ?           1:59 /export/acisops/real-time/back-up/acorn-1.33/acorn -u 5979 -C /export/acisops/r
    acisdude 16362 16360   0   May 27 ?           0:00 tcsh -c setenv TERM xterm ; /export/acisops/PMON/pmon -B -b -p -P /export/aciso
    acisdude 16368 16362   0   May 27 ?           0:22 /export/acisops/PMON/pmon -B -b -p -P /export/acisops/PMON/pmon.eproc -h /expor
    acisdude 16344 16343   0   May 27 ?           0:35 perl /export/acisops/pmon_luke/sv-getnrt -u -O -R udp://luke:5978
    acisdude 16360 16345   0   May 27 ?           0:09 /usr/lib/ssh/sshd
    acisdude 28301 19590   0 11:57:51 pts/7       0:00 /usr/bin/ps -fu acisdude
    acisdude 19590 19588   0 09:11:44 pts/7       0:00 -tcsh
           
  • In the above example, you would want to 'kill -9' pid's: 16334, 14510, 16368, 16344
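
Picking the right PIDs out of the listing by eye is error prone, so here is a minimal Python sketch that does it from the text of the 'ps -fu' output. It only lists PIDs; it kills nothing. The matching rule is an assumption made for this example: a line counts when the program being run (or the script handed to sh or perl) has acorn, catnrt, getnrt or pmon in its file name. Wrapper processes such as ssh or tcsh that merely mention pmon in their arguments are skipped.

```python
import os
import re

RT_NAMES = ("acorn", "catnrt", "getnrt", "pmon")
# Assumption: for these interpreters the first argument is the real script.
INTERPRETERS = {"sh", "perl"}

# The TIME column (e.g. "0:08") followed by the command text.
TIME_THEN_CMD = re.compile(r"\s\d+:\d{2}\s+(\S.*)$")

def rt_pids(ps_output):
    """Return the PIDs in 'ps -fu <user>' output that run an R/T program."""
    pids = []
    for line in ps_output.splitlines():
        match = TIME_THEN_CMD.search(line)
        if not match:
            continue  # header or blank line
        words = match.group(1).split()
        program = os.path.basename(words[0])
        if program in INTERPRETERS and len(words) > 1:
            program = os.path.basename(words[1])
        if any(name in program for name in RT_NAMES):
            pids.append(int(line.split()[1]))  # PID is the second column
    return pids
```

Fed the listing shown above, this returns 16334, 14510, 16368 and 16344, the same PIDs called out in the example.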




How to handle system hangups.





Are We Getting Packets?

Suppose you are wondering if the link between one or more of the ACIS Ops R/T machines:

  1. acisway
  2. aciscdp-v
  3. acis60-v
  4. ishmael

...and the COG is working. Are we getting real time Telemetry packets from the COG?

Reminder: Each of those 4 machines is listening to the GOT via 2 datagram sockets, on port numbers 15000 (R/T web page) and 16000 (PMON).

Each machine has its own ports 15000 and 16000. That's why you have to log on to the specific machine you want to test.

It's important to remember that only one process can access each socket at a time. So when you want to test a socket you have to shut down the process(es) presently using that socket. You must also prevent the cron jobs from trying to restart those processes. For example, this cron job:

/export/acis-flight/acis/bin/acis-acorn-check.pl

...checks every 5 minutes to see if an acorn process is running. If it is not, it activates a new process. If you want to test the socket that feeds telemetry data to the acorn process (socket number 15000), on a particular machine, then you need to kill this cron job on that machine.

The simplest way is to kill all cron tasks with:

crontab -r

Here is how you test a socket:

  1. Log on to the machine in question as acisweb.
  2. Stop all Cron jobs by issuing "crontab -r"
  3. Kill the appropriate process(es) for the socket number under investigation. If you want to test the feed to the R/T web pages, that is socket number 15000, so you would kill the acorn process which is running.

    If you want to test socket number 16000, kill the PMON CRON processes on the node in question.

    To kill the process(es), you first have to ascertain their PIDs. Execute a "ps -u acisdude -f"; note all processes relevant to the R/T web pages and PMON (e.g. getnrt, catnrt, pmon, or acorn) and kill them (kill -9 pid).

  4. cd to /export/acis-flight/UTILITIES
  5. Execute the following command: "Port_Listener "

    For example, to test port number 15000 (15000 is the default), type:

    Port_Listener

    To test port number 16000 or any other port number, type:

    Port_Listener -p 16000

  6. At the start, running on aciscdp-v, what you will see is this:

        aciscdp-v.acisdude:UTILITIES[122]> Port_Listener

        I am:  aciscdp-v
        Port Number is:  15000

    If you forgot to kill the process using that port, you will get an error message which ends with this line:

               socket.error: [Errno 98] Address already in use

    If no data is coming across the port, you will see this:

               I am:  aciscdp-v.cfa.harvard.edu
               Port Number:  15000

               Opening file:  rawUDPdata15000aciscdp-v.cfa.harvard.edu81619RTcomm.1252

               Entering WAITING loop:  time.struct_time(tm_year=2019, tm_mon=8, tm_mday=16, tm_hour=12, tm_min=52, tm_sec=1, tm_wday=4, tm_yday=228, tm_isdst=1)

    ...and it will sit there like that until you Control-C out of it.

    - Note that it will store any data it gets in a file.

  7. If the connection to the GOT is good and data is being transferred, you will see a series of report lines telling you the size of each packet:

               I am:  aciscdp-v.cfa.harvard.edu
               Port Number:  15000

               Opening file:  rawUDPdata15000aciscdp-v.cfa.harvard.edu81619RTcomm.1252

               Entering WAITING loop:  time.struct_time(tm_year=2019, tm_mon=8, tm_mday=16, tm_hour=12, tm_min=52, tm_sec=1, tm_wday=4, tm_yday=228, tm_isdst=1)
               Killing time - len data:  58  hour:  12  min   25  sec   15
               Killing time - len data:  58  hour:  12  min   25  sec   16
               Killing time - len data:  58  hour:  12  min   25  sec   17
               ...................

    If we are NOT in real time Comm, the packet size will be 58. This is the heartbeat packet that the COG always sends out when we are not in Comm. So if you see these lines, you know the connection is good.

    It will go on like that until you hit Control-C.

  8. If we ARE in Comm, you will see a packet size of 1024:

               Acq loop, data length: len data: 1024 hour:  12  min   25  sec   15
               Acq loop, data length: len data: 1024 hour:  12  min   25  sec   16
               Acq loop, data length: len data: 1024 hour:  12  min   25  sec   17
               ...................

    It will go on like that until you hit Control-C.

  9. If you time it carefully, you can watch the printouts switch from a packet size of 58 to 1024 (when the telemetry stream begins), or from 1024 to 58 (when the telemetry stream ends):

    Entering WAITING loop:  time.struct_time(tm_year=2014, tm_mon=6, tm_mday=20, tm_hour=9, tm_min=50, tm_sec=57, tm_wday=4, tm_yday=171, tm_isdst=1)
    Heartbeat Packet size:  58
    Heartbeat Packet size:  58
    Heartbeat Packet size:  1024
    Heartbeat Packet size:  1024
    ...................

  10. If you see either the 58 or 1024 packet size (as appropriate), you know the link is OK; you get a new line roughly once per second. If the link is not OK, the display will simply hang.
  11. And that's all there is to it. When done, be sure to Control-C out of the program, AND be sure to restart the cron jobs. To do that, cd to:

    cd /export/acis-flight

    ...And then start the cron jobs:

    crontab LINUX_crontab_static.txt
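
For reference, the kind of test described above can be sketched in a few lines of Python. This is NOT the source of Port_Listener; it is a minimal illustration that binds a UDP (datagram) socket and reports the size of each packet received. The function names are made up for this example. The packet sizes (58 for heartbeat, 1024 for telemetry) and the default port come from the text above.

```python
import socket

HEARTBEAT_SIZE = 58    # packet size seen when we are not in Comm
TELEMETRY_SIZE = 1024  # packet size seen when we are in Comm

def describe_packet(size):
    """Translate a packet size into what it tells you about the link."""
    if size == HEARTBEAT_SIZE:
        return "heartbeat (not in Comm)"
    if size == TELEMETRY_SIZE:
        return "telemetry (in Comm)"
    return "unexpected size"

def open_listener(port=15000, host=""):
    """Bind a UDP socket to `port`; raises OSError if the bind fails."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Fails with "Address already in use" if another process holds the port.
        sock.bind((host, port))
    except OSError:
        sock.close()
        raise
    return sock

def read_packets(sock, max_packets):
    """Yield (size, description) for each of the next `max_packets` datagrams."""
    for _ in range(max_packets):
        data, _sender = sock.recvfrom(2048)  # blocks until a packet arrives
        yield len(data), describe_packet(len(data))
```

As with the real tool, a blocked read_packets() with nothing printed means no packets are arriving on that port.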





Stale Tracelogs and What if We Are NOT Getting Packets?

Assuming you got the email messages that the Tracelog files are stale, you have to determine why that is so.

First off, if anyone anywhere changes anything on the network that supports the telemetry data streams going to the R/T Web pages and PMON, the data stream will become fouled, and ACORN cannot handle that. Therefore, you must always ask George Leussis or the GOT to reset the COG telemetry streams. Send him an email with content that looks like this:

----------------------------
Hi George,

Would you please reset the telemetry streams on the following machines:

acisway
aciscdp-v
acis60-v
ishmael

All four use Port Numbers 15000 and 16000
---------------------------

He understands what needs to be done, but it is the job of the On Call person to remind him. Once he resets the telemetry streams, you should get data at the next Comm and the tracelog files will be updated. If you want to check the packet stream, follow the procedure in the section:

Are We Getting Packets?


If you don't get updating .tl files, then kill the acorn process. The CRON jobs will restart acorn.
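
To check for yourself whether the .tl files are updating, you can look at their modification times. Here is a minimal Python sketch. The function names are made up for this example, and the age at which a tracelog counts as "stale" is not stated in this document, so the threshold is a parameter you must choose.

```python
import time
from pathlib import Path

def newest_tl_age(directory, now=None):
    """Return the age in seconds of the most recently modified .tl file.

    Returns None if the directory holds no .tl files at all.
    """
    mtimes = [path.stat().st_mtime for path in Path(directory).glob("*.tl")]
    if not mtimes:
        return None
    now = time.time() if now is None else now
    return now - max(mtimes)

def tracelogs_stale(directory, max_age_seconds, now=None):
    """True if there are no .tl files or the newest is older than the limit."""
    age = newest_tl_age(directory, now)
    return age is None or age > max_age_seconds
```

For example, tracelogs_stale("/export/acis-flight/acis/bin", 3600) asks whether the newest .tl file is more than an hour old; the one-hour figure is only an illustration.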



Commonality and GetUDP.pl.

In order to achieve commonality of code, such that one set of code can be used on all the different computers, several factors had to fall into place:

  • The code had to run on identical machine types
  • The code had to run in the same directory structure on all the machines.
  • A means had to be found to inform the software what UDP port to use to obtain the data from ACORN and PMON.

All of the web page (and PMON) machines are Red Hat Linux virtual machines. The machines acis60-v, acisway, ishmael and aciscdp-v are under total ACIS Ops control.

NOTE: It is up to ACIS Ops to ask for periodic updates to the acis[60,occ,cdp]-v machines. Systems will not do it automatically.

ACIS Ops was able to build a directory structure under /export/acis-flight to contain all the R/T Web page (and PMON) programs and data.

One Perl script called GetUDP.pl is located in /export/acis-flight/UTILITIES. The output of this program is:

  1. Acorn UDP Port number
  2. Pmon UDP Port Number
  3. Base webserver directory for that virtual machine
  4. Base webserver directory for the R/T web page plot directory

As of now, the last two directories differ from machine to machine, but on any one machine the two are the same. In other words,

"cxc.cfa.harvard.edu/acis/RT-ACISCDP-V"

is both the URL directory for aciscdp-v and the base directory for the plots for aciscdp-v. For acis60-v the two directories are:

"hea-www.cfa.harvard.edu/~acisweb/htdocs/acis/RT-ACIS60-V".

After the switch to the GOT feeds (and the end of MultiMon), all the web page machines could use the exact same UDP ports for ACORN. Thus all the machines use 15000 for ACORN and 16000 for PMON. This handy feature means that nothing need be adjusted by us to run on the BUOCC.
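
To make the idea concrete, here is a minimal Python sketch of a per-machine lookup that returns the same four items. It is NOT a translation of GetUDP.pl, whose internals are not shown in this document; the table layout and function name are assumptions for illustration. The port numbers and server directories are the ones listed earlier on this page.

```python
import socket

ACORN_PORT = 15000
PMON_PORT = 16000

# Web server directories as listed in the web server assignments above.
WEB_DIRS = {
    "acis60-v": "/data/wdocs/acisweb/htdocs/acis/RT-ACIS60-V",
    "aciscdp-v": "/proj/web-cxc-dmz/htdocs/acis/RT-ACISCDP-V",
    "acisway": "/var/www/html/acis/RT-ACISWAY",
    "ishmael": "/var/www/html/acis/RT-ISHMAEL",
}

def machine_config(hostname=None):
    """Return (acorn_port, pmon_port, web_dir, plot_dir) for an R/T machine."""
    if hostname is None:
        hostname = socket.gethostname()
    # Accept either "aciscdp-v" or "aciscdp-v.cfa.harvard.edu".
    short_name = hostname.split(".")[0].lower()
    web_dir = WEB_DIRS[short_name]  # KeyError for a machine not in the table
    # Per the text above, the plot base directory equals the web base directory.
    return ACORN_PORT, PMON_PORT, web_dir, web_dir
```

The point of the design is that every other script asks one place for these values, so the same code can run unchanged on all the machines.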

Identical Machines and their locations.

Same Directory structure.

The Limit Table Complexity.

Near the top of each Real Time Web page is a link called:

"Check the ACIS Limits Table"

Clicking on this link shows you the latest known limits for ACIS MSID's.

Each machine contains the text, html and xls files on that machine's web server, in that machine's URL base directory. For example, for aciscdp-v, the URL base directory is:

/proj/web-cxc-dmz/htdocs/acis/RT-ACISCDP-V

and that directory contains:

  1. limits.html
  2. limits_v1.17.html
  3. limits_v1.17.txt
  4. limits_web.htm
  5. header.html


limits.html is a soft link to limits_v1.17.html

header.html has the web server directory hard coded, so header.html cannot presently be copied between the R/T web page machines.

All the limits_v1.17* files are identical across all the R/T Web Page machines.

Port Selection.

All four machines use port 15000 for acorn and port 16000 for PMON.





Directory Structure and Script Locations

[Figure: acis-rtsw-tree5.gif (directory structure and script locations)]




OCC Backup Web Page Operations


aciscdp-v is our backup machine. Under normal operations all data feeds come from the OCC. But under backup operations, that is not the case. The change is transparent to aciscdp-v - you needn't do anything! This magic is accomplished as follows:

  • If the OCC goes completely down, operations are moved to the CDP - where aciscdp-v is located.

  • An alternative data feed, at the CDP, is fired up which gets data from the DSN and provides it to any backup application that requires it.

  • It has been arranged that the backup feed will use the same UDP port numbers that the aciscdp-v R/T web page and PMON use under normal operations. The port number stays the same; the source of the data changes under back up operations. Therefore the change is transparent to aciscdp-v.


As usual, should the switch to BUOCC (Back Up OCC) occur, monitor the effect on aciscdp-v.




Annoying CRON messages that occur from time to time.


From time to time, various CRON messages appear which are annoying, usually out of your control, and fairly innocuous. Generally they occur because of system and/or network glitches. If they occur once or twice or even three times but then stop, you can ignore them.
If they persist, you may have to look into things. It could be that one of the machines has lost its network connection.

  • Subject: Cron /export/acis-flight/acis/bin/PLOTTERS/ENGPLOTTER/CODE/CreatePlotterTracelog.csh 2>&1
    Body of the email: ERROR: Cannot copy

    - This one can occur because the acisdude area on r2d2-v is full. Do a df -k or quota -v to see if it is. If you are maxed out, either free up some space, ask Systems for more, or both.


  • Subject: Cron /export/acis-flight/FLU-MON/ace-fluxLWP.pl 2>&1
    Body of the email: Couldn't get http://www.swpc.noaa.gov/ftpdir/lists/ace/ace_epam_5m.txt at /export/acis-flight/FLU-MON/ace-fluxLWP.pl line 53.

    - This one is due to some issue at the server end. You will find that the other R/T web pages are able to get to the web page without any trouble.
    - Also, do not be misled if the message names one machine such as acis60-v; it can happen on any of the machines.


  • Subject: Cron /export/acis-flight/acis/bin/PLOTTERS/ENGPLOTTER/CODE/CreatePlotterTracelog.csh 2>&1
    Body of the email: rm: cannot remove `/export/acis-flight/acis/bin/PLOTTERS/ENGPLOTTER/CODE/LOGFILES/systemlog': No such file or directory

    Just ignore.
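
For the "Cannot copy" case above, the df check can also be done from Python. The following is a minimal sketch. The function names and the 95 percent threshold are assumptions for illustration. It reports how full the file system holding a given path is, much as df does, and does NOT take per-user quotas into account, so still run quota -v.

```python
import shutil

def percent_used(path):
    """Return how full the file system holding `path` is, in percent."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0  # guard against pseudo file systems that report no size
    return 100.0 * usage.used / usage.total

def is_nearly_full(path, threshold_percent=95.0):
    """True if the file system holding `path` is at or above the threshold."""
    return percent_used(path) >= threshold_percent
```

Point it at the directory that the failing copy writes into.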









Reasons for the 2015 Virtual Machine Modifications

ACIS R/T web page operations can be compromised by other processes running on the virtual machine which have nothing to do with ACIS or the ACIS Real/Time (R/T) web pages. This could be because of disk mounts and/or process hangs, for example. ACIS Ops decided that three new virtual machines should be created, and that these three machines should be as isolated from other HEAD operations as is possible.

No operations other than the R/T web pages and a PMON instantiation will be executed on our 4 virtual or real machines.

One of the impacts of this isolation is that a new username/account must be created which lives only on these four machines and nowhere else. This is why we do not log on to these machines as "acisdude". Were we to log on as "acisdude" a HEAD disk (which holds the acisdude home directory) would then be mounted on the new virtual machines. Any other process within HEAD that uses that disk could interfere with the ACIS R/T web pages.

Consequently, a new account - "acisweb" - was created.





Limits Checking and Limit File Modifications

The web page HTML that you see when you look at an R/T web page is created by a program called:

acis-www.pl

It's located in:

/export/acis-flight/acis/bin

Data to insert in the displays are obtained from Tracelog files produced by ACORN. These values are compared to limits specified for each MSID to determine if they are Nominal, in the Yellow Low/High range, or the Red Low/High range. The entries on the web page are color coded: white for Nominal, yellow for the Yellow ranges, and red for the Red ranges.

In the past, the limit values were hard coded into acis-www.pl. As these limits are also used by other programs, a limits data file was created. Now acis-www.pl reads the limits for the MSIDs out of this file and uses those values to make the comparisons. The data file exists on all 4 machines. So when you edit one, you have to sftp a copy to the same location on the others.

The files affected are:

/export/acis-flight/acis/bin
--------------------------------
acis-www.pl
AlertStatus.pl


/export/acis-flight/UTILITIES
-----------------------------------
ReadLimitsFile.pl


The limits file itself is located at:
-------------------------------------
/export/acis-flight/acis/bin/PLOTTERS/ENGPLOTTER/CODE/engplot_limits
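
The comparison against limits described in this section can be illustrated with a minimal Python sketch. This is NOT the code in acis-www.pl or ReadLimitsFile.pl, and it does not read engplot_limits, whose format is not described here. The function name is made up, and treating a value that sits exactly on a limit as still inside that limit is an assumption. The states and colors are the ones given above.

```python
# Display color for each limit state, as described in the text above.
COLORS = {
    "Nominal": "white",
    "Yellow Low": "yellow",
    "Yellow High": "yellow",
    "Red Low": "red",
    "Red High": "red",
}

def classify(value, red_low, yellow_low, yellow_high, red_high):
    """Return the limit state of one MSID value.

    Assumes red_low <= yellow_low <= yellow_high <= red_high.
    """
    # Check the red (outer) limits first, then the yellow (inner) ones.
    if value < red_low:
        return "Red Low"
    if value > red_high:
        return "Red High"
    if value < yellow_low:
        return "Yellow Low"
    if value > yellow_high:
        return "Yellow High"
    return "Nominal"
```

For example, with limits 0, 2, 8 and 10, a reading of 9 is classified "Yellow High" and would be shown in yellow.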



Last updated: 09/16/22