ProTop's default behavior for multiple issues in the same context, like more than one QAD app being offline, is to alert for the first issue, then exit. This calls attention to the subsystem that is exhibiting a problem.
To override this default behavior, create an alert definition specific to each resource, using the alert filter outlined below.
As part of the QAD appmon configuration, we create an etc/alert.instanceName_qad.cfg file that looks something like this:
appStatus char <> "Running" "3:3" "hourly" "myApp &2 &3 (&1)" page
statCode num <> 0 "" "hourly" "Appmon &2 &3 (&1)" alert
These alert definitions apply to all resources defined in this appmon.instanceName_qad.cfg context. The first outage gets the alert, and then the checker exits.
To make these more granular and get an alert for each line of your appmon.cfg whose app is offline, add a filter that ties each alert definition to a specific QAD app.
The filter amounts to three components added to the end of the alert definition:
1. The temp table name and the appName field, written as tableName.fieldName. The field must be defined in the same temp table as the metric the alert is defined for. You find the temp table name by looking up the data collector that defines the metric. In this case, we search the KBase for appStatus and statCode, which are in the appActivity data collector. The first line of the table in that article shows the temp table name, in this case tt_app. appName is also listed there, so it can be used to tie the alert definition to a specific app. The app name is the second component of each line in appmon.instanceName_qad.cfg:
47 "Gettysburg" ./bin/lincoln.sh "Running,Down"
99 "Red Balloons" ./bin/99rb.sh "Running,Down"
Each of these app names can tie an alert definition to just that resource.
2. An operator like <> or =
3. The specific app name you are alerting for.
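As a rough illustration of where the app names come from, the Python sketch below pulls the second field out of each appmon line. It is not part of ProTop: the function name app_names is hypothetical, and skipping lines that start with # is an assumption about the file format, not something this article confirms.

```python
import shlex

def app_names(appmon_text):
    """Return the second field (the app name) of each appmon line.

    Assumption: blank lines and lines starting with '#' carry no app
    definition and can be skipped.
    """
    names = []
    for line in appmon_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # shlex.split keeps quoted names such as "Red Balloons" together
        fields = shlex.split(line)
        if len(fields) >= 2:
            names.append(fields[1])
    return names

sample = '''47 "Gettysburg" ./bin/lincoln.sh "Running,Down"
99 "Red Balloons" ./bin/99rb.sh "Running,Down"'''

print(app_names(sample))
```

The names it returns are the values to use as the third component of the filter.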
Now we take each alert definition from above and add a filter to each, like this:
statCode num <> 0 "" "hourly" "Appmon &2 &3 (&1)" alert tt_app.appName = "Gettysburg"
appStatus char <> "Running" "" "hourly" "myApp &2 &3 (&1)" alarm tt_app.appName = "Gettysburg"
appStatus char <> "Running" "" "hourly" "myApp &2 &3 (&1)" alarm tt_app.appName = "Red Balloons"
statCode num <> 0 "" "hourly" "Appmon &2 &3 (&1)" alert tt_app.appName = "Red Balloons"
Repeat this for each alert definition, type, and app you use. If you have only two apps and two alert definitions as we have here, you are done. You will get an alert when either or both apps are down, overriding the default behavior.
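With more than a handful of apps, writing these lines by hand gets tedious. The Python sketch below shows one way to generate them, under stated assumptions: add_filters is a hypothetical helper, not a ProTop tool, and it simply appends the tt_app.appName filter described above to each base definition. Review the output before pasting it into your alert configuration file.

```python
def add_filters(alert_defs, names, table="tt_app", field="appName"):
    """Append a 'table.field = "name"' filter to every definition, per app."""
    lines = []
    for name in names:
        for definition in alert_defs:
            lines.append(f'{definition.rstrip()} {table}.{field} = "{name}"')
    return lines

# Unfiltered definitions, copied from the examples in this article
base = [
    'appStatus char <> "Running" "" "hourly" "myApp &2 &3 (&1)" alarm',
    'statCode num <> 0 "" "hourly" "Appmon &2 &3 (&1)" alert',
]

lines = add_filters(base, ["Gettysburg", "Red Balloons"])
print("\n".join(lines))
```

Two definitions and two apps produce four filtered lines, one per definition and app pair.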
Save the alert.instanceName_qad.cfg file, and ProTop will read it shortly.