MultipleThresholds

From Zenoss Wiki

Requirement

Have a performance template with several thresholds with different values generating Warning, Error and Critical events. Only one event should be generated and open at any given time. The example here demonstrates a Command-based template but the same applies for any performance template.

Solution

Use the MinMax threshold technique for “falls within a pre-defined range”; see the Zenoss Core 4 Administrators Guide, chapter 6, page 71 (search on MinMax), available from http://wiki.zenoss.org/Zenoss_Core_4.2.x . I find this a bit counter-intuitive.

As an example, we want a Warning on a value between 10 and 20, an Error on a value between 21 and 30, and a Critical if the value is over 30. The trick with ranges is to set the Minimum threshold to the highest value within the bad range and the Maximum threshold to the lowest value within the bad range, so that Minimum is greater than Maximum; see the screenshot below.


Multiple thresholds screenshot1.jpg
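
The inverted range check can be sketched in shell as follows. This is only an illustration of the idea, not Zenoss code: the function name in_bad_range is hypothetical, and treating the two boundary values as inside the range is an assumption (check how your Zenoss version compares at the boundaries).

```shell
#!/bin/sh
# Sketch of a "falls within a range" check, as configured by setting
# Minimum above Maximum. Not Zenoss code; in_bad_range is a hypothetical name.
# Assumption: the boundary values themselves count as inside the range.
in_bad_range() {
    value=$1
    minimum=$2   # holds the HIGHEST value of the bad range
    maximum=$3   # holds the LOWEST value of the bad range
    if [ "$value" -ge "$maximum" ] && [ "$value" -le "$minimum" ]; then
        echo "violated"
    else
        echo "ok"
    fi
}

# Warning threshold from the example: Min=20, Max=10
in_bad_range 15 20 10
in_bad_range 22 20 10
```

Reading the arguments as "value, Minimum, Maximum" mirrors the threshold fields in the screenshot, which is why the larger number comes first.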


These thresholds are all set up on the test1.sizevar data point. The next trick is to ensure that each threshold generates an event with a different event class (this is set on the threshold itself, not on the command template). The /Cmd event class is standard. I have created the event subclasses /Cmd/Warning, /Cmd/Error and /Cmd/Critical and used each for the appropriate threshold. This matters because the auto-clear fingerprint (the way Zenoss determines whether one event clears another) is:

  • If component UUID exists:
    • component UUID
    • eventClass
    • eventKey (can be blank)
  • If component UUID does not exist:
    • device
    • component (can be blank)
    • eventClass
    • eventKey (can be blank)

Either way, eventClass and eventKey are significant (note that prior to Zenoss 4, neither eventKey nor component UUID was part of the auto-clear fingerprint).
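
To make the fingerprint rule concrete, here is a minimal sketch that builds a fingerprint string from the fields listed above. It is not how Zenoss stores or compares fingerprints internally: the function name clear_fingerprint, the "|" separator and the device name testdevice are all hypothetical, and the eventKey is simply left blank for both events.

```shell
#!/bin/sh
# Sketch of the auto-clear fingerprint fields. Not Zenoss code;
# clear_fingerprint and the "|" separator are hypothetical.
# Arguments: component_uuid device component eventClass eventKey
clear_fingerprint() {
    if [ -n "$1" ]; then
        # Component UUID exists: component UUID + eventClass + eventKey
        echo "$1|$4|$5"
    else
        # No component UUID: device + component + eventClass + eventKey
        echo "$2|$3|$4|$5"
    fi
}

# Same (hypothetical) device, blank component and eventKey, different classes
warning=$(clear_fingerprint "" testdevice "" /Cmd/Warning "")
error=$(clear_fingerprint "" testdevice "" /Cmd/Error "")

if [ "$warning" != "$error" ]; then
    echo "fingerprints differ: a clear for one class leaves the other alone"
fi
```

With the same event class on both thresholds, the two strings would be identical, which is the situation the separate subclasses are meant to avoid.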


Multiple thresholds screenshot2.jpg


This way, only one threshold is violated at any one time. If a value gets “worse” (say, from 19 to 22), it triggers the size21To30 threshold; the threshold with Min=20 and Max=10 is no longer violated, so its event is cleared. Different event classes are necessary; otherwise, the clear from any threshold would clear the events of all the others with a matching device, component and eventKey.
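
The net effect on the event console can be simulated with a short sketch. It models only the outcome described above (one open event at a time), not the Zenoss event pipeline; event_class_for is a hypothetical helper, the sample values are made up, and the range boundaries are assumed to be inclusive.

```shell
#!/bin/sh
# Sketch of "only one event open at a time". Not Zenoss code;
# event_class_for is a hypothetical helper.
# Ranges from the example: 10-20 Warning, 21-30 Error, over 30 Critical.
event_class_for() {
    if [ "$1" -gt 30 ]; then
        echo "/Cmd/Critical"
    elif [ "$1" -ge 21 ]; then
        echo "/Cmd/Error"
    elif [ "$1" -ge 10 ]; then
        echo "/Cmd/Warning"
    else
        echo ""    # no threshold violated
    fi
}

previous=""
for value in 5 19 22 35 8; do
    current=$(event_class_for "$value")
    if [ "$current" != "$previous" ]; then
        # The old range is no longer violated, so its event clears...
        [ -n "$previous" ] && echo "value=$value clear $previous"
        # ...and the newly violated range opens its own event.
        [ -n "$current" ] && echo "value=$value open $current"
    fi
    previous=$current
done
```

For the step from 19 to 22, this prints a clear for /Cmd/Warning followed by an open for /Cmd/Error, matching the behaviour described above.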


Multiple thresholds screenshot3.jpg


I have a very small test script, test1.sh, to use in a command template. It simply echoes values for timevar, sizevar, percentvar and countervar. Make sure the script is executable and use it as the command to run in the command template. Ensure you create data points whose names match these variables exactly (you can leave them all as type GAUGE). Set the Cycle time to 60s for testing.


Multiple thresholds screenshot4.jpg


Make sure that you bind this template to a test device. To test, simply edit the script and change the echoed values.

test1.sh

#!/bin/sh
 
# Nagios return codes
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3
#
exitstatus=$STATE_OK
 
# Nagios format: print information and status text, followed by a pipe |,
#  followed by <var name>=<value> pairs
# Note that Zenoss data point names must match these var names exactly
#
echo "This is a test - status OK | timevar=1s sizevar=22B percentvar=10% countervar=123c"
exit $exitstatus
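
To check which names the data points need to match, the part of the output after the pipe can be split into name and value pairs. The sketch below only mimics the idea and is not the parser Zenoss uses; list_perfdata is a hypothetical helper, and the unit suffixes (s, B, %, c) are left attached to the values.

```shell
#!/bin/sh
# Sketch: list the <var name>=<value> pairs from Nagios-style output.
# Not the Zenoss parser; list_perfdata is a hypothetical helper.
list_perfdata() {
    perfdata=${1#*|}               # keep everything after the first pipe
    for pair in $perfdata; do      # unquoted on purpose: split on whitespace
        # name is the text before the first "=", value the text after it
        echo "${pair%%=*} ${pair#*=}"
    done
}

output='This is a test - status OK | timevar=1s sizevar=22B percentvar=10% countervar=123c'
list_perfdata "$output"
```

Each printed name (timevar, sizevar, percentvar, countervar) should correspond to a data point in the template.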